vignettes/tsFeatureExtraction.Rmd
In many cases, decomposing a time series into its seasonal, trend, and irregular components can provide insights for time series analysis.
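As a rough illustration of such a decomposition outside of 'Cornerstone', the following sketch applies `stats::stl()` to the built-in `nottem` series. Using STL here is an assumption made for illustration; the text does not state which decomposition method the 'CornerstoneR' function uses internally.

```r
# Minimal sketch: decompose a built-in monthly series into
# seasonal, trend, and remainder components.
# Assumption: stats::stl() is used for illustration only.
x <- nottem  # monthly air temperatures at Nottingham (datasets package)

fit <- stl(x, s.window = "periodic")

# The components are returned as a multivariate time series
# with the columns "seasonal", "trend", and "remainder".
components <- fit$time.series
head(components)

# The three components add up to the original series.
reconstructed <- rowSums(components)
all.equal(as.numeric(x), as.numeric(reconstructed))
```

The last line checks that the decomposition is additive, so no information from the input series is lost by splitting it into components.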
In this example, we start with the 'Cornerstone' sample dataset 'old_Faithful_Temp'. The dataset contains a time stamp column and one numerical column, with 282085 observations in total. You can also see that the data were collected every minute.
Data to be analyzed
To extract features, choose the menu 'Analyses' → 'CornerstoneR' → 'Time Series Feature Extraction', as shown in the following screenshot.
Menu
In the dialog that appears, select the variables as shown in the screenshot.
VariableSelection
'OK' confirms your selection and the following window appears.
R Script
Open the menu 'R Script' → 'Script Variables'. Here you can customize the 'pattern' used to group the data and create the time series. The appropriate pattern depends on the frequency at which the data were collected. You can compare different patterns to check which one best extracts the features of your time series. The default is 'daily over months'.
We will use the script variables shown in the screenshot for this example.
R Script Variables Menu
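To make the idea of a grouping pattern more concrete, here is a minimal sketch of one plausible reading of a 'daily over months' style pattern: minute-level data are reduced to one value per calendar day, keyed by year, month, and day. The exact grouping that 'CornerstoneR' performs is not shown here, so the column names `time` and `temp`, the simulated data, and the use of the mean are all assumptions.

```r
# Simulated minute-level data; 'time' and 'temp' are hypothetical names.
set.seed(1)
time <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2020-03-31 23:59:00", tz = "UTC"),
            by = "min")
dat <- data.frame(time = time, temp = rnorm(length(time), mean = 20))

# Grouping keys for a "daily over months" style pattern (assumed meaning).
dat$year  <- as.integer(format(dat$time, "%Y"))
dat$month <- as.integer(format(dat$time, "%m"))
dat$day   <- as.integer(format(dat$time, "%d"))

# One mean value per calendar day, sorted chronologically.
daily <- aggregate(temp ~ year + month + day, data = dat, FUN = mean)
daily <- daily[order(daily$year, daily$month, daily$day), ]
head(daily)
```

Trying a different pattern would correspond to changing the grouping keys, for example grouping by hour within each day instead of by day within each month.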
Now close this dialog with 'OK' and click the execute button (green arrow) or choose the menu 'R Script' → 'Execute'. All calculations are then performed in 'R'. The calculations are complete when the status bar at the lower left shows 'Last execute error state: OK'. The results are available via the menus 'Summaries' and 'Graphs', as shown in the following screenshot.
Open the 'Feature summary table'. The first three columns contain the grouping of the data according to the pattern set earlier in the script variables. The values of the different components follow. If you have selected further numerical columns, the components for all variables are summarized in one table.
Feature summary table
Open the 'Feature plot for temp'. The seasonal, trend, and remainder components as well as the input data are plotted in one graph, with the time stamps on the x-axis 'time'. The gray bars on the right side of the plot indicate how the variation of the data relates to each component, i.e. a smaller bar means that the component is large relative to the variation of the data.
Feature plot
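The relationship described by the gray bars can also be checked numerically. The sketch below compares the range of each component with the range of the input data, again using `stats::stl()` on the built-in `nottem` series as a stand-in; that the 'CornerstoneR' plot is based on STL is an assumption. A component with a large relative range corresponds to a small bar in such a plot.

```r
# Assumption: STL decomposition of a built-in series, for illustration only.
fit <- stl(nottem, s.window = "periodic")

comp <- data.frame(
  data      = as.numeric(nottem),
  seasonal  = as.numeric(fit$time.series[, "seasonal"]),
  trend     = as.numeric(fit$time.series[, "trend"]),
  remainder = as.numeric(fit$time.series[, "remainder"])
)

# Range (max - min) of each series.
ranges <- sapply(comp, function(v) diff(range(v)))

# Range of each component relative to the range of the input data.
relative <- ranges / ranges[["data"]]
round(relative, 2)
```

For this series the seasonal component spans a much larger range than the trend, so the seasonal panel would carry the smaller bar.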
If your dataset is NOT equally spaced according to the pattern defined in the script variables, this function outputs the input dataset together with indicators for the potential problems.
Open the sample dataset 'drive_ride', select 'Date' and 'Racing' as predictors, and use the pattern 'daily over month'. After running the function, you will get a 'Dirty Dataset' under 'Summaries'.
Dirty Dataset
The values 0 (FALSE) and 1 (TRUE) in the third to fifth columns indicate whether your input is equally spaced as required by the pattern.
For example, in row 310, 5th column, the '1' means that the time stamp '29-Dec-06' is missing; in row 312, 3rd column, the '0' means that the data cross a year boundary, which is not allowed for the pattern you chose.
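Checks of this kind can be sketched in a few lines of base R. The example below is not the implementation used by 'CornerstoneR'; the dates and the variable names `has_gap_before`, `missing_days`, and `single_year` are hypothetical and only illustrate how a missing day and a year change could be detected in daily time stamps.

```r
# Hypothetical daily time stamps with one missing day (2006-12-29)
# and a change of year.
dates <- as.Date(c("2006-12-27", "2006-12-28", "2006-12-30",
                   "2006-12-31", "2007-01-01"))

# Gap in days to the previous time stamp; any value other than 1
# breaks the equal spacing.
gap <- c(NA, as.numeric(diff(dates)))
has_gap_before <- !is.na(gap) & gap != 1

# Days that are absent from the complete daily sequence.
all_days <- seq(min(dates), max(dates), by = "day")
missing_days <- all_days[!all_days %in% dates]

# Does the series stay within a single calendar year?
single_year <- length(unique(format(dates, "%Y"))) == 1

data.frame(dates, has_gap_before)
missing_days
single_year
```

Here the third row is flagged because the day before it is missing, and `single_year` is `FALSE` because the series runs from 2006 into 2007.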
The meanings of the different indicators are defined as follows: