In many cases, decomposing a time series into its seasonal, trend, and irregular components can provide insights for time series analysis.
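As a rough illustration of what such a decomposition looks like in plain ‘R’, the following sketch applies `stl()` from the base ‘stats’ package to the built-in `nottem` series (monthly temperatures). Both the use of `stl()` and the choice of `nottem` are assumptions made for illustration; this text does not state which routine the Cornerstone function calls internally, and `nottem` is not the sample dataset used below.

```r
# Illustrative sketch only: STL decomposition of a built-in series.
# `nottem` ships with base R (package 'datasets'); it is NOT the Cornerstone sample data.
fit <- stl(nottem, s.window = "periodic")

# The result holds one column per component: seasonal, trend, remainder.
components <- fit$time.series
head(components)

# The three components add up to the original series.
reconstructed <- rowSums(components)
all.equal(as.numeric(nottem), as.numeric(reconstructed))
```

The irregular component is called ‘remainder’ in the `stl()` output, which matches the naming used in the feature plot later in this example.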
In this example, we start with the ‘Cornerstone’ sample dataset ‘old_Faithful_Temp’. The dataset contains a time stamp column and one numerical column, with 282085 observations in total. The time stamps show that the data were collected every minute.
The data to be analyzed
To extract features, choose the menu ‘Analyses’ -> ‘CornerstoneR’ -> ‘Time Serie Feature Extraction’, as shown in the following screenshot.
Menu
In the dialog that appears, select the variables as shown in the screenshot.
VariableSelection
‘OK’ confirms your selection and the following window appears.
R Script
Open the menu ‘R Script’ \(\rightarrow\) ‘Script Variables’. Here you can customize the ‘pattern’ that is used to group the data and create the time series.
The default is ‘daily over months’.
In this example, we use the script variables as shown in the screenshot.
R Script Variables Menu
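To give an idea of what grouping minute-level data by a daily pattern can involve, here is a minimal sketch in plain ‘R’. The data frame `raw`, its columns, and the use of the daily mean are hypothetical; how the Cornerstone function actually aggregates the data for a given pattern is not described here.

```r
# Hypothetical minute-level data: two full days of readings (not the sample dataset).
set.seed(1)
stamps <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
              by = "min", length.out = 2 * 24 * 60)
raw <- data.frame(time = stamps, temp = rnorm(length(stamps), mean = 20))

# Derive grouping columns from the time stamp.
raw$year  <- as.integer(format(raw$time, "%Y"))
raw$month <- as.integer(format(raw$time, "%m"))
raw$day   <- as.integer(format(raw$time, "%d"))

# Assumption: one mean value per day; other summary functions are possible.
daily <- aggregate(temp ~ year + month + day, data = raw, FUN = mean)
daily
```

The resulting table has one row per day, with the grouping columns first and the aggregated value last.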
Now close this dialog with ‘OK’ and click the execute button (green
arrow) or choose the menu ‘R Script’ \(\rightarrow\) ‘Execute’. All
calculations are done via ‘R’. They are finished when the text in the
lower left status bar contains ‘Last execute error state: OK’. The
results are available via the menus ‘Summaries’ and ‘Graphs’, as shown in
the following screenshot.
Open the ‘Feature summary table’. The first three columns contain the grouping of the data according to the pattern set earlier in the script variables. The values of the different components follow. If you have selected further numerical columns, the components for all variables are summarized in one table.
Feature summary table
Open the ‘Feature plot for temp’. The seasonal, trend, and remainder components as well as the input data are plotted in one graph, with the time stamps on the x-axis ‘time’. The gray bars on the right side of the plot indicate how much each component contributes to the variation in the data, i.e. a smaller bar means that the component is large relative to the variation of the data.
Feature plot
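The reading of the gray bars can be checked numerically. The sketch below assumes an `stl()` decomposition of the built-in `nottem` series (neither is confirmed as what Cornerstone uses) and compares the range of each component with the range of the input data. In the standard `stl` plot the bars mark the same magnitude in every panel, so a component with a large range gets a small bar.

```r
# Illustrative sketch only: compare component ranges with the data range.
# `nottem` is a built-in R dataset, not the Cornerstone sample data.
fit <- stl(nottem, s.window = "periodic")

# Range (max minus min) of each component.
component_range <- apply(fit$time.series, 2, function(x) diff(range(x)))

# Range of the input data.
data_range <- diff(range(nottem))

# Share of the data range covered by each component.
round(component_range / data_range, 2)
```

Calling `plot(fit)` on the same object draws the four panels with the bars on the right, which allows a visual comparison with these ratios.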
If your dataset is NOT equally spaced according to the pattern defined in the script variables, this function returns the input dataset together with indicators for the potential problems.
Open the sample dataset ‘drive_ride’, select ‘Date’ and ‘Racing’ as predictors, and use the pattern ‘daily over month’. After running the function, you will get a ‘Dirty Dataset’ under ‘Summaries’.
Dirty Dataset
The values 0 (FALSE) and 1 (TRUE) in the third to fifth columns indicate whether your input is equally spaced as required by the pattern.
For example, in row 310, 5th column, the ‘1’ means that the time stamp ‘29-Dec-06’ is missing; in row 312, 3rd column, the ‘0’ means that the data cross a year boundary, which is not allowed for the pattern you chose.
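A comparable check for missing days can be sketched in plain ‘R’. The vector `dates` below is hypothetical and merely mimics a gap on 29 December 2006; the code is not the check that the Cornerstone function performs, only one simple way to find such gaps.

```r
# Hypothetical daily time stamps with one day (29 Dec 2006) left out.
dates <- as.Date(c("2006-12-27", "2006-12-28", "2006-12-30", "2006-12-31"))

# Full daily grid between the first and the last observation.
expected <- seq(min(dates), max(dates), by = "day")

# Days on the grid that have no observation.
missing_days <- expected[!expected %in% dates]
missing_days

# Flag each observation that follows a gap of more than one day.
gap_days <- as.numeric(diff(dates))
has_gap_before <- c(FALSE, gap_days > 1)
data.frame(date = dates, has_gap_before = has_gap_before)
```

Running such a check before the analysis shows which time stamps would have to be added or imputed to obtain an equally spaced series.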
The meanings of the different indicators are defined as follows: