`vignettes/tsFeatureExtraction.Rmd`

In many cases, decomposing a time series into its seasonal, trend, and irregular (remainder) components can provide insights for time series analysis.
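
To make the three components concrete, here is a minimal Python sketch of a classical additive decomposition (a moving-average trend plus seasonal means per position in the cycle). It is an illustration only: the function `decompose` and the synthetic weekly series are hypothetical, and the method is not necessarily the one that CornerstoneR runs in ‘R’.

```python
def decompose(values, period):
    """Classical additive decomposition: values = trend + seasonal + remainder."""
    n = len(values)
    half = period // 2
    trend = [None] * n  # undefined at both ends, where the window does not fit
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # even period: the two end points of the window get half weight
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period

    # average the detrended values for each position within the cycle
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(values[i] - t)
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal component to sum to zero
    seasonal = [means[i % period] - centre for i in range(n)]

    remainder = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic example: linear trend plus a weekly pattern that sums to zero.
pattern = [2.0, -1.0, 0.5, -1.5, 1.0, -0.5, -0.5]
values = [10 + 0.5 * i + pattern[i % 7] for i in range(28)]
trend, seasonal, remainder = decompose(values, period=7)
```

For this noise-free series the sketch recovers the linear trend and the weekly pattern, and the remainder is zero up to rounding. The series must cover at least two full cycles so that every position in the cycle has a detrended value.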

In this example, we will start with the ‘Cornerstone’ sample dataset ‘old_Faithful_Temp’. The dataset contains a time stamp column and one numerical column, with 282085 observations in total. The time stamps show that the data were collected every minute.

To extract features, choose the menu ‘Analyses’ \(\rightarrow\) ‘CornerstoneR’ \(\rightarrow\) ‘Time Serie Feature Extraction’, as shown in the following screenshot.

In the dialog that appears, select the variables as shown in the screenshot.

‘OK’ confirms your selection and the following window appears.

Open the menu ‘R Script’ \(\rightarrow\) ‘Script Variables’. Here you can customize the ‘pattern’ that is used to group the data and create the time series.

- For data covering at most **one** day/hour, use: secondly over minutes, minutely over hours.
- For data covering at most **one** year, use: hourly over days, daily over weeks, daily over months, weekly over months, monthly over quarters.
- For data covering **multiple** years, use: monthly over years, quarterly over years.

The default is ‘daily over months’.
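
One way to read a pattern such as ‘daily over months’ is "one value per day, with the month as the repeating cycle". The Python sketch below illustrates this reading for a few of the patterns. It rests on assumptions: the `PATTERNS` table and the function `group_by_pattern` are hypothetical, and averaging the values within a cell is an assumed aggregation rule, not a documented CornerstoneR behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical mapping: pattern -> (cycle key, position within the cycle)
PATTERNS = {
    "minutely over hours": (lambda t: (t.year, t.month, t.day, t.hour), lambda t: t.minute),
    "hourly over days": (lambda t: (t.year, t.month, t.day), lambda t: t.hour),
    "daily over months": (lambda t: (t.year, t.month), lambda t: t.day),
    "monthly over years": (lambda t: (t.year,), lambda t: t.month),
}


def group_by_pattern(rows, pattern):
    """Average all values that fall into the same (cycle, position) cell."""
    cycle_of, position_of = PATTERNS[pattern]
    cells = defaultdict(list)
    for stamp, value in rows:
        cells[(cycle_of(stamp), position_of(stamp))].append(value)
    return {key: sum(v) / len(v) for key, v in sorted(cells.items())}


# Twelve readings, six hours apart, spanning 30-Jan, 31-Jan and 1-Feb 2020.
start = datetime(2020, 1, 30)
rows = [(start + timedelta(hours=6 * i), float(i)) for i in range(12)]
grouped = group_by_pattern(rows, "daily over months")
```

With ‘daily over months’, the twelve readings collapse into three daily values, two in the January cycle and one in the February cycle.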

We will use the script variables as in the screenshot of this example.

Now close this dialog with ‘OK’ and click the execute button (green arrow) or choose the menu ‘R Script’ \(\rightarrow\) ‘Execute’. All calculations are then done via ‘R’. They are complete when the status bar at the lower left shows ‘Last execute error state: OK’. The results are available via the menus ‘Summaries’ and ‘Graphs’, as shown in the following screenshot.

Open the ‘Feature summary table’. The first three columns contain the data grouped according to the pattern we set earlier in the script variables. The values of the different components follow. If you have selected further numerical columns, the components of all variables are summarized in one table.

Open the ‘Feature plot for temp’. The seasonal, trend, and remainder components as well as the input data are plotted in one graph, with the time stamps on the x-axis ‘time’. The gray bars on the right side of the plot indicate the influence of the data variation on the different components, i.e. a smaller bar means that the component is large relative to the variation of the data.
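
The following Python sketch shows why a smaller bar goes with a larger component. It assumes that the bars behave like the scale bars in common decomposition plots, where every bar marks the same absolute amount and each panel has its own y-axis range. The function `bar_fractions` and the numbers are hypothetical.

```python
def bar_fractions(components):
    """Relative bar height per panel if every bar marks the same absolute amount."""
    ranges = {name: max(v) - min(v) for name, v in components.items()}
    reference = min(ranges.values())  # the narrowest panel gets a full-height bar
    return {name: reference / r for name, r in ranges.items()}


# Made-up component values, for illustration only.
components = {
    "data": [10.0, 14.0, 20.0, 30.0],
    "seasonal": [-2.0, 2.0, -2.0, 2.0],
    "trend": [11.0, 15.0, 20.0, 26.0],
    "remainder": [-1.0, 0.5, 1.0, -0.5],
}
fractions = bar_fractions(components)
```

Here the remainder spans the smallest range, so its bar fills the whole panel, while the bar in the ‘data’ panel covers only a tenth of the panel height. The sketch does not handle a constant component, whose range would be zero.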

If your dataset is not equally spaced according to the pattern defined in the script variables, this function outputs the input dataset together with indicators for the potential problems.

Open the sample dataset ‘drive_ride’, select ‘Date’ and ‘Racing’ as predictors, and use the pattern ‘daily over months’. After running the function, you will get a ‘Dirty Dataset’ under ‘Summaries’.

The values 0 (FALSE) and 1 (TRUE) in the third to fifth columns indicate whether your input is equally spaced as required by the pattern.

For example, in row 310, 5th column, the ‘1’ means that the time stamp ‘29-Dec-06’ is missing; in row 312, 3rd column, the ‘0’ means that the data cross a year boundary, which is not allowed for the pattern you chose.

The meanings of the different indicators are defined as follows:

- sameYear/Day/Hour: whether the data were collected in the same year/day/hour.
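- missingQuart: whether a quarter is missing, i.e. the quarters of the input data are not continuous from 1st to 4th.
- missingMon: whether a month is missing, i.e. the months of the input data are not continuous from Jan to Dec.
- missingDay: whether a day is missing, i.e. the days of the input data are not continuous from 001 to 365.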
- missingMin/Sec: whether a minute/second is missing, i.e. the minutes/seconds of the input data are not continuous from 00 to 59.
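
To illustrate how such indicators can be derived, here is a simplified Python sketch for daily data. The indicator names `sameYear` and `missingDay` are taken from the list above; everything else is an assumption, in particular the function `check_daily` and the choice to flag the row that follows a gap. The checks actually used by the function may differ.

```python
from datetime import date


def check_daily(dates):
    """Return one (date, sameYear, missingDay) tuple per input date."""
    first_year = dates[0].year
    rows = []
    previous = None
    for current in dates:
        # 1 while the row is still in the year of the first observation
        same_year = int(current.year == first_year)
        # 1 if at least one calendar day was skipped right before this row
        missing_day = int(previous is not None and (current - previous).days > 1)
        rows.append((current, same_year, missing_day))
        previous = current
    return rows


# 29-Dec-06 is absent, and the last row crosses into 2007.
dates = [
    date(2006, 12, 27),
    date(2006, 12, 28),
    date(2006, 12, 30),
    date(2006, 12, 31),
    date(2007, 1, 1),
]
flags = check_daily(dates)
```

In this sketch the row for 30-Dec-06 receives `missingDay = 1` because 29-Dec-06 is absent, and the row for 1-Jan-07 receives `sameYear = 0` because it lies in a different year than the first observation.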