`vignettes/tsFeatureExtraction.Rmd`

In many cases, decomposing a time series into its seasonal, trend, and irregular (remainder) components can provide insights for time series analysis.
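To make the idea of a decomposition concrete, here is a minimal sketch in plain R. It assumes the base `stats::stl()` function (seasonal-trend decomposition by loess) and a synthetic series; the text above does not state which decomposition method the Cornerstone function uses internally, so this is an illustration rather than a description of its implementation.

```r
# Synthetic minute-level series: linear trend + hourly cycle + noise.
# (Illustrative data only; this is not the 'old_Faithful_Temp' dataset.)
set.seed(1)
n <- 60 * 24                          # one day of minutely observations
minute_index <- seq_len(n)
x <- 0.001 * minute_index +
  sin(2 * pi * minute_index / 60) +
  rnorm(n, sd = 0.2)

# Treat 60 observations (one hour) as one seasonal period.
series <- ts(x, frequency = 60)

# Seasonal-trend decomposition by loess from the base 'stats' package.
fit <- stl(series, s.window = "periodic")

# One column per component: seasonal, trend, remainder.
components <- fit$time.series
head(components)

# The three components add back up to the input series.
reconstruction <- rowSums(components)
max_abs_error <- max(abs(reconstruction - as.numeric(series)))
```

The last two lines check that seasonal, trend, and remainder are an additive split of the input, which is what makes the components comparable on the same scale.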

In this example, we will start with the `'Cornerstone'` sample dataset `'old_Faithful_Temp'`. The dataset contains time stamps and a numerical column, with 282085 observations in total. You can also see that the data were collected every minute.

To extract features, choose the menu `'Analyses'` \(\rightarrow\) `'CornerstoneR'` \(\rightarrow\) `'Time Series Feature Extraction'`, as shown in the following screenshot.

In the dialog that appears, select the variables as in the screenshot. `'OK'` confirms your selection, and the following window appears.

Open the menu `'R Script'` \(\rightarrow\) `'Script Variables'`. You can customize the `'pattern'` used to group the data and create the time series.

- For at most **one** day/hour of data, use: secondly over minutes, minutely over hours.
- For at most **one** year of data, use: hourly over days, daily over weeks, daily over months, weekly over months, monthly over quarters.
- For **multiple** years of data, use: monthly over years, quarterly over years.
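The following sketch shows one way to read a pattern such as "minutely over hours": aggregate the raw readings to one value per minute and treat one hour as the seasonal period. This reading, the data frame `raw`, and its column names are assumptions made for illustration; the function's actual grouping code may differ.

```r
# Hypothetical raw data: a reading every 20 seconds for three hours.
timestamps <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
                  by = "20 sec", length.out = 3 * 60 * 3)
values <- sin(2 * pi * as.numeric(timestamps) / 3600)
raw <- data.frame(time = timestamps, value = values)

# "minutely over hours" (assumed reading): one value per minute,
# with one hour (60 minutes) as the seasonal period.
raw$hour   <- format(raw$time, "%Y-%m-%d %H")
raw$minute <- format(raw$time, "%M")
grouped <- aggregate(value ~ hour + minute, data = raw, FUN = mean)
grouped <- grouped[order(grouped$hour, grouped$minute), ]

# Time series with 60 observations per period.
minutely_over_hours <- ts(grouped$value, frequency = 60)
```

Under this reading, the coarser unit of the pattern (here the hour) sets the period, and the finer unit (the minute) sets the resolution to which the data are aggregated.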

The choice of the pattern also depends on the frequency at which the data were collected. You can compare different patterns to check which one best extracts the features of your time series. The default is `'daily over months'`.

We will use the script variables as in the screenshot of this example.

Now close this dialog with `'OK'` and click the execute button (green arrow) or choose the menu `'R Script'` \(\rightarrow\) `'Execute'`; all calculations are then done via ‘R’. The calculations are finished when the text in the lower left status bar contains `'Last execute error state: OK'`. Our results are available via the menus `'Summaries'` and `'Graphs'`, as shown in the following screenshot.

Open the `'Feature summary table'`. The first three columns contain the data grouped according to the pattern we set earlier in the script variables. The values of the different components follow. If you have selected further numerical columns, the components for all variables are summarized in one table.

Open the `'Feature plot for temp'`. The seasonal, trend, and remainder components as well as the input data are plotted in one graph, with the time stamps on the x-axis `'time'`. The gray bars on the right side of the plot indicate the influence of the data variation on the different components, i.e. a smaller bar means that the component is large relative to the variation of the data.
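The relationship between bar size and component size can be checked numerically. The sketch below again assumes `stats::stl()` and a synthetic monthly series, neither of which comes from the example above; it compares the range of each component with the range of the input data.

```r
# Synthetic series with a strong trend, a moderate cycle, and small noise.
set.seed(42)
idx <- seq_len(240)
series <- ts(0.05 * idx + 2 * sin(2 * pi * idx / 12) + rnorm(240, sd = 0.3),
             frequency = 12)
fit <- stl(series, s.window = "periodic")

# Range (max - min) of each component and of the input data.
component_range <- apply(fit$time.series, 2, function(col) diff(range(col)))
data_range <- diff(range(series))

# Share of the data range covered by each component; a larger share
# corresponds to a shorter bar in the plot.
relative_size <- component_range / data_range
round(relative_size, 2)

# plot(fit)  # uncomment to draw the panels with the bars on the right
```

In `plot(fit)` the bars span the same number of data units in every panel, so the panel whose component covers the widest range shows the shortest bar.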

If your dataset is NOT equally spaced according to the pattern defined in the script variables, this function will output the input dataset together with indicators for the potential problems.

Open the sample dataset `'drive_ride'`, select `'Date'` and `'Racing'` as predictors, and use the pattern `'daily over months'`. After running the function, you will get a `'Dirty Dataset'` under `'Summaries'`.

The values 0 (FALSE) and 1 (TRUE) in the third to fifth columns indicate whether your input is equally spaced as required by the pattern.

For example, in row 310, 5th column, the ‘1’ means the time stamp ‘29-Dec-06’ is missing; in row 312, 3rd column, the ‘0’ means the data cross into another year, which is not allowed for the pattern you chose.

The meanings of the different indicators are defined as follows:

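- sameYear/Day/Hour: 1 if the data are collected in the same year/day/hour, 0 otherwise.
- missingQuart: 1 if a quarter is missing, i.e. the quarters of the input data are not continuous from 1st to 4th.
- missingMon: 1 if a month is missing, i.e. the months of the input data are not continuous from Jan to Dec.
- missingDay: 1 if a day is missing, i.e. the days of the input data are not continuous from 001 to 365.
- missingMin/Sec: 1 if a minute/second is missing, i.e. the minutes/seconds of the input data are not continuous from 00 to 59.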
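As a rough illustration of how such indicators could be derived for daily data, the sketch below computes `sameYear` and `missingDay` for a small vector of hypothetical dates. The dates, the choice to flag the row that follows a gap, and the code itself are assumptions for illustration; this is not the function's own implementation, and it covers only two of the indicators.

```r
# Hypothetical daily time stamps with one gap and a year change.
dates <- as.Date(c("2006-12-27", "2006-12-28", "2006-12-30",
                   "2006-12-31", "2007-01-01"))

year <- as.integer(format(dates, "%Y"))
gap_days <- c(1, as.numeric(diff(dates)))  # days since the previous row

checks <- data.frame(
  Date       = dates,
  # 1 if the row lies in the same year as the first observation.
  sameYear   = as.integer(year == year[1]),
  # 1 if at least one calendar day is missing before this row.
  missingDay = as.integer(gap_days > 1)
)
checks
```

Here the third row is flagged in `missingDay` because 29 December is absent, and the last row has `sameYear` equal to 0 because it falls in the following year.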