Time Series Feature Extraction

Initial Situation and Goal

Decomposing a time series into its seasonal, trend, and irregular (remainder) components often provides useful insight for time series analysis.
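To make the three components concrete, the following is a minimal Python sketch of a classical additive decomposition (moving-average trend, per-position seasonal means, remainder). It is an illustration under stated assumptions and not necessarily the algorithm that CornerstoneR runs; the function name `decompose_additive` and its `period` argument are hypothetical.

```python
from statistics import mean


def decompose_additive(values, period):
    """Split a series into trend, seasonal and remainder parts (additive model).

    Hypothetical helper for illustration only; CornerstoneR may use a
    different decomposition method.
    """
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centered moving average; undefined (None) at both ends.
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2:
            # Odd period: the window holds exactly one full period.
            trend[i] = mean(window)
        else:
            # Even period: the two end points get half weight each.
            inner = sum(window[1:-1])
            trend[i] = (0.5 * window[0] + inner + 0.5 * window[-1]) / period

    # Seasonal: mean detrended value per position in the cycle,
    # shifted so that the seasonal effects average to zero.
    detrended = [None if t is None else v - t for v, t in zip(values, trend)]
    raw = [mean(d for d in detrended[p::period] if d is not None)
           for p in range(period)]
    offset = mean(raw)
    seasonal = [raw[i % period] - offset for i in range(n)]

    # Remainder: whatever trend and seasonal parts do not explain.
    remainder = [None if t is None else v - t - s
                 for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, remainder
```

For a series built as a straight line plus a repeating pattern that averages to zero, this sketch recovers the line as the trend and the pattern as the seasonal part, and leaves a remainder of zero wherever the trend is defined.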

Time Series Feature Extraction

In this example, we start with the 'Cornerstone' sample dataset 'old_Faithful_Temp'. The dataset contains a time stamp column and one numerical column, with 282085 observations in total. The time stamps show that the data were collected every minute.

Data to be analyzed

To extract features, choose the menu 'Analyses' $\rightarrow$ 'CornerstoneR' $\rightarrow$ 'Time Series Feature Extraction' as shown in the following screenshot.

In the dialog that appears, select the variables as shown in the screenshot.

Variable Selection

'OK' confirms your selection and the following window appears.

R Script

Open the menu 'R Script' $\rightarrow$ 'Script Variables'. Here you can customize the 'pattern' that is used to group the data and create the time series.

For data spanning at most one day or one hour, use: 'secondly over minutes', 'minutely over hours'.
For data spanning at most one year, use: 'hourly over days', 'daily over weeks', 'daily over months', 'weekly over months', 'monthly over quarters'.
For data spanning multiple years, use: 'monthly over years', 'quarterly over years'.

The choice of pattern also depends on the frequency at which the data were collected. You can compare different patterns to check which one extracts the features of your time series best. The default is 'daily over months'.
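As a rough illustration of what a pattern such as 'daily over months' implies for minute-level data, the Python sketch below collapses time-stamped values to one value per calendar day. It is not taken from the CornerstoneR script: the function name `aggregate_daily` is hypothetical, and using the mean as the daily aggregate is an assumption made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_daily(records):
    """Collapse (timestamp, value) records to one row per calendar day.

    Returns rows of (year, month, day, mean_value) sorted by date.
    Assumption: the daily aggregate is the mean; the real function
    may aggregate differently.
    """
    buckets = defaultdict(list)
    for stamp, value in records:
        buckets[(stamp.year, stamp.month, stamp.day)].append(value)
    return [(*key, mean(vals)) for key, vals in sorted(buckets.items())]


# Two days of made-up minute-level data: 0.0 on day one, 1.0 on day two.
start = datetime(2020, 1, 1)
records = [(start + timedelta(minutes=m), float(m // 1440))
           for m in range(2 * 1440)]
print(aggregate_daily(records))
# [(2020, 1, 1, 0.0), (2020, 1, 2, 1.0)]
```

A finer pattern (for example 'minutely over hours') would keep more rows per day, which is why the suitable pattern depends on how densely the data were sampled.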

We will use the script variables as in the screenshot of this example.

R Script Variables Menu

Now close this dialog with 'OK' and click the execute button (green arrow) or choose the menu 'R Script' $\rightarrow$ 'Execute'. All calculations are then done via 'R'. The calculations have finished when the text in the lower left status bar contains 'Last execute error state: OK'. The results are available via the menus 'Summaries' and 'Graphs', as shown in the following screenshot.

Result Menu

Open the 'Feature summary table'. The first three columns contain the data grouped according to the pattern set earlier in the script variables. The values of the individual components follow. If you have selected further numerical columns, the components for all variables are summarized in one table.

Feature summary table

Open the 'Feature plot for temp'. The seasonal, trend, and remainder components as well as the input data are plotted in one graph, with the time stamps on the x-axis 'time'. The gray bars on the right side of the plot indicate the size of each component relative to the variation in the data, i.e. a smaller bar means that the component is large relative to the variation in the data.
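One way to put a number on what the gray bars convey is to compare the value range of a component with the value range of the input data. The Python sketch below does this under the assumption that the bars mark the same absolute span in every panel, which the plot description suggests but does not state; the function name `range_share` is hypothetical.

```python
def range_share(component, data):
    """Ratio of a component's value range to the range of the input data.

    Entries of `component` that are None (undefined) are ignored.
    Hypothetical helper: a ratio close to 1 would correspond to a short
    gray bar, a ratio close to 0 to a long one.
    """
    defined = [c for c in component if c is not None]
    return (max(defined) - min(defined)) / (max(data) - min(data))
```

For example, a trend that moves between 2 and 4 while the data move between 0 and 10 covers one fifth of the data range, so it would be drawn with a comparatively long bar.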

Feature plot

Remarks

If your dataset is NOT equally spaced according to the pattern defined in the script variables, this function outputs the input dataset together with indicators for the potential problems.

Open the sample dataset 'drive_ride', select 'Date' and 'Racing' as predictors, and use the pattern 'daily over months'. After running the function, you will find a 'Dirty Dataset' under 'Summaries'.

Dirty Dataset

The values 0 (FALSE) and 1 (TRUE) in the third to fifth columns indicate whether your input is equally spaced as required by the pattern.

For example, the '1' in row 310, 5th column, means that the time stamp '29-Dec-06' is missing; the '0' in row 312, 3rd column, means that the data cross from one year into the next, which is not allowed for the pattern you chose.

The meanings of the different indicators are defined as follows:

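sameYear/Day/Hour: whether the data were collected in the same year/day/hour.
missingQuart: whether a quarter is missing, i.e. the quarters of the input data are not continuous from 1st to 4th.
missingMon: whether a month is missing, i.e. the months of the input data are not continuous from Jan to Dec.
missingDay: whether a day is missing, i.e. the days of the input data are not continuous from 001 to 365.
missingMin/Sec: whether a minute/second is missing, i.e. the minutes/seconds of the input data are not continuous from 00 to 59.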
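The checks behind two of these indicators can be sketched in Python as follows. This is an illustration, not the CornerstoneR implementation: the function name `check_equal_spacing` is hypothetical, and two details are assumptions, namely that `sameYear` compares each row with the year of the first row and that `missingDay` is flagged on the first row after a gap.

```python
from datetime import date, timedelta


def check_equal_spacing(dates):
    """Return rows of (date, same_year, missing_day) for sorted daily dates.

    same_year:   1 if the date lies in the same year as the first date.
    missing_day: 1 if at least one calendar day is missing directly
                 before this date (assumption: the row after the gap
                 carries the flag).
    """
    first_year = dates[0].year
    rows = []
    previous = None
    for current in dates:
        same_year = int(current.year == first_year)
        gap = previous is not None and current - previous > timedelta(days=1)
        rows.append((current, same_year, int(gap)))
        previous = current
    return rows


# Made-up dates in which 29 Dec 2006 is missing and the last row is in 2007.
dates = [date(2006, 12, 27), date(2006, 12, 28), date(2006, 12, 30),
         date(2006, 12, 31), date(2007, 1, 1)]
for row in check_equal_spacing(dates):
    print(row)
```

In this made-up input, the row for 30 Dec 2006 gets `missing_day = 1` and the row for 1 Jan 2007 gets `same_year = 0`, mirroring the two kinds of problems described for the 'Dirty Dataset'.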

Xi Zhou

2024-03-04
