Initial Situation and Goal

Time series forecasting consists of making predictions based on past time-stamped data. ARIMA (AutoRegressive Integrated Moving Average) is a class of models used to forecast time series. This function takes an equidistant time series and returns the best-fitting ARIMA model according to AICc (corrected Akaike Information Criterion).

The Time Series Models Function

The function in CornerstoneR fits the best ARIMA model to the time series. The ARIMA model has three terms:

  1. p: the order of the autoregressive (AR) part, also called the lag order;
  2. d: the number of differencing steps required to make the time series stationary, also called the degree of differencing;
  3. q: the order of the moving average (MA) part.
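The role of d can be illustrated with a minimal sketch in plain Python. The helper name `difference` is hypothetical and is not part of CornerstoneR; it only shows what applying first-order differencing d times does to a series.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return the result."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series by the changes between neighbours.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# A quadratic trend becomes constant after two rounds of differencing.
quadratic = [t * t for t in range(6)]  # 0, 1, 4, 9, 16, 25
once = difference(quadratic, d=1)      # 1, 3, 5, 7, 9
twice = difference(quadratic, d=2)     # 2, 2, 2, 2
```

Each round of differencing shortens the series by one observation, which is one reason to keep d small.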

The function takes the range defined in the 'Script Variables' for p, d and q and returns the best fitted ARIMA model based on the comparison of AICc values. The inputs in 'Script Variables' are:

  • Minimum p
  • Maximum p
  • Maximum d
  • Minimum q
  • Maximum q
  • Number of Forecasts

The minimum d is 0 by default. The number of forecasts is the number of future values to be predicted by the function.
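The selection described above can be sketched as follows. This is an illustration only, not the CornerstoneR implementation: the function names, the ranges, the AIC values and the parameter count k = p + q + 1 (one extra for the error variance) are all assumptions made for the example. The correction itself is the standard one, AICc = AIC + 2k(k + 1) / (n - k - 1).

```python
from itertools import product


def candidate_orders(min_p, max_p, max_d, min_q, max_q):
    """List every (p, d, q) inside the configured ranges; d starts at 0."""
    return list(
        product(
            range(min_p, max_p + 1),
            range(0, max_d + 1),
            range(min_q, max_q + 1),
        )
    )


def aicc(aic, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical ranges: 3 values of p, 2 of d and 3 of q give 18 candidates.
orders = candidate_orders(min_p=0, max_p=2, max_d=1, min_q=0, max_q=2)

# Invented AIC values for three candidates, for illustration only.
made_up_aic = {(1, 0, 0): 100.0, (1, 1, 1): 98.0, (2, 1, 2): 97.5}
n_obs = 30

# Assumption: k counts the AR and MA coefficients plus the error variance.
scores = {
    order: aicc(value, k=order[0] + order[2] + 1, n=n_obs)
    for order, value in made_up_aic.items()
}
best = min(scores, key=scores.get)
```

With these invented numbers the model with the lowest AIC, (2, 1, 2), is not selected: its larger parameter count raises its AICc to 100.0, so (1, 1, 1) has the lowest AICc.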

The function also works for data with pre-defined groups. If a grouping column exists, the function fits a separate ARIMA model for each group in the data.
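As a rough sketch of the grouping step, the rows can be split into one series per group before any model is fitted. The helper `split_by_group`, the column name "Response" and the rows themselves are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def split_by_group(rows, group_column, response_column):
    """Collect the response values of each group, preserving row order."""
    series_per_group = defaultdict(list)
    for row in rows:
        series_per_group[row[group_column]].append(row[response_column])
    return dict(series_per_group)


# Invented rows; "City" stands in for a grouping column.
rows = [
    {"City": "A", "Response": 1.0},
    {"City": "B", "Response": 5.0},
    {"City": "A", "Response": 2.0},
    {"City": "B", "Response": 6.0},
    {"City": "A", "Response": 3.0},
]
series = split_by_group(rows, "City", "Response")
# Each entry of `series` would then be modelled on its own.
```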

In the next steps, we will present an example of the Time Series Models function in Cornerstone.

Example

In this example, we will use the 'Trash' data set provided in Cornerstone as built-in TestData. This data set contains 13 columns and 768 observations with details about waste incineration in three different cities.

Trash data

Choose the menu 'Analyses' \(\rightarrow\) 'CornerstoneR' \(\rightarrow\) 'Time Series Models'. In the next dialog, select 'Time' as Predictors, 'Feed Rate' and 'Water Flow' as Responses, and 'City' as Group by, then press 'OK'.

Dialog to select the variables of the data set

We can customize the minimum and maximum of p and q, the maximum d, and the number of forecasts. To do so, open the menu 'R Script' \(\rightarrow\) 'Script Variables'.

Dialog to select the variables of the data set

We will keep the default settings for p, d and q and set the Number of Forecasts to 12. Close this dialog with 'OK' and click the execute button (green arrow) or choose the menu 'R Script' \(\rightarrow\) 'Execute'. All calculations are then carried out in 'R'. They have finished when the status bar at the lower left shows 'Last execute error state: OK'. The results are available via the menus 'Summaries' and 'Graphs'.

The summary 'Fit Estimate' shows the original responses, the fitted values of the model and the residuals for each response and group. The forecast values of the responses are also gathered in this summary.

Summary table: Fit Estimate

The summary 'Goodness of Fit' shows a table of goodness-of-fit measures per group and response. The table includes BIC (Bayesian Information Criterion), AIC (Akaike Information Criterion), AICc (corrected Akaike Information Criterion), R-Squared and adjusted R-Squared. The adjusted R-Squared is calculated as:

\[ R^2_{\mathrm{adj}} = 1 - \frac{(1 - R^2)\times (N-1)}{N - p - 1} \]

where \(R^2\) is the sample R-Squared, \(N\) is the sample size and \(p\) is the number of predictors. If all terms of the fitted ARIMA model are 0, the R-Squared and adjusted R-Squared are not calculated, because such a fitted model is constant (no variance).
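The formula above can be checked with a small worked example. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow; how CornerstoneR counts the predictors of a fitted model is not specified here.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-Squared for sample size n and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R-Squared needs n > p + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Invented values: R-Squared = 0.9, N = 21 observations, p = 2 predictors.
# 1 - 0.1 * 20 / 18 is roughly 0.889, slightly below the unadjusted 0.9.
adj = adjusted_r_squared(0.9, n=21, p=2)
```

Because the factor (N - 1) / (N - p - 1) is at least 1, the adjusted value never exceeds the unadjusted R-Squared, and the gap widens as predictors are added.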

Summary table: Goodness of Fit

Finally, we can visualize the original time series with the forecast values in the Graphs tab. An example of such a plot is the Forecast for 'Water Flow' for the city Poughkeepsie.

Graph: Forecast Plot