sl3_timeseries.Rmd

# Time Series Analysis with the [`sl3`](https://sl3.tlverse.org/) R package

## Authors: Ivana Malenica and [Nima Hejazi](https://nimahejazi.org)

## Date: 08 May 2018

_Attribution:_ based on materials originally produced by Jeremy Coyle, Nima
Hejazi, Ivana Malenica, and Oleg Sofrygin

## Introduction

In this demonstration, we will illustrate how to use the `sl3` R package for the
statistical analysis of time series data. We will build on the concepts of
machine learning pipelines and ensemble models (introduced in other `sl3`
demos), paying careful attention to how the `sl3` infrastructure can be
leveraged to generate optimal predictions for dependent data structures where
many observations are collected on each observational unit.

## Resources

* The `sl3` R package homepage: https://sl3.tlverse.org/
* The `sl3` R package repository: https://github.com/tlverse/sl3

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
                      warning=FALSE, message=FALSE)
```

## Setup

First, we'll load the packages required for this exercise and load a simple data
set (`cpp_imputed` below) that we'll use for demonstration purposes:

```{r setup}
set.seed(49753)

# packages we'll be using
library(data.table)
library(origami)
library(sl3)
library(xts)

# load data
data(bsds)
head(bsds)

#Create a time-series object:
tsdata<-xts(bsds$cnt, order.by=as.POSIXct(bsds$dteday))

#Visualize the time-series:
PerformanceAnalytics::chart.TimeSeries(tsdata, auto.grid = FALSE, main = "Count of total rental bikes")

```

## Cross-validation

We need to specify proper cross-validation for dependent data. In order to do that, we use the `orgami` package, part of the `tlverse` sofware ecosystem, that supports several different cross-validation techniques for time-series data.

```{r cvs}
#Define cross-validation approriate for dependent data.

#Rolling origin:
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_origin, first_window = 10, validation_size = 10, gap = 0, batch = 1)
folds[[1]]
folds[[2]]

#We can increase each training sample by batch size:
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_origin, first_window = 10, validation_size = 10, gap = 0, batch = 10)
folds[[1]]
folds[[2]]

#We can also specify how far in the future we want to train our model on.
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_origin, first_window = 10, validation_size = 10, gap = 5, batch = 10)
folds[[1]]
folds[[2]]

#Rolling window:
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_window, window_size = 10, validation_size = 10, gap = 0, batch = 1)
folds[[1]]
folds[[2]]

#Final setup
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

```

To use this data set with `sl3`, the object must be wrapped in a customized
`sl3` container, an __`sl3` "Task"__ object. A _task_ is an idiom for all of the
elements of a prediction problem other than the learning algorithms and
prediction approach itself -- that is, a task delineates the structure of the
data set of interest and any potential metadata (e.g., observation-level
weights).

```{r univariate_ts_task}
# here are the covariates we are interested in and, of course, the outcome
# specify covariates for sl3 task
covars <- "cnt"
outcome <- "cnt"

# create the sl3 task and take a look at it
ts_uni_task <- sl3_Task$new(data = bsds, covariates = covars,
                        outcome = outcome, outcome_type = "continuous", folds=folds)

# let's take a look at the sl3 task
ts_uni_task

# specify how far out we're going to predict on this univariate time series
n_ahead_param <- 2
```

## `sl3` Learners

`Lrnr_base` is the base class for defining machine learning algorithms, as well
as fits for those algorithms to particular `sl3_Tasks`. Different machine
learning algorithms are defined in classes that inherit from `Lrnr_base`. All
learners (e.g., `Lrnr_arima`) inherit from this R6 class. We will use the term
"learners" to refer to the family of classes that inherit from `Lrnr_base`.
Learner objects can be constructed from their class definitions using the `make
learner` function (or more directly through invoking the `$new()` method). Let's
now look at a simple time series learner, `Lrnr_arima`, which defines a
classical autoregressive integrated moving average model for univariate
time-series.

```{r arima_instantiate}
# make learner object
lrnr_arima <- Lrnr_arima$new(n.ahead = n_ahead_param)
```

Because all learners inherit from `Lrnr_base`, they have many features in
common, and can be used interchangeably. All learners define three main methods:
`train`, `predict`, and `chain`. The first, `train`, takes an `sl3_task` object,
and returns a `learner_fit`, which has the same class as the learner that was
trained:

```{r arima_fit}
# fit learner to task data
fit_arima <- lrnr_arima$train(ts_uni_task)

# verify that the learner is fit
fit_arima$is_trained
```

Here, we fit the learner to the time series task we defined above. Both
`lrnr_arima` and `fit_arima` are objects of class `Lrnr_arima`, although the
former defines a learner and the latter defines a fit of that learner. We can
distinguish between the learners and learner fits using the `is_trained` field,
which is true for fits but not for learners.

Now that we've fit a learner, we can generate predictions using the predict
method:

```{r arima_predict}
# get learner predictions
pred_arima <- fit_arima$predict()
head(pred_arima)
```

Here, we did not specify a `task` object for which we might want to generate
predictions. In this case, we get the predictions from the training data because
the `predict` method defaults to using the task provided during training (called
the "training task"). Alternatively, we could have provided a different task for
which we want to generate predictions.

The final important learner method, chain, will be discussed below, in the
section on learner composition. As with `sl3_Task`, learners have a variety of
fields and methods we haven’t discussed here. More information on these is
available in the help for `Lrnr_base`.

## Pipelines

Based on the concept popularized by
[`scikit-learn`](http://scikit-learn.org/stable/index.html) `sl3` implements the
notion of [machine learning pipelines](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html),
which prove to be useful in a wide variety of data analytic settings.

A pipeline is a set of learners to be fit sequentially, where the fit from one
learner is used to define the task for the next learner. There are many ways in
which a learner can define the task for the downstream learner. The chain method
defined by learners defines how this will work.

The `Pipeline` class automates this process. It takes an arbitrary number of
learners and fits them sequentially, training and chaining each one in turn.
Since `Pipeline` is a learner like any other, it shares the same interface. We
can define a pipeline using `make_learner`, and use `train` and `predict` just
as we did before:

## Stacks

Like `Pipelines`, `Stacks` combine multiple learners. Stacks train learners
simultaneously, so that their predictions can be either combined or compared.

Before proceeding, let's first instantiate a few more standard time series
learners:

```{r ts_classical_learners}
lrnr_tsdyn_linear <- Lrnr_tsDyn$new(learner = "linear", m = 1,
                                    n.ahead = n_ahead_param)
lrnr_tsdyn_setar <- Lrnr_tsDyn$new(learner = "setar", m = 1, model = "TAR",
                                   n.ahead = n_ahead_param)
lrnr_tsdyn_lstar <- Lrnr_tsDyn$new(learner = "lstar", m = 1,
                                   n.ahead = n_ahead_param)
lrnr_garch <- Lrnr_rugarch$new(n.ahead = n_ahead_param)
lrnr_expsmooth <- Lrnr_expSmooth$new(n.ahead = n_ahead_param)
lrnr_harmonicreg <- Lrnr_HarmonicReg$new(n.ahead = n_ahead_param, K = 7,
                                         freq = 105)
```

Again, `Stack` is just a special learner and so has the same interface as all
other learners:

```{r ts_stack_example}
ts_stack <- Stack$new(lrnr_arima, lrnr_tsdyn_linear, lrnr_tsdyn_setar,
                      lrnr_tsdyn_lstar)
ts_stack_fit <- ts_stack$train(ts_uni_task)
ts_stack_preds <- ts_stack_fit$predict()
head(ts_stack_preds)
```

Above, we've defined and fit a stack comprised of a few different classical
time series models. We could have included any arbitrary set of learners and
pipelines, the latter of which are themselves just learners. We can see that the
predict method now returns a matrix, with a column for each learner included in
the stack.

## Model Fitting with Cross-Validation

Having defined a stack, we might want to compare the performance of learners in
the stack, which we may do using cross-validation. The `Lrnr_cv` learner wraps
another learner and performs training and prediction in a cross-validated
fashion, using separate training and validation splits as defined by
`task$folds`.

Below, we define a new `Lrnr_cv` object based on the previously defined stack
and train it and generate predictions on the validation set:

```{r cv_ts_example}
cv_ts_stack <- Lrnr_cv$new(ts_stack)
cv_ts_fit <- cv_ts_stack$train(ts_uni_task)
cv_ts_preds <- cv_ts_fit$predict()
```

```{r cv_ts_risks}
risks <- cv_ts_fit$cv_risk(loss_squared_error)
print(risks)
```

We can combine all of the above elements, `Pipelines`, `Stacks`, and
cross-validation using `Lrnr_cv`, to easily define a stacked ensemble learner
(Super Learner). The Super Learner algorithm works by fitting a "meta-learner",
which combines predictions from multiple stacked learners. It does this while
avoiding over-fitting by training the meta-learner on validation-set predictions
in a manner that is cross-validated. Using some of the objects we defined in the
above examples, this becomes a very simple operation:

```{r ts_sl_manual_part1}
metalearner <- make_learner(Lrnr_nnls)
cv_ts_uni_task <- cv_ts_fit$chain()
ts_ml_fit <- metalearner$train(cv_ts_uni_task)
```

Here, we used a special learner, `Lrnr_nnls`, for the meta-learning step. This
fits a non-negative least squares meta-learner. It is important to note that any
learner can be used as a meta-learner.

The Super Learner finally produced is defined as a pipeline with the learner
stack trained on the full data and the meta-learner trained on the
validation-set predictions. Below, we use a special behavior of pipelines: if
all objects passed to a pipeline are learner fits (i.e., `learner$is_trained` is
`TRUE`), the result will also be a fit:

```{r ts_sl_manual_part2}
sl_ts_pipeline <- make_learner(Pipeline, ts_stack_fit, ts_ml_fit)
sl_ts_preds <- sl_ts_pipeline$predict()
head(sl_ts_preds)
```

An optimal stacked regression model (or Super Learner) may be fit in a more
streamlined manner using the `Lrnr_sl` learner. For simplicity, we will use the
same set of learners and meta-learning algorithm as we did before:

```{r ts_sl_builtin}
lrnr_sl <- Lrnr_sl$new(learners = ts_stack, metalearner = metalearner)
ts_sl_fit <- lrnr_sl$train(ts_uni_task)
ts_sl_auto_preds <- ts_sl_fit$predict()
head(ts_sl_auto_preds)
```

We can see that this generates the same predictions as the more hands-on
definition above.

## Multivariate Time Series Analysis

In the previous sections, we defined the `task` object of interest on a simple
time series data set, where the objective was univariate time series analysis.
This was useful for illustrative purposes, but, generally speaking, not how most
applied problems are posed or solved. Below, we consider multivariate time
series analysis using extensions of the same tools we've seen so far.

```{r multivar_ts_task}
# Define new data and task:
covars <- c("temp", "atemp")
outcome <- c("temp", "atemp")
ts_mv_task <- sl3_Task$new(bsds, covariates = covars, outcome = outcome)
```

```{r multivar_ts_learners}
# instantiate a few classical multivariate time series learners
lrnr_tsdyn_linevar <- Lrnr_tsDyn$new(learner = "lineVar", lag = 2,
                                     n.ahead = n_ahead_param)
lrnr_tsdyn_vecm <- Lrnr_tsDyn$new(learner = "VECM", lag = 2,
                                  n.ahead = n_ahead_param, type = "linear")
mv_ts_stack <- Stack$new(lrnr_tsdyn_linevar, lrnr_tsdyn_vecm)
```

Now, let's train these models and predict for a multivariate time series like
the one we specified above:

```{r multivar_ts_stack_example, eval=FALSE}
mv_ts_fit <- mv_ts_stack$train(ts_mv_task)
mv_ts_pred <- mv_ts_fit$predict()
print(mv_ts_pred)
```

Next, let's try constructing an ensemble model of these learners using
cross-validation (note that we'll rely on the same NNLS metalearner we used
previously):

```{r multivar_ts_sl_example, eval=FALSE}
mv_sl <- Lrnr_sl$new(learners = mv_ts_stack, metalearner = metalearner)
mv_sl_fit <- mv_sl$train(ts_mv_task)
mv_sl_pred <- mv_sl_fit$predict()
print(mv_sl_pred)
```