Recent data science research in economics has found that machine learning can vastly improve upon the current state of hypothesis testing, especially for identifying qualities of time series. This R package implements a number of ML-based tests that offer dramatic gains in accuracy, namely for detecting unit roots (subsequent versions for Stata and Python will be developed in due course). The package is built to test three core unit root DGPs using the ML algorithm from Cornwall, Chen, and Sauley (2021). In addition, the package can also train custom ML-based unit root tests.
Install this package from this GitHub repository, then load it up!
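If the `devtools` package is not already installed, it can be added from CRAN first (a standard prerequisite for installing from GitHub, not specific to this package):

```r
# One-time prerequisite for installing packages directly from GitHub
install.packages("devtools")
```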
```r
devtools::install_github("DataScienceforPublicPolicy/unitrootML")
library(unitrootML)
```
To illustrate the test in practice, we need some test data. We simulate five time series: two with a unit root (`gamma = 1`) and three without a unit root (`gamma < 1`). These time series are stored in a list object. Let's take a look at the series using `autoplot` from the `forecast` package, arranged into a grid with `gridExtra::grid.arrange`.
```r
# Example time series
library(forecast)   # provides autoplot() for ts objects

set.seed(123)
out <- list(series1 = ts(dgp_enders1(gamma = 1),    freq = 12, start = c(2010, 1)),
            series2 = ts(dgp_enders1(gamma = 1),    freq = 12, start = c(2010, 1)),
            series3 = ts(dgp_enders1(gamma = 0.98), freq = 12, start = c(2010, 1)),
            series4 = ts(dgp_enders1(gamma = 0.8),  freq = 12, start = c(2010, 1)),
            series5 = ts(dgp_enders1(gamma = 0.5),  freq = 12, start = c(2010, 1)))

# Plot the five series in a grid
gridExtra::grid.arrange(autoplot(out[[1]]), autoplot(out[[2]]),
                        autoplot(out[[3]]), autoplot(out[[4]]),
                        autoplot(out[[5]]), ncol = 2)
```
To screen these time series, apply the `ml_test` function, which uses an efficient pre-trained gradient boosting algorithm. The function expects a list of time series objects (`bank = `) and a `pvalue` that identifies the optimal decision threshold. In this case, we assume `pvalue = 0.05`, then use `testing_summary` to print a summary of the testing results.
```r
res <- ml_test(bank = out,
               pvalue = 0.05)

testing_summary(res)
```
The `res` object captures the outputs of a battery of tests, including the pre-trained ML-based unit root tests. The first element summarizes the verdict that each unit root test reached when evaluating each time series (e.g., unit root versus stationary). While the traditional tests may disagree with one another, it is worth noting that the tests built on Random Forest (`verdict.ranger`) and Gradient Boosting (`verdict.xgbTree`) offer markedly improved diagnostic accuracy.
```r
verdicts <- res$verdicts
print(verdicts[, 1:4])
```
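As a quick sanity check, the two ML-based verdict columns can be pulled out and compared against how the series were simulated (series1 and series2 were generated with `gamma = 1`). This is a minimal sketch that assumes `verdicts` has one row per series and uses the two column names mentioned above:

```r
# Sketch: pull only the ML-based verdicts and compare against the simulated
# DGPs (series1 and series2 carry a unit root by construction). Assumes
# `verdicts` has one row per series and columns named verdict.ranger and
# verdict.xgbTree, as printed above.
verdicts[, c("verdict.ranger", "verdict.xgbTree")]
```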
The test statistics and their critical thresholds are recorded in the `results` element.
```r
results <- res$results
print(results[, 1:4])
```
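The exact columns depend on which tests were run, so a generic structural inspection (plain base R, not part of the package API) is an easy way to see every statistic and threshold that was recorded:

```r
# Inspect everything recorded in the results element; the column layout
# depends on which traditional and ML-based tests were run.
str(results)
```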
Lastly, all the input features for the ML-based tests that were calculated from the time series list are available in the `features` element.
```r
feats <- res$features
feats[, 1:4]
```
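These features are what the pre-trained models score. A quick look at the dimensions and feature names shows exactly what is fed to the models; this sketch assumes `feats` is a data-frame-like object with one row per series, as the column subsetting above suggests:

```r
# Dimensions and feature names of the ML inputs; assumes `feats` is a
# data-frame-like object with one row per series.
dim(feats)
head(colnames(feats))
```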
It is also possible to create and use your own custom ML tests (see the included vignettes). We have made available a testing package (1 GB) that includes both a gradient boosting-based test and a random forest-based test. Note that in Cornwall, Chen, and Sauley (2021), these two ensemble-based tests yield nearly identical performance.
```r
res <- ml_test(bank = out,
               original = FALSE,
               custom_model = base_models,
               pvalue = 0.05)
```
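As before, the results can be summarized with `testing_summary`; this sketch assumes the object returned with a custom model has the same structure as in the pre-trained case:

```r
# Summarize the custom-model results; assumes the same return structure as
# with the pre-trained tests above.
testing_summary(res)
```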
Reach out to Gary Cornwall ([email protected]) and Jeff Chen ([email protected]).