A modern implementation of the Super Learner algorithm for ensemble learning and model stacking
Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Oleg Sofrygin
sl3 is a modern implementation of the Super Learner algorithm of @vdl2007super. The Super Learner algorithm performs ensemble learning in one of two fashions:
- The "discrete" Super Learner can be used to select the best prediction algorithm among a supplied library of learning algorithms ("learners" in the sl3 nomenclature); that is, the algorithm that minimizes the cross-validated risk with respect to some appropriate loss function.
- The "ensemble" Super Learner can be used to assign weights to specified learning algorithms (in a user-supplied library) in order to create a combination of these learners that minimizes the cross-validated risk with respect to an appropriate loss function. This notion of weighted combinations has also been called stacked regression [@breiman1996stacked].
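The two variants can be written down compactly. The notation below is an illustrative assumption rather than notation taken from sl3 or the cited papers: $V$ is the number of cross-validation folds, $\mathrm{val}(v)$ is the validation set of fold $v$, $\hat\psi_{k,-v}$ is learner $k$ fit on all data outside fold $v$, and $L$ is the chosen loss. The simplex constraint on the weights is one common choice, not necessarily the one a given metalearner enforces.

```latex
% Cross-validated risk of learner k (k = 1, ..., K)
\hat{R}_{\mathrm{CV}}(k)
  = \frac{1}{n} \sum_{v=1}^{V} \sum_{i \in \mathrm{val}(v)}
    L\bigl(Y_i,\ \hat\psi_{k,-v}(X_i)\bigr)

% Discrete Super Learner: pick the single learner with the smallest risk
\hat{k} = \arg\min_{k \in \{1, \dots, K\}} \hat{R}_{\mathrm{CV}}(k)

% Ensemble Super Learner: pick the weights whose combination has the smallest risk
\hat{\alpha}
  = \arg\min_{\alpha \,:\, \alpha_k \ge 0,\ \sum_k \alpha_k = 1}
    \frac{1}{n} \sum_{v=1}^{V} \sum_{i \in \mathrm{val}(v)}
    L\Bigl(Y_i,\ \sum_{k=1}^{K} \alpha_k \, \hat\psi_{k,-v}(X_i)\Bigr)
```

Under these definitions the discrete Super Learner is the special case of the ensemble in which the weight vector is restricted to put all of its mass on one learner.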
Install the most recent stable release from GitHub via devtools:
devtools::install_github("jeremyrcoyle/sl3")
If you encounter any bugs or have any specific feature requests, please file an issue.
The best places to start are the vignettes:
- Modern Machine Learning in R: vignette("intro_sl3")
- Defining New sl3 Learners: vignette("custom_lrnrs")
- SuperLearner Benchmarks: vignette("SuperLearner_benchmarks")
sl3 makes it essentially trivial to apply screening algorithms and learning algorithms, combine both types of algorithms into a stacked regression model, and cross-validate the whole process. The best way to understand this is to see the sl3 package in action:
set.seed(49753)
# packages we'll be using
library(data.table)
library(SuperLearner)
#> Loading required package: nnls
#> Super Learner
#> Version: 2.0-22
#> Package created on 2017-07-18
library(origami)
library(sl3)
# load example data set
data(cpp_imputed)
# here are the covariates we are interested in and, of course, the outcome
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
outcome <- "haz"
task <- make_sl3_Task(data = cpp_imputed, covariates = covars,
                      outcome = outcome, outcome_type = "continuous")
# set up screeners and learners via built-in functions and pipelines
slscreener <- make_learner(Lrnr_pkg_SuperLearner_screener, "screen.glmnet")
glm_learner <- make_learner(Lrnr_glm)
screen_and_glm <- make_learner(Pipeline, slscreener, glm_learner)
lrnr_glmnet <- make_learner(Lrnr_glmnet)
# stack learners into a model (including screeners and pipelines)
learner_stack <- make_learner(Stack, lrnr_glmnet, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
#> Loading required package: glmnet
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loaded glmnet 2.0-13
preds <- stack_fit$predict()
head(preds)
#> Lrnr_glmnet_NULL_deviance_10_1_100 Lrnr_glm
#> 1: 0.35345519 0.36298498
#> 2: 0.35345519 0.36298498
#> 3: 0.24554305 0.25993072
#> 4: 0.24554305 0.25993072
#> 5: 0.24554305 0.25993072
#> 6: 0.02953193 0.05680264
#> Lrnr_pkg_SuperLearner_screener_screen.glmnet___Lrnr_glm
#> 1: 0.36228209
#> 2: 0.36228209
#> 3: 0.25870995
#> 4: 0.25870995
#> 5: 0.25870995
#> 6: 0.05600958
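The output above has one column of predictions per learner in the stack; it does not yet combine them. A minimal sketch of the ensemble step follows, assuming the installed version of sl3 provides the `Lrnr_sl` and `Lrnr_nnls` learners (neither appears in the example above, so check them against your installed version). It rebuilds a smaller stack without the screener so that it runs on its own.

```r
library(sl3)

# Rebuild the task from the example above so this block is self-contained.
data(cpp_imputed)
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
task <- make_sl3_Task(data = cpp_imputed, covariates = covars,
                      outcome = "haz", outcome_type = "continuous")

# A two-learner stack (the screener pipeline is left out for brevity).
glm_learner <- make_learner(Lrnr_glm)
lrnr_glmnet <- make_learner(Lrnr_glmnet)
learner_stack <- make_learner(Stack, lrnr_glmnet, glm_learner)

# Assumption: Lrnr_nnls fits non-negative least squares weights and
# Lrnr_sl cross-validates the stack before fitting that metalearner.
metalearner <- make_learner(Lrnr_nnls)
sl <- Lrnr_sl$new(learners = learner_stack, metalearner = metalearner)

sl_fit <- sl$train(task)
sl_preds <- sl_fit$predict()
head(sl_preds)
```

Here the weights are learned from cross-validated predictions rather than from in-sample fits, which is what distinguishes the Super Learner from simply regressing the outcome on the stack's predictions.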
It is our hope that sl3 will grow to be widely used for creating stacked regression models and the cross-validation of pipelines that make up such models, as well as the variety of other applications in which the Super Learner algorithm plays a role. To that end, contributions are very welcome, though we ask that interested contributors consult our contribution guidelines prior to submitting a pull request.
After using the sl3 R package, please cite the following:
@misc{coyle2017sl3,
author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
Sofrygin, Oleg},
title = {{sl3}: Modern Pipelines for Machine Learning and {Super
Learning}},
year = {2017},
howpublished = {\url{https://github.com/jeremyrcoyle/sl3}},
url = {http://dx.doi.org/DOI_HERE},
doi = {DOI_HERE}
}
© 2017 Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Oleg Sofrygin
The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.