R/`sl3`: modern Super Learning with pipelines

A modern implementation of the Super Learner algorithm for ensemble learning and model stacking

Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Oleg Sofrygin

What's `sl3`?

sl3 is a modern implementation of the Super Learner algorithm of @vdl2007super. The Super Learner algorithm performs ensemble learning in one of two fashions:

The "discrete" Super Learner selects the single best prediction algorithm from a supplied library of learning algorithms ("learners" in the sl3 nomenclature), namely the one that minimizes the cross-validated risk with respect to an appropriate loss function.
The "ensemble" Super Learner assigns weights to the learning algorithms in a user-supplied library, creating a combination of these learners that minimizes the cross-validated risk with respect to an appropriate loss function. This notion of weighted combinations has also been called stacked regression [@breiman1996stacked].
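To make the distinction concrete, here is a minimal base-R sketch of both variants under squared-error loss. It does not use sl3 and is not how sl3 implements either step: the simulated data, the three-learner library, and all names (`learners`, `cv_preds`, `softmax`, `obj`) are assumptions made for illustration, and the ensemble weights are found by a generic optimizer over convex combinations rather than by any particular metalearner.

```r
set.seed(1)

# Simulated data (an assumption for illustration): y depends on x1 and x2^2.
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(y = 1 + 2 * x1 - x2^2 + rnorm(n), x1 = x1, x2 = x2)

# Hypothetical library: each learner trains on `train` and predicts on `test`.
learners <- list(
  mean_only = function(train, test) rep(mean(train$y), nrow(test)),
  linear    = function(train, test)
    predict(lm(y ~ x1 + x2, data = train), newdata = test),
  quadratic = function(train, test)
    predict(lm(y ~ x1 + x2 + I(x2^2), data = train), newdata = test)
)

# V-fold cross-validation: collect out-of-fold predictions for every learner.
V <- 5
folds <- sample(rep(seq_len(V), length.out = n))
cv_preds <- matrix(NA_real_, n, length(learners),
                   dimnames = list(NULL, names(learners)))
for (v in seq_len(V)) {
  test_idx <- folds == v
  for (nm in names(learners)) {
    cv_preds[test_idx, nm] <- learners[[nm]](dat[!test_idx, ], dat[test_idx, ])
  }
}

# Discrete Super Learner: pick the learner with the smallest cross-validated risk.
cv_risk <- colMeans((cv_preds - dat$y)^2)
discrete_sl <- names(which.min(cv_risk))

# Ensemble Super Learner: choose non-negative weights summing to one that
# minimize the cross-validated risk of the weighted combination.
# The softmax parametrization keeps the weights on the simplex.
softmax <- function(a) exp(a) / sum(exp(a))
obj <- function(a) mean((dat$y - cv_preds %*% softmax(a))^2)
opt <- optim(rep(0, length(learners)), obj, method = "BFGS")
weights <- setNames(softmax(opt$par), names(learners))
ensemble_risk <- opt$value

print(cv_risk)
print(discrete_sl)
print(round(weights, 3))
```

The discrete variant returns one learner, while the ensemble variant returns a weight vector over the whole library; both are driven by the same matrix of out-of-fold predictions.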

Installation

Install the most recent stable release from GitHub via devtools:

devtools::install_github("jeremyrcoyle/sl3")

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

Documentation

The best places to start are the vignettes:

Modern Machine Learning in R: vignette("intro_sl3")
Defining New sl3 Learners: vignette("custom_lrnrs")
SuperLearner Benchmarks: vignette("SuperLearner_benchmarks")

Examples

sl3 streamlines applying screening algorithms and learning algorithms, combining both into a stacked regression model, and cross-validating the whole process. The best way to understand this is to see the sl3 package in action:

set.seed(49753)

# packages we'll be using
library(data.table)
library(SuperLearner)
#> Loading required package: nnls
#> Super Learner
#> Version: 2.0-22
#> Package created on 2017-07-18
library(origami)
library(sl3)

# load example data set
data(cpp_imputed)

# here are the covariates we are interested in and, of course, the outcome
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
outcome <- "haz"

task <- make_sl3_Task(data = cpp_imputed, covariates = covars,
                      outcome = outcome, outcome_type="continuous")

# set up screeners and learners via built-in functions and pipelines
slscreener <- make_learner(Lrnr_pkg_SuperLearner_screener, "screen.glmnet")
glm_learner <- make_learner(Lrnr_glm)
screen_and_glm <- make_learner(Pipeline, slscreener, glm_learner)
lrnr_glmnet <- make_learner(Lrnr_glmnet)

# stack learners into a model (including screeners and pipelines)
learner_stack <- make_learner(Stack, lrnr_glmnet, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
#> Loading required package: glmnet
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loaded glmnet 2.0-13
preds <- stack_fit$predict()
head(preds)
#>    Lrnr_glmnet_NULL_deviance_10_1_100   Lrnr_glm
#> 1:                         0.35345519 0.36298498
#> 2:                         0.35345519 0.36298498
#> 3:                         0.24554305 0.25993072
#> 4:                         0.24554305 0.25993072
#> 5:                         0.24554305 0.25993072
#> 6:                         0.02953193 0.05680264
#>    Lrnr_pkg_SuperLearner_screener_screen.glmnet___Lrnr_glm
#> 1:                                              0.36228209
#> 2:                                              0.36228209
#> 3:                                              0.25870995
#> 4:                                              0.25870995
#> 5:                                              0.25870995
#> 6:                                              0.05600958
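For readers unfamiliar with the screener-then-learner idea behind the pipeline in the example above, the following base-R sketch shows the data flow in isolation. It is an illustration only, not sl3's implementation: the functions `screen_cor`, `fit_glm`, and `run_pipeline` are hypothetical, the screener uses a simple correlation cutoff in place of `screen.glmnet`, and the built-in `mtcars` data stands in for `cpp_imputed`.

```r
# Built-in data: mpg is the outcome, every other column is a candidate covariate.
dat <- mtcars
outcome <- "mpg"
covars <- setdiff(names(dat), outcome)

# Hypothetical screener: keep covariates whose absolute correlation with the
# outcome exceeds a cutoff (a stand-in for glmnet-based screening).
screen_cor <- function(data, covariates, outcome, cutoff = 0.7) {
  cors <- vapply(covariates,
                 function(v) abs(cor(data[[v]], data[[outcome]])),
                 numeric(1))
  names(cors)[cors > cutoff]
}

# Hypothetical learner: a Gaussian GLM on whichever covariates it is given.
fit_glm <- function(data, covariates, outcome) {
  glm(reformulate(covariates, response = outcome), data = data)
}

# Hypothetical pipeline: the screener's output becomes the learner's input.
# Assumes the screener keeps at least one covariate.
run_pipeline <- function(data, covariates, outcome) {
  selected <- screen_cor(data, covariates, outcome)
  fit <- fit_glm(data, selected, outcome)
  list(selected = selected,
       fit = fit,
       preds = unname(predict(fit, newdata = data)))
}

pipe <- run_pipeline(dat, covars, outcome)
print(pipe$selected)
print(head(pipe$preds))
```

Chaining the two steps inside one object is what allows the whole sequence, screening included, to be refit within each cross-validation fold instead of screening once on the full data.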

Contributions

It is our hope that sl3 will grow to be widely used for creating stacked regression models and the cross-validation of pipelines that make up such models, as well as the variety of other applications in which the Super Learner algorithm plays a role. To that end, contributions are very welcome, though we ask that interested contributors consult our contribution guidelines prior to submitting a pull request.

After using the sl3 R package, please cite the following:

    @misc{coyle2017sl3,
      author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
        Sofrygin, Oleg},
      title = {{sl3}: Modern Pipelines for Machine Learning and {Super
        Learning}},
      year  = {2017},
      howpublished = {\url{https://github.com/jeremyrcoyle/sl3}},
      url = {http://dx.doi.org/DOI_HERE},
      doi = {DOI_HERE}
    }

License

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.

