nested_cross_val

This repository provides a Python implementation of nested cross-validation that is compatible with the scikit-learn API.

Our implementation differs from existing ones in three main ways:

It integrates a Dask implementation to handle large data sets and complex pipelines and to save computation time (more details here).
It gives access to the fitted estimators and their attributes. The user can therefore compute additional scores without refitting the whole model, or run further analyses on the attributes of each estimator (e.g. a feature importance analysis through a stability study).
It provides plotting tools to easily visualize and analyze the results of the nested cross-validation (see here).
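For readers new to the technique, the idea behind nested cross-validation and behind keeping the fitted estimators can be sketched with plain scikit-learn. This is not the code of this package: it only uses `GridSearchCV` and `cross_validate` from scikit-learn, synthetic placeholder data, and illustrative variable names. It assumes the inner loop tunes `C` and the outer loop estimates performance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

# Synthetic placeholder data (assumption: any binary classification set would do).
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

# Inner loop: hyperparameter search, run separately on each outer training fold.
search = GridSearchCV(
    LogisticRegression(solver="liblinear", penalty="l1"),
    param_grid={"C": np.logspace(-2, 2, 5)},
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: performance estimate on data never seen by the inner search.
# return_estimator=True keeps one fitted GridSearchCV object per outer fold.
results = cross_validate(
    search, X, y,
    cv=outer_cv,
    scoring=["roc_auc", "average_precision"],
    return_estimator=True,
)

outer_auc = results["test_roc_auc"]
best_C = [est.best_params_["C"] for est in results["estimator"]]

# Simple stability study: fraction of outer folds in which each feature
# receives a non-zero L1 coefficient.
coefs = np.array([est.best_estimator_.coef_.ravel() for est in results["estimator"]])
selection_frequency = (coefs != 0).mean(axis=0)

print("outer ROC AUC per fold:", outer_auc)
print("selected C per fold:", best_C)
print("selection frequency per feature:", selection_frequency)
```

Because the fitted estimators are kept, the selection frequencies are computed without refitting anything, which is the kind of follow-up analysis described above.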

Installation

$ pip install git+https://github.com/ncaptier/nested_cross_val#egg=nested_cross_val

Experiments

We provide a Jupyter notebook that illustrates our nested cross-validation pipeline on real data:
* Classification of lung cancer subtype from bulk transcriptomics data

Data

The data set that accompanies the Jupyter notebook lung_cancer_classification.ipynb is in the archive data.zip. Please extract it locally before running the notebook.
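The archive can be extracted with any tool. As one option, here is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_archive` is hypothetical, and the target directory is an assumption: it should match wherever the notebook reads its files from.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every member of a zip archive into target_dir.

    Returns the list of member names contained in the archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, `extract_archive("data.zip", ".")` would unpack the archive into the current directory.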

Example

import numpy as np
from sklearn.linear_model import LogisticRegression
from nested_cross_val.base import NestedCV

# L1-penalized logistic regression; the regularization strength C is tuned in the inner loop
estimator = LogisticRegression(solver='saga', penalty='l1', max_iter=2000)

param_grid = {'C': np.logspace(-2, 2, 20)}

ncv = NestedCV(estimator=estimator, params=param_grid, cv_inner=5, cv_outer=5,
               scoring_inner='roc_auc',
               scoring_outer={'roc_auc': 'roc_auc', 'average_precision': 'average_precision'})

# X (feature matrix) and y (binary labels) must be defined beforehand
ncv.fit(X, y)
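The example above expects a feature matrix `X` and binary labels `y` to exist already. To try it without the lung cancer data, placeholder inputs can be generated with scikit-learn's `make_classification`. The sizes below are arbitrary assumptions, not values taken from this repository.

```python
from sklearn.datasets import make_classification

# Synthetic binary classification problem with many uninformative features,
# loosely mimicking a high-dimensional setting (all sizes are arbitrary).
X, y = make_classification(
    n_samples=200,
    n_features=50,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

print(X.shape, y.shape)
```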

Acknowledgements

This package was created as a part of my PhD in the Computational Systems Biology of Cancer group of Institut Curie and the LITO laboratory.

References

"Bias in error estimation when using cross-validation for model selection" - S. Varma & R. Simon 2006
