Scikit-learn Beginner's Template

A basic multivariate analysis of simulated physics data using the machine learning toolkit scikit-learn. The entire example analysis is presented in the form of Jupyter notebooks.

This project is aimed mainly at people who are new to machine learning, scikit-learn or Python. It provides code implementations of basic analysis steps and plots. It does not claim to yield optimal, or even good, analysis results; its purpose is to illustrate some fundamental concepts of applying machine learning.

Prerequisites

The provided example makes use of the following Python libraries/frameworks:

Machine Learning Concepts

The following concepts are covered and implemented:

- Data visualization
  - Variable histograms
  - Scatter matrix
  - Correlation matrix
  - RadViz
- Data preprocessing
  - Standard scaling
- Data split into training, validation and test samples
- Model definition and training with decision trees / random forest
- Overfitting the data and how to prevent it
- Feature importances
- Model evaluation using training and validation sample
  - MVA output distribution
  - Cut efficiencies plot / MVA cut optimization
  - ROC curve
  - Precision-recall curve
  - Confusion matrix
  - Classification report
- Model application to the test sample / classifier performance assessment
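
Several of the steps listed above (data split, standard scaling, random forest training, feature importances, and evaluation on the validation and test samples) can be sketched in a few lines. This is a minimal illustration, not the code from the notebooks: the data set is a synthetic stand-in generated with `make_classification`, and the 60/20/20 split fractions, the hyperparameters and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the simulated physics data (assumption:
# label 1 plays the role of signal, label 0 of background).
X, y = make_classification(
    n_samples=3000, n_features=8, n_informative=4, random_state=42
)

# Split into 60 % training, 20 % validation and 20 % test samples.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42
)

# Standard scaling: fit on the training sample only, then apply to all samples.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

# Random forest with a limited tree depth to restrain overfitting.
clf = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42)
clf.fit(X_train_s, y_train)

# Feature importances (normalized so that they sum to 1).
importances = clf.feature_importances_

# Evaluation on the validation sample: MVA output, ROC AUC, confusion matrix.
val_scores = clf.predict_proba(X_val_s)[:, 1]
val_auc = roc_auc_score(y_val, val_scores)
val_cm = confusion_matrix(y_val, clf.predict(X_val_s))

# Final performance assessment on the untouched test sample.
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test_s)[:, 1])

print("feature importances:", np.round(importances, 3))
print("validation ROC AUC:", round(val_auc, 3))
print("validation confusion matrix:\n", val_cm)
print("test ROC AUC:", round(test_auc, 3))
```

Fitting the scaler on the training sample alone keeps information from the validation and test samples out of the preprocessing step, so the test sample remains an unbiased check of the final classifier.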

Content overview and file description

Notebooks:

example analysis - random forest.ipynb: Example analysis of simulated physics data using a random forest classifier.
overtraining demo - random forest.ipynb: Using simulated physics data and a random forest classifier to demonstrate the effects and consequences of overtraining.
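
The effect that the overtraining notebook demonstrates can be illustrated with a small sketch like the one below. It is not taken from the notebook: the noisy synthetic data set, the hypothetical helper `train_val_accuracy` and the two hyperparameter settings are assumptions made for this example. The idea is that an unconstrained forest nearly memorizes the training sample, so its training accuracy is much higher than its validation accuracy, while a constrained forest shows a smaller gap.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Noisy synthetic data (assumption): flip_y randomizes a fraction of the
# labels, which gives an overtrained model something to memorize.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)


def train_val_accuracy(**params):
    """Train a random forest and return (training accuracy, validation accuracy)."""
    clf = RandomForestClassifier(n_estimators=50, random_state=0, **params)
    clf.fit(X_train, y_train)
    return clf.score(X_train, y_train), clf.score(X_val, y_val)


# Fully grown trees: prone to overtraining.
deep_train, deep_val = train_val_accuracy(max_depth=None, min_samples_leaf=1)

# Shallow trees with large leaves: a simple way to limit overtraining.
shallow_train, shallow_val = train_val_accuracy(max_depth=4, min_samples_leaf=20)

print(f"unconstrained: train={deep_train:.3f}  validation={deep_val:.3f}")
print(f"constrained:   train={shallow_train:.3f}  validation={shallow_val:.3f}")
```

A large difference between training and validation performance is the usual warning sign of overtraining; limiting the tree depth or requiring more samples per leaf are two common ways to reduce it.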

Contributing

This project is far from perfect, not least because the author is relatively new to Python and scikit-learn himself.

If you have any suggestions or improvements, feel free to let me know or contribute in any way you like.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Many thanks to S. Lehner for valuable feedback on first drafts of this project.
Following PurpleBooth's README style
All badges made by shields.io

tempse/sklearn-beginners-template
