
KaggleDays_Paris

kaggledays-2019-gbdt

Original workshop in Paris:

Open in Colab

First part of the updated workshop (basics of Skopt and GBM):

Open in Colab

Second part of the updated workshop (LightGBM, XGBoost, CatBoost, NAS):

Open in Colab

Video of the workshop on the Kaggle YouTube channel:

https://www.youtube.com/watch?v=YQL45hDuP-o

Kaggle Days Paris

Competitive GBDT Specification and Optimization Workshop

Instructors

  • Luca Massaron @lmassaron - Data Scientist / Author / Google Developer Expert in Machine Learning

competition · dataset · notebook · discussion


About the workshop

Gradient Boosting Decision Trees (GBDT) currently represent the state of the art for building predictors on flat, tabular data. However, they seldom perform best out of the box (with default hyper-parameter values) because there are many hyper-parameters to tune. Especially in the most recent GBDT implementations, such as LightGBM, the number and sophistication of the hyper-parameters make finding the optimal settings by hand or by simple grid search impractical: the combinatorial space is large and each experiment takes a long time to run.

Random Optimization (Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13 (2012): 281-305) and Bayesian Optimization (Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems. 2012. pp. 2951-2959) are the answers you will most often get from experts.
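To make the contrast concrete, here is a minimal sketch (not taken from the workshop notebooks; the toy objective is invented for illustration) of the two strategies in Scikit-Optimize: dummy_minimize samples the search space uniformly at random, while gp_minimize fits a Gaussian-process surrogate and picks each new point through an acquisition function.

```python
from skopt import dummy_minimize, gp_minimize
from skopt.space import Real

# Toy objective standing in for an expensive cross-validation score:
# a quadratic bowl with its minimum at (0.3, -0.5).
def objective(params):
    x, y = params
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

space = [Real(-2.0, 2.0, name="x"), Real(-2.0, 2.0, name="y")]

# Random Optimization: points sampled uniformly from the space.
rand_res = dummy_minimize(objective, space, n_calls=30, random_state=0)

# Bayesian Optimization: a Gaussian-process surrogate guides the search.
bayes_res = gp_minimize(objective, space, n_calls=30, random_state=0)

print("random search best value:", rand_res.fun)
print("bayesian opt  best value:", bayes_res.fun)
```

With the same budget of calls, the Bayesian run typically ends closer to the minimum because it spends its later evaluations near the most promising regions.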

In this workshop we demonstrate different optimization approaches based on Scikit-Optimize (skopt), a library built on top of NumPy, SciPy and Scikit-Learn, and we present a simple and fast way to get them up and running.
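As a taste of what this looks like in practice, the sketch below wraps a LightGBM classifier in skopt's BayesSearchCV, a drop-in replacement for scikit-learn's GridSearchCV driven by Bayesian optimization. The dataset, search ranges, and call budget are illustrative assumptions, not the workshop's actual settings.

```python
from lightgbm import LGBMClassifier
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Illustrative search space over a few influential LightGBM hyper-parameters.
search_space = {
    "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
    "num_leaves": Integer(15, 127),
    "min_child_samples": Integer(5, 100),
    "colsample_bytree": Real(0.5, 1.0),
}

opt = BayesSearchCV(
    LGBMClassifier(n_estimators=200, random_state=0),
    search_space,
    n_iter=25,           # number of hyper-parameter settings to try
    cv=5,
    scoring="roc_auc",
    random_state=0,
)
opt.fit(X, y)
print("best CV AUC:", opt.best_score_)
print("best params:", opt.best_params_)
```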

Prerequisites

You should be aware of the role and importance of hyper-parameter optimization in machine learning.

Obtaining the Tutorial Material

To make the workshop easily accessible, we offer cloud access through the Colab notebooks linked above.

We also have a brief exercise that can be found at:

The solution can be found here.

All the materials can be cloned from GitHub at the kaggledays-2019-gbdt repository. We have also prepared a stand-alone Windows installation based on WinPython (just ask us for the link).

Local installation notes

In order to run this workshop on your local computer, you need a Python 3 installation (we suggest the most recent Anaconda distribution) and at least the following packages (a quick version check is sketched after the list):

  • numpy >= 1.15.4
  • pandas >= 0.23.4
  • scipy >= 1.1.0
  • skopt >= 0.5.2
  • sklearn >= 0.20.2
  • lightgbm >= 2.2.2
  • xgboost >= 0.81
  • catboost >= 0.12.2
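To verify a local setup, a small check like the following (a sketch, assuming each package exposes a __version__ attribute, which all eight listed here do) can compare the installed versions against these minima:

```python
# Compare installed package versions against the workshop's minima.
from distutils.version import LooseVersion
import importlib

REQUIREMENTS = {
    "numpy": "1.15.4",
    "pandas": "0.23.4",
    "scipy": "1.1.0",
    "skopt": "0.5.2",
    "sklearn": "0.20.2",
    "lightgbm": "2.2.2",
    "xgboost": "0.81",
    "catboost": "0.12.2",
}

for module, minimum in REQUIREMENTS.items():
    installed = importlib.import_module(module).__version__
    ok = LooseVersion(installed) >= LooseVersion(minimum)
    print("{:3} {:<10} {:>8}  (need >= {})".format(
        "OK" if ok else "OLD", module, installed, minimum))
```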
