📊 Bayesian Optimization Comparison


Ever wondered which Bayesian optimization framework to use for your project? We try to help you with that :)

This repository provides a general comparison of different Bayesian optimization frameworks.

📚 Table of Contents

  • 🎯 Summary
  • Installation
  • 💡 Library Descriptions

🎯 Summary

We used Hugging Face's Spotify tracks dataset. The objective was to predict the popularity of a track; this is a 10-class classification problem. We performed a short EDA on the dataset: huggingface__spotify_tracks/eda.ipynb.

  • Number of rows: 114,000
  • Number of columns: 21
  • Target feature: popularity
    • Number of distinct values: 10

We used different machine learning models to showcase the results.

RandomForestClassifier

    Performance Results : RandomForestClassifier 
    ========================================
                    Train     Test      Delta(train,test)   
        Accuracy  : 0.9867    0.5579       -0.4289
        Precision : 0.9817    0.6632       -0.3185
        Recall    : 0.9590    0.5680       -0.3910
        F1-Score  : 0.9698    0.6044       -0.3653

Tuning time: None, since no tuning has been performed
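
The metrics above can be computed along the following lines. This is a minimal sketch, not the exact code from the notebooks; the macro averaging and the pre-split X_train/X_test variables are assumptions:

# Minimal sketch of computing the baseline metrics (macro averaging assumed;
# X_train, y_train, X_test, y_test are assumed to be defined elsewhere).
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

for name, (X, y) in {'Train': (X_train, y_train),
                     'Test': (X_test, y_test)}.items():
    y_pred = clf.predict(X)
    print(f"{name}: "
          f"acc={accuracy_score(y, y_pred):.4f} "
          f"prec={precision_score(y, y_pred, average='macro'):.4f} "
          f"rec={recall_score(y, y_pred, average='macro'):.4f} "
          f"f1={f1_score(y, y_pred, average='macro'):.4f}")
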
Optuna Results (50 trials)
==========================
                Train     Test      Delta(train,test)   
    Accuracy  : 0.5235    0.5239        0.0004
    Precision : 0.2844    0.2842       -0.0002
    Recall    : 0.2990    0.2993        0.0003
    F1-Score  : 0.2264    0.2273        0.0009

Tuning time: 6 min 24 sec
TODO
BayesianOptimization Results (5 trials)
=======================================
                Train     Test      Delta(train,test)   
    Accuracy  : 0.4859    0.4874        0.0015
    Precision : 0.2789    0.2779       -0.0010
    Recall    : 0.2744    0.2753        0.0010
    F1-Score  : 0.2060    0.2066        0.0006

Tuning time: 0 min 23 sec
Hyperopt Results (5 trials)
===========================
                Train     Test      Delta(train,test)   
    Accuracy  : 0.2058    0.2058       -0.0000
    Precision : 0.0206    0.0206       -0.0000
    Recall    : 0.1000    0.1000        0.0000
    F1-Score  : 0.0341    0.0341       -0.0000

Tuning time: 0 min 30 sec
TODO
Library 🤖             Tune Time ⌛   Precision      Recall         F1-Score       Notebook 📕
Baseline               None           Train: 1.000   Train: 1.000   Train: 1.000   baselines.ipynb
                                      Test:  0.862   Test:  0.863   Test:  0.862
Optuna                 1m9s           Train: 0.818   Train: 0.821   Train: 0.813   optuna.ipynb
                                      Test:  0.778   Test:  0.786   Test:  0.772
BayesianOptimization                  Train:         Train:         Train:         bayesianoptimization.ipynb
                                      Test:          Test:          Test:
BayesSearchCV                         Train:         Train:         Train:         bayessearchcv.ipynb
                                      Test:          Test:          Test:
hyperopt                              Train:         Train:         Train:         hyperopt.ipynb
                                      Test:          Test:          Test:
gp_minimize                           Train:         Train:         Train:         gpminimize.ipynb
                                      Test:          Test:          Test:

Installation

To create the conda environment, run the following command:

conda env create -f environment.yml

💡 Library Descriptions

1️⃣ Optuna

# Pip
$ pip install optuna
# Conda
$ conda install -c conda-forge optuna

In Optuna you define an objective function, create a study via create_study, and optimize that objective. An example implementation looks as follows:

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def objective(trial):
    # Defining the hyperparameter search space.
    n_estimators = trial.suggest_int('n_estimators', 50, 300)
    # ...

    # Build the model with the suggested hyperparameters.
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 # ...
                                 )

    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_val)
    acc = accuracy_score(y_val, y_pred)

    return acc

# Creating a study and running the Optuna optimization.
study = optuna.create_study(study_name='my_optuna_study',
                            direction='maximize')
study.optimize(objective,
               n_trials=100)

# Obtain the best found parameters.
best_params = study.best_params

While tuning, Optuna provides logs in the following format. This shows the logs of the first, the 150th, and the last trial:

[I 2023-08-07 16:07:39,539] A new study created in memory with name: my_optuna_study
[I 2023-08-07 16:07:39,809] Trial 0 finished with value: 0.1864406779661017 and parameters: {'n_estimators': 158, 'max_depth': 10, 'min_samples_split': 0.578748660317236, 'min_samples_leaf': 0.12410371173766938}. Best is trial 0 with value: 0.1864406779661017.
...
[I 2023-08-07 16:08:21,456] Trial 150 finished with value: 0.1694915254237288 and parameters: {'n_estimators': 271, 'max_depth': 3, 'min_samples_split': 0.38567485569136783, 'min_samples_leaf': 0.13389811287208112}. Best is trial 55 with value: 0.22033898305084745.
...
[I 2023-08-07 16:09:02,867] Trial 299 finished with value: 0.1864406779661017 and parameters: {'n_estimators': 139, 'max_depth': 8, 'min_samples_split': 0.4866643968671017, 'min_samples_leaf': 0.19301363057724275}. Best is trial 55 with value: 0.22033898305084745.

2️⃣ BayesianOptimization

# Pip
$ pip install bayesian-optimization
# Conda
$ conda install -c conda-forge bayesian-optimization
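
In bayesian-optimization you wrap the model training in a black-box function and let the optimizer maximize it over bounded parameter ranges given via pbounds. A minimal sketch, mirroring the Optuna example above (the bounds and the pre-split X_train/X_val data are assumptions):

from bayes_opt import BayesianOptimization
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def black_box(n_estimators, max_depth):
    # bayes_opt passes floats, so integer parameters need casting.
    clf = RandomForestClassifier(n_estimators=int(n_estimators),
                                 max_depth=int(max_depth))
    clf.fit(X_train, y_train)
    return accuracy_score(y_val, clf.predict(X_val))

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={'n_estimators': (50, 300), 'max_depth': (2, 10)},  # assumed bounds
    random_state=42,
)
optimizer.maximize(init_points=2, n_iter=5)

# Obtain the best found parameters.
best_params = optimizer.max['params']
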
3️⃣ BayesSearchCV

# Pip
$ pip install scikit-optimize
# Conda
$ conda install -c conda-forge scikit-optimize
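
BayesSearchCV from scikit-optimize follows the familiar scikit-learn search interface: you pass an estimator and a search space and call fit. A minimal sketch (the search space and the data split are assumptions):

from skopt import BayesSearchCV
from skopt.space import Integer
from sklearn.ensemble import RandomForestClassifier

opt = BayesSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    search_spaces={'n_estimators': Integer(50, 300),   # assumed bounds
                   'max_depth': Integer(2, 10)},
    n_iter=5,            # number of parameter settings sampled
    cv=3,
    scoring='accuracy',
    random_state=42,
)
opt.fit(X_train, y_train)

# Obtain the best found parameters, as with any sklearn search object.
best_params = opt.best_params_
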

4️⃣ hyperopt

# Pip
$ pip install hyperopt
# Conda
$ conda install -c conda-forge hyperopt
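
In hyperopt you describe the search space with hp expressions and minimize a loss via fmin; since it minimizes, a score such as accuracy is returned negated. A minimal sketch under the same assumed search space and data split as above:

from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def objective(params):
    # hp.quniform yields floats, so integer parameters need casting.
    clf = RandomForestClassifier(n_estimators=int(params['n_estimators']),
                                 max_depth=int(params['max_depth']))
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_val, clf.predict(X_val))
    # hyperopt minimizes, so return the negative accuracy as the loss.
    return {'loss': -acc, 'status': STATUS_OK}

space = {'n_estimators': hp.quniform('n_estimators', 50, 300, 1),  # assumed bounds
         'max_depth': hp.quniform('max_depth', 2, 10, 1)}

trials = Trials()
# fmin returns the best found parameter values.
best_params = fmin(fn=objective, space=space, algo=tpe.suggest,
                   max_evals=5, trials=trials)
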

5️⃣ gp_minimize

# Pip
$ pip install scikit-optimize
# Conda
$ conda install -c conda-forge scikit-optimize
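
gp_minimize, also from scikit-optimize, performs Gaussian-process-based minimization over a list of dimensions; the use_named_args decorator maps them onto keyword arguments. A minimal sketch (the dimensions and the data split are assumptions):

from skopt import gp_minimize
from skopt.space import Integer
from skopt.utils import use_named_args
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

space = [Integer(50, 300, name='n_estimators'),  # assumed bounds
         Integer(2, 10, name='max_depth')]

@use_named_args(space)
def objective(n_estimators, max_depth):
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 max_depth=max_depth)
    clf.fit(X_train, y_train)
    # gp_minimize minimizes, so return the negative accuracy.
    return -accuracy_score(y_val, clf.predict(X_val))

# n_calls must be at least the number of initial random points (10 by default).
result = gp_minimize(objective, space, n_calls=20, random_state=42)

# Best found parameter values, in the order of `space`.
best_params = result.x
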
