allow gridsearch for hyper tuning #36

Open
kjgm opened this issue Sep 5, 2023 · 0 comments

Models are typically trained after their hyperparameters have been tuned. A common approach is a grid search over the parameters, for which sklearn provides utilities such as GridSearchCV. pymurtree should work with these utilities.

This requires the implementation of two previous issues:

  1. implement the sklearn estimator interface
  2. check in the fit method whether the data is the same as, or different from, the data of the previous call. Depending on the order in which the grid search runs, the solver for each dataset could be kept in memory. If it iterates over the data splits first and over the parameter settings within each split, a single cache can be reused efficiently. If it iterates over the parameter settings first and over the data splits within each setting, the cache would be discarded after every run, which motivates storing a separate solver for each distinct dataset.
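
The effect of the two iteration orders on a solver cache can be illustrated with a small simulation. This is only a sketch: `count_solver_builds` and the `keep_all` flag are hypothetical names, the "solver" is a placeholder object, and nothing here assumes which order GridSearchCV actually uses. The 5 splits and 8 parameter settings mirror the example in this issue.

```python
from itertools import product


def count_solver_builds(runs, keep_all):
    """Count how often a solver has to be built from scratch.

    runs: (split_id, param) pairs in execution order.
    keep_all: if True, keep one solver per dataset in memory;
              if False, keep only the solver of the latest dataset.
    """
    cache = {}
    builds = 0
    for split_id, _param in runs:
        if split_id not in cache:
            builds += 1
            if not keep_all:
                cache.clear()  # a new dataset evicts the old solver
            cache[split_id] = object()  # placeholder for a solver and its cache
    return builds


splits = range(5)  # cv=5
params = range(8)  # eight values of max_num_nodes

# For each split, for each parameter setting.
split_major = [(s, p) for s, p in product(splits, params)]
# For each parameter setting, for each split.
param_major = [(s, p) for p, s in product(params, splits)]

print(count_solver_builds(split_major, keep_all=False))  # 5
print(count_solver_builds(param_major, keep_all=False))  # 40
print(count_solver_builds(param_major, keep_all=True))   # 5
```

With a single-slot cache, the second order rebuilds the solver on every one of the 40 fits, whereas one stored solver per dataset brings this back to 5 regardless of the order.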

Example of the intended usage:

import pymurtree
import numpy
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

x = numpy.array([[0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1], 
                 [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1],
                 [0, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
                 [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
                 [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0]])
y = numpy.array([5, 5, 4, 4, 5,
                 4, 4, 5, 5, 4,
                 4, 4, 5, 5, 5]) 

model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False)
parameters = {
 "max_num_nodes": list(range(0, 8))  
}

## To see how this is expected to work, compare with sklearn.tree.DecisionTreeClassifier
#model = DecisionTreeClassifier(max_depth=3)
#parameters = {
# "max_leaf_nodes": list(range(2, 9))  
#}

tuning_model = GridSearchCV(
    model, param_grid=parameters, scoring="accuracy", cv=5, verbose=0
)
tuning_model.fit(x, y)
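# Refit on the full data with the best setting, keeping the depth used during the search
model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, **tuning_model.best_params_)
#model = DecisionTreeClassifier(max_depth=3, **tuning_model.best_params_)

model.fit(x, y)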
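
For the grid search above to work, it has to be able to copy the model and change its parameters, which in sklearn goes through `get_params` and `set_params`. A minimal, dependency-free sketch of that contract follows. `SketchTreeClassifier` and `clone_like_sklearn` are hypothetical stand-ins, not pymurtree's or sklearn's actual code, and `fit`, `predict` and `score` are left out. In practice, inheriting from `sklearn.base.BaseEstimator` provides both methods.

```python
import inspect


class SketchTreeClassifier:
    """Hypothetical stand-in, not pymurtree's actual class."""

    def __init__(self, max_depth=3, max_num_nodes=7):
        # The constructor only stores hyperparameters under their own names.
        self.max_depth = max_depth
        self.max_num_nodes = max_num_nodes

    def get_params(self, deep=True):
        # Read the parameter names from the constructor signature.
        names = inspect.signature(type(self).__init__).parameters
        return {n: getattr(self, n) for n in names if n != "self"}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


def clone_like_sklearn(estimator):
    """Build a fresh, unfitted copy from the constructor parameters."""
    return type(estimator)(**estimator.get_params())


base = SketchTreeClassifier(max_depth=3)
candidate = clone_like_sklearn(base).set_params(max_num_nodes=5)
print(candidate.get_params())  # {'max_depth': 3, 'max_num_nodes': 5}
print(base.get_params())       # {'max_depth': 3, 'max_num_nodes': 7}
```

Each grid point is evaluated on such a copy, so the original model stays unchanged and any solver cache would have to live outside the copied estimator or be keyed by the data.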