allow gridsearch for hyper tuning #36

kjgm · 2023-09-05T08:43:13Z

Models are typically trained after their hyperparameters have been tuned. A common approach is a grid search over the parameters, for which sklearn provides utilities such as GridSearchCV. pymurtree should work with these utilities.

This requires implementing two earlier issues:

Implement the sklearn estimator interface.
Check in the fit method whether the data is the same as in the previous call or different. Depending on the order in which the grid search runs, the solver for each dataset may need to be kept in memory. If it iterates over the data splits in the outer loop and the parameter settings in the inner loop, the cache can be reused efficiently. If it iterates over the parameter settings in the outer loop and the data splits in the inner loop, the cache would be discarded after every run, which motivates storing a solver for each distinct dataset.
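The two points above can be sketched together. The class below is a hypothetical stand-in, not pymurtree's implementation: `CachedSolverClassifier`, `fingerprint`, `_solver_cache` and `fit_log` are names invented for illustration, and the "solver" is just a majority-class placeholder. It follows the sklearn estimator conventions (constructor stores parameters only, `fit` returns `self`) and keys a class-level cache on a hash of the training data, so that clones created by GridSearchCV can share it. The sketch assumes sequential execution (the default `n_jobs=None`); with parallel workers a class-level dictionary would not be shared across processes.

```python
import hashlib

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


def fingerprint(X, y):
    """Hash shape, dtype and raw bytes of (X, y) so identical data maps to one key."""
    h = hashlib.sha1()
    for arr in (X, y):
        arr = np.ascontiguousarray(arr)
        h.update(str(arr.shape).encode())
        h.update(str(arr.dtype).encode())
        h.update(arr.tobytes())
    return h.hexdigest()


class CachedSolverClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator that reuses one 'solver' per distinct training set."""

    _solver_cache = {}  # shared by all clones: fingerprint -> placeholder solver state
    fit_log = []        # (fingerprint, max_num_nodes) in the order fit() was called

    def __init__(self, max_depth=3, max_num_nodes=7):
        # sklearn convention: only store the parameters, no validation or work here
        self.max_depth = max_depth
        self.max_num_nodes = max_num_nodes

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        key = fingerprint(X, y)
        type(self).fit_log.append((key, self.max_num_nodes))
        if key not in self._solver_cache:
            # Placeholder for building an expensive solver: store the majority label
            labels, counts = np.unique(y, return_counts=True)
            self._solver_cache[key] = labels[np.argmax(counts)]
        self.classes_ = np.unique(y)
        self.majority_ = self._solver_cache[key]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


# Synthetic binary data with balanced labels (not the data from this issue)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 6))
y = np.arange(40) % 2

search = GridSearchCV(
    CachedSolverClassifier(),
    param_grid={"max_num_nodes": [1, 3, 7]},
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)

n_fits = len(CachedSolverClassifier.fit_log)
n_datasets = len(CachedSolverClassifier._solver_cache)
first_five = [nodes for _, nodes in CachedSolverClassifier.fit_log[:5]]
print("fit calls:", n_fits)
print("distinct training sets:", n_datasets)
print("max_num_nodes in the first five fits:", first_five)
```

The log makes the iteration order observable rather than assumed. In the sklearn versions I am aware of, the sequential search walks through all splits for one parameter setting before moving to the next setting, which is the second case described above and would favour keeping one solver per dataset. Holding several solvers in memory at once is the cost of that design, so a real implementation would probably want a bound on the cache size.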

import pymurtree
import numpy
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

x = numpy.array([[0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1], 
                 [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1],
                 [0, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
                 [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
                 [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0]])
y = numpy.array([5, 5, 4, 4, 5,
                 4, 4, 5, 5, 4,
                 4, 4, 5, 5, 5]) 

model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False)
parameters = {
 "max_num_nodes": list(range(0, 8))  
}

## To see how this is expected to work, compare with sklearn.tree.DecisionTreeClassifier
#model = DecisionTreeClassifier(max_depth=3)
#parameters = {
# "max_leaf_nodes": list(range(2, 9))  
#}

tuning_model = GridSearchCV(
    model, param_grid=parameters, scoring="accuracy", cv=5, verbose=0
)
tuning_model.fit(x, y)
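# Refit with the best parameters, keeping the fixed settings that were used during the search
model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False, **tuning_model.best_params_)
#model = DecisionTreeClassifier(max_depth=3, **tuning_model.best_params_)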

model.fit(x, y)
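For comparison, here is a runnable version of the commented-out sklearn variant, using the same data. It is a sketch of the expected behaviour, not of pymurtree itself. Note that `best_params_` only contains the searched parameters, so fixed settings such as `max_depth` have to be passed again when constructing a new model by hand. With the default `refit=True`, `best_estimator_` is already refitted on the full data with all settings intact, so the manual refit can be skipped. `random_state=0` is added here only to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

x = np.array([[0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1],
              [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1],
              [0, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
              [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
              [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0]])
y = np.array([5, 5, 4, 4, 5,
              4, 4, 5, 5, 4,
              4, 4, 5, 5, 5])

search = GridSearchCV(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    param_grid={"max_leaf_nodes": list(range(2, 9))},
    scoring="accuracy",
    cv=5,
)
search.fit(x, y)

# best_params_ holds only the tuned parameter, not max_depth
print(search.best_params_)

# best_estimator_ is a clone of the original model with the best parameters set,
# already fitted on all of x and y
best = search.best_estimator_
print(best.get_params()["max_depth"])
print(best.predict(x))
```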
