[FEATURE] - Grid search across model parameters AND thresholds with Thresholder() without refitting #551

Open
mcallaghan opened this issue Nov 16, 2022 · 3 comments
Labels: enhancement (New feature or request)

@mcallaghan

Thanks for this great set of extensions to sklearn.

The Thresholder() model is quite close to something I've been looking for for a while.

I'm looking to include threshold optimisation as part of a broader parameter search.

I can perhaps best describe the desired behaviour as follows:

for each parameter setting in grid:
    fit model with parameters
    for each threshold in thresholds:
        evaluate model at threshold

However, if I pass a model that has not yet been fitted to Thresholder(), then even with refit=False the model is refit for every threshold.

Is there an easy way around this? The best approach I can think of would be tinkering with the GridSearchCV code, but perhaps you have an idea, and you might find this interesting too.
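
For concreteness, here is roughly what that loop looks like with plain scikit-learn utilities (the data, parameter grid, thresholds, and metric below are purely illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterGrid, train_test_split

X, y = make_classification(random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

results = []
for params in ParameterGrid({"C": [0.1, 1.0, 10.0]}):
    model = LogisticRegression(**params)
    model.fit(X_train, y_train)  # one fit per parameter setting
    proba = model.predict_proba(X_val)[:, 1]
    for threshold in np.linspace(0.1, 0.9, 9):
        preds = (proba >= threshold).astype(int)  # no refit per threshold
        results.append((params, threshold, f1_score(y_val, preds)))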

Thanks!

mcallaghan added the enhancement label on Nov 16, 2022
@MBrouns (Collaborator) commented Nov 16, 2022

I haven't tested this, so maybe I'm completely off the mark, but I think you can do this by nesting GridSearchCV objects:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklego.meta import Thresholder

model = make_pipeline(
    ...,
    LogisticRegression()
)

# search over the model parameters first; with the default refit=True,
# GridSearchCV refits the best estimator on the full data afterwards
param_gridsearch = GridSearchCV(
    model,
    param_grid=...
)
param_gridsearch.fit(X, y)

# then search over thresholds only, reusing the fitted param_gridsearch;
# threshold is a required argument here and is overridden by the grid below
threshold_gridsearch = GridSearchCV(
    Thresholder(param_gridsearch, threshold=0.5, refit=False),
    param_grid={'threshold': [0.1, 0.2, ...]}
)
threshold_gridsearch.fit(X, y)
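
If I read the sklego docs correctly, refit=False means Thresholder only fits the wrapped model when it isn't fitted yet, so the already-fitted param_gridsearch should be reused as-is for every threshold candidate instead of being retrained.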

@FBruzzesi (Collaborator)

@MBrouns before closing the issue, would it be worth adding an example to the docs?

@FBruzzesi (Collaborator)

Having a closer look at this: the two approaches are actually a bit different.
The implementation of

for each parameter setting in grid:
    fit model with parameters
    for each threshold in thresholds:
        evaluate model at threshold

would still require running Thresholder for each fitted model (a full cross product of parameter settings and thresholds), while the suggestion above runs it only on the single best model from the parameter search.

Maybe a nested GridSearchCV does the trick? (I've never tried that.)

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklego.meta import Thresholder

mod = GridSearchCV(
    estimator=Thresholder(
        # inner search over the model parameters
        GridSearchCV(
            estimator=SomeModel(),
            param_grid={...},
            ...
        ),
        threshold=0.1,  # placeholder, overridden by the outer grid
        refit=False,
    ),
    # outer search over the threshold only
    param_grid={
        "threshold": np.linspace(0.1, 0.9, 10),
    },
    ...
)

_ = mod.fit(X, y)
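
If that does the trick, mod.best_params_["threshold"] afterwards holds the selected threshold (a standard GridSearchCV attribute), while the model hyperparameters are chosen by the inner search.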
