[FEATURE] - Grid search across model parameters AND thresholds with Thresholder() without refitting #551

Open
mcallaghan opened this issue Nov 16, 2022 · 3 comments
Labels: enhancement (New feature or request)

@mcallaghan

Thanks for this great set of extensions to sklearn.

The Thresholder() model is quite close to something I've been looking for for a while.

I'm looking to include threshold optimisation as part of a broader parameter search.

I can perhaps best describe the desired behaviour as follows:

for each parameter setting in grid:
    fit model with parameters
    for each threshold in thresholds:
        evaluate model at threshold

However, if I pass a model that has not yet been fitted to Thresholder(), then even with refit=False the model is refit for every threshold.

Is there an easy way around this? The best approach I can think of would be tinkering with the GridSearchCV code, but perhaps you have an idea, and you might find this interesting too.
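
For concreteness, here is roughly what that loop looks like with plain scikit-learn utilities (the data, parameter grid, thresholds, and metric below are purely illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterGrid, train_test_split

X, y = make_classification(random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

results = []
for params in ParameterGrid({"C": [0.1, 1.0, 10.0]}):
    model = LogisticRegression(**params)
    model.fit(X_train, y_train)  # one fit per parameter setting
    proba = model.predict_proba(X_val)[:, 1]
    for threshold in np.linspace(0.1, 0.9, 9):
        preds = (proba >= threshold).astype(int)  # no refit per threshold
        results.append((params, threshold, f1_score(y_val, preds)))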

Thanks!

mcallaghan added the enhancement label on Nov 16, 2022
@MBrouns (Collaborator) commented Nov 16, 2022

I haven't tested this, so maybe I'm completely off the mark, but I think you can do this by nesting GridSearchCV objects:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklego.meta import Thresholder

model = make_pipeline(
    ...,
    LogisticRegression()
)

# search over the model parameters first; with the default refit=True,
# GridSearchCV refits the best estimator on the full data afterwards
param_gridsearch = GridSearchCV(
    model,
    param_grid=...
)
param_gridsearch.fit(X, y)

# then search over thresholds only, reusing the fitted param_gridsearch;
# threshold is a required argument here and is overridden by the grid below
threshold_gridsearch = GridSearchCV(
    Thresholder(param_gridsearch, threshold=0.5, refit=False),
    param_grid={'threshold': [0.1, 0.2, ...]}
)
threshold_gridsearch.fit(X, y)
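
If I read the sklego docs correctly, refit=False means Thresholder only fits the wrapped model when it isn't fitted yet, so the already-fitted param_gridsearch should be reused as-is for every threshold candidate instead of being retrained.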

@FBruzzesi (Collaborator)

@MBrouns before closing the issue, would it be worth adding an example to the docs?

@FBruzzesi (Collaborator)

Having a closer look at this: the two approaches are actually a bit different.
The implementation of

for each parameter setting in grid:
    fit model with parameters
    for each threshold in thresholds:
        evaluate model at threshold

would still require running Thresholder for each fitted model (a full cross product of parameter settings and thresholds), while the suggestion above runs it only on the single best model from the parameter search.

Maybe a nested GridSearchCV does the trick? (I've never tried that.)

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklego.meta import Thresholder

mod = GridSearchCV(
    estimator=Thresholder(
        # inner search over the model parameters
        GridSearchCV(
            estimator=SomeModel(),
            param_grid={...},
            ...
        ),
        threshold=0.1,  # placeholder, overridden by the outer grid
        refit=False,
    ),
    # outer search over the threshold only
    param_grid={
        "threshold": np.linspace(0.1, 0.9, 10),
    },
    ...
)

_ = mod.fit(X, y)
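
If that does the trick, mod.best_params_["threshold"] afterwards holds the selected threshold (a standard GridSearchCV attribute), while the model hyperparameters are chosen by the inner search.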
