
Implementation of #129: Allow more tuning of the model hyperparameters. #130

Open · wants to merge 2 commits into base: develop

Conversation

sandervh14
Contributor

====> Do not merge yet! See below for explanation. <=====

Story Title

#129 Allow more tuning of the model hyperparameters.

Changes made

Altered the constructors of both LogisticRegressionModel and LinearRegressionModel to support overriding the default hyperparameters by passing kwargs.
Documented the change in the docstrings of both models.
See the issue description for a more extensive explanation of why one would want this.
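A minimal sketch of what the described change might look like (the class name is taken from this PR; the wrapped attribute name and the defaults shown are assumptions for illustration, not necessarily Cobra's actual values):

```python
from sklearn.linear_model import LogisticRegression

class LogisticRegressionModel:
    """Wrapper whose constructor now accepts hyperparameter overrides via kwargs."""

    def __init__(self, **kwargs):
        # Assumed defaults for illustration; caller-supplied kwargs take precedence.
        model_kwargs = dict(fit_intercept=True, solver="liblinear")
        model_kwargs.update(kwargs)
        self.logit = LogisticRegression(**model_kwargs)

# e.g. override the inverse regularization strength:
model = LogisticRegressionModel(C=0.1)
```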

**Do not merge this with the develop branch yet.** The unit tests have not been adapted yet. This was a speedy commit of an idea I was playing with in a notebook while investigating how to further tune the models for a client. I'll take the time next week to adapt the unit tests.

How does the solution address the problem

This PR allows a data scientist to pass argument overrides to the constructor of the scikit-learn model used behind the scenes in Cobra, in order to tune a model's performance further than Cobra's default settings allow.

Linked issues

Resolves #129

@sandervh14 sandervh14 linked an issue Apr 8, 2022 that may be closed by this pull request
@sandervh14 sandervh14 changed the base branch from master to develop April 8, 2022 08:55
@sandervh14 sandervh14 requested a review from sborms April 8, 2022 08:55
@sandervh14 sandervh14 self-assigned this Apr 8, 2022
@sandervh14 sandervh14 added the enhancement New feature or request label Apr 8, 2022
@sandervh14 sandervh14 added this to the v1.2.0 milestone Apr 8, 2022
@sborms
Contributor

sborms commented Apr 13, 2022

Looks good Sander! Code is clear to me.

Two remarks:

  • We have a get_intercept() function in the model classes - what will the result be if no intercept is fitted? This is a barely used helper function, but better to show a graceful error if there is no intercept to retrieve.
  • Agree that for linear models the extra parameters are already taken care of by Cobra, but let's allow for the flexibility. Maybe add a caveat in one of the classes (so at least we have it written down somewhere).

Comment on lines 292 to 296
default_kwargs = dict(fit_intercept=True)
for kwarg, val in default_kwargs.items():
    if kwarg not in kwargs:
        kwargs[kwarg] = val
self.linear = LinearRegression(**kwargs)
Contributor

This for loop can be replaced by a one-liner using .update() (see https://www.programiz.com/python-programming/methods/dictionary/update). In that case, default_kwargs is best renamed to something like model_kwargs.

Suggested change
-default_kwargs = dict(fit_intercept=True)
-for kwarg, val in default_kwargs.items():
-    if kwarg not in kwargs:
-        kwargs[kwarg] = val
-self.linear = LinearRegression(**kwargs)
+model_kwargs = dict(fit_intercept=True)
+model_kwargs.update(kwargs)
+self.linear = LinearRegression(**model_kwargs)
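For illustration, the .update() pattern gives caller-supplied kwargs precedence over the defaults while letting any extra keywords pass through untouched; a minimal standalone sketch (the helper name is hypothetical):

```python
# Minimal demonstration of the suggested .update() pattern: caller kwargs
# override the defaults, and unknown extras pass through unchanged.
def build_model_kwargs(**kwargs):
    model_kwargs = dict(fit_intercept=True)  # assumed default
    model_kwargs.update(kwargs)              # caller overrides win
    return model_kwargs

print(build_model_kwargs())
# {'fit_intercept': True}
print(build_model_kwargs(fit_intercept=False, n_jobs=2))
# {'fit_intercept': False, 'n_jobs': 2}
```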

Contributor Author

Nice Jano! I was going to commit it straight away via the GitHub interface, but then it wouldn't be applied to LogisticRegression as well. I'll have a look at it this afternoon and will include Sam's remarks.

Contributor

An idea to resolve duplicate code (and documentation to some extent) is to create a BaseModel class from which both the LinearRegression and LogisticRegression classes inherit.
But this is of course out of scope for this PR and should be considered in a separate PR.
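One possible shape for that refactoring, purely illustrative (class names, attribute names, and defaults are assumptions, not Cobra's actual API):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

class BaseModel:
    """Shared constructor plumbing for both model wrappers."""

    # Subclasses set the wrapped estimator class and its assumed defaults.
    estimator_cls = None
    default_kwargs = {}

    def __init__(self, **kwargs):
        model_kwargs = dict(self.default_kwargs)
        model_kwargs.update(kwargs)  # caller overrides win
        self.model = self.estimator_cls(**model_kwargs)

class LinearRegressionModel(BaseModel):
    estimator_cls = LinearRegression
    default_kwargs = {"fit_intercept": True}

class LogisticRegressionModel(BaseModel):
    estimator_cls = LogisticRegression
    default_kwargs = {"fit_intercept": True}
```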

Contributor Author

An idea to resolve duplicate code (and documentation to some extent) is to create a BaseModel class from which both the LinearRegression and LogisticRegression class inherit.

I remembered this comment while fixing #126; I also found quite some duplication there, and since I had the time, I've done this today along with the solution to #126, yippee! :-). For details, see the explanation in #128 (comment).

But this is of course out of scope for this PR and should be considered in a separate PR.

I took the liberty of including the superclassing abstraction in #126 anyway, instead of opening a new issue & PR dedicated to it, since the evaluate() I was fixing unit tests for takes up a BIG chunk of the code of both LinearRegressionModel and LogisticRegressionModel.

@sandervh14
Contributor Author

Hey Sam:

  • Agree that for linear models the extra parameters are already taken care of by Cobra, but let's allow for the flexibility.

You mean only linear regression right? Do you agree with me that for logistic regression it's certainly a good idea? (most of all for tuning regularization C?)

@sborms
Contributor

sborms commented Apr 15, 2022

Hey Sam:

  • Agree that for linear models the extra parameters are already taken care of by Cobra, but let's allow for the flexibility.

You mean only linear regression right? Do you agree with me that for logistic regression it's certainly a good idea? (most of all for tuning regularization C?)

Indeed, linear regression. It's definitely a good idea; it's fine to have the flexibility for both logistic & linear regression.
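As a concrete illustration of the use case discussed, tuning C (the inverse regularization strength) is exactly the kind of override the kwargs pass-through enables; plain scikit-learn, independent of Cobra, with assumed defaults:

```python
from sklearn.linear_model import LogisticRegression

# Assumed Cobra-style defaults; the caller tightens regularization via C.
defaults = dict(fit_intercept=True)
overrides = dict(C=0.1)  # smaller C = stronger regularization
defaults.update(overrides)
clf = LogisticRegression(**defaults)
print(clf.C)  # 0.1
```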

@sandervh14
Contributor Author

sandervh14 commented Jun 1, 2022

Done. Advice: merge #126 first, before this one! I think that'll make merging easier.

@sandervh14 sandervh14 modified the milestones: 2023-03, 2023-04 Apr 7, 2023
@sandervh14 sandervh14 modified the milestones: 2023-04, 2023-05 May 24, 2023