Implementation for #126: supporting evaluation & forward feature selection for custom metrics (e.g. lift) #128

Status: Open. sandervh14 wants to merge 4 commits into base branch develop.
Conversation

@sandervh14 (Contributor) commented on Apr 8, 2022

====> Do not merge yet! See below for explanation. <=====

Story Title

See issue #126: Tuning classification model on different metrics (lift, recall, precision,...)

Changes made

  • Adapted the forward feature selection class:
    • It now accepts metric, metric_args, metric_kwargs and higher_is_better as arguments (see the usage sketch below).
    • Extensive documentation of what to fill in for those parameters.
    • compute_model_performances(): passes metric, metric_args and metric_kwargs on to the model's evaluate().
    • _find_next_best_model(): idem, plus adapted logic for selecting the next best model according to higher_is_better.
  • Adapted both LogisticRegressionModel and LinearRegressionModel to support a custom metric, metric_args and metric_kwargs in evaluate().
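For illustration, a rough usage sketch; the exact signatures are in the diff, the ForwardFeatureSelection constructor call is only indicative, and average_precision_score is just an arbitrary example of a score-based custom metric:

import pandas as pd
from sklearn.metrics import average_precision_score

from cobra.model_building import LogisticRegressionModel
from cobra.model_building.forward_selection import ForwardFeatureSelection

# Toy data standing in for cobra-preprocessed predictors and a binary target.
X = pd.DataFrame({"var1_enc": [0.10, 0.40, 0.35, 0.80, 0.70, 0.20],
                  "var2_enc": [0.90, 0.30, 0.55, 0.60, 0.10, 0.25]})
y = pd.Series([0, 0, 1, 1, 1, 0])

# Evaluate a fitted classification model with a custom metric instead of the
# default AUC; metric_kwargs is forwarded to the metric function.
model = LogisticRegressionModel()
model.fit(X, y)
score = model.evaluate(X, y,
                       metric=average_precision_score,
                       metric_kwargs={"pos_label": 1})

# Drive forward feature selection with the same metric; average precision is a
# "higher is better" metric, so selection keeps the highest-scoring model.
ffs = ForwardFeatureSelection(model_type="classification",
                              metric=average_precision_score,
                              metric_kwargs={"pos_label": 1},
                              higher_is_better=True)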

I tried out my changes in a notebook on a model and it looks good. :-)
But do not merge yet!
I'm stuck on 25 failing unit tests, mainly due to the cryptic error "fixture 'mocker' not found":

test setup failed
file C:\Users\sander.vandenhautte\PycharmProjects\cobra\tests\model_building\test_forward_selection.py, line 55
      @pytest.mark.parametrize("model_type", ["classification", "regression"])
      def test_compute_model_performances(self, mocker, model_type):
E       fixture 'mocker' not found
>       available fixtures: anyio_backend, anyio_backend_name, anyio_backend_options, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.

=> Are we using fixtures somewhere? With a search, I can't locate where mocker is instantiated.

Anyway, besides this question, the unit tests of course still need to be fixed before we can merge this into develop.
But meanwhile Hendrik can use the issue branch, and I can discuss with you, Sam, how to fix the unit test shizzle above. :-)
I'll plan that work in for next week.

How does the solution address the problem

This PR gives any data scientist the freedom to pick a different evaluation metric for forward feature selection, or simply for evaluating a single model.

Linked issues

Resolves #126

@sandervh14 sandervh14 self-assigned this Apr 8, 2022
@sandervh14 sandervh14 added the enhancement New feature or request label Apr 8, 2022
@sandervh14 sandervh14 added this to the v1.2.0 milestone Apr 8, 2022
@sandervh14 sandervh14 requested a review from sborms April 8, 2022 08:55
@sborms (Contributor) left a comment


Well done, good additions! It looks almost finished; I just added a few documentation remarks and some questions about the (code) distinction between y_pred, y_score, etc.

Review comments were left on:
  • cobra/evaluation/evaluator.py
  • cobra/model_building/forward_selection.py
  • cobra/model_building/models.py
@sborms (Contributor) commented on Apr 13, 2022

Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).

@sandervh14 (Contributor, Author) replied:

> Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).

Thanks, I've added that info to the contributing guidelines & workflow, in case others have the same question as I did.

@sandervh14 (Contributor, Author) commented on May 25, 2022

Hey Sam, just FYI: I'm committing the code changes I made based on your review. Some meetings came up in between (I'm going to start at a new client), so I couldn't finish the unit tests; I'll do that after the long weekend.

Sander Vanden Hautte added 2 commits June 1, 2022 14:27
@sandervh14 (Contributor, Author) commented:
Hi Sam.

I've finished the changes for this one. I started fixing the failing unit tests, but during that work I noticed how much code LogisticRegressionModel and LinearRegressionModel actually duplicated, so I've also abstracted the shared code out of those classes into a base class Model.

So maybe have a look at it again if you have the time, before merging. Or you can also just merge it; in my opinion it's quite fine, and all existing unit tests plus the new ones run successfully.

Optional read:
TL;DR: writing unit tests for code that accepts a function (the "metric") which is only specified at runtime (i.e., in the unit test scope, in the case of these tests) proved hard to monkey patch with pytest-mock, at least with my knowledge of it. To make the problem and my solution clear, here is a longer (sorry) explanation.

  1. How does the test function below manage to patch roc_auc_score by targeting "cobra.model_building.models.roc_auc_score", when in models.py (the file under test) roc_auc_score is imported (as from sklearn.metrics import roc_auc_score) rather than defined there? I would have expected the patch target to be "sklearn.metrics.roc_auc_score" instead. Or is this just pytest-mock magic, meaning that any name bound in that module scope, whether defined there or imported, can be monkey patched?

(I've added pytest-mock to my development plan to learn this magic!)

import pandas as pd
from sklearn.metrics import top_k_accuracy_score  # used further below

def test_evaluate_no_metric_specified(self, mocker):
    # mock_data() and mock_score_model_classification are helpers defined
    # elsewhere in the test module.
    X = mock_data()
    y = pd.Series([1] * 5 + [0] * 5)

    def mock_roc_auc_score(y_true, y_score):
        return 0.79

    (mocker
     .patch("cobra.model_building.LogisticRegressionModel.score_model",
            mock_score_model_classification))

    (mocker
     .patch("cobra.model_building.models.roc_auc_score",
            mock_roc_auc_score))

    model = LogisticRegressionModel()
    actual = model.evaluate(X, y)  # implied: metric=None

    assert actual == 0.79
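For what it's worth, the documented behaviour of unittest.mock (which pytest-mock wraps) seems to explain this: patch() replaces a name in the namespace where it is looked up at call time, not where it was originally defined. Since models.py does from sklearn.metrics import roc_auc_score, the binding that evaluate() resolves lives in cobra.model_building.models, and that is the target the patch has to name:

# models.py binds the sklearn function into its own namespace at import time:
#     from sklearn.metrics import roc_auc_score
# evaluate() later looks up "roc_auc_score" via cobra.model_building.models,
# so that binding is the one the mock has to replace:
mocker.patch("cobra.model_building.models.roc_auc_score", mock_roc_auc_score)

# Patching the definition in sklearn.metrics instead would leave the binding
# that models.py already imported untouched, so evaluate() would still call
# the real roc_auc_score:
mocker.patch("sklearn.metrics.roc_auc_score", mock_roc_auc_score)  # no effect on models.py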
  2. My problem in writing the unit tests was that the following didn't work:
def mock_top_k_accuracy_score(y_true, y_score,
                              *, k=2, normalize=True,
                              sample_weight=None, labels=None):
    # ...
    return 0.14

(mocker
 .patch("sklearn.metrics.top_k_accuracy_score",
        mock_top_k_accuracy_score))

model = LogisticRegressionModel()
actual = model.evaluate(X, y,
                        metric=top_k_accuracy_score)

... which is why I went for passing a custom scoring function directly, rather than mocking one from sklearn (presumably the patch on sklearn.metrics never takes effect because the test module already imported top_k_accuracy_score by name and passes that original reference straight into evaluate()):

def top_k_accuracy_score(y_true, y_score,
                         *, k=2, normalize=True,
                         sample_weight=None, labels=None):
    if not np.array_equal(y_true, y):
        raise ValueError("LogisticRegressionModel.evaluate() did not "
                         "succeed in passing the correct y_true "
                         "argument.")
    if not np.array_equal(y_score,
                          mock_score_model_classification_output):
        raise ValueError("LogisticRegressionModel.evaluate() did not "
                         "succeed in passing the correct y_score "
                         "argument.")

    return 0.14

model = LogisticRegressionModel()
actual = model.evaluate(X, y,
                        metric=top_k_accuracy_score)

So I sort of circumvented the mocking difficulty, but I think the custom function tests the functionality well anyway.
Do let me know, though, if you think it needs to be done in another way.

Finally: YAAY, a new pull request that is finished!

Comment on lines +87 to +100
if higher_is_better is None:
    if metric is None:
        if self.MLModel == LogisticRegressionModel:
            # If no custom evaluation metric is chosen,
            # the LogisticRegressionModel uses AUC as default metric,
            # so "higher is better" evaluation logic is applied on the
            # evaluation scores.
            self.higher_is_better = True
        elif self.MLModel == LinearRegressionModel:
            # If no custom evaluation metric is chosen,
            # the LinearRegressionModel uses RMSE as default metric,
            # so "lower is better" evaluation logic is applied on the
            # evaluation scores.
            self.higher_is_better = False

I am not really convinced this is the best way to do this. metric should also support str, which would make it a lot more user-friendly.
Since higher_is_better is completely dependent on the metric, I think a mapping of metric -> higher_is_better would be more efficient. The type of model should only matter for the default value. It is true that some (or most) metrics are used only for classification or only for regression, but we shouldn't make our code more complex because of that.


default_higher_is_better = self.MLModel == LogisticRegressionModel
metric_map = {
    "RMSE": False,
    "AUC": True,
    None: default_higher_is_better,
}
if metric is None or isinstance(metric, str):
    self.higher_is_better = metric_map[metric]
elif callable(metric):
    # For a custom callable metric, higher_is_better is mandatory.
    if higher_is_better is None:
        raise ValueError("higher_is_better must be provided when metric "
                         "is a custom callable.")
    self.higher_is_better = higher_is_better
else:
    raise ValueError("metric must be None, a supported metric name (str) "
                     "or a callable.")

Comment on lines +102 to +106
raise ValueError("The configured machine learning model is "
"not the standard logistic regression or "
"linear regression model. "
"Therefore, please fill the metric and "
"higher_is_better arguments.")

Which other types are supported? L79-L82 only mentions two types.

Comment on lines +24 to +25
linear : LinearRegression
scikit-learn linear regression model.

It only has predictors as an attribute.

@@ -229,42 +251,72 @@ def compute_variable_importance(self, data: pd.DataFrame) -> pd.DataFrame:
return (df.sort_values(by="importance", ascending=False)
.reset_index(drop=True))

def _is_valid_dict(self, model_dict: dict) -> bool:
def _is_valid_dict(self, model_dict: dict):

Suggested change
def _is_valid_dict(self, model_dict: dict):
def _is_valid_dict(self, model_dict: dict) -> bool:

self.predictors = [] # placeholder to keep track of a list of predictors
self._eval_metrics_by_split = {}
super().__init__()
self.logit = LogisticRegression(fit_intercept=True, C=1e9,

If you rename logit to model in the LogisticRegressionModel class and linear to model in the LinearRegressionModel class, you can place the fit() method in the base Model class. Both self.logit and self.linear serve the same purpose, i.e. they hold the actual model of the class, so the more general name should be self.model or something similar (and identical in both classes).
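A rough sketch of what that could look like, to make the suggestion concrete; the class bodies are only indicative, and the remaining keyword arguments of the original LogisticRegression constructor are omitted:

import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression


class Model:
    """Base class for everything the two model classes have in common."""

    def __init__(self):
        self.predictors = []              # predictors the model was fit on
        self._eval_metrics_by_split = {}  # cached evaluation scores per split
        self.model = None                 # scikit-learn estimator, set by subclasses

    def fit(self, X_train: pd.DataFrame, y_train: pd.Series):
        # A single fit() suffices for both subclasses, since both simply
        # delegate to the underlying scikit-learn estimator in self.model.
        self.predictors = list(X_train.columns)
        self.model.fit(X_train, y_train)


class LogisticRegressionModel(Model):
    def __init__(self):
        super().__init__()
        self.model = LogisticRegression(fit_intercept=True, C=1e9)


class LinearRegressionModel(Model):
    def __init__(self):
        super().__init__()
        self.model = LinearRegression(fit_intercept=True)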

Comment on lines +602 to +607
return sqrt(
    mean_squared_error(y_true=y_true,
                       # LinearRegressionModel actually returns y_pred
                       # inside y_score:
                       y_pred=y_score)
)

Suggested change
return sqrt(
    mean_squared_error(y_true=y_true,
                       # LinearRegressionModel actually returns y_pred
                       # inside y_score:
                       y_pred=y_score)
)
return mean_squared_error(y_true=y_true,
                          # LinearRegressionModel actually returns y_pred
                          # inside y_score:
                          y_pred=y_score,
                          squared=False)

@sandervh14 sandervh14 modified the milestones: 2023-03, 2023-04 Apr 7, 2023
@sandervh14 sandervh14 modified the milestones: 2023-04, 2023-05 May 24, 2023
Labels
enhancement New feature or request
Successfully merging this pull request may close these issues.

Tuning classification model on different metrics (lift, recall, precision,...)
3 participants