Implementation for #126: supporting evaluation & forward feature selection for custom metrics (e.g. lift) #128

Status: Open. sandervh14 wants to merge 4 commits into base branch develop.
Conversation

@sandervh14 (Contributor) commented on Apr 8, 2022

====> Do not merge yet! See below for explanation. <=====

Story Title

See issue #126: Tuning classification model on different metrics (lift, recall, precision,...)

Changes made

  • Adapted the forward feature selection class:
    • It now accepts metric, metric_args, metric_kwargs and higher_is_better as arguments (see the usage sketch below).
    • Extensive documentation of what to fill in for those parameters.
    • compute_model_performances(): passes metric, metric_args and metric_kwargs on to the model's evaluate().
    • _find_next_best_model(): idem, plus adapted logic for selecting the next best model according to higher_is_better.
  • Adapted both LogisticRegressionModel and LinearRegressionModel to support a custom metric, metric_args and metric_kwargs in evaluate().
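For illustration, a rough usage sketch; the exact signatures are in the diff, the ForwardFeatureSelection constructor call is only indicative, and average_precision_score is just an arbitrary example of a score-based custom metric:

import pandas as pd
from sklearn.metrics import average_precision_score

from cobra.model_building import LogisticRegressionModel
from cobra.model_building.forward_selection import ForwardFeatureSelection

# Toy data standing in for cobra-preprocessed predictors and a binary target.
X = pd.DataFrame({"var1_enc": [0.10, 0.40, 0.35, 0.80, 0.70, 0.20],
                  "var2_enc": [0.90, 0.30, 0.55, 0.60, 0.10, 0.25]})
y = pd.Series([0, 0, 1, 1, 1, 0])

# Evaluate a fitted classification model with a custom metric instead of the
# default AUC; metric_kwargs is forwarded to the metric function.
model = LogisticRegressionModel()
model.fit(X, y)
score = model.evaluate(X, y,
                       metric=average_precision_score,
                       metric_kwargs={"pos_label": 1})

# Drive forward feature selection with the same metric; average precision is a
# "higher is better" metric, so selection keeps the highest-scoring model.
ffs = ForwardFeatureSelection(model_type="classification",
                              metric=average_precision_score,
                              metric_kwargs={"pos_label": 1},
                              higher_is_better=True)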

I tried out my changes in a notebook on a model and it looks good. :-)
But do not merge yet!
I'm stuck on 25 failing unit tests, mainly due to the cryptic error "fixture 'mocker' not found":

test setup failed
file C:\Users\sander.vandenhautte\PycharmProjects\cobra\tests\model_building\test_forward_selection.py, line 55
      @pytest.mark.parametrize("model_type", ["classification", "regression"])
      def test_compute_model_performances(self, mocker, model_type):
E       fixture 'mocker' not found
>       available fixtures: anyio_backend, anyio_backend_name, anyio_backend_options, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.

=> Are we using fixtures somewhere? With a search, I can't locate where mocker is instantiated.

Anyway, besides this question, the unit tests of course still need to be fixed before we can merge this into develop.
But meanwhile Hendrik can use the issue branch, and I can discuss with you, Sam, how to fix the unit test shizzle above. :-)
I'll plan that work in for next week.

How does the solution address the problem

This PR gives any data scientist the freedom to pick a different evaluation metric for forward feature selection, or simply for evaluating a single model.

Linked issues

Resolves #126

@sandervh14 sandervh14 self-assigned this Apr 8, 2022
@sandervh14 sandervh14 added the enhancement New feature or request label Apr 8, 2022
@sandervh14 sandervh14 added this to the v1.2.0 milestone Apr 8, 2022
@sandervh14 sandervh14 requested a review from sborms April 8, 2022 08:55
@sborms (Contributor) left a comment


Well done, good additions! It looks almost finished; I just added a few documentation remarks and some questions about the (code) distinction between y_pred, y_score, etc.

Review comments were left on:
  • cobra/evaluation/evaluator.py
  • cobra/model_building/forward_selection.py
  • cobra/model_building/models.py
@sborms (Contributor) commented on Apr 13, 2022

Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).

@sandervh14 (Contributor, Author) replied:

> Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).

Thanks, I've added that info to the contributing guidelines & workflow, in case others have the same question as I did.

@sandervh14 (Contributor, Author) commented on May 25, 2022

Hey Sam, just FYI: I'm committing the code changes I made based on your review. Some meetings came up in between (I'm going to start at a new client), so I couldn't finish the unit tests; I'll do that after the long weekend.

Sander Vanden Hautte added 2 commits June 1, 2022 14:27
@sandervh14 (Contributor, Author) commented:
Hi Sam.

I've finished the changes for this one. I started fixing the failing unit tests, but during that work I noticed how much code LogisticRegressionModel and LinearRegressionModel actually duplicated, so I've also abstracted the shared code out of those classes into a base class Model.

So maybe have a look at it again if you have the time, before merging. Or you can also just merge it; in my opinion it's quite fine, and all existing unit tests plus the new ones run successfully.

Optional read:
TL;DR: writing unit tests for code that accepts a function (the "metric") which is only specified at runtime (i.e., in the unit test scope, in the case of these tests) proved hard to monkey patch with pytest-mock, at least with my knowledge of it. To make the problem and my solution clear, here is a longer (sorry) explanation.

  1. How does the test function below manage to patch roc_auc_score by targeting "cobra.model_building.models.roc_auc_score", when in models.py (the file under test) roc_auc_score is imported (as from sklearn.metrics import roc_auc_score) rather than defined there? I would have expected the patch target to be "sklearn.metrics.roc_auc_score" instead. Or is this just pytest-mock magic, meaning that any name bound in that module scope, whether defined there or imported, can be monkey patched?

(I've added pytest-mock to my development plan to learn this magic!)

import pandas as pd
from sklearn.metrics import top_k_accuracy_score  # used further below

def test_evaluate_no_metric_specified(self, mocker):
    # mock_data() and mock_score_model_classification are helpers defined
    # elsewhere in the test module.
    X = mock_data()
    y = pd.Series([1] * 5 + [0] * 5)

    def mock_roc_auc_score(y_true, y_score):
        return 0.79

    (mocker
     .patch("cobra.model_building.LogisticRegressionModel.score_model",
            mock_score_model_classification))

    (mocker
     .patch("cobra.model_building.models.roc_auc_score",
            mock_roc_auc_score))

    model = LogisticRegressionModel()
    actual = model.evaluate(X, y)  # implied: metric=None

    assert actual == 0.79
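For what it's worth, the documented behaviour of unittest.mock (which pytest-mock wraps) seems to explain this: patch() replaces a name in the namespace where it is looked up at call time, not where it was originally defined. Since models.py does from sklearn.metrics import roc_auc_score, the binding that evaluate() resolves lives in cobra.model_building.models, and that is the target the patch has to name:

# models.py binds the sklearn function into its own namespace at import time:
#     from sklearn.metrics import roc_auc_score
# evaluate() later looks up "roc_auc_score" via cobra.model_building.models,
# so that binding is the one the mock has to replace:
mocker.patch("cobra.model_building.models.roc_auc_score", mock_roc_auc_score)

# Patching the definition in sklearn.metrics instead would leave the binding
# that models.py already imported untouched, so evaluate() would still call
# the real roc_auc_score:
mocker.patch("sklearn.metrics.roc_auc_score", mock_roc_auc_score)  # no effect on models.py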
  2. My problem in writing the unit tests was that the following didn't work:
def mock_top_k_accuracy_score(y_true, y_score,
                              *, k=2, normalize=True,
                              sample_weight=None, labels=None):
    # ...
    return 0.14

(mocker
 .patch("sklearn.metrics.top_k_accuracy_score",
        mock_top_k_accuracy_score))

model = LogisticRegressionModel()
actual = model.evaluate(X, y,
                        metric=top_k_accuracy_score)

... which is why I went for passing a custom scoring function directly, rather than mocking one from sklearn (presumably the patch on sklearn.metrics never takes effect because the test module already imported top_k_accuracy_score by name and passes that original reference straight into evaluate()):

def top_k_accuracy_score(y_true, y_score,
                         *, k=2, normalize=True,
                         sample_weight=None, labels=None):
    if not np.array_equal(y_true, y):
        raise ValueError("LogisticRegressionModel.evaluate() did not "
                         "succeed in passing the correct y_true "
                         "argument.")
    if not np.array_equal(y_score,
                          mock_score_model_classification_output):
        raise ValueError("LogisticRegressionModel.evaluate() did not "
                         "succeed in passing the correct y_score "
                         "argument.")

    return 0.14

model = LogisticRegressionModel()
actual = model.evaluate(X, y,
                        metric=top_k_accuracy_score)

So I sort of circumvented the mocking difficulty, but I think the custom function tests the functionality well anyway.
Do let me know, though, if you think it needs to be done in another way.

Finally: YAAY, a new pull request that is finished!

Comment on lines +87 to +100
if higher_is_better is None:
    if metric is None:
        if self.MLModel == LogisticRegressionModel:
            # If no custom evaluation metric is chosen,
            # the LogisticRegressionModel uses AUC as default metric,
            # so "higher is better" evaluation logic is applied on the
            # evaluation scores.
            self.higher_is_better = True
        elif self.MLModel == LinearRegressionModel:
            # If no custom evaluation metric is chosen,
            # the LinearRegressionModel uses RMSE as default metric,
            # so "lower is better" evaluation logic is applied on the
            # evaluation scores.
            self.higher_is_better = False

I am not really convinced this is the best way to do this. metric should also support str, which would make it a lot more user-friendly.
Since higher_is_better is completely dependent on the metric, I think a mapping of metric -> higher_is_better would be more efficient. The type of model should only matter for the default value. It is true that some (or most) metrics are used only for classification or only for regression, but we shouldn't make our code more complex because of that.


default_higher_is_better = self.MLModel == LogisticRegressionModel
metric_map = {
    "RMSE": False,
    "AUC": True,
    None: default_higher_is_better,
}
if metric is None or isinstance(metric, str):
    self.higher_is_better = metric_map[metric]
elif callable(metric):
    # For a custom callable metric, higher_is_better is mandatory.
    if higher_is_better is None:
        raise ValueError("higher_is_better must be provided when metric "
                         "is a custom callable.")
    self.higher_is_better = higher_is_better
else:
    raise ValueError("metric must be None, a supported metric name (str) "
                     "or a callable.")

Comment on lines +102 to +106
raise ValueError("The configured machine learning model is "
"not the standard logistic regression or "
"linear regression model. "
"Therefore, please fill the metric and "
"higher_is_better arguments.")

Which other types are supported? L79-L82 only mentions two types.

Comment on lines +24 to +25
linear : LinearRegression
scikit-learn linear regression model.

It only has predictors as an attribute.

@@ -229,42 +251,72 @@ def compute_variable_importance(self, data: pd.DataFrame) -> pd.DataFrame:
return (df.sort_values(by="importance", ascending=False)
.reset_index(drop=True))

def _is_valid_dict(self, model_dict: dict) -> bool:
def _is_valid_dict(self, model_dict: dict):

Suggested change
def _is_valid_dict(self, model_dict: dict):
def _is_valid_dict(self, model_dict: dict) -> bool:

self.predictors = [] # placeholder to keep track of a list of predictors
self._eval_metrics_by_split = {}
super().__init__()
self.logit = LogisticRegression(fit_intercept=True, C=1e9,

If you rename logit to model in the LogisticRegressionModel class and linear to model in the LinearRegressionModel class, you can place the fit() method in the base Model class. Both self.logit and self.linear serve the same purpose, i.e. they hold the actual model of the class, so the more general name should be self.model or something similar (and identical in both classes).
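A rough sketch of what that could look like, to make the suggestion concrete; the class bodies are only indicative, and the remaining keyword arguments of the original LogisticRegression constructor are omitted:

import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression


class Model:
    """Base class for everything the two model classes have in common."""

    def __init__(self):
        self.predictors = []              # predictors the model was fit on
        self._eval_metrics_by_split = {}  # cached evaluation scores per split
        self.model = None                 # scikit-learn estimator, set by subclasses

    def fit(self, X_train: pd.DataFrame, y_train: pd.Series):
        # A single fit() suffices for both subclasses, since both simply
        # delegate to the underlying scikit-learn estimator in self.model.
        self.predictors = list(X_train.columns)
        self.model.fit(X_train, y_train)


class LogisticRegressionModel(Model):
    def __init__(self):
        super().__init__()
        self.model = LogisticRegression(fit_intercept=True, C=1e9)


class LinearRegressionModel(Model):
    def __init__(self):
        super().__init__()
        self.model = LinearRegression(fit_intercept=True)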

Comment on lines +602 to +607
return sqrt(
    mean_squared_error(y_true=y_true,
                       # LinearRegressionModel actually returns y_pred
                       # inside y_score:
                       y_pred=y_score)
)

Suggested change
return sqrt(
    mean_squared_error(y_true=y_true,
                       # LinearRegressionModel actually returns y_pred
                       # inside y_score:
                       y_pred=y_score)
)
return mean_squared_error(y_true=y_true,
                          # LinearRegressionModel actually returns y_pred
                          # inside y_score:
                          y_pred=y_score,
                          squared=False)

@sandervh14 sandervh14 modified the milestones: 2023-03, 2023-04 Apr 7, 2023
@sandervh14 sandervh14 modified the milestones: 2023-04, 2023-05 May 24, 2023
Labels
enhancement New feature or request
Successfully merging this pull request may close these issues.

Tuning classification model on different metrics (lift, recall, precision,...)
3 participants