feat: Add evaluate function for classifiers #195
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
I'm a bit on the fence about this one. If you follow my reasoning, you just end up with a thin wrapper around MultiLabelBinarizer.
@@ -227,6 +228,51 @@ def fit(
        self.eval()
        return self

    def evaluate(
I feel like this function is hitting the wrong abstraction level. Here are some observations:
- The function doesn't need to know what the classifier is, because multi_output labels look different, so you can leave out the check for self.multilabel.
- There's no need to encode labels for non-multilabel output; you can just pass the un-encoded labels.
- This function is not available to converted pipelines, but it is equally applicable there.
So I would refactor this into a function that takes a bunch of labels and then, based on the type and shape of the output, returns a report. That function is then called by evaluate.
So something like this:
def evaluate(self, ...):
    predictions = self.predict(...)
    return evaluate_single_or_multilabel(y, predictions)
The evaluate_single_or_multilabel function then simplifies to:
def evaluate_single_or_multilabel(y, pred):
    if _is_multi_label_shaped(y):
        # Binarization etc.
        return classification_report(y_binarized, pred_binarized)
    return classification_report(y, pred)
That way you can also test these functions without needing to have models, and can also reuse them in other contexts. The consequence of all of this, however, is that evaluate simplifies to:
evaluate_single_or_multilabel(ds["label"], model.predict(ds["text"]))
So maybe having evaluate is not even necessary anymore.
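For reference, here is a minimal sketch of what the two helpers could look like, assuming the binarization is done with MultiLabelBinarizer; the helper bodies below are just an illustration of the idea, not the actual implementation:

from sklearn.metrics import classification_report
from sklearn.preprocessing import MultiLabelBinarizer

def _is_multi_label_shaped(y) -> bool:
    # Multilabel targets are sequences of label collections, e.g. [["a", "b"], ["a"]].
    return len(y) > 0 and isinstance(y[0], (list, tuple, set))

def evaluate_single_or_multilabel(y, pred) -> str:
    if _is_multi_label_shaped(y):
        # Fit the binarizer on the gold labels so both arrays share the same label space.
        mlb = MultiLabelBinarizer()
        y_binarized = mlb.fit_transform(y)
        pred_binarized = mlb.transform(pred)
        return classification_report(
            y_binarized,
            pred_binarized,
            target_names=[str(c) for c in mlb.classes_],
            zero_division=0,
        )
    return classification_report(y, pred, zero_division=0)

Because the helpers only see labels, they can be tested directly, e.g. evaluate_single_or_multilabel([["a"], ["a", "b"]], [["a"], ["b"]]).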
I refactored the code as per your suggestions.
- In the inference model.py there are now evaluate_single_or_multilabel and _is_multi_label_shaped
- Both the inference and train model.py have an evaluate function that calls evaluate_single_or_multilabel with the model predictions
- The single-label case doesn't use a LabelEncoder anymore
This way evaluate is available to both trained models and pipeline-converted models. I also updated the tests to reflect this.
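For illustration, the evaluate method on both model classes then reduces to a thin delegation (a sketch based on the description above; the exact signature in model.py may differ):

def evaluate(self, X, y) -> str:
    # Predict once, then hand the gold and predicted labels to the shared helper.
    predictions = self.predict(X)
    return evaluate_single_or_multilabel(y, predictions)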
As for your other comment: yes, this is essentially a thin wrapper around MultiLabelBinarizer and classification_report. However, I think this is worth it. Consider the following example:
from sklearn.metrics import classification_report
from sklearn.preprocessing import MultiLabelBinarizer
predictions = classifier.predict(X)
mlb = MultiLabelBinarizer(classes=classifier.classes)
y_true = mlb.fit_transform(ds["test"]["labels"])
y_pred = mlb.transform(predictions)
print(classification_report(y_true, y_pred, target_names=classifier.classes, zero_division=0))
Vs:
print(classifier.evaluate(X, y))
This is much easier to run and understand in my opinion, and it fits in with the rest of our training code, which creates a wrapper around torch/lightning. While the function does not add much for the single-label case, it does provide a unified interface, and even in that case it gives a slightly nicer way to evaluate IMO:
from sklearn.metrics import classification_report
predictions = classifier.predict(X)
print(classification_report(y, predictions, target_names=classifier.classes, zero_division=0))
Vs:
print(classifier.evaluate(X, y))
Optional add-on: you don't need to pass classes to the evaluate function; you can derive the classes from the predicted or gold labels.
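For example (a sketch of that idea, not the final code), the class set can be collected from the gold and predicted labels themselves:

def _derive_classes(y, pred):
    # Union of every label seen in either the gold or the predicted labels.
    labels = set()
    for item in (*y, *pred):
        if isinstance(item, (list, tuple, set)):
            # Multilabel case: each item is a collection of labels.
            labels.update(item)
        else:
            # Single-label case: each item is a single label.
            labels.add(item)
    return sorted(labels, key=str)

MultiLabelBinarizer already does this implicitly when no classes argument is passed to it.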
This PR adds an evaluate function that can be used after fitting a classifier. It returns an sklearn classification report. It works for both multi- and single-label inputs, and for both str and int type labels. This makes it a bit easier to evaluate a model by abstracting away the logic for binarizing labels in the multilabel case, and in general provides a nice natural flow, e.g. print(classifier.evaluate(X, y)) as shown in the snippets above.

The tests are also updated to include int type labels. While our typing doesn't officially support list[int], many of the datasets we benchmarked on have int type labels, and not including that in the tests is dangerous (I accidentally broke the int label logic in my previous PR but our tests didn't catch that; now they will). In a followup we can think about potentially updating our typing.
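A sketch of the flow this enables, assuming a dataset split into train and test (the dataset name ds and the column names are placeholders):

# Hypothetical end-to-end usage; ds, "text" and "label" are placeholder names.
classifier.fit(ds["train"]["text"], ds["train"]["label"])
print(classifier.evaluate(ds["test"]["text"], ds["test"]["label"]))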