Classification Evaluation #17
Conversation
Reviewed 9 of 12 files at r1.
Reviewable status: 9 of 12 files reviewed, 16 unresolved discussions (waiting on @fcogidi and @Negiiiin)
mmlearn/datasets/core/dataset_info.py
line 1 at r1 (raw file):
"""Saving the necessary imformstion about a dataset for evaluation tasks."""
typo: information.
mmlearn/datasets/core/dataset_info.py
line 55 at r1 (raw file):
return self._class_count

def set_class_count(self, count: int) -> None:
Do we really need this setter? I imagine that for a dataset info object we need to set this once at initialization, and it won't change after that. Are there scenarios in which this is still needed?
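If the setter goes away, a minimal sketch of the init-only alternative (the DatasetInfo class name is assumed from the file path, not copied from the PR):

class DatasetInfo:
    def __init__(self, class_count: int) -> None:
        # Set once at construction; no public setter afterwards.
        self._class_count = class_count

    @property
    def class_count(self) -> int:
        """Number of classes in the dataset."""
        return self._class_count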
mmlearn/datasets/core/dataset_info.py
line 80 at r1 (raw file):
return self._label_mapping

def set_label_mapping(self, mapping: Dict[str, Any]) -> None:
Same comment.
mmlearn/datasets/core/dataset_info.py
line 107 at r1 (raw file):
return self._label_embedding

def set_label_embedding(self, embedding: Dict[str, Any]) -> None:
Same comment.
mmlearn/tasks/classification.py
line 52 at r1 (raw file):
Previously, fcogidi (Franklin) wrote…
I think linear probing should be a separate task. It is not really a zero-shot task (the linear classifier needs to be trained), so it shouldn't inherit from EvaluationHooks.
I agree. Zero-shot classification is retrieval, whereas linear probing and regular fine-tuning both add a classification head on top of the encoder output. In linear probing the encoder weights are frozen, while in fine-tuning all weights are updated. We should have one class for zero-shot and another for linear probing and fine-tuning.
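Roughly what I have in mind, as a sketch (ClassifierTuning, embed_dim, and freeze_encoder are names I made up, not the repo's API); zero-shot would stay an EvaluationHooks subclass, while this would be a trainable task:

import torch.nn as nn

class ClassifierTuning(nn.Module):
    """One task covering both linear probing and full fine-tuning."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int, freeze_encoder: bool = True) -> None:
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(embed_dim, num_classes)
        if freeze_encoder:  # linear probing: only the head is trained
            for param in self.encoder.parameters():
                param.requires_grad = False

    def forward(self, x):
        return self.head(self.encoder(x))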
mmlearn/tasks/classification.py
line 70 at r1 (raw file):
if self.mode == "zeroshot":
    class LabelDescriptionDataset(Dataset):
I'm not sure that defining classes inside conditional statements is good software engineering practice. Keep in mind that on_evaluation_epoch_start is called repeatedly, so this if statement runs multiple times; it doesn't make sense for the class definition to be embedded here.
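For illustration, the class could live at module scope and the hook would only instantiate it (a sketch; the constructor arguments are assumed from the quoted code):

from torch.utils.data import Dataset

class LabelDescriptionDataset(Dataset):
    """Defined once at module level, not inside the hook."""

    def __init__(self, descriptions, tokenizer):
        self.descriptions = descriptions
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.descriptions)

    def __getitem__(self, idx):
        return self.tokenizer(self.descriptions[idx])

# on_evaluation_epoch_start then only instantiates it:
# if self.mode == "zeroshot":
#     dataset = LabelDescriptionDataset(descriptions, self.tokenizer)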
mmlearn/tasks/classification.py
line 84 at r1 (raw file):
tokens = self.tokenizer(description)
example = Example(
Does this mean that this class only works for image-text data? Can we make the implementation more general so it works for any two modalities?
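One way to generalize, sketched with made-up parameter names (query_modality / target_modality are not in this PR):

class ZeroShotClassification:
    def __init__(self, query_modality: str, target_modality: str, tokenizer) -> None:
        # query_modality="rgb", target_modality="text" reproduces the
        # image-text case; audio-text etc. would work the same way.
        self.query_modality = query_modality
        self.target_modality = target_modality
        self.tokenizer = tokenizer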
mmlearn/tasks/classification.py
line 86 at r1 (raw file):
example = Example(
    {
        Modalities.RGB: torch.rand(3, 224, 224),
Hard-coding values like this is a big NO! Please use configs for such values.
Code quote:
3, 224, 224
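For example, the dummy-input shape could come from a config object instead (a sketch; DummyInputConfig and image_size are hypothetical names):

from dataclasses import dataclass

import torch

@dataclass
class DummyInputConfig:
    # Overridable from the experiment config rather than fixed at the call site.
    image_size: tuple = (3, 224, 224)

def make_dummy_image(cfg: DummyInputConfig) -> torch.Tensor:
    return torch.rand(*cfg.image_size)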
mmlearn/tasks/classification.py
line 110 at r1 (raw file):
for name, dataset_info in all_dataset_info.items():
    descriptions = [
        "This image has a sign of " + label
Hardcoding a prompt like this seems like a very specific type of zero-shot classification. Can we implement a more general class, and perhaps introduce a notion of "prompt" which could take a value like this string?
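Something like a prompt template parameter would keep the class general (prompt_template and build_descriptions are assumptions, not existing code):

class ZeroShotClassification:
    def __init__(self, prompt_template: str = "This image has a sign of {label}") -> None:
        self.prompt_template = prompt_template

    def build_descriptions(self, labels):
        # The prompt comes from config instead of being hard-coded here.
        return [self.prompt_template.format(label=label) for label in labels]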
projects/med_benchmarking/configs/experiment/linear_probing.yaml (discussion resolved)
Reviewed 1 of 18 files at r5, 4 of 19 files at r6, 5 of 15 files at r7, 12 of 37 files at r8.
Reviewable status: 15 of 44 files reviewed, 8 unresolved discussions (waiting on @afkanpour and @Negiiiin)
PR Type
[Feature]
Short Description
Added zero-shot and linear-probing classification evaluation (accuracy, F1-score, and AUC).