[BUG] Issue with the scorer Attribute Initialization in ROUGE #470

Open
ryan-minato opened this issue Dec 20, 2024 · 0 comments
Labels
bug Something isn't working

@ryan-minato
Contributor

Describe the bug

The `ROUGE` class never initializes the `scorer` attribute in its constructor, yet `compute` reads `self.scorer` (in `if self.scorer is None:`) before ever assigning it. Every call to `compute` therefore raises an `AttributeError`, regardless of the arguments passed.
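The failure does not depend on ROUGE itself: it is ordinary Python behavior when an instance attribute is read before it was ever assigned. A minimal sketch with a hypothetical `Metric` class (not lighteval code) reproduces the same pattern:

```python
class Metric:
    """Hypothetical stand-in that mirrors the bug pattern in ROUGE."""

    def __init__(self, methods: list[str]):
        self.methods = methods
        # Bug: self.scorer is never assigned here.

    def compute(self) -> str:
        # Reading self.scorer raises AttributeError because the attribute
        # does not exist yet, so the lazy-initialization branch never runs.
        if self.scorer is None:
            self.scorer = "scorer for " + ",".join(self.methods)
        return self.scorer


def reproduce() -> str:
    """Call compute() once and return the error message, if any."""
    try:
        Metric(["rouge1"]).compute()
    except AttributeError as err:
        return str(err)
    return "no error"
```

Because the `AttributeError` is raised by the `is None` check itself, the branch meant to create the scorer can never be reached.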

```python
def __init__(
    self,
    methods: str | list[str],
    multiple_golds: bool = False,
    bootstrap: bool = False,
    normalize_gold: callable = None,
    normalize_pred: callable = None,
    aggregation_function: callable = None,
    tokenizer: object = None,
):
    """A ROUGE wrapper method. Relies on `rouge_scorer`.
    Args:
        methods (str | list[str]): What type of ROUGE scoring to use. Can be one or any of `rouge1`, `rouge2`, `rougeL` or `rougeLsum`.
        multiple_golds (bool, optional): Whether to compute ROUGE by allowing the comparison to several golds
            at once, or to compute ROUGE on individual gold/prediction pairs and aggregate afterwards. Defaults to False.
        bootstrap (bool, optional): Whether to use bootstrapping. Defaults to False.
        aggregation_function (callable, optional): How to aggregate the item results. Defaults to max.
            Used if there are several golds or predictions on which scores were computed.
        normalize_gold (callable, optional): Function to use to normalize the reference strings.
            Defaults to None if no normalization is applied.
        normalize_pred (callable, optional): Function to use to normalize the predicted strings.
            Defaults to None if no normalization is applied.
        tokenizer (object, optional): An object with `tokenize` method to be used by rouge scorer. If None, rouge-scorer's
            default tokenizer will be used.
    """
    if aggregation_function and bootstrap:
        logger.warning("Can't use both bootstrapping and an aggregation function in Rouge. Keeping bootstrap.")
    self.aggregation_function = aggregation_function
    if self.aggregation_function is None:
        self.aggregation_function = np.mean
    self.methods = as_list(methods)
    if any(method not in self.ALLOWED_ROUGE_METHODS for method in self.methods):
        raise ValueError(
            f"Rouge was initialised with method {methods}, which is not in {','.join(self.ALLOWED_ROUGE_METHODS)}"
        )
    self.multiple_golds = multiple_golds
    self.bootstrap = bootstrap
    self.normalize_gold = normalize_gold
    self.normalize_pred = normalize_pred
    self.tokenizer = tokenizer
    # Note: self.scorer is never assigned anywhere in __init__.


def compute(self, golds: list[str], predictions: list[str], **kwargs) -> float | dict:
    """Computes the metric(s) over a list of golds and predictions for one single sample.
    Args:
        golds (list[str]): Reference targets
        predictions (list[str]): Predicted strings
    Returns:
        float or dict: Aggregated score over the current sample's items.
            If several rouge functions have been selected, returns a dict which maps name and scores.
    """
    from rouge_score import rouge_scorer

    # Fails here: self.scorer was never created, so this check raises AttributeError.
    if self.scorer is None:
        self.scorer = rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)
    # Normalize
```

Proposed Solution

Initialize `self.scorer = None` in `__init__`, so that `compute` can lazily create the `RougeScorer` on its first call and reuse it afterwards.
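A minimal sketch of the proposed fix, using a hypothetical, trimmed-down `ROUGESketch` class rather than the actual lighteval `ROUGE` implementation. The `scorer_factory` parameter and the `get_scorer` method are assumptions introduced only so the lazy initialization can be exercised without `rouge-score` installed; the default factory mirrors the `rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)` call from `compute`.

```python
from typing import Any, Callable


def default_scorer_factory(methods: list[str], tokenizer: Any = None) -> Any:
    # Imported lazily, as in the original compute(); requires `rouge-score`.
    from rouge_score import rouge_scorer

    return rouge_scorer.RougeScorer(methods, tokenizer=tokenizer)


class ROUGESketch:
    """Hypothetical stand-in for ROUGE, reduced to the scorer handling."""

    def __init__(
        self,
        methods: list[str],
        tokenizer: Any = None,
        # Hypothetical injection point, used here only to make the sketch testable.
        scorer_factory: Callable[[list[str], Any], Any] = default_scorer_factory,
    ):
        self.methods = methods
        self.tokenizer = tokenizer
        self.scorer_factory = scorer_factory
        # The proposed fix: declare the attribute up front.
        self.scorer: Any = None

    def get_scorer(self) -> Any:
        # Build the scorer on first use only, then reuse the same instance.
        if self.scorer is None:
            self.scorer = self.scorer_factory(self.methods, self.tokenizer)
        return self.scorer
```

Declaring `self.scorer = None` up front keeps the existing design intact: the `rouge_score` import and `RougeScorer` construction still happen only on the first `compute` call, and the scorer is cached on the instance afterwards.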

To Reproduce

Run any evaluation that uses ROUGE, for example:

```shell
lighteval accelerate \
    "pretrained=gpt2" \
    "helm|summarization:xsum-sampled|0|0"
```

Expected behavior

Evaluations that use ROUGE should run to completion without raising an error.

Version info

Encountered with lighteval installed directly from the `main` branch.
