[BUG] Issue with the scorer Attribute Initialization in ROUGE #470

ryan-minato · 2024-12-20T10:08:06Z

Describe the bug

The ROUGE class never assigns the scorer attribute in its constructor, yet the compute method reads self.scorer before creating it. As a result, the first call to compute raises an AttributeError, so every ROUGE-based metric fails.

lighteval/src/lighteval/metrics/metrics_sample.py

Lines 442 to 501 in a1c610d

    
    def __init__(
        self,
        methods: str | list[str],
        multiple_golds: bool = False,
        bootstrap: bool = False,
        normalize_gold: callable = None,
        normalize_pred: callable = None,
        aggregation_function: callable = None,
        tokenizer: object = None,
    ):
        """A ROUGE wrapper method. Relies on `rouge_scorer`.

        Args:
            methods (str | list[str]): What type of ROUGE scoring to use. Can be one or any of `rouge1`, `rouge2`, `rougeL` or `rougeLsum`.
            multiple_golds (bool, optional): Whether to compute ROUGE by allowing the comparison to several golds
                at once, or to compute ROUGE on individual gold/prediction pairs and aggregate afterwards. Defaults to False.
            bootstrap (bool, optional): Whether to use bootstrapping. Defaults to False.
            aggregation_function (callable, optional): How to aggregate the item results. Defaults to max.
                Used if there are several golds or predictions on which scores were computed.
            normalize_gold (callable, optional): Function to use to normalize the reference strings.
                Defaults to None if no normalization is applied.
            normalize_pred (callable, optional): Function to use to normalize the predicted strings.
                Defaults to None if no normalization is applied.
            tokenizer (object, optional): An object with `tokenize` method to be used by rouge scorer. If None, rouge-scorer's
                default tokenizer will be used.
        """
        if aggregation_function and bootstrap:
            logger.warning("Can't use both bootstrapping and an aggregation function in Rouge. Keeping bootstrap.")
        self.aggregation_function = aggregation_function
        if self.aggregation_function is None:
            self.aggregation_function = np.mean

        self.methods = as_list(methods)
        if any(method not in self.ALLOWED_ROUGE_METHODS for method in self.methods):
            raise ValueError(
                f"Rouge was initialised with method {methods}, which is not in {','.join(self.ALLOWED_ROUGE_METHODS)}"
            )
        self.multiple_golds = multiple_golds
        self.bootstrap = bootstrap
        self.normalize_gold = normalize_gold
        self.normalize_pred = normalize_pred
        self.tokenizer = tokenizer

    def compute(self, golds: list[str], predictions: list[str], **kwargs) -> float | dict:
        """Computes the metric(s) over a list of golds and predictions for one single sample.

        Args:
            golds (list[str]): Reference targets
            predictions (list[str]): Predicted strings

        Returns:
            float or dict: Aggregated score over the current sample's items.
                If several rouge functions have been selected, returns a dict which maps name and scores.
        """
        from rouge_score import rouge_scorer

        if self.scorer is None:
            self.scorer = rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)
        # Normalize

Proposed Solution

Initialize self.scorer to None in the __init__ method, so that the existing `if self.scorer is None` check in compute can build the scorer lazily on the first call.
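The failure mode and the proposed fix can be illustrated in isolation. The following is only a minimal sketch, not the lighteval code: `BrokenMetric`, `FixedMetric`, `_build_scorer`, and `raises_attribute_error` are hypothetical names introduced here, and `_build_scorer` is a stand-in for the real `rouge_scorer.RougeScorer(...)` call so the example runs without extra dependencies.

```python
def _build_scorer(methods):
    # Hypothetical stand-in for rouge_scorer.RougeScorer(methods, tokenizer=...).
    return ("scorer", tuple(methods))


class BrokenMetric:
    """Mirrors the reported pattern: `scorer` is never assigned in __init__."""

    def __init__(self, methods):
        self.methods = methods

    def compute(self):
        # Reading self.scorer fails here because the attribute does not exist yet.
        if self.scorer is None:
            self.scorer = _build_scorer(self.methods)
        return self.scorer


class FixedMetric:
    """Same logic, with the attribute declared up front."""

    def __init__(self, methods):
        self.methods = methods
        self.scorer = None  # built lazily on the first compute() call

    def compute(self):
        if self.scorer is None:
            self.scorer = _build_scorer(self.methods)
        return self.scorer


def raises_attribute_error(metric) -> bool:
    """Return True if calling compute() raises AttributeError."""
    try:
        metric.compute()
    except AttributeError:
        return True
    return False
```

With this sketch, `raises_attribute_error(BrokenMetric(["rouge1"]))` is true while the `FixedMetric` variant computes normally and reuses the same scorer object across calls, which is the behavior the lazy check in compute appears to be aiming for.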

To Reproduce

Run any test involving ROUGE, such as the following:

lighteval accelerate \
    "pretrained=gpt2" \
    "helm|summarization:xsum-sampled|0|0"

Expected behavior

The run should complete without raising an error.

Version info

The issue was encountered with the version installed directly from the main branch.


ryan-minato added the bug label Dec 20, 2024

ryan-minato mentioned this issue Dec 20, 2024

fix: scorer attribute initialization in ROUGE #471

Open
