Add swiss legal evals as new community tasks #389

JoelNiklaus · 2024-11-11T11:03:56Z

Adds new community tasks with Swiss legal evaluations. Currently, only translation tasks are supported, but others may follow in the future.

clefourrier · 2024-11-12T09:16:18Z

@hynky1999 tagging you if you've got a couple minutes to check the templating when back from the offsite

community_tasks/swiss_legal_evals.py

hynky1999 · 2024-11-12T21:05:06Z

Re templates:
We don't have any template for translation tasks atm.
There are many variants to choose from (see the image below), but I would prefer the [src]: [input] [tgt]: format (variant A). Since translation is an inherently cross-lingual task and it's not clear which language the prompt should use (target or source?), this template lets us stay independent of the language (the language labels are kinda standardized, but yeah, they will be in Latin script).

@JoelNiklaus
Have you experimented with different prompt formats?

Source: https://arxiv.org/pdf/2301.07069

I can quickly make a PR for the translation template and we can convert it to that.
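A minimal sketch of what the variant A format could look like in code. The function name and the language labels ("DE", "FR") are hypothetical and chosen only for illustration; the actual template added to lighteval may differ in labels, separators, and whitespace.

```python
def format_translation_prompt(src_label: str, tgt_label: str, text: str) -> str:
    """Build a variant-A style prompt: '<src>: <input> <tgt>:'.

    Hypothetical helper, not lighteval's API. The labels are supplied by the
    caller, so the prompt itself contains no instruction text in any language.
    """
    return f"{src_label}: {text.strip()} {tgt_label}:"


prompt = format_translation_prompt(
    "DE", "FR", "Das Bundesgericht weist die Beschwerde ab."
)
print(prompt)
# DE: Das Bundesgericht weist die Beschwerde ab. FR:
```

The model is expected to continue after the trailing target label, so the only language-dependent parts are the two labels and the input itself.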

JoelNiklaus · 2024-11-13T08:48:32Z

I haven't experimented with prompts yet. Yes, going with variant A sounds good.

Thanks so much!

JoelNiklaus · 2024-11-13T11:08:33Z

Btw, what is the reason you are not using the metrics from evaluate?

clefourrier · 2024-11-13T11:38:25Z

Evaluate is no longer actively maintained (it's indicated in the GitHub readme). We also wanted lighteval to be light, and not rely on a heap of dependencies.

JoelNiklaus · 2024-11-13T13:56:40Z

I see. I used the direct implementation for COMET and METEOR, rather than evaluate.

community_tasks/swiss_legal_evals.py

NathanHB · 2024-11-19T13:13:53Z

PR looks great! Do the results on your evals look sound?
Also, you can use the pre-commit hooks to format the files and fix the CI :)

pip install pre-commit
pre-commit install
pre-commit run --all-files

JoelNiklaus · 2024-11-20T10:40:42Z

Great, thanks!
Just ran the pre-commit hooks.

Couldn't run the evals yet because of the judge prompt. Hope to do that soon.

* implement translation prompt
* add small comment about translation prompt
* change formatting to reformat language dependent parts

Co-authored-by: Clémentine Fourrier

Add swiss legal evals as new community tasks

e2a27a7

clefourrier requested a review from hynky1999 November 12, 2024 09:15

clefourrier reviewed Nov 12, 2024

JoelNiklaus added 2 commits November 12, 2024 10:34

Removed nltk and numpy dependencies.

aa409c8

Added short dataset descriptions.

a8ee2a5

Merge branch 'main' into add_swiss_legal_evals

8f68844

Removed open judge models and added COMET and METEOR.

c7f7038

hynky1999 mentioned this pull request Nov 13, 2024

Adds template for translation tasks #391

Merged

Merge branch 'main' into add_swiss_legal_evals

0ca5af6

NathanHB reviewed Nov 19, 2024

Merge branch 'main' into add_swiss_legal_evals

1d51a01

Ran pre-commit hooks.

5d41ce0

JoelNiklaus added 9 commits November 20, 2024 11:52

Changed prompt template.

8194125

Added legal translation specific judge prompt.

c58ae44

Improved judge prompt.

ff3705f

Changed metric selection.

091ec11

Made generation_size dependent on the config.

5a47956

Fixed error in config.

6bf7fa2

Fixed error in config.

6cf1c2a

Added support for multiple devices.

b548801

Fixed some bugs for evaluation on GPUs.

ee2a83c

JoelNiklaus and others added 30 commits December 9, 2024 06:42

Merge branch 'main' into add_swiss_legal_evals

44ad734

Made metric selection easier.

7b77972

Fixed dict issue.

fcd9505

Added metric dependencies.

5a8ca46

Moving metrics to extended tasks.

bab94af

Merge branch 'main' into add_swiss_legal_evals

3746849

Merge branch 'main' into add_swiss_legal_evals

ddaadbf

Added support for judges from different providers.

09be56d

Added additional system and user prompts and few shot examples.

0aa8607

Removed debug relics.

c49e1e2

Fixed issue in judge prompt.

4418e82

Adapted getting predictions to new way for all metrics.

075ebd2

Added gemba mqm metric by default.

8ee2dbc

Fixed error in gemba score when errors are no dicts.

4408d0d

Added different judge configurations for gpt 4o.

be6d9ab

Fixed typo.

c7ca83f

Disabled short metrics for evaluation of longer sequences.

930cbc5

Added xcomet metrics to sentence level metrics.

61058b1

Fixed error in bleurt and enabled lazy loading of metrics to save on GPU memory.

e043ee8

Refactored judge metric creation.

1c38c0a

Improved detailed judge prompt and changed secondary judge models from claude to gemini.

e05ac6a

Changed judge order.

0aed063

Merge branch 'main' into add_swiss_legal_evals

d9078a7

Fixed stop sequence issue in press releases.

46eb62a

Fixed error in xcomet scores.

a78bc03

Made metric groups more easily configurable.

f6b50b4

Made comet score more robust.

7f36065

Moved unpack to the pipeline code.

cb6bfb4

Merge branch 'huggingface:main' into add_swiss_legal_evals

306ee76

Fixed bug in comet score.

866e770
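
One of the commits above mentions enabling lazy loading of metrics to save GPU memory. A minimal sketch of that idea follows, assuming a hypothetical LazyMetric wrapper and a toy exact-match scorer standing in for a heavy model-based metric; this is not the code from the PR.

```python
from typing import Callable, List, Optional

Scorer = Callable[[str, str], float]


class LazyMetric:
    """Hypothetical wrapper that defers loading a scorer until first use."""

    def __init__(self, name: str, loader: Callable[[], Scorer]) -> None:
        self.name = name
        self._loader = loader
        self._scorer: Optional[Scorer] = None  # nothing is loaded at construction

    def compute(self, prediction: str, reference: str) -> float:
        if self._scorer is None:
            # The expensive load happens once, on the first call only.
            self._scorer = self._loader()
        return self._scorer(prediction, reference)

    def unload(self) -> None:
        # Drop the reference so the scorer can be garbage collected.
        self._scorer = None


load_calls: List[str] = []  # records how often the loader actually runs


def load_exact_match() -> Scorer:
    # Stand-in for loading a large model checkpoint.
    load_calls.append("exact_match")
    return lambda pred, ref: float(pred.strip() == ref.strip())


metric = LazyMetric("exact_match", load_exact_match)
score = metric.compute("Guten Tag", "Guten Tag")
print(score, load_calls)
```

With this pattern, metrics that are configured but never called cost no memory, and unload() lets a pipeline release one metric before loading the next. A real model-based metric would additionally need to free device memory after dropping the reference.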
