
Add swiss legal evals as new community tasks #389

Open: wants to merge 51 commits into main

Commits
e2a27a7
Add swiss legal evals as new community tasks
JoelNiklaus Nov 11, 2024
aa409c8
Removed nltk and numpy dependencies.
JoelNiklaus Nov 12, 2024
a8ee2a5
Added short dataset descriptions.
JoelNiklaus Nov 12, 2024
8f68844
Merge branch 'main' into add_swiss_legal_evals
clefourrier Nov 13, 2024
c7f7038
Removed open judge models and added COMET and METEOR.
JoelNiklaus Nov 13, 2024
0ca5af6
Merge branch 'main' into add_swiss_legal_evals
clefourrier Nov 19, 2024
1d51a01
Merge branch 'main' into add_swiss_legal_evals
NathanHB Nov 19, 2024
5d41ce0
Ran pre-commit hooks.
JoelNiklaus Nov 20, 2024
8194125
Changed prompt template.
JoelNiklaus Nov 20, 2024
c58ae44
Added legal translation specific judge prompt.
JoelNiklaus Nov 21, 2024
ff3705f
Improved judge prompt.
JoelNiklaus Nov 21, 2024
091ec11
Changed metric selection.
JoelNiklaus Nov 21, 2024
5a47956
Made generation_size dependent on the config.
JoelNiklaus Nov 22, 2024
6bf7fa2
Fixed error in config.
JoelNiklaus Nov 22, 2024
6cf1c2a
Fixed error in config.
JoelNiklaus Nov 22, 2024
b548801
Added support for multiple devices.
JoelNiklaus Nov 22, 2024
ee2a83c
Fixed some bugs for evaluation on GPUs.
JoelNiklaus Nov 25, 2024
36b7e94
Added batch inference for heavy metrics and multiplied each score by …
JoelNiklaus Nov 26, 2024
5ba218f
Added few shot examples and did some refactoring.
JoelNiklaus Nov 26, 2024
8490841
Merge branch 'main' into add_swiss_legal_evals
JoelNiklaus Nov 26, 2024
576b847
Switched to an own judge class.
JoelNiklaus Nov 26, 2024
41bb59a
Fixed issue with judge metric not showing up in results.
JoelNiklaus Nov 26, 2024
d82cd91
Fixed issue with evaluation on GPUs.
JoelNiklaus Nov 27, 2024
1b13d9f
Speed up metric computation on GPUs.
JoelNiklaus Nov 27, 2024
df0f3f0
Added more logging.
JoelNiklaus Nov 27, 2024
980c257
Switched to sample level scores for faster evaluation.
JoelNiklaus Nov 28, 2024
9a60dc0
Added rescale_with_baseline for BERTScore for better differentiation.
JoelNiklaus Nov 29, 2024
8c7814f
Merge branch 'main' into add_swiss_legal_evals
JoelNiklaus Dec 2, 2024
819b949
Adapted metrics.
JoelNiklaus Dec 2, 2024
e758316
Switched to sacrebleu implementation for sentence level translation m…
JoelNiklaus Dec 2, 2024
d08163f
Added more stop sequences.
JoelNiklaus Dec 4, 2024
86c67bc
Made stop_sequence level specific.
JoelNiklaus Dec 5, 2024
f109945
Added gemba metric.
JoelNiklaus Dec 6, 2024
f357176
Updated logging.
JoelNiklaus Dec 9, 2024
2d4c0ed
Updated stop_sequence.
JoelNiklaus Dec 9, 2024
44ad734
Merge branch 'main' into add_swiss_legal_evals
JoelNiklaus Dec 9, 2024
7b77972
Made metric selection easier.
JoelNiklaus Dec 10, 2024
fcd9505
Fixed dict issue.
JoelNiklaus Dec 10, 2024
5a8ca46
Added metric dependencies.
JoelNiklaus Dec 11, 2024
bab94af
Moving metrics to extended tasks.
JoelNiklaus Dec 11, 2024
3746849
Merge branch 'main' into add_swiss_legal_evals
JoelNiklaus Dec 12, 2024
ddaadbf
Merge branch 'main' into add_swiss_legal_evals
JoelNiklaus Dec 17, 2024
09be56d
Added support for judges from different providers.
JoelNiklaus Dec 22, 2024
0aa8607
Added additional system and user prompts and few shot examples.
JoelNiklaus Dec 22, 2024
c49e1e2
Removed debug relics.
JoelNiklaus Dec 23, 2024
4418e82
Fixed issue in judge prompt.
JoelNiklaus Dec 23, 2024
075ebd2
Adapted getting predictions to new way for all metrics.
JoelNiklaus Dec 23, 2024
8ee2dbc
Added gemba mqm metric by default.
JoelNiklaus Dec 23, 2024
4408d0d
Fixed error in gemba score when errors are no dicts.
JoelNiklaus Dec 25, 2024
be6d9ab
Added different judge configurations for gpt 4o.
JoelNiklaus Dec 25, 2024
c7ca83f
Fixed typo.
JoelNiklaus Dec 25, 2024