Add swiss legal evals as new community tasks #389

JoelNiklaus · 2024-11-11T11:03:56Z

Adds new community tasks with Swiss legal evaluations. Currently only translation tasks are supported, but others may follow in the future.

clefourrier · 2024-11-12T09:16:18Z

@hynky1999 tagging you if you've got a couple minutes to check the templating when back from the offsite

community_tasks/swiss_legal_evals.py

hynky1999 · 2024-11-12T21:05:06Z

Re templates:
We don't have any template for translation tasks atm.
There are many variants to choose from (see the image below), but I would prefer the [src]: [input] [tgt]: format (variant A). Since translation is inherently a cross-lingual task and it's not clear which language the prompt should use (target or source?), such a template lets us stay independent of language (the language labels are kinda standardized, but yeah, they will be in Latin script).

@JoelNiklaus
Have you experimented with different prompt formats?

Source: https://arxiv.org/pdf/2301.07069

I can quickly make a PR for the translation template and we can convert it to that.
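For illustration, variant A could be sketched roughly like this. It is a minimal sketch only: `format_translation_prompt` and the language labels are hypothetical names chosen for the example, not the template that ended up in lighteval.

```python
def format_translation_prompt(src_label: str, tgt_label: str, text: str) -> str:
    """Build a variant-A style prompt: '[src]: [input] [tgt]:'.

    The labels are plain language identifiers, so the prompt itself
    contains no instruction written in either the source or target language.
    """
    return f"{src_label}: {text.strip()} {tgt_label}:"


# Hypothetical German -> French example; the model is expected to
# continue the text after the target label.
prompt = format_translation_prompt("DE", "FR", "Das Gericht weist die Beschwerde ab.")
print(prompt)
```

Because the only fixed parts are the two labels, the same function works for any language pair without translating an instruction sentence.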

JoelNiklaus · 2024-11-13T08:48:32Z

I haven't experimented with prompts yet. Yes, going with variant A sounds good.

Thanks so much!

JoelNiklaus · 2024-11-13T11:08:33Z

Btw, what is the reason you are not using the metrics from evaluate?

clefourrier · 2024-11-13T11:38:25Z

Evaluate is no longer actively maintained (it's indicated in the GitHub readme). We also wanted lighteval to be light, and not rely on a heap of dependencies.

JoelNiklaus · 2024-11-13T13:56:40Z

I see. I used the direct implementation for COMET and METEOR, rather than evaluate.

community_tasks/swiss_legal_evals.py

NathanHB · 2024-11-19T13:13:53Z

PR looks great! Do the results on your evals look sound?
Also, you can use the pre-commit hooks to format the files and fix the CI :)

pip install pre-commit
pre-commit install
pre-commit run --all-files

JoelNiklaus · 2024-11-20T10:40:42Z

Great, thanks!
Just ran the pre-commit hooks.

Couldn't run the evals yet because of the judge prompt. Hope to do that soon.

* implement translation prompt
* add small comment about translation prompt
* change formatting to reformat language dependent parts

Co-authored-by: Clémentine Fourrier <[email protected]>


Add swiss legal evals as new community tasks

e2a27a7

clefourrier requested a review from hynky1999 November 12, 2024 09:15

clefourrier reviewed Nov 12, 2024


JoelNiklaus added 2 commits November 12, 2024 10:34

Removed nltk and numpy dependencies.

aa409c8

Added short dataset descriptions.

a8ee2a5

Merge branch 'main' into add_swiss_legal_evals

8f68844

Removed open judge models and added COMET and METEOR.

c7f7038

hynky1999 mentioned this pull request Nov 13, 2024

Adds template for translation tasks #391

Merged

Merge branch 'main' into add_swiss_legal_evals

0ca5af6

NathanHB reviewed Nov 19, 2024


community_tasks/swiss_legal_evals.py (outdated, resolved)

Merge branch 'main' into add_swiss_legal_evals

1d51a01

Ran pre-commit hooks.

5d41ce0

JoelNiklaus added 9 commits November 20, 2024 11:52

Changed prompt template.

8194125

Added legal translation specific judge prompt.

c58ae44

Improved judge prompt.

ff3705f

Changed metric selection.

091ec11

Made generation_size dependent on the config.

5a47956

Fixed error in config.

6bf7fa2

Fixed error in config.

6cf1c2a

Added support for multiple devices.

b548801

Fixed some bugs for evaluation on GPUs.

ee2a83c

JoelNiklaus added 12 commits November 26, 2024 16:48

Fixed issue with judge metric not showing up in results.

41bb59a

Fixed issue with evaluation on GPUs.

d82cd91

Speed up metric computation on GPUs.

1b13d9f

Added more logging.

df0f3f0

Switched to sample level scores for faster evaluation.

980c257

Added rescale_with_baseline for BERTScore for better differentiation.

9a60dc0
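As a rough illustration of why baseline rescaling helps differentiation: raw BERTScore values tend to sit in a narrow band near the top of the range, and rescaling maps a baseline value `b` to 0 via `(score - b) / (1 - b)`. The sketch below shows only the arithmetic; the baseline of 0.83 and the raw scores are made-up numbers, not results from this PR.

```python
def rescale_with_baseline(score: float, baseline: float) -> float:
    """Linearly rescale a score so that `baseline` maps to 0 and 1.0 stays 1.0."""
    return (score - baseline) / (1.0 - baseline)


BASELINE = 0.83  # hypothetical baseline, for illustration only
raw_scores = [0.85, 0.90, 0.95]  # made-up raw scores
rescaled = [rescale_with_baseline(s, BASELINE) for s in raw_scores]

for raw, new in zip(raw_scores, rescaled):
    print(f"raw={raw:.2f} rescaled={new:.3f}")
```

The ranking of systems is unchanged; only the spread between them becomes easier to read.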

Merge branch 'main' into add_swiss_legal_evals

8c7814f

Adapted metrics.

819b949

Switched to sacrebleu implementation for sentence level translation metrics.

e758316

Added more stop sequences.

d08163f

Made stop_sequence level specific.

86c67bc

Added gemba metric.

f109945

JoelNiklaus mentioned this pull request Dec 6, 2024

[FT] Add Gemba MQM Translation Metric #397

Closed

JoelNiklaus added 17 commits December 9, 2024 15:39

Updated logging.

f357176

Updated stop_sequence.

2d4c0ed

Merge branch 'main' into add_swiss_legal_evals

44ad734

Made metric selection easier.

7b77972

Fixed dict issue.

fcd9505

Added metric dependencies.

5a8ca46

Moving metrics to extended tasks.

bab94af

Merge branch 'main' into add_swiss_legal_evals

3746849

Merge branch 'main' into add_swiss_legal_evals

ddaadbf

Added support for judges from different providers.

09be56d

Added additional system and user prompts and few shot examples.

0aa8607

Removed debug relics.

c49e1e2

Fixed issue in judge prompt.

4418e82

Adapted getting predictions to new way for all metrics.

075ebd2

Added gemba mqm metric by default.

8ee2dbc

Fixed error in gemba score when errors are no dicts.

4408d0d
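The commit itself is not shown here, so the following is only a guess at the kind of guard involved: an LLM judge may return error entries that are strings or other non-dict values, and an MQM-style scorer has to skip them instead of crashing. The function name, the `severity` key, the weights (critical 25, major 5, minor 1) and the cap of 25 are all assumptions made for this sketch.

```python
from typing import Any

# Assumed MQM-style severity weights and penalty cap (illustrative only).
SEVERITY_WEIGHTS = {"critical": 25, "major": 5, "minor": 1}
MAX_PENALTY = 25


def mqm_score(errors: Any) -> int:
    """Return a negative penalty score, ignoring malformed error entries."""
    if not isinstance(errors, list):
        # Unparseable judge output: this sketch treats it as "no errors found".
        return 0
    penalty = 0
    for err in errors:
        if not isinstance(err, dict):
            continue  # skip entries that are not dicts, e.g. plain strings
        severity = str(err.get("severity", "")).lower()
        penalty += SEVERITY_WEIGHTS.get(severity, 0)
    return -min(penalty, MAX_PENALTY)


print(mqm_score([{"severity": "major"}, "garbage", {"severity": "minor"}]))
```

Whether unparseable output should count as zero errors or be flagged as a failed judgment is a design choice; returning 0 keeps the sketch short but hides judge failures.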

Added different judge configurations for gpt 4o.

be6d9ab
