Make evaluator invariant of input request type order #215

sadra-barikbin · 2024-07-05T20:00:36Z

Hi there!

This PR makes evaluator.py::evaluate(...,request_dicts,...) invariant to the order of request types in request_dicts.

task.process_results() expects the responses in a specific order, so evaluate() must prepare them in that same order. Currently, however, the order depends on the order of the keys in its request_dicts argument. This PR proposes using RequestType as the ordering reference that both task.process_results() and evaluate() follow.
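As a rough illustration of the idea (not the actual lighteval code), the sketch below iterates over an enum's definition order instead of the dict's insertion order. The `RequestType` class here is a hypothetical stand-in with only two members, and `flatten_in_type_order` is an illustrative helper name, not a function from the repository.

```python
from enum import Enum, auto


class RequestType(Enum):
    # Hypothetical stand-in for the real enum; only the relative order of
    # these two members is taken from the discussion in this PR.
    LOGLIKELIHOOD = auto()
    GREEDY_UNTIL = auto()


def flatten_in_type_order(request_dicts):
    """Flatten requests following the enum definition order,
    regardless of the insertion order of the dict keys."""
    ordered = []
    for request_type in RequestType:  # Enum iteration follows definition order
        ordered.extend(request_dicts.get(request_type, []))
    return ordered


# The same requests, supplied with the request types in two different orders.
greedy_first = {
    RequestType.GREEDY_UNTIL: ["greedy_0"],
    RequestType.LOGLIKELIHOOD: ["loglik_0", "loglik_1"],
}
loglik_first = {
    RequestType.LOGLIKELIHOOD: ["loglik_0", "loglik_1"],
    RequestType.GREEDY_UNTIL: ["greedy_0"],
}

result_a = flatten_in_type_order(greedy_first)
result_b = flatten_in_type_order(loglik_first)
```

Both calls yield the same sequence, so a consumer that assumes the enum order sees consistent input whichever way the caller built the dict.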

Fixes #193

sadra-barikbin · 2024-07-05T20:13:50Z

src/lighteval/evaluator.py

-        # ===== Unpack the request =====
-        prediction_list.sort(
-            key=lambda x: x.request_index
-        )  # When we use Loglikelihood for several tokens we have all the options here


This seems to be unnecessary, as the lm responses are reordered back to their original order in lm's methods. I removed it in this PR because, for example, when the responses for a document are [LoglikelihoodReturn(index=0), LoglikelihoodReturn(index=1), GreedyUntilReturn(index=0)] (which occurs now because RequestType.LOGLIKELIHOOD comes before RequestType.GREEDY_UNTIL in RequestType), this statement wrongly changes them to [LoglikelihoodReturn(index=0), GreedyUntilReturn(index=0), LoglikelihoodReturn(index=1)]. If you think we should keep it, we could add the request type to the sort key.
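To make the reordering concrete, here is a minimal sketch with a hypothetical `Response` dataclass standing in for the real return types. It shows how sorting on the index alone interleaves the two response types, and how adding the request type to the sort key (the alternative mentioned above) would keep them grouped. `TYPE_ORDER` is an assumed mapping used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    # Hypothetical stand-in for LoglikelihoodReturn / GreedyUntilReturn.
    kind: str
    request_index: int


responses = [
    Response("loglikelihood", 0),
    Response("loglikelihood", 1),
    Response("greedy_until", 0),
]

# Sorting on the index alone: Python's sort is stable, so the two index-0
# items keep their relative order and the index-1 item moves to the end.
by_index = sorted(responses, key=lambda r: r.request_index)

# Assumed ordering of request types, used as the first component of the key.
TYPE_ORDER = {"loglikelihood": 0, "greedy_until": 1}
by_type_then_index = sorted(
    responses, key=lambda r: (TYPE_ORDER[r.kind], r.request_index)
)
```

With the index-only key the greedy response ends up between the two loglikelihood responses; with the composite key the original grouping is preserved.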

NathanHB · 2024-07-09

The list sorted here will only contain requests of the same type. It is used for loglikelihood requests. For example, one loglikelihood request with 4 choices produces 4 different requests. We sort them here so that they follow the same ordering as the task doc.

sadra-barikbin · 2024-07-10

But looking at the for loop above, we find that all responses of all types associated with a single task and a single example go into this list.

NathanHB · 2024-07-10

Oh yeah, I forgot that we could have both greedy and multichoice returns for one task.
This could be an issue when computing results. Did you check that removing the sorting indeed fixes this issue?

sadra-barikbin · 2024-07-10

Yes, I did. It can be verified with the example I put in #193.

src/lighteval/metrics/__init__.py

NathanHB

LGTM! Thanks for the fix :)

Do the changes

23e7ddd

sadra-barikbin commented Jul 5, 2024

NathanHB reviewed Jul 9, 2024

src/lighteval/metrics/__init__.py

Merge branch 'main' into Fix-request-type-order-in-evaluator

aadd8f7

NathanHB approved these changes Jul 16, 2024

NathanHB and others added 2 commits July 16, 2024 16:38

Merge branch 'main' into Fix-request-type-order-in-evaluator

9f74643

Merge branch 'main' into Fix-request-type-order-in-evaluator

7e48cb0

clefourrier merged commit 951cd5b into huggingface:main Jul 17, 2024
2 checks passed

NathanHB Jul 9, 2024

sadra-barikbin Jul 10, 2024

NathanHB Jul 10, 2024

sadra-barikbin Jul 10, 2024

NathanHB left a comment
