
[BUG] DatasetNotFoundError: Dataset 'asas-ai/AraTrust-categorized' doesn't exist on the Hub or cannot be accessed. #441

Closed
BobTsang1995 opened this issue Dec 12, 2024 · 18 comments
Labels: bug

@BobTsang1995 commented Dec 12, 2024

It seems that the asas-ai/AraTrust-categorized dataset does not exist on Hugging Face. Can you fix it? @alielfilali01

Describe the bug

When trying to run lighteval with custom Arabic evaluation tasks, it fails with a DatasetNotFoundError, indicating that the dataset 'asas-ai/AraTrust-categorized' cannot be found on the Hugging Face Hub.

To Reproduce

  1. Set up the conda environment with Python 3.10
  2. Install required packages
  3. Run the following command:
accelerate launch --multi_gpu --num_processes=8 -m lighteval \
accelerate "pretrained=/mnt/sg_nas/liheng/Marco_checkpoint/Qwen2-7B-mmmlu-latest/checkpoint-1150,dtype=bfloat16,max_length=16384" \
"examples/tasks/OALL_v2_tasks.txt" \
--custom-tasks "community_tasks/arabic_evals.py" \
--output-dir="./evals/"

Full Error Message

[rank3]: DatasetNotFoundError: Dataset 'asas-ai/AraTrust-categorized' doesn't exist on the Hub or cannot be accessed.
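For reference, a quick way to confirm whether the dataset itself is reachable from your environment (a minimal sketch independent of lighteval, assuming the huggingface_hub package is installed):

from huggingface_hub import dataset_info
from huggingface_hub.utils import RepositoryNotFoundError

try:
    # Raises RepositoryNotFoundError if the repo is private, deleted, or misspelled.
    info = dataset_info("asas-ai/AraTrust-categorized")
    print("Found:", info.id)
except RepositoryNotFoundError:
    print("Dataset is missing, private, or you lack access.")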

Version info

  • Operating System: Linux
  • Python Version: 3.10

@BobTsang1995 added the bug label on Dec 12, 2024
@clefourrier (Member)

Thanks for the issue! It would help a lot if you could follow the template and provide, each time you report a bug:

  • the command you run
  • the full stack trace of the error

@BobTsang1995 (Author)

> Thanks for the issue! It would help a lot if you could follow the template and provide, each time you report a bug:
>
>   • the command you run
>   • the full stack trace of the error

Sorry, my bad. I've added the detailed error information above.

@alielfilali01 (Contributor)

Hey @BobTsang1995 & @clefourrier
Please feel free to close this issue since it was not a bug.

The dataset is public again (someone switched it to private by mistake). Find it here.
Now the task should run as expected.

@BobTsang1995, can you please keep commenting here even after the issue gets closed if you face any further issues? Usually they are just minor things; if something is really wrong, we can open a separate issue for it and address it properly.
Thank you

@clefourrier (Member)

@BobTsang1995 thanks a lot for the update, much better! :)

@BobTsang1995 (Author)

> Hey @BobTsang1995 & @clefourrier
> Please feel free to close this issue since it was not a bug.
>
> The dataset is public again (someone switched it to private by mistake). Find it here. Now the task should run as expected.
>
> @BobTsang1995, can you please keep commenting here even after the issue gets closed if you face any further issues? Usually they are just minor things; if something is really wrong, we can open a separate issue for it and address it properly. Thank you

Thanks a lot, I will try it again. I appreciate your reply.

@alielfilali01 (Contributor)

One last thing.
I hope the name Qwen2-7B-mmmlu-latest doesn't mean it is trained on mmmlu >_<
Have a good day/night and please keep the comments coming!

@BobTsang1995 (Author)

> One last thing. I hope the name Qwen2-7B-mmmlu-latest doesn't mean it is trained on mmmlu >_< Have a good day/night and please keep the comments coming!

LOL! That's just my own naming scheme. We have indeed done a lot of work on the mmmlu benchmark. Our multilingual model will be coming soon, so you can look forward to it.

@BobTsang1995 (Author) commented Dec 12, 2024

Another error occurred: it seems that some entries do not have these keys. KeyError: 'sol3' @alielfilali01
[screenshot of the traceback]

@alielfilali01 (Contributor)

Oh shoot! That's because of the last PR #440!
The AlGhafa prompt function is applied to all 9 subsets of AlGhafa Native (here), but not all of the subsets share the same columns!
PR #442 should fix it once merged (make sure to apply the changes on your local machine so you don't have to wait for the PR to be merged).

@clefourrier (Member) commented Dec 12, 2024

The last PR fixed the issue that @BobTsang1995 had on the previous subset, because this line caused an error. Please find a way to provide the columns explicitly (for example, by providing the full list of all allowed columns instead of excluding some).
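For illustration, a minimal sketch of that explicit-columns approach (the column names query and label, and the Doc import path, are assumptions based on lighteval's community task style, not the actual fix in the PR):

from lighteval.tasks.requests import Doc  # import path used by community tasks

# List every allowed answer column up front and keep only the ones a given
# subset actually provides, so subsets missing sol3/sol4 no longer raise KeyError.
ALLOWED_ANSWER_COLUMNS = ["sol1", "sol2", "sol3", "sol4"]

def alghafa_pfn(line, task_name: str = None):
    choices = [str(line[col]) for col in ALLOWED_ANSWER_COLUMNS if col in line]
    return Doc(
        task_name=task_name,
        query=line["query"],            # assumed question column
        choices=choices,
        gold_index=int(line["label"]),  # assumed gold-label column
    )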

@alielfilali01 (Contributor)

Oh sorry! I got mixed up between the issues. OK, I will investigate this further and update the associated PR.

@BobTsang1995 (Author) commented Dec 12, 2024

> Oh sorry! I got mixed up between the issues. OK, I will investigate this further and update the associated PR.

There is still a problem with PR #444. The task_name passed to the alghafa_pfn() function has a community|xxx: prefix, so the key cannot be found in the dictionary. I think you should run the code before opening the PR.

At the same time, it seems that you passed an undefined parameter when constructing the Doc class, resulting in an error: TypeError: Doc.__init__() got an unexpected keyword argument 'target_for_fewshot_sorting'
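As a stopgap until the PR lands, the prefix can be stripped before the dictionary lookup; a minimal sketch (the helper name and the exact prefix format are assumptions inferred from the error above):

def bare_subset(task_name: str) -> str:
    # Turn e.g. "community|alghafa:mcq_exams_test_ar" into "mcq_exams_test_ar".
    name = task_name.split("|", 1)[-1]  # drop the "community" suite prefix
    return name.split(":", 1)[-1]       # drop the "alghafa:" task prefix

assert bare_subset("community|alghafa:mcq_exams_test_ar") == "mcq_exams_test_ar"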

@alielfilali01
Copy link
Contributor

Hey @BobTsang1995, thanks for the highlight.

Well, the suite "community" is already part of the config, so naturally it shouldn't be part of the task_name; plus, the main task "alghafa:" is already defined in the tasks list, which later goes into the final tasks table at the end of the script. So in theory it should be fine!

But you are correct, I hadn't tested the code before pushing, as I was actually leaving my desk when I checked the notification 🥲

I'll give it a look tomorrow and try to fix it ASAP. Thanks again for your feedback.

@BobTsang1995 (Author)

> Hey @BobTsang1995, thanks for the highlight.
>
> Well, the suite "community" is already part of the config, so naturally it shouldn't be part of the task_name; plus, the main task "alghafa:" is already defined in the tasks list, which later goes into the final tasks table at the end of the script. So in theory it should be fine!
>
> But you are correct, I hadn't tested the code before pushing, as I was actually leaving my desk when I checked the notification 🥲
>
> I'll give it a look tomorrow and try to fix it ASAP. Thanks again for your feedback.

Sorry to bother you, but do you have any plans to fix this issue today?

@alielfilali01 (Contributor)

It is fixed (the test went well) and is waiting for PR #444 to be merged.
In the meantime, feel free to run using this fork.
Thanks @BobTsang1995 for your feedback!

@BobTsang1995 (Author) commented Dec 22, 2024

> It is fixed (the test went well) and is waiting for PR #444 to be merged. In the meantime, feel free to run using this fork. Thanks @BobTsang1995 for your feedback!

@alielfilali01 another question: now I want to evaluate a 72B model, but accelerate always OOMs when I use model parallelism. I'm wondering how you evaluate large models with lighteval.

accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate "pretrained=/mnt/sg_nas/liheng/Marco_checkpoint/Qwen2-72B,dtype=bfloat16,max_length=2048,model_parallel=True" --override-batch-size 1 "examples/tasks/OALL_v1_tasks.txt" --custom-tasks "community_tasks/arabic_evals.py" --output-dir="./evals/"

@alielfilali01 (Contributor) commented Dec 23, 2024

Hey @BobTsang1995
You are already using DP by setting --multi_gpu --num_processes=8, so no GPUs are left for PP.
Consider setting --num_processes=4 for DP; that leaves 8 / 4 = 2 GPUs per replica, so you can still shard the model across 2 GPUs with PP.
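For a rough sense of scale (a back-of-the-envelope estimate, not a measurement from this thread): a 72B-parameter model in bfloat16 needs about 72e9 × 2 bytes ≈ 144 GB for the weights alone, before activations and KV cache, so a single replica cannot fit on one 80 GB GPU and has to be sharded across at least two.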

Run this instead:

accelerate launch --multi_gpu --num_processes=4 -m lighteval accelerate "pretrained=/mnt/sg_nas/liheng/Marco_checkpoint/Qwen2-72B,dtype=bfloat16,max_length=2048,model_parallel=True" --override-batch-size 1  "examples/tasks/OALL_v1_tasks.txt" --custom-tasks "community_tasks/arabic_evals.py" --output-dir="./evals/"

PS: please consider closing this issue.

@BobTsang1995 (Author)

> Hey @BobTsang1995
> You are already using DP by setting --multi_gpu --num_processes=8, so no GPUs are left for PP. Consider setting --num_processes=4 for DP; that leaves 8 / 4 = 2 GPUs per replica, so you can still shard the model across 2 GPUs with PP.
>
> Run this instead:
>
> accelerate launch --multi_gpu --num_processes=4 -m lighteval accelerate "pretrained=/mnt/sg_nas/liheng/Marco_checkpoint/Qwen2-72B,dtype=bfloat16,max_length=2048,model_parallel=True" --override-batch-size 1 "examples/tasks/OALL_v1_tasks.txt" --custom-tasks "community_tasks/arabic_evals.py" --output-dir="./evals/"
>
> PS: please consider closing this issue.

Thanks, have a good day!
