Dataset length question #36

Open
5taku opened this issue Jun 25, 2024 · 2 comments

5taku commented Jun 25, 2024

Hello,
I'm testing training with your code.
Thank you, as always.

Currently, I have built a dataset that mixes 8k and 64k samples at a 1:1 ratio.

I then ran training with the code, but the following error occurred:

    q_embed = (q * cos) + (rotate_half(q) * sin)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (8192) at non-singleton dimension 2
  0%|          | 0/301 [00:00<?, ?it/s]

My guess is that the 64k samples train without problems and that the error appears while training on the 8k samples.

Should every sample in the dataset have the same length during training?

For samples shorter than seq-length, should I pad them?

Thanks for your help.

--seq-length 65535 \
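
On the padding question, here is a minimal sketch of right-padding a short sample to a fixed `seq_length`. It is not taken from the repository: `pad_to_length`, `pad_token_id=0`, and the returned dictionary keys are hypothetical names chosen for illustration, and the thread does not establish whether padding resolves the error above. The label value `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so padded positions would not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_to_length(input_ids, seq_length, pad_token_id=0):
    """Right-pad one tokenized sample to exactly seq_length tokens.

    Hypothetical helper: pad_token_id=0 is an assumption, use the
    tokenizer's real pad id in practice.
    """
    if len(input_ids) > seq_length:
        raise ValueError("sample is longer than seq_length; truncate it first")
    n_pad = seq_length - len(input_ids)
    return {
        # Real tokens followed by padding tokens.
        "input_ids": list(input_ids) + [pad_token_id] * n_pad,
        # 1 marks real tokens, 0 marks padding.
        "attention_mask": [1] * len(input_ids) + [0] * n_pad,
        # Padding positions are excluded from the loss.
        "labels": list(input_ids) + [IGNORE_INDEX] * n_pad,
    }


if __name__ == "__main__":
    sample = pad_to_length([5, 6, 7], seq_length=8)
    print(sample["input_ids"])
    print(sample["attention_mask"])
    print(sample["labels"])
```

With every sample padded this way, each batch has the same length, so the per-card shard length stays constant across 8k and 64k samples.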
jzhang38 (Owner) commented

I think this is a problem with the RoPE cache. If we train on a 64K sequence with 8K tokens per card, some RoPE implementations in HF only build an 8K sin/cos cache (for example Qwen2 and Mistral; Llama 3 does not have this issue). But the position indices we use can range from 0 to 64K.
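
The cache-coverage idea can be illustrated with a small pure-Python sketch. It does not reproduce any Hugging Face code path or the exact error in the traceback: `build_rope_cache`, `shard_position_ids`, and `lookup` are hypothetical helpers, and the sizes are scaled down (a 64-token sequence split over 8 ranks stands in for 64K over 8 cards). It shows that a sin/cos table sized to the local shard cannot serve the global position indices of later ranks, while a table sized to the full sequence can.

```python
import math


def build_rope_cache(num_positions, head_dim, base=10000.0):
    """Build cos/sin tables with one row per position in [0, num_positions)."""
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(num_positions)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(num_positions)]
    return cos, sin


def shard_position_ids(total_len, world_size, rank):
    """Global position indices held by one rank when the sequence is split evenly."""
    chunk = total_len // world_size
    return list(range(rank * chunk, (rank + 1) * chunk))


def lookup(cache, position_ids):
    """Gather cos/sin rows for the given positions, failing if the cache is too short."""
    cos, sin = cache
    limit = len(cos)
    if any(p >= limit for p in position_ids):
        raise IndexError(
            f"position {max(position_ids)} is outside a cache of {limit} rows"
        )
    return [cos[p] for p in position_ids], [sin[p] for p in position_ids]


if __name__ == "__main__":
    total_len, world_size, head_dim = 64, 8, 4
    local_len = total_len // world_size

    last_rank_ids = shard_position_ids(total_len, world_size, rank=world_size - 1)

    # Cache sized to the local shard: too short for global positions.
    try:
        lookup(build_rope_cache(local_len, head_dim), last_rank_ids)
    except IndexError as err:
        print("local-sized cache failed:", err)

    # Cache sized to the full sequence: covers every rank's positions.
    cos, sin = lookup(build_rope_cache(total_len, head_dim), last_rank_ids)
    print("full-sized cache rows gathered:", len(cos))
```

Under these assumptions, the cache has to be sized from the largest global position index rather than from the number of tokens each card sees.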


5taku commented Jul 1, 2024

Thank you for the reply 👍
