Dataset length question #36

5taku · 2024-06-25T00:51:57Z

Hello,
I'm testing our training setup using your code.
Thank you as always.

Currently, I have created a dataset that mixes 8K-token and 64K-token samples in a 1:1 ratio.

I then ran training with the code, but the following error occurred:

    q_embed = (q * cos) + (rotate_half(q) * sin)
    RuntimeError: The size of tensor a (1024) must match the size of tensor b (8192) at non-singleton dimension 2
      0%|          | 0/301 [00:00<?, ?it/s]

My guess is that the 64K samples are fine and the problem appears while training on the 8K samples.

Should all samples have the same length during training?

For samples shorter than `--seq-length`, should I pad them?

Thanks for your help.

For reference, the training command includes:

    --seq-length 65535 \


jzhang38 · 2024-06-25T03:40:34Z

I think this is a problem with the RoPE cache. If we train on 64K-token sequences with each GPU holding 8K tokens, some RoPE implementations in Hugging Face Transformers (such as Qwen2 and Mistral; Llama 3 does not have this issue) only build an 8K-long sin/cos cache. However, the position indices we use can range from 0 to 64K.
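A scaled-down, pure-Python sketch of the mismatch described above, assuming a contiguous sequence-parallel split (each rank holds one consecutive chunk of the global sequence) and a sin/cos cache that is indexed by global position id. The helpers `rope_cache`, `shard_position_ids`, and `gather` are hypothetical illustrations, not Hugging Face APIs, and 64 tokens over 8 ranks stand in for 64K tokens over 8 GPUs.

```python
import math


def rope_cache(cache_len, dim, base=10000.0):
    """Build RoPE cos/sin tables with cache_len rows and dim // 2 angles per row."""
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    cos = [[math.cos(pos * f) for f in inv_freq] for pos in range(cache_len)]
    sin = [[math.sin(pos * f) for f in inv_freq] for pos in range(cache_len)]
    return cos, sin


def shard_position_ids(global_len, world_size, rank):
    """Contiguous sequence-parallel split: rank r keeps positions [r * chunk, (r + 1) * chunk)."""
    chunk = global_len // world_size
    return list(range(rank * chunk, (rank + 1) * chunk))


def gather(cache, position_ids):
    """Look up one cache row per global position id."""
    return [cache[pos] for pos in position_ids]


# Scaled-down stand-in: 64 tokens over 8 ranks instead of 64K tokens over 8 GPUs.
GLOBAL_LEN, WORLD_SIZE, DIM = 64, 8, 8
last_rank_ids = shard_position_ids(GLOBAL_LEN, WORLD_SIZE, WORLD_SIZE - 1)  # 56..63

# Problematic pattern: the cache is sized from the local chunk length (8 rows),
# but the position ids on this rank are global and go up to 63.
cos_local, _ = rope_cache(GLOBAL_LEN // WORLD_SIZE, DIM)
local_cache_failed = False
try:
    gather(cos_local, last_rank_ids)
except IndexError:
    local_cache_failed = True
print("local-length cache too short:", local_cache_failed)  # True

# One possible fix (an assumption, not the repo's actual patch): size the cache
# from the largest position id in use, i.e. the full global sequence length.
cos_full, _ = rope_cache(max(last_rank_ids) + 1, DIM)
rows = gather(cos_full, last_rank_ids)
print(len(rows))  # 8
```

In this sketch the too-short cache shows up as an `IndexError`; in a real model the same mismatch may surface differently (for example as a GPU indexing error or a shape mismatch), depending on how the implementation slices its cache.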

5taku · 2024-07-01T00:14:14Z

Thank you for the reply 👍
