Size mismatch inside zigzag_ringattention backward #47

Open
jinghan23 opened this issue Sep 9, 2024 · 0 comments
I ran into the following problem when applying sequence parallelism to training with zigzag_ringattention. The cause seems to be an unevenly split embedding size. I'm running on 4 GPUs, so there are 4 processes, and the length of the input embeds is 2999, which isn't divisible by 4 without a remainder. After inputs_embeds.chunk(), the 4 GPUs get sequences of length 750, 750, 750, and 749. I think the failure happens on the rank holding the 749-token sequence: block_seq_len is computed as half of 749 (i.e. 374), which splits that sequence unevenly, as the sketch below shows.
I'm wondering whether you have run into a similar problem and whether there is a way to solve it, or whether I'm misunderstanding something. Sequence lengths that don't divide evenly should be a fairly common case.
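For reference, the size mismatch can be reproduced outside the attention kernel with plain tensor arithmetic (a minimal sketch; the tensor names are illustrative, not the library's internals):

```python
import torch

world_size = 4
seq_len = 2999

# torch.chunk does not pad: 2999 tokens across 4 ranks -> 750, 750, 750, 749
inputs_embeds = torch.randn(1, seq_len, 8)
local_lengths = [c.shape[1] for c in inputs_embeds.chunk(world_size, dim=1)]
print(local_lengths)  # [750, 750, 750, 749]

# Inside zigzag_ring_flash_attn_backward, each local chunk is split in half:
# block_seq_len = local_seq_len // 2. For the rank holding 749 tokens this
# gives 374, while dq[:, block_seq_len:] still covers 749 - 374 = 375 tokens,
# so the in-place add fails with a 375 vs 374 size mismatch.
local_seq_len = local_lengths[-1]   # 749
block_seq_len = local_seq_len // 2  # 374
dq = torch.zeros(1, local_seq_len, 8)
dq_buffer = torch.zeros_like(dq)
try:
    dq[:, block_seq_len:] += dq_buffer[:, :block_seq_len]
except RuntimeError as e:
    print(e)  # The size of tensor a (375) must match the size of tensor b (374) ...
```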

Traceback (most recent call last):
  File "HOME_PATH/files/LLaVA-OV/llava/train/train_mem.py", line 4, in <module>
    train()
  File "HOME_PATH/files/LLaVA-OV/llava/train/train.py", line 1717, in train
    trainer.train()
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 2348, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 3275, in training_step
    self.accelerator.backward(loss)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 2151, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 166, in backward
    self.engine.backward(loss, **kwargs)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2213, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/ring_flash_attn/zigzag_ring_flash_attn.py", line 235, in backward
    dq, dk, dv = zigzag_ring_flash_attn_backward(
  File "HOME_PATH/.conda/envs/llava/lib/python3.10/site-packages/ring_flash_attn/zigzag_ring_flash_attn.py", line 160, in zigzag_ring_flash_attn_backward
    dq[:, block_seq_len:] += dq_buffer[:, :block_seq_len]
RuntimeError: The size of tensor a (375) must match the size of tensor b (374) at non-singleton dimension 1
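If the root cause is indeed the indivisible sequence length, one workaround I can think of is right-padding the embeddings (and the attention mask) to a multiple of 2 * world_size before chunking. A rough sketch below; pad_to_multiple is a hypothetical helper, not part of ring_flash_attn:

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(inputs_embeds, attention_mask, multiple):
    """Right-pad the sequence dimension so every rank gets an equal-length,
    even-length chunk. (Hypothetical helper, not part of ring_flash_attn.)"""
    seq_len = inputs_embeds.shape[1]
    pad = (-seq_len) % multiple
    if pad:
        inputs_embeds = F.pad(inputs_embeds, (0, 0, 0, pad))        # pad seq dim
        attention_mask = F.pad(attention_mask, (0, pad), value=0)   # mask out padding
    return inputs_embeds, attention_mask

world_size = 4
# zigzag ring attention splits each local chunk in half, so the global length
# needs to be divisible by 2 * world_size.
inputs_embeds = torch.randn(1, 2999, 8)
attention_mask = torch.ones(1, 2999, dtype=torch.long)
inputs_embeds, attention_mask = pad_to_multiple(inputs_embeds, attention_mask, 2 * world_size)
print(inputs_embeds.shape)  # torch.Size([1, 3000, 8]) -> 750 per rank, 375 per half-block
```

The labels / position ids would presumably need the same padding, with the padded tokens excluded from the loss, but I haven't verified this against the library.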