Need a running script for ‘dist_flash_attn’ #22

Open
LzhinFdu opened this issue Apr 26, 2024 · 5 comments
LzhinFdu commented Apr 26, 2024

Can you provide a script for running dist_flash_attn? I tried setting parallel_mode to dist_flash_attn, but it did not run successfully.

When trying to use 'dist_flash_attn' with 2×A100, process 0 gets stuck in the torch.cuda.synchronize() call inside _lightseq_forward of one decoder layer, while process 1 has already reached the same point in the next decoder layer. Strangely, the model only gets stuck on the second sample. What might be causing this bug, and is there a way to solve it?

It seems that the communication issued by process 0 in maybe_send_recv_fwd_qkvo never completes.

LzhinFdu changed the title from “Stuck when training with ‘dist_flash_attn’” to “Need a running script for ‘dist_flash_attn’” on Apr 29, 2024
LzhinFdu (Author) commented May 7, 2024

Well, after making the input sequence length divisible by world_size * block_size, it runs normally.

fahadh4ilyas commented

> Well, after making the input sequence length divisible by world_size * block_size, it runs normally.

What is block_size?

LzhinFdu (Author) commented

> What is block_size?

The block_size used by flash-attn.

fahadh4ilyas commented

> The block_size used by flash-attn.

I'm sorry, I don't understand. I didn't find any block_size parameter in this repo. Could you please tell me where it is?

LzhinFdu (Author) commented

> I didn't find any block_size parameter in this repo. Could you please tell me where it is?

It seems to be here.
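
For readers hitting the same hang, here is a minimal sketch of the workaround described above: right-padding the input so its length is divisible by world_size * block_size. The helper name pad_to_multiple, the pad_id value, and the block_size of 128 are assumptions for illustration only; check the repo for the actual block size used by dist_flash_attn.

```python
import torch

def pad_to_multiple(input_ids: torch.Tensor, multiple: int, pad_id: int = 0) -> torch.Tensor:
    """Right-pad a (batch, seq_len) tensor so that seq_len is a multiple of `multiple`."""
    seq_len = input_ids.size(1)
    remainder = seq_len % multiple
    if remainder == 0:
        return input_ids
    pad = torch.full(
        (input_ids.size(0), multiple - remainder),
        pad_id,
        dtype=input_ids.dtype,
        device=input_ids.device,
    )
    return torch.cat([input_ids, pad], dim=1)

# Example: 2 GPUs (world_size = 2) and an assumed flash-attn block_size of 128,
# so sequence lengths must be divisible by 2 * 128 = 256.
world_size, block_size = 2, 128
ids = torch.randint(0, 32000, (1, 1000))
ids = pad_to_multiple(ids, world_size * block_size)
print(ids.shape)  # torch.Size([1, 1024])
```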
