Inquiry Regarding Zero3 and Sequence Parallelism Compatibility #54

Open
SihengLi99 opened this issue Sep 28, 2024 · 2 comments

Comments

@SihengLi99

Dear Peiyuan,

First, I want to express my sincere appreciation for this incredible work. Your contributions have been instrumental in advancing my project, and I am truly grateful for that.

However, I have a critical question regarding compatibility between ZeRO, particularly ZeRO-3, and the sequence parallelism algorithms you are using. I noticed that the official DeepSpeed repository mentions potential incompatibilities between ZeRO-3 and Ulysses (as highlighted in this issue).

How did you ensure compatibility in your implementation? Have you encountered any conflicts? Additionally, are there any experimental results or benchmarks that you could share to demonstrate the reliability of the current implementation? Specifically, I am curious to know if the model shows improved long-context understanding with increased training steps.

I greatly appreciate your time and assistance on this matter and look forward to your response.

Best regards,
Siheng

@xs1997zju

I think ring_flash and zero3 are orthogonal
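One way to read "orthogonal" here is that ZeRO-3 partitions parameters while sequence parallelism partitions tokens, so the two shard along different axes. The toy sketch below illustrates only that conceptual point for a token-wise linear layer. It uses plain Python lists in place of tensors and ranks, every name in it is hypothetical, and it calls neither DeepSpeed nor ring_flash_attn, so it says nothing about whether the real implementations interoperate.

```python
# Toy illustration only: lists stand in for tensors, a loop stands in for ranks.

def matmul(x, w):
    """Multiply a (tokens x d_in) matrix by a (d_in x d_out) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def shard(items, world_size):
    """Split a list into world_size contiguous, equally sized chunks."""
    step = len(items) // world_size
    return [items[r * step:(r + 1) * step] for r in range(world_size)]

WORLD_SIZE = 2
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 tokens, d_in = 2
w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]                # d_in = 2, d_out = 3

# ZeRO-3-style sharding: each rank stores only a slice of the weight rows.
w_shards = shard(w, WORLD_SIZE)
# Sequence-parallel-style sharding: each rank holds only a slice of the tokens.
x_shards = shard(x, WORLD_SIZE)

outputs = []
for rank in range(WORLD_SIZE):
    # Stand-in for the all-gather that rebuilds the full weight before use.
    full_w = [row for s in w_shards for row in s]
    # Each rank applies the full weight to its own tokens only.
    outputs.extend(matmul(x_shards[rank], full_w))

# Concatenating the per-rank outputs reproduces the unsharded result.
assert outputs == matmul(x, w)
```

Attention is the one place where tokens on different ranks must interact, which is what ring attention and Ulysses handle. That communication moves activations rather than parameters, but the process groups and gradient reduction in a real framework are where the two schemes could still collide.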

@jzhang38
Owner

jzhang38 commented Nov 12, 2024

Honestly, I have no idea.

As far as I remember, I was able to use the Ulysses implementation from USP. No errors were encountered, and the loss was the same as with ring_flash_attn, so I called it a day.

But this is indeed a problem worth investigating. I don't know why they said zero3 and Ulysses are incompatible in the issue you linked.
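For anyone who wants to repeat the sanity check described above, here is a minimal sketch of comparing per-step losses from two runs, one with ring_flash_attn and one with Ulysses. The helper name `losses_match`, the tolerances, and the loss values are all placeholders invented for illustration, not measurements from this repository. Matching losses are evidence that the two paths agree, not proof that ZeRO-3 and Ulysses are compatible in general.

```python
import math

def losses_match(run_a, run_b, rel_tol=1e-3, abs_tol=1e-5):
    """Compare two per-step loss sequences.

    Returns (True, None) if every step agrees within tolerance,
    otherwise (False, index_of_first_mismatching_step).
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs must log the same number of steps")
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return False, step
    return True, None

# Placeholder values, NOT real results; substitute the losses logged by each run.
ring_losses = [2.3101, 2.1045, 1.9872]
ulysses_losses = [2.3102, 2.1044, 1.9873]

ok, first_bad_step = losses_match(ring_losses, ulysses_losses)
print("match" if ok else f"diverges at step {first_bad_step}")
```

Both runs should use the same seed, data order, and hyperparameters, since otherwise a mismatch says nothing about the attention implementation. The tolerances are a guess and may need loosening for bf16 training.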
