Inquiry Regarding Zero3 and Sequence Parallelism Compatibility #54

SihengLi99 · 2024-09-28T15:39:34Z

Dear Peiyuan,

First, I want to express my sincere appreciation for this incredible work. Your contributions have been instrumental in advancing my project, and I am truly grateful for that.

However, I have a critical question regarding compatibility between ZeRO, particularly ZeRO-3, and the Sequence Parallelism algorithms you are using. I noticed that the official DeepSpeed implementation mentions potential incompatibilities between ZeRO-3 and Ulysses (as highlighted in this issue).

How did you ensure compatibility in your implementation? Have you encountered any conflicts? Additionally, are there any experimental results or benchmarks that you could share to demonstrate the reliability of the current implementation? Specifically, I am curious to know if the model shows improved long-context understanding with increased training steps.

I greatly appreciate your time and assistance on this matter and look forward to your response.

Best regards,
Siheng

xs1997zju · 2024-11-12T09:07:51Z

I think ring_flash and zero3 are orthogonal

jzhang38 · 2024-11-12T17:59:08Z

Honestly, I have no idea.

As far as I remember, I was able to use the Ulysses implementation from USP without encountering any errors. The loss was the same as with ring_flash_attn, so I called it a day.

But this is indeed a problem worth investigating. I don't know why they said zero3 and Ulysses are incompatible in the issue you linked.
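
For anyone who wants to repeat the loss comparison mentioned above, a minimal sketch of such a check might look like the following. The helper name `losses_match`, the tolerances, and the loss values are all hypothetical placeholders for illustration; they are not taken from any actual run or from this repository.

```python
import math

def losses_match(losses_a, losses_b, rel_tol=1e-3, abs_tol=1e-5):
    """Return True if two per-step loss traces agree within tolerance.

    Hypothetical helper: the tolerances are assumptions and should be
    chosen to reflect the numerical noise expected in your setup.
    """
    if len(losses_a) != len(losses_b):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(losses_a, losses_b)
    )

# Placeholder values for illustration only, NOT measured results.
ulysses_losses = [2.3051, 2.1012, 1.9874]
ring_losses = [2.3050, 2.1013, 1.9874]

print(losses_match(ulysses_losses, ring_losses))
```

Matching losses under the same seed and data order only suggest that the two attention paths compute the same thing; on their own they would not rule out an interaction with ZeRO-3 parameter sharding.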
