Dear Peiyuan,
First, I want to express my sincere appreciation for this incredible work. Your contributions have been instrumental in advancing my project, and I am truly grateful for that.
However, I have a critical question about the compatibility between ZeRO, particularly ZeRO-3, and the sequence parallelism algorithms you are using. The official DeepSpeed implementation mentions potential incompatibilities between ZeRO-3 and Ulysses (as highlighted in this issue).
How did you ensure compatibility in your implementation? Have you encountered any conflicts? Additionally, are there any experimental results or benchmarks you could share that demonstrate the reliability of the current implementation? Specifically, does the model's long-context understanding improve as the number of training steps increases?
I greatly appreciate your time and assistance on this matter and look forward to your response.
Best regards,
Siheng