Add MLP/lm_head tp grain size setting. #6828
base: master
Conversation
@delock Please review, thanks~
Hi @tjruwase @awan-10, this PR from @Yejing-Lai adds a new parameter (the MLP/lm_head tp grain size) to deepspeed.init_inference(). We know it extends the public API, so please take a look.
@delock, @Yejing-Lai, thanks for the PR. I agree that updating the inference config (deepspeed/inference/config.py, lines 132 to 136 at commit 1b58ba5) is a natural place to expose this option.
Did you guys consider this choice? Either way, I have approved the PR.
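The suggestion above is to expose the setting through the inference config rather than a bare keyword argument. A minimal sketch of what such a config field might look like, using a plain dataclass instead of DeepSpeed's pydantic-based config, with `tp_grain_size` as an assumed field name (the actual name in the PR may differ):

```python
from dataclasses import dataclass


@dataclass
class TPShardingConfig:
    """Hypothetical sketch of a tensor-parallel sharding config section.

    `tp_grain_size` is an assumed field name: the smallest slice width
    (in elements) used when sharding MLP/lm_head weights across ranks.
    """
    tp_size: int = 1
    tp_grain_size: int = 64  # default of 64, a power of two, per the PR

    def __post_init__(self) -> None:
        # Reject non-powers-of-two early, since DNN libraries favor
        # power-of-two granularity.
        g = self.tp_grain_size
        if g <= 0 or (g & (g - 1)) != 0:
            raise ValueError("tp_grain_size must be a positive power of two")
```

Validating in the config constructor means a bad grain size fails at `deepspeed.init_inference()` time instead of surfacing later as a shape mismatch during sharding.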
Thank you for your advice. I updated the code. Please review~
@Yejing-Lai, thanks for updating the code. LGTM.
This PR adds an MLP/lm_head tensor-parallel (tp) grain size setting to the deepspeed.init_inference() API, making it more flexible to choose the sharding granularity for the MLP and lm_head layers.
DNN libraries favor tensor sizes that are powers of two, so we pick 64 as the default grain size.
We aim to make the MLP/lm_head tp grain size freely configurable. This is a preliminary solution; if there is a better approach, we can discuss it together. Thanks~
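To illustrate the idea (not the PR's actual implementation), grain-aligned sharding can be sketched as follows: the dimension being split is divided into grains of 64 elements, and whole grains are distributed across tp ranks, so every rank's shard width stays a multiple of the grain size:

```python
def grain_aligned_shards(dim: int, tp_size: int, grain: int = 64) -> list[int]:
    """Split `dim` across `tp_size` ranks in multiples of `grain`.

    Illustrative sketch only; the PR's real sharding logic may differ.
    """
    if dim % grain != 0:
        raise ValueError(f"dim={dim} is not a multiple of grain={grain}")
    grains = dim // grain
    base, rem = divmod(grains, tp_size)
    # The first `rem` ranks take one extra grain so the shard sizes sum
    # exactly to `dim`, while each shard stays grain-aligned.
    return [(base + (1 if rank < rem else 0)) * grain
            for rank in range(tp_size)]


# Example: an MLP intermediate size of 11008 split over 4 and 3 ranks.
print(grain_aligned_shards(11008, 4))  # -> [2752, 2752, 2752, 2752]
print(grain_aligned_shards(11008, 3))  # -> [3712, 3648, 3648]
```

With a fixed grain of 64, every shard's leading dimension remains a multiple of 64 even when the split is uneven, which is the property the power-of-two default is meant to preserve for the underlying DNN kernels.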