Add MLP/lm_head tp grain size setting. #6828

Open · wants to merge 6 commits into base: master
Conversation

Yejing-Lai (Contributor)
This PR adds an MLP/lm_head tensor-parallel (tp) grain size setting to the deepspeed.init_inference() API, making the MLP/lm_head sharding granularity configurable.

DNN libraries favor tensor sizes with power-of-2 granularity, so we pick 64 as the default.

We aim to make the MLP/lm_head tp grain size configurable. This is a preliminary solution; if there is a better one, we can discuss it together. Thanks~
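To illustrate the idea, here is a minimal sketch (not the PR's actual code) of grain-aligned tensor-parallel sharding: an MLP output dimension is split across tp ranks so that each shard size is a multiple of a configurable grain size such as 64, which libraries like oneDNN prefer. The function name and distribution strategy are assumptions for illustration only.

```python
# Hypothetical sketch of grain-aligned TP sharding; not the PR's implementation.
def grain_aligned_shards(dim: int, tp_size: int, grain: int = 64) -> list[int]:
    """Split `dim` into `tp_size` shard sizes, each a multiple of `grain`,
    with the last rank absorbing any remainder that is not grain-aligned."""
    blocks = dim // grain                      # number of whole grain blocks
    base, extra = divmod(blocks, tp_size)      # spread blocks evenly over ranks
    shards = [(base + (1 if r < extra else 0)) * grain for r in range(tp_size)]
    shards[-1] += dim - sum(shards)            # last rank takes leftover rows
    return shards

# e.g. an 11008-wide MLP (LLaMA-7B intermediate size) over 4 ranks:
print(grain_aligned_shards(11008, 4))          # → [2752, 2752, 2752, 2752]
```

Every shard except possibly the last is a multiple of the grain size, so each rank's GEMM shapes stay library-friendly even when tp_size does not divide the dimension evenly.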

@Yejing-Lai Yejing-Lai requested a review from awan-10 as a code owner December 6, 2024 08:49
Yejing-Lai (Contributor, Author)

@delock Please review, thanks~

delock (Collaborator) commented Dec 7, 2024

Hi @tjruwase @awan-10, this PR from @Yejing-Lai adds a new parameter to init_inference to set the MLP/lm_head granularity. The reason we need this parameter is that different libraries need different granularities to perform optimally. For example, oneDNN needs a granularity of 64. For quantization, the granularity is best set to the size of the quantization group. Other libraries may need other granularity settings.

We know it extends init_inference, which we think is the right place to set AutoTP inference behavior. We want to hear your thoughts on this, thanks!

@tjruwase tjruwase requested review from loadams and removed request for awan-10 December 10, 2024 20:03
tjruwase (Contributor)

@delock, @Yejing-Lai, thanks for the PR. I agree that updating init_inference is the right choice. However, I am curious whether it is better to make grain_size a sub-field of the existing tensor_parallel field.

tensor_parallel: DeepSpeedTPConfig = Field({}, alias="tp")
"""
Configuration for tensor parallelism used to split the model across several
GPUs. Expects a dictionary containing values for :any:`DeepSpeedTPConfig`.
"""

Did you guys consider this choice? Either way, I have approved the PR.
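As a rough sketch of the nested-config approach suggested above, the grain size would live inside the tensor-parallel config rather than as a separate top-level init_inference argument. The field names below (tp_size, tp_grain_size) and the dataclass stand-in for DeepSpeedTPConfig are illustrative assumptions, not necessarily the PR's final names.

```python
# Hypothetical sketch of grain size as a sub-field of the TP config;
# field names are assumed, not taken from the merged PR.
from dataclasses import dataclass

@dataclass
class TPConfig:                  # stand-in for DeepSpeedTPConfig
    tp_size: int = 1             # number of tensor-parallel ranks
    tp_grain_size: int = 64      # sharding granularity; 64 suits oneDNN

def make_tp_config(d: dict) -> TPConfig:
    """Mimic Field({}, alias="tp"): build the config from a user dict."""
    return TPConfig(**d)

# User passes one nested dict, e.g. via init_inference(tensor_parallel={...}):
cfg = make_tp_config({"tp_size": 2, "tp_grain_size": 128})
print(cfg.tp_grain_size)         # → 128
```

Keeping the grain size inside tensor_parallel keeps all sharding knobs in one place and avoids growing the init_inference signature with each new AutoTP option.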

Yejing-Lai (Contributor, Author)

> @delock, @Yejing-Lai, thanks for the PR. I agree that updating init_inference is the right choice. However, I am curious whether it is better to make grain_size a sub-field of the existing tensor_parallel field.
>
> tensor_parallel: DeepSpeedTPConfig = Field({}, alias="tp")
> """
> Configuration for tensor parallelism used to split the model across several
> GPUs. Expects a dictionary containing values for :any:`DeepSpeedTPConfig`.
> """
>
> Did you guys consider this choice? Either way, I have approved the PR.

Thank you for your advice. I updated the code. Please review~

tjruwase (Contributor)

> Thank you for your advice. I updated the code. Please review~

@Yejing-Lai, thanks for updating the code. LGTM.
