[shardformer] update whisper model #5529

wangbluo · 2024-03-28T03:24:26Z

🚨 Issue number

[FEATURE]: Upgrade the transformers version from 4.33.0 to 4.36.0 for Shardformer. #5505

📝 What does this PR do?

[shardformer/modeling/whisper]: Upgrade transformers from version 4.33.0 to version 4.36.0 for the Whisper model. This updates the whisper_encoder_forward and whisper_decoder_forward functions and removes the transformers version restriction in WhisperPolicy.
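For readers unfamiliar with version restrictions of this kind, here is a minimal sketch of a numeric version comparison. It is not the code touched by this PR: the helper names `parse_version` and `meets_minimum` are hypothetical, and only the 4.33.0 and 4.36.0 version numbers come from the description above.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.36.0' into (4, 36, 0). Hypothetical helper, not part of the PR."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.36.0") -> bool:
    """Return True if `installed` is at least `minimum`, compared numerically.

    Assumption: pre-release suffixes such as 'rc1' are ignored, so '4.36.0rc1'
    is treated the same as '4.36.0'.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '4.36' compares equal to '4.36.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for candidate in ("4.33.0", "4.36.0", "4.9.0"):
        print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "4.9.0" would sort after "4.36.0", while numerically it is the older release.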

* flash_attention forward upgrade
* llama_model_forward
* remove useless comment
* update the requirements.txt
* add the transformers version requirements
* remove the LATEST VERSION try
* [shardformer] update bloom model (#5518)
* update bloom model
* remove the version restriction
* [shardformer] update_falcon (#5520)
* [shardformer] update mistral model (#5511)
* [shardformer] update gpt2 (#5502)
* [shardformer] update gptj model (#5503)
* [shardformer] update opt (#5522)
* [shardformer] update t5 model (#5524)
* [shardformer] update whisper model (#5529)
* [shardformer] update vit model (#5530)
* update vit model
* remove the output_hidden_states
* [shardformer] fix llama modeling
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [zero] support multiple (partial) backward passes (#5596)
* [zero] support multiple (partial) backward passes
* [misc] update requirements
* [zero] support multiple (partial) backward passes (#5596)
* [zero] support multiple (partial) backward passes
* [misc] update requirements
* fix conflicts
* [doc] fix ColossalMoE readme (#5599)
* fix readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* merge with main
* merge with main
* llama_model_forward
* remove useless comment
* remove the LATEST VERSION try
* [shardformer] update bloom model (#5518)
* update bloom model
* remove the version restriction
* [shardformer] update mistral model (#5511)
* [shardformer] update opt (#5522)
* [shardformer] update whisper model (#5529)
* [shardformer] fix llama modeling
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606)
* fix no pad token bug
* fixed some auto parallel codegen bug, but might not run on torch 2.1

---------
Co-authored-by: Edenzzzz <[email protected]>

* [shardformer] fix pipeline grad ckpt (#5620)
* [shardformer] fix pipeline grad ckpt
* [shardformer] fix whisper (#5628)
* [test] fix llama model test
* fix the opt upgrade (#5634)
* [shardformer] fix attn replacement (#5636)
* [shardformer] update flashattention replacement (#5637)
* update transformers update transformers fix fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [test] fix llama test (#5638)
* [gemini] fix buffer cast (#5639)
* Fix shardformer upgrade (#5640)
* fix llama model
* fix the mistral
* fix the shardformer model
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [shardformer]support pipeline parallelism for mistral. (#5642)
* [shardformer] fix attn replacement (#5636)
* [shardformer] update flashattention replacement (#5637)
* update transformers update transformers fix fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] Support LLaMA-3 CPT and ST (#5619)
* support LLaMA-3
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Run pre-commit

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [exampe] update llama example (#5626)
* [plugin] support dp inside for hybriad parallel
* [example] update llama benchmark
* [example] update llama benchmark
* [example] update llama readme
* [example] update llama readme
* [example] llama3 (#5631)
* release llama3
* [release] llama3
* [release] llama3
* [release] llama3
* [release] llama3
* [test] fix llama test (#5638)
* [gemini] fix buffer cast (#5639)
* support pp for mistral
* fix
* fix fix fix
* fix

---------
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tong Li <[email protected]>
Co-authored-by: binmakeswell <[email protected]>

---------
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Camille Zhong <[email protected]>
Co-authored-by: Edenzzzz <[email protected]>
Co-authored-by: Edenzzzz <[email protected]>
Co-authored-by: flybird11111 <[email protected]>
Co-authored-by: Tong Li <[email protected]>
Co-authored-by: binmakeswell <[email protected]>


update whisper model

eab5cc8

wangbluo requested a review from a team as a code owner March 28, 2024 03:24

ver217 approved these changes Apr 3, 2024


ver217 merged commit d7af2d8 into hpcaitech:feature/update-transformers Apr 3, 2024
1 of 2 checks passed

wangbluo added a commit to wangbluo/ColossalAI that referenced this pull request Apr 11, 2024

[shardformer] update whisper model (hpcaitech#5529)

cec2a43

wangbluo added a commit to wangbluo/ColossalAI that referenced this pull request Apr 11, 2024

[shardformer] update whisper model (hpcaitech#5529)

9fd1a3e

wangbluo added a commit to wangbluo/ColossalAI that referenced this pull request Apr 11, 2024

[shardformer] update whisper model (hpcaitech#5529)

0edc2ec

wangbluo added a commit that referenced this pull request Apr 18, 2024

[shardformer] update whisper model (#5529)

ab160a8

wangbluo deleted the update_whisper branch August 17, 2024 09:42
