[gemini] gemini supports tensor parallelism. (hpcaitech#4942)
* [colossalai] fix typo
* [inference] Add smoothquant for llama (hpcaitech#4904)
* [inference] add int8 rotary embedding kernel for smoothquant (hpcaitech#4843)
* [inference] add smoothquant llama attention (hpcaitech#4850)
* add smoothquant llama attention
* remove useless code
* remove useless code
* fix import error
* rename file name
* [inference] add silu linear fusion for smoothquant llama mlp (hpcaitech#4853)
* add silu linear
* update skip condition
* catch smoothquant cuda lib exception
* process exception for tests
* [inference] add llama mlp for smoothquant (hpcaitech#4854)
* add llama mlp for smoothquant
* fix down out scale
* remove duplicate lines
* add llama mlp check
* delete useless code
* [inference] add smoothquant llama (hpcaitech#4861)
* add smoothquant llama
* fix attention accuracy
* fix accuracy
* add kv cache and save pretrained
* refactor example
* delete smooth
* refactor code
* [inference] add smooth function and delete useless code for smoothquant (hpcaitech#4895)
* add smooth function and delete useless code
* update datasets
* remove duplicate import
* delete useless file
* refactor codes (hpcaitech#4902)
* refactor code
* add license
* add torch-int and smoothquant license
* Update flash_attention_patch.py to be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to the forward function of the attention layer (huggingface/transformers#25598)
* [kernel] support pure fp16 for cpu adam and update gemini optim tests (hpcaitech#4921)
* [kernel] support pure fp16 for cpu adam (hpcaitech#4896)
* [kernel] fix cpu adam kernel for pure fp16 and update tests (hpcaitech#4919)
* [kernel] fix cpu adam
* [test] update gemini optim test
* [format] applied code formatting on changed files in pull request 4908 (hpcaitech#4918)
Co-authored-by: github-actions <[email protected]>
* [gemini] support gradient accumulation (hpcaitech#4869)
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* clean codes
* update cpu adam fp16 case
* [hotfix] fix torch 2.0 compatibility (hpcaitech#4936)
* [hotfix] fix launch
* [test] fix test gemini optim
* [shardformer] fix vit
* [test] add no master test for low level zero plugin (hpcaitech#4934)
* [format] applied code formatting on changed files in pull request 4820 (hpcaitech#4886)
Co-authored-by: github-actions <[email protected]>
* [nfc] fix some typos in colossalai/ docs/ etc. (hpcaitech#4920)
* [Refactor] Integrated some lightllm kernels into token-attention (hpcaitech#4946)
* add some req for inference
* clean codes
* add codes
* add some lightllm deps
* clean codes
* hello
* delete rms files
* add some comments
* add comments
* add doc
* add lightllm deps
* add lightllm chatglm2 kernels
* add lightllm chatglm2 kernels
* replace rotary embedding with lightllm kernel
* add some comments
* add some comments
* add some comments
* add
* replace fwd kernel att1
* fix an arg
* add
* add
* fix token attention
* add some comments
* clean codes
* modify comments
* fix readme
* fix bug
* fix bug
---------
Co-authored-by: cuiqing.li <[email protected]>
Co-authored-by: CjhHa1 <[email protected]>
* [test] merge old components to test to model zoo (hpcaitech#4945)
* [test] add custom models in model zoo
* [test] update legacy test
* [test] update model zoo
* [test] update gemini test
* [test] remove components to test
* [inference] add reference and fix some bugs (hpcaitech#4937)
* add reference and fix some bugs
* update gptq init
---------
Co-authored-by: Xu Kai <[email protected]>
* [Inference] Add Bench Chatglm2 script (hpcaitech#4963)
* add bench chatglm
* fix bug and make utils
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [Pipeline inference] Combine kvcache with pipeline inference (hpcaitech#4938)
* merge kvcache with pipeline inference and refactor the code structure
* support ppsize > 2
* refactor pipeline code
* do pre-commit
* modify benchmark
* fix benchmark
* polish code
* add docstring and update readme
* refactor the code
* fix some logic bugs of ppinfer
* polish readme
* fix typo
* skip infer test
* updated c++17 compiler flags (hpcaitech#4983)
* [Inference] Dynamic Batching Inference, online and offline (hpcaitech#4953)
* [inference] Dynamic Batching for Single and Multiple GPUs (hpcaitech#4831)
* finish batch manager
* 1
* first
* fix
* fix dynamic batching
* llama infer
* finish test
* support generating different lengths
* del prints
* del prints
* fix
* fix bug
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [inference] Async dynamic batching (hpcaitech#4894)
* finish input and output logic
* add generate
* test forward
* 1
* [inference] Re push async dynamic batching (hpcaitech#4901)
* adapt to ray server
* finish async
* finish test
* del test
---------
Co-authored-by: yuehuayingxueluo <[email protected]>
* Revert "[inference] Re push async dynamic batching (hpcaitech#4901)" (hpcaitech#4905). This reverts commit fbf3c09.
* Revert "[inference] Async dynamic batching (hpcaitech#4894)". This reverts commit fced140.
* Revert "[inference] Async dynamic batching (hpcaitech#4894)" (hpcaitech#4909). This reverts commit fced140.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix "No module named 'pydantic'" bug in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* [infer] Add Ray Distributed Environment Init Scripts (hpcaitech#4911)
* Revert "[inference] Async dynamic batching (hpcaitech#4894)". This reverts commit fced140.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix "No module named 'pydantic'" bug in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* support dynamic batch for bloom model and is_running function
* [Inference] Test for new Async engine (hpcaitech#4935)
* infer engine
* infer engine
* test engine
* test engine
* new manager
* change step
* add
* test
* fix
* fix
* finish test
* finish test
* finish test
* finish test
* add license
---------
Co-authored-by: yuehuayingxueluo <[email protected]>
* add assertion for config (hpcaitech#4947)
* [Inference] Finish dynamic batching offline test (hpcaitech#4948)
* test
* fix test
* fix quant
* add default
* fix
* fix some bugs
* fix some bugs
* fix
* fix bug
* fix bugs
* reset param
---------
Co-authored-by: yuehuayingxueluo <[email protected]>
Co-authored-by: Cuiqing Li <[email protected]>
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [Kernels] Updated Triton kernels to 2.1.0 and added flash-decoding for llama token attention (hpcaitech#4965)
* adding flash-decoding
* clean
* adding kernel
* adding flash-decoding
* add integration
* add
* adding kernel
* adding kernel
* adding triton 2.1.0 features for inference
* update bloom triton kernel
* remove useless vllm kernels
* clean codes
* fix
* adding files
* fix readme
* update llama flash-decoding
---------
Co-authored-by: cuiqing.li <[email protected]>
* fix ColossalEval (hpcaitech#4992)
Co-authored-by: Xu Yuanchen <[email protected]>
* [doc] Update doc for colossal-inference (hpcaitech#4989)
* update doc
* Update README.md
---------
Co-authored-by: cuiqing.li <[email protected]>
* [hotfix] Fix the bug where process groups were not being properly released. (hpcaitech#4940)
* Fix the bug where process groups were not being properly released.
* test
* Revert "test". This reverts commit 479900c.
* [hotfix] fix the bug of repeatedly storing param group (hpcaitech#4951)
* [doc] add supported feature diagram for hybrid parallel plugin (hpcaitech#4996)
* [Pipeline Inference] Merge pp with tp (hpcaitech#4993)
* refactor pipeline into new CaiInferEngine
* update llama modeling forward
* merge tp with pp
* update docstring
* optimize test workflow and example
* fix typo
* add assert and todo
* [release] update version (hpcaitech#4995)
* [release] update version
* [hotfix] fix ci
* [gemini] gemini support tp
* fix
* update checkpointIO
* support fused layernorm
* update fusedlayernorm
* add sequence parallel to gemini
* fix
* fix comments
* fix
* fix t5
* clear cache
* fix
* activate ci
* activate ci
* fix
* fix
* fix
* fix
* revert
* modify tp gather method
* fix test
---------
Co-authored-by: Xu Kai <[email protected]>
Co-authored-by: Zian(Andy) Zheng <[email protected]>
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Baizhou Zhang <[email protected]>
Co-authored-by: Zhongkai Zhao <[email protected]>
Co-authored-by: digger yu <[email protected]>
Co-authored-by: Cuiqing Li <[email protected]>
Co-authored-by: cuiqing.li <[email protected]>
Co-authored-by: CjhHa1 <[email protected]>
Co-authored-by: Xu Kai <[email protected]>
Co-authored-by: Jianghai <[email protected]>
Co-authored-by: Bin Jia <[email protected]>
Co-authored-by: アマデウス <[email protected]>
Co-authored-by: yuehuayingxueluo <[email protected]>
Co-authored-by: Yuanchen <[email protected]>
Co-authored-by: Xu Yuanchen <[email protected]>
Co-authored-by: littsk <[email protected]>
Co-authored-by: ppt0011 <[email protected]>
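The headline change above, "[gemini] gemini support tp", lets the Gemini (chunk/ZeRO-based) plugin shard supported models with tensor parallelism through the booster API. Below is a minimal sketch of how this might be driven; the `tp_size` and `enable_fused_normalization` arguments are assumptions inferred from the bullet list (TP support and fused layernorm), not a verified signature, and the script assumes a multi-GPU launch via torchrun.

```python
# Minimal sketch (not the verified API of this commit): a shardformer-supported
# model under the Gemini plugin with tensor parallelism.
# Assumptions: GeminiPlugin accepts `tp_size` (added around hpcaitech#4942) and
# `enable_fused_normalization` for the fused-layernorm path mentioned above.
# Launch with: torchrun --nproc_per_node=4 gemini_tp_sketch.py
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch(config={})  # older releases expect a (possibly empty) config dict

model = GPT2LMHeadModel(GPT2Config(n_layer=2))   # any model with a shardformer policy
optimizer = HybridAdam(model.parameters(), lr=1e-4)

plugin = GeminiPlugin(
    precision="fp16",
    tp_size=2,                        # assumed tensor-parallel degree knob
    enable_fused_normalization=True,  # assumed fused layernorm toggle
)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)

input_ids = torch.randint(0, 50257, (2, 128), device="cuda")
loss = model(input_ids=input_ids, labels=input_ids).loss
booster.backward(loss, optimizer)  # Gemini handles fp16 loss scaling and chunked grads
optimizer.step()
optimizer.zero_grad()
```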
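Similarly, the "[gemini] support gradient accumulation (hpcaitech#4869)" item adds gradient accumulation to Gemini's backward hook. The sketch below shows the intended usage pattern, assuming a freshly constructed `model` and `optimizer` as in the previous sketch; `enable_gradient_accumulation` is an assumed name for the argument that change introduces, and the loop itself is the standard accumulate-then-step pattern.

```python
# Sketch only: gradient accumulation with Gemini, per
# "[gemini] support gradient accumulation (hpcaitech#4869)".
# Assumes `model` and `optimizer` are freshly built as in the previous sketch
# (i.e. not yet boosted); `enable_gradient_accumulation` is an assumed flag name.
accum_steps = 4
plugin = GeminiPlugin(precision="fp16", enable_gradient_accumulation=True)  # assumed flag
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)

micro_batches = [torch.randint(0, 50257, (2, 128), device="cuda") for _ in range(8)]
for step, input_ids in enumerate(micro_batches):
    loss = model(input_ids=input_ids, labels=input_ids).loss / accum_steps  # scale per micro-batch
    booster.backward(loss, optimizer)            # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per accumulation window
        optimizer.zero_grad()
```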