[infer]Merge inference code to main branch #4576

isky-cd · 2023-08-31T14:00:24Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with keywords like "fixed" to automatically close the linked issue upon merge.

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

… and kv-cache manager (hpcaitech#4485)
* added _vllm_rms_norm
* change place
* added tests
* added tests
* modify
* adding kernels
* added tests:
* adding kernels
* modify
* added
* updating kernels
* adding tests
* added tests
* kernel change
* submit
* modify
* added
* edit comments
* change name
* change comments and fix import
* add
* added

* added _vllm_rms_norm
* change place
* added tests
* added tests
* modify
* adding kernels
* added tests:
* adding kernels
* modify
* added
* updating kernels
* adding tests
* added tests
* kernel change
* submit
* modify
* added
* edit comments
* change name
* change comments and fix import
* add
* added
* fix
* add ops into __init__.py
* add

* add kv cache memory manager
* add stateinfo during inference
* add
* add infer example
* finish
* finish
* format
* format
* rename file
* add kv cache test
* revise on BatchInferState
* add inference test for llama
* fix conflict
* feature: add some new features for llama engine
* adapt colossalai triton interface
* Change the parent class of llama policy
* add nvtx
* move llama inference code to tensor_parallel
* fix __init__.py
* rm tensor_parallel
* fix: fix bugs in auto_policy.py
* fix: rm some unused codes
* mv colossalai/tpinference to colossalai/inference/tensor_parallel
* change __init__.py
* save change
* fix engine
* Bug fix: Fix hang
* remove llama_infer_engine.py
---------
Co-authored-by: yuanheng-zhao
Co-authored-by: CjhHa1

* add bloom inference methods and policy
* enable pass BatchInferState from model forward
* revise bloom infer layers/policies
* add engine for inference (draft)
* add test for bloom infer
* fix bloom infer policy and flow
* revise bloom test
* fix bloom file path
* remove unused codes
* fix bloom modeling
* fix dir typo
* fix trivial
* fix policy
* clean pr
* trivial fix

* add bloom inference methods and policy
* enable pass BatchInferState from model forward
* revise bloom infer layers/policies
* add engine for inference (draft)
* add test for bloom infer
* fix bloom infer policy and flow
* revise bloom test
* fix bloom file path
* remove unused codes
* fix bloom modeling
* fix dir typo
* fix trivial
* fix policy
* clean pr
* trivial fix
* trivial

* add kv cache memory manager
* add stateinfo during inference
* add
* add infer example
* finish
* finish
* format
* format
* rename file
* add kv cache test
* revise on BatchInferState
* add inference test for llama
* fix conflict
* feature: add some new features for llama engine
* adapt colossalai triton interface
* Change the parent class of llama policy
* add nvtx
* move llama inference code to tensor_parallel
* fix __init__.py
* rm tensor_parallel
* fix: fix bugs in auto_policy.py
* fix: rm some unused codes
* mv colossalai/tpinference to colossalai/inference/tensor_parallel
* change __init__.py
* save change
* fix engine
* Bug fix: Fix hang
* remove llama_infer_engine.py
* bug fix: fix bugs about infer_state.is_context_stage
* remove policies
* fix: delete unused code
* fix: delete unused code
* remove unused code
* fix conflict
---------
Co-authored-by: yuanheng-zhao
Co-authored-by: CjhHa1

tiandiao123 · 2023-08-31T14:26:12Z

Moved to another PR. @yuehuayingxueluo

CjhHa1 and others added 28 commits August 24, 2023 15:42

[infer] Infer/llama demo (hpcaitech#4503)

c427366

* add * add infer example * finish * finish * stash * fix

[Kernels] add inference token attention kernel (hpcaitech#4505)

222953a

* add token forward * fix tests * fix comments * add try import triton * add adapted license * add tests check

combine codes (hpcaitech#4509)

64110b1

[feature] add KV cache manager for llama & bloom inference (hpcaitech…

2226c68

…#4495) * add kv cache memory manager * add stateinfo during inference * format * format * rename file * add kv cache test * revise on BatchInferState * file dir change

[Infer] Add TPInferEngine and fix file path (hpcaitech#4532)

35af65d

* add engine for TP inference * move file path * update path * fix TPInferEngine * remove unused file * add engine test demo * revise TPInferEngine * fix TPInferEngine, add test * fix

Revert "[infer] Add Bloom inference policy and replaced methods (hpca…

8b08b2d

…itech#4512)" (hpcaitech#4552) This reverts commit 17cfa57.

[Doc] Add colossal inference doc (hpcaitech#4549)

1d8f78e

* create readme * add readme.md * fix typos

[doc] add colossal inference fig (hpcaitech#4554)

e439499

* create readme * add readme.md * fix typos * upload fig

[NFC] fix docstring for colossal inference (hpcaitech#4555)

ea29fb3

Fix docstring and comments in kv cache manager and bloom modeling

fix docstring in llama modeling (hpcaitech#4557)

18335ac

Merge branch 'feature/colossal-inference' into merge_inference

56a1172

[Infer] check import vllm (hpcaitech#4559)

0852a38

* change import vllm * import apply_rotary_pos_emb * change import location

Merge branch 'feature/colossal-inference' into merge_inference

b641e1b

[DOC] add installation req (hpcaitech#4561)

da8cff6

* add installation req * fix * slight change * remove empty

[Feature] rms-norm transfer into inference llama.py (hpcaitech#4563)

8c0f81f

* add installation req * fix * slight change * remove empty * add rmsnorm policy * add * clean codes

[infer] Fix tp inference engine (hpcaitech#4564)

c973827

* fix engine prepare data * add engine test * use bloom for testing * revise on test * revise on test

reset shardformer llama (hpcaitech#4569)

c13911f

bug fix: raise message for apply_rotary_pos_emb

4ad5819

fix conflict

164972e

[infer] Fix engine - tensors on different devices (hpcaitech#4570)

f1b4e02

* fix diff device in engine

bug fix: fix bugs in test_infer_engine.py

89f9540

Merge branch 'feature/colossal-inference' into merge_inference

76ce3a5

tiandiao123 closed this Aug 31, 2023
