## Overview
EnergonAI is a service framework for large-scale model inference, powered by ColossalAI. It supports large model inference with tensor parallelism and pipeline parallelism. The flagship example of this release is serving OPT: you can serve OPT-175B conveniently with EnergonAI.
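For context, the OPT example ships an HTTP server (opt_server.py, see #149) whose generate API accepts top_k, top_p, and temperature (#121). Below is a minimal Python client sketch for such a server; the host, port, endpoint path, and the prompt/max_tokens field names are illustrative assumptions, not confirmed by this changelog:

```python
# Minimal client sketch for a running EnergonAI OPT server.
# The sampling parameters top_k, top_p, and temperature match the
# generate API added in #121; everything else here (host, port,
# endpoint path, field names) is an illustrative assumption.
import requests

payload = {
    "prompt": "Deep learning is",   # hypothetical field name
    "max_tokens": 64,               # hypothetical field name
    "top_k": 50,
    "top_p": 0.9,
    "temperature": 0.7,
}

resp = requests.post("http://localhost:8020/generation", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```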
## What's Changed
- add InferenceEngine. by @dujiangsu in #1
- test engine switch func, it does not support pipeline now by @dujiangsu in #2
- Feature/pipeline by @dujiangsu in #3
- fp16 support by @dujiangsu in #4
- timer utils by @dujiangsu in #5
- deal with no pipeline by @dujiangsu in #6
- gen new ncclid and broadcast to devices in the same Tensor Parallelis… by @dujiangsu in #8
- evaluation by @dujiangsu in #9
- Feature/activation reuse by @dujiangsu in #10
- make require_grad to false by @dujiangsu in #11
- make the distributed program a single entrance by @dujiangsu in #12
- make host and port arguments by @dujiangsu in #13
- triton run by @dujiangsu in #14
- make rpc shutdown correctly by @dujiangsu in #15
- scale mask softmax kernel by @dujiangsu in #16
- Feature/examples by @dujiangsu in #17
- gpt model sync by @dujiangsu in #18
- Feature/example by @dujiangsu in #19
- example update by @dujiangsu in #20
- Feature/example by @dujiangsu in #21
- Update _operation.py by @MaruyamaAya in #22
- del useless by @dujiangsu in #24
- checkpoint function by @MaruyamaAya in #25
- fixed bugs with checkpoint path check by @MaruyamaAya in #26
- Md edit by @MaruyamaAya in #28
- make server correct by @dujiangsu in #29
- fixed bugs with bert checkpoint by @MaruyamaAya in #30
- add checkpoint function by @dujiangsu in #31
- Lzm develop by @MaruyamaAya in #33
- Lzm develop by @MaruyamaAya in #34
- add more bert model, add rm_padding func by @dujiangsu in #35
- update README by @dujiangsu in #36
- bert redundant computation, new examples dir and we will delete examp… by @dujiangsu in #38
- add bert example by @dujiangsu in #39
- Feature/variable len by @dujiangsu in #40
- block without timeout by @dujiangsu in #41
- rpc_worker, return results in order by @dujiangsu in #42
- return in order by @dujiangsu in #43
- batch manager by ziming liu by @dujiangsu in #44
- Add comments to Batch Manager by @MaruyamaAya in #46
- correctness for tp only by @dujiangsu in #45
- move tokenizer out of manager and move select_top_k in to models by @MaruyamaAya in #47
- enable fp16 kernel by @dujiangsu in #48
- Delete example directory by @MaruyamaAya in #49
- fixed batch manager bugs and reformat codes by @MaruyamaAya in #50
- Reformat the codes by @MaruyamaAya in #51
- fix parameter mistake by @MaruyamaAya in #52
- update rm_padding in batch manager by @dujiangsu in #53
- update hf_gpt2 by @dujiangsu in #54
- add seq len and test api by @MaruyamaAya in #55
- combine two pipeline wrapper by @dujiangsu in #56
- update bert by @dujiangsu in #57
- version compatibility by @dujiangsu in #58
- modify batch manager by @MaruyamaAya in #59
- update requirement by @dujiangsu in #60
- add warm up phase for profiler by @MaruyamaAya in #61
- refactor batch manager by @MaruyamaAya in #62
- Update README.md by @dujiangsu in #63
- readme update by @dujiangsu in #64
- readme update by @dujiangsu in #65
- Update README.md by @dujiangsu in #66
- some details by @dujiangsu in #67
- some details by @dujiangsu in #68
- Update README.md by @dujiangsu in #70
- delete header error by @dujiangsu in #69
- update README by @dujiangsu in #71
- Update README.md by @dujiangsu in #72
- vit example by @dujiangsu in #73
- update vit example and update logging by @dujiangsu in #74
- Update README.md by @dujiangsu in #76
- gpt return correct data structure by @dujiangsu in #75
- change project name by @dujiangsu in #77
- make config globally available by @dujiangsu in #78
- update metaconfig by @dujiangsu in #79
- update metaconfig by @dujiangsu in #80
- Feature/trt by @dujiangsu in #81
- Link TensorRT as backend for single device execution by @dujiangsu in #82
- update readme by @dujiangsu in #83
- update readme by @dujiangsu in #84
- refactor batch manager related files by @MaruyamaAya in #85
- add comments and delete unnecessary codes. by @MaruyamaAya in #86
- update Readme by @dujiangsu in #87
- update Readme by @dujiangsu in #88
- update batcher for pipeline by @dujiangsu in #91
- fix batch wrapping bug by @MaruyamaAya in #92
- add offload manager by @MaruyamaAya in #93
- add basic model as component by @dujiangsu in #94
- match checkpoint for opt by @dujiangsu in #95
- fix bug in batch manager by @dujiangsu in #96
- timer that ignores the first func by @dujiangsu in #97
- update readme by @dujiangsu in #98
- Temporarily stop kernels for correctness by @dujiangsu in #101
- modify offload manager and add linear example by @MaruyamaAya in #103
- add linear func by @oahzxl in #105
- [docker] add dockerfile and change hardcode path by @feifeibear in #107
- change rpc timeout by @dujiangsu in #108
- [docker] add test_query.sh and update docker file by @feifeibear in #109
- add opt example by @ver217 in #111
- [NFC] global var should be in uppercase by @feifeibear in #112
- refactor load_checkpoint by @ver217 in #113
- refactor tp load checkpoint by @ver217 in #114
- fix hf gpt2 example by @ver217 in #115
- refactor opt server api by @ver217 in #116
- add benchmark by @ver217 in #117
- [example] refactor opt by @ver217 in #118
- [docker] polish launch docker scripts by @feifeibear in #119
- [model] support topk, topp, temperature when generating by @ver217 in #120
- [opt] generate api receives top_k, top_p, and temperature by @ver217 in #121
- add serving queue by @dujiangsu in #122
- make the generation task within the engine by @dujiangsu in #123
- [opt] add async executor by @ver217 in #124
- [opt] add left padding by @ver217 in #126
- [opt] add 175b model by @ver217 in #127
- [opt] remove useless api by @ver217 in #128
- [hotfix] fix worker server shutdown by @ver217 in #129
- Generation model feature: add cache for removing the repeated computation in the loop. by @dujiangsu in #130
- [opt] add data validator and cors middleware by @ver217 in #132
- add a flag for disable cache by @dujiangsu in #131
- [opt] executor update making batch policy by @ver217 in #133
- improve the cache implementation by @dujiangsu in #134
- [opt] add cache and modify api by @ver217 in #135
- fit the opt_66B by @dujiangsu in #136
- 66B model load checkpoint by @dujiangsu in #137
- [opt] allow disabling multi procs loading by @ver217 in #140
- [opt] fit opt-175b by @ver217 in #139
- [opt] refactor server by @ver217 in #142
- processing 66b ckpt by @dujiangsu in #141
- [docker] rm source code after pip install by @feifeibear in #143
- [opt] add timeout option by @ver217 in #145
- [opt] add queue size option by @ver217 in #146
- update readme by @dujiangsu in #144
- update readme by @dujiangsu in #148
- Add prometheus endpoint for opt_server.py by @ofey404 in #149
- prometheus by @feifeibear in #150
- [nn] replace energonai.nn with colossalai.nn by @ver217 in #147
- [logging] remove logging module by @ver217 in #153
- [utils] remove useless utils by @ver217 in #154
- [utils] fix checkpointing import by @ver217 in #155
- [setup] add version control by @ver217 in #156
## New Contributors
- @dujiangsu made their first contribution in #1
- @MaruyamaAya made their first contribution in #22
- @oahzxl made their first contribution in #105
- @feifeibear made their first contribution in #107
- @ver217 made their first contribution in #111
- @ofey404 made their first contribution in #149
**Full Changelog**: https://github.com/hpcaitech/EnergonAI/commits/v0.0.1