This repository has been archived by the owner on Oct 16, 2023. It is now read-only.
Releases: hpcaitech/EnergonAI
V0.0.1 Released Today!
Overview
EnergonAI is a service framework for large-scale model inference, powered by ColossalAI. It supports large model inference with tensor parallelism and pipeline parallelism. The flagship example of this release is serving OPT: you can serve OPT-175B conveniently using EnergonAI.
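The tensor parallelism mentioned above works by sharding each weight matrix across devices so that no single device has to hold the full model. A minimal pure-Python sketch of the idea behind column-parallel matrix multiplication (all names here are illustrative, not EnergonAI's actual API):

```python
# Illustrative sketch of column-parallel tensor parallelism:
# each "device" holds a column slice of the weight matrix W,
# computes its partial output, and the slices are concatenated
# (an all-gather in a real distributed system).

def matmul(x, w):
    # x: 1 x k vector, w: k x n matrix (list of rows)
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]

def split_columns(w, parts):
    # split a k x n matrix into `parts` equal column slices
    n = len(w[0])
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def column_parallel_matmul(x, w, parts):
    # each shard computes a slice of the output independently;
    # concatenating the slices recovers the full result
    out = []
    for shard in split_columns(w, parts):
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
# sharded result matches the unsharded matmul
assert column_parallel_matmul(x, w, 2) == matmul(x, w)
```

Pipeline parallelism is complementary: instead of splitting individual layers, it places consecutive groups of layers on different devices and streams micro-batches through them.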
What's Changed
- add InferenceEngine. by @dujiangsu in #1
- test engine switch func, it does not support pipeline now by @dujiangsu in #2
- Feature/pipeline by @dujiangsu in #3
- fp16 support by @dujiangsu in #4
- timer utils by @dujiangsu in #5
- deal with no pipeline by @dujiangsu in #6
- gen new ncclid and broadcast to devices in the same Tensor Parallelis… by @dujiangsu in #8
- evaluation by @dujiangsu in #9
- Feature/activation reuse by @dujiangsu in #10
- make require_grad to false by @dujiangsu in #11
- make the distributed program a single entrance by @dujiangsu in #12
- make host and port the argument by @dujiangsu in #13
- triton run by @dujiangsu in #14
- make rpc shutdown correctly by @dujiangsu in #15
- scale mask softmax kernel by @dujiangsu in #16
- Feature/examples by @dujiangsu in #17
- gpt model sync by @dujiangsu in #18
- Feature/example by @dujiangsu in #19
- example update by @dujiangsu in #20
- Feature/example by @dujiangsu in #21
- Update _operation.py by @MaruyamaAya in #22
- del useless by @dujiangsu in #24
- checkpoint function by @MaruyamaAya in #25
- fixed bugs with checkpoint path check by @MaruyamaAya in #26
- Md edit by @MaruyamaAya in #28
- make server correct by @dujiangsu in #29
- fixed bugs with bert checkpoint by @MaruyamaAya in #30
- add checkpoint function by @dujiangsu in #31
- Lzm develop by @MaruyamaAya in #33
- Lzm develop by @MaruyamaAya in #34
- add more bert model, add rm_padding func by @dujiangsu in #35
- update README by @dujiangsu in #36
- bert redundant computation, new examples dir and we will delete examp… by @dujiangsu in #38
- add bert example by @dujiangsu in #39
- Feature/variable len by @dujiangsu in #40
- block without timeout by @dujiangsu in #41
- rpc_worker, return results in order by @dujiangsu in #42
- return in order by @dujiangsu in #43
- batch manager by ziming liu by @dujiangsu in #44
- Add comments to Batch Manager by @MaruyamaAya in #46
- correctness for tp only by @dujiangsu in #45
- move tokenizer out of manager and move select_top_k in to models by @MaruyamaAya in #47
- enable fp16 kernel by @dujiangsu in #48
- Delete example directory by @MaruyamaAya in #49
- fixed batch manager bugs and reformat codes by @MaruyamaAya in #50
- Reformat the codes by @MaruyamaAya in #51
- fix parameter mistake by @MaruyamaAya in #52
- update rm_padding in batch manager by @dujiangsu in #53
- update hf_gpt2 by @dujiangsu in #54
- add seq len and test api by @MaruyamaAya in #55
- combine two pipeline wrapper by @dujiangsu in #56
- update bert by @dujiangsu in #57
- version compatibility by @dujiangsu in #58
- modify batch manager by @MaruyamaAya in #59
- update requirement by @dujiangsu in #60
- add warm up phase for profiler by @MaruyamaAya in #61
- refactor batch manager by @MaruyamaAya in #62
- Update README.md by @dujiangsu in #63
- readme update by @dujiangsu in #64
- readme update by @dujiangsu in #65
- Update README.md by @dujiangsu in #66
- some details by @dujiangsu in #67
- some details by @dujiangsu in #68
- Update README.md by @dujiangsu in #70
- delete header error by @dujiangsu in #69
- update README by @dujiangsu in #71
- Update README.md by @dujiangsu in #72
- vit example by @dujiangsu in #73
- update vit example and update logging by @dujiangsu in #74
- Update README.md by @dujiangsu in #76
- gpt return correct data structure by @dujiangsu in #75
- change project name by @dujiangsu in #77
- make config globally available by @dujiangsu in #78
- update metaconfig by @dujiangsu in #79
- update metaconfig by @dujiangsu in #80
- Feature/trt by @dujiangsu in #81
- Link TensorRT as backend for single device execution by @dujiangsu in #82
- update readme by @dujiangsu in #83
- update readme by @dujiangsu in #84
- refactor batch manager related files by @MaruyamaAya in #85
- add comments and delete unnecessary codes. by @MaruyamaAya in #86
- update Readme by @dujiangsu in #87
- update Readme by @dujiangsu in #88
- update batcher for pipeline by @dujiangsu in #91
- fix batch wrapping bug by @MaruyamaAya in #92
- add offload manager by @MaruyamaAya in #93
- add basic model as component by @dujiangsu in #94
- match checkpoint for opt by @dujiangsu in #95
- fix bug in batch manager by @dujiangsu in #96
- timer that ignores the first func by @dujiangsu in #97
- update readme by @dujiangsu in #98
- Temporarily stop kernels for correctness by @dujiangsu in #101
- modify offload manager and add linear example by @MaruyamaAya in #103
- add linear func by @oahzxl in #105
- [docker] add dockerfile and change hardcode path by @feifeibear in #107
- change rpc timeout by @dujiangsu in #108
- [docker] add test_query.sh and update docker file by @feifeibear in #109
- add opt example by @ver217 in #111
- [NFC] global var should be in uppercase by @feifeibear in #112
- refactor load_checkpoint by @ver217 in #113
- refactor tp load checkpoint by @ver217 in #114
- fix hf gpt2 example by @ver217 in #115
- refactor opt server api by @ver217 in #116
- add benchmark by @ver217 in #117
- [example] refactor opt by @ver217 in #1...