0.23.0-Alpha Release
Pre-release
This release fixes several issues in the DJLServing library and also brings some new features:
- Support loading from a workflow directory #714
- Fixed MME support with HF_MODEL_ID #712
- Added parallel loading for python models #770
- Fixed device mismatch issue #805
And more
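
As an illustrative sketch of the improved Hugging Face model loading that #712 refactors, a model can be pointed at a Hugging Face Hub ID through a `serving.properties` file; the model ID and option values below are placeholders, not part of this release's changes:

```
engine=Python
option.model_id=google/flan-t5-small
option.tensor_parallel_degree=1
```

The same model ID can alternatively be supplied via the `HF_MODEL_ID` environment variable, which is the path the MME fix above builds on.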
What's Changed
- [serving] Adds workflow model loading for SageMaker by @frankfliu in #661
- [workflow] Allows model being shared between workflows by @frankfliu in #665
- [python] prints out error message if pip install failed by @frankfliu in #666
- update to djl 0.23.0 by @siddvenk in #668
- [docker] Fixes fastertransformer docker file by @frankfliu in #671
- [kserve] Fixes unit test for extra data type by @frankfliu in #673
- install fixed version for transformers and accelerate by @lanking520 in #672
- [ci] add performance testing by @tosterberg in #558
- add numpy fix by @lanking520 in #674
- SM Training job changes for AOT by @sindhuvahinis in #667
- Create model dir to prevent issues with no code experience in SageMaker by @siddvenk in #675
- Don't mount model dir for no code tests by @siddvenk in #676
- AOT upload checkpoints tests by @sindhuvahinis in #678
- [INF2][DLC] Update Neuron to 2.10 by @lanking520 in #681
- add stable diffusion support on INF2 by @lanking520 in #683
- [CI] add small fixes by @lanking520 in #684
- Add HuggingFace TGI publish and test pipeline by @xyang16 in #650
- Add shared memory arg to docker launch command in README by @rohithkrn in #685
- Update github-slug-action to v4.4.1 by @xyang16 in #686
- unset omp thread to prevent CLIP model delay by @lanking520 in #688
- Change the bucket for different object by @sindhuvahinis in #691
- [ci] make performance tests run in parallel by @tosterberg in #690
- [api] Update ChunkedBytesSupplier API by @frankfliu in #692
- [console] Fixes log file charset issue by @frankfliu in #693
- add neuronx new feature for generation by @lanking520 in #694
- [tgi] Add more models to TGI test pipeline by @xyang16 in #695
- [INF2] adding clip model support by @lanking520 in #696
- [plugin] Include djl s3 extension in djl-serving distribution by @frankfliu in #699
- [INF2] add bf16 support to SD by @lanking520 in #700
- [ci] Upgrade spotbugs to 5.0.14 by @frankfliu in #704
- Add support for streaming Seq2Seq models by @rohithkrn in #698
- add SageMaker MCE support by @lanking520 in #706
- fix the device mapping issue if visible devices is set by @lanking520 in #707
- fix the start gpu bug by @lanking520 in #709
- [INF2] give better room for more tokens by @lanking520 in #710
- bump up n positions by @lanking520 in #713
- Refactor logic for supporting HF_MODEL_ID to support MME use case by @siddvenk in #712
- [ci] reconfigure performance test time and machines by @tosterberg in #711
- [workflow] Support load model from workflow directory by @frankfliu in #714
- Add support for seq2seq model loading in HF handler by @rohithkrn in #715
- Add unit test for empty model store initialization by @siddvenk in #716
- Fix no code tests in lmi test suite by @siddvenk in #717
- [serving] Load function from workflow directory by @frankfliu in #718
- [test] Reformat python code by @frankfliu in #720
- Creates S3 Cache Engine by @zachgk in #719
- [test] Refactor client.py by @frankfliu in #721
- update fastertransformers build instruction by @lanking520 in #722
- Add seq2seq streaming integ test by @rohithkrn in #724
- [test] Update transformers-neuronx gpt-j-6b model options by @frankfliu in #723
- [DeepSpeed][INF2] add vision components by @lanking520 in #725
- [python] Support pip install in offline mode by @frankfliu in #729
- [python] Add --no-index to pip install in offline mode by @frankfliu in #731
- adding llama model support by @lanking520 in #727
- tokenizer bug fixes by @lanking520 in #732
- [FT] change the dependencies by @lanking520 in #734
- Remove TGI build and test pipeline by @xyang16 in #735
- ft_handler fix by @rohithkrn in #736
- [docker] Uses the same convention as tritonserver by @frankfliu in #738
- [ci] Upgrade jacoco to 0.8.8 to support JDK17+ by @frankfliu in #739
- [python] Fixes typo in fastertransformer handler by @frankfliu in #740
- [python] Adds text/plain content-type support by @frankfliu in #741
- [serving] Avoid unit-test hang by @frankfliu in #744
- Skeleton structure for sequence batch scheduler by @sindhuvahinis in #745
- update the wheel to have path fixed by @lanking520 in #747
- Adding project diagrams link to architecture.md by @alexkarezin in #742
- Add SageMaker integration test by @siddvenk in #705
- [python] Handle torch.cuda.OutOfMemoryError by @frankfliu in #749
- fix permissions for sm pysdk install script by @siddvenk in #751
- [serving] Improves model loading logging by @frankfliu in #750
- Asynchronous with PublisherBytesSupplier by @zachgk in #730
- [cache] Rename env var DDB_TABLE_NAME to SERVING_DDB_TABLE_NAME by @frankfliu in #753
- [serving] Sets default minWorkers to 1 for GPU python model by @frankfliu in #755
- SM AOT Tests by @sindhuvahinis in #756
- [docker] Pin bitsandbytes version to 0.38.1 by @xyang16 in #754
- [fix] bump versions for new deepspeed wheel by @tosterberg in #733
- [fix] Fix bitsandbytes pip install by @xyang16 in #758
- [serving] Fixes log message by @frankfliu in #765
- add triton components in the nightly by @lanking520 in #767
- Add mme tests to sagemaker tests by @siddvenk in #763
- [wlm] Adds more logs to LMI engine detection by @frankfliu in #766
- fix typos with get default bucket prefix for sm session by @siddvenk in #768
- [serving] Uses predictable model name for HF model by @frankfliu in #771
- [serving] Adds parallel loading support for Python engine by @frankfliu in #770
- [console] files are not required in form data by @frankfliu in #773
- [docker] Avoid auto setting OMP_NUM_THREADS for GPU/INF docker images by @frankfliu in #774
- [wlm] Sets default maxWorkers based on OMP_NUM_THREADS by @frankfliu in #776
- Upload SM benchmark metrics to cloudwatch by @sindhuvahinis in #769
- [python] Support non-gpu models for huggingface by @frankfliu in #772
- Fixes integration test by @frankfliu in #779
- [python] Adjusts MPI workers based on CUDA_VISIBLE_DEVICES by @frankfliu in #782
- [Stream] use huggingface standard generation for tnx by @lanking520 in #778
- Option to run only the lmi tests needed by @sindhuvahinis in #786
- add trust remote code option by @lanking520 in #781
- remove inf1 support and upgrade some package versions by @lanking520 in #785
- [python] Handles invalid return type case by @frankfliu in #790
- Remove hardcoded version in Assertion error by @sindhuvahinis in #789
- Add support for testing nightly images in sagemaker endpoint tests by @siddvenk in #788
- [python] Add application/jsonlines as content-type for streaming by @frankfliu in #791
- [python] Fixes trust_remote_code issue by @frankfliu in #793
- add einops for supporting falcon models by @lanking520 in #792
- fix the stream generation by @lanking520 in #794
- [python] Fixes typo in transformers-neuronx.py by @frankfliu in #796
- [python] Adds content-type response for DeepSpeed and FasterTransformer handler by @frankfliu in #797
- [python] Fixes device id mismatch issue for multiple GPU case by @frankfliu in #800
- [wlm] Sets default maxWorkers the same as earlier version by @frankfliu in #799
- [Fix] [CI/CD] Check if input is empty by @sindhuvahinis in #798
- add stream generation for huggingface streamer by @lanking520 in #801
- [python] Fixes device mismatch issue for streaming token by @frankfliu in #805
- [python] Add server side batching by @xyang16 in #795
- add safetensors by @lanking520 in #808
- upgrade deepspeed by @lanking520 in #804
- fix typo in sm workflow inputs by @siddvenk in #807
- Fix input_data and device order for streaming by @xyang16 in #809
- [python] Fixes retry_threshold bug by @frankfliu in #812
- fix huggingface device bugs by @lanking520 in #813
- [HF] typo fix by @lanking520 in #815
- [docs] Updates management api document by @frankfliu in #814
- Improvements in AOT UX by @sindhuvahinis in #787
- add pytorch kernel cache default directory by @lanking520 in #810
- [python] Fixes invalid device issue by @frankfliu in #816
- [wlm] Fixes WorkerThread name by @frankfliu in #817
- update gpu memory consumption and adding GPTNeoX, GPTJ by @lanking520 in #818
- [partition] keep 'option' in properties by @sindhuvahinis in #819
- [python] Fixes streaming token device mismatch bug by @frankfliu in #822
- [partition] Improves partition error message by @frankfliu in #826
- [python] Extract .py files recursively by @frankfliu in #821
- Remove flan-t5-xxl by @sindhuvahinis in #829
- return self after addition by @lanking520 in #830
- [python] Makes gpt-j model use triton mode by @frankfliu in #827
- [CI] Remove duplicated tests for AOT by @sindhuvahinis in #831
- fix workflow by @lanking520 in #833
New Contributors
- @alexkarezin made their first contribution in #742
Full Changelog: v0.22.1...v0.23.0-alpha