0.23.0-Alpha Release
Pre-release
This release fixes several issues in the DJLServing library and also brings some new features:
- Support loading from a workflow directory #714
- Fixed MME support with HF_MODEL_ID #712
- Added parallel loading for python models #770
- Fixed device mismatch issue #805
And more
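
As an illustrative sketch of the improved Hugging Face model loading that #712 refactors, a model can be pointed at a Hugging Face Hub ID through a `serving.properties` file; the model ID and option values below are placeholders, not part of this release's changes:

```
engine=Python
option.model_id=google/flan-t5-small
option.tensor_parallel_degree=1
```

The same model ID can alternatively be supplied via the `HF_MODEL_ID` environment variable, which is the path the MME fix above builds on.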
What's Changed
- [serving] Adds workflow model loading for SageMaker by @frankfliu in #661
- [workflow] Allows model being shared between workflows by @frankfliu in #665
- [python] prints out error message if pip install failed by @frankfliu in #666
- update to djl 0.23.0 by @siddvenk in #668
- [docker] Fixes fastertransformer docker file by @frankfliu in #671
- [kserve] Fixes unit test for extra data type by @frankfliu in #673
- install fixed version for transformers and accelerate by @lanking520 in #672
- [ci] add performance testing by @tosterberg in #558
- add numpy fix by @lanking520 in #674
- SM Training job changes for AOT by @sindhuvahinis in #667
- Create model dir to prevent issues with no code experience in SageMaker by @siddvenk in #675
- Don't mount model dir for no code tests by @siddvenk in #676
- AOT upload checkpoints tests by @sindhuvahinis in #678
- [INF2][DLC] Update Neuron to 2.10 by @lanking520 in #681
- add stable diffusion support on INF2 by @lanking520 in #683
- [CI] add small fixes by @lanking520 in #684
- Add HuggingFace TGI publish and test pipeline by @xyang16 in #650
- Add shared memory arg to docker launch command in README by @rohithkrn in #685
- Update github-slug-action to v4.4.1 by @xyang16 in #686
- unset omp thread to prevent CLIP model delay by @lanking520 in #688
- Change the bucket for different object by @sindhuvahinis in #691
- [ci] make performance tests run in parallel by @tosterberg in #690
- [api] Update ChunkedBytesSupplier API by @frankfliu in #692
- [console] Fixes log file charset issue by @frankfliu in #693
- add neuronx new feature for generation by @lanking520 in #694
- [tgi] Add more models to TGI test pipeline by @xyang16 in #695
- [INF2] adding clip model support by @lanking520 in #696
- [plugin] Include djl s3 extension in djl-serving distribution by @frankfliu in #699
- [INF2] add bf16 support to SD by @lanking520 in #700
- [ci] Upgrade spotbugs to 5.0.14 by @frankfliu in #704
- Add support for streaming Seq2Seq models by @rohithkrn in #698
- add SageMaker MCE support by @lanking520 in #706
- fix the device mapping issue if visible devices is set by @lanking520 in #707
- fix the start gpu bug by @lanking520 in #709
- [INF2] give better room for more tokens by @lanking520 in #710
- bump up n positions by @lanking520 in #713
- Refactor logic for supporting HF_MODEL_ID to support MME use case by @siddvenk in #712
- [ci] reconfigure performance test time and machines by @tosterberg in #711
- [workflow] Support load model from workflow directory by @frankfliu in #714
- Add support for seq2seq model loading in HF handler by @rohithkrn in #715
- Add unit test for empty model store initialization by @siddvenk in #716
- Fix no code tests in lmi test suite by @siddvenk in #717
- [serving] Load function from workflow directory by @frankfliu in #718
- [test] Reformat python code by @frankfliu in #720
- Creates S3 Cache Engine by @zachgk in #719
- [test] Refactor client.py by @frankfliu in #721
- update fastertransformers build instruction by @lanking520 in #722
- Add seq2seq streaming integ test by @rohithkrn in #724
- [test] Update transformers-neuronx gpt-j-6b model options by @frankfliu in #723
- [DeepSpeed][INF2] add vision components by @lanking520 in #725
- [python] Support pip install in offline mode by @frankfliu in #729
- [python] Add --no-index to pip install in offline mode by @frankfliu in #731
- adding llama model support by @lanking520 in #727
- tokenizer bug fixes by @lanking520 in #732
- [FT] change the dependencies by @lanking520 in #734
- Remove TGI build and test pipeline by @xyang16 in #735
- ft_handler fix by @rohithkrn in #736
- [docker] Uses the same convention as tritonserver by @frankfliu in #738
- [ci] Upgrade jacoco to 0.8.8 to support JDK17+ by @frankfliu in #739
- [python] Fixes typo in fastertransformer handler by @frankfliu in #740
- [python] Adds text/plain content-type support by @frankfliu in #741
- [serving] Avoid unit-test hang by @frankfliu in #744
- Skeleton structure for sequence batch scheduler by @sindhuvahinis in #745
- update the wheel to have path fixed by @lanking520 in #747
- Adding project diagrams link to architecture.md by @alexkarezin in #742
- Add SageMaker integration test by @siddvenk in #705
- [python] Handle torch.cuda.OutOfMemoryError by @frankfliu in #749
- fix permissions for sm pysdk install script by @siddvenk in #751
- [serving] Improves model loading logging by @frankfliu in #750
- Asynchronous with PublisherBytesSupplier by @zachgk in #730
- [cache] Rename env var DDB_TABLE_NAME to SERVING_DDB_TABLE_NAME by @frankfliu in #753
- [serving] Sets default minWorkers to 1 for GPU python model by @frankfliu in #755
- SM AOT Tests by @sindhuvahinis in #756
- [docker] Pin bitsandbytes version to 0.38.1 by @xyang16 in #754
- [fix] bump versions for new deepspeed wheel by @tosterberg in #733
- [fix] Fix bitsandbytes pip install by @xyang16 in #758
- [serving] Fixes log message by @frankfliu in #765
- add triton components in the nightly by @lanking520 in #767
- Add mme tests to sagemaker tests by @siddvenk in #763
- [wlm] Adds more logs to LMI engine detection by @frankfliu in #766
- fix typos with get default bucket prefix for sm session by @siddvenk in #768
- [serving] Uses predictable model name for HF model by @frankfliu in #771
- [serving] Adds parallel loading support for Python engine by @frankfliu in #770
- [console] files are not required in form data by @frankfliu in #773
- [docker] Avoid auto setting OMP_NUM_THREADS for GPU/INF docker images by @frankfliu in #774
- [wlm] Sets default maxWorkers based on OMP_NUM_THREADS by @frankfliu in #776
- Upload SM benchmark metrics to cloudwatch by @sindhuvahinis in #769
- [python] Support non-gpu models for huggingface by @frankfliu in #772
- Fixes integration test by @frankfliu in #779
- [python] Adjusts MPI workers based on CUDA_VISIBLE_DEVICES by @frankfliu in #782
- [Stream] use huggingface standard generation for tnx by @lanking520 in #778
- Option to run only the lmi tests needed by @sindhuvahinis in #786
- add trust remote code option by @lanking520 in #781
- remove inf1 support and upgrade some package versions by @lanking520 in #785
- [python] Handles invalid return type case by @frankfliu in #790
- Remove hardcoded version in Assertion error by @sindhuvahinis in #789
- Add support for testing nightly images in sagemaker endpoint tests by @siddvenk in #788
- [python] Add application/jsonlines as content-type for streaming by @frankfliu in #791
- [python] Fixes trust_remote_code issue by @frankfliu in #793
- add einops for supporting falcon models by @lanking520 in #792
- fix the stream generation by @lanking520 in #794
- [python] Fixes typo in transformers-neuronx.py by @frankfliu in #796
- [python] Adds content-type response for DeepSpeed and FasterTransformer handler by @frankfliu in #797
- [python] Fixes device id mismatch issue for multiple GPU case by @frankfliu in #800
- [wlm] Sets default maxWorkers the same as earlier version by @frankfliu in #799
- [Fix] [CI/CD] Check if input is empty by @sindhuvahinis in #798
- add stream generation for huggingface streamer by @lanking520 in #801
- [python] Fixes device mismatch issue for streaming token by @frankfliu in #805
- [python] Add server side batching by @xyang16 in #795
- add safetensors by @lanking520 in #808
- upgrade deepspeed by @lanking520 in #804
- fix typo in sm workflow inputs by @siddvenk in #807
- Fix input_data and device order for streaming by @xyang16 in #809
- [python] Fixes retry_threshold bug by @frankfliu in #812
- fix huggingface device bugs by @lanking520 in #813
- [HF] typo fix by @lanking520 in #815
- [docs] Updates management api document by @frankfliu in #814
- Improvements in AOT UX by @sindhuvahinis in #787
- add pytorch kernel cache default directory by @lanking520 in #810
- [python] Fixes invalid device issue by @frankfliu in #816
- [wlm] Fixes WorkerThread name by @frankfliu in #817
- update gpu memory consumption and adding GPTNeoX, GPTJ by @lanking520 in #818
- [partition] keep 'option' in properties by @sindhuvahinis in #819
- [python] Fixes streaming token device mismatch bug by @frankfliu in #822
- [partition] Improves partition error message by @frankfliu in #826
- [python] Extract .py files recursively by @frankfliu in #821
- Remove flan-t5-xxl by @sindhuvahinis in #829
- return self after addition by @lanking520 in #830
- [python] Makes gpt-j model use triton mode by @frankfliu in #827
- [CI] Remove duplicated tests for AOT by @sindhuvahinis in #831
- fix workflow by @lanking520 in #833
New Contributors
- @alexkarezin made their first contribution in #742
Full Changelog: v0.22.1...v0.23.0-alpha