v0.0.22: Mixtral support, pipeline for sentence transformers, compatibility with Compel
What's Changed
Training
- Integrate new API for saving and loading with `neuronx_distributed` by @michaelbenayoun in #560
Inference
- Add support for Mixtral by @dacorvo in #569
- Improve Llama models performance by @dacorvo in #587
- Make Stable Diffusion pipelines compatible with compel by @JingyaHuang and @neo in #581 (with tests inspired by the snippets sent from @Suprhimp)
- Add `SentenceTransformers` support to `pipeline` for `feature-extraction` by @philschmid in #583
- Allow downloading a subfolder when caching models with a subfolder by @JingyaHuang in #566
- Do not split decoder checkpoint files by @dacorvo in #567
TGI
- Set up TGI environment values with the ones used to build the model by @oOraph in #529
- TGI benchmark with llmperf by @dacorvo in #564
- Improve tgi env wrapper for neuron by @oOraph in #589
Caveat
Models traced with `inline_weights_to_neff=False` currently have higher-than-expected latency during inference, because the weights are not automatically moved to the Neuron devices. This will be fixed in #584; until then, please avoid setting `inline_weights_to_neff=False` in this release.
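In the meantime, a minimal sketch of the safe configuration (the helper below is hypothetical; the keyword names mirror the `inline_weights_to_neff` option above, and the commented `from_pretrained` usage is an assumption about the export API):

```python
# Hypothetical helper collecting tracing kwargs that avoid the caveat above.
# inline_weights_to_neff=True (the default) keeps the weights inside the
# compiled NEFF, so they end up on the Neuron devices; False triggers the
# latency issue described above until #584 is fixed.
def safe_export_kwargs(batch_size=1, sequence_length=128):
    """Return tracing kwargs that keep inline_weights_to_neff at its default."""
    return {
        "export": True,
        "inline_weights_to_neff": True,  # do not set this to False in v0.0.22
        "batch_size": batch_size,
        "sequence_length": sequence_length,
    }

# Usage sketch (assumed API):
# model = NeuronModelForCausalLM.from_pretrained(model_id, **safe_export_kwargs())
```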
Other changes
- Improve installation guide by @JingyaHuang in #559
- Upgrade optimum and then install optimum-neuron by @shub-kris in #533
- Cleanup obsolete code by @michaelbenayoun in #555
- Extend TGI integration tests by @dacorvo in #561
- Modify benchmarks by @dacorvo in #563
- Bump PyTorch to 2.1 by @JingyaHuang in #502
- fix(decoder): specify `library_name` to suppress warning by @dacorvo in #570
- Fix missing `\` in quickstart inference guide by @yahavb in #574
- Use AWS 2.18.0 AMI as base by @dacorvo in #572
- Update TGI router version to 2.0.1 by @dacorvo in #577
- Add guide for LoRA adapters by @JingyaHuang in #582
- eos_token_id can be a list in configs by @dacorvo in #580
- Ease the tests when there is no hf token by @JingyaHuang in #585
- Change `inline_weights_to_neff` default value to True by @JingyaHuang in #590
Full Changelog: v0.0.21...v0.0.22