v0.0.22: Mixtral support, pipeline for sentence transformers, compatibility with Compel
What's Changed
Training
- Integrate new API for saving and loading with `neuronx_distributed` by @michaelbenayoun in #560
Inference
- Add support for Mixtral by @dacorvo in #569
- Improve Llama models performance by @dacorvo in #587
- Make Stable Diffusion pipelines compatible with compel by @JingyaHuang and @neo in #581 (with tests inspired by the snippets sent from @Suprhimp)
- Add `SentenceTransformers` support to `pipeline` for `feature-extraction` by @philschmid in #583
- Allow downloading a subfolder when caching models with a subfolder by @JingyaHuang in #566
- Do not split decoder checkpoint files by @dacorvo in #567
TGI
- Set up TGI environment values with the ones used to build the model by @oOraph in #529
- TGI benchmark with llmperf by @dacorvo in #564
- Improve tgi env wrapper for neuron by @oOraph in #589
Caveat
Models traced with `inline_weights_to_neff=False` currently have higher-than-expected latency during inference, because the weights are not automatically moved to the Neuron devices. This will be fixed in #584; until then, please avoid setting `inline_weights_to_neff=False` in this release.
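In the meantime, a minimal sketch of the safe configuration (the helper below is hypothetical; the keyword names mirror the `inline_weights_to_neff` option above, and the commented `from_pretrained` usage is an assumption about the export API):

```python
# Hypothetical helper collecting tracing kwargs that avoid the caveat above.
# inline_weights_to_neff=True (the default) keeps the weights inside the
# compiled NEFF, so they end up on the Neuron devices; False triggers the
# latency issue described above until #584 is fixed.
def safe_export_kwargs(batch_size=1, sequence_length=128):
    """Return tracing kwargs that keep inline_weights_to_neff at its default."""
    return {
        "export": True,
        "inline_weights_to_neff": True,  # do not set this to False in v0.0.22
        "batch_size": batch_size,
        "sequence_length": sequence_length,
    }

# Usage sketch (assumed API):
# model = NeuronModelForCausalLM.from_pretrained(model_id, **safe_export_kwargs())
```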
Other changes
- Improve installation guide by @JingyaHuang in #559
- Upgrade optimum and then install optimum-neuron by @shub-kris in #533
- Cleanup obsolete code by @michaelbenayoun in #555
- Extend TGI integration tests by @dacorvo in #561
- Modify benchmarks by @dacorvo in #563
- Bump PyTorch to 2.1 by @JingyaHuang in #502
- fix(decoder): specify `library_name` to suppress warning by @dacorvo in #570
- Fix missing `\` in quickstart inference guide by @yahavb in #574
- Use AWS 2.18.0 AMI as base by @dacorvo in #572
- Update TGI router version to 2.0.1 by @dacorvo in #577
- Add guide for LoRA adapters by @JingyaHuang in #582
- eos_token_id can be a list in configs by @dacorvo in #580
- Ease the tests when there is no hf token by @JingyaHuang in #585
- Change `inline_weights_to_neff` default value to True by @JingyaHuang in #590
Full Changelog: v0.0.21...v0.0.22