
Releases: huggingface/optimum-neuron

v0.0.16: T5 export and inference, general training fixes

19 Dec 13:29

What's Changed

Training

A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.

  • Skip model saving during precompilation and provide an option to skip the cache push (#365)
  • Fix checkpoint saving and consolidation for TP (#378); see the example after this list
  • A torch_xla-compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)
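
For reference, checkpoints sharded with tensor parallelism can be merged back into a single standard checkpoint from the command line. A minimal sketch, assuming the optimum-cli neuron consolidate subcommand and hypothetical directory names:

optimum-cli neuron consolidate tp_checkpoints/ consolidated_model/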

Inference

  • Support for the export and inference of T5 (#267); see the example after this list
  • New documentation for Stable Diffusion XL Turbo (#374)
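
As an illustration, exporting and running T5 follows the same pattern as the other model classes. A minimal sketch, assuming the t5-small checkpoint, the NeuronModelForSeq2SeqLM class, and illustrative static shapes (not quoted from the release):

optimum-cli export neuron --model t5-small --task text2text-generation --batch_size 1 --sequence_length 64 --num_beams 4 t5_neuron/

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSeq2SeqLM

# Load the compiled encoder/decoder produced by the export command above
model = NeuronModelForSeq2SeqLM.from_pretrained("t5_neuron/")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello, how are you?", return_tensors="pt")
# Generation runs with the shapes and beam count fixed at export time
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))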

v0.0.15: Mistral training, Tensor parallelism improvements, better integration with the AWS SDK

24 Nov 17:46

What's Changed

Training

Distributed Training

  • parallel_cross_entropy loss support for tensor parallelism (#246)
  • Support for training the Mistral architecture with tensor parallelism (#303)

AWS SDK

  • Fix: neuron_parallel_compile is compatible with the cache system (#352)
  • Full support for neuron_parallel_compile with the cache system: compilation files produced by neuron_parallel_compile will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354); see the usage sketch after this list
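
For context, neuron_parallel_compile is the AWS Neuron SDK utility that pre-compiles all the graphs of a training script before the actual run. A minimal sketch of wrapping a training command with it, using a hypothetical script name:

neuron_parallel_compile torchrun --nproc_per_node=2 train.py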

Documentation

  • Guide explaining how distributed training works in optimum-neuron (#339)

Inference

  • Data parallelism option for Stable Diffusion LCM, allowing multi-device inference (#346); see the sketch after this list
  • Support decoding sequences of byte tokens in TGI (#350)
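
As an illustration, the data-parallelism option is exposed when loading a compiled pipeline. A minimal sketch, assuming a pre-compiled LCM directory and the data_parallel_mode loading option (the directory name and option value are illustrative assumptions, not quoted from the release):

from optimum.neuron import NeuronLatentConsistencyModelPipeline

# "all" is assumed to replicate the whole pipeline across the available Neuron devices
pipe = NeuronLatentConsistencyModelPipeline.from_pretrained(
    "lcm_neuron/", data_parallel_mode="all"
)
image = pipe("a bicycle", num_inference_steps=4).images[0]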

Documentation

  • Updated the documentation on LCM (#351)

v0.0.14: LCM support

17 Nov 16:38

What's Changed

LCM support

  • [Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323; see the example below
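
For instance, a latent consistency model can be compiled and run in a few denoising steps like any other Neuron Stable Diffusion pipeline. A minimal sketch, assuming the SimianLuo/LCM_Dreamshaper_v7 checkpoint and the NeuronLatentConsistencyModelPipeline class:

from optimum.neuron import NeuronLatentConsistencyModelPipeline

model_id = "SimianLuo/LCM_Dreamshaper_v7"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

# export=True compiles the checkpoint for Neuron devices on the fly
pipe = NeuronLatentConsistencyModelPipeline.from_pretrained(model_id, export=True, **input_shapes)
# LCMs typically need only a handful of inference steps
image = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=4).images[0]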

Tutorials and doc improvement

Major bugfixes

Other changes

New Contributors

Full Changelog: v0.0.13...v0.0.14

v0.0.13: AWS Neuron SDK 2.15

27 Oct 09:08

What's Changed

The main change in this release is the alignment with AWS Neuron SDK 2.15.

Text-generation

Other changes

Full Changelog: v0.0.12...v0.0.13

v0.0.12.1: Patch release for training with Neuron SDK 2.14

27 Oct 14:08

v0.0.12: SDXL refiner, Sequence parallelism training

16 Oct 08:42

What's Changed

Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support

Distributed Training

Text generation updates

Other changes

New Contributors

Full Changelog: v0.0.11...v0.0.12

v0.0.11: SDXL, Llama v2 training and inference, Inf2-powered TGI

12 Sep 13:50

SDXL Export and Inference

Optimum CLI now supports compiling components in the SDXL pipeline for inference on Neuron devices (Inf2/Trn1).

Below is an example of compiling SDXL models. You can compile them either on an inf2 instance (inf2.8xlarge or larger recommended) or on a CPU-only instance (in that case, disable validation with --disable-validation):

optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/

You can then run inference with the NeuronStableDiffusionXLPipeline class:

from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# Load the compiled artifacts from the export command above;
# device_ids loads the pipeline on two Neuron cores
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]

Llama v1, v2 Inference

  • Add support for Llama inference through NeuronModelForCausalLM by @dacorvo in #223; see the example below
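
For reference, a Llama checkpoint can be exported and used for generation directly through this class. A minimal sketch, assuming the meta-llama/Llama-2-7b-hf checkpoint and illustrative export parameters (not quoted from the release):

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# export=True compiles the checkpoint on the fly; batch size and core count are fixed at compile time
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    batch_size=1,
    num_cores=2,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("My favorite place on earth is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))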

Llama v2 Training

TGI

Major bugfixes

Other changes

Full Changelog: v0.0.10...v0.0.11

v0.0.10: Bugfixes and enhancements

28 Aug 11:55

Major bugfixes

  • Improve and fix the Inferentia exporter by @JingyaHuang in #168
  • [Stable Diffusion] Fix the image size value inference by @JingyaHuang in #167
  • Fix inference of the dynamic batch size from the config and ensure compatibility with transformers 4.32 by @JingyaHuang in #190

Enhancements of APIs

Other changes

New Contributors

Full Changelog: v0.0.9...v0.0.10

v0.0.9: Tensor Parallelism training for T5, more stable Stable Diffusion inference

07 Aug 18:47

Tensor Parallelism support for T5 training

Enhance Stable Diffusion Inference

What's Changed

Full Changelog: v0.0.8...v0.0.9

v0.0.8: Tensor Parallelism, ZeRO-1 optimization and Stable Diffusion model classes

31 Jul 07:54

Tensor Parallelism and ZeRO-1 optimization

Tensor Parallelism

It is now possible to shard a model's parameters across several Neuron cores using tensor parallelism, enabling the training of much larger models than before.

The following model architectures are supported:

  • BERT
  • RoBERTa
  • GPT Neo
  • LLaMA

Relevant PRs: #125 and #143

ZeRO-1

DeepSpeed-style ZeRO Stage 1 optimization is supported as well: it shards the optimizer state across data-parallel ranks, resulting in significant memory savings.

Relevant PR: #140

Note: Tensor Parallelism and ZeRO-1 can be combined, as sketched below.
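
A minimal sketch of combining both features, assuming the NeuronTrainingArguments options tensor_parallel_size and zero_1 (option names are assumptions based on the optimum-neuron training API, not quoted from the release):

from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

# `model` and `train_dataset` are assumed to be defined as in any Trainer script
training_args = NeuronTrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    tensor_parallel_size=8,  # shard parameters across 8 Neuron cores
    zero_1=True,             # shard the optimizer state across data-parallel ranks
)
trainer = NeuronTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()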

Stable Diffusion Models Inference support

NeuronStableDiffusionPipeline allows you to export your Stable Diffusion checkpoint to a Neuron-compatible format and run inference on Inf2 or Trn1 instances, while preserving the Python interface you are used to from 🤗 diffusers.

Example:

from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
# Static input shapes are fixed at compile time
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
# export=True compiles the checkpoint for Neuron devices on the fly
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]

Currently, only the text-to-image generation task is supported.