
v0.0.8: Tensor Parallelism, ZeRO-1 optimization and Stable Diffusion model classes

@michaelbenayoun released this 31 Jul 07:54

Tensor Parallelism and ZeRO-1 optimization

Tensor Parallelism

It is now possible to shard a model's parameters across several Neuron cores using tensor parallelism, enabling the training of much larger models than before.

The following model architectures are supported:

  • BERT
  • RoBERTa
  • GPT Neo
  • LLaMA

Relevant PRs: #125 and #143
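
As a rough sketch, on the training side this goes through the Trainer-style classes that optimum-neuron exposes. The snippet below assumes the NeuronTrainer and NeuronTrainingArguments classes and a tensor_parallel_size argument; these names are assumptions and may not match this release's exact API:

from optimum.neuron import NeuronTrainer, NeuronTrainingArguments
from transformers import AutoModelForCausalLM

# Sketch only: `tensor_parallel_size` is an assumed argument name for the
# number of Neuron cores the model parameters are sharded across.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

training_args = NeuronTrainingArguments(
    output_dir="gpt_neo_tp",
    per_device_train_batch_size=1,
    tensor_parallel_size=2,
)

trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: any tokenized dataset
)
trainer.train()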

ZeRO-1

DeepSpeed's ZeRO Stage 1 optimization is supported as well: it shards the optimizer state across data-parallel ranks, resulting in significant memory savings.

Relevant PR: #140

Note: Tensor Parallelism and ZeRO-1 can be combined, as sketched below.
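
A minimal sketch of the combination, again assuming the NeuronTrainingArguments class and that ZeRO-1 is toggled with a zero_1 flag (both argument names are assumptions, not confirmed against this release):

from optimum.neuron import NeuronTrainingArguments

# Sketch: both argument names below are assumptions about the API.
training_args = NeuronTrainingArguments(
    output_dir="llama_tp_zero1",
    tensor_parallel_size=8,  # shard model parameters across 8 Neuron cores
    zero_1=True,             # shard optimizer state across data-parallel ranks
)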

Stable Diffusion Models Inference support

NeuronStableDiffusionPipeline allows you to export your Stable Diffusion checkpoint to a Neuron-compatible format and run inference on Inf2 or Trn1 instances, while preserving the Python interface you are used to from 🤗 diffusers.

Example:

from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
# Static input shapes are required for Neuron compilation.
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
# export=True compiles the checkpoint to the Neuron format on the fly.
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]

Currently, only the text-to-image generation task is supported.
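
To avoid recompiling on every run, the compiled pipeline can be saved and reloaded with the usual save_pretrained/from_pretrained pattern (a sketch continuing the example above; the output path is illustrative):

# Persist the compiled Neuron artifacts so later runs skip compilation.
stable_diffusion.save_pretrained("sd_neuron/")

# Reload the precompiled pipeline later (no export=True needed).
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("sd_neuron/")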