
v0.0.8: Tensor Parallelism, ZeRO-1 optimization and Stable Diffusion model classes

@michaelbenayoun released this 31 Jul 07:54

Tensor Parallelism and ZeRO-1 optimization

Tensor Parallelism

It is now possible to shard a model's parameters across several Neuron cores using tensor parallelism, enabling the training of much larger models than before.

The following model architectures are supported:

  • BERT
  • RoBERTa
  • GPT Neo
  • LLaMA

Relevant PRs: #125 and #143
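
As a rough sketch, on the training side this goes through the Trainer-style classes that optimum-neuron exposes. The snippet below assumes the NeuronTrainer and NeuronTrainingArguments classes and a tensor_parallel_size argument; these names are assumptions and may not match this release's exact API:

from optimum.neuron import NeuronTrainer, NeuronTrainingArguments
from transformers import AutoModelForCausalLM

# Sketch only: `tensor_parallel_size` is an assumed argument name for the
# number of Neuron cores the model parameters are sharded across.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

training_args = NeuronTrainingArguments(
    output_dir="gpt_neo_tp",
    per_device_train_batch_size=1,
    tensor_parallel_size=2,
)

trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: any tokenized dataset
)
trainer.train()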

ZeRO-1

DeepSpeed's ZeRO Stage 1 optimization is supported as well: it shards the optimizer state across data-parallel ranks, resulting in significant memory savings.

Relevant PR: #140

Note: Tensor Parallelism and ZeRO-1 can be combined, as sketched below.
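
A minimal sketch of the combination, again assuming the NeuronTrainingArguments class and that ZeRO-1 is toggled with a zero_1 flag (both argument names are assumptions, not confirmed against this release):

from optimum.neuron import NeuronTrainingArguments

# Sketch: both argument names below are assumptions about the API.
training_args = NeuronTrainingArguments(
    output_dir="llama_tp_zero1",
    tensor_parallel_size=8,  # shard model parameters across 8 Neuron cores
    zero_1=True,             # shard optimizer state across data-parallel ranks
)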

Stable Diffusion Models Inference support

NeuronStableDiffusionPipeline allows you to export your Stable Diffusion checkpoint to a Neuron-compatible format and run inference on Inf2 or Trn1 instances, while preserving the Python interface you are used to from 🤗 diffusers.

Example:

from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
# Static input shapes are required for Neuron compilation.
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
# export=True compiles the checkpoint to the Neuron format on the fly.
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]

Currently, only the text-to-image generation task is supported.
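
To avoid recompiling on every run, the compiled pipeline can be saved and reloaded with the usual save_pretrained/from_pretrained pattern (a sketch continuing the example above; the output path is illustrative):

# Persist the compiled Neuron artifacts so later runs skip compilation.
stable_diffusion.save_pretrained("sd_neuron/")

# Reload the precompiled pipeline later (no export=True needed).
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("sd_neuron/")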