# v0.0.8: Tensor Parallelism, ZeRO-1 optimization and Stable Diffusion model classes
## Tensor Parallelism and ZeRO-1 optimization
### Tensor Parallelism
It is now possible to shard a model's parameters across multiple Neuron cores using tensor parallelism, enabling training of much larger models than before.
The following model architectures are supported:
- BERT
- RoBERTa
- GPT Neo
- LLaMa
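To illustrate the idea behind tensor parallelism, here is a minimal NumPy sketch of a column-parallel linear layer: each worker stores only a vertical slice of the weight matrix and computes a partial output, and concatenating the partials reproduces the full result. This is a conceptual toy, not the actual sharding code used by the library (which distributes work across Neuron cores).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))       # batch of activations
W = rng.standard_normal((16, 32))      # full weight matrix

tp_size = 4                            # pretend we have 4 cores
# Each "core" stores a 16x8 slice instead of the full 16x32 matrix.
shards = np.split(W, tp_size, axis=1)

# Each core computes its partial output independently...
partials = [x @ w for w in shards]
# ...and gathering along the feature dimension reconstructs the full output.
y_parallel = np.concatenate(partials, axis=1)

assert np.allclose(y_parallel, x @ W)
```

Because every shard is a quarter of the original matrix, the per-core parameter memory shrinks proportionally to the tensor-parallel degree.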
### ZeRO-1
The DeepSpeed ZeRO Stage 1 optimization is supported as well: it shards the optimizer state across data-parallel ranks, yielding significant memory savings.
Relevant PR: #140
Note: Tensor Parallelism and ZeRO-1 can be combined.
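The memory saving is easy to see with back-of-the-envelope arithmetic, assuming an Adam-style optimizer that keeps two fp32 moment tensors (8 bytes) per parameter. Stage 1 leaves model and gradient memory untouched and divides only the optimizer state by the data-parallel world size:

```python
def optimizer_state_gib(num_params: int, dp_ranks: int) -> float:
    """Per-rank optimizer state for Adam under ZeRO stage 1.

    Adam stores exp_avg + exp_avg_sq in fp32, i.e. 8 bytes per parameter;
    stage 1 shards that state evenly across data-parallel ranks.
    """
    bytes_per_param = 8
    return num_params * bytes_per_param / dp_ranks / 2**30

params = 1_500_000_000                  # e.g. a 1.5B-parameter model
print(optimizer_state_gib(params, 1))   # ~11.2 GiB per rank without ZeRO-1
print(optimizer_state_gib(params, 8))   # ~1.4 GiB per rank across 8 ranks
```

The 1.5B-parameter figure is only a worked example; the per-rank saving scales linearly with the number of data-parallel ranks.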
## Stable Diffusion Models Inference support
`NeuronStableDiffusionPipeline` allows you to export your Stable Diffusion checkpoint to a Neuron-compatible format and run inference on Inf2 or Trn1 instances, while preserving the Python interface you are used to from 🤗 diffusers.
Example:

```python
from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]
```
Currently, only the text-to-image generation task is supported.