Support Stable Diffusion model #129

siriux · 2022-09-20T09:56:29Z

Is your feature request related to a problem? Please describe.
I would like to be able to run Stable Diffusion using wonnx

Describe the solution you'd like
At least these operators are missing and should be implemented before even trying to run Stable Diffusion on wonnx:
Einsum, Erf, Expand, InstanceNormalization, Shape, Slice

This is the minimum based on this guide that simplifies the onnx model (see the simplification table):
https://www.photoroom.com/tech/stable-diffusion-25-percent-faster-and-save-seconds/

Probably many more things will be needed, but I'm creating this issue because it can be a really interesting use case to be able to run SD in rust on the GPU directly.

I don't have much experience with wonnx or even ML, but I decided to create this issue because it surprised me how few operators are missing to run this model. I would need to get more experience with stable diffusion, diffusers library and onnx in python before attempting to port it here, but maybe there are more experienced users interested too.

haixuanTao · 2022-09-20T11:38:33Z

Hello Sirius, thanks for taking interest in wonnx!

The erf function is not yet a native operation in WGSL, see: https://www.w3.org/TR/WGSL/

Running Stable Diffusion on wonnx will therefore require an approximation of the erf function. At this point I am not sure how to implement it.

siriux · 2022-09-20T12:25:58Z

Thanks for your answer. Again, let me reiterate my ignorance in this field, but this is what I've found.

The implementation used in tract seems very simple https://github.com/sonos/tract/blob/21928fb3652d028db5be1348e6017494318d4b86/onnx-opl/src/erf.rs

Looking at other WGSL shaders for other operations, it seems translatable.

The signum in WGSL is just sign, abs is the same, for powi we can just use pow or even unroll it since the exponent is 16 (which is short and efficient), and recip is just 1/x.

copysign is trickier, but for the erf function it should be just a multiplication by the sign of the original input (as erf(0) == 0).
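To make the translation above concrete, here is a minimal sketch in Rust that uses only operations with a direct WGSL counterpart (abs, sign, multiply, divide). It assumes the polynomial approximation from Abramowitz & Stegun, formula 7.1.28, which as far as I can tell is what the linked tract code uses. The coefficients are written from that formula and should be checked against the tract source before being relied on. The name `erf_approx` is made up for this example.

```rust
/// Approximates erf(x) as 1 - 1 / (1 + a1*x + ... + a6*x^6)^16 for x >= 0,
/// then restores the sign, since erf is an odd function.
/// Coefficients assumed from Abramowitz & Stegun 7.1.28; verify before use.
fn erf_approx(x: f32) -> f32 {
    const A1: f32 = 0.0705230784;
    const A2: f32 = 0.0422820123;
    const A3: f32 = 0.0092705272;
    const A4: f32 = 0.0001520143;
    const A5: f32 = 0.0002765672;
    const A6: f32 = 0.0000430638;

    let ax = x.abs();

    // Horner evaluation of a1*ax + a2*ax^2 + ... + a6*ax^6.
    let mut p = A6 * ax;
    p = (A5 + p) * ax;
    p = (A4 + p) * ax;
    p = (A3 + p) * ax;
    p = (A2 + p) * ax;
    p = (A1 + p) * ax;
    p += 1.0;

    // p^16 unrolled as four squarings instead of a pow call.
    let p2 = p * p;
    let p4 = p2 * p2;
    let p8 = p4 * p4;
    let p16 = p8 * p8;

    // recip becomes a plain division.
    let y = 1.0 - 1.0 / p16;

    // copysign replaced by a multiplication with the input's sign.
    // y is 0 at x == 0, so the value of the sign at zero does not matter.
    x.signum() * y
}
```

In a shader the same body would operate on each element of the input buffer; the control flow is branch-free, which suits a GPU.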

I've looked a little bit at the other missing ops, and they don't seem as straightforward.

FL33TW00D · 2022-10-10T21:04:37Z

I looked into this a few weeks ago. It is a significant chunk of work, for two reasons:

1. The ops to implement are complicated (e.g. Einsum).
2. WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

siriux · 2022-10-11T06:30:10Z

Thanks for looking at it. I hope one day we will be able to run something like SD in pure Rust.

philpax · 2022-11-13T01:42:35Z

As a matter of interest, tch-rs recently implemented Stable Diffusion: https://github.com/LaurentMazare/diffusers-rs

It's not directly applicable to this, but it could inform future development efforts.

pixelspark · 2023-02-07T22:29:09Z

> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

I am not too familiar with SD but at least for BERT and other text encoders, parameterized dimensions can be replaced with fixed dimensions just fine (the model will then work with text token strings up to the statically set length).
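As a rough illustration of what a fixed dimension means on the caller's side, the sketch below pads or truncates a token sequence to a statically chosen length and builds a matching attention mask. Everything in it is hypothetical: the function name, the `pad_id` value, and the `i64` token type are assumptions for the example, not part of the wonnx API or of any particular tokenizer.

```rust
/// Pads or truncates `tokens` to exactly `max_len` entries.
/// Returns the fixed-length ids and a mask with 1 for real tokens, 0 for padding.
/// `pad_id` is whatever padding token the (hypothetical) tokenizer defines.
fn pad_to_fixed_len(tokens: &[i64], max_len: usize, pad_id: i64) -> (Vec<i64>, Vec<i64>) {
    // Number of real tokens that fit in the fixed-size input.
    let kept = tokens.len().min(max_len);

    // Copy the tokens that fit, then fill the rest with padding.
    let mut ids = Vec::with_capacity(max_len);
    ids.extend_from_slice(&tokens[..kept]);
    ids.resize(max_len, pad_id);

    // Mark real tokens with 1 and padding with 0.
    let mut mask = vec![1i64; kept];
    mask.resize(max_len, 0);

    (ids, mask)
}
```

Inputs longer than `max_len` are silently truncated here; a real application would probably want to report that to the user.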

pixelspark · 2023-03-07T14:19:47Z

> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

The shape inference engine in WONNX now supports this (it allows you to set parameterized dimensions, then infer shapes for other outputs).

pixelspark · 2023-03-26T21:30:02Z

> I looked into this a few weeks ago - it is a significant chunk of work for 2 reasons:
>
> The ops to implement are complicated (i.e Einsum)
>
> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

As for Einsum: this may be feasible; a first attempt is in #154.
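For readers unfamiliar with the operator, the sketch below spells out one single Einsum equation, `bid,bjd->bij`, as plain loops over flat row-major buffers. This is the kind of batched contraction attention layers use to compute similarity scores; whether this exact equation appears in the exported Stable Diffusion model is an assumption here. It is not the approach taken in #154, and the function name is made up. The general operator is harder because the equation string is arbitrary, whereas this handles exactly one case.

```rust
/// Computes out[n, i, j] = sum over d of a[n, i, d] * b[n, j, d],
/// i.e. the Einsum equation "bid,bjd->bij", on flat row-major buffers.
fn einsum_bid_bjd_bij(
    a: &[f32],
    b: &[f32],
    batch: usize,
    i_len: usize,
    j_len: usize,
    d_len: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), batch * i_len * d_len);
    assert_eq!(b.len(), batch * j_len * d_len);

    let mut out = vec![0.0f32; batch * i_len * j_len];
    for n in 0..batch {
        for i in 0..i_len {
            for j in 0..j_len {
                // d appears in both inputs but not in the output, so it is summed over.
                let mut acc = 0.0f32;
                for d in 0..d_len {
                    acc += a[(n * i_len + i) * d_len + d] * b[(n * j_len + j) * d_len + d];
                }
                out[(n * i_len + i) * j_len + j] = acc;
            }
        }
    }
    out
}
```

On a GPU each `(n, i, j)` output element would typically map to one shader invocation, with only the inner loop over `d` remaining sequential.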

pixelspark added the enhancement label (New feature or request) on Mar 6, 2023
