
honor upcast_attention model setting #2365

Closed · keturn wants to merge 1 commit

Conversation

@keturn (Contributor) commented Jan 18, 2023

Brings our attention code in line with huggingface/diffusers#1590.

Hopefully fixes #2329 (float16 support for SD 2.1).
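
For anyone following along, here is a minimal sketch of what honoring these flags means. The helper below is hypothetical and only illustrates the idea from the diffusers PR (upcast the attention matmul and/or softmax to float32 when the model config asks for it); it is not the actual InvokeAI change.

import torch

def attention_probs(query, key, scale, upcast_attention=False, upcast_softmax=False):
    # Toy sketch, not the real implementation.
    dtype = query.dtype
    if upcast_attention:
        # Compute Q·K in float32 to avoid float16 overflow (needed for SD 2.1 in fp16).
        query, key = query.float(), key.float()
    scores = torch.bmm(query, key.transpose(-1, -2)) * scale
    if upcast_softmax:
        # Softmax in float32 for numerical stability, then cast back.
        scores = scores.float()
    return scores.softmax(dim=-1).to(dtype)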

@keturn added the "bug: Something isn't working" label Jan 18, 2023
@keturn added this to the 2.3 🧨 milestone Jan 18, 2023
@keturn requested a review from damian0815 January 18, 2023 23:47
@lstein (Collaborator) commented Jan 19, 2023

No joy. Got a CUDA out of memory error as soon as rendering started.

Test program:

from ldm.invoke.generator.diffusers_pipeline import StableDiffusionGeneratorPipeline
from diffusers import DPMSolverMultistepScheduler
import torch

invoke_ai_cache = '/home/lstein/invokeai/models/diffusers'
model_id = "stabilityai/stable-diffusion-2-1"

# Load SD 2.1 in float16 from the local InvokeAI cache, without xformers.
pipe = StableDiffusionGeneratorPipeline.from_pretrained(model_id,
                                                        torch_dtype=torch.float16,
                                                        cache_dir=invoke_ai_cache
                                                        )
pipe.disable_xformers_memory_efficient_attention()
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")

Stack trace:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/lstein/Projects/InvokeAI/black.py:16 in <module>                                           │
│                                                                                                  │
│   13 pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)             │
│   14 pipe = pipe.to("cuda")                                                                      │
│   15 prompt = "a photo of an astronaut riding a horse on mars"                                   │
│ ❱ 16 image = pipe(prompt).images[0]                                                              │
│   17 image.save("astronaut_rides_horse.png")                                                     │
│   18                                                                                             │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27 in       │
│ decorate_context                                                                                 │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pi │
│ peline_stable_diffusion.py:529 in __call__                                                       │
│                                                                                                  │
│   526 │   │   │   │   latent_model_input = self.scheduler.scale_model_input(latent_model_input   │
│   527 │   │   │   │                                                                              │
│   528 │   │   │   │   # predict the noise residual                                               │
│ ❱ 529 │   │   │   │   noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text   │
│   530 │   │   │   │                                                                              │
│   531 │   │   │   │   # perform guidance                                                         │
│   532 │   │   │   │   if do_classifier_free_guidance:                                            │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:4 │
│ 24 in forward                                                                                    │
│                                                                                                  │
│   421 │   │   down_block_res_samples = (sample,)                                                 │
│   422 │   │   for downsample_block in self.down_blocks:                                          │
│   423 │   │   │   if hasattr(downsample_block, "has_cross_attention") and downsample_block.has   │
│ ❱ 424 │   │   │   │   sample, res_samples = downsample_block(                                    │
│   425 │   │   │   │   │   hidden_states=sample,                                                  │
│   426 │   │   │   │   │   temb=emb,                                                              │
│   427 │   │   │   │   │   encoder_hidden_states=encoder_hidden_states,                           │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:777  │
│ in forward                                                                                       │
│                                                                                                  │
│    774 │   │   │   │   )[0]                                                                      │
│    775 │   │   │   else:                                                                         │
│    776 │   │   │   │   hidden_states = resnet(hidden_states, temb)                               │
│ ❱  777 │   │   │   │   hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden  │
│    778 │   │   │                                                                                 │
│    779 │   │   │   output_states += (hidden_states,)                                             │
│    780                                                                                           │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py:216 in    │
│ forward                                                                                          │
│                                                                                                  │
│   213 │   │                                                                                      │
│   214 │   │   # 2. Blocks                                                                        │
│   215 │   │   for block in self.transformer_blocks:                                              │
│ ❱ 216 │   │   │   hidden_states = block(hidden_states, encoder_hidden_states=encoder_hidden_st   │
│   217 │   │                                                                                      │
│   218 │   │   # 3. Output                                                                        │
│   219 │   │   if self.is_input_continuous:                                                       │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py:490 in    │
│ forward                                                                                          │
│                                                                                                  │
│   487 │   │   │   │   self.attn1(norm_hidden_states, encoder_hidden_states, attention_mask=att   │
│   488 │   │   │   )                                                                              │
│   489 │   │   else:                                                                              │
│ ❱ 490 │   │   │   hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask   │
│   491 │   │                                                                                      │
│   492 │   │   if self.attn2 is not None:                                                         │
│   493 │   │   │   # 2. Cross-Attention                                                           │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py:638 in    │
│ forward                                                                                          │
│                                                                                                  │
│   635 │   │   │   hidden_states = hidden_states.to(query.dtype)                                  │
│   636 │   │   else:                                                                              │
│   637 │   │   │   if self._slice_size is None or query.shape[0] // self._slice_size == 1:        │
│ ❱ 638 │   │   │   │   hidden_states = self._attention(query, key, value, attention_mask)         │
│   639 │   │   │   else:                                                                          │
│   640 │   │   │   │   hidden_states = self._sliced_attention(query, key, value, sequence_lengt   │
│   641                                                                                            │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:470 in _attention │
│                                                                                                  │
│   467 │   │   #default_result = super()._attention(query,  key, value)                           │
│   468 │   │   if attention_mask is not None:                                                     │
│   469 │   │   │   print(f"{type(self).__name__} ignoring passed-in attention_mask")              │
│ ❱ 470 │   │   attention_result = self.get_invokeai_attention_mem_efficient(query, key, value)    │
│   471 │   │                                                                                      │
│   472 │   │   hidden_states = self.reshape_batch_dim_to_heads(attention_result)                  │
│   473 │   │   return hidden_states                                                               │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:306 in            │
│ get_invokeai_attention_mem_efficient                                                             │
│                                                                                                  │
│   303 │   def get_invokeai_attention_mem_efficient(self, q, k, v):                               │
│   304 │   │   if q.device.type == 'cuda':                                                        │
│   305 │   │   │   #print("in get_attention_mem_efficient with q shape", q.shape, ", k shape",    │
│ ❱ 306 │   │   │   return self.einsum_op_cuda(q, k, v)                                            │
│   307 │   │                                                                                      │
│   308 │   │   if q.device.type == 'mps' or q.device.type == 'cpu':                               │
│   309 │   │   │   if self.mem_total_gb >= 32:                                                    │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:300 in            │
│ einsum_op_cuda                                                                                   │
│                                                                                                  │
│   297 │   │   # fallback for when there is no saved strategy, or saved strategy does not slice   │
│   298 │   │   mem_free_total = get_mem_free_total(q.device)                                      │
│   299 │   │   # Divide factor of safety as there's copying and fragmentation                     │
│ ❱ 300 │   │   return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))        │
│   301 │                                                                                          │
│   302 │                                                                                          │
│   303 │   def get_invokeai_attention_mem_efficient(self, q, k, v):                               │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:279 in            │
│ einsum_op_tensor_mem                                                                             │
│                                                                                                  │
│   276 │   def einsum_op_tensor_mem(self, q, k, v, max_tensor_mb):                                │
│   277 │   │   size_mb = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size() // (1 << 20)     │
│   278 │   │   if size_mb <= max_tensor_mb:                                                       │
│ ❱ 279 │   │   │   return self.einsum_lowest_level(q, k, v, None, None, None)                     │
│   280 │   │   div = 1 << int((size_mb - 1) / max_tensor_mb).bit_length()                         │
│   281 │   │   if div <= q.shape[0]:                                                              │
│   282 │   │   │   return self.einsum_op_slice_dim0(q, k, v, q.shape[0] // div)                   │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:235 in            │
│ einsum_lowest_level                                                                              │
│                                                                                                  │
│   232 │   │   # calculate attention slice by taking the best scores for each latent pixel        │
│   233 │   │   if self.upcast_softmax:                                                            │
│   234 │   │   │   attention_scores = attention_scores.float()                                    │
│ ❱ 235 │   │   default_attention_slice = attention_scores.softmax(dim=-1).to(dtype=dtype)         │
│   236 │   │                                                                                      │
│   237 │   │   attention_slice_wrangler = self.attention_slice_wrangler                           │
│   238 │   │   if attention_slice_wrangler is not None:                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.58 GiB (GPU 0; 11.76 GiB total capacity; 8.89 GiB already allocated; 27.62 MiB free; 10.53 GiB reserved in total by 
PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
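
(As an aside, the max_split_size_mb workaround mentioned in the error message is configured through the PYTORCH_CUDA_ALLOC_CONF environment variable before the first CUDA allocation; 512 below is only an illustrative value.)

import os
# Must be set before torch makes its first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"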

@lstein (Collaborator) left a review comment:

Unfortunately I'm now getting a CUDA out of memory error during the generation process. The GPU was idle with 12 GB of free memory at the time.

@keturn (Contributor, Author) commented Jan 19, 2023

> Unfortunately I'm now getting a CUDA out of memory error during the generation process. The GPU was idle with 12 GB of free memory at the time.

@lstein, does it work if you use a smaller image size?

and by "work" I mean "produce an image at all" — SD 2.1 really isn't pretty when running below its 768px default size.

@damian0815 (Contributor) left a review comment:

Looks good; reading the logic, it makes sense, but I didn't test it. Pending lstein's issue above, I won't press "merge".

@lstein (Collaborator) commented Jan 19, 2023 via email

@lstein (Collaborator) commented Jan 19, 2023

> > Unfortunately I'm now getting a CUDA out of memory error during the generation process. The GPU was idle with 12 GB of free memory at the time.
>
> @lstein, does it work if you use a smaller image size?
>
> and by "work" I mean "produce an image at all" — SD 2.1 really isn't pretty when running below its 768px default size.

768x768 - crashes with the memory error
512x512 - crashes with the memory error
256x256 - doesn't crash, but the image looks predictably horrible (I wouldn't eat that sushi)

@lstein (Collaborator) left a review comment:

Currently unusable due to out of memory errors on images of 512x512 and higher.

@keturn (Contributor, Author) commented Jan 19, 2023

Interesting, 512px works here within 12 GB.

I did add revision="fp16" to the from_pretrained call, but I think that should only make a difference in download and storage size, since the weights are loaded into the requested torch_dtype either way.

But then how is it that #2335 works for you at full precision? Because that runs out of memory for me too.
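
For reference, the load call with that revision would look roughly like this (a sketch; revision="fp16" only selects the half-precision weight files on the Hub, while torch_dtype controls the in-memory dtype):

pipe = StableDiffusionGeneratorPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    revision="fp16",            # fetch the fp16 weight files (smaller download)
    torch_dtype=torch.float16,  # tensors end up in float16 either way
    cache_dir=invoke_ai_cache,
)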

@lstein (Collaborator) commented Jan 19, 2023

> Interesting, 512px works here within 12 GB.
>
> I did add revision="fp16" to the from_pretrained call, but I think that should only make a difference in download and storage size, since the weights are loaded into the requested torch_dtype either way.
>
> But then how is it that #2335 works for you at full precision? Because that runs out of memory for me too.

I have no idea. Even without #2335, if I comment out self.enable_xformers_memory_efficient_attention() in diffusers_pipeline.py and run at full precision (using --precision=float32), I can render 768x768 and 512x512 with no problems.

I think we've previously established that we are on the same versions of torch and CUDA - 1.13.1 and 11.7?
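
For comparison, the working configuration described above corresponds roughly to this variant of the earlier test script (a sketch; the height/width keyword arguments are assumed from the base diffusers pipeline):

# Full precision, with the xformers call left disabled.
pipe = StableDiffusionGeneratorPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # equivalent of --precision=float32
    cache_dir=invoke_ai_cache,
).to("cuda")
image = pipe(prompt, height=768, width=768).images[0]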

@lstein (Collaborator) commented Jan 19, 2023

I'm getting discouraged that we're doing all this debugging work to get a model running that nobody likes much anyway.

It looks like PyPI has binary wheel builds for xformers on Windows and Linux. I'm thinking of adding it to requirements-base.txt, in which case the problem will magically go away unless the user disables xformers explicitly.
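
As a sketch of that path, nothing beyond the dependency would be needed, since diffusers already exposes the opt-in hook (enable_xformers_memory_efficient_attention is the standard diffusers call):

# requirements-base.txt would gain a line:  xformers
# and the pipeline then opts in with:
pipe.enable_xformers_memory_efficient_attention()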

@keturn (Contributor, Author) commented Jan 20, 2023

Can't rely on xformers to fix everything, because

@keturn (Contributor, Author) commented Jan 30, 2023

Obsoleted by #2385.

@keturn closed this Jan 30, 2023
@keturn deleted the feat/upcast-attention branch January 30, 2023 23:58
Successfully merging this pull request may close these issues: [bug]: SD 2.x models return only black in float16