
honor upcast_attention model setting #2365

Closed · keturn wants to merge 1 commit

Conversation

@keturn (Contributor) commented Jan 18, 2023

Brings our attention code in line with huggingface/diffusers#1590.

Hopefully fixes #2329 (float16 support for SD 2.1).
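
For anyone following along, here is a minimal sketch of what honoring these flags means. The helper below is hypothetical and only illustrates the idea from the diffusers PR (upcast the attention matmul and/or softmax to float32 when the model config asks for it); it is not the actual InvokeAI change.

import torch

def attention_probs(query, key, scale, upcast_attention=False, upcast_softmax=False):
    # Toy sketch, not the real implementation.
    dtype = query.dtype
    if upcast_attention:
        # Compute Q·K in float32 to avoid float16 overflow (needed for SD 2.1 in fp16).
        query, key = query.float(), key.float()
    scores = torch.bmm(query, key.transpose(-1, -2)) * scale
    if upcast_softmax:
        # Softmax in float32 for numerical stability, then cast back.
        scores = scores.float()
    return scores.softmax(dim=-1).to(dtype)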

@keturn added the "bug: Something isn't working" label Jan 18, 2023
@keturn added this to the 2.3 🧨 milestone Jan 18, 2023
@keturn requested a review from damian0815 January 18, 2023 23:47
@lstein (Collaborator) commented Jan 19, 2023

No joy. Got a CUDA out of memory error as soon as rendering started.

Test program:

from ldm.invoke.generator.diffusers_pipeline import StableDiffusionGeneratorPipeline
from diffusers import DPMSolverMultistepScheduler
import torch

invoke_ai_cache = '/home/lstein/invokeai/models/diffusers'
model_id = "stabilityai/stable-diffusion-2-1"

# Load SD 2.1 in float16 from the local InvokeAI cache, without xformers.
pipe = StableDiffusionGeneratorPipeline.from_pretrained(model_id,
                                                        torch_dtype=torch.float16,
                                                        cache_dir=invoke_ai_cache
                                                        )
pipe.disable_xformers_memory_efficient_attention()
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")

Stack trace:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/lstein/Projects/InvokeAI/black.py:16 in <module>                                           │
│                                                                                                  │
│   13 pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)             │
│   14 pipe = pipe.to("cuda")                                                                      │
│   15 prompt = "a photo of an astronaut riding a horse on mars"                                   │
│ ❱ 16 image = pipe(prompt).images[0]                                                              │
│   17 image.save("astronaut_rides_horse.png")                                                     │
│   18                                                                                             │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27 in       │
│ decorate_context                                                                                 │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pi │
│ peline_stable_diffusion.py:529 in __call__                                                       │
│                                                                                                  │
│   526 │   │   │   │   latent_model_input = self.scheduler.scale_model_input(latent_model_input   │
│   527 │   │   │   │                                                                              │
│   528 │   │   │   │   # predict the noise residual                                               │
│ ❱ 529 │   │   │   │   noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text   │
│   530 │   │   │   │                                                                              │
│   531 │   │   │   │   # perform guidance                                                         │
│   532 │   │   │   │   if do_classifier_free_guidance:                                            │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:4 │
│ 24 in forward                                                                                    │
│                                                                                                  │
│   421 │   │   down_block_res_samples = (sample,)                                                 │
│   422 │   │   for downsample_block in self.down_blocks:                                          │
│   423 │   │   │   if hasattr(downsample_block, "has_cross_attention") and downsample_block.has   │
│ ❱ 424 │   │   │   │   sample, res_samples = downsample_block(                                    │
│   425 │   │   │   │   │   hidden_states=sample,                                                  │
│   426 │   │   │   │   │   temb=emb,                                                              │
│   427 │   │   │   │   │   encoder_hidden_states=encoder_hidden_states,                           │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:777  │
│ in forward                                                                                       │
│                                                                                                  │
│    774 │   │   │   │   )[0]                                                                      │
│    775 │   │   │   else:                                                                         │
│    776 │   │   │   │   hidden_states = resnet(hidden_states, temb)                               │
│ ❱  777 │   │   │   │   hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden  │
│    778 │   │   │                                                                                 │
│    779 │   │   │   output_states += (hidden_states,)                                             │
│    780                                                                                           │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py:216 in    │
│ forward                                                                                          │
│                                                                                                  │
│   213 │   │                                                                                      │
│   214 │   │   # 2. Blocks                                                                        │
│   215 │   │   for block in self.transformer_blocks:                                              │
│ ❱ 216 │   │   │   hidden_states = block(hidden_states, encoder_hidden_states=encoder_hidden_st   │
│   217 │   │                                                                                      │
│   218 │   │   # 3. Output                                                                        │
│   219 │   │   if self.is_input_continuous:                                                       │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py:490 in    │
│ forward                                                                                          │
│                                                                                                  │
│   487 │   │   │   │   self.attn1(norm_hidden_states, encoder_hidden_states, attention_mask=att   │
│   488 │   │   │   )                                                                              │
│   489 │   │   else:                                                                              │
│ ❱ 490 │   │   │   hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask   │
│   491 │   │                                                                                      │
│   492 │   │   if self.attn2 is not None:                                                         │
│   493 │   │   │   # 2. Cross-Attention                                                           │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in      │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py:638 in    │
│ forward                                                                                          │
│                                                                                                  │
│   635 │   │   │   hidden_states = hidden_states.to(query.dtype)                                  │
│   636 │   │   else:                                                                              │
│   637 │   │   │   if self._slice_size is None or query.shape[0] // self._slice_size == 1:        │
│ ❱ 638 │   │   │   │   hidden_states = self._attention(query, key, value, attention_mask)         │
│   639 │   │   │   else:                                                                          │
│   640 │   │   │   │   hidden_states = self._sliced_attention(query, key, value, sequence_lengt   │
│   641                                                                                            │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:470 in _attention │
│                                                                                                  │
│   467 │   │   #default_result = super()._attention(query,  key, value)                           │
│   468 │   │   if attention_mask is not None:                                                     │
│   469 │   │   │   print(f"{type(self).__name__} ignoring passed-in attention_mask")              │
│ ❱ 470 │   │   attention_result = self.get_invokeai_attention_mem_efficient(query, key, value)    │
│   471 │   │                                                                                      │
│   472 │   │   hidden_states = self.reshape_batch_dim_to_heads(attention_result)                  │
│   473 │   │   return hidden_states                                                               │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:306 in            │
│ get_invokeai_attention_mem_efficient                                                             │
│                                                                                                  │
│   303 │   def get_invokeai_attention_mem_efficient(self, q, k, v):                               │
│   304 │   │   if q.device.type == 'cuda':                                                        │
│   305 │   │   │   #print("in get_attention_mem_efficient with q shape", q.shape, ", k shape",    │
│ ❱ 306 │   │   │   return self.einsum_op_cuda(q, k, v)                                            │
│   307 │   │                                                                                      │
│   308 │   │   if q.device.type == 'mps' or q.device.type == 'cpu':                               │
│   309 │   │   │   if self.mem_total_gb >= 32:                                                    │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:300 in            │
│ einsum_op_cuda                                                                                   │
│                                                                                                  │
│   297 │   │   # fallback for when there is no saved strategy, or saved strategy does not slice   │
│   298 │   │   mem_free_total = get_mem_free_total(q.device)                                      │
│   299 │   │   # Divide factor of safety as there's copying and fragmentation                     │
│ ❱ 300 │   │   return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))        │
│   301 │                                                                                          │
│   302 │                                                                                          │
│   303 │   def get_invokeai_attention_mem_efficient(self, q, k, v):                               │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:279 in            │
│ einsum_op_tensor_mem                                                                             │
│                                                                                                  │
│   276 │   def einsum_op_tensor_mem(self, q, k, v, max_tensor_mb):                                │
│   277 │   │   size_mb = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size() // (1 << 20)     │
│   278 │   │   if size_mb <= max_tensor_mb:                                                       │
│ ❱ 279 │   │   │   return self.einsum_lowest_level(q, k, v, None, None, None)                     │
│   280 │   │   div = 1 << int((size_mb - 1) / max_tensor_mb).bit_length()                         │
│   281 │   │   if div <= q.shape[0]:                                                              │
│   282 │   │   │   return self.einsum_op_slice_dim0(q, k, v, q.shape[0] // div)                   │
│                                                                                                  │
│ /home/lstein/Projects/InvokeAI/ldm/models/diffusion/cross_attention_control.py:235 in            │
│ einsum_lowest_level                                                                              │
│                                                                                                  │
│   232 │   │   # calculate attention slice by taking the best scores for each latent pixel        │
│   233 │   │   if self.upcast_softmax:                                                            │
│   234 │   │   │   attention_scores = attention_scores.float()                                    │
│ ❱ 235 │   │   default_attention_slice = attention_scores.softmax(dim=-1).to(dtype=dtype)         │
│   236 │   │                                                                                      │
│   237 │   │   attention_slice_wrangler = self.attention_slice_wrangler                           │
│   238 │   │   if attention_slice_wrangler is not None:                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.58 GiB (GPU 0; 11.76 GiB total capacity; 8.89 GiB already allocated; 27.62 MiB free; 10.53 GiB reserved in total by 
PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
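
(As an aside, the max_split_size_mb workaround mentioned in the error message is configured through the PYTORCH_CUDA_ALLOC_CONF environment variable before the first CUDA allocation; 512 below is only an illustrative value.)

import os
# Must be set before torch makes its first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"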

@lstein (Collaborator) left a review comment:

Unfortunately I'm now getting a CUDA out of memory error during the generation process. The GPU was idle with 12 GB of free memory at the time.

@keturn (Contributor, Author) commented Jan 19, 2023

> Unfortunately I'm now getting a CUDA out of memory error during the generation process. The GPU was idle with 12 GB of free memory at the time.

@lstein, does it work if you use a smaller image size?

and by "work" I mean "produce an image at all" — SD 2.1 really isn't pretty when running below its 768px default size.

@damian0815 (Contributor) left a review comment:

Looks good; reading the logic, it makes sense, but I didn't test it. Pending lstein's issue above, I won't press "merge".

@lstein (Collaborator) commented Jan 19, 2023 via email

@lstein (Collaborator) commented Jan 19, 2023

> > Unfortunately I'm now getting a CUDA out of memory error during the generation process. The GPU was idle with 12 GB of free memory at the time.
>
> @lstein, does it work if you use a smaller image size?
>
> and by "work" I mean "produce an image at all" — SD 2.1 really isn't pretty when running below its 768px default size.

768x768 - crashes with the memory error
512x512 - crashes with the memory error
256x256 - doesn't crash, but the image looks predictably horrible (I wouldn't eat that sushi)

@lstein (Collaborator) left a review comment:

Currently unusable due to out of memory errors on images of 512x512 and higher.

@keturn (Contributor, Author) commented Jan 19, 2023

Interesting, 512px works here within 12 GB.

I did add revision="fp16" to the from_pretrained call, but I think that should only make a difference in download and storage size, since the weights are loaded into the requested torch_dtype either way.

But then how is it that #2335 works for you at full precision? Because that runs out of memory for me too.
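
For reference, the load call with that revision would look roughly like this (a sketch; revision="fp16" only selects the half-precision weight files on the Hub, while torch_dtype controls the in-memory dtype):

pipe = StableDiffusionGeneratorPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    revision="fp16",            # fetch the fp16 weight files (smaller download)
    torch_dtype=torch.float16,  # tensors end up in float16 either way
    cache_dir=invoke_ai_cache,
)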

@lstein (Collaborator) commented Jan 19, 2023

> Interesting, 512px works here within 12 GB.
>
> I did add revision="fp16" to the from_pretrained call, but I think that should only make a difference in download and storage size, since the weights are loaded into the requested torch_dtype either way.
>
> But then how is it that #2335 works for you at full precision? Because that runs out of memory for me too.

I have no idea. Even without #2335, if I comment out self.enable_xformers_memory_efficient_attention() in diffusers_pipeline.py and run at full precision (using --precision=float32), I can render 768x768 and 512x512 with no problems.

I think we've previously established that we are on the same versions of torch and CUDA - 1.13.1 and 11.7?
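
For comparison, the working configuration described above corresponds roughly to this variant of the earlier test script (a sketch; the height/width keyword arguments are assumed from the base diffusers pipeline):

# Full precision, with the xformers call left disabled.
pipe = StableDiffusionGeneratorPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # equivalent of --precision=float32
    cache_dir=invoke_ai_cache,
).to("cuda")
image = pipe(prompt, height=768, width=768).images[0]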

@lstein (Collaborator) commented Jan 19, 2023

I'm getting discouraged that we're doing all this debugging work to get a model running that nobody likes much anyway.

It looks like PyPI has binary wheel builds for xformers on Windows and Linux. I'm thinking of adding it to requirements-base.txt, in which case the problem will magically go away unless the user disables xformers explicitly.
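
As a sketch of that path, nothing beyond the dependency would be needed, since diffusers already exposes the opt-in hook (enable_xformers_memory_efficient_attention is the standard diffusers call):

# requirements-base.txt would gain a line:  xformers
# and the pipeline then opts in with:
pipe.enable_xformers_memory_efficient_attention()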

@keturn (Contributor, Author) commented Jan 20, 2023

Can't rely on xformers to fix everything, because

@keturn (Contributor, Author) commented Jan 30, 2023

Obsoleted by #2385.

@keturn closed this Jan 30, 2023
@keturn deleted the feat/upcast-attention branch January 30, 2023 23:58
Successfully merging this pull request may close these issues: [bug]: SD 2.x models return only black in float16