What are the purposes of axes_factor, ignore_factor_on_trunc, unlimited_area_hack? #4
-
Good catch, I have not documented those vars yet because axes_factor can be a bit finicky outside of the values 1 and 2, and unlimited_area_hack drastically increases VRAM usage when its intended use case comes up. I'll document them "officially" in the README at a later date; I've got a lot on my plate in terms of features and bug fixes at the moment. I'll be very verbose in my responses here, just so I have it in writing.

For axes_factor and ignore_factor_on_trunc: yep, they control how the input is rearranged in groupnorm_mm_forward. The reason it's finicky (and why ignore_factor_on_trunc exists) is that ComfyUI has VRAM optimizations that kick in at certain resolution/batch_size combinations, which (I think) cause it to not add the uncond portion of the latents, cutting the expected latent chunks in half. Example: at something like 512x512 with batch_size 16, the latent's first dimension is 32, while at 1024x1024 with batch_size 16 the first dimension is 16, with all the other dimensions unchanged. Because it gets cut in half, if the code were to keep using axes_factor 2 internally, the image seems to lose fidelity (from my experimentation) with the way it's normalized. So, when the code detects that the expected latent chunks are not twice the expected video frame count, ignore_factor_on_trunc lets it use a value of 1 instead of the default 2 (or whatever value was chosen), to keep the result more consistent with what the user would expect.

And funnily enough, this halving behavior means that when ignore_factor_on_trunc is False, if you were to select, say, a batch_size of 15 at a higher resolution (or have a second upscaling pass that pushes the resolution past the optimization threshold), the axes_factor of 2 that normally works would attempt to rearrange the tensor into two halves, but it can't: instead of getting the latent in 30 chunks, it gets 15, which isn't divisible by 2.
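The rearrange-and-fallback behavior described above can be sketched roughly like this. This is a hypothetical NumPy illustration, not the actual repo code (the real groupnorm_mm_forward operates on torch tensors and applies group norm in the middle); the function name and shapes are my own for demonstration:

```python
import numpy as np

def groupnorm_mm_forward_sketch(latent, video_length, axes_factor=2,
                                ignore_factor_on_trunc=True):
    """Illustrative sketch of the axes_factor rearrange logic (not real code)."""
    factor = axes_factor
    if latent.shape[0] != axes_factor * video_length and ignore_factor_on_trunc:
        # ComfyUI's VRAM optimization halved the batch (uncond not added),
        # so fall back to a factor of 1 instead of the chosen axes_factor.
        factor = 1
    b, c, h, w = latent.shape
    # "(g f) c h w -> g c f h w" style rearrange (einops notation), via numpy:
    x = latent.reshape(factor, video_length, c, h, w).transpose(0, 2, 1, 3, 4)
    # ... normalization over the grouped layout would happen here ...
    # rearrange back to the original "(g f) c h w" layout
    return x.transpose(0, 2, 1, 3, 4).reshape(b, c, h, w)
```

With ignore_factor_on_trunc=False, a halved latent (e.g. the 15-chunks-instead-of-30 case above) makes the reshape fail, because 15 chunks can't be split into 2 groups of 15 frames.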
This optimization behavior is actually the original reason why the original ComfyUI AnimateDiff repo had the issue where too high a resolution would cause the animation to be "cut in half" and rendered as two separate groups of latents: the code in motion_module.py originally derived the expected video_frame length from half the latent chunks. And yep, it would also throw an error with an odd number of frames when the optimization kicked in.

As for unlimited_area_hack, it overrides the maximum_batch_area function, which is what determines when that halving optimization kicks in during sampling. The override makes that function return the maximum integer supported by Python 3 rather than doing any actual math. As expected, it pretty much doubles VRAM requirements when sampling at resolutions/batch_sizes that would trigger the optimization. From some subjective tests, it honestly does not affect the output significantly at the resolutions that trigger the optimization (at least on my machine, but I have 24GB of VRAM). So unlimited_area_hack can be kept off unless the image is getting chunked in half at resolutions low enough for it to be noticeable, though it's hard to even know when that happens unless I add code to log it. It's kind of the opposite of an optimization when set to True.
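The override described here amounts to something very simple. A hedged sketch of its shape (the function name comes from the description above; the body, including the placeholder limit, is my guess at the idea, not the actual ComfyUI code):

```python
import sys

def maximum_batch_area(unlimited_area_hack=False):
    """Illustrative sketch of the unlimited_area_hack override (not real code)."""
    if unlimited_area_hack:
        # Return the largest Python int so the area check never trips and the
        # latent batch is never halved, at the cost of roughly double VRAM.
        return sys.maxsize
    # ... the real function would compute a VRAM-based area limit here;
    # this placeholder value is for illustration only:
    return 512 * 512 * 16
```

When the area of resolution x batch_size stays under the returned limit, the halving optimization never kicks in, which is exactly the trade-off described above.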
-
Update: I talked with comfy and now have a better understanding of how to derive the appropriate axes_factor value, to the point where I can probably remove those variables entirely. unlimited_area_hack might also be unnecessary; I will be refactoring some code to hopefully get rid of a couple of issues as well.
-
From the code, it looks like axes_factor and ignore_factor_on_trunc are for rearranging the input in groupnorm_mm_forward, though I'm not sure what unlimited_area_hack is for (possibly for low-VRAM devices?).