Support stable-diffusion-webui-forge #69
Hijacks the forge_sample function and calls it separately for each individual text conditioning. The outputs are then recombined into a format that the extension already knows how to work with. This doesn't use forge's patcher or take advantage of all of forge's performance optimizations brought over from ComfyUI via ldm_patched, but it does guarantee backwards compatibility for base A1111 users, and allows Forge users to use the extension without maintaining a separate install of A1111.
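A minimal sketch of the hijack described above, not the exact code in this PR; the module path is an assumption, and `split_into_single_conds` / `recombine_into_tree` are hypothetical helpers standing in for the extension's logic:

```python
import importlib

# Assumption for illustration: forge exposes forge_sample under a
# modules_forge sampler module; the exact path and signature may differ.
forge_sampler = importlib.import_module('modules_forge.forge_sampler')
original_forge_sample = forge_sampler.forge_sample

def patched_forge_sample(*args, **kwargs):
    outputs = []
    # Call the original sampler once per individual text conditioning
    # (hypothetical helper that yields per-cond argument tuples).
    for single_cond_args in split_into_single_conds(args):
        outputs.append(original_forge_sample(*single_cond_args, **kwargs))
    # Recombine the per-cond outputs into the format the extension
    # already knows how to work with (hypothetical helper).
    return recombine_into_tree(outputs)

forge_sampler.forge_sample = patched_forge_sample
```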
Is this PR still a work in progress? I am playing a bit with the code and it looks relatively complete. There is something I am not sure I completely grasp, and I would appreciate it if you could help clarify this part. There seems to be a lot going on in the code: is there something preventing us from calling the original forge function with each auxiliary latent stacked on the batch dimension, keeping the settings as they are otherwise? Good work overall!
I think I should also clarify something. You may already know this, but let me explain just in case: if there is a way to implement this without monkey patching, we should consider it, I think. I don't know exactly what builtin extension mechanisms are in place in forge; I need to investigate the code, or maybe we should ask lllyasviel to add something for this use case / contribute a PR to forge ourselves.
Yes -- there are a lot of things that I want to improve/fix, but I don't mind merging a hacky but working version and cleaning it up in a second pass. I'm about to be called away to a family engagement shortly; I'll be able to respond more fully to the comments later tonight and push some fixes I've been working on.
The issue I think is what is returned here: https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/main/ldm_patched/modules/samplers.py#L289

Generating at 896x1152 with a batch size of 1 and a single-cond prompt, I get cond_pred with shape torch.Size([1, 4, 144, 112]). If I use the same settings with a prompt containing multiple ANDs, I still get cond_pred with shape torch.Size([1, 4, 144, 112]) -- because ldm_patched has already combined the auxiliary conds and returned just the single cond latent. Increasing the batch size to 8, I get torch.Size([8, 4, 144, 112]).
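To illustrate the shape observation (a toy check, not forge code): cond_pred is batched over images, never over conds, so the number of AND segments does not show up in the tensor shape.

```python
import torch

# 896x1152 image -> 112x144 latent (the VAE downscales by a factor of 8).
cond_pred = torch.zeros(1, 4, 1152 // 8, 896 // 8)
print(cond_pred.shape)  # torch.Size([1, 4, 144, 112])

# With batch size 8, only the first dimension changes:
print(torch.zeros(8, 4, 1152 // 8, 896 // 8).shape)  # torch.Size([8, 4, 144, 112])
```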
I think you knew this already, because you suggested calling the original forge function with the auxiliary latents stacked on the batch dimension. That may be cleaner for playing nice with other extensions; the part I'm trying to figure out is how to perform the surgery on the single combined latent that comes back.
Yeah, it seems tricky to make it work at first glance. Let me see if I can get it to work on my side and I'll come back here with what I find. For personal reasons I don't want to falsely promise that I will find something by tonight; however, I really intend to look into this today. I can also focus on something else if you want to tackle this part.
So ldm_patched.modules.samplers.calc_cond_uncond_batch returns a pair of denoised latents (cond_pred, uncond_pred), and then classifier-free guidance (a custom implementation can be supplied via model_options) is applied to compute a single cfg_result. If model_options includes any sampler_post_cfg_function, those can further alter cfg_result before it is returned. This extension wants to work with a tree structure of latent vectors and compute the cfg_result by traversing that tree, but ldm_patched can only compute cfg_result from two vectors at a time.

I am thinking maybe we should work at a lower level, at ldm_patched.modules.samplers.sampling_function or ldm_patched.modules.samplers.calc_cond_uncond_batch, to call the original calc_cond_uncond_batch multiple times and collect the results per batch index (see the sketch further below). For example, let's say the batch size is 2, and B=1 has 5 prompts while B=2 has only 2 prompts: we'd call calc_cond_uncond_batch 7 times and collect the results for each batch index.
(The noconds aren't necessary for current functionality, but they'd be almost free to implement with this design, and would enable cool features like https://github.com/comfyanonymous/ComfyUI/blob/master/comfy_extras/nodes_perpneg.py.) We'd then traverse the cond tree for each batch index to compute the modified cfg_result.

An additional benefit is that this would roughly cut in half the number of calls to calc_cond_uncond_batch vs. the current implementation in this PR, which makes one call to calc_cond_uncond_batch for every single cond (including the uncond) and effectively discards one half of the results.

I think it still has to be a monkey patch rather than a change to forge itself, because forge doesn't have any concept of manipulating trees of conds or latents.
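A rough sketch of that lower-level patch, assuming the ComfyUI-style signature calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options); `iter_single_conds` and `combine_cond_tree` are hypothetical helpers standing in for the extension's traversal logic:

```python
from ldm_patched.modules import samplers

original_calc_cond_uncond_batch = samplers.calc_cond_uncond_batch

def patched_sampling_function(model, x, timestep, uncond, cond, cond_scale,
                              model_options, seed=None):
    # Denoise each cond of the tree separately and collect the results.
    cond_preds = []
    uncond_pred = None
    for single_cond in iter_single_conds(cond):  # hypothetical traversal
        c, u = original_calc_cond_uncond_batch(
            model, single_cond, uncond, x, timestep, model_options)
        cond_preds.append(c)
        uncond_pred = u  # the uncond prediction is the same on every call

    # A "nocond" prediction (no conditioning at all) would be almost free
    # to collect here too, enabling perp-neg style features later.

    # Traverse the cond tree per batch index to compute the final
    # cfg_result from the collected latents (hypothetical helper).
    return combine_cond_tree(cond_preds, uncond_pred, cond_scale)
```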
I looked at the code and the web API docs for the txt2img route, and now I'm skeptical of my previous statement about B being different for each index in N, because I only see a single prompt in the suggested JSON body. I recall that at least within an extension in A1111, it is possible to set a different prompt for each batch index.

So after seeing this, I made a test extension just now that overwrites the second prompt in the batch with 3 positive AND conds while leaving the first prompt with no AND, and it seems that forge does not handle the mismatched cond counts per batch index.
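For reference, a minimal sketch of that kind of test extension, assuming the A1111 script API and that p.all_prompts is populated by the time process() runs; the prompt text is made up:

```python
from modules import scripts

class CondCountTest(scripts.Script):
    def title(self):
        return "cond count test"

    def show(self, is_img2img):
        return scripts.AlwaysVisible

    def process(self, p, *args):
        # Give the second batch index 3 positive AND conds while the
        # first keeps a single cond, so the per-index cond counts differ.
        if len(p.all_prompts) >= 2:
            p.all_prompts[1] = "a castle AND a forest AND a dragon"
```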
Yeah, maybe that'd be a better approach to padding... I believe we have to change the behavior of forge in this way, so as to allow a different number of conds per batch index and prevent crashes in the extension.
Perpneg is implemented with the AND_PERP keyword already. The tree structure is supposed to enable these alternative denoising methods. I'm not sure I fully understand the "nocond" concept, so please ignore this paragraph if it doesn't add any useful info for you.
I think that's a valid point; it's much simpler to implement it like this, and I totally support this approach.

Another possibility to look at could be to have in forge a pre-processing list of modifiers and a post-processing list of modifiers applied in reverse order of registration. This way, we can change the shape of what needs to be processed before and after denoising while respecting the order of other patches and whatever expectations of shapes they have. But that's a bit involved... I think it could be tackled at a different time, if we ever do it.

Of course, the simplest solution would be to have a tree of prompts natively in the base repo, but I don't think the forge maintainers would welcome this kind of contribution, as it is a concept very specific to neutral prompt.
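A sketch of that modifier idea as a purely hypothetical API (none of these names exist in forge): pre hooks run in registration order, post hooks in reverse, so each patch sees shapes consistent with what its own pre hook established.

```python
modifiers = []  # list of (pre, post) callable pairs

def register_modifier(pre, post):
    modifiers.append((pre, post))

def run_with_modifiers(denoise, x):
    # Apply pre-processing in registration order...
    for pre, _ in modifiers:
        x = pre(x)
    result = denoise(x)
    # ...and post-processing in reverse order, so every modifier undoes
    # or consumes its own shape change last-in, first-out.
    for _, post in reversed(modifiers):
        result = post(result)
    return result
```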
Another approach might be to just catch the error and print it when conds are missing in the tree, and then continue generation normally. It isn't a very frequent use case to begin with. IIUC, it would simplify the implementation to just let forge strip the extra conds?
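Something like the following, where the tree combine and logging details are hypothetical:

```python
def combine_latents_safely(cond_tree, latents, fallback_result):
    try:
        return cond_tree.combine(latents)  # hypothetical tree traversal
    except KeyError as e:
        # A cond expected by the tree is missing (e.g. forge stripped it):
        # report it and continue generation with the unmodified result.
        print(f'[neutral prompt] missing cond in tree, falling back: {e}')
        return fallback_result
```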
Maybe forge would accept such a contribution upstream. I think a good path forward is to get a basic working implementation in place first and iterate from there.
I pushed a draft of the type of implementation I have in mind. It is surely bug-ridden and likely doesn't work at all yet, but I have to step away for the evening.
Seems to be a sensible approach; let's go with that. I pushed changes that make the code work at a basic level. With multiple AND keywords, it seems to give different results when the extension is enabled vs. when it is disabled; this needs to be investigated (it is probably my fault). I am also not totally sure that patches and other extensible properties of conds are carried over correctly.
Last night, I tested prompt forms with multiple bracketed groups and different assigned weights. This leads me to believe that the issue is not with the way latents are combined, but rather with the way we're trying to sample latents in the first place -- maybe ldm_patched doesn't like treating conds and unconds as interchangeable :/

If I have some time on my lunch break, I'll try calling the original function once per cond to see if it makes a difference.
I made some progress on this: it seems that in the original forge code, the cond weights are normalized before sampling. While it might not be the only problem in the code, taking this into consideration fixes some of the issues I was having yesterday. Do you mind if I push my attempt at fixing this to your branch? I don't want to conflict with any local changes you have.
lol, that would explain why results were so much better for me when I implemented weight normalization >_<
Wow, you're right, and it's forge-specific / not inherited from comfy: lllyasviel/stable-diffusion-webui-forge@998327c. Might be worth raising an issue there to inquire.
Go for it; I can test on a different branch or rebase as needed.
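For clarity, "weight normalization" here means dividing each AND cond's weight by the total so the weights sum to 1; a toy sketch, not the forge or extension code:

```python
def normalize_weights(weights):
    # Divide each cond weight by the total so the weights sum to 1.
    total = sum(weights)
    return [w / total for w in weights]

print(normalize_weights([1.0, 3.0]))  # [0.25, 0.75]
```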
Damn, this being forge-specific is really surprising. I pushed the changes; let me know if it helps in any way with your testing (if you find time, of course).
I got the same results when I sampled once per cond as I did when I sampled in pairs treating every other cond as uncond, so our efficient sampling idea works 🎉 (a sketch of the pairing trick follows below). I also got consistent results when distributing weights of bracketed groups over individual prompt weights, so your fix seems to work? But see further down for a question.
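A sketch of that pairing trick (same assumed calc_cond_uncond_batch signature as before): since one call denoises both its cond and uncond arguments in a single batch, feeding two conds per call roughly halves the number of calls.

```python
def sample_conds_in_pairs(model, conds, x, timestep, model_options):
    preds = []
    for i in range(0, len(conds), 2):
        pair = conds[i:i + 2]
        if len(pair) == 2:
            # Put the second cond of the pair in the "uncond" slot;
            # both outputs are really cond predictions.
            a, b = calc_cond_uncond_batch(
                model, pair[0], pair[1], x, timestep, model_options)
            preds.extend([a, b])
        else:
            # Odd count: assuming None is accepted for the uncond slot,
            # as in the cfg == 1 fast path.
            a, _ = calc_cond_uncond_batch(
                model, pair[0], None, x, timestep, model_options)
            preds.append(a)
    return preds
```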
This allows compatibility with other forge extensions, like self-attention guidance, that call calc_cond_uncond_batch outside of the normal context.
@ljleb I fixed an issue that occurred when using this extension together with forge's builtin self-attention guidance extension. It occurred because that extension calls calc_cond_uncond_batch separately after computing CFG, at https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/main/ldm_patched/contrib/external_sag.py#L155
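The fix amounts to a guard of roughly this shape (the flag name is hypothetical): the patched function defers to the original whenever the call does not come from the extension's own sampling pass.

```python
from ldm_patched.modules import samplers

original_calc = samplers.calc_cond_uncond_batch
inside_extension_sampling = False  # hypothetical flag, toggled by the extension

def patched_calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options):
    if not inside_extension_sampling:
        # Callers outside our sampling pass, such as forge's builtin
        # self-attention guidance (which re-invokes this after CFG),
        # get the original behavior.
        return original_calc(model, cond, uncond, x, timestep, model_options)
    ...  # per-cond collection for the extension's cond tree goes here

samplers.calc_cond_uncond_batch = patched_calc_cond_uncond_batch
```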
Awesome! Nice find. This is starting to look solid.

For personal reasons, I have not been able to bring myself to finish testing/reviewing unmerged work on this repo. As you have spent a lot of time and effort here, I'm starting to feel really bad about this situation. There is unfortunately not much I can do about it in the short term, but I'm actively seeking a solution.

As you seem to have a couple of ideas with the tree of prompts and other syntax extensions, I initially considered making an exception and eventually transferring ownership to you if I can't resolve my personal situation (assuming you would be willing to take on this burden). However, I am not usually in favor of giving away ownership to anyone, especially now in light of the recent xz situation. If the repo becomes stale for long enough (like a couple of months) and your fork has useful additions in it, w-e-w might accept a PR for changing the URL in the extensions listing.

Before it gets to that point, I intend to test this code and move forward with it when I get the opportunity, but I can't promise anything. Sorry for the trouble this causes.
Upon testing this further, I find that a basic prompt with a single AND does not reproduce the results I get with the extension disabled. I learned why this happens.
Thank you, I'll block out some time to look into this tonight. |
I have started working on this and fixed it, but the results are still a little bit different for some other reason (I suspect it is caused by the top-level AND prompts being merged in delta space, which is not exactly how they are merged normally; it is mathematically equivalent, but might be introducing imprecision in the denoising process). I will take some time to finish this tomorrow if I can't do it tonight.
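A toy check of the equivalence mentioned above (assuming the top-level weights are normalized to sum to 1; with unnormalized weights the two expressions differ):

```python
import torch

torch.manual_seed(0)
uncond = torch.randn(4, 8, 8)
conds = [torch.randn(4, 8, 8) for _ in range(3)]
weights = [0.5, 0.3, 0.2]  # normalized: sums to 1
cfg = 7.0

# Merge the conds first, then apply CFG once.
merged_cond = sum(w * c for w, c in zip(weights, conds))
a = uncond + cfg * (merged_cond - uncond)

# Merge in delta space: combine the per-cond CFG deltas.
b = uncond + cfg * sum(w * (c - uncond) for w, c in zip(weights, conds))

print(torch.allclose(a, b, atol=1e-5))  # True, up to float imprecision
```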
In case anyone is here trying to get this to work locally with a manually installed forge: last month, lllyasviel made structural changes to the backend, including entirely removing the module that this pull request interfaces with.
For a discussion of some of the challenges of manipulating denoised latents more directly using built-in functionality of Forge, see: hako-mikan/sd-webui-regional-prompter#299