
[Feature]: git-re-basin potential improvement for model merge #1176

Closed
DirtyHamster opened this issue May 26, 2023 · 91 comments

Labels
enhancement New feature or request

Comments

@DirtyHamster

DirtyHamster commented May 26, 2023

Feature description

I spent a fair portion of the week reading about the various block merging methods currently available for models. One paper in particular, "Git Re-Basin: Merging Models modulo Permutation Symmetries", really caught my attention and I thought you might find it interesting as well. The paper is available here: https://arxiv.org/abs/2209.04836
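The gist of their weight-matching approach, as far as I understand it, is to permute the hidden units of model B so its weights line up with model A before averaging. A rough single-layer sketch of the idea (illustrative only, not their actual implementation):

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_merge(w_a, w_b, alpha=0.5):
    # w_a, w_b: (out_features, in_features) weights of the same layer in two models
    # find the permutation of B's output units that best lines up with A's units
    cost = w_a @ w_b.T                                   # similarity between every pair of units
    _, perm = linear_sum_assignment(cost, maximize=True)
    w_b_aligned = w_b[perm]                              # reorder B's units to match A
    return (1 - alpha) * w_a + alpha * w_b_aligned

# toy usage
w_a, w_b = np.random.randn(8, 4), np.random.randn(8, 4)
merged = match_and_merge(w_a, w_b, alpha=0.5)

In the full network the same permutation also has to be applied to the input side of the following layer, and the matching is repeated over all layers until the assignments stop changing, which is where their iteration count comes in.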

Is there a reasonable way to test their method against what we already use, to see if it would be an improvement? What method are you currently using, so I can better understand what I'd be testing against? I had also thought about testing against their own published proof of the method, if that's available. I figured I'd write this up in case someone else might be interested in testing it too.

My thought was this could potentially be added as an "auto" option under interpolation. Since the weights are guided automatically, I thought this might follow along with your idea of ease of use. Some of the more manual methods have users setting each of the independent IN and OUT weight values for the model blocks, which would also be nice to be able to do without leaving the core UI.

(conversation went on from there)

I did a little extra searching around from that:
Just a simple GUI for the code:
https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui

GUI used in some of the testing:
https://github.com/axsddlr/sd-gui-meh

Code explored:
https://github.com/samuela/git-re-basin
https://github.com/themrzmaster/git-re-basin-pytorch
They have some code in the pull requests for partially dealing with safetensors as well:
diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui#1

Code used in testing:
https://github.com/s1dlx/meh

Results were brought up from the comments below after the testing method was agreed on:


Model 1 used: https://huggingface.co/Deltaadams/HD-22 (fp32)
Model 2 used: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models were pruned from full trainable EMA models to fp32 no-EMA and fp16 no-EMA prior to testing.

Testing method sampler and size settings:
Settings: DPM++ 2M Karras @ 20 steps and a CFG scale of 7, Seed: 1897848000, Size: 512x716, CLIP: 4
Prompts Used: a headshot photographic portrait of a woman, a cat as a DJ at the turntables

Testing regimen: (multiplier to be run from 0.1 to 0.9)

base-fp16+custom#1-fp16, base-fp16+custom#2-fp16
xyz_grid-0000-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp16
xyz_grid-0032-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
xyz_grid-0001-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp32
xyz_grid-0031-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp16, base-fp32+custom#2-fp16
xyz_grid-0027-1897848000-a cat as a DJ at the turntables
xyz_grid-0026-1897848000-a headshot photographic portrait of a woman

The git-re-basin side will be similarly mirrored: (weight value set at 0.5:0.5, iteration value to be run from 1 to 10)

Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0016-1897848000-a cat as a DJ at the turntables
xyz_grid-0018-1897848000-a headshot photographic portrait of a woman

Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}

xyz_grid-0019-1897848000-a cat as a DJ at the turntables
xyz_grid-0020-1897848000-a headshot photographic portrait of a woman

Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0024-1897848000-a cat as a DJ at the turntables
xyz_grid-0025-1897848000-a headshot photographic portrait of a woman

Version Platform Description

Latest published version: e048679 2023-05-25T21:13:56Z

@DirtyHamster DirtyHamster added the enhancement New feature or request label May 26, 2023
@vladmandic
Owner

the algo currently implemented by model merger is near-trivial and there is no intelligence whatsoever. you can see it at

def run_modelmerger(id_task, primary_model_name, secondary_model_name, tertiary_model_name, interp_method, multiplier, save_as_half, custom_name, checkpoint_format, config_source, bake_in_vae, discard_weights, save_metadata): # pylint: disable=unused-argument
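conceptually it boils down to a straight per-tensor interpolation, roughly like this (simplified sketch, not the actual code):

def weighted_sum_merge(theta_0, theta_1, multiplier):
    # theta_0 / theta_1: state dicts of the primary / secondary models
    merged = {}
    for key, tensor in theta_0.items():
        if key in theta_1:
            merged[key] = (1.0 - multiplier) * tensor + multiplier * theta_1[key]
        else:
            merged[key] = tensor  # keys missing from the secondary model are copied as-is
    return merged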

if you can do any quantifiable test of the new algo showing the results, i'd love to include this code as you've suggested.

@vladmandic vladmandic changed the title [Feature]: (Potential feature a couple of questions) git-re-basin potential improvement for model merge [Feature]: git-re-basin potential improvement for model merge May 27, 2023
@DirtyHamster
Author

I found the code for ours but I just couldn't figure out what the method is called so I could read more about it, lol... I thought perhaps it had a proper method name, but "standard webui type of thing" is kind of all I got back from my searching.

Test-wise, what I'm thinking is 2 models, followed with images through 2 similar merging routines. The issue I was having was coming up with a good test. The methods are different: in the git-re-basin method there is alpha, which I'm thinking is currently similar to our multiplier m, but then they also have iterations. Their entire paper only mentions iterations once: "Our experiments showed this algorithm to be fast in terms of both iterations necessary for convergence and wall-clock time, generally on the order of seconds to a few minutes."

Our current method just does the simplest merge; it doesn't aim for this idea of convergence. I haven't tried to break their software yet, so I don't know what the maximum iteration count is. My base thought was doing 1 to 10 iterations on the new method. I'd like to do the testing in a repeatable way that could provide a default for this iterations value, since our models will probabilistically have more similarities than dissimilarities in general, provided they are trained on the natural world.

On the standard merge I could do 0.1 stepping to 1, as in 10 steps.
On the git-re-basin side I could do 0.1 stepping to 1 @ 1 iteration, with the same steps.
Should this be followed up with, say, 10 iterations? As iterations aren't handled in the base method, I'm not sure how to position this.

I want to make sure that it's quantifiable too, hence the excess of questions. Do you have any suggestion of publicly available models to use? Can we make a decision on the X and Y models for the testing? The base that is downloaded via the vlad install (SD 1.5) and something else as a secondary? It's hard to pick because there are a lot of small pruned models, and I've had different experiences with merging pruned and unpruned models. I was thinking of using this one, as they used the manual method and have it listed in their notes: https://civitai.com/models/18207/ether-real-mix. You can see the IN and OUT weights and alpha listed as a manual operation, e.g. 0,0.8,0.8,0.8,0.8,0.8,0,0,0,0,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0,0; these are the block weights they used in their merge. You count to the center number for the INs and OUTs. They've been using a non-default method, but at least I know where their blocks are to some degree, and it diverges from base enough style-wise.
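For reference, those 26 numbers are per-block multipliers; applying them is conceptually just a weighted sum whose multiplier depends on which UNet block a key belongs to. A rough sketch, assuming the common layout of base alpha followed by 12 IN weights, the MID weight, and 12 OUT weights (tools differ on the exact ordering):

import re

# the block weights quoted above, read as: [base_alpha, IN00..IN11, MID, OUT00..OUT11]
weights = [0, 0.8, 0.8, 0.8, 0.8, 0.8, 0, 0, 0, 0, 0.8, 0.8, 0.8,
           0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0, 0]

def block_multiplier(key):
    # map a UNet key to its per-block weight; anything else falls back to base alpha
    m = re.search(r"input_blocks\.(\d+)\.", key)
    if m:
        return weights[1 + int(m.group(1))]
    if "middle_block." in key:
        return weights[13]
    m = re.search(r"output_blocks\.(\d+)\.", key)
    if m:
        return weights[14 + int(m.group(1))]
    return weights[0]

def block_weighted_merge(theta_0, theta_1):
    return {k: (1 - block_multiplier(k)) * v + block_multiplier(k) * theta_1[k]
            for k, v in theta_0.items() if k in theta_1}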

What would be a satisfactory quantification for testing this?

@vladmandic
Owner

i like the proposed methodology, the only thing i'd propose is to use two full fp32 models as base models and then create fp16 versions of them.
so we'd have 3 models, each in 2 variations

and then run merges using different variations (total of 6 tests):

  • base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
  • base-fp16+custom#1-fp16, base-fp16+custom#2-fp16
  • base-fp32+custom#1-fp16, base-fp32+custom#2-fp16

for custom model, how about https://civitai.com/models/4384/dreamshaper? that's a full 5.5gb fp32 model with baked in vae. and you can pick a second one.

regarding iterations - good question - i have no idea what the best setting would be. maybe try two extremes and see what the differences are; if they are significant, we can leave it as an exposed setting in the ui rather than predetermining it.

@DirtyHamster
Author

DirtyHamster commented May 28, 2023

My intention for the iteration value was to leave it exposed but with a default given, i.e. whatever looks best from the test results gets listed as the default. So if others decide to repeat the test, that default could be averaged out to a best value and still be easily changed. Honestly I'll try to break it and report on that too, so we can eliminate some troubleshooting in the future if possible. I just want to avoid giving a value of 1 and having 1 not be high enough to make a significant change.

I agree on using fp32 models in the test too; I probably would have forgotten to include them if you hadn't mentioned it. As the initial state of their code doesn't deal with safetensors files, I'll convert to checkpoint files first to avoid any hiccups. The safetensors files can be dealt with later.

I'll use https://github.com/arenasys/stable-diffusion-webui-model-toolkit to evaluate and check the models before and after the merges to look for any changes or problems. Think about including this in the train tab: it's really useful for figuring out what's going on component-wise inside models that don't run. I don't expect much to change in similar models, but it's just good data for documentation.

This regimen is fine for me:

  • base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
  • base-fp16+custom#1-fp16, base-fp16+custom#2-fp16
  • base-fp32+custom#1-fp16, base-fp32+custom#2-fp16

(Multiplier to be run from .1 to 1)

The git-re-basin side will be similarly mirrored:

  • base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ iteration {number set...}
  • base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ iteration {number set...}
  • base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ iteration {number set...}

(Multiplier to be run from .1 to 1)

I think for the sampling method I'm going to stick with DPM++ 2M Karras @ 32 steps and a CFG scale of 12.5, which are settings I know generally work for every model I've tried so far. This could be expanded on later, but as a first run I don't want to overcomplicate it too much.

I was thinking of using some of the textual inversion template prompts for the default text prompt. Something fairly basic:
a photo of a [name], a rendering of a [name], a cropped photo of the [name], the photo of a [name], a photo of a clean [name], a photo of a dirty [name], a dark photo of the [name]. My own prompts end up being overly complex, so I'm trying to make sure it's something easy. I'll just replace [name] with woman or man.
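To keep that reproducible, the prompt list can just be generated from the templates, e.g.:

templates = ["a photo of a [name]", "a rendering of a [name]", "a cropped photo of the [name]",
             "the photo of a [name]", "a photo of a clean [name]", "a photo of a dirty [name]",
             "a dark photo of the [name]"]
prompts = [t.replace("[name]", subject) for subject in ("woman", "man") for t in templates]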

I was thinking of not using a negative prompt by default, beyond maybe nsfw or something similar, so as not to affect image quality.

Using https://civitai.com/models/4384/dreamshaper: when I just checked, jumping between versions, it says it's Full Model fp16 (5.55 GB). Version 5 inpainting is Full Model fp32 (5.28 GB); would that version be OK, as it satisfies the fp32 issue?

I have a copy of Hentai Diffusion at fp32; it's similarly a larger unpruned model. Using the prompt "a photo of a woman" I generally get something like this out of it @ 1440x810. (I use this size intentionally, looking for width-duplication tiling in models, and it's a high enough res that I can avoid using hires fix and/or restore faces.)

(image snipped as not needed)

I had used this in another undocumented merge test, trying to go between 2D illustrative and 3D realistic styles (particularly dealing with faces) using the standard method, so this might contrast well style-wise. If you have a better suggestion for a second model, feel free to suggest it; I don't mind at all, as the inclusion of the method is my end goal. My other thought was to look for a very realistic model and contrast it that way. I think the test would probably prove out in the in-between regardless, though.

If we can square that off then I'll retype up the combined test method for the initial post and get started on testing.

@vladmandic
Owner

Using: https://civitai.com/models/4384/dreamshaper when I just checked it says it's Full Model fp16 (5.55 GB)

i think that's likely a typo, i don't see how an fp16 model can be that big.

regarding prompt and sampler - pick anything as long as the image is detailed enough so we can pick up if there is any loss of details.

everything else you wrote, no issues.

@DirtyHamster
Author

The HD model at fp32 is 7.5 GB, which is much larger and is the only reason I'm questioning the DreamShaper one. A fair number of my unpruned fp16 models are 4.4 GB, which isn't that far off size-wise from 5.5 GB... I'll look at this tonight when I move the models into their folders and start trying to work out a seed and prompt that does a fair close-up of face and shoulders for both models. I've only downloaded it so far. I'm hoping to use the same seed on both models, so it will take me a few tries to find one that works. I'll get back to you on this before I start testing.

The multiplier on the re-basin side will be called alpha, as that's what they use in the paper.

When I narrow down a seed to work with (using the settings DPM++ 2M Karras @ 32 steps and a CFG scale of 12.5), I want to do a simple not-on-my-computer check first, that you or anyone else available can get the same or a similar image. I'll do this for each model used in the test as a default base image. This should be the only external "compute time" help I need before going into the testing phase of the actual merging. If that doesn't work, then we have to look at our settings to see if there is some interference going on before moving forward. This is just for repeatability of testing.

I'll go over some of this tonight and pick up on Tuesday, as I have a busy day tomorrow. I'll append the test method to my original post and then provide the test data in the comments, similar to how this is posted. Maybe one or two more posts before testing; I'll look into the DreamShaper issue and let you know what I find beforehand. I think we agree on most of the other stuff, which is good.

@vladmandic
Owner

sounds good on all items.
re: dreamshaper - as long as you use one fp32 model, i don't care which one it is, so don't spend too much time looking into dreamshaper.

@DirtyHamster
Author

DirtyHamster commented May 29, 2023

Understood, I'll make sure I post links to both models used also. I'll try your suggestion first since you might have more experience using that one and can notice changes. I'm going to try to use 2 fp32 models and then bring them both down to fp16 just to be clear on that.

@DirtyHamster
Author

DirtyHamster commented May 29, 2023

Using the model toolkit (https://github.com/arenasys/stable-diffusion-webui-model-toolkit):

Note this model has a correctable error: CLIP has incorrect positions, missing: 41. To clear the error, run it through model convert with the appropriate values. The error report is given first and then the fixed report after that.

Under "basic", for Model 1 (https://github.com/Delcos/Hentai-Diffusion, fp32, 7.5 GB) as listed above, I get the following default report with the error:

Report (1676/1444/0102)
Model is 7.17 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

Contains 3.20 GB of junk data! Wastes 1.99 GB on precision. CLIP has incorrect positions, missing: 41.

Model will be pruned to 3.97 GB. (note I'm not pruning this or making alterations right now)

Under advanced the report is as follows:

Report (1676/1444/0102)
Statistics
Total keys: 1831 (7.17 GB), Useless keys: 686 (3.20 GB).

Architecture
SD-v1
UNET-v1
UNET-v1-SD
VAE-v1
VAE-v1-SD
CLIP-v1
CLIP-v1-SD
Additional
EMA-v1
EMA-UNET-v1
UNET-v1-EMA
Rejected
UNET-v1-Inpainting: Missing required keys (1 of 686)
model.diffusion_model.input_blocks.0.0.weight (320, 9, 3, 3)
UNET-v1-Pix2Pix: Missing required keys (1 of 686)
model.diffusion_model.input_blocks.0.0.weight (320, 8, 3, 3)
UNET-v1-Pix2Pix-EMA: Missing required keys (1 of 686)
model_ema.diffusion_modelinput_blocks00weight (320, 8, 3, 3)
UNET-v2-SD: Missing required keys (64 of 686)
model.diffusion_model.output_blocks.4.1.proj_out.weight (1280, 1280)
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight (640, 1024)
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight (1280, 1024)
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight (640, 1024)
model.diffusion_model.output_blocks.7.1.proj_out.weight (640, 640)

UNET-v2-Inpainting: Missing required keys (65 of 686)
model.diffusion_model.output_blocks.4.1.proj_out.weight (1280, 1280)
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight (640, 1024)
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight (1280, 1024)
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight (640, 1024)
model.diffusion_model.output_blocks.7.1.proj_out.weight (640, 640)

UNET-v2-Depth: Missing required keys (65 of 686)
model.diffusion_model.output_blocks.4.1.proj_out.weight (1280, 1280)
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight (640, 1024)
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight (1280, 1024)
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight (640, 1024)
model.diffusion_model.output_blocks.7.1.proj_out.weight (640, 640)

SD-v1-Pix2Pix: Missing required classes
UNET-v1-Pix2Pix
SD-v1-ControlNet: Missing required classes
ControlNet-v1
SD-v2: Missing required classes
CLIP-v2
UNET-v2
SD-v2-Depth: Missing required classes
CLIP-v2
UNET-v2-Depth
Depth-v2

With the error fixed the report comes out as the following:

Basic Report:

Report (1676/1444/0102)
Model is 7.17 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

Contains 3.20 GB of junk data! Wastes 1.99 GB on precision. (no changes other than clip fix have been made)

Model will be pruned to 1.99 GB.

Statistics
Total keys: 1831 (7.17 GB), Useless keys: 686 (3.20 GB).

Architecture
SD-v1
UNET-v1
UNET-v1-SD
VAE-v1
VAE-v1-SD
CLIP-v1
CLIP-v1-SD
Additional
EMA-v1
EMA-UNET-v1
UNET-v1-EMA
Rejected
UNET-v1-Inpainting: Missing required keys (1 of 686)
model.diffusion_model.input_blocks.0.0.weight (320, 9, 3, 3)
UNET-v1-Pix2Pix: Missing required keys (1 of 686)
model.diffusion_model.input_blocks.0.0.weight (320, 8, 3, 3)
UNET-v1-Pix2Pix-EMA: Missing required keys (1 of 686)
model_ema.diffusion_modelinput_blocks00weight (320, 8, 3, 3)
UNET-v2-SD: Missing required keys (64 of 686)
model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280)
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024)
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024)
model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640)
model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280)

UNET-v2-Inpainting: Missing required keys (65 of 686)
model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280)
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024)
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024)
model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640)
model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280)

UNET-v2-Depth: Missing required keys (65 of 686)
model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280)
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024)
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024)
model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640)
model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280)

SD-v1-Pix2Pix: Missing required classes
UNET-v1-Pix2Pix
SD-v1-ControlNet: Missing required classes
ControlNet-v1
SD-v2: Missing required classes
UNET-v2
CLIP-v2
SD-v2-Depth: Missing required classes
CLIP-v2
UNET-v2-Depth
Depth-v2

The next post will look at the suggested model 2. (Noting I didn't see anything here that explicitly tells me if it's fp16 or fp32, which I was hoping would be identified.) If someone has some extra time I'd really like to know more about these missing keys and classes (this can be dealt with later though).

@DirtyHamster
Author

model 2:

Basic report:

Model is 5.55 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

Contains 1.60 GB of junk data! Wastes 1.97 GB on precision.

Uses the SD-v2 VAE.

Model will be pruned to 1.99 GB. (not altering the model at this point)

Advanced report:

Statistics
Total keys: 1819 (5.55 GB), Useless keys: 686 (1.60 GB).

Architecture
SD-v1
UNET-v1
UNET-v1-SD
VAE-v1
VAE-v1-SD
CLIP-v1
CLIP-v1-SD
Additional
EMA-v1
EMA-UNET-v1
UNET-v1-EMA
Rejected
UNET-v1-Inpainting: Missing required keys (1 of 686)
model.diffusion_model.input_blocks.0.0.weight (320, 9, 3, 3)
UNET-v1-Pix2Pix: Missing required keys (1 of 686)
model.diffusion_model.input_blocks.0.0.weight (320, 8, 3, 3)
UNET-v1-Pix2Pix-EMA: Missing required keys (1 of 686)
model_ema.diffusion_modelinput_blocks00weight (320, 8, 3, 3)
UNET-v2-SD: Missing required keys (64 of 686)
model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280)
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024)
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024)
model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640)
model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280)

UNET-v2-Inpainting: Missing required keys (65 of 686)
model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280)
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024)
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024)
model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640)
model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280)

UNET-v2-Depth: Missing required keys (65 of 686)
model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280)
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024)
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024)
model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640)
model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280)

SD-v1-Pix2Pix: Missing required classes
UNET-v1-Pix2Pix
SD-v1-ControlNet: Missing required classes
ControlNet-v1
SD-v2: Missing required classes
UNET-v2
CLIP-v2
SD-v2-Depth: Missing required classes
CLIP-v2
UNET-v2-Depth
Depth-v2

@DirtyHamster
Author

On model one I noticed the report states "CLIP has incorrect positions, missing: 41." I'm going to run this through model converter to see if the clip fix will repair that. If that works and clears the error, I'm fine with that and I'll just edit the above post with the repair instructions; if not, I'll pick a different model to use.

@DirtyHamster
Author

DirtyHamster commented May 29, 2023

OK, passing it through model converter seemed to work for clearing that error. I now have 2 saved copies of that model to work from. I'm editing my original model report, appending to the end rather than removing the erroneous one.

Converting model...
100%|██████████████████████████████████████████████████████████████████████████| 1831/1831 [00:00<00:00, 915129.96it/s]
fixed broken clip
[41]
Saving to \AI_Models\Stable Diffusion\HD-22-fp32-fixclip.ckpt...
Saving to \AI_Models\Stable Diffusion\HD-22-fp32-fixclip.safetensors...
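As far as I can tell, the "fixed broken clip" step is just restoring the CLIP position_ids tensor, which should simply be the integers 0..76. A hand-rolled sketch of the same repair (key name is the SD-v1 one; paths are examples):

import torch

path = "HD-22-fp32.ckpt"   # original checkpoint (illustrative path)
ckpt = torch.load(path, map_location="cpu")
sd = ckpt.get("state_dict", ckpt)

# key used by SD-v1 checkpoints; position ids should be 0..76
key = "cond_stage_model.transformer.text_model.embeddings.position_ids"
if key in sd:
    fixed = torch.arange(77, dtype=torch.int64).unsqueeze(0)
    if not torch.equal(sd[key].to(torch.int64), fixed):
        print("broken position ids:", sd[key])
        sd[key] = fixed

torch.save(ckpt, "HD-22-fp32-fixclip.ckpt")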

@DirtyHamster
Author

DirtyHamster commented May 29, 2023

I'm going to convert the second model over to ckpt format, prep the data space (then try to clean up my mess), and call it a night; I'll pick up on Tuesday. (I.e. I have to mess around with the seeds before moving forward, just generating lots of junk data.) I'll post the best of the seeds to pick from; if I get to a good one, just stop me so I can test on that. I don't think I'll get through it tonight, though. I think everything else is squared off for the test.

If you spot anything I missed just point it out.

(combined from a direct post under it)

I ran a couple of quick tests on prompts just to try to dial that in a little.
Testing this prompt on both models (for my own notes so I know where to pick up from):

sampler settings: DPM++ 2M Karras @ 32 steps and a CFG scale of 12.5
starting with seed 1280
resolution: 900x900:
Clip 1:

Prompt: A head shot portrait photo of a woman
Neg prompt: empty ("this will be empty unless I hit images that are nsfw")

Seemed to be giving fairly consistent head and shoulders outputs

Last edit: just cleaned this up a little, adjusted the prompt, resolution, and CLIP setting, and removed older images.

@vladmandic
Owner

rendering at natively high resolution is always problematic, especially with extreme aspect ratios. i find 720x640 or 640x720 the highest reliable value, anything higher should go through some kind of second pass (hires fix, ultimate upscale, multidiffusion upscaler, etc.).

@brunogcar

not really sure if it's actually relevant to what you are trying to achieve, but have you seen https://github.com/Xerxemi/sdweb-auto-MBW

@DirtyHamster
Author

@vladmandic I've played around a lot with different sizes; 1440x810 is about as large as I can go without too much getting too funky. Though I've gotten some fairly interesting conjoined-body deformities at that size too, it's normally OK for closeups, and I usually get a good laugh out of them. I'll probably cut the size in half for the test; I just want to make sure there's enough detail without doing any post-processing on them.

@brunogcar I had looked that one over initially too. In the auto-MBW method you have to set the weights manually, whereas git-re-basin lets the computer pick the closest matching values for the weights itself, based on the data inside the models. While either method would still be an advancement over the default method, auto-MBW still wouldn't be attempting to do what git-re-basin tries to achieve when merging.

So right now all we're trying to achieve is generating a number of image sets output at known intervals, so we can compare the results of the default method vs the git-re-basin method and see if it performs at the levels they state.

@Aptronymist
Collaborator

What about this?
https://github.com/s1dlx/sd-webui-bayesian-merger

@DirtyHamster
Author

@Aptronymist that's closer to what's going to be tested, but it uses a different method. I wouldn't mind trying that one later, after the first test, as it is similarly automated. I read a fair handful of methods and they are all fairly interesting, but most of them haven't really published a look book of images to compare the methods across. That's part of my interest in this: just seeing what's doable in a repeatable manner.

Most of the other ones that I read through can be found listed here:
https://www.sdcompendium.com/doku.php?id=engineering_merge_7.3_0106

The standard block merge method has some good tests and some fair explanations available, but it's still kind of an unknown exactly which concepts live where in the blocks: https://rentry.co/BlockMergeExplained, which is one of the reasons why I'm looking first at one of the automated alternatives. If you scroll down to look at their example images, the size is so small that it makes it a little difficult to really examine what's going on, so I want to make sure my images are large enough to be useful as well.

I had done a similar test on 2 dissimilar models (a 3D and a 2D style) previously, doing a batch of consecutive merges run at multiplier value 0.3, but I didn't document it well enough to publish. It just gave me the idea that a standardized test would be useful for future evaluations. Some of what I noticed was odd distortions in the models as they were merged, around where they started to look 2.5D-ish. So part of what I want to look for is: if the concept merging happens faster and more precisely than the standard merge, will those distortions happen again, and similarly whether they appear with the other merge methods...

@DirtyHamster
Author

Tried passing a few models through tonight via the git-re-basin Windows version and hit errors before each output. I'm going to try the two models I saw them use in their video (https://www.youtube.com/watch?v=ePZzmMFb8Po), as it could be an issue with the models in general or my own settings, and I have to eliminate some of those possibilities, lol... I will probably also try one of the command-line versions, as I saw more people commenting on getting that working. I have to go reread the few repositories but will get to that soon.

Note: the command-line version looks like it runs through as many iterations as it takes to finish, rather than the count being selectable as in the Windows version. https://github.com/ogkalu2/Merge-Stable-Diffusion-models-without-distortion

These were the errors I got:

First attempt:

        ---------------------
            model_a:    HD-22-fp32-fixclip.ckpt
            model_b:    dreamshaper_6bakedvae_chk.ckpt
            output:     HD-22-fp32-fixclip_0.1_dreamshaper_6bakedvae_chk_0.9_1it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 1
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.1

FINDING PERMUTATIONS
P_bg337: -0.5
P_bg358: 0.25
<class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'

Program self-halted after error...

Second attempt: I noticed I forgot to untick fp16, so trying again. Possibly since they were both fp32 it might have run into an issue there... Made sure I was running them as fp32.

        ---------------------
            model_a:    HD-22-fp32-fixclip.ckpt
            model_b:    dreamshaper_6bakedvae_chk.ckpt
            output:     HD-22-fp32-fixclip_0.1_dreamshaper_6bakedvae_chk_0.9_1it.ckpt
            alpha:      0.1
            usefp16:    False  
            iterations: 1
        ---------------------

Using full precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.1

FINDING PERMUTATIONS
<class 'RuntimeError'> expected scalar type Float but found Half

Program self-halted after error...

Third attempt: try 2 known fp16s, correctly flagging them as fp16...

        ---------------------
            model_a:    HD-22-fp16-fixclip.ckpt
            model_b:    dreamshaper_6bakedvae_chk_fp16.ckpt
            output:     HD-22-fp16-fixclip_0.1_dreamshaper_6bakedvae_chk_fp16_0.9_1it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 1
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.1

FINDING PERMUTATIONS
P_bg337: -1.0
P_bg358: 0.25
<class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'

Program self-halted after error...
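Both failure modes look diagnosable by just inspecting the checkpoints: the KeyError suggests the model_ema.* keys aren't present/consistent in both models, and the Float/Half error suggests the two state dicts are stored in different dtypes. A quick inspection sketch (file names as above):

import torch

def inspect(path):
    sd = torch.load(path, map_location="cpu")
    sd = sd.get("state_dict", sd)
    ema_keys = [k for k in sd if k.startswith("model_ema.")]
    dtypes = {v.dtype for v in sd.values() if hasattr(v, "dtype")}
    print(path, "| keys:", len(sd), "| ema keys:", len(ema_keys), "| dtypes:", dtypes)
    return sd

sd_a = inspect("HD-22-fp32-fixclip.ckpt")
sd_b = inspect("dreamshaper_6bakedvae_chk.ckpt")
print("only in A:", len(set(sd_a) - set(sd_b)), "| only in B:", len(set(sd_b) - set(sd_a)))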

@vladmandic
Owner

<class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'

i've seen this mentioned at the dreambooth repo and the author stated that in most cases it's due to oom?

@DirtyHamster
Author

DirtyHamster commented Jun 1, 2023

My original searches really weren't pulling much just using:

"<class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'" site:github.com
Or
"'model_ema.diffusion_modelinput_blocks00bias'" site:github.com

with or without the quotes....

I saw some mention of OOM with a few of them, but others didn't mention it. I think I might have spotted that one from dreambooth too; they mention a KeyError: 'model_ema.diffusion_modelinput_blocks00bias'. Though I'm not sure if the OOM mentioned later in that thread has anything to do with the first issue, beyond the usual "I got an error like that too" type of chatter. Still reading, though.

This could also be model-related. I decided to test our default merger and it also tosses an error when trying to merge the two models:

22:33:53-750675 INFO Version: eaea88a Mon May 1 21:03:08 2023 -0400
22:33:55-306870 INFO Latest published version: 8f4bc4d 2023-05-31T16:44:35Z

Available models: H:\AI_Progs\AI_Models\Stable Diffusion 81
Loading H:\AI_Progs\AI_Models\Stable Diffusion\dreamshaper_6bakedvae_chk.ckpt...
Loading weights: H:\AI_Progs\AI_Models\Stable Diffusion\dreamshaper_6bakedvae_chk.ckpt ━━━━━━ 0.0/6… -:--:--
GB
Loading H:\AI_Progs\AI_Models\Stable Diffusion\HD-22-fp32-fixclip.ckpt...
Loading weights: H:\AI_Progs\AI_Models\Stable Diffusion\HD-22-fp32-fixclip.ckpt ━━━━━━━━━━ 0.0/7.7 -:--:--
GB
Merging...
100%|█████████████████████████████████████████████████████████████████████████████| 1831/1831 [00:14<00:00, 126.32it/s]
Saving to \AI_Models\Stable Diffusion.5-ours-hd22-dreamshaper-fp32.ckpt...
API error: POST: http://127.0.0.1:7860/internal/progress {'error': 'LocalProtocolError', 'detail': '', 'body': '', 'errors': "Can't send data when our state is ERROR"}
HTTP API: LocalProtocolError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\starlette\middleware\errors.py:162 in │
call
│ │
│ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\starlette\middleware\base.py:109 in call
│ │
│ ... 7 frames hidden ... │
│ │
│ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\h11_connection.py:512 in send │
│ │
│ 511 │ │ """ │
│ ❱ 512 │ │ data_list = self.send_with_data_passthrough(event) │
│ 513 │ │ if data_list is None: │
│ │
│ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\h11_connection.py:527 in │
│ send_with_data_passthrough │
│ │
│ 526 │ │ if self.our_state is ERROR: │
│ ❱ 527 │ │ │ raise LocalProtocolError("Can't send data when our state is ERROR") │
│ 528 │ │ try: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
LocalProtocolError: Can't send data when our state is ERROR
Checkpoint not found: None
Available models: H:\AI_Progs\AI_Models\Stable Diffusion 81
Checkpoint saved to H:\AI_Progs\AI_Models\Stable Diffusion.5-ours-hd22-dreamshaper-fp32.ckpt.

It did give me an output model, however it does not see the model even after restarting the server. :D I did this twice, same error. Normally I don't have any issues with merging; you can see how many models I have, lol...

Redownloading the models just to make sure there weren't download issues involved. Will run again as a double check.

Redownloaded the DreamShaper 5 fp32 and re-ran the merge on ours. I still got the same error at the end, but it at least gave me back a loadable model this time.

(snipped for duplicate error post)

OK, so I reran the GUI with version 5 and got a different error after the redownload. This error I know I've at least seen listed. Someone mentioned it could be caused by Stable Diffusion base model differences (merge SD 1.5 and SD 2.1 and you get this error). I don't see that in the model data, though:

dreamshaper: Model is 6.88 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.
HD 22: Model is 7.17 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

        ---------------------
            model_a:    HD-22-fp32-fixclip.ckpt
            model_b:    dreamshaper_5BakedVae_fp32.ckpt
            output:     HD-22-fp32-fixclip_0.3_dreamshaper_5BakedVae_fp32_0.7_4it.ckpt
            alpha:      0.3
            usefp16:    False  
            iterations: 4
        ---------------------

<class 'KeyError'> 'state_dict'

I'll keep trying later. I'm pretty sure I saw that error mentioned elsewhere in the repos.

@vladmandic
Owner

vladmandic commented Jun 1, 2023

LocalProtocolError: Can't send data when our state is ERROR

this happens when the network socket gets closed by someone (os, router, whatever) since it was idle for too long. silly gradio does not know how to handle keepalive correctly - they're still fumbling around with that.
but since it happens at the very end, when the server is trying to tell the browser it's done, you should already have a saved model by then.

class 'KeyError'> 'state_dict'

this seems pretty fundamental as state_dict is pretty much the core of the sd model.

@DirtyHamster
Author

DirtyHamster commented Jun 2, 2023

OK, good to know on ours for the next large merge batch I try. I just found it odd that I got an unusable model the first time.

On the:
class 'KeyError'> 'state_dict'
I spotted the same error listed in this discussion for them: diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui#11. There isn't a lot of commentary going on in their discussions section for the GUI, though.

I noticed in the comments in another related repo that some models don't have a top-level state_dict and that it's normally just skipped as a layer, going directly to the keys when that happens. So it might just be an issue with the GUI implementation of it. Spotted that here: ogkalu2/Merge-Stable-Diffusion-models-without-distortion#31. Not really sure yet, as the models I tried weren't from entirely different concepts or even makers. I have some other thoughts on this further down:

Lost my source link for this: some models don't work as an "A" input but will work as a "B" input, causing <class 'KeyError'> 'model_ema.decay'.

On a correct run it looks like the output should be similar to this: ogkalu2/Merge-Stable-Diffusion-models-without-distortion#24

An OOM-type crash looked more like this: ogkalu2/Merge-Stable-Diffusion-models-without-distortion#14. I was also looking through some of the initial chatter from when ogkalu2 was working on the PermutationSpecs, samuela/git-re-basin#5: "The test i have running has 2 dreambooth models pruned to 2GB. The bigger the size of the models, the higher the RAM usage. I didn't realize 4gb models were too much for 32 GB ram systems currently. The problem is the linear sum assignment. It can only run on the CPU". Apparently someone kept OOMing at 32 GB RAM with half the model size I was trying to push through. Since that's the same amount I'm running, that's a strong possibility with the full models I was trying to start with, even though the errors were different. In the GUI, the difference in errors could just reflect where the runs were in the process when the OOM occurred. The GUI doesn't have much documentation.

There are some suggestions of removing the EMA weights first, and the baked-in VAE could also be causing issues.
ogkalu2/Merge-Stable-Diffusion-models-without-distortion#19
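Stripping the EMA weights by hand before feeding the models in is simple enough (a sketch of the kind of pruning I mean; the toolkit extension does the same thing more thoroughly):

import torch

def strip_ema(in_path, out_path):
    ckpt = torch.load(in_path, map_location="cpu")
    sd = ckpt.get("state_dict", ckpt)
    pruned = {k: v for k, v in sd.items() if not k.startswith("model_ema.")}
    torch.save({"state_dict": pruned}, out_path)
    print(f"{in_path}: dropped {len(sd) - len(pruned)} EMA keys")

strip_ema("HD-22-fp32-fixclip.ckpt", "HD-22-fp32-fixclip-noema.ckpt")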

I downloaded a different full fp32 model last night, just in case some of the issues might be caused by the HD model. So I'm going to try the A and B swap, different models, and a few other odds and ends such as pruning, before retrying on the CLI version. I do kind of get the feeling, reading through, that it's not far enough along yet to be added in without more dev work, just from the number of errors being asked about; which could be why there are unanswered requests for extensions of it. I am really curious if it turns out to just be a pruning issue, as that seems fairly rational. If that's the case I can still do a partial test for it until it's better optimized, if it can be optimized as is. Still have a fair bit more to read over on it too.

If that doesn't work I'll pause to do the initial look book for merge comparisons using ours, as I'd like to get that posted, and then check out Aptronymist's suggestion of trying the other automated merger: https://github.com/s1dlx/sd-webui-bayesian-merger, along with brunogcar's suggestion: https://github.com/Xerxemi/sdweb-auto-MBW, though that one has an open issue for the fork version for here: Xerxemi/sdweb-auto-MBW#19. The UI difference between what we already have and those is a lot larger. Both of these do handle a fair number of concepts our default method does not. I can always check back in on the git-re-basin method as I go.

I'm going to clean up some of my posts in the thread over the next few days just to keep it more readable, and try some of the other stuff I mentioned regarding pruning and other sizes... I think I'm going to fix the title as well, to something like "Potential improvements for model merge & merge tests - git-re-basin and others", when I get done cleaning it up.

Note - it looks like to get to the pruned ~2 GB state reported as working, the model has to be fp16, as fp32 pruning would be ~4 GB.
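(Sanity check on those sizes, using the usual approximate SD-v1 parameter counts:)

params = 860e6 + 123e6 + 84e6  # approx. UNet + CLIP text encoder + VAE for an SD-v1 model
print(f"fp16: {params * 2 / 1024**3:.1f} GB, fp32: {params * 4 / 1024**3:.1f} GB")
# -> roughly 2.0 GB at fp16 and 4.0 GB at fp32, matching the pruned sizes in the toolkit reports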

Got up to:
pruned a no-EMA version of HD22 at fp16
extracted a copy of DreamShaper 5's baked-in VAE and made a pruned no-EMA version at fp16
Will attempt the merge later.

With pruned files: Run 1:

        ---------------------
            model_a:    HD-22-fp16-fixclip-prunded-no-ema.ckpt
            model_b:    dreamshaper_fp16_novae_no_ema_pruned.ckpt
            output:     HD-22-fp16-fixclip-prunded-no-ema_0.1_dreamshaper_fp16_novae_no_ema_pruned_0.9_1it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 10
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.01

FINDING PERMUTATIONS
P_bg324: 0.5
<class 'KeyError'> 'first_stage_model.encoder.norm_out.weight'

With pruned files: run 2 switching A & B

        ---------------------
            model_a:    dreamshaper_fp16_novae_no_ema_pruned.ckpt
            model_b:    HD-22-fp16-fixclip-prunded-no-ema.ckpt
            output:     dreamshaper_fp16_novae_no_ema_pruned_0.1_HD-22-fp16-fixclip-prunded-no-ema_0.9_10it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 10
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.01

<class 'KeyError'> 'first_stage_model.decoder.conv_in.bias'

This second test didn't even calculate before spitting out an error. I ran it again after restarting the GUI just to be sure.

Note - with the smaller file size I'm not thinking this is an indication of OOM. To check, I reran both tests watching VRAM in Task Manager; no indication of OOM, so I can rule that out at least. It's interesting that I'm getting so many different errors. I'm going to try swapping out the models next.

@DirtyHamster
Author

Counterfeit-V3.0_full_fp32 is the next one I'm going to try with this; I'll prune it down to no VAE, no EMA and try the method against both of the models already tried above. I might do one more attempt after that before gathering and filing error reports and trying the CLI version. I've been thinking about this all weekend.

@s1dlx

s1dlx commented Jun 5, 2023

possibly you could be interested in https://github.com/s1dlx/meh
where we have a re-basin implementation running
in this branch we optimised for low VRAM use
s1dlx/meh#15

@DirtyHamster
Author

@s1dlx thanks I'll take a look at it. The one I've been trying has been throwing nothing but errors at me.

@DirtyHamster
Author

DirtyHamster commented Jun 7, 2023

@vladmandic I finished the first batch of merges for fully weighted fp32s.
Model 1 at 1 is the unmerged version and model 2 at 1 is also the unmerged version; the rest of the spread of merges goes from 0.1 to 0.9 with the weighted-sum interpolation method. Images below.

I was looking over the repo that @s1dlx posted a few messages above and saw that they have 3 fp16 models selected for their tests. So when I get down to the fp16s I'll include the 2 models that they used for the weighted-sum interpolation method in our tests with the same methodology. I really dig the weighted subtraction and multiply difference methods in their examples: https://github.com/s1dlx/meh/wiki. @s1dlx are you going to post examples of your implementation of re-basin in your wiki?

I have to admit I like their prompt so I'm going to use that as a secondary to the one for the woman so the tests can match up better further along.

a headshot photographic portrait of a woman
a cat as a DJ at the turntables
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1897848000, Size: 512x716

Test 1:
Model 1: HD22_clipfixed_fp32
Model 2: DreamShaper_5 BakedVae_fp32
Vae Used: Extracted version of DreamShaper_5's

Images snipped for readability (regenerating images as grids).

@vladmandic
Owner

i'd consider this a success.
can you give me some metrics on the actual merge process? memory requirements, duration, etc...

@s1dlx

s1dlx commented Jun 17, 2023

@DirtyHamster presets are simply lists of 26 floats you give to the merger for doing block merge. They are implemented in presets.py

regarding logging, meh already logs vram for rebasin

in the dev branch there’s some initial proper logging added. That would make it in release 0.8

also, a lot of the experiments you are doing are being also done in our small discord server. Perhaps you can join and compare results

@DirtyHamster
Author

DirtyHamster commented Jun 17, 2023

experiments you are doing are being also done in our small discord server

Sure what's the discord address?

@vladmandic
Owner

just an idea of exit criteria:

  • does it work? yes.
  • how does it compare to existing basic merge?
    • benefits? (from user perspective, this mostly comes down to visual quality?)
    • functionality? (is it only 0.5+0.5 or can it do other?)
    • performance?
    • requirements? (vram)
  • what are the best defaults?
  • which tunables should be exposed? (that create value for users)

@s1dlx

s1dlx commented Jun 17, 2023

experiments you are doing are being also done in our small discord server

Sure what's the discord address?

I’ll add it to the readme but it’s the same we have for Bayesian-merger

https://github.com/s1dlx/sd-webui-bayesian-merger

@DirtyHamster
Author

just an idea of exit criteria:

  • does it work? yes.
  • how does it compare to existing basic merge?
    • benefits? (from user perspective, this mostly comes down to visual quality?)
    • functionality? (is it only 0.5+0.5 or can it do other?)
    • performance?
    • requirements? (vram)
  • what are the best defaults?
  • which tunables should be exposed? (that create value for users)

This is fair... I think a lot of these are a yes, but let's look at it after the double checks and result fill-ins. I'm serious enough to say I would use this for most of my own 50 percent merges regardless of whether you add it in. Most of the presets can probably be left out, though it would be nice to have the option to include them.

Sure what's the discord address?
https://discord.gg/X8U6ycVk

I'll take a look. I don't normally use discord.

@DirtyHamster presets are simply lists of 26 floats you give to the merger for doing block merge. They are implemented in presets.py
regarding logging, meh already logs vram for rebasin
in the dev branch there’s some initial proper logging added. That would make it in release 0.8
also, a lot of the experiments you are doing are being also done in our small discord server. Perhaps you can join and compare results

I'll take a closer look, but I want more specs than just VRAM, i.e. RAM, CPU, temps...
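For the extra numbers, something as simple as sampling psutil plus torch.cuda around the merge would probably cover RAM/CPU/VRAM if it runs in-process (a rough sketch; temperatures aren't reliably exposed this way, hence the external tools):

import time, psutil, torch

def log_usage(tag=""):
    vm = psutil.virtual_memory()
    line = f"[{time.strftime('%H:%M:%S')}] {tag} cpu={psutil.cpu_percent()}% ram={vm.used / 1024**3:.1f}GB"
    if torch.cuda.is_available():
        line += f" vram={torch.cuda.max_memory_allocated() / 1024**3:.1f}GB"
    print(line)

log_usage("before merge")
# ... run the merge ...
log_usage("after merge")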

@vladmandic
Owner

vladmandic commented Jun 17, 2023

Having a pull-down menu with presets is not a problem, no matter how many there are.
Having 10+ different checkboxes or number ranges is.

Regarding the logger, if you want low level, GPU-Z has a built-in logger and nothing beats its sensors. Even if you don't like the built-in logger, I'd suggest searching for something that uses their sensor data.

@DirtyHamster
Author

@vladmandic Will check it out. I'm doing this between a lot of yard work, so I might crap out for a few days coming up; I have around 2 tons of small stone to move that just got dropped off yesterday, and a deck to fix.

All the fields available via the current help listing for meh:

Usage: merge_models.py [OPTIONS]

Options:
-a, --model_a TEXT
-b, --model_b TEXT
-c, --model_c TEXT
-m, --merging_method [add_difference|distribution_crossover|euclidean_add_difference|filter_top_k|kth_abs_value|multiply_difference|ratio_to_region|similarity_add_difference|sum_twice|tensor_sum|ties_add_difference|top_k_tensor_sum|triple_sum|weighted_subtraction|weighted_sum]
-wc, --weights_clip
-p, --precision INTEGER
-o, --output_path TEXT
-f, --output_format [safetensors|ckpt]
-wa, --weights_alpha TEXT
-ba, --base_alpha FLOAT
-wb, --weights_beta TEXT
-bb, --base_beta FLOAT
-rb, --re_basin
-rbi, --re_basin_iterations INTEGER
-d, --device [cpu|cuda]
-wd, --work_device [cpu|cuda]
-pr, --prune
-bwpa, --block_weights_preset_alpha [GRAD_V|GRAD_A|FLAT_25|FLAT_75|WRAP08|WRAP12|WRAP14|WRAP16|MID12_50|OUT07|OUT12|OUT12_5|RING08_SOFT|RING08_5|RING10_5|RING10_3|SMOOTHSTEP|REVERSE_SMOOTHSTEP|2SMOOTHSTEP|2R_SMOOTHSTEP|3SMOOTHSTEP|3R_SMOOTHSTEP|4SMOOTHSTEP|4R_SMOOTHSTEP|HALF_SMOOTHSTEP|HALF_R_SMOOTHSTEP|ONE_THIRD_SMOOTHSTEP|ONE_THIRD_R_SMOOTHSTEP|ONE_FOURTH_SMOOTHSTEP|ONE_FOURTH_R_SMOOTHSTEP|COSINE|REVERSE_COSINE|TRUE_CUBIC_HERMITE|TRUE_REVERSE_CUBIC_HERMITE|FAKE_CUBIC_HERMITE|FAKE_REVERSE_CUBIC_HERMITE|ALL_A|ALL_B]
-bwpb, --block_weights_preset_beta [GRAD_V|GRAD_A|FLAT_25|FLAT_75|WRAP08|WRAP12|WRAP14|WRAP16|MID12_50|OUT07|OUT12|OUT12_5|RING08_SOFT|RING08_5|RING10_5|RING10_3|SMOOTHSTEP|REVERSE_SMOOTHSTEP|2SMOOTHSTEP|2R_SMOOTHSTEP|3SMOOTHSTEP|3R_SMOOTHSTEP|4SMOOTHSTEP|4R_SMOOTHSTEP|HALF_SMOOTHSTEP|HALF_R_SMOOTHSTEP|ONE_THIRD_SMOOTHSTEP|ONE_THIRD_R_SMOOTHSTEP|ONE_FOURTH_SMOOTHSTEP|ONE_FOURTH_R_SMOOTHSTEP|COSINE|REVERSE_COSINE|TRUE_CUBIC_HERMITE|TRUE_REVERSE_CUBIC_HERMITE|FAKE_CUBIC_HERMITE|FAKE_REVERSE_CUBIC_HERMITE|ALL_A|ALL_B]
-j, --threads INTEGER
--help Show this message and exit.

To give you an idea of my inputs for testing:

Basic re-basin only seems to require the following:
merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fixclip-noema-fp32.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\dreamshaper5_Bakedvae_fp16-noema.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test5\0-merge000001_10.safetensors -f safetensors -ba 0.5 -bb 0.5 -rb -rbi 10
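And the iteration sweep for the test grid is just that command in a loop; a throwaway sketch using only the flags from the help output above (output file names are made up):

import subprocess

model_a = r"H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fixclip-noema-fp32.safetensors"
model_b = r"H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\dreamshaper5_Bakedvae_fp16-noema.safetensors"

for it in range(1, 11):
    out = rf"H:\Users\adamf\AI_Progs\AI_Models\test5\rebasin_{it:02d}it.safetensors"
    subprocess.run(["python", "merge_models.py",
                    "-a", model_a, "-b", model_b,
                    "-m", "weighted_sum", "-p", "16",
                    "-o", out, "-f", "safetensors",
                    "-ba", "0.5", "-bb", "0.5",
                    "-rb", "-rbi", str(it)], check=True)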

What s1dlx has would give us a lot more options regardless of whether re-basin is better or not, though I think it is better for 50:50s.

On a side note, is it possible to use the default file explorer to select the model rather than a drop-down in the UI when selecting the main models, VAEs, or other components? I have so many that being able to sort them is an issue; I don't know if anyone else is this crazy with collecting them.

I'll take a look at GPU-Z later tonight too. Have to get back to work for a bit.

@vladmandic
Owner

On a side note is it possible to use the default file explorer to select the model rather than a drop down in the ui when select ing the main model's, vaes, or other components?

whatever you select via explorer needs to be validated and matched to some known entry so that exact entry can be used moving forward.
for example, server knows about models it enumerated - trying to select something it doesn't already know of is a mess.
definitely doable, but non trivial.

regarding params, i just noticed one thing - i need to take a look at the code to see if git-re-basin allows passing an actual device instead of a string (cpu or cuda), as sdnext itself supports quite a few other backends and if i integrate this, i cannot say "oh, this is a cuda-only feature".

@DirtyHamster
Author

I haven't tried swapping the device arg since version 4; at that point I couldn't get the optional arg to work. I think by default it's set to cpu.

I found that the drop-downs are a little messy to find stuff in during testing; it's 10x of any model involved. So it just has me thinking, can we do sort orders and such? I think I'm beyond normal use-case scenarios, but it's still something to think about UI-wise.

@s1dlx

s1dlx commented Jun 18, 2023

meh cli tool has that many arguments but the library is much simpler. Basically all the presets stuff is not included as those simply override “wa” and “ba” (and the beta ones).

on the cuda/cpu side…I imagine you can change cuda with “gpu” and get the same result on amd cards

an example of how to use the library is given by the sd-webui-bayesian-merger extension

the idea of the library is to be a generic merging one, not just a tool for rebasin

@DirtyHamster
Author

@s1dlx

meh cli tool has that many arguments but the library is much simpler. Basically all the presets stuff is not included as those simply override “wa” and “ba” (and the beta ones).

I meant that in terms of what fields might be needed UI-wise, not that it was overly complex or anything. I think what you have going on in your repository is much better than the basic one we are using currently.

@vladmandic I'll probably get back to testing tomorrow. Other than the last few remaining fill-ins for the comparison, the only other option I can test between the current method and meh is add difference. Should I run tests on that too?

@vladmandic
Owner

Maybe just a quick test to see if anything deeper is really needed?

@DirtyHamster
Author

@vladmandic

Maybe just a quick test to see if anything deeper is really needed?

I'll do light testing on it while I finish up the other stuff. I want to do the time trials too. I haven't found any issues yet that would concern anyone, so it's probably safe to start looking at the code to incorporate it. You kind of make out like a bandit with all the extras that are packed into the repository.

I'll run this off too:

btw, totally off-topic, since you've already mentioned lyriel model before (and that's one of my favorites, i'd be curious (and this is just personal) how does it merge with another something like https://civitai.com/models/18798/meinaunreal

I'll do it as 0.1 to 0.9 same as in the tests. Haven't forgotten about it.

@DirtyHamster
Author

DirtyHamster commented Jun 21, 2023

Finished outputting the fp32 merges that I had to re-output, and about to start working on the fp16s... I really shouldn't have deleted them, lol... Afterthoughts are great, right...

Just figured I'd do a small status update, and noticed the time clock from the last merge as it's still running from last night. Thought you might find it amusing, as it's the only thing I've managed to break...

(screenshot of the stuck merge timer, omitted)

In about an hour or two I'll have the rest of the results out. The extended testing is still to come, though.

@DirtyHamster
Author

DirtyHamster commented Jun 22, 2023

@vladmandic basic testing is done; I'll get on with the extras.

Results so far for discussion if necessary:
Model 1 used: https://huggingface.co/Deltaadams/HD-22 (fp32)
Model 2 used: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models were pruned from full trainable EMA models to fp32 no-EMA and fp16 no-EMA prior to testing.

Testing method sampler and size settings:
Settings: DPM++ 2M Karras @ 20 steps and a CFG scale of 7, Seed: 1897848000, Size: 512x716, CLIP: 4
Prompts Used: a headshot photographic portrait of a woman, a cat as a DJ at the turntables

Testing regimen: (Multiplier to be run from 0.1 to 0.9)

base-fp16+custom#1-fp16, base-fp16+custom#2-fp16
xyz_grid-0000-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp16
xyz_grid-0032-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
xyz_grid-0001-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp32
xyz_grid-0031-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp16, base-fp32+custom#2-fp16
xyz_grid-0027-1897848000-a cat as a DJ at the turntables
xyz_grid-0026-1897848000-a headshot photographic portrait of a woman

The git-re-basin side will be similarly mirrored: (Weight value set at .5:.5, iteration value to be run from 1 to 10)

Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0016-1897848000-a cat as a DJ at the turntables
xyz_grid-0018-1897848000-a headshot photographic portrait of a woman

Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}

xyz_grid-0019-1897848000-a cat as a DJ at the turntables
xyz_grid-0020-1897848000-a headshot photographic portrait of a woman

Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0024-1897848000-a cat as a DJ at the turntables
xyz_grid-0025-1897848000-a headshot photographic portrait of a woman

@DirtyHamster
Copy link
Author

DirtyHamster commented Jun 22, 2023

@vladmandic The lyriel x meinaunreal base merge set is finished. Do you have a favored prompt you'd like me to use with them? I can do them across multiple VAEs too, if you'd like. I still have to finish up the re-basin merges for them. Just figured I'd ask first before posting outputs.

Looking at https://openhardwaremonitor.org/ and https://www.techpowerup.com/download/techpowerup-gpu-z/ for the hardware testing portion at the moment. Other suggestions are welcome.

@vladmandic
Copy link
Owner

@DirtyHamster naah, use whatever you want.

@DirtyHamster
Copy link
Author

@vladmandic
Just figured I'd ask first. I have one in mind.

@DirtyHamster
Copy link
Author

DirtyHamster commented Jun 23, 2023

@vladmandic Outputs from the merge, plus I found a funny bug that I can't seem to replicate.

xyz_grid-0048-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0047-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0046-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0045-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0044-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0043-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0042-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,
xyz_grid-0041-1897848000-ultra-high detail (HDR_1) 8k (realistic_1 5) (masterpiece_1 5) (photorealistic_1 5) (photorealism_1 5), daringly cinematic,

Prompts and settings on this:

Steps: 32, Sampler: DPM++ 2M, CFG scale: 12.5, Seed: 1897848000, Size: 716x512, Model: {x}, VAE: {y}, Clip skip: 3

prompt:
"ultra-high detail (HDR:1)" 8k (realistic:1.5) (masterpiece:1.5) (photorealistic:1.5) (photorealism:1.5), "daringly cinematic", "viscerally exaggerated", high quality professional photograph with "good depth of field" of a (realistic "woodland rocky brook with a ("small waterfall")") and fog in background at ((sunset) "colorful sky"),

negative prompt:
watermark, signature, "lowres", "bad quality", "low quality", "lowest quality", "worst quality", blurry, pixelated,
drawling, sketch, sketched, painted,

For this little bit of side testing I thought I'd use water, because it's such an easily recognized abstraction in its many forms. Also, following some of our other chatter, I figured I'd run them against all the VAEs I've come across, extracted or otherwise, to see what changes. Hope you enjoy...

The strange bug: when the server goes to sleep in the browser, it sometimes seems to move the clip skip position regardless of where you have it set in the UI. I haven't found any logical pattern to this, but these outputs are at clip skip 3 while I have it set to 1. I think this was responsible for an earlier error which I quickly blamed on myself... It's correct in the listing below the output, though.

@vladmandic
Copy link
Owner

loving how it clearly shows the progression!

btw, totally off-topic, your usage of quotes in the prompt doesn't really do what you think it does - this is the parsed prompt:

[['ultra-high detail HDR 8k', 1.0], ['realistic masterpiece photorealistic photorealism', 1.5], ['daringly cinematic", "viscerally exaggerated", high quality professional photograph with "good depth of field" of a', 1.0], ['realistic "woodland rocky brook with a', 1.1], ['small waterfall"', 1.21], ['and fog in background at', 1.0], ['sunset', 1.21], ['colorful sky"', 1.1]]
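For reference, a toy sketch of the weighting arithmetic behind those numbers (an illustration only, not the webui's actual parser):

# each "(...)" nesting level multiplies the weight by 1.1, "(text:1.5)" sets
# it explicitly, and quote characters are just literal text with no special meaning
def attention_weight(nesting_levels, explicit=None):
    return explicit if explicit is not None else round(1.1 ** nesting_levels, 2)

print(attention_weight(0))               # 1.0  - unparenthesized text
print(attention_weight(1))               # 1.1  - one level of parentheses
print(attention_weight(2))               # 1.21 - e.g. ((sunset))
print(attention_weight(0, explicit=1.5)) # 1.5  - e.g. (realistic:1.5)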

re: clip skip - knowing how it's monkey-patched in the backend for backward compatibility, it's quite possible.

@DirtyHamster
Copy link
Author

loving how it clearly shows the progression!

I've found that merges in the sub-tenths to hundredths ranges sometimes have interesting stuff going on too.

btw, totally off-topic, your usage of quotes in the prompt doesn't really do what you think it does - this is the parsed prompt:

I mostly use quotes for concept groupings, so some of it is just for me keeping track of them as modular concepts when cutting and pasting from my notes, as well as moving them around in the prompt. I generally just leave them in since it doesn't generate errors to use them the way I am. Sometimes I do mess with them to see if I can find any difference in complex phrasing. However, the only guidance I've really seen for quotation syntax has been a few lines about Prompt S/R using it similarly for grouping in list form, i.e. no spaces between quotes and separating commas.

@vladmandic
Copy link
Owner

set SD_PROMPT_DEBUG=1 and you can see the parsed result.

@DirtyHamster
Copy link
Author

Pardon the delay, I'm just getting back up to speed. A bad electrical storm last weekend took out every switch, router, modem, and cable box in the house, and it took a while to get all of that replaced. It's mostly fixed, but I still have some odds and ends to do.

Which file do I set the SD_PROMPT_DEBUG=1 arg in?

@vladmandic
Copy link
Owner

Just do it in the shell before starting webui.
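For example, in a Windows cmd window that would be set SD_PROMPT_DEBUG=1 followed by the usual launcher (e.g. webui.bat), or export SD_PROMPT_DEBUG=1 before ./webui.sh on Linux - the launcher script names here are assumptions based on a typical install.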

@DirtyHamster
Copy link
Author

@vladmandic OK, I did one large merge test today for the hardware test: 50 iterations, which ran for 16m30s. I logged the run using Open Hardware Monitor at 30s intervals. The CSV file is the hardware log from that, which should be adequate data for any lesser run.

I used the full unpruned versions and let meh prune them itself, so it had a little extra work to do:
dreamshaper_5BakedVae_fp32.safetensors 7.2gb
HD-22-fp32-fixclip.safetensors 7.5gb
Result merge file: 8.8gb (as stated before, it returns everything it pruned back to the model at the end.)

OpenHardwareMonitorLog-2023-07-09.csv

I'm pruning the output model prior to image grid generation, to compare against the base models merged at fp32 with a 0.5 multiplier and meh's merges at 10 and 50 iterations. Just a reminder: these should all be fairly similar; this is just to look for differences among the center-point merges.
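As an aside, a rough sketch of what that kind of "fp16 no ema" pruning amounts to - generic safetensors/PyTorch code with placeholder file names, not the exact script used here, and assuming the usual "model_ema." key prefix of SD 1.x checkpoints:

from safetensors.torch import load_file, save_file

def prune_no_ema_fp16(src_path, dst_path):
    sd = load_file(src_path)
    pruned = {}
    for key, tensor in sd.items():
        if key.startswith("model_ema."):
            continue  # drop the EMA copies of the weights
        # cast float tensors to fp16, leave integer tensors (e.g. position ids) alone
        pruned[key] = (tensor.half() if tensor.is_floating_point() else tensor).contiguous()
    save_file(pruned, dst_path)

# placeholder file names
prune_no_ema_fp16("merged_full.safetensors", "merged_fp16_noema.safetensors")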

xyz_grid-0000-a headshot photographic portrait of a woman
xyz_grid-0001-a cat as a DJ at the turntables

The differences I can spot are really small: pixel differences at the edges of things, depth of the blacks, and small details improving with more iterations, especially in the collar area of the woman's top. With the prompts used I still think the best run was the two-fp16 one, which had very good detail pickup across the board.

I haven't had a lot of time lately to run these off, or I would have done some more in-between iteration counts between 10 and 50. Time allowing, I'll get around to doing 100 and 200 iterations as a follow-up later, since we already know that merge method works. So I think I should move on and try the other merge methods, hopefully later in the week. I'm still not sure what would make a good comparison against the block_weight presets beyond just checking that they work.

@vladmandic
Copy link
Owner

i think we should just check if there are any visual differences when using presets, to see the value of including them or not.
doesn't matter if we like them or not, the question is whether the differences are even noticeable. other than that, i think we can wrap up testing and start on the actual pr.

@DirtyHamster
Copy link
Author

DirtyHamster commented Jul 11, 2023

Last few things on my list before testing the presets...

Did a little side test using pix2pix and caught an error; I reported it over on meh's repo. Just noting it here so you're aware of it, or in case you have any insight. s1dlx/meh#38

I need to check inpainting too, since it's also an additional model architecture and could cause a similar error.
Checked and filed an error report: s1dlx/meh#42

@s1dlx
Copy link

s1dlx commented Jul 21, 2023

@DirtyHamster pix2pix and inpainting should be fixed in sd-meh 0.9.0 0.9.4

@vladmandic
Copy link
Owner

git re-basin has been added to sdnext. there are some remaining issues that need to be addressed, but they are handled via separate issues.
