Normal calculation method. Can be used in all modes.
The two models are compared using cosine similarity, centered on the set ratio, and the merge is calculated so as to eliminate the loss caused by merging. See below for further details, and see also hako-mikan#33 and https://github.com/recoilme/losslessmix
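The sketch below makes the comparison step concrete. It computes cosine similarity per key between two state dicts. It assumes each tensor has already been flattened into a plain Python list of floats, standing in for a real PyTorch tensor. The names `cosine_similarity` and `per_key_similarity` are illustrative and do not come from this tool.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two flattened weight tensors (lists of floats)."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero tensor has no direction; treat as dissimilar (assumption)
    return dot / (norm_a * norm_b)


def per_key_similarity(theta_a, theta_b):
    """Cosine similarity for every key present in both state dicts."""
    return {k: cosine_similarity(v, theta_b[k]) for k, v in theta_a.items() if k in theta_b}


# Hypothetical toy "models": one orthogonal pair, one parallel pair.
theta_a = {"w": [1.0, 0.0], "v": [1.0, 2.0]}
theta_b = {"w": [0.0, 1.0], "v": [2.0, 4.0]}
sims = per_key_similarity(theta_a, theta_b)
print(sims)  # "w" is 0.0; "v" is 1.0 up to floating-point rounding
```

A value near 1 means the two tensors point the same way, even if their magnitudes differ (as with `v`). A value of 0 means they are orthogonal.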
The original simple weight mode is the most basic method and works by linearly interpolating between the two models based on a given weight alpha. At alpha = 0, the output is the first model (model A), and at alpha = 1, the output is the second model (model B). Any other value of alpha results in a weighted average of the two models.
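The weighted sum described above takes one line per element. Below is a minimal sketch that again uses flattened lists in place of real tensors. Two details are assumptions for illustration: the function name, and keeping model A's tensor for keys that model B lacks.

```python
def weighted_sum(theta_a, theta_b, alpha):
    """Linear interpolation (1 - alpha) * A + alpha * B for every key.

    alpha = 0 reproduces model A and alpha = 1 reproduces model B.
    """
    merged = {}
    for key, a in theta_a.items():
        b = theta_b.get(key)
        if b is None:
            merged[key] = list(a)  # assumption: keep A's tensor when B lacks the key
            continue
        merged[key] = [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


theta_a = {"w": [0.0, 2.0]}
theta_b = {"w": [4.0, 6.0]}
print(weighted_sum(theta_a, theta_b, 0.25))  # {'w': [1.0, 3.0]}
```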
- Original merge results between AnythingV3 and FeverDream model
charming girl mid-shot. scenery-beautiful majestic
One key advantage of the cosine methods over the original simple weight mode is that they take into account the structural similarity between the two models. This can lead to better results when the two models are similar but not identical. The cosine methods may also help reduce overfitting and improve generalization by limiting how much detail from one model is incorporated into the other.
In the case of CosineA, we normalize the vectors of the first model (model A) before merging, so the resulting merged model will favor the structure of the first model while incorporating details from the second model. This is because we are essentially aligning the direction of the first model's vectors with the direction of the corresponding vectors in the second model.
- CosineA merge results between AnythingV3 and FeverDream. Structure-wise, note the pose direction/flow and the face area
Detail-wise, note for example how, in all cases both above and below, more blur is preserved in the background than in the foreground. The original merge shows only a linear difference instead.
On the other hand, in CosineB, we normalize the vectors of the second model (model B) before merging, so the resulting merged model will favor the structure of the second model while incorporating details from the first model. This is because we are aligning the direction of the second model's vectors with the direction of the corresponding vectors in the first model.
- CosineB merge results between AnythingV3 and FeverDream. Structure-wise, note the pose direction/flow and the face area, and how the background also tries to keep more of the form from the right
In summary, the choice between CosineA and CosineB depends on which model's structure you want to prioritize in the resulting merged model. If you want to prioritize the structure of the first model, use CosineA. If you want to prioritize the structure of the second model, use CosineB.
Note also that the second model acts more as the 'reference point' for the merge: compare the changes at Alpha 1 with those at Alpha 0. The order of the models can therefore also change the end result, so it is worth trying both orders to get closer to your desired output.
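The description above does not give an exact formula. What follows is only a hypothetical sketch of how a similarity-aware merge could favor one model's structure. It is not this tool's implementation, and both the rescaling and the ratio formula are assumptions.

- For each key, the cosine similarity is min-max rescaled across all keys to a value `s` in [0, 1].
- The blend toward the non-favored model is scaled by `s`, so the tensors where the two models diverge most stay closest to the favored model.
- `favor="A"` stands in for CosineA and `favor="B"` for CosineB.

```python
import math


def _cosine(a, b):
    # Cosine similarity of two flattened tensors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cosine_merge(theta_a, theta_b, alpha, favor="A"):
    """HYPOTHETICAL similarity-weighted merge (not this tool's actual code)."""
    keys = [k for k in theta_a if k in theta_b]
    if not keys:
        return {}
    sims = {k: _cosine(theta_a[k], theta_b[k]) for k in keys}
    lo, hi = min(sims.values()), max(sims.values())
    span = hi - lo
    merged = {}
    for k in keys:
        # Rescale similarity to [0, 1]; if all keys are equally similar, fall back to s = 1.
        s = (sims[k] - lo) / span if span > 0 else 1.0
        if favor == "A":
            t = alpha * s  # ratio toward B; alpha = 0 returns A exactly
        else:
            t = 1.0 - (1.0 - alpha) * s  # ratio toward B; alpha = 1 returns B exactly
        merged[k] = [(1.0 - t) * x + t * y for x, y in zip(theta_a[k], theta_b[k])]
    return merged


theta_a = {"w": [1.0, 0.0], "v": [1.0, 1.0]}
theta_b = {"w": [0.0, 1.0], "v": [2.0, 2.0]}
print(cosine_merge(theta_a, theta_b, 1.0, favor="A"))  # {'w': [1.0, 0.0], 'v': [2.0, 2.0]}
print(cosine_merge(theta_a, theta_b, 0.0, favor="B"))  # {'w': [0.0, 1.0], 'v': [1.0, 1.0]}
```

The two calls show how the sketch behaves at the extremes:

- At alpha = 1 with `favor="A"`, the most divergent tensor (`w`) is still kept from model A.
- At alpha = 0 with `favor="B"`, `w` is kept from model B.

Swapping which model is favored therefore changes the result. This echoes the point above that model order matters.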
A variant of Add difference that combines the benefits of Median and Gaussian filters to add model differences more smoothly. It aims to avoid the negative 'burning' effect that can appear when many models are added this way. This achieves more than simply adding the difference at a lower value.
- The starting point for reference
- Adding a collection of models on top of it, each with a value of 1
The burn here is very obvious
- Adding a collection of models on top of it, each with a value of 0.5
Still not an outcome I would accept, as you can see especially with the bird
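For comparison with the smooth variant, here is a sketch of the regular Add difference applied repeatedly. It assumes the usual three-model form A + alpha * (B - C), where C is the model the difference is taken against. The names are illustrative and tensors are flattened lists. The sketch shows that lowering alpha only scales each difference: every added model still contributes its full, unfiltered pattern.

```python
def add_differences(theta_a, pairs, alpha):
    """Regular Add difference, stacked: A + sum_i alpha * (B_i - C_i) per element."""
    merged = {k: list(v) for k, v in theta_a.items()}
    for theta_b, theta_c in pairs:
        for key in merged:
            merged[key] = [
                m + alpha * (b - c)
                for m, b, c in zip(merged[key], theta_b[key], theta_c[key])
            ]
    return merged


base = {"w": [1.0, 1.0]}
pairs = [
    ({"w": [2.0, 1.0]}, {"w": [1.0, 1.0]}),  # difference [1.0, 0.0]
    ({"w": [3.0, 1.0]}, {"w": [2.0, 1.0]}),  # difference [1.0, 0.0]
]
print(add_differences(base, pairs, 1.0))  # {'w': [3.0, 1.0]}
print(add_differences(base, pairs, 0.5))  # {'w': [2.0, 1.0]}
```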
The functionality and result of the Median filter alone
- Reduces noise in the difference by replacing each value with the median of the neighboring values.
- Preserves edges and structures in the difference, which is helpful when you want to transfer the learning related to object shapes and boundaries.
- Non-linear filtering, which means it can better preserve the important features in the difference while reducing noise.
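Here is a one-dimensional sketch of the median step on a flattened difference. Real weight tensors are multi-dimensional. The window size and edge handling used here (a size-3 window clipped at the ends) are assumptions. The point is to show an isolated spike being removed while a step edge survives.

```python
from statistics import median


def median_filter_1d(values, size=3):
    """Replace each value with the median of its neighborhood (window clipped at the ends)."""
    half = size // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


spike = [0.0, 0.0, 9.0, 0.0, 0.0]
step = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
print(median_filter_1d(spike))  # [0.0, 0.0, 0.0, 0.0, 0.0]: the outlier is gone
print(median_filter_1d(step))   # [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]: the edge is kept
```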
The functionality and result of the Gaussian filter alone
- Smooths the difference by applying a Gaussian kernel, which reduces high-frequency noise and retains the low-frequency components.
- The level of smoothing can be controlled by the sigma parameter, allowing you to experiment with different levels of smoothing.
- Linear filtering, which means it can better preserve the global structure in the difference while reducing noise.
- The final result when using the combination of Median and Gaussian filters instead. Compared with either the Median or the Gaussian filter alone, note how the top left of the man's hair in the top right image doesn't get 'stuck' when the two are combined, giving the best result overall
TIP: Sometimes you may want to use smooth Add difference as an alternative to the regular Add difference, even when there is no risk of burning. In that case you could increase Alpha up to 2, since smooth Add at 1 has a smaller individual impact than regular Add at 1. This of course depends on your desired outcome.
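Putting the two filters together gives a minimal sketch of a smooth Add difference. The difference B - C is passed through a median filter and then a Gaussian filter before being added to A with weight alpha. This is a hypothetical illustration, not this tool's code. It makes the following assumptions:

- It works on flattened lists.
- The median filter uses a size-3 window.
- The Gaussian kernel is truncated at 4 sigma, with clamped edges.
- All names are illustrative.

```python
import math
from statistics import median


def _median3(v):
    # Size-3 median filter; the window is clipped at both ends (assumption).
    return [median(v[max(0, i - 1): i + 2]) for i in range(len(v))]


def _gaussian(v, sigma=1.0):
    # Normalized Gaussian kernel truncated at 4 sigma; edge samples are clamped (assumption).
    r = max(1, int(4.0 * sigma + 0.5))
    k = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-r, r + 1)]
    total = sum(k)
    n = len(v)
    return [
        sum(w / total * v[min(max(i + j - r, 0), n - 1)] for j, w in enumerate(k))
        for i in range(n)
    ]


def smooth_add_difference(theta_a, theta_b, theta_c, alpha, sigma=1.0):
    """Hypothetical smooth Add difference: A + alpha * gaussian(median(B - C))."""
    merged = {}
    for key, a in theta_a.items():
        diff = [b - c for b, c in zip(theta_b[key], theta_c[key])]  # assumes same keys/shapes
        smoothed = _gaussian(_median3(diff), sigma)
        merged[key] = [x + alpha * d for x, d in zip(a, smoothed)]
    return merged


zeros = {"w": [0.0] * 5}
# An isolated spike in the difference is removed by the median step.
print(smooth_add_difference(zeros, {"w": [0.0, 0.0, 10.0, 0.0, 0.0]}, zeros, 1.0))
# {'w': [0.0, 0.0, 0.0, 0.0, 0.0]}
# A uniform difference passes through the filters; alpha = 2 doubles it, as in the TIP above.
print(smooth_add_difference(zeros, {"w": [1.0] * 5}, zeros, 2.0))
```

The `sigma` parameter controls the Gaussian step: a larger value spreads each remaining feature of the difference over more neighbors.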
- This is an Elemental merge that goes beyond Elemental merging. Each elemental tensor determines features of the image in the U-Net. In normal merging, the values of each tensor are multiplied by the ratio and added together, as shown below (normal). In the tensor method, each tensor is instead divided according to the ratio and the parts are combined, as shown in the figure below (tensor).
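Below is a minimal sketch of the split idea, under these assumptions:

- Tensors are nested lists.
- The split runs along the first dimension.
- The leading `int(alpha * rows)` slices come from model B, and the rest are kept from model A.

Which side each model fills, and whether the real tool lets you offset where the split starts, are both assumptions here, so treat `tensor_split_merge` as hypothetical.

```python
import copy


def tensor_split_merge(theta_a, theta_b, alpha):
    """Hypothetical tensor-method merge: copy slices instead of averaging values.

    Along the first dimension, the first int(alpha * rows) slices come from B and
    the remaining slices are kept from A.
    """
    merged = {}
    for key, a in theta_a.items():
        b = theta_b[key]
        if len(a) != len(b):
            raise ValueError(f"first-dimension mismatch for {key}")
        cut = int(len(a) * alpha)
        merged[key] = copy.deepcopy(b[:cut]) + copy.deepcopy(a[cut:])
    return merged


theta_a = {"weight": [[1, 1], [2, 2], [3, 3], [4, 4]], "bias": [1, 2, 3, 4]}
theta_b = {"weight": [[9, 9], [8, 8], [7, 7], [6, 6]], "bias": [5, 6, 7, 8]}
print(tensor_split_merge(theta_a, theta_b, 0.5))
# {'weight': [[9, 9], [8, 8], [3, 3], [4, 4]], 'bias': [5, 6, 3, 4]}
```

Unlike the weighted sum, every value in the result is an original value from one of the two models; nothing is averaged.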
The tensor size of each element is noted below.
model.diffusion_model.time_embed.0.weight torch.Size([1280, 320])
model.diffusion_model.time_embed.0.bias torch.Size([1280])
model.diffusion_model.time_embed.2.weight torch.Size([1280, 1280])
model.diffusion_model.time_embed.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.0.0.weight torch.Size([320, 4, 3, 3])
model.diffusion_model.input_blocks.0.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.in_layers.2.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.1.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.1.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.1.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.norm.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.norm.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.1.1.proj_in.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.input_blocks.1.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.1.1.proj_out.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.in_layers.2.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.2.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.2.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.2.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.norm.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.norm.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.2.1.proj_in.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.input_blocks.2.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.input_blocks.2.1.proj_out.bias torch.Size([320])
model.diffusion_model.input_blocks.3.0.op.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.input_blocks.3.0.op.bias torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.0.weight torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.0.bias torch.Size([320])
model.diffusion_model.input_blocks.4.0.in_layers.2.weight torch.Size([640, 320, 3, 3])
model.diffusion_model.input_blocks.4.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.input_blocks.4.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.4.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.input_blocks.4.0.skip_connection.weight torch.Size([640, 320, 1, 1])
model.diffusion_model.input_blocks.4.0.skip_connection.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.norm.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.norm.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.4.1.proj_in.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.input_blocks.4.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.4.1.proj_out.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.in_layers.2.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.5.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.input_blocks.5.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.5.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.norm.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.norm.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.5.1.proj_in.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.input_blocks.5.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.input_blocks.5.1.proj_out.bias torch.Size([640])
model.diffusion_model.input_blocks.6.0.op.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.input_blocks.6.0.op.bias torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.input_blocks.7.0.in_layers.2.weight torch.Size([1280, 640, 3, 3])
model.diffusion_model.input_blocks.7.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.7.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.0.skip_connection.weight torch.Size([1280, 640, 1, 1])
model.diffusion_model.input_blocks.7.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.norm.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.norm.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.7.1.proj_in.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.input_blocks.7.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.7.1.proj_out.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.8.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.8.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.norm.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.norm.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.8.1.proj_in.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.input_blocks.8.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.input_blocks.8.1.proj_out.bias torch.Size([1280])
model.diffusion_model.input_blocks.9.0.op.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.9.0.op.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.10.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.10.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.10.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.10.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.11.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.input_blocks.11.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.input_blocks.11.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.input_blocks.11.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.0.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.middle_block.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.middle_block.1.norm.weight torch.Size([1280])
model.diffusion_model.middle_block.1.norm.bias torch.Size([1280])
model.diffusion_model.middle_block.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.middle_block.1.proj_in.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.middle_block.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.middle_block.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.middle_block.1.proj_out.bias torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.2.in_layers.2.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.2.in_layers.2.bias torch.Size([1280])
model.diffusion_model.middle_block.2.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.middle_block.2.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.0.weight torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.0.bias torch.Size([1280])
model.diffusion_model.middle_block.2.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.middle_block.2.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.0.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.0.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.0.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.0.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.0.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.0.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.0.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.1.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.1.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.1.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.1.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.1.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.1.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.1.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.2.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.2.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.2.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.2.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.2.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.2.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.2.1.conv.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.2.1.conv.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.3.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.3.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.3.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.3.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.3.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.3.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.3.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.3.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.in_layers.0.weight torch.Size([2560])
model.diffusion_model.output_blocks.4.0.in_layers.0.bias torch.Size([2560])
model.diffusion_model.output_blocks.4.0.in_layers.2.weight torch.Size([1280, 2560, 3, 3])
model.diffusion_model.output_blocks.4.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.4.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.0.skip_connection.weight torch.Size([1280, 2560, 1, 1])
model.diffusion_model.output_blocks.4.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.4.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.4.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.4.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.in_layers.0.weight torch.Size([1920])
model.diffusion_model.output_blocks.5.0.in_layers.0.bias torch.Size([1920])
model.diffusion_model.output_blocks.5.0.in_layers.2.weight torch.Size([1280, 1920, 3, 3])
model.diffusion_model.output_blocks.5.0.in_layers.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.emb_layers.1.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.0.emb_layers.1.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.out_layers.3.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.5.0.out_layers.3.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.0.skip_connection.weight torch.Size([1280, 1920, 1, 1])
model.diffusion_model.output_blocks.5.0.skip_connection.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.norm.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.norm.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.proj_in.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.5.1.proj_in.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_k.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_v.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([10240, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([10240])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.2.weight torch.Size([1280, 5120])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.ff.net.2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_q.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm1.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm1.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm3.weight torch.Size([1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm3.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.1.proj_out.weight torch.Size([1280, 1280, 1, 1])
model.diffusion_model.output_blocks.5.1.proj_out.bias torch.Size([1280])
model.diffusion_model.output_blocks.5.2.conv.weight torch.Size([1280, 1280, 3, 3])
model.diffusion_model.output_blocks.5.2.conv.bias torch.Size([1280])
model.diffusion_model.output_blocks.6.0.in_layers.0.weight torch.Size([1920])
model.diffusion_model.output_blocks.6.0.in_layers.0.bias torch.Size([1920])
model.diffusion_model.output_blocks.6.0.in_layers.2.weight torch.Size([640, 1920, 3, 3])
model.diffusion_model.output_blocks.6.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.6.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.6.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.6.0.skip_connection.weight torch.Size([640, 1920, 1, 1])
model.diffusion_model.output_blocks.6.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.6.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.6.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.6.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.in_layers.0.weight torch.Size([1280])
model.diffusion_model.output_blocks.7.0.in_layers.0.bias torch.Size([1280])
model.diffusion_model.output_blocks.7.0.in_layers.2.weight torch.Size([640, 1280, 3, 3])
model.diffusion_model.output_blocks.7.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.7.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.7.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.7.0.skip_connection.weight torch.Size([640, 1280, 1, 1])
model.diffusion_model.output_blocks.7.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.7.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.7.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.7.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.in_layers.0.weight torch.Size([960])
model.diffusion_model.output_blocks.8.0.in_layers.0.bias torch.Size([960])
model.diffusion_model.output_blocks.8.0.in_layers.2.weight torch.Size([640, 960, 3, 3])
model.diffusion_model.output_blocks.8.0.in_layers.2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.emb_layers.1.weight torch.Size([640, 1280])
model.diffusion_model.output_blocks.8.0.emb_layers.1.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.out_layers.3.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.8.0.out_layers.3.bias torch.Size([640])
model.diffusion_model.output_blocks.8.0.skip_connection.weight torch.Size([640, 960, 1, 1])
model.diffusion_model.output_blocks.8.0.skip_connection.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.norm.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.norm.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.proj_in.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.8.1.proj_in.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_k.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_v.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([5120, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([5120])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.2.weight torch.Size([640, 2560])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.ff.net.2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_q.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm1.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm1.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm3.weight torch.Size([640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm3.bias torch.Size([640])
model.diffusion_model.output_blocks.8.1.proj_out.weight torch.Size([640, 640, 1, 1])
model.diffusion_model.output_blocks.8.1.proj_out.bias torch.Size([640])
model.diffusion_model.output_blocks.8.2.conv.weight torch.Size([640, 640, 3, 3])
model.diffusion_model.output_blocks.8.2.conv.bias torch.Size([640])
model.diffusion_model.output_blocks.9.0.in_layers.0.weight torch.Size([960])
model.diffusion_model.output_blocks.9.0.in_layers.0.bias torch.Size([960])
model.diffusion_model.output_blocks.9.0.in_layers.2.weight torch.Size([320, 960, 3, 3])
model.diffusion_model.output_blocks.9.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.9.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.9.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.9.0.skip_connection.weight torch.Size([320, 960, 1, 1])
model.diffusion_model.output_blocks.9.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.9.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.9.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.9.1.proj_out.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.10.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.10.0.in_layers.2.weight torch.Size([320, 640, 3, 3])
model.diffusion_model.output_blocks.10.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.10.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.10.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.10.0.skip_connection.weight torch.Size([320, 640, 1, 1])
model.diffusion_model.output_blocks.10.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.10.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.10.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.10.1.proj_out.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.in_layers.0.weight torch.Size([640])
model.diffusion_model.output_blocks.11.0.in_layers.0.bias torch.Size([640])
model.diffusion_model.output_blocks.11.0.in_layers.2.weight torch.Size([320, 640, 3, 3])
model.diffusion_model.output_blocks.11.0.in_layers.2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.emb_layers.1.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.11.0.emb_layers.1.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.0.weight torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.out_layers.3.weight torch.Size([320, 320, 3, 3])
model.diffusion_model.output_blocks.11.0.out_layers.3.bias torch.Size([320])
model.diffusion_model.output_blocks.11.0.skip_connection.weight torch.Size([320, 640, 1, 1])
model.diffusion_model.output_blocks.11.0.skip_connection.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.norm.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.norm.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.proj_in.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.11.1.proj_in.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_k.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_v.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn1.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.0.proj.weight torch.Size([2560, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.0.proj.bias torch.Size([2560])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.2.weight torch.Size([320, 1280])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.ff.net.2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_q.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.weight torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm1.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm1.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm3.weight torch.Size([320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm3.bias torch.Size([320])
model.diffusion_model.output_blocks.11.1.proj_out.weight torch.Size([320, 320, 1, 1])
model.diffusion_model.output_blocks.11.1.proj_out.bias torch.Size([320])
model.diffusion_model.out.0.weight torch.Size([320])
model.diffusion_model.out.0.bias torch.Size([320])
model.diffusion_model.out.2.weight torch.Size([4, 320, 3, 3])
model.diffusion_model.out.2.bias torch.Size([4])
first_stage_model.encoder.conv_in.weight torch.Size([128, 3, 3, 3])
first_stage_model.encoder.conv_in.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm1.weight torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.0.conv1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm2.weight torch.Size([128])
first_stage_model.encoder.down.0.block.0.norm2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.0.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.0.conv2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm1.weight torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.1.conv1.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm2.weight torch.Size([128])
first_stage_model.encoder.down.0.block.1.norm2.bias torch.Size([128])
first_stage_model.encoder.down.0.block.1.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.block.1.conv2.bias torch.Size([128])
first_stage_model.encoder.down.0.downsample.conv.weight torch.Size([128, 128, 3, 3])
first_stage_model.encoder.down.0.downsample.conv.bias torch.Size([128])
first_stage_model.encoder.down.1.block.0.norm1.weight torch.Size([128])
first_stage_model.encoder.down.1.block.0.norm1.bias torch.Size([128])
first_stage_model.encoder.down.1.block.0.conv1.weight torch.Size([256, 128, 3, 3])
first_stage_model.encoder.down.1.block.0.conv1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.norm2.weight torch.Size([256])
first_stage_model.encoder.down.1.block.0.norm2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.0.conv2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.0.nin_shortcut.weight torch.Size([256, 128, 1, 1])
first_stage_model.encoder.down.1.block.0.nin_shortcut.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm1.weight torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.1.conv1.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm2.weight torch.Size([256])
first_stage_model.encoder.down.1.block.1.norm2.bias torch.Size([256])
first_stage_model.encoder.down.1.block.1.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.block.1.conv2.bias torch.Size([256])
first_stage_model.encoder.down.1.downsample.conv.weight torch.Size([256, 256, 3, 3])
first_stage_model.encoder.down.1.downsample.conv.bias torch.Size([256])
first_stage_model.encoder.down.2.block.0.norm1.weight torch.Size([256])
first_stage_model.encoder.down.2.block.0.norm1.bias torch.Size([256])
first_stage_model.encoder.down.2.block.0.conv1.weight torch.Size([512, 256, 3, 3])
first_stage_model.encoder.down.2.block.0.conv1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.norm2.weight torch.Size([512])
first_stage_model.encoder.down.2.block.0.norm2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.0.conv2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.0.nin_shortcut.weight torch.Size([512, 256, 1, 1])
first_stage_model.encoder.down.2.block.0.nin_shortcut.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm1.weight torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.1.conv1.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm2.weight torch.Size([512])
first_stage_model.encoder.down.2.block.1.norm2.bias torch.Size([512])
first_stage_model.encoder.down.2.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.block.1.conv2.bias torch.Size([512])
first_stage_model.encoder.down.2.downsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.2.downsample.conv.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm1.weight torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.0.conv1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm2.weight torch.Size([512])
first_stage_model.encoder.down.3.block.0.norm2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.0.conv2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm1.weight torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.1.conv1.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm2.weight torch.Size([512])
first_stage_model.encoder.down.3.block.1.norm2.bias torch.Size([512])
first_stage_model.encoder.down.3.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.down.3.block.1.conv2.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.norm1.weight torch.Size([512])
first_stage_model.encoder.mid.block_1.norm1.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_1.conv1.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.norm2.weight torch.Size([512])
first_stage_model.encoder.mid.block_1.norm2.bias torch.Size([512])
first_stage_model.encoder.mid.block_1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_1.conv2.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.norm.weight torch.Size([512])
first_stage_model.encoder.mid.attn_1.norm.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.q.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.q.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.k.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.k.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.v.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.v.bias torch.Size([512])
first_stage_model.encoder.mid.attn_1.proj_out.weight torch.Size([512, 512, 1, 1])
first_stage_model.encoder.mid.attn_1.proj_out.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.norm1.weight torch.Size([512])
first_stage_model.encoder.mid.block_2.norm1.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_2.conv1.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.norm2.weight torch.Size([512])
first_stage_model.encoder.mid.block_2.norm2.bias torch.Size([512])
first_stage_model.encoder.mid.block_2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.encoder.mid.block_2.conv2.bias torch.Size([512])
first_stage_model.encoder.norm_out.weight torch.Size([512])
first_stage_model.encoder.norm_out.bias torch.Size([512])
first_stage_model.encoder.conv_out.weight torch.Size([8, 512, 3, 3])
first_stage_model.encoder.conv_out.bias torch.Size([8])
first_stage_model.decoder.conv_in.weight torch.Size([512, 4, 3, 3])
first_stage_model.decoder.conv_in.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.norm1.weight torch.Size([512])
first_stage_model.decoder.mid.block_1.norm1.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_1.conv1.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.norm2.weight torch.Size([512])
first_stage_model.decoder.mid.block_1.norm2.bias torch.Size([512])
first_stage_model.decoder.mid.block_1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_1.conv2.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.norm.weight torch.Size([512])
first_stage_model.decoder.mid.attn_1.norm.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.q.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.q.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.k.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.k.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.v.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.v.bias torch.Size([512])
first_stage_model.decoder.mid.attn_1.proj_out.weight torch.Size([512, 512, 1, 1])
first_stage_model.decoder.mid.attn_1.proj_out.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.norm1.weight torch.Size([512])
first_stage_model.decoder.mid.block_2.norm1.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_2.conv1.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.norm2.weight torch.Size([512])
first_stage_model.decoder.mid.block_2.norm2.bias torch.Size([512])
first_stage_model.decoder.mid.block_2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.mid.block_2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.0.block.0.norm1.weight torch.Size([256])
first_stage_model.decoder.up.0.block.0.norm1.bias torch.Size([256])
first_stage_model.decoder.up.0.block.0.conv1.weight torch.Size([128, 256, 3, 3])
first_stage_model.decoder.up.0.block.0.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.0.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.0.conv2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.0.nin_shortcut.weight torch.Size([128, 256, 1, 1])
first_stage_model.decoder.up.0.block.0.nin_shortcut.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm1.weight torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.1.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.1.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.1.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.1.conv2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm1.weight torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.conv1.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.2.conv1.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm2.weight torch.Size([128])
first_stage_model.decoder.up.0.block.2.norm2.bias torch.Size([128])
first_stage_model.decoder.up.0.block.2.conv2.weight torch.Size([128, 128, 3, 3])
first_stage_model.decoder.up.0.block.2.conv2.bias torch.Size([128])
first_stage_model.decoder.up.1.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.1.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.1.block.0.conv1.weight torch.Size([256, 512, 3, 3])
first_stage_model.decoder.up.1.block.0.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.0.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.0.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.0.nin_shortcut.weight torch.Size([256, 512, 1, 1])
first_stage_model.decoder.up.1.block.0.nin_shortcut.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm1.weight torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.1.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.1.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.1.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.1.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm1.weight torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.conv1.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.2.conv1.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm2.weight torch.Size([256])
first_stage_model.decoder.up.1.block.2.norm2.bias torch.Size([256])
first_stage_model.decoder.up.1.block.2.conv2.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.block.2.conv2.bias torch.Size([256])
first_stage_model.decoder.up.1.upsample.conv.weight torch.Size([256, 256, 3, 3])
first_stage_model.decoder.up.1.upsample.conv.bias torch.Size([256])
first_stage_model.decoder.up.2.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.0.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.0.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.0.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.1.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.1.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.1.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm1.weight torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.2.conv1.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm2.weight torch.Size([512])
first_stage_model.decoder.up.2.block.2.norm2.bias torch.Size([512])
first_stage_model.decoder.up.2.block.2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.block.2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.2.upsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.2.upsample.conv.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.0.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.0.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.0.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.0.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.1.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.1.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.1.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.1.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm1.weight torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.conv1.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.2.conv1.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm2.weight torch.Size([512])
first_stage_model.decoder.up.3.block.2.norm2.bias torch.Size([512])
first_stage_model.decoder.up.3.block.2.conv2.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.block.2.conv2.bias torch.Size([512])
first_stage_model.decoder.up.3.upsample.conv.weight torch.Size([512, 512, 3, 3])
first_stage_model.decoder.up.3.upsample.conv.bias torch.Size([512])
first_stage_model.decoder.norm_out.weight torch.Size([128])
first_stage_model.decoder.norm_out.bias torch.Size([128])
first_stage_model.decoder.conv_out.weight torch.Size([3, 128, 3, 3])
first_stage_model.decoder.conv_out.bias torch.Size([3])
first_stage_model.quant_conv.weight torch.Size([8, 8, 1, 1])
first_stage_model.quant_conv.bias torch.Size([8])
first_stage_model.post_quant_conv.weight torch.Size([4, 4, 1, 1])
first_stage_model.post_quant_conv.bias torch.Size([4])
cond_stage_model.transformer.text_model.embeddings.token_embedding.weight torch.Size([49408, 768])
cond_stage_model.transformer.text_model.embeddings.position_embedding.weight torch.Size([77, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.1.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.3.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.3.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.4.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.4.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.5.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.6.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.6.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.7.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.7.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.8.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.8.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.9.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.9.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.10.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.k_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.k_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.v_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.v_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.q_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.q_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.out_proj.weight torch.Size([768, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.self_attn.out_proj.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm1.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm1.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc1.weight torch.Size([3072, 768])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc1.bias torch.Size([3072])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc2.weight torch.Size([768, 3072])
cond_stage_model.transformer.text_model.encoder.layers.11.mlp.fc2.bias torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm2.weight torch.Size([768])
cond_stage_model.transformer.text_model.encoder.layers.11.layer_norm2.bias torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.weight torch.Size([768])
cond_stage_model.transformer.text_model.final_layer_norm.bias torch.Size([768])
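The text-encoder entries above follow one fixed pattern per encoder layer: four attention projections, `layer_norm1`, `mlp.fc1`, `mlp.fc2`, `layer_norm2`. The sketch below is a minimal illustration that regenerates that key/shape pattern so it can be checked or counted. It assumes a width of 768 and an MLP width of 3072 (both read off the shapes listed above) and 12 layers numbered 0 to 11 (only 6 to 11 appear in this excerpt). The helper names `layer_keys`, `text_encoder_keys` and `param_count` are hypothetical and are not part of any merge tool. Shapes are plain tuples, so no torch install is needed. Embedding keys are not in this excerpt and are left out.

```python
from math import prod

PREFIX = "cond_stage_model.transformer.text_model"
HIDDEN = 768  # text encoder width, taken from the shapes listed above
MLP = 3072    # mlp.fc1 output width, taken from the shapes listed above


def layer_keys(i):
    """Hypothetical helper: expected key -> shape entries for encoder layer i,
    in the same order as the listing above."""
    base = f"{PREFIX}.encoder.layers.{i}"
    entries = {}
    # Attention projections: square weight matrices plus bias vectors.
    for proj in ("k_proj", "v_proj", "q_proj", "out_proj"):
        entries[f"{base}.self_attn.{proj}.weight"] = (HIDDEN, HIDDEN)
        entries[f"{base}.self_attn.{proj}.bias"] = (HIDDEN,)
    entries[f"{base}.layer_norm1.weight"] = (HIDDEN,)
    entries[f"{base}.layer_norm1.bias"] = (HIDDEN,)
    # MLP expands 768 -> 3072 (fc1) and projects back 3072 -> 768 (fc2).
    entries[f"{base}.mlp.fc1.weight"] = (MLP, HIDDEN)
    entries[f"{base}.mlp.fc1.bias"] = (MLP,)
    entries[f"{base}.mlp.fc2.weight"] = (HIDDEN, MLP)
    entries[f"{base}.mlp.fc2.bias"] = (HIDDEN,)
    entries[f"{base}.layer_norm2.weight"] = (HIDDEN,)
    entries[f"{base}.layer_norm2.bias"] = (HIDDEN,)
    return entries


def text_encoder_keys(num_layers=12):
    """Hypothetical helper: all encoder-layer keys plus final_layer_norm
    (embedding keys are not covered)."""
    entries = {}
    for i in range(num_layers):
        entries.update(layer_keys(i))
    entries[f"{PREFIX}.final_layer_norm.weight"] = (HIDDEN,)
    entries[f"{PREFIX}.final_layer_norm.bias"] = (HIDDEN,)
    return entries


def param_count(entries):
    """Total number of scalar parameters described by a key -> shape dict."""
    return sum(prod(shape) for shape in entries.values())


if __name__ == "__main__":
    keys = text_encoder_keys()
    print(len(keys), "keys,", param_count(keys), "parameters")
    print("per layer:", param_count(layer_keys(7)))
```

Under these assumptions each layer holds 16 keys and 7,087,872 parameters. With a real checkpoint loaded as a PyTorch state dict, `{k: tuple(v.shape) for k, v in state_dict.items()}` yields a dict in the same form that can be compared against this one.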