
[WIP] adding incontext-learning community pipeline #6419

Closed
wants to merge 2 commits

Conversation

charchit7
Contributor

What does this PR do?

Introduces Prompt-diffusion as part of #6214
Paper : https://arxiv.org/pdf/2305.01115.pdf

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul

@charchit7
Contributor Author

charchit7 commented Jan 1, 2024

With this PR, in addition to updating the code base, I plan to share insights on what I learn along the way. I'll start by briefly discussing ControlNet and in-context learning (in-context learning builds on the same idea), hoping to both enhance my understanding and assist others looking into it.

@charchit7
Contributor Author

ControlNet:

Stable Diffusion occasionally struggles to generate images aligned with specific prompts. ControlNet addresses this by enhancing Stable Diffusion's capabilities. The fundamental concept of ControlNet is straightforward:

  • Dual Weight System: Stable Diffusion's weights are cloned into two sets, a trainable copy and a locked copy. The locked copy preserves the original model's knowledge while the trainable copy learns the new conditioning.
  • Zero-Convolution Technique: The two copies are connected through 1x1 convolution layers whose weights and biases are initialized to zero. At the start of training, the trainable branch therefore contributes nothing, and the combined network behaves exactly as if the duplication did not exist; the conditioning influence grows gradually as the zero convolutions are trained.
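The zero-convolution idea above can be sketched as follows (a minimal sketch using PyTorch; the channel sizes are illustrative, not taken from the actual ControlNet implementation):

```python
import torch
import torch.nn as nn

def zero_module(module: nn.Module) -> nn.Module:
    """Zero-initialize a module's parameters so it initially contributes nothing."""
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

# A "zero convolution": a 1x1 conv whose weights and bias start at zero.
# Adding its output to the locked branch is a no-op at the start of training.
zero_conv = zero_module(nn.Conv2d(320, 320, kernel_size=1))

x = torch.randn(1, 320, 8, 8)
out = zero_conv(x)
# out is all zeros before any training step updates zero_conv's parameters.
```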

In-context learning:

  • Largely used in the LLM world: it is a form of prompt learning where we demonstrate the task to the model through examples, without updating its weights.
  • In vision, there are challenges such as designing effective prompts. Current vision models are task-specific, so they are not designed for in-context learning and lack the flexibility to adapt.

So that's where Prompt Diffusion comes into the picture.

Input to the model:
prompt: {text-guidance, example: (image1 → image2), image-query: image3} → target: image4
where (image1 → image2) is a pair of vision-task examples, e.g., (depth map → image), text-guidance provides language instructions for the specific task, and image3 is the input image query that matches image1 in type, so it could be a real image or an image condition (e.g., a depth or HED map).

Both papers are a pretty neat read: in-context learning and ControlNet.

@sayakpaul
Member

Pipelines under community are directly loadable via the custom_pipeline argument.

In this case, there's a modified ControlNet module that needs to be considered, which is why it's better to keep it under research_projects, as was done for ControlNetXS: #6316.

Also, I'd suggest maintaining a Google Doc if you want to share your findings with the community in a more elaborate manner (I appreciate your willingness to do so). This way, the PR thread stays clean and to the point.

@charchit7
Contributor Author

@sayakpaul Ahh, got it. I'll change it accordingly.
Yeah, the long thread would make it difficult for you all to follow; apologies for that. I'll create a doc and comment here once things are done.

@charchit7
Contributor Author

charchit7 commented Jan 7, 2024

Hey @sayakpaul
Apologies for the delay.
I have read the code and plan to implement it by tomorrow. Could you please tell me what the goal of this project should be? Is it just to convert the code to the Diffusers format for a showcase, or to create a fully functional training script as well?

If you could give me an idea of the intended flow, that would help.

@sayakpaul
Member

For now, it should just be about getting the pre-trained model to work in diffusers. We don't need to bother about training scripts yet.

@charchit7
Contributor Author

Thanks man.

@charchit7
Contributor Author

Hey @sayakpaul, I will resume this for sure; just wanted to post an update here. I cut the fingers on my right hand, so I'm working on simple scripts for now.

@sayakpaul
Member

Sorry about that! No pressure.

@charchit7 charchit7 closed this Jan 26, 2024
@charchit7 charchit7 deleted the incontext-learning branch October 8, 2024 13:39