
[WIP] adding incontext-learning community pipeline #6419

Closed
wants to merge 2 commits

Conversation

charchit7
Contributor

What does this PR do?

Introduces Prompt-diffusion as part of #6214
Paper : https://arxiv.org/pdf/2305.01115.pdf

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul

@charchit7
Contributor Author

charchit7 commented Jan 1, 2024

With this PR, in addition to updating the code base, I plan to share insights on what I learn along the way. I'll start by briefly discussing ControlNet and in-context learning (in-context learning builds on the same idea), hoping to both enhance my understanding and assist others looking into it.

@charchit7
Contributor Author

ControlNet:

Stable Diffusion occasionally struggles to generate images aligned with specific prompts. ControlNet addresses this by enhancing Stable Diffusion's capabilities. The fundamental concept of ControlNet is straightforward:

  • Dual Weight System: Stable Diffusion's weights are cloned into two sets, a trainable copy and a locked copy. The locked copy preserves the original model's knowledge while the trainable copy learns the new conditioning.
  • Zero-Convolution Technique: The two copies are connected through 1x1 convolution layers whose weights and biases are initialized to zero. At the start of training, the trainable branch therefore contributes nothing, and the combined network behaves exactly as if the duplication did not exist; the conditioning influence grows gradually as the zero convolutions are trained.
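The zero-convolution idea above can be sketched as follows (a minimal sketch using PyTorch; the channel sizes are illustrative, not taken from the actual ControlNet implementation):

```python
import torch
import torch.nn as nn

def zero_module(module: nn.Module) -> nn.Module:
    """Zero-initialize a module's parameters so it initially contributes nothing."""
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

# A "zero convolution": a 1x1 conv whose weights and bias start at zero.
# Adding its output to the locked branch is a no-op at the start of training.
zero_conv = zero_module(nn.Conv2d(320, 320, kernel_size=1))

x = torch.randn(1, 320, 8, 8)
out = zero_conv(x)
# out is all zeros before any training step updates zero_conv's parameters.
```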

In-context learning:

  • Largely used in the LLM world: it is a form of prompt learning where we demonstrate the task to the model through examples, without updating its weights.
  • In vision, there are challenges such as designing effective prompts. Current vision models are task-specific, so they are not designed for in-context learning and lack the flexibility to adapt.

So that's where Prompt Diffusion comes into the picture.

Input to the model:
prompt: {text-guidance, example: (image1 → image2), image-query: image3} → target: image4
where (image1 → image2) is a pair of vision-task examples, e.g., (depth map → image), text-guidance provides language instructions for the specific task, and image3 is the input image query that matches image1 in type, so it could be a real image or an image condition (e.g., a depth or HED map).

Both papers are a pretty neat read: in-context learning and ControlNet.

@sayakpaul
Member

Pipelines under community are directly loadable via the custom_pipeline argument.

In this case, there's a modified ControlNet module that needs to be considered, which is why it's better to keep it under research_projects, as was done for ControlNetXS: #6316.

Also, I'd suggest maintaining a Google Doc if you want to share your findings with the community in a more elaborate manner (I appreciate your willingness to do so). This way, the PR thread stays clean and to the point.

@charchit7
Contributor Author

@sayakpaul Ahh, got it. I'll change it accordingly.
Yeah, the long thread would make it difficult for you all to follow; apologies for that. I'll create a doc and comment here once things are done.

@charchit7
Contributor Author

charchit7 commented Jan 7, 2024

Hey @sayakpaul
Apologies for the delay.
I have read the code and plan to implement it by tomorrow. Could you please tell me what the goal of this project should be? Is it just to convert the code to the Diffusers format for a showcase, or to create a fully functional training script as well?

If you could give me an idea of the intended flow, that would help.

@sayakpaul
Member

For now, it should just be about getting the pre-trained model to work in diffusers. We don't need to bother about training scripts yet.

@charchit7
Contributor Author

Thanks man.

@charchit7
Contributor Author

Hey @sayakpaul, I will resume this for sure; just wanted to post an update here. I cut the fingers on my right hand, so I'm working on simple scripts for now.

@sayakpaul
Member

Sorry about that! No pressure.

@charchit7 charchit7 closed this Jan 26, 2024
@charchit7 charchit7 deleted the incontext-learning branch October 8, 2024 13:39