
Training test case question #123

Open
jimb2834 opened this issue Apr 12, 2024 · 5 comments

@jimb2834

Hello @orpatashnik - Great work!

I wanted to run a test and train a new StyleGAN2 model with CLIP - simply adding "captions" such as male and female to the images and learning how it works.

Example:

  • image001.png
  • image001.txt <-- inside is the keyword male or female
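
For reference, this is roughly how I picture reading the pairs back in (just a sketch of my data layout; the "dataset/" folder name is made up):

```python
from pathlib import Path

# Sketch of reading my (image, keyword) pairs; "dataset/" is a made-up folder name.
pairs = []
for img_path in sorted(Path("dataset").glob("*.png")):
    keyword = img_path.with_suffix(".txt").read_text().strip()  # "male" or "female"
    pairs.append((img_path, keyword))

print(len(pairs), pairs[:2])
```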

What quantity of images do you suggest? And if I used only 10,000, would that be enough to locate the attribute in the latent space, or at least get some idea of how it works?

Thanks!

@orpatashnik
Owner

orpatashnik commented Apr 13, 2024

Hi @jimb2834 ,
Thanks for your interest in our work :)
Can you please clarify what you mean by creating a new StyleGAN model with CLIP?

@jimb2834
Author

jimb2834 commented Apr 13, 2024

Hi @orpatashnik - Yes, your work is great.

Here is a little background: in the past I have experimented with and trained new GAN, StyleGAN, StyleGAN2-ADA, and StyleGAN3 models for various use cases, such as medical imaging. Recently I became interested in CLIP and in how to train models with a "keyword", or, as in diffusion, with "captions". So, simply put, I gathered a training set from FFHQ, labeled the images "male" and "female", and want to use your training script to train a StyleCLIP model.

Thanks for reading this as well.

@orpatashnik
Owner

Hi @jimb2834 ,

StyleCLIP is a method that employs CLIP and a pretrained StyleGAN for editing; we did not fine-tune StyleGAN or change its architecture.
So, given an image, you can change its attributes using text.

If you are only interested in controlling gender, a possible solution is to use one of StyleCLIP's methods (the latent mapper or the global directions): sample a random latent code and shift it towards the target gender, either with a global direction or with a trained mapper. Neither method needs labeled data, as CLIP provides the guidance.
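
A rough sketch of the global-direction route, assuming the stylegan2-ada-pytorch generator API and an already computed direction (the file names below are placeholders, not StyleCLIP's actual scripts):

```python
import pickle
import torch

# Run from inside the stylegan2-ada-pytorch repo so its dnnlib/torch_utils modules resolve.
with open('ffhq.pkl', 'rb') as f:               # placeholder path to a pretrained FFHQ generator
    G = pickle.load(f)['G_ema'].eval()

delta_w = torch.load('gender_direction.pt')     # placeholder: precomputed CLIP global direction in W+

z = torch.randn(1, G.z_dim)                     # sample a random latent code
w = G.mapping(z, None)                          # z -> W+, shape (1, num_ws, 512)

alpha = 3.0                                     # edit strength; flip the sign for the opposite direction
w_edited = w + alpha * delta_w                  # shift the latent code towards the target attribute

img = G.synthesis(w_edited, noise_mode='const') # edited image, values roughly in [-1, 1]
```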

If you are interested in training your own GAN, maybe you can take a look here: https://github.com/JiauZhang/GigaGAN. This is a new GAN architecture that can be conditioned on text.

@jimb2834
Author

Hi @orpatashnik - Thank you again for the replies.

I see, but what baffles me is this; perhaps you can advise me where my logic is flawed.

  • CLIP is a neural network trained on a variety of (image, text) pairs, i.e. each image must be paired with "text".

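To illustrate what I mean, with the public openai/CLIP package the pairing shows up as a similarity score between image and text embeddings (the image path and captions below are just my example data):

```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("image001.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a male face", "a photo of a female face"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # similarity between the image and each caption
    probs = logits_per_image.softmax(dim=-1)   # which caption matches the image best

print(probs)
```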

In my eventual use case, we cannot use others' terminology (i.e. "text") within our future model, so we need to train a new one that matches our images with our own "text". That would then give us the ability to use the text "embedding" to isolate a feature.

My original "man/woman" example may be confusing, since it seems quite trivial and already solved, but my goal is the contrary: I need to train specific images with NEW terminology ("text"). I cannot seem to find a working example where someone has trained a GAN model with "keywords/text", other than diffusion models captioned with BLIP/WD14 etc.

Does this make sense?

@orpatashnik
Owner

Two GANs I am aware of that use text as input are:

  1. GigaGAN - https://mingukkang.github.io/GigaGAN/
  2. StyleGAN-T - https://sites.google.com/view/stylegan-t/

Both of them train a GAN on a dataset consisting of (text, image) pairs. Neither has an official implementation, but you can find some unofficial ones.
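
Conceptually, the conditioning in both boils down to feeding a text embedding into the generator alongside the noise vector. A toy sketch of that idea (not the GigaGAN or StyleGAN-T architecture) would be:

```python
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Toy text-conditioned generator: concatenate a text embedding
    (e.g. from CLIP's text encoder) with the noise vector z. This only
    illustrates the conditioning idea, not GigaGAN / StyleGAN-T."""

    def __init__(self, z_dim=128, text_dim=512, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + text_dim, 4 * 4 * 256),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),           # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(128, img_channels, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.Tanh(),
        )

    def forward(self, z, text_emb):
        # one generated image per (noise, caption-embedding) pair
        return self.net(torch.cat([z, text_emb], dim=1))

G = TextConditionedGenerator()
fake = G(torch.randn(4, 128), torch.randn(4, 512))  # -> (4, 3, 16, 16)
```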
