A collection of papers, datasets, and evaluations for personalized text-to-image generation.
Personalized text-to-image (P-T2I) generation aims to learn a new concept from a small set of reference images and to generate new images that embody that concept. This capability is valuable for applications such as image editing, enhancing classifier performance, and bolstering robustness.
This GitHub repository aims to provide the necessary resources for people interested in this line of research.
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion, (ICLR'23) [project] [paper] [code]
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, (CVPR'23) [project] [paper] [code*]
- Multi-Concept Customization of Text-to-Image Diffusion, (CVPR'23) [project] [paper] [code]
- Multiresolution Textual Inversion, (NeurIPS-W'22) [paper] [code]
- Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models, (arXiv - Feb'23) [project] [paper]
- ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation, (arXiv - Feb'23) [paper] [code] [demo]
- Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion, (arXiv - March'23) [project] [paper] [code]
- P+: Extended Textual Conditioning in Text-to-Image Generation, (arXiv - March'23) [project] [paper]
- Cones: Concept Neurons in Diffusion Models for Customized Generation, (arXiv - March'23) [paper] [code]
- InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning, (arXiv - April'23) [project] [paper]
- Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach, (arXiv - May'23) [paper]
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing, (arXiv - May'23) [project] [paper] [code]
- Break-A-Scene: Extracting Multiple Concepts from a Single Image, (arXiv - May'23) [project] [paper]
- A Neural Space-Time Representation for Text-to-Image Personalization, (arXiv - May'23) [project] [paper] [code]
- Photoswap: Personalized Subject Swapping in Images, (arXiv - May'23) [project] [paper]
- Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models, (arXiv - May'23) [project] [paper] [code]
- Concept Decomposition for Visual Exploration and Inspiration, (arXiv - May'23) [project] [paper]
- Inserting Anybody in Diffusion Models via Celeb Basis, (arXiv - June'23) [project] [paper] [code]
- ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation, (arXiv - June'23) [paper] [code]
- Face0: Instantaneously Conditioning a Text-to-Image Model on a Face, (arXiv - June'23) [paper]
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models, (arXiv - June'23) [project] [paper] [code] [demo]
- Controlling Text-to-Image Diffusion by Orthogonal Finetuning, (arXiv - June'23) [project] [paper] [code]
- Fréchet Inception Distance (FID) -- For Image Quality [Paper] [Python Code (PyTorch)]
- Kernel Inception Distance (KID) -- For Concept Overfitting [Paper] [Python Code (PyTorch)]
- DINO Cosine Similarity -- For Concept Similarity [Paper] [Code]
- CLIPScore -- For Image-Text Alignment [Paper] [Code]
- Concept Confidence Deviation (CCD) -- For Concept & Composition Alignment [paper] [code]
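
Two of the metrics above, DINO cosine similarity and CLIPScore, reduce to a cosine similarity between embedding vectors. The sketch below illustrates only that arithmetic in plain Python. It assumes the embeddings have already been extracted by a DINO or CLIP model (not shown here), the function names are hypothetical and not taken from any linked codebase, and the rescaling weight `w = 2.5` is assumed from the CLIPScore paper's definition `w * max(cos, 0)`. Treat it as an illustration rather than a replacement for the linked implementations.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(generated, references):
    """Average cosine similarity over all (generated, reference) pairs.

    Hypothetical helper: mirrors how concept similarity is commonly
    reported when image embeddings (e.g. from DINO) are precomputed.
    """
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    if not scores:
        raise ValueError("need at least one generated and one reference embedding")
    return sum(scores) / len(scores)


def clip_score(image_embedding, text_embedding, w=2.5):
    """Image-text alignment as w * max(cos, 0).

    Assumption: w = 2.5 follows the CLIPScore paper; the embeddings
    are expected to come from the same CLIP model.
    """
    return w * max(cosine_similarity(image_embedding, text_embedding), 0.0)


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real embeddings.
    generated = [[1.0, 0.0]]
    references = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pairwise_similarity(generated, references))
    print(clip_score([1.0, 0.0], [1.0, 0.0]))
```

Clamping at zero in `clip_score` means an image and caption whose embeddings point in opposite directions score 0 rather than a negative value.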
We highly encourage community contributions. Feel free to open an issue or a pull request.