In recent years, vision-language models have advanced remarkably, making them highly relevant to image-text tasks such as image-text generation, visual question answering, image-text contrastive learning, cross-modal retrieval, and art and design generation. Within this landscape, one technique that has gained considerable attention is Stable Diffusion, a deep learning model that specializes in producing high-quality images with nuanced and precise visual content. Here, "high-quality" means that the images capture intricate details while being free of the noise and distortions that might otherwise confuse a model during training or fine-tuning. In this research, we use advanced vision-language models to generate prompts for such AI-generated images. Our dataset consists exclusively of high-quality AI-generated images created by a third party using Stable Diffusion. Our work centers on fine-tuning two prominent vision-language models, BLIP and GIT. Both are multimodal vision-language models pretrained on diverse tasks. BLIP employs a complex architecture with separate modules for different tasks, including image-text pairing and captioning. GIT, by contrast, uses a simpler architecture that concatenates encoded features for efficiency. After fine-tuning, the two models produce noticeably different captions. Our study therefore seeks to identify the factors behind these differences, with a particular focus on why one model consistently produces more accurate captions than the other. Ultimately, we aim to achieve results superior to those of current state-of-the-art AI-generated image-to-prompt models.
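To make the idea of comparing caption accuracy between two models concrete, the sketch below scores generated captions against the original prompts with a simple token-level F1. This is only an illustration: the metric, the function names `token_f1` and `mean_score`, and the sample captions are all assumptions made for this example and are not taken from the project, which may use a different evaluation method.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a generated caption and a reference prompt.

    Assumption: whitespace tokenization and lowercasing are good enough
    for a rough comparison; the actual project may score differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_score(captions, references):
    """Average token F1 over paired captions and reference prompts."""
    scores = [token_f1(c, r) for c, r in zip(captions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data, invented purely for illustration.
references = ["a watercolor painting of a fox in a snowy forest"]
model_a_captions = ["a painting of a fox in a forest"]
model_b_captions = ["a dog on a beach"]

if __name__ == "__main__":
    print("model A:", round(mean_score(model_a_captions, references), 3))
    print("model B:", round(mean_score(model_b_captions, references), 3))
```

A lexical overlap score like this one rewards shared words only, so it would likely need to be paired with an embedding-based similarity to capture paraphrases.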
mohsinposts/Decoding-AI-Art