mohsinposts/Decoding-AI-Art

Decoding-AI-Art

In recent years, vision-language models have made remarkable advances on image-text tasks such as image-text generation, visual question answering, image-text contrastive learning, cross-modal retrieval, and art and design generation. Within this landscape, one technique that has gained considerable attention is Stable Diffusion, a deep learning model that specializes in producing high-quality images with nuanced and precise visual content. Here, high-quality images are those that capture intricate detail while remaining free of the noise and distortions that might otherwise confuse a model during training or fine-tuning.

In this research, we use state-of-the-art vision-language models to generate prompts for AI-generated images. Our dataset consists exclusively of high-quality AI-generated images created by a third party using Stable Diffusion. We fine-tune two prominent vision-language models, BLIP and GIT. Both are multimodal vision-language models pretrained on diverse tasks. BLIP employs a more complex architecture with separate modules for different tasks, including image-text pairing and captioning. GIT, by contrast, uses a simpler architecture that concatenates encoded features for efficiency.

After fine-tuning, the two models generate noticeably different captions. Our study therefore seeks to identify the factors behind these divergent captioning outcomes, with a particular focus on why one model consistently produces more accurate captions than the other. Ultimately, we aim to achieve better results than current state-of-the-art AI-generated image-to-prompt models.
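The comparison described above can be sketched as follows. This is a minimal illustration, not the repository's code: the checkpoint names are public Hugging Face checkpoints assumed here for convenience, `caption_image` and `token_f1` are hypothetical helpers, and token-level F1 is only one plausible way to score a generated caption against the original prompt (the README does not name its metric).

```python
from collections import Counter

# Public Hugging Face checkpoints, assumed for illustration only;
# the README does not state which checkpoints are fine-tuned.
BLIP_ID = "Salesforce/blip-image-captioning-base"
GIT_ID = "microsoft/git-base"


def caption_image(model_id, image, max_new_tokens=40):
    """Generate one caption for a PIL image with a BLIP or GIT checkpoint."""
    # Imported lazily so the scoring helper below works without transformers.
    from transformers import (
        AutoModelForCausalLM,
        AutoProcessor,
        BlipForConditionalGeneration,
    )

    processor = AutoProcessor.from_pretrained(model_id)
    if "blip" in model_id.lower():
        model = BlipForConditionalGeneration.from_pretrained(model_id)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = processor(images=image, return_tensors="pt")
    ids = model.generate(
        pixel_values=inputs.pixel_values, max_new_tokens=max_new_tokens
    )
    return processor.batch_decode(ids, skip_special_tokens=True)[0].strip()


def token_f1(predicted, reference):
    """Token-level F1 between a generated caption and the original prompt.

    Hypothetical metric chosen for this sketch, not the study's metric.
    """
    pred = predicted.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Given a PIL image `img` and the prompt that produced it, something like `token_f1(caption_image(BLIP_ID, img), prompt)` and the same call with `GIT_ID` would give a rough per-image comparison of the two models.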
