Blog post about vision language models: https://huggingface.co/blog/vlms
Interview at MIT (https://visions.media.mit.edu) with Merve Noyan (https://x.com/mervenoyann) of Hugging Face, covering multimodal open-source models, their architecture, and their applications: https://www.youtube.com/watch?v=_TlhKHTgWjY
Hugging Face tutorial on ColPali and Qwen vision models: https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms
More information about ColPali: https://huggingface.co/blog/manu/colpali