Visual models and multimodal models

Visual models

Hugging Face blog post on vision language models: https://huggingface.co/blog/vlms

Interview at MIT (https://visions.media.mit.edu) with Merve Noyan (https://x.com/mervenoyann) of Hugging Face, covering open-source multimodal models, their architecture, and their applications: https://www.youtube.com/watch?v=_TlhKHTgWjY

Resources for next steps in OCR with multimodal models

Hugging Face cookbook tutorial on multimodal RAG with ColPali and Qwen vision models: https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms

More information about ColPali: https://huggingface.co/blog/manu/colpali