[MODULE] Implement Chapter 5: Vision Language Model #59

Merged on Dec 16, 2024 (21 commits)

duydl (Collaborator) commented Dec 6, 2024

PR for Issue #49: Implement Vision Language Model chapter

Description

This PR adds the Vision Language Model (VLM) chapter.

Changes Introduced

  • Overview of VLMs

    • Defined Vision Language Models and their capabilities.
    • Highlighted applications like image captioning, visual question answering, and multimodal reasoning.
    • Linked to the detailed VLM Usage page.
    • Completed the VLM Usage page.
  • Fine-Tuning Guide

    • Explained the process and importance of fine-tuning for specific tasks.
    • Linked to the detailed VLM Fine-Tuning page.
    • Completed the VLM Fine-Tuning page.
  • Exercise Notebooks

    • Added two Jupyter Notebooks:
      • vlm_usage_sample.ipynb: Demonstrates pre-trained VLM usage for tasks such as image and video processing.
      • vlm_finetune_sample.ipynb: Walks through fine-tuning a VLM on various datasets and covers advanced methods.
    • Included examples and tiered exercises for learners.

burtenshaw (Collaborator) left a comment

Thanks for the PR!

This looks like a great structure. I've reviewed the text but not the notebook. Let's get the text in place then move on to the notebook last.

Resolved review threads:
  • 5_vision_language_models/README.md (3 threads, all outdated)
  • 5_vision_language_models/vlm_finetuning.md (2 threads, 1 outdated)
  • 5_vision_language_models/vlm_usage.md (1 thread, outdated)

duydl marked this pull request as ready for review on December 7, 2024, 08:21

duydl (Collaborator, Author) commented Dec 7, 2024

@burtenshaw I think it is ready for a review.

burtenshaw (Collaborator) commented, quoting duydl:

"@burtenshaw I think it is ready for a review."

Nice work! Let's get the notebook work from here in, and then I'll get a reviewer on it.

duydl (Collaborator, Author) commented Dec 7, 2024

@burtenshaw I got the notebook working, though the training would take some time on my hardware.

duydl (Collaborator, Author) commented Dec 15, 2024

@burtenshaw Seems like this should be merged by tomorrow. Sorry, I got unexpectedly busy and could not work on this. Let's see what I can add before the deadline...

burtenshaw (Collaborator) commented, quoting duydl:

"@burtenshaw Seems like this should be merged by tomorrow. Sorry, I got unexpectedly busy and could not work on this. Let's see what I can add before the deadline..."

No worries. I am currently merging modules on their release day, so I'll do this tomorrow.

burtenshaw merged commit f43a962 into huggingface:main on Dec 16, 2024
zcasanova pushed a commit to zcasanova/smol-course that referenced this pull request on Dec 27, 2024: "[MODULE] Implement Chapter 5: Vision Language Model"