Implement Vision Language Model chapter #49

burtenshaw · 2024-12-05T20:28:09Z

We need to implement the section on VLMs. It should be based on existing content from the Hugging Face ecosystem, adapted for SmolVLM and for the course structure, and it should offer exercises.

Material

Steps to do

Add prose that explains VLMs in the simplest possible terms
Add references to all pages
Add an exercise notebook on fine-tuning SmolVLM

duydl · 2024-12-06T01:16:45Z

Hi, could the issue be assigned?

burtenshaw · 2024-12-06T06:14:34Z

> Hi, could the issue be assigned?

Of course. I expect others will contribute too, but go ahead and open a draft PR with the outline. LGTM.

jungnerd · 2024-12-07T04:35:56Z

Hi there 👋🏻
I'd like to contribute to this issue by providing an exercise notebook that demonstrates how to fine-tune a VLM with TRL.
The notebook will be based on "How to Fine-Tune Multimodal Models or VLMs with Hugging Face TRL", but will use SmolVLM instead.

burtenshaw · 2024-12-07T06:13:03Z

That's great, @jungnerd. Thanks.

We're already working on the text for the VLM chapter in #59. There's a first draft of the notebook from @duydl.

Why don't you open a PR onto #59 with an exercise notebook?

duydl · 2024-12-07T12:25:34Z

@jungnerd I had been working on SFT for VLMs and just finished it. The DPO notebook is still open; it could be based on the Hugging Face blog post "Preference Optimization for VLMs". There are probably also lots of fixes and optimizations needed to complete the markdown and notebooks for VLMs.

jungnerd · 2024-12-07T13:39:38Z

@duydl Am I understanding correctly that you’re suggesting creating a notebook to fine-tune SmolVLM using DPO? If so, that sounds great! I think it would be really exciting to work on a fine-tuning notebook with DPO!

duydl mentioned this issue Dec 6, 2024

[MODULE] Implement Chapter 5: Vision Language Model #59

Merged

3 tasks
