Image Captioning using Finetuned PaliGemma2 #112

sitamgithub-MSIT · 2024-12-06T21:29:59Z

Description of the feature request:

PaliGemma2 was released recently, and the repository already has a dedicated folder for it. Both notebooks in that folder cover fine-tuning, using JAX and Keras, but an inference notebook for PaliGemma2 is still missing. I propose adding an inference notebook that uses Keras with the newly released fine-tuned checkpoint for the image captioning task.

What problem are you trying to solve with this feature?

The PaliGemma2 release includes a checkpoint fine-tuned on the DOCCI dataset. It is a good fit for demonstrating image captioning, and the example can be extended to multilingual use cases as well.

Any other information you'd like to share?

The notebook will run the 3B PaliGemma2 model in bfloat16 so that it can run on a Colab T4 GPU using multi-backend Keras and Keras Hub.
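A minimal sketch of what such an inference setup might look like with Keras Hub. The preset name `pali_gemma2_ft_docci_3b_448` is an assumption and should be checked against the published Keras Hub preset list; the helper names (`build_caption_prompt`, `load_model`, `caption_image`) are hypothetical and exist only for illustration. The `caption <lang>` prompt follows the PaliGemma prompt convention, and the image is assumed to be already prepared in the form the preset expects.

```python
import os

# Assumed preset name for the 3B DOCCI fine-tuned checkpoint (verify before use).
PRESET = "pali_gemma2_ft_docci_3b_448"


def build_caption_prompt(lang: str = "en") -> str:
    """Build a PaliGemma-style captioning prompt for the given language code."""
    # PaliGemma prompts are plain text terminated by a newline.
    return f"caption {lang}\n"


def load_model(preset: str = PRESET, backend: str = "jax"):
    """Load the captioning model in bfloat16 (hypothetical helper)."""
    # The Keras backend has to be selected before Keras is first imported.
    os.environ.setdefault("KERAS_BACKEND", backend)
    import keras
    import keras_hub

    # Use bfloat16 as the default float type to reduce memory use.
    keras.config.set_floatx("bfloat16")
    return keras_hub.models.PaliGemmaCausalLM.from_preset(preset)


def caption_image(model, image, lang: str = "en", max_length: int = 128):
    """Generate a caption for one image (hypothetical helper).

    `image` is assumed to be an array already sized and scaled as the
    preset expects.
    """
    inputs = {"images": image, "prompts": build_caption_prompt(lang)}
    return model.generate(inputs, max_length=max_length)
```

Passing a different language code to `caption_image`, for example `lang="de"`, is one way the multilingual extension mentioned above could be tried; whether the DOCCI checkpoint produces good captions outside English is not something this sketch establishes.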

cc: @windmaple

windmaple · 2024-12-07T06:38:26Z

@jethac @bebechien, you worked on the PaliGemma2 launch. Any comments?

bebechien · 2024-12-07T07:00:00Z

Both of us worked on it. We mainly focused on the fine-tuning notebooks for the launch, but I agree with this request. I'll add a notebook with DOCCI soon.

sitamgithub-MSIT · 2024-12-07T09:36:47Z

@windmaple @bebechien We currently use the Keras CV and Keras NLP packages in the Keras-specific and other examples. Recently, however, Keras CV and Keras NLP were merged into a single package, Keras Hub. It is officially released on PyPI, future model releases will land there, and the official Keras documentation has been updated. Do we plan to switch the existing examples to Keras Hub? The change is easy: install Keras Hub instead of Keras NLP and replace keras_nlp with keras_hub. That's it. I'd like your opinions on this.
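As a rough illustration of how mechanical that rename is, here is a small sketch that rewrites the module and package names in a notebook cell's source text. The function name `migrate_to_keras_hub` is hypothetical, and the sketch assumes the only change needed is the rename itself; any API differences between the two packages would still have to be checked by hand.

```python
import re

# Match the Python module name and the pip package name as whole words only.
_MODULE = re.compile(r"\bkeras_nlp\b")
_PACKAGE = re.compile(r"\bkeras-nlp\b")


def migrate_to_keras_hub(source: str) -> str:
    """Rename keras_nlp / keras-nlp to keras_hub / keras-hub in source text."""
    source = _MODULE.sub("keras_hub", source)
    return _PACKAGE.sub("keras-hub", source)


cell = "!pip install -q -U keras-nlp\nimport keras_nlp\n"
print(migrate_to_keras_hub(cell))
```

The word-boundary anchors keep unrelated identifiers such as `my_keras_nlp_utils` untouched.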

bebechien · 2024-12-09T02:53:06Z

Correct, but there's no rush, since all existing usage will continue to work and we are in the middle of the transition.
keras-team/keras-hub#1831

sitamgithub-MSIT · 2024-12-09T20:45:14Z

@bebechien Hello! In this image captioning context, we also have the ONNX model of PaliGemma2 that can be run with transformers.js. I experimented with it and prepared a Colab notebook to run the PaliGemma2 model with a Node.js application. Here is the link: https://colab.research.google.com/drive/1Ne6-j905479dmtlCMfyqiHCTD60LCNRN?usp=sharing

Please take a look and let me know whether it would be worth contributing as an example.

bebechien · 2024-12-10T09:08:50Z

Looks great to me! I think the example demonstrates how to perform inference with Node.js. It's particularly useful for those who want to run the model directly in their browser without needing a server.

sitamgithub-MSIT · 2024-12-10T12:28:54Z

Yes, that's the purpose of the notebook. Should I create a PR around it?

bebechien · 2024-12-11T00:50:31Z

Yes! And please follow the guide on contributing before sending a PR. Thanks!

Let me close this issue, since we added the image captioning example with PaliGemma2.

bebechien mentioned this issue Dec 9, 2024

Add PaliGemma 2 Quickstart #113

Merged

bebechien closed this as completed Dec 11, 2024
