Image Captioning using Finetuned PaliGemma2 #112

Closed
sitamgithub-MSIT opened this issue Dec 6, 2024 · 8 comments

Comments

@sitamgithub-MSIT
Contributor

sitamgithub-MSIT commented Dec 6, 2024

Description of the feature request:

PaliGemma2 was released recently, and we already have a dedicated folder for it. Both existing notebooks cover fine-tuning, using JAX and Keras, but an inference notebook for PaliGemma2 is still missing. I propose adding an inference notebook that uses Keras with the newly released fine-tuned checkpoint for the image captioning task.

What problem are you trying to solve with this feature?

The PaliGemma2 release includes a checkpoint fine-tuned on the DOCCI dataset, which makes it a good fit for demonstrating image captioning. The example could later be extended to multilingual use cases as well.

Any other information you'd like to share?

The notebook will run the 3B PaliGemma2 version in bfloat16, so that it can run on a Colab T4 GPU using multi-backend Keras and Keras Hub.
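A minimal sketch of what such an inference cell could look like, assuming the `keras-hub` package is installed. The preset name `pali_gemma2_ft_docci_3b_448`, the `caption en` prompt format, and the choice of the JAX backend are assumptions and should be checked against the Keras Hub preset list. `build_caption_prompt` and `caption_image` are hypothetical helper names.

```python
import os

# Choose the Keras backend before Keras is imported; "jax" is an assumption,
# any backend supported by multi-backend Keras could be used instead.
os.environ.setdefault("KERAS_BACKEND", "jax")

# Assumed preset identifier for the 3B DOCCI fine-tuned checkpoint;
# verify the exact name in the Keras Hub preset list.
DOCCI_PRESET = "pali_gemma2_ft_docci_3b_448"


def build_caption_prompt(language: str = "en") -> str:
    # PaliGemma-style captioning prompt (assumed format: "caption <lang>" + newline).
    return f"caption {language}\n"


def caption_image(image, preset: str = DOCCI_PRESET, language: str = "en",
                  max_length: int = 256):
    # Imported lazily so the prompt helper above works without keras-hub installed.
    import keras_hub

    # Load the weights in bfloat16 to reduce memory use on a small GPU.
    model = keras_hub.models.PaliGemmaCausalLM.from_preset(preset, dtype="bfloat16")

    # `image` is an H x W x 3 array; the expected size and value range depend on
    # the preset's preprocessing, which should be verified for the chosen preset.
    return model.generate(
        {"images": image, "prompts": build_caption_prompt(language)},
        max_length=max_length,
    )
```

Passing a different language code to `caption_image` is one way the multilingual extension mentioned above might be explored, although whether the DOCCI checkpoint handles other languages well is not confirmed here.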

cc: @windmaple

@windmaple
Collaborator

@jethac @bebechien, whoever actually worked on the PaliGemma2 launch, any comments?

@bebechien
Collaborator

Both of us worked on it. We mainly focused on the fine-tuning notebooks for the launch, but I agree with this request. I'll add a notebook with DOCCI soon.

@sitamgithub-MSIT
Contributor Author

@windmaple @bebechien We currently use the Keras CV and Keras NLP packages for the Keras-specific and other examples. Recently, Keras CV and Keras NLP were merged into a single package, Keras Hub. The package is officially released on PyPI, new model releases will land there from now on, and the official Keras documentation has been updated accordingly. Do we plan to switch the existing examples to Keras Hub? The change is easy: install Keras Hub instead of Keras NLP and replace keras_nlp with keras_hub. That's it. I'd like your opinions on this.
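The rename described above could be sketched as a small source rewrite. This is only an illustration: `migrate_to_keras_hub` is a hypothetical helper, and it assumes every `keras_nlp` symbol an example uses is available under the same name in `keras_hub`, which should be verified per notebook.

```python
import re

# Match the Python module name and the pip package name as whole words only,
# so identifiers such as "my_keras_nlp_utils" are left untouched.
_MODULE_NAME = re.compile(r"\bkeras_nlp\b")
_PACKAGE_NAME = re.compile(r"\bkeras-nlp\b")


def migrate_to_keras_hub(source: str) -> str:
    """Rewrite keras_nlp imports/usages and keras-nlp installs to Keras Hub."""
    source = _MODULE_NAME.sub("keras_hub", source)
    return _PACKAGE_NAME.sub("keras-hub", source)


if __name__ == "__main__":
    old_cell = (
        "!pip install -q -U keras-nlp\n"
        "import keras_nlp\n"
        "lm = keras_nlp.models.GemmaCausalLM.from_preset('gemma_2b_en')\n"
    )
    print(migrate_to_keras_hub(old_cell))
```

A whole-word match is used so that unrelated identifiers containing `keras_nlp` as a substring are not rewritten by accident.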

@bebechien
Collaborator

Correct, but there is no rush, since all existing usage will continue to work and we are in the middle of the transition.
keras-team/keras-hub#1831

@sitamgithub-MSIT
Contributor Author

@bebechien Hello! In this image captioning context, we also have the ONNX model of PaliGemma2 that can be run with transformers.js. I experimented with it and prepared a Colab notebook to run the PaliGemma2 model with a Node.js application. Here is the link: https://colab.research.google.com/drive/1Ne6-j905479dmtlCMfyqiHCTD60LCNRN?usp=sharing

Please take a look and let me know whether it would be worth contributing as an example.

@bebechien
Collaborator

Looks great to me! I think the example demonstrates how to perform inference with Node.js. It's particularly useful for those who want to run the model directly in their browser without needing a server.

@sitamgithub-MSIT
Contributor Author

sitamgithub-MSIT commented Dec 10, 2024

Yes, that's the purpose of the notebook. Should I create a PR around it?

@bebechien
Collaborator

Yes! And please follow the guide on contributing before sending a PR. Thanks!

Let me close this issue, since we have added the image captioning example with PaliGemma2.


3 participants