Add examples/gke/tgi-multi-lora-deployment #102

Merged on Oct 10, 2024 (20 commits)

@alvarobartt (Member) commented Sep 26, 2024

Description

This PR adds an example of how to deploy Text Generation Inference (TGI) on GKE via the Hugging Face DLC, serving Gemma2 with multiple LoRA adapters for inference on a single NVIDIA L4 instance.
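As a rough illustration of what multi-LoRA inference looks like from the client side, here is a minimal sketch of a request to TGI's `/generate` route that picks an adapter per request through the `adapter_id` parameter. It assumes the server was started with the adapters preloaded (TGI reads them from the `LORA_ADAPTERS` setting); the endpoint URL and the adapter ID below are hypothetical placeholders, not values from this PR.

```python
import json
import urllib.request


def build_generate_payload(prompt, adapter_id=None, max_new_tokens=64):
    """Build the JSON body for TGI's /generate route.

    When adapter_id is None the request is served by the base model;
    otherwise TGI routes it through the named LoRA adapter.
    """
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}


def generate(base_url, payload, timeout=60):
    """POST the payload to a running TGI server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]


# Hypothetical adapter ID, for illustration only.
payload = build_generate_payload(
    "What is Deep Learning?",
    adapter_id="my-org/my-gemma2-lora-adapter",
)
print(json.dumps(payload, indent=2))
# With a server reachable (e.g. through a port-forward), one would then call:
# generate("http://localhost:8080", payload)
```

Because the adapter is chosen per request, the same deployment can answer with the base model or with any of the loaded adapters without restarting the server.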

The three adapters have been fine-tuned in collaboration with @Jofthomas and can be found under the https://hf.co/google-cloud-partnership org on the Hub (still private, datasets can be moved there too):

cc @philschmid for a potential Cloud Tuesday post, @Jofthomas for his presentation at the upcoming Gemma Developer Day in Tokyo, and @pagezyhf for visibility on the example itself

And kudos to @Narsil for helping review and merge huggingface/text-generation-inference#2567, and to @datavistics et al. for their post at https://huggingface.co/blog/multi-lora-serving

Additionally

This PR also includes the `scripts/internal/update_example_tables.py` script, which is used internally to generate the example tables in the different files of this repository; running it automatically is left to another PR.

In the meantime this makes things easier to maintain: when adding a new example, one can just run `python scripts/internal/update_example_tables.py` to update those tables.
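To give an idea of what such a table generator involves, here is a minimal sketch. It is not the actual `update_example_tables.py`: the directory layout (`examples/<service>/<example>/README.md`), the service order, the table columns, and the helper names `collect_examples` and `render_table` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed service directories and display names, listed in the order they are emitted.
SERVICES = {"vertex-ai": "Vertex AI", "gke": "GKE", "cloud-run": "Cloud Run"}


def collect_examples(root):
    """Return (service, relative_path) pairs for each example directory with a README.md."""
    rows = []
    for service in SERVICES:
        service_dir = Path(root) / "examples" / service
        if not service_dir.is_dir():
            continue
        # Sort example directories alphabetically within each service.
        examples = sorted(p for p in service_dir.iterdir() if (p / "README.md").is_file())
        for example in examples:
            rows.append((service, example.relative_to(root).as_posix()))
    return rows


def render_table(rows):
    """Render the collected examples as a Markdown table."""
    lines = ["| Service | Example |", "| --- | --- |"]
    for service, path in rows:
        lines.append(f"| {SERVICES[service]} | [{path}](./{path}) |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the table for the repository rooted at the current directory.
    print(render_table(collect_examples(".")))
```

A real script would also need to splice the rendered table back into each target file while leaving the surrounding content untouched; that step is omitted here.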

To also include modifications on `examples`, `Makefile`, and `docs/scripts` or anything under `docs/`
Still pending on the official release of the latest TGI DLC on Google Cloud
@HuggingFaceDocBuilderDev (Collaborator)

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…run,gke,vertex-ai}/README.md`

Update example listing via `python scripts/internal/update_example_tables.py` so as to automatically generate those listings (respecting the previous content within the file), to alphabetically sort those as Vertex AI > GKE > Cloud Run, and some more fixes and improvements
To include the `examples/` prefix within the paths to the examples from the root directory i.e. in the `README.md` file
@alvarobartt merged commit ceec771 into main on Oct 10, 2024
1 check passed
@alvarobartt deleted the multi-lora-example branch on October 10, 2024 13:02