[Feature]: Context Caching for Vertex AI #6898

DreamGenX · 2024-11-25T13:54:03Z

The Feature

It looks like Gemini context caching does not work when using Vertex AI.
Doing a cursory search, it looks like this part would need to be implemented to support Vertex AI:

litellm/litellm/llms/vertex_ai_and_google_ai_studio/gemini/transformation.py

Line 392 in c73ce95

await context_caching_endpoints.async_check_and_create_cache(

Motivation, pitch

Context cachin is already supported using the Gemini API and it's a good way to reduce costs.

Twitter / LinkedIn details

No response

codenprogressive · 2024-12-06T03:53:52Z

+1
Any updates on enabling context caching for LLMs (Gemini, Claude, etc) available through VertexAI?

krrishdholakia · 2024-12-08T07:14:17Z

@codenprogressive just tested claude on vertex ai - it doesn't look like prompt caching is available there yet.

I plan on picking up vertex ai context caching this week

codenprogressive · 2024-12-08T15:33:58Z

Hey @krrishdholakia, indeed Claude on Vertex still doesn't support prompt caching! (there is a feature request: anthropics/anthropic-sdk-python#653)

I think for now we can enable it for Gemini on VertexAI.

emerzon · 2024-12-18T04:05:27Z

AWS has started a preview for the prompt caching for Claude: https://pages.awscloud.com/promptcaching-Preview.html
Hope Vertex comes up next

emerzon · 2025-01-14T18:49:20Z

Vertex support is now live: https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude-prompt-caching#use_prompt_caching

krrishdholakia · 2025-01-15T02:33:06Z

@emerzon i believe this ticket is for vertex ai gemini, the vertex ai anthropic prompt caching issue is probably separate. i'm working on it now though, since we shouldn't be passing any extra headers to it.

… actually use optional param passed in Fixes #6898 (comment)

* build(pyproject.toml): bump uvicorn depedency requirement Fixes BerriAI#7768 * fix(anthropic/chat/transformation.py): fix is_vertex_request check to actually use optional param passed in Fixes BerriAI#6898 (comment) * fix(o1_transformation.py): fix azure o1 'is_o1_model' check to just check for o1 in model string BerriAI#7743 * test: load vertex creds

DreamGenX added the enhancement New feature or request label Nov 25, 2024

krrishdholakia self-assigned this Nov 25, 2024

krrishdholakia added a commit that referenced this issue Jan 15, 2025

fix(anthropic/chat/transformation.py): fix is_vertex_request check to…

bce36a8

… actually use optional param passed in Fixes #6898 (comment)

krrishdholakia closed this as completed in 8353caa Jan 15, 2025

krrishdholakia mentioned this issue Jan 15, 2025

build(pyproject.toml): bump uvicorn depedency requirement + Azure o1 model check fix + Vertex Anthropic headers fix #7773

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Context Caching for Vertex AI #6898

[Feature]: Context Caching for Vertex AI #6898

DreamGenX commented Nov 25, 2024

codenprogressive commented Dec 6, 2024

krrishdholakia commented Dec 8, 2024

codenprogressive commented Dec 8, 2024

emerzon commented Dec 18, 2024

emerzon commented Jan 14, 2025

krrishdholakia commented Jan 15, 2025

[Feature]: Context Caching for Vertex AI #6898

[Feature]: Context Caching for Vertex AI #6898

Comments

DreamGenX commented Nov 25, 2024

The Feature

Motivation, pitch

Twitter / LinkedIn details

codenprogressive commented Dec 6, 2024

krrishdholakia commented Dec 8, 2024

codenprogressive commented Dec 8, 2024

emerzon commented Dec 18, 2024

emerzon commented Jan 14, 2025

krrishdholakia commented Jan 15, 2025