-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: Context Caching for Vertex AI #6898
Comments
+1 |
@codenprogressive just tested claude on vertex ai - it doesn't look like prompt caching is available there yet. I plan on picking up vertex ai context caching this week |
Hey @krrishdholakia, indeed Claude on Vertex still doesn't support prompt caching! (there is a feature request: anthropics/anthropic-sdk-python#653) I think for now we can enable it for Gemini on VertexAI. |
AWS has started a preview for the prompt caching for Claude: https://pages.awscloud.com/promptcaching-Preview.html |
Vertex support is now live: https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude-prompt-caching#use_prompt_caching |
@emerzon i believe this ticket is for vertex ai gemini, the vertex ai anthropic prompt caching issue is probably separate. i'm working on it now though, since we shouldn't be passing any extra headers to it. |
… actually use optional param passed in Fixes #6898 (comment)
* build(pyproject.toml): bump uvicorn depedency requirement Fixes BerriAI#7768 * fix(anthropic/chat/transformation.py): fix is_vertex_request check to actually use optional param passed in Fixes BerriAI#6898 (comment) * fix(o1_transformation.py): fix azure o1 'is_o1_model' check to just check for o1 in model string BerriAI#7743 * test: load vertex creds
The Feature
It looks like Gemini context caching does not work when using Vertex AI.
Doing a cursory search, it looks like this part would need to be implemented to support Vertex AI:
litellm/litellm/llms/vertex_ai_and_google_ai_studio/gemini/transformation.py
Line 392 in c73ce95
Motivation, pitch
Context cachin is already supported using the Gemini API and it's a good way to reduce costs.
Twitter / LinkedIn details
No response
The text was updated successfully, but these errors were encountered: