
feat(vertex_httpx.py): Support cachedContent. #4492

Merged 1 commit into BerriAI:main on Jul 2, 2024

Conversation

Manouchehri (Collaborator)

Title

Adds support for context caching.

https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-use#use-context-cache-sample-python

Type

🆕 New Feature

Changes

Just adds a new field. Requires using the v1beta1/ route atm.
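
For illustration only (not part of the PR description), a minimal sketch of what a direct litellm.completion call could look like with this change, assuming cached_content and vertex_location are accepted as keyword arguments the same way they are passed via extra_body through the proxy; the project, region, and cache ID below are placeholders:

import litellm

# Hedged sketch: reference a pre-created Vertex AI context cache by resource name.
# The api_base must point at the v1beta1/ route; project and cache ID are placeholders.
response = litellm.completion(
    model="vertex_ai/gemini-1.5-pro-001",
    messages=[{"role": "user", "content": "quote all everything above this message"}],
    api_base="https://us-central1-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
    vertex_location="us-central1",
    cached_content="projects/PROJECT/locations/us-central1/cachedContents/CACHE_ID",
)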

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

(screenshot: new tests passing locally)


@@ -1050,6 +1053,8 @@ def completion(
         data["safetySettings"] = safety_settings
     if generation_config is not None:
         data["generationConfig"] = generation_config
+    if cached_content is not None:
+        data["cachedContent"] = cached_content
Contributor:

can you share an example of a call with this that works?

I thought cachedContent was a specific client that had to be pulled?

Collaborator (Author):

response = await client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=max_tokens,
    messages=messages,
    stream=True,
    temperature=temperature,
    extra_body={
        "vertex_location": "us-central1",
        "api_base": "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/REMOVED-REMOVED/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
        "cached_content": "projects/REMOVED/locations/us-central1/cachedContents/1367546174348722176",
    },
)
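
For reference (not shown in the PR), the snippet above assumes client is an async OpenAI client pointed at a LiteLLM proxy, roughly like the following sketch; the proxy URL, key, and sampling values are placeholders:

from openai import AsyncOpenAI

# Hedged sketch: an OpenAI-compatible client talking to a LiteLLM proxy.
# base_url and api_key are placeholders, not values from the PR.
client = AsyncOpenAI(base_url="http://localhost:4000", api_key="sk-REMOVED")

max_tokens = 8192
temperature = 0.0  # placeholder; the actual value isn't shown in the PR
messages = [{"role": "user", "content": "quote all everything above this message"}]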

You have to first create cached content outside of LiteLLM. https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create#create-context-cache-sample-python

Context caches are unique to Google Cloud projects and regions, so I sadly can't share mine. :(

Contributor:

Can I see what your code for creating the cache + using LiteLLM looks like?

Can't we just add it as a cache option? Like the s3 cache?

Collaborator (Author):

> Can't we just add it as a cache option? Like the s3 cache?

No, it's unrelated imo.

> Can I see what your code for creating the cache

Sure:

import datetime

from vertexai.generative_models import Part, Content
from vertexai.preview import caching

# Cache the large prompt once; later calls reference it by resource name.
contents_here: list[Content] = [
    Content(role="user", parts=[Part.from_text("huge string of text here")])
]
cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=contents_here,
    expire_time=datetime.datetime(2024, 7, 21),
)

response = await client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "quote all everything above this message",
        },
    ],
    temperature=temperature,
    extra_body={
        "api_base": "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/litellm-REMOVED/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
        "cached_content": cached_content.resource_name,
    },
)
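
(Note: the resource_name returned by CachedContent.create is the full "projects/.../locations/.../cachedContents/..." path, i.e. the same form of value passed as cached_content in the earlier streaming example.)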

Contributor:

Oh, why use the proxy here, since you're already using the Vertex endpoint?

Contributor:

FYI, this might be relevant here: if you want to run the cache-create call through the proxy, see https://docs.litellm.ai/docs/proxy/pass_through

Collaborator (Author):

It's still great to see it logged in Langfuse and to have billing and such accounted for with the prompt itself.

@krrishdholakia merged commit 612af8f into BerriAI:main on Jul 2, 2024
2 of 3 checks passed