feat(vertex_httpx.py): Support cachedContent. #4492
Conversation
@@ -1050,6 +1053,8 @@ def completion(
             data["safetySettings"] = safety_settings
         if generation_config is not None:
             data["generationConfig"] = generation_config
+        if cached_content is not None:
+            data["cachedContent"] = cached_content
can you share an example of a call with this that works?
I thought cachedContent was a specific client that had to be pulled?
# `client` here is an OpenAI-compatible async client pointed at the LiteLLM proxy.
response = await client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=max_tokens,
    messages=messages,
    stream=True,
    temperature=temperature,
    extra_body={
        "vertex_location": "us-central1",
        "api_base": "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/REMOVED-REMOVED/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
        "cached_content": "projects/REMOVED/locations/us-central1/cachedContents/1367546174348722176",
    },
)
You have to first create the cached content outside of LiteLLM: https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create#create-context-cache-sample-python
Context caches are tied to a specific Google Cloud project and region, so I sadly can't share mine. :(
can i see what your code with creating the cache + using litellm looks like?
Can't we just add it as a cache option? like s3 cache
> Can't we just add it as a cache option? like s3 cache

No, it's unrelated imo.

> can i see what your code with creating the cache

Sure.
import datetime

from vertexai.generative_models import Part, Content
from vertexai.preview import caching

# Create the context cache once, outside of LiteLLM.
contents_here: list[Content] = [
    Content(role="user", parts=[Part.from_text("huge string of text here")])
]
cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=contents_here,
    expire_time=datetime.datetime(2024, 7, 21),
)

# `client` is an OpenAI-compatible async client pointed at the LiteLLM proxy.
response = await client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "quote all everything above this message",
        },
    ],
    temperature=temperature,
    extra_body={
        "api_base": "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/litellm-REMOVED/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
        "cached_content": cached_content.resource_name,
    },
)
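Side note: outside the proxy, I'd expect the equivalent direct litellm call to look roughly like the sketch below. This is untested; passing cached_content as a top-level kwarg is an assumption based on how it's forwarded from extra_body above.

import litellm

# Untested sketch: assumes cached_content is accepted as an optional param on the
# vertex_ai path, the same way it is when forwarded via extra_body through the proxy.
response = await litellm.acompletion(
    model="vertex_ai/gemini-1.5-pro-001",
    messages=[{"role": "user", "content": "quote all everything above this message"}],
    max_tokens=8192,
    vertex_location="us-central1",
    cached_content=cached_content.resource_name,
)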
oh - why use the proxy here?
since you're already using the vertex endpoint
fyi might be relevant here - if you want to run the cache create call through proxy https://docs.litellm.ai/docs/proxy/pass_through
It's still great to have it logged in Langfuse, and to have billing and such accounted for along with the prompt itself.
Title
Adds support for context caching.
https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-use#use-context-cache-sample-python
Type
🆕 New Feature
Changes
Just adds a new field. Requires using the v1beta1/ route atm.