
feat(vertex_httpx.py): Support cachedContent. #4492

Merged 1 commit into BerriAI:main on Jul 2, 2024

Conversation

Manouchehri (Collaborator)

Title

Adds support for context caching.

https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-use#use-context-cache-sample-python

Type

🆕 New Feature

Changes

Just adds a new field. Requires using the v1beta1/ route atm.
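
For illustration only (not part of the PR description), a minimal sketch of what a direct litellm.completion call could look like with this change, assuming cached_content and vertex_location are accepted as keyword arguments the same way they are passed via extra_body through the proxy; the project, region, and cache ID below are placeholders:

import litellm

# Hedged sketch: reference a pre-created Vertex AI context cache by resource name.
# The api_base must point at the v1beta1/ route; project and cache ID are placeholders.
response = litellm.completion(
    model="vertex_ai/gemini-1.5-pro-001",
    messages=[{"role": "user", "content": "quote all everything above this message"}],
    api_base="https://us-central1-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
    vertex_location="us-central1",
    cached_content="projects/PROJECT/locations/us-central1/cachedContents/CACHE_ID",
)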

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

(screenshot: new tests passing locally)


@@ -1050,6 +1053,8 @@ def completion(
         data["safetySettings"] = safety_settings
     if generation_config is not None:
         data["generationConfig"] = generation_config
+    if cached_content is not None:
+        data["cachedContent"] = cached_content
Contributor:

can you share an example of a call with this that works?

I thought cachedContent was a specific client that had to be pulled?

Collaborator (Author):

response = await client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=max_tokens,
    messages=messages,
    stream=True,
    temperature=temperature,
    extra_body={
        "vertex_location": "us-central1",
        "api_base": "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/REMOVED-REMOVED/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
        "cached_content": "projects/REMOVED/locations/us-central1/cachedContents/1367546174348722176",
    },
)
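
For reference (not shown in the PR), the snippet above assumes client is an async OpenAI client pointed at a LiteLLM proxy, roughly like the following sketch; the proxy URL, key, and sampling values are placeholders:

from openai import AsyncOpenAI

# Hedged sketch: an OpenAI-compatible client talking to a LiteLLM proxy.
# base_url and api_key are placeholders, not values from the PR.
client = AsyncOpenAI(base_url="http://localhost:4000", api_key="sk-REMOVED")

max_tokens = 8192
temperature = 0.0  # placeholder; the actual value isn't shown in the PR
messages = [{"role": "user", "content": "quote all everything above this message"}]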

You have to first create cached content outside of LiteLLM. https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create#create-context-cache-sample-python

Context caches are unique to Google Cloud projects and regions, so I sadly can't share mine. :(

Contributor:

Can I see what your code for creating the cache + using LiteLLM looks like?

Can't we just add it as a cache option? Like the s3 cache?

Collaborator (Author):

> Can't we just add it as a cache option? Like the s3 cache?

No, it's unrelated imo.

> Can I see what your code for creating the cache

Sure:

import datetime

from vertexai.generative_models import Part, Content
from vertexai.preview import caching

# Cache the large prompt once; later calls reference it by resource name.
contents_here: list[Content] = [
    Content(role="user", parts=[Part.from_text("huge string of text here")])
]
cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=contents_here,
    expire_time=datetime.datetime(2024, 7, 21),
)

response = await client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "quote all everything above this message",
        },
    ],
    temperature=temperature,
    extra_body={
        "api_base": "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/litellm-REMOVED/locations/us-central1/publishers/google/models/gemini-1.5-pro-001",
        "cached_content": cached_content.resource_name,
    },
)
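
(Note: the resource_name returned by CachedContent.create is the full "projects/.../locations/.../cachedContents/..." path, i.e. the same form of value passed as cached_content in the earlier streaming example.)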

Contributor:

Oh, why use the proxy here, since you're already using the Vertex endpoint?

Contributor:

FYI, this might be relevant here: if you want to run the cache-create call through the proxy, see https://docs.litellm.ai/docs/proxy/pass_through

Collaborator (Author):

It's still great to see it logged in Langfuse and to have billing and such accounted for with the prompt itself.

@krrishdholakia merged commit 612af8f into BerriAI:main on Jul 2, 2024
2 of 3 checks passed