Need the option to mask the input and output of the LLM API in Datadog LLM observability #11179
Comments
Hi @Yun-Kim
Hi,
Hi @Gekko0114! Are you using VertexAI to use Gemini? If so, we have an integration coming in the next few weeks which will include controls to omit the input and output. Also, we're looking for people to try the integration as we develop it; would you be interested in partnering with us on this?
Yes, I am using VertexAI. Sounds great!
No problem! What should I do?
Awesome, @Gekko0114, are you in our public Slack? If not, could you join? We can follow up with you once we have a build ready for you to try!
Hi @Kyle-Verhoog
Not sure if this is off topic. I tried setting the env, but I want to collect the metrics while hiding the input completely, or have an option to mask sensitive information like phone numbers and emails. I wonder if we could get the flexibility to control that.
@Gekko0114 you should be able to find me under "Kyle Verhoog", send me a message!
@yj-ang this sounds like a bug 🤔. I will do some investigation and get back to you.
This issue has been automatically closed after a period of inactivity.
Kinda shocked this issue has been fully ignored; it makes it feel like no serious company is using LLMObs by Datadog. We're on OpenAI's zero-data-retention enterprise tier and need to ensure data is not logged for several of our customers. For now we'll try applying this monkeypatching workaround, but will likely switch to LangFuse or Logfire for a more robust solution.

```python
from typing import Any

from ddtrace import Span
from ddtrace.llmobs import LLMObs

original_llmobs_span_event = LLMObs._instance._llmobs_span_event


def obfuscating_llmobs_span_event(span: Span) -> dict[str, Any]:
    # TODO: check span tags to see if customer needs zero data retention; return intact event if not
    event = original_llmobs_span_event(span)
    if event["meta"].get("input", {}).get("messages"):
        event["meta"]["input"]["messages"] = [
            {
                "role": message["role"],
                "content": "<OBFUSCATED>" if message["content"] is not None else None,
            }
            for message in event["meta"]["input"]["messages"]
        ]
    if event["meta"].get("output", {}).get("messages"):
        event["meta"]["output"]["messages"] = [
            {
                "role": message["role"],
                "content": "<OBFUSCATED>" if message["content"] is not None else None,
            }
            for message in event["meta"]["output"]["messages"]
        ]
    return event


LLMObs._instance._llmobs_span_event = obfuscating_llmobs_span_event
```
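The transform at the heart of this workaround can be exercised without ddtrace at all. Here is a minimal sketch; the event structure is assumed from the snippet above, not taken from Datadog documentation:

```python
def obfuscate_messages(messages):
    """Replace message content with a placeholder, preserving roles and None values."""
    return [
        {
            "role": m["role"],
            "content": "<OBFUSCATED>" if m["content"] is not None else None,
        }
        for m in messages
    ]


# Sample span event shaped like the one the workaround mutates.
event = {
    "meta": {
        "input": {"messages": [{"role": "user", "content": "my card is 4111 1111 1111 1111"}]},
        "output": {"messages": [{"role": "assistant", "content": None}]},
    }
}

# Scrub both directions, skipping missing or empty message lists.
for direction in ("input", "output"):
    msgs = event["meta"].get(direction, {}).get("messages")
    if msgs:
        event["meta"][direction]["messages"] = obfuscate_messages(msgs)
```

Note that roles and message counts still leak with this approach; if those are sensitive too, the event would need further scrubbing.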
We landed on this solution:
Hi @underyx! Thanks for following up and posting a workaround. I can assure you the issue hasn't been ignored (despite the GitHub automation closing the issue), and we have it on our roadmap to be implemented in the next couple of weeks.
Add capability to add a span processor. The processor can be used to mutate or redact sensitive data contained in inputs and outputs from LLM calls.

```python
from ddtrace.llmobs import LLMObs, LLMObsSpan


def my_processor(span: LLMObsSpan):
    for message in span.output:
        message["content"] = ""


LLMObs.enable(span_processor=my_processor)
LLMObs.register_processor(my_processor)
```

Public docs: DataDog/documentation#29365
Shared tests: TODO
Closes: #11179
(cherry picked from commit cc8e98c)
Backport cc8e98c from #13426 to 3.8. Public docs: DataDog/documentation#29365. Shared tests: TODO. Closes: #11179. Co-authored-by: kyle <[email protected]>
The input and output of the LLM API contain sensitive information, so I don't want to send them to Datadog.
I would like to have an option to send data to Datadog with the input and output masked.
Previously I raised the same question in issue #10517.
In that issue, I was told that the APM integration dashboards should be used for those four providers (OpenAI/Bedrock/LangChain/Anthropic).
However, I would like to monitor Gemini.
According to #10971, Gemini will not be supported in the APM integration because LLM Obs will support Gemini.
Therefore I need an option to mask the input and output of the LLM API in Datadog LLM observability.
If it is okay, I would like to create a PR for this.