fix(llmobs): encode llm objects in utf-8 before sending #11961
Benchmarks
Benchmark execution time: 2025-01-17 05:07:22
Comparing candidate commit 88d3e48 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.
---
fixes:
  - |
    LLM Observability: This fix resolves an issue where annotating a span with non utf-8 input/output values resulted in encoding errors.
- LLM Observability: This fix resolves an issue where annotating a span with non utf-8 input/output values resulted in encoding errors.
+ LLM Observability: This fix resolves an issue where annotating a span with non latin-1 (but valid utf-8) input/output values resulted in encoding errors.
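The distinction the suggested wording draws can be illustrated with a minimal sketch. The sample string is made up for illustration and is not taken from the PR: it is a valid Python `str` that encodes cleanly as UTF-8 but contains characters Latin-1 cannot represent.

```python
# Hypothetical sample: "é" exists in Latin-1, the CJK characters do not.
text = "café 你好"

# Any valid str can be encoded as UTF-8 and round-trips unchanged.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Encoding the same str as Latin-1 fails on the CJK characters.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A string whose characters all fall within Latin-1 encodes without error.
latin1_only = "café".encode("latin-1")
```

So "non utf-8" is not the failing condition here; the input is valid UTF-8 and only fails when something downstream encodes it as Latin-1.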
This PR resolves an issue in the Python SDK where annotating spans with non-ASCII (UTF-8) characters caused span payloads to be dropped due to encoding errors.

In #11330 we added the `ensure_ascii=False` option to our `safe_json()` helper's call to `json.dumps(...)` to keep non-ASCII characters from being encoded multiple times into nonsense (we were calling `safe_json()` several times, nested, while building the span event from the span tags). However, this caused a new problem: characters outside Latin-1 broke the encoding at payload submission time. (Latin-1 covers only a subset of the characters UTF-8 can encode, and it is apparently the encoding the HTTP library relies on, which we in turn rely on to submit payloads.)

To fix this, we remove the `ensure_ascii=False` option at the final write time.

Also note that after #11543 we mostly centralized the places where a span event is encoded: at write time, and when encoding the span's input/output value fields (which can be in JSON dictionary format). Since we need to provide valid JSON for the I/O fields (which leads to a prettier UI display), we still need to call `json.dumps(ensure_ascii=False)` there to avoid the problem fixed by #11330, i.e. to keep the non-ASCII characters unescaped until the very end (write time).

This PR also adds minor test fixtures that mock the LLMObs backend intake so we can make assertions on the payloads we should be submitting to LLMObs. Previous tests all relied on the span events prior to encoding/submission and could not cover this scenario.
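The interaction described above can be reproduced with the standard library alone. The following is a minimal sketch, not the SDK's actual code: the `value` dictionary and the `"meta"` key are hypothetical, and plain `json.dumps` stands in for the `safe_json()` helper.

```python
import json

# Hypothetical I/O value containing characters outside Latin-1.
value = {"input": "你好"}

# Nested dumps with the default ensure_ascii=True: the inner call produces
# \uXXXX escapes, and the outer call then escapes those backslashes again.
inner_escaped = json.dumps(value)
outer_escaped = json.dumps({"meta": inner_escaped})
# After one decode, the field still holds literal "\u4f60" text, not "你".
decoded_escaped = json.loads(outer_escaped)["meta"]

# Inner dumps with ensure_ascii=False keeps the characters intact ...
inner_readable = json.dumps(value, ensure_ascii=False)

# ... but writing the final payload with ensure_ascii=False as well leaves
# non-Latin-1 characters in it, so a Latin-1 encode at send time fails.
raw_payload = json.dumps({"meta": inner_readable}, ensure_ascii=False)
try:
    raw_payload.encode("latin-1")
    raw_latin1_ok = True
except UnicodeEncodeError:
    raw_latin1_ok = False

# Dropping ensure_ascii=False only at the final write yields a pure-ASCII
# payload that any of these encodings can carry, and one decode restores
# the readable inner JSON string.
wire_payload = json.dumps({"meta": inner_readable})
wire_bytes = wire_payload.encode("latin-1")
decoded_readable = json.loads(wire_bytes.decode("latin-1"))["meta"]
```

Explicitly encoding `raw_payload` with `.encode("utf-8")` before handing it to the HTTP layer would also avoid the error; which of the two approaches the final diff uses should be checked against the PR itself.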