Langfuse overwriting ingested usage for gpt-4o-mini models & incorrect inferred usage for low-detail image data #4231
Replies: 2 comments 6 replies
-
Langfuse is designed to prioritize user-provided usage data over inferred data, so if Langfuse is overwriting your ingested usage data for the gpt-4o-mini model, that is unexpected behavior. Regarding the incorrect usage data for low-detail image settings, there isn't specific information available about how Langfuse handles 'low' detail image data or any related configurations in the self-hosted version 2.81.0 OSS [1]. This might require further investigation or an update to Langfuse to handle such cases correctly. If the problem persists, it might be beneficial to look into the Langfuse codebase or reach out to the maintainers for more detailed troubleshooting.
-
How do you integrate with Langfuse — do you use the latest OpenAI integration? Please provide more details; happy to have a look. For context, Langfuse uses tiktoken to infer token counts when none are set. This currently does not work correctly for multi-modal traces, but when using OpenAI, token counts should be captured at the source via one of the Langfuse integrations.
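To illustrate why text-based inference inflates the counts for multi-modal traces: when an image is sent inline as a base64 data URL, a plain-text tokenizer counts the entire encoded string, whereas OpenAI bills a low-detail image at a small fixed token count regardless of payload size. A minimal sketch (the characters-per-token ratio is a rough assumption, not tiktoken's exact behavior):

```python
import base64
import math

def naive_inferred_tokens(data_url: str) -> int:
    """Approximate what a plain-text tokenizer would count for an inline image.

    Assumption: roughly one token per ~4 characters of base64 text; the
    exact ratio depends on the tokenizer, but the order of magnitude holds.
    """
    return math.ceil(len(data_url) / 4)

# A modest ~100 KB image, as it would appear inline in a chat message.
image_bytes = b"\x00" * 100_000
data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()

inferred = naive_inferred_tokens(data_url)

# OpenAI bills a low-detail image at a small fixed cost (85 tokens for
# gpt-4o; gpt-4o-mini uses a larger fixed count), independent of the
# size of the encoded payload.
LOW_DETAIL_TOKENS_GPT4O = 85

print(inferred, LOW_DETAIL_TOKENS_GPT4O)
```

This gap — tens of thousands of "text" tokens for an image the API bills at a flat rate — matches the kind of discrepancy reported below.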
-
I am using the Langfuse low-level Python SDK to track my gpt-4o-mini usage. I use a generation to track model I/O, TTFT, and usage information. I found two issues here:
1. Langfuse returns incorrect usage if I set "detail": "low" according to the OpenAI vision guide. The model returns `'usage': {'completion_tokens': 120, 'prompt_tokens': 5683, 'total_tokens': 5803}`, but I got 15,631 → 120 (∑ 15,751) from Langfuse.
2. Even if I send the usage to the Langfuse client, Langfuse still prefers its inferred usage, which is not intended according to the Langfuse documentation.
I tried updating the model (in the generation) to something else, and the usage data was tracked correctly. I am attaching a screenshot of my experiments below:
I am using a self-hosted Langfuse (v2.81.0 OSS) for these experiments. I hope it helps.
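Since the issue hinges on whether the explicitly ingested usage is shaped the way the SDK expects, here is a minimal sketch of forwarding OpenAI's reported usage to a generation. The input/output/total/unit field names are my assumption about the low-level SDK's usage shape; check the SDK docs for your version:

```python
def to_langfuse_usage(openai_usage: dict) -> dict:
    """Map an OpenAI usage dict to a Langfuse-style input/output/total shape.

    Hypothetical helper for illustration; only the OpenAI field names
    (prompt_tokens, completion_tokens, total_tokens) come from the thread.
    """
    return {
        "input": openai_usage["prompt_tokens"],
        "output": openai_usage["completion_tokens"],
        "total": openai_usage["total_tokens"],
        "unit": "TOKENS",
    }

# The exact usage object reported by the model in this thread.
usage = to_langfuse_usage(
    {"completion_tokens": 120, "prompt_tokens": 5683, "total_tokens": 5803}
)
# With the low-level SDK, this dict would then be passed when ending the
# generation, e.g. generation.end(output=..., usage=usage).
print(usage)
```

If Langfuse still shows the inferred counts after ingesting a usage object like this, that points at a server-side precedence bug rather than a client-side shape problem.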