[Bug] TraceQL Metrics seems to be incorrect when use status=error #4608
Comments
I'm sorry but I'm not really able to use these screenshots to debug this issue. I'm not saying there's not a bug, but nothing in your issue seems inconsistent to me.
count_over_time is weird b/c it's bound to a hidden step. It calculates the number of spans that match your query over the past step duration. I would recommend using
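To illustrate the step dependence described above, here is a minimal sketch of step-based bucketing. It is a simplified model, not Tempo's actual implementation: the function names, the integer-second timestamps, and the fixed buckets aligned to the start of the range are all assumptions made for illustration.

```python
from collections import Counter


def count_over_time(span_timestamps, start, end, step):
    """Count matching spans in each step-sized bucket of [start, end).

    Hypothetical model: timestamps, start, end and step are integer seconds.
    """
    buckets = Counter()
    for ts in span_timestamps:
        if start <= ts < end:
            buckets[(ts - start) // step] += 1
    n_buckets = -(-(end - start) // step)  # ceiling division
    return [buckets.get(i, 0) for i in range(n_buckets)]


def rate(span_timestamps, start, end, step):
    """Per-second rate in each bucket: the bucket count divided by the step."""
    return [c / step for c in count_over_time(span_timestamps, start, end, step)]


# Assumed workload: one matching span every 10 seconds for one hour.
timestamps = list(range(0, 3600, 10))

counts_60 = count_over_time(timestamps, 0, 3600, 60)    # 60 buckets of 6
counts_300 = count_over_time(timestamps, 0, 3600, 300)  # 12 buckets of 30
rates_60 = rate(timestamps, 0, 3600, 60)
rates_300 = rate(timestamps, 0, 3600, 300)
```

In this model the per-bucket count changes with the step (6 per bucket at 60s, 30 per bucket at 300s), while the rate stays at 0.1 spans per second for both steps. The total over the whole range is 360 either way, so a count read off a single graph point only equals the number of retrieved spans when the step covers the entire range.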
@joe-elliott Thanks for the reply! Sorry, let me rephrase my question. Between 10:00 and 11:00, I used the same query: However, when I tried using: This issue does not occur when I only use span.{attribute} to search. And the reason I use count_over_time instead of rate is that I want to calculate the exact number of errors each service encountered per minute. In fact, the graphs generated by rate and count_over_time are the same, except for the units.
Service name discrepancy
The service name you have blocked out is the root service name of the trace and not the service name on the span identified by your filters. If you expand those traces and look at the service names on the spans, are you still seeing 30?
Rate discrepancy
The final image shows a rate of .33 spans / second over a 1 hour period.
@joe-elliott
You are correct. I was moving fast and misread .13 as .33. Thank you for pointing that out. So I have tried to recreate this w/o luck. I'm using the following k6 script:
This is loaded and pointed at a Tempo instance something like this:
If you review the script, it creates a single-span trace and then sleeps 100ms, so it should generate roughly 10 spans per second. 1 out of 3 will have status = error. Plain rate is showing about that:
Rate of
I do not know what is happening in your case.
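The expected numbers in the comment above can be checked with a little arithmetic. This is a rough sketch only: it assumes the 100ms sleep dominates each loop iteration (request latency is ignored, so the real rate would be somewhat lower), and the variable and function names are made up for illustration.

```python
# Values taken from the comment above: one single-span trace, then a 100ms sleep,
# with 1 out of 3 spans carrying status = error.
sleep_seconds = 0.1
error_fraction = 1 / 3

# Upper bound on throughput if the sleep is the only delay per iteration.
spans_per_second = 1 / sleep_seconds

# Expected rate of spans with status = error.
error_spans_per_second = spans_per_second * error_fraction


def spans_in_window(rate_per_second, window_seconds):
    """Convert a sustained per-second rate into a span count over a window."""
    return rate_per_second * window_seconds


# The rate read from the reporter's graph, held over the 1 hour period.
spans_in_one_hour = spans_in_window(0.13, 3600)

print(f"{spans_per_second:.2f} spans/s, {error_spans_per_second:.2f} error spans/s")
print(f"{spans_in_one_hour:.0f} spans in one hour at 0.13 spans/s")
```

Under these assumptions the test load should show roughly 10 spans per second overall and about 3.33 per second with status = error. The same conversion applied to a rate of .13 spans / second held for one hour gives about 468 spans.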
Thanks, I will try it later! |
Describe the bug
I use TraceQL metrics to get the count of failed QB query spans.
I found that when using span attributes for counting, the numbers match correctly.
However, when using status = error, the numbers are incorrect.
I used the following query to retrieve a lot of spans (around 500), but in TraceQL metrics the count is far lower than the number of spans retrieved.
And there are indeed a lot of spans with status = error in Tempo.
Is this a bug, or am I using metrics incorrectly?
Expected behavior
The metrics count should match the span count.
Environment: