unbound_response_time_seconds missing cached responses #52

Open
codl opened this issue Feb 23, 2023 · 2 comments

codl commented Feb 23, 2023

The help text for the unbound_response_time_seconds histogram says: "Query response time in seconds"

I thought this meant it would measure the time unbound takes to respond to every client query. However, it does not seem to include queries served from cache.

The munin plugin plots total cache hits along with the histogram, putting them under the lowest histogram bucket

[Image: munin chart]

I'm not sure it's possible in Prometheus to compute histogram quantiles over a histogram plus a separate series treated as an extra bucket. Perhaps unbound_response_time_seconds should include cache hits in the lowest bucket? At the very least, this behavior should be documented.
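To illustrate what "cache hits in the lowest bucket" would do to a quantile estimate, here is a minimal Python sketch. It assumes cumulative `le`-style buckets and uses linear interpolation within a bucket, in the spirit of Prometheus's `histogram_quantile` but not a copy of its implementation. The function names and all bucket counts are hypothetical and chosen only for illustration.

```python
import math


def with_cache_hits(cumulative_buckets, cache_hits):
    """Fold cache hits into the lowest bucket of a cumulative histogram.

    cumulative_buckets is a list of (upper_bound_seconds, cumulative_count)
    pairs sorted by upper bound, ending with math.inf. Because the counts are
    cumulative, adding to the lowest bucket raises every bucket by the same
    amount.
    """
    return [(le, count + cache_hits) for le, count in cumulative_buckets]


def quantile(q, cumulative_buckets):
    """Estimate the q-quantile by interpolating linearly inside one bucket."""
    total = cumulative_buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in cumulative_buckets:
        if count >= rank:
            if math.isinf(le) or count == prev_count:
                # No finite upper bound or empty bucket: fall back to the
                # lower edge instead of interpolating.
                return prev_le
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le


# Hypothetical counts: 100 cache misses spread over the buckets.
misses_only = [(0.001, 0), (0.01, 10), (0.1, 90), (1.0, 100), (math.inf, 100)]

median_misses = quantile(0.5, misses_only)
median_all = quantile(0.5, with_cache_hits(misses_only, cache_hits=900))
```

With these made-up numbers, the median over misses alone falls in the 10 ms to 100 ms bucket, while the median over hits plus misses falls inside the lowest bucket, which is the difference between "recursion time" and "what clients see".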

Collaborator

jsha commented Feb 23, 2023

An interesting question! Cache hits and cache misses will have a completely different distribution, so it's probably hard to represent them nicely in a single set of histogram buckets. We could add a label cache="hit" vs cache="miss" but the buckets would still be suboptimal for one or the other situation.

I can also see, though, why you would be interested in the question of "what is the performance my end-users see, covering both hits and misses."

> it does not seem to include queries served from cache

Can I ask what you're basing this on? I don't know one way or the other what the answer is.

Author

codl commented Feb 24, 2023

> I can also see, though, why you would be interested in the question of "what is the performance my end-users see, covering both hits and misses."

That's exactly it 🙂

> Can I ask what you're basing this on?

It was a guess based on some surprising results I was seeing on my dashboard, reinforced by checking out the munin setup, and then experimentation confirmed my guess.

I started a new unbound server and repeated the same query a few times, checking unbound-control stats_noreset after each query, and found that the first answer was counted in one of the buckets and subsequent answers were not. I also found through experimentation that background "prefetch" queries don't seem to be counted in the histogram either. I thought maybe the histogram measured outgoing recursion time, regardless of whether it is user-facing or not.
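The comparison described above can be sketched as a small Python script that diffs two snapshots of `unbound-control stats_noreset` output. This is a rough sketch, not a tested tool: it assumes the `key=value` line format with `total.num.queries` and `histogram.<sec>.<usec>.to.<sec>.<usec>` keys, and the two snapshots below are made-up illustrative values, not output captured for this issue. The helper names are hypothetical.

```python
def parse_stats(text):
    """Parse key=value lines into a dict of floats, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            stats[key] = float(value)
    return stats


def histogram_total(stats):
    """Sum the counts of all histogram.* buckets in one snapshot."""
    return sum(v for k, v in stats.items() if k.startswith("histogram."))


def uncounted_queries(before, after):
    """Return how many new client queries did not land in any bucket."""
    new_queries = after["total.num.queries"] - before["total.num.queries"]
    new_in_histogram = histogram_total(after) - histogram_total(before)
    return new_queries - new_in_histogram


# Illustrative snapshot after the first (uncached) query.
BEFORE = """\
total.num.queries=1
total.num.cachehits=0
histogram.000000.016384.to.000000.032768=1
"""

# Illustrative snapshot after repeating the same query twice more.
AFTER = """\
total.num.queries=3
total.num.cachehits=2
histogram.000000.016384.to.000000.032768=1
"""

missing = uncounted_queries(parse_stats(BEFORE), parse_stats(AFTER))
```

A nonzero `missing` value means some client queries were answered without being added to the histogram, which matches the behavior observed for repeated (cached) queries.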

Caveat emptor: I didn't check local authority zones, forward zones, etc., so I can't say whether those are counted.
