Describe the bug
Sometimes (I have not found a clear interval; it can happen in the middle of the night even with a low volume of traffic) we get paged that loki_api_v1_query_range is experiencing an abnormal p99 latency of several seconds:
exmple/loki-read loki_api_v1_query_range is experiencing 20.07s 95th percentile latency.
From the reader's point of view we see a lot of errors related to "reached tail max duration limit" and connection timeouts.
From the backend's point of view there are no logs, and only one backend seems to be doing all the work; it is not clear what it is doing.
Restarting the backend (kubectl rollout restart sts/loki-backend) has a clear effect on this issue: memory usage drops sharply and CPU usage returns to the levels we see when things are working normally, and the alert clears.
Zooming out over a seven-day period we can see a repeating trend in memory usage for the Loki backend, which leads us to suspect a memory leak of some kind.
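The restart workaround described above can be sketched as a short shell sequence; the loki namespace and the 5m timeout are assumptions, adjust them to your deployment:

```shell
# Restart the backend StatefulSet (namespace "loki" is an assumption).
kubectl -n loki rollout restart sts/loki-backend

# Wait for the rollout to finish before expecting the latency alert to clear.
kubectl -n loki rollout status sts/loki-backend --timeout=5m
```

After the rollout completes, backend memory and CPU drop back to normal levels and the p99 latency alert resolves on its own.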
To Reproduce
Steps to reproduce the behavior:
Install Loki v3.1.0 with the following helm values:
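The install step above might look roughly like the following sketch; the release name, namespace, and values file name are assumptions, and values.yaml stands in for the helm values referenced above (omitted here):

```shell
# Add the Grafana chart repository that publishes the Loki chart.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install with the values referenced above; pin the chart version that
# ships Loki v3.1.0 for your setup.
helm install loki grafana/loki -n loki --create-namespace -f values.yaml
```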
Expected behavior
Query performance should be tied to usage, not to how long Loki has been running.
Environment:
Loki config