Is high error rate during rollouts of thanos receive expected? #4277
jmichalek132
asked this question in
Questions & Answers
Replication errors during updates are expected; however, they shouldn't surface in the remote-write responses if you have enough healthy receive nodes. With your setup (6 nodes, replication factor 2), you should be able to tolerate 2 of 6 nodes being down. Do you have Pod Disruption Budgets set? https://github.com/thanos-io/kube-thanos/blob/f53ad9856c6f765989ea76ba8eff8dd1e77186b7/jsonnet/kube-thanos/kube-thanos-receive.libsonnet#L224
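For reference, a Pod Disruption Budget for the receive pods could look roughly like the sketch below. The resource name and the label selector are assumptions for illustration, not taken from this thread or from the linked kube-thanos file; the selector has to match whatever labels the receive StatefulSet actually puts on its pods.

```yaml
# Minimal sketch of a PodDisruptionBudget for Thanos receive pods.
# The name and labels are hypothetical; adjust them to the real StatefulSet.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: thanos-receive-staging          # assumed name
spec:
  maxUnavailable: 1                     # at most one receive pod may be voluntarily evicted at a time
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-receive   # assumed label; must match the pod labels
```

Note that, per the Kubernetes documentation, a PDB limits voluntary evictions such as node drains; workload controllers like StatefulSet are not limited by PDBs when doing a rolling update, so this may help with node maintenance more than with `kubectl rollout restart` itself.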
Thanks for this! Some ideas during our Contributor Hours:
Hi, I wanted to ask whether a high error rate during a rollout of thanos receive is expected.
When triggering a rollout of thanos receive, e.g. by running
kubectl rollout restart statefulset thanos-receive-staging
we experience a high error rate on all layers (HTTP error rate, replication error rate, and forward request error rate).
Screenshot of metrics during a rollout:
Errors in log of thanos-receive-default:
Errors in log of thanos-receive-staging:
Our deployment.
Configuration of thanos receive default:
The one for thanos-receive is almost the same, with the exception of necessary modifications such as the name of the StatefulSet.
Hashring json config: