Preventing One Noisy Pod from Overloading kubernetes_logs Source in Vector #22040
caesarxuchao asked this question in Q&A
Our Setup
We run pods from different services on the same Kubernetes node and run a Vector daemon pod on each node that collects logs with the kubernetes_logs source, applies some transformations, and ships them to the datadog_logs sink.
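For reference, a minimal sketch of this topology in YAML; the component names, the VRL snippet, and the API key variable are placeholders rather than our actual configuration:

```yaml
# Sketch of the per-node Vector agent config; names and values are illustrative.
sources:
  k8s_logs:
    type: kubernetes_logs          # tails container log files on the node

transforms:
  parse:
    type: remap
    inputs: [k8s_logs]
    source: |
      # stand-in for our real transformations
      .service = .kubernetes.pod_labels."app.kubernetes.io/name"

sinks:
  datadog:
    type: datadog_logs
    inputs: [parse]
    default_api_key: "${DATADOG_API_KEY}"
```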
The Problem
When one service was buggy and generated a large volume of logs, we observed the kubernetes_logs source consuming excessive memory, eventually causing the Vector pod to be OOM-killed.
We are looking for a way to achieve "fair sharing" or "resource isolation" so that a single misbehaving service does not impact the log collection of other colocated pods.
What We Have Tried
We have tried the throttle transform, which can enforce rate limits per pod or per namespace. However, since the throttle transform is applied downstream of the kubernetes_logs source, it does not prevent the source from reading the buggy pod's logs at high speed (and consuming a lot of memory). In fact, because the throttle transform discards events quickly, it removes backpressure on kubernetes_logs, causing the source to read logs even faster than when backpressure comes from the datadog_logs sink.
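The per-pod throttle we experimented with looked roughly like the following; the threshold, window, and key are illustrative:

```yaml
transforms:
  per_pod_throttle:
    type: throttle
    inputs: [k8s_logs]
    threshold: 1000                # max events per key per window (illustrative)
    window_secs: 1
    # Rate limit each namespace/pod combination independently.
    key_field: "{{ kubernetes.pod_namespace }}/{{ kubernetes.pod_name }}"
```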
If kubernetes_logs itself provided a configuration option to limit the rate of log lines read from each k8s pod or namespace, it would satisfy our use case. However, such a configuration option doesn’t seem to exist.
We have also tried the request.rate_limit_num setting of the datadog_logs sink. It successfully creates backpressure on the kubernetes_logs source and slows its processing of incoming logs, which prevents the excessive memory consumption. But this approach does not provide fair sharing: when the rate limit is reached, logs from well-behaved pods are blocked as well.
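Roughly what we configured, with illustrative values:

```yaml
sinks:
  datadog:
    type: datadog_logs
    inputs: [parse]
    default_api_key: "${DATADOG_API_KEY}"
    request:
      rate_limit_num: 100              # max requests per window (illustrative)
      rate_limit_duration_secs: 1      # window length in seconds
```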
Related issues
#17123 Limit memory usage for sources - Make vector 'resource aware': If this issue is resolved, it would help prevent our Vector daemon from OOMing. However, in addition to that, we also want to enforce "fair sharing", which remains unsolved by the solutions proposed in that issue.
Replies
One reply suggested:
I think if you set https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#oldest_first to …
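The reply above is truncated, so the suggested value is not visible. For reference, oldest_first controls whether the source prioritizes the earliest files it finds instead of balancing read capacity across all watched files. A minimal sketch of where the option goes, with a placeholder value:

```yaml
sources:
  k8s_logs:
    type: kubernetes_logs
    # Placeholder value -- the reply is truncated, so this is not necessarily
    # what was recommended; see the linked docs for the trade-off.
    oldest_first: true
```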