Preventing One Noisy Pod from Overloading kubernetes_logs Source in Vector #22040
caesarxuchao asked this question in Q&A
Our Setup
We run pods from different services on the same Kubernetes node and run a Vector daemon pod on each node that collects logs with the kubernetes_logs source, applies some transformations, and ships them to the datadog_logs sink.
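For reference, a minimal sketch of this topology in YAML; the component names, the VRL snippet, and the API key variable are placeholders rather than our actual configuration:

```yaml
# Sketch of the per-node Vector agent config; names and values are illustrative.
sources:
  k8s_logs:
    type: kubernetes_logs          # tails container log files on the node

transforms:
  parse:
    type: remap
    inputs: [k8s_logs]
    source: |
      # stand-in for our real transformations
      .service = .kubernetes.pod_labels."app.kubernetes.io/name"

sinks:
  datadog:
    type: datadog_logs
    inputs: [parse]
    default_api_key: "${DATADOG_API_KEY}"
```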
The Problem
When one service was buggy and generated a large volume of logs, we observed the kubernetes_logs source consuming excessive memory, eventually causing the Vector pod to be OOM-killed.
We are looking for a way to achieve "fair sharing" or "resource isolation" so that a single misbehaving service does not impact the log collection of other colocated pods.
What We Have Tried
We have tried the throttle transform, which can enforce rate limits per pod or per namespace. However, since the throttle transform is applied downstream of the kubernetes_logs source, it does not prevent the source from reading the buggy pod's logs at high speed (and consuming a lot of memory). In fact, because the throttle transform discards events quickly, it removes backpressure on kubernetes_logs, causing the source to read logs even faster than when backpressure comes from the datadog_logs sink.
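The per-pod throttle we experimented with looked roughly like the following; the threshold, window, and key are illustrative:

```yaml
transforms:
  per_pod_throttle:
    type: throttle
    inputs: [k8s_logs]
    threshold: 1000                # max events per key per window (illustrative)
    window_secs: 1
    # Rate limit each namespace/pod combination independently.
    key_field: "{{ kubernetes.pod_namespace }}/{{ kubernetes.pod_name }}"
```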
If kubernetes_logs itself provided a configuration option to limit the rate of log lines read from each k8s pod or namespace, it would satisfy our use case. However, such a configuration option doesn’t seem to exist.
We have also tried the request.rate_limit_num setting of the datadog_logs sink. It successfully creates backpressure on the kubernetes_logs source and slows its processing of incoming logs, which prevents the excessive memory consumption. But this approach does not provide fair sharing: when the rate limit is reached, logs from well-behaved pods are blocked as well.
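Roughly what we configured, with illustrative values:

```yaml
sinks:
  datadog:
    type: datadog_logs
    inputs: [parse]
    default_api_key: "${DATADOG_API_KEY}"
    request:
      rate_limit_num: 100              # max requests per window (illustrative)
      rate_limit_duration_secs: 1      # window length in seconds
```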
Related issues
#17123 Limit memory usage for sources - Make vector 'resource aware': If this issue is resolved, it would help prevent our Vector daemon from OOMing. However, in addition to that, we also want to enforce "fair sharing", which remains unsolved by the solutions proposed in that issue.
Replies
One reply suggested:
I think if you set https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#oldest_first to …
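The reply above is truncated, so the suggested value is not visible. For reference, oldest_first controls whether the source prioritizes the earliest files it finds instead of balancing read capacity across all watched files. A minimal sketch of where the option goes, with a placeholder value:

```yaml
sources:
  k8s_logs:
    type: kubernetes_logs
    # Placeholder value -- the reply is truncated, so this is not necessarily
    # what was recommended; see the linked docs for the trade-off.
    oldest_first: true
```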