OOM on physical servers #2495
Hey @epcim, thank you for reporting this! Of course, I don't have the right answer here yet, but I have some questions that could help us dig into it:
Ah, that I forgot to mention. @Andreagit97 We experience the issue with our current image and haven't tried this image change/upstream:
The interesting part is where the memory hiked significantly. In more detail, it was quite OK for about 9 days and then started to climb. Cross-verified with the prod environment: the version landed on 6.12.2022, but the issue first became visible on 24.1.2023 (appearing on 1 node out of dozens), with the same pattern. For the comparison period 12/2022 - 04/2023 across all prod nodes it is hard to read/identify the issue from metrics alone, since pods are killed early, etc. Later in January we increased the memory limit for the pod and the memory hike became recognisable in the metrics.
Thank you, this is really interesting info that could help us in the troubleshooting!
Surprisingly, there were not many changes in the rules since October 2022. The new version is on the right side, while the left is my November version (which is basically 1:1 with October). The rules that changed are, as far as I know (ignoring tags, lists, proc names in ... (new binaries), etc., and everything that was removed), not in my current code base that has the issue.
Hey, thank you for the update! The bad news is that since the underlying Falco code has changed, the OOM issue could also be caused by an already existing rule :/ so unfortunately we cannot restrict the investigation scope.
/milestone 0.35.0 |
@jasondellaluce will you try to simulate the issue on your side and collect metrics before 0.35? |
@epcim This issue is hard to reproduce on our side, and I think deeper testing on this specific path will not happen before 0.35. However, we're testing the latest dev Falco also with tools like Valgrind and in the most common deployment scenarios, so my suggestion will be to try out 0.35 once it's out and see if the issue still occurs. It's hard to tell if the issue is caused by your rule setup, your workload, or by a mix of the two. The most likely thing is that this could be happening within libsinsp, and that very specific workloads force the library to grow its internal state unbounded. This will definitely require further investigation. |
Same ever-increasing memory consumption with 0.35 (upgraded from 0.33), but our Falco setup is a bit different from the one described in this issue. It is deployed as a systemd unit on a VM (our own hosts, so no cloud stuff), with syscalls disabled. We can't make a memory dump, because Falco claims 132G virt (the VM has 6GB RAM and a 30GB disk... no idea why it needs this much virt) and a memory dump apparently tries to write 132G to disk, which obviously fails on a 30GB disk.
@sboschman do you also reproduce this kind of memory usage when running Falco for syscalls collection, without plugins? |
@jasondellaluce we do not run falco with syscalls collection enabled at all, so not a use-case I can comment on. |
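For context, a plugins-only Falco deployment like the one described above (k8saudit filtering, no syscall collection) is configured roughly as in the sketch below; the library paths and open parameters are placeholders, not the commenter's actual setup:

```yaml
# Sketch of a plugins-only falco.yaml: no kernel driver and no sinsp syscall
# state, Falco only filters events delivered by the plugins.
plugins:
  - name: k8saudit
    library_path: libk8saudit.so
    init_config: ""
    open_params: "http://:9765/k8s-audit"   # placeholder listen endpoint
  - name: json
    library_path: libjson.so
load_plugins: [k8saudit, json]
```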
/milestone 0.36.0 |
@epcim would you be in a position to re-run some test with eBPF and libbpf stats kernel setting enabled with Falco's new experimental native metrics? Asking because I would be curious to see if spikes in memory correlate with surges in event rates (both at the tracepoints aka the libbpf stats and also in userspace which obviously depends on the syscalls you enable). Please feel free to anonymize logs and/or share an anonymized version of it on slack in a DM. What we unfortunately don't yet have in the metrics feature are the detailed syscalls counters and some other internal state related stats we aim to add for the next Falco release. |
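For reference, a minimal sketch of what enabling these native metrics (plus libbpf stats) might look like in falco.yaml; the key names below are taken from later Falco releases and may differ for the experimental version discussed here:

```yaml
# Sketch: periodic "Falco internal: metrics snapshot" events with state and
# kernel-side counters. libbpf stats additionally require the kernel setting
# `sysctl -w kernel.bpf_stats_enabled=1` mentioned above.
metrics:
  enabled: true
  interval: 15m
  output_rule: true                    # emit snapshots through the normal outputs
  state_counters_enabled: true         # thread/fd table growth, lookups, ...
  kernel_event_counters_enabled: true  # drops and event rates at the driver
  libbpf_stats_enabled: true           # per-tracepoint run counts/time (eBPF only)
  convert_memory_to_mb: true
  include_empty_values: false
```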
In addition, @epcim, could we get more information about the cgroups version on these machines? Memory accounting in the kernel can in many cases just be wrong; for example, see kubernetes-sigs/kind#421, and I have also heard rumors about cgroups leaking memory. cgroups v2 has superior memory management, hence I would be curious to know which cgroups version you are dealing with. Plus, you also have that on-host deployment; mind getting me up to speed on the exact memory metrics you base the OOM on for those cases (aka the non-container_memory_working_set_bytes cases)? Apologies if you posted that already above and I just couldn't read everything. Thanks in advance!
Just out of curiosity, is this particular host running cgroups v1 or cgroups v2? Thank you! We will investigate the cgroups-related memory metrics the OOM killer uses in more depth, and also @sboschman's use case where the binary is only used for k8saudit log filtering, meaning in that scenario most of the libs code is not used (no kernel driver, no sinsp state, no container engine, basically no allocations, etc.). Edit: And maybe also show the RSS memory metric over time.
@incertum the host is running cgroups v2:
I am experimenting with the effect of rules configuration on this. It seems that disabling all rules doesn't reproduce the issue, so I'm trying to understand if I can isolate it to specific rule/s. |
Hi @emilgelman, thanks, it is great news that you have cgroups v2. By the way, we now also have the ... However, I think here we need to investigate in different places more drastically (meaning going back to the drawing board), as it has also been reported for plugins-only setups. In that case we merely do event filtering in libsinsp, so most of the libsinsp complexity does not apply, which kind of narrows down the search space. I am going to prioritize 👀 into it; it will likely take some time. In addition, in case you are curious to learn more about the underlying libs and kernel drivers with respect to memory:
Simulated a noisy Falco config on my developer Linux box. Enabling most supported syscalls was sufficient to simulate memory issues:
Using valgrind massif heap profiler:
Reading the tbb API docs https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Concurrent_Queue_Classes.html, we use the unbounded variant. Here is a staging branch to correct this: https://github.com/incertum/falco/tree/queue-capacity-outputs, what do you all think? However, the root cause is rather that the entire event flow is too slow: basically we don't get to pop from the queue in time in these extreme cases, because we are seeing timeouts and have also noticed heavy kernel-side drops. Basically the pipe is just not holding up when trying to monitor so many syscalls, even on a more or less idle laptop. I would suggest we re-audit the entire Falco processing and outputs engine and look for improvement areas, because when I did the same profiling with the libs ...
The rationale for an unbounded queue was that the output consumer must be responsive enough to accept all the alerts produced by Falco. When the output consumer is too slow, a dedicated watchdog will emit an error message in the log. Otherwise, if the memory is growing but the queue is not, there might just be an implementation bug. Have you checked that? 🤔
Thanks @leogr, all of the above is true. And for everyone reading this: unbounded queues can be a good choice, and more efficient anyway, if you have other controls in place beforehand. The queue filling up is one very likely cause of memory growth in real-life production. At the same time, there can always be more bugs in other places. Using the heap profiler on my laptop added enough overhead/slowness to show these symptoms with that one noisy Falco rule. I have yet to get deeper into profiling. My current recommendation: expose a queue capacity to the end user and add a default value. Mark it "Experimental" so we can remove it again should we find much better ways of handling heavy event pipes in future Falco releases. We still need to discuss the recovery strategy:
Sadly, none of this is a solution for getting Falco to work on such heavier production servers or workload types. I am opening a new ticket to discuss a re-audit of the Falco-specific outputs handling: #2691. The pragmatic expected outcome is that perhaps we can improve things; however, I doubt all problems will magically disappear, because we can't scale horizontally (throw more compute at the problem, as is typically done in, for example, big data stream processing). In fact, folks want a security tool that consumes almost no CPU and memory, but never drops events. Considering Falco's primary use of alerting on abnormal behavior, I expect that smarter, advanced anomaly detection approaches could be more promising for avoiding bursty outputs in the first place, but maybe I am biased 🙃. Meanwhile, adopters can re-audit the syscalls they monitor (using the new ...
I opened the PR to expose the configs to set a custom capacity. |
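As a rough illustration of what such a setting looks like in falco.yaml (a sketch based on later Falco releases, not necessarily the exact contents of that PR; key names and defaults may differ):

```yaml
# Sketch only: bound the internal outputs queue instead of leaving it unbounded.
# A capacity of 0 is assumed to keep the previous unbounded behaviour; when the
# bound is hit, drops show up in the falco.outputs_queue_num_drops metric.
outputs_queue:
  capacity: 100000   # maximum number of queued alerts
```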
I was busy the last few weeks, but I plan to reconfigure and test all the findings in this thread over the next weeks.
Perfect. Yes, I would suggest first trying the option of setting a queue capacity; after test deployments we shall see whether there are still other issues in terms of memory actually leaking or increasing radically over time beyond expected limits. At least the simulation above shows that this is something that can currently happen, whereas with the capacity set I didn't observe memory leaking in the simulation. At the same time, a reminder that this does not fix the root cause, see #2495 (comment). In addition, we may need to experiment with the best default values across the various settings that can control the outputs ...
@epcim Can you try your initial config (the one pasted in the opening post), but disabling |
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now, please do so with /close. Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale |
/milestone 0.40.0
We will release Falco 0.39.0 in a couple of weeks; we would really appreciate feedback on this issue with the latest Falco if possible 🙏 Thanks, everyone!
This is still reproducible on 0.38.2 on physical servers. The memory increases until OOM; the higher the events rate, the faster the memory increases. However, it's been hard for our environment to play with the output queue option, and we don't want to drop events. Update: the cgroup is under ...
Thank you for reporting this. We really appreciate it 🙏 If you have any chance to try Falco 0.39 (released yesterday), please let us know.
@leogr sure, thanks for helping out. May I know what changes were introduced in this new version (0.39.0) to mitigate the memory increase issue?
There were no specific changes for mem, because we couldn't reproduce the issue. Still, since the dev cycle lasted 4 months and we merged more than 100 PRs in libs, there may be a chance of improvements anyway (due to indirect fixes). Also a further question:
Did you refer to alerts (i.e. events that match a rule and output an alert) or just syscall events? I'm asking because I'm a bit skeptical that the root problem resides in the output queue.
Thanks. No, not alerts; this events rate is what I get from the metrics snapshot => And we are using http_output to push alerts/logs to Sidekick, but since we only push pretty high-level logs/alerts to Sidekick, Sidekick is basically sending nothing. There is some memory increase trend after we updated the version to 0.38.0, then it begins to increase; same for 0.38.2.
👍
I agree the http_output is not the issue.
Good to know, thanks. I was expecting that, since no significant changes were introduced in 0.38.2. Further questions:
Thanks in advance |
Yes, there are custom rules; we are heavily using proc.xname, container, and k8s fields
No
This we did try on some low-load physical servers: no significant change in memory, but the evts.rate dropped and memory still increases, just more slowly. However, this does reduce memory on cloud servers.
CentOS with 4.18.0-240.10.1.el7.x86_64 (memory consumption is higher than on RHEL). These are the args:
Also, I can share some logs. One from a node where memory is increasing fastest:
`{"hostname":"master-4","output":"Falco metrics snapshot","output_fields":{"evt.hostname":"master-4","evt.source":"syscall","evt.time":1727960245486161869,"falco.container_memory_used_mb":858.1,"falco.cpu_usage_perc":14.1,"falco.duration_sec":542699,"falco.evts_rate_sec":107172.8,"falco.host_boot_ts":1711092869000000000,"falco.host_cpu_usage_perc":33.2,"falco.host_memory_used_mb":319066.5,"falco.host_num_cpus":48,"falco.host_open_fds":136128,"falco.host_procs_running":22,"falco.kernel_release":"4.18.0-240.10.1.ves3.el7.x86_64","falco.memory_pss_mb":826.1,"falco.memory_rss_mb":826.0,"falco.memory_vsz_mb":5751.1,"falco.n_added_fds":17038456397,"falco.n_added_threads":149146559,"falco.n_cached_fd_lookups":36451412404,"falco.n_cached_thread_lookups":44814709948,"falco.n_containers":284,"falco.n_drops_full_threadtable":0,"falco.n_failed_fd_lookups":858849095,"falco.n_failed_thread_lookups":1208309236,"falco.n_fds":3760493,"falco.n_missing_container_images":0,"falco.n_noncached_fd_lookups":19109637694,"falco.n_noncached_thread_lookups":22658314331,"falco.n_removed_fds":3874486584,"falco.n_removed_threads":159875055,"falco.n_retrieve_evts_drops":5198866225,"falco.n_retrieved_evts":5104539237,"falco.n_store_evts_drops":0,"falco.n_stored_evts":5212691647,"falco.n_threads":16093,"falco.num_evts":60253340929,"falco.num_evts_prev":60156885388,"falco.outputs_queue_num_drops":0,"falco.start_ts":1727417545878155283,"falco.version":"0.38.2","scap.engine_name":"kmod","scap.evts_drop_rate_sec":0.0,"scap.evts_rate_sec":106467.0,"scap.n_drops":12727847,"scap.n_drops_buffer_clone_fork_enter":0,"scap.n_drops_buffer_clone_fork_exit":11,"scap.n_drops_buffer_close_exit":840,"scap.n_drops_buffer_connect_enter":13,"scap.n_drops_buffer_connect_exit":13,"scap.n_drops_buffer_dir_file_enter":0,"scap.n_drops_buffer_dir_file_exit":0,"scap.n_drops_buffer_execve_enter":1,"scap.n_drops_buffer_execve_exit":1,"scap.n_drops_buffer_open_enter":285,"scap.n_drops_buffer_open_exit":11,"scap.n_drops_buffer_other_interest_enter":0,"scap.n_drops_buffer_other_interest_exit":0,"scap.n_drops_buffer_proc_exit":0,"scap.n_drops_buffer_total":12726640,"scap.n_drops_bug":0,"scap.n_drops_page_faults":1207,"scap.n_drops_perc":2.0872408656547624e-06,"scap.n_drops_prev":12727845,"scap.n_evts":59903219517,"scap.n_evts_prev":59807399239,"scap.n_preemptions":0},"priority":"Informational","rule":"Falco internal: metrics snapshot","source":"internal","time":"2024-10-03T12:57:25.486161869Z"}`
And one where memory is increasing slowly:
`{"hostname":"master-14","output":"Falco metrics snapshot","output_fields":{"evt.hostname":"master-14","evt.source":"syscall","evt.time":1727962043929149577,"falco.container_memory_used_mb":214.4,"falco.cpu_usage_perc":2.3,"falco.duration_sec":544499,"falco.evts_rate_sec":49971.3,"falco.host_boot_ts":1711094152000000000,"falco.host_cpu_usage_perc":19.9,"falco.host_memory_used_mb":158874.1,"falco.host_num_cpus":48,"falco.host_open_fds":115072,"falco.host_procs_running":33,"falco.kernel_release":"4.18.0-240.10.1.ves3.el7.x86_64","falco.memory_pss_mb":197.3,"falco.memory_rss_mb":197.2,"falco.memory_vsz_mb":5128.6,"falco.n_added_fds":1089162999,"falco.n_added_threads":7080596,"falco.n_cached_fd_lookups":16972056897,"falco.n_cached_thread_lookups":25709477955,"falco.n_containers":53,"falco.n_drops_full_threadtable":0,"falco.n_failed_fd_lookups":375818295,"falco.n_failed_thread_lookups":44083880,"falco.n_fds":914543,"falco.n_missing_container_images":0,"falco.n_noncached_fd_lookups":11857527076,"falco.n_noncached_thread_lookups":5174432536,"falco.n_removed_fds":783105364,"falco.n_removed_threads":7078333,"falco.n_retrieve_evts_drops":1415126062,"falco.n_retrieved_evts":1281757179,"falco.n_store_evts_drops":0,"falco.n_stored_evts":1300682956,"falco.n_threads":2261,"falco.num_evts":29953801359,"falco.num_evts_prev":29908827163,"falco.outputs_queue_num_drops":0,"falco.start_ts":1727417544287913581,"falco.version":"0.38.2","scap.engine_name":"kmod","scap.evts_drop_rate_sec":0.0,"scap.evts_rate_sec":49978.1,"scap.n_drops":155,"scap.n_drops_buffer_clone_fork_enter":0,"scap.n_drops_buffer_clone_fork_exit":0,"scap.n_drops_buffer_close_exit":0,"scap.n_drops_buffer_connect_enter":0,"scap.n_drops_buffer_connect_exit":0,"scap.n_drops_buffer_dir_file_enter":0,"scap.n_drops_buffer_dir_file_exit":0,"scap.n_drops_buffer_execve_enter":0,"scap.n_drops_buffer_execve_exit":0,"scap.n_drops_buffer_open_enter":0,"scap.n_drops_buffer_open_exit":0,"scap.n_drops_buffer_other_interest_enter":0,"scap.n_drops_buffer_other_interest_exit":0,"scap.n_drops_buffer_proc_exit":0,"scap.n_drops_buffer_total":153,"scap.n_drops_bug":0,"scap.n_drops_page_faults":2,"scap.n_drops_perc":0.0,"scap.n_drops_prev":155,"scap.n_evts":29956317650,"scap.n_evts_prev":29911337389,"scap.n_preemptions":0},"priority":"Informational","rule":"Falco internal: metrics snapshot","source":"internal","time":"2024-10-03T13:27:23.929149577Z"}`
Just noticed:
The delta is greater than 13 billion. cc @falcosecurity/libs-maintainers does it ring a bell? 🤔 |
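(For reference, in the faster-growing snapshot above: falco.n_added_fds = 17,038,456,397 versus falco.n_removed_fds = 3,874,486,584, i.e. roughly 13.2 billion file descriptors added to the state but never removed.)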
Hi team, do we have any update? We see this kind of OOM more often on heavy-load nodes (more containers, like 200+) than on low-load nodes (fewer containers, usually less than 100). And sometimes it crashes even when it is not reaching the memory limit.
Not yet, sorry. We are investigating. Could you share the log of the crashing Falco instance? Also, let us know if you noticed any difference with 0.39.
Many thanks. We have not tried 0.39.0 yet; it's hard to upgrade to a new version at once. But yes, here is the log from the crashed instance, though as it shows, the output is not complete when the crash happens.
Thank you! I still notice the same pattern:
22.3 billion added, only 5.8 billion removed. This seems strange to me.
Thank you for looking into this so quickly. Besides a possible libs issue, is there a possibility that a misconfiguration of Falco or something else may also be causing this issue?
Those are the counters of the file descriptors being tracked by Falco. Unless I'm missing something, it looks like fds are added to the state maintained by the libs, but not cleaned up (I don't know why). This may explain the continuous memory increase.
One configuration option that comes to my mind is: |
We have not set any accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execve, execveat, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, memfd_create, mount, open, open_by_handle_at, openat, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmsg, sendmsg, sendto, setgid, setpgid, setresgid, setresuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, timerfd_create, umount, umount2, userfaultfd, vfork |
This problem is still reproducible on the current latest version (0.39.1). After investigating, I found that Falco is using ... If I use jemalloc to override the default glibc malloc ... For now, I will continue to use jemalloc to work around this problem, but I think in the future Falco should switch to this lib instead of the "leaky" glibc malloc.
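For anyone who wants to try the same workaround without rebuilding Falco, one hypothetical approach is to preload jemalloc via LD_PRELOAD. In a Helm-based deployment that could look roughly like the snippet below; it assumes the chart exposes extra environment variables and that libjemalloc is present in the image at that path, both of which need to be verified:

```yaml
# Hypothetical sketch: preload jemalloc in the Falco container so it replaces
# the glibc allocator. The library path depends on the base image.
extra:
  env:
    - name: LD_PRELOAD
      value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```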
That's an interesting discovery, @dungngo4520! We will definitely check jemalloc too and test which one is best; we will keep you all updated :)
I opened a PR to port Falco release artifacts to jemalloc: #3406 🚀 |
After discussing on Slack, I was told to provide information about a similar issue on our side (https://kubernetes.slack.com/archives/CMWH3EH32/p1733150481383569?thread_ts=1732623114.184849&cid=CMWH3EH32). We seem to be experiencing a memory leak happening only on one of our nodes. This only seems to happen under very particular circumstances, as we have only seen it on one out of 5 clusters, and only on a specific node. Comparatively, pods on the other nodes and in other clusters run with very low memory (80-100MiB).
Here is another test we did, increasing the memory limit further to 2000MiB. We can see the linear increase of memory over time quite clearly. A reminder that for all other nodes this is only around 80-100MiB instead. Additional information:
falco:
podSecurityContext:
appArmorProfile:
type: Unconfined
serviceMonitor:
create: true
labels:
app: kube-prometheus-stack
release: kube-prometheus-stack
extra:
args:
- --disable-cri-async
resources:
requests:
cpu: 100m
memory: 300Mi
limits:
cpu: "4"
memory: 2000Mi
tty: false
controller:
kind: daemonset
driver:
enabled: true
kind: modern_ebpf
modernEbpf:
leastPrivileged: true
collectors:
enabled: true
kubernetes:
enabled: false
metrics:
enabled: true
falco:
grpc:
enabled: true
grpc_output:
enabled: true
rules_files:
- /etc/falco/falco_rules.yaml
- /etc/falco/falco-incubating_rules.yaml
- /etc/falco/rules.d
webserver:
prometheus_metrics_enabled: true
falcoctl:
artifact:
install:
resources:
requests:
cpu: 10m
memory: 50Mi
limits:
memory: 100Mi
# -- Enable the init container. We do not recommend installing (or following) plugins for security reasons since they are executable objects.
enabled: true
follow:
resources:
requests:
cpu: 10m
memory: 50Mi
limits:
memory: 100Mi
# -- Enable the sidecar container. We do not support it yet for plugins. It is used only for rules feed such as k8saudit-rules rules.
enabled: true
config:
artifact:
install:
# -- List of artifacts to be installed by the falcoctl init container.
refs: [falco-rules:3, falco-incubating-rules:3]
follow:
# -- List of artifacts to be installed by the falcoctl init container.
refs: [falco-rules:3, falco-incubating-rules:3]
falcosidekick:
resources:
requests:
cpu: 50m
memory: 40Mi
limits:
memory: 150Mi
enabled: true
webui:
initContainer:
resources:
limits:
memory: 128Mi
requests:
cpu: 10m
memory: 50Mi
ingress:
enabled: true
annotations:
cert-manager.io/issuer: aks-dns-issuer
nginx.ingress.kubernetes.io/auth-signin: https://$host/oauth2/start?rd=$escaped_request_uri
nginx.ingress.kubernetes.io/auth-url: https://$host/oauth2/auth
nginx.ingress.kubernetes.io/proxy-buffer-size: 8k
nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
hosts:
- host: falco-ui.aks.azure.cloud-nuage.dfo-mpo.gc.ca
paths:
- path: /
tls:
- hosts:
- falco-ui.aks.azure.cloud-nuage.dfo-mpo.gc.ca
secretName: falco-sidekick-ui-tls-secret
resources:
requests:
cpu: 10m
memory: 50Mi
limits:
memory: 100Mi
ttl: 7d
enabled: true
replicaCount: 1
redis:
storageSize: "10Gi"
resources:
requests:
cpu: 10m
memory: 500Mi
limits:
memory: 600Mi
disableauth: true
config:
teams:
webhookurl: "https://086gc.webhook.office.com/webhookb2/536cb545-ca6a-457d-953d-e51eba402c16@1594fdae-a1d9-4405-915d-011467234338/IncomingWebhook/4679627f3f784184b5df1eca332ccc03/046b0d7c-7e51-4be5-96ee-a07207ea78ad"
activityimage: ""
outputformat: "all"
minimumpriority: "notice"
serviceMonitor:
# -- enable the deployment of a Service Monitor for the Prometheus Operator.
enabled: true
# -- specify Additional labels to be added on the Service Monitor.
additionalLabels:
app: kube-prometheus-stack
release: kube-prometheus-stack
k8s-metacollector:
resources:
requests:
cpu: 10m
memory: 50Mi
limits:
memory: 100Mi
containerSecurityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
seccompProfile:
type: RuntimeDefault
serviceMonitor:
create: true
# -- path at which the metrics are expose by the k8s-metacollector.
path: /metrics
# -- labels set of labels to be applied to the ServiceMonitor resource.
labels:
app: kube-prometheus-stack
release: kube-prometheus-stack
grafana:
dashboards:
enabled: true
#TODO: add exception for grafana unexpected connection
customRules:
custom.local.yaml: |-
- list: devops_agents
items: [
iwlsacr.azurecr.io/ubuntu-agent,
dmpacr.azurecr.io/azure-build-agent,
]
# Rule : Change thread namespace
- list: user_change_thread_namespace_images
items: [
docker.io/library/alpine,
ghcr.io/inspektor-gadget/inspektor-gadget,
]
- macro: weaveworks_kured
condition: (container.image.repository=weaveworks/kured and proc.pname=kured)
- rule: Change thread namespace
condition: and (not weaveworks_kured) and (not container.id=host) and (not proc.name=calico-node) and (not container.image.repository in (user_change_thread_namespace_images))
override:
condition: append
# Rule : Launch Privileged Container and Launch Excessively Capable Container
- list: user_privileged_images
items: [
mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
mcr.microsoft.com/aks/hcp/hcp-tunnel-front,
mcr.microsoft.com/oss/calico/node,
mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver,
mcr.microsoft.com/oss/kubernetes/kube-proxy,
weaveworks/kured,
mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi,
mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi,
mcr.microsoft.com/mirror/docker/library/busybox,
mcr.microsoft.com/aks/ip-masq-agent-v2,
ghcr.io/inspektor-gadget/inspektor-gadget,
docker.io/library/alpine,
mcr.microsoft.com/oss/calico/pod2daemon-flexvol,
docker.io/calico/node-driver-registrar,
docker.io/calico/csi,
]
- macro: user_privileged_containers
condition: (container.image.repository in (user_privileged_images))
override:
condition: replace
# Rule : Mount Launched in Privileged Container
- list: user_privileged_mount_images
items: [
mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver,
mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi,
]
- macro: user_privileged_mount_containers
condition: (container.image.repository in (user_privileged_mount_images))
- rule: Mount Launched in Privileged Container
condition: and not user_privileged_mount_containers
override:
condition: append
# Rule : Terminal shell in container
- list: user_expected_terminal_shell_images
items: [
devops_agents,
docker.io/library/busybox,
apscommonacr.azurecr.io/aks/ubuntu-agent,
docker.io/grafana/promtail,
]
- macro: user_expected_terminal_shell_in_container_conditions
condition: (container.image.repository in (user_expected_terminal_shell_images))
override:
condition: replace
# Rule : Delete or rename shell history
- list: user_known_rename_shell_history_images
items: [
devops_agents,
apscommonacr.azurecr.io/aks/ubuntu-agent,
]
- macro: user_known_rename_shell_history_containers
condition: (container.image.repository in (user_known_rename_shell_history_images))
- macro: user_known_rename_shell_history_command_and_file
condition: ((proc.cmdline startswith containerd and evt.arg.name startswith /var/lib/containerd/)
or (proc.cmdline startswith bash and evt.arg.name startswith /root/.bash_history))
- rule: Delete or rename shell history
condition: and not user_known_rename_shell_history_containers and not user_known_rename_shell_history_command_and_file
override:
condition: append
# Rule : Contact K8S API Server From Container
- list: user_known_contact_k8s_api_server_images
items: [
docker.io/weaveworks/kured,
quay.io/kiwigrid/k8s-sidecar,
quay.io/argoproj/argocd,
ghcr.io/kyverno/kyverno,
ghcr.io/kyverno/kyverno-cli,
ghcr.io/kyverno/kyvernopre,
ghcr.io/dexidp/dex,
ghcr.io/argoproj-labs/argocd-extensions,
ghcr.io/kyverno/policy-reporter-kyverno-plugin,
ghcr.io/kyverno/policy-reporter,
quay.io/argoproj/argocli,
quay.io/kubescape/kubescape,
quay.io/kubescape/storage,
ghcr.io/argoproj-labs/argocd-extensions,
docker.io/bitnami/kubectl,
oci.external-secrets.io/external-secrets/external-secrets,
ghcr.io/kyverno/background-controller,
ghcr.io/aquasecurity/trivy-operator,
ghcr.io/kyverno/cleanup-controller,
ghcr.io/kyverno/reports-controller,
docker.io/grafana/promtail,
docker.io/falcosecurity/k8s-metacollector,
ghcr.io/fjogeleit/trivy-operator-polr-adapter,
ghcr.io/opencost/opencost,
quay.io/jetstack/cert-manager-startupapicheck,
ghcr.io/inspektor-gadget/inspektor-gadget,
docker.io/falcosecurity/falcosidekick,
ghcr.io/aquasecurity/node-collector,
ghcr.io/stakater/reloader,
]
- macro: user_known_contact_k8s_api_server_activities
condition: (proc.cmdline startswith cainjector or
proc.cmdline startswith controller or
proc.cmdline startswith fluent-bit or
proc.cmdline startswith keda or
proc.cmdline startswith keda-adapter or
proc.cmdline startswith kube-state-metr or
proc.cmdline startswith kured or
container.image.repository in (user_known_contact_k8s_api_server_images) or
proc.cmdline startswith nginx-ingress or
proc.cmdline startswith prometheus or
proc.cmdline startswith operator or
proc.cmdline startswith secrets-store-c or
proc.cmdline startswith velero or
proc.cmdline startswith webhook)
override:
condition: replace
# Rule : Launch Package Management Process in Container
- list: user_known_package_manager_in_images
items: [
mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
dmpacr.azurecr.io/azure-build-agent,
docker.io/moby/buildkit,
quay.io/kubecost1/opencost-ui,
iwlsacr.azurecr.io/ubuntu-agent,
apscommonacr.azurecr.io/aks/ubuntu-agent,
docker.io/library/python,
]
- macro: user_known_package_manager_in_container
condition: (container.image.repository in (user_known_package_manager_in_images))
override:
condition: replace
# Rule : Set Setuid or Setgid bit
- list: user_known_set_setuid_or_setgid_images
items: [
devops_agents,
docker.io/moby/buildkit,
]
- macro: user_known_set_setuid_or_setgid_bit_conditions
condition: (container.image.repository in (user_known_set_setuid_or_setgid_images) or
(evt.arg.filename startswith /var/lib/kubelet/pods/) or
(evt.arg.filename startswith /var/lib/containerd/io) or
(evt.arg.filename startswith /var/lib/containerd/tmpmounts) or
container.id="")
override:
condition: replace
# Rule : Redirect STDOUT/STDIN to Network Connection in Container
- list: user_known_redirect_stdin_stdout_network_images
items: [
mcr.microsoft.com/aks/hcp/hcp-tunnel-front,
quay.io/argoproj/argocd,
iwlsacr.azurecr.io/ubuntu-agent,
apscommonacr.azurecr.io/aks/ubuntu-agent,
registry.k8s.io/ingress-nginx/controller,
ghcr.io/aquasecurity/trivy,
docker.io/moby/buildkit,
]
- list: user_known_redirect_stdin_stdout_network_processes
items: [
calico-node,
kube-proxy,
buildkitd,
]
- macro: user_known_stand_streams_redirect_activities
condition: (container.image.repository in (user_known_redirect_stdin_stdout_network_images)
or proc.name in (user_known_redirect_stdin_stdout_network_processes))
override:
condition: replace
- rule: Redirect STDOUT/STDIN to Network Connection in Container
output: Redirect stdout/stdin to network connection (gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4] fd.sip=%fd.sip connection=%fd.name lport=%fd.lport rport=%fd.rport fd_type=%fd.type fd_proto=fd.l4proto evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty %container.info container_duration=%container.duration)
override:
output: append
# Rule : Read sensitive file untrusted
- macro: user_read_sensitive_file_conditions
condition: >
(container.id=host) and
(fd.name in (/etc/shadow))
# Rule : Modify Shell Configuration File
- list: user_known_modify_shell_config_images
items: [
docker.io/moby/buildkit,
mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
]
- macro: user_known_modify_shell_config_containers
condition: (container.image.repository in (user_known_modify_shell_config_images))
- list: user_know_modify_shell_config_host_cmdline_allowed
items: [
containerd,
]
- macro: user_known_modify_shell_config_host_cmdline
condition: (container.id=host and
(proc.cmdline in (user_know_modify_shell_config_host_cmdline_allowed)
or fd.name startswith /var/lib/containerd/tmpmounts
or fd.name startswith /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs)
or container.id="")
- rule: Modify Shell Configuration File
condition: and not user_known_modify_shell_config_host_cmdline and not user_known_modify_shell_config_containers
override:
condition: append
# Rule : Clear Log Activities
- list: user_known_clear_log_files_images
items: [
docker.io/moby/buildkit,
]
- macro: allowed_clear_log_files
condition: (container.id=host and
(fd.name startswith /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
or fd.name startswith /var/lib/containerd/tmpmounts
or fd.name startswith /var/log/azure-vnet.log)
or (container.image.repository in (user_known_clear_log_files_images))
or container.id="")
override:
condition: replace
- macro: containerd_activities
condition: or (proc.name=containerd and fd.name startswith "/var/log/azure-vnet.log")
override:
condition: append
# Rule : DB program spawned process
- list: user_known_db_spawned_process_images
items: [
dmpacr.azurecr.io/azure-build-agent,
]
- macro: user_known_db_spawned_processes
condition: (container.image.repository in (user_known_db_spawned_process_images))
override:
condition: replace
# Rule : System procs network activity
- list: user_expected_system_procs_network_activity_images
items: [
dmpacr.azurecr.io/azure-build-agent,
quay.io/argoproj/argocd,
ghcr.io/aquasecurity/trivy,
docker.io/bitnami/kubectl,
iwlsacr.azurecr.io/ubuntu-agent,
apscommonacr.azurecr.io/aks/ubuntu-agent,
]
- list: user_expected_system_procs_network_activity_conditions_pnames
items: [
azure-vnet,
aks-log-collect,
]
- macro: user_expected_system_procs_network_activity_conditions
condition: ((container.image.repository in (user_expected_system_procs_network_activity_images))
or (proc.pname in (user_expected_system_procs_network_activity_conditions_pnames)))
override:
condition: replace
# Rule : Launch Suspicious Network Tool in Container
- list: user_known_network_tool_images
items: [
dmpacr.azurecr.io/zookeeper,
dmpacr.azurecr.io/busybox-custom
docker.io/library/busybox,
]
- macro: user_known_network_tool_activities
condition: (container.image.repository in (user_known_network_tool_images))
override:
condition: replace
# Rule : Non sudo setuid
- rule: Non sudo setuid
enabled: false
override:
enabled: replace
# Rule : Create files below dev
# NOTE: temporarity disable until missing image issue is fixed by Falco
# - list: user_known_create_files_below_dev_images
# items: [
# mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
# ]
#
# - macro: user_known_create_files_below_dev_activities
# condition: (container.image.repository in (user_known_create_files_below_dev_images))
- rule: Create files below dev
enabled: false
override:
enabled: replace
# Rule : Drop and execute new binary in container
- list: known_drop_and_execute_containers
items: [
iwlsacr.azurecr.io/ubuntu-agent,
docker.io/moby/buildkit,
mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
]
override:
items: append
- list: drop_and_execute_new_binary_in_container_cmdline
items: [
"crond -n -s",
]
- rule: Drop and execute new binary in container
condition: and not proc.cmdline in (drop_and_execute_new_binary_in_container_cmdline)
override:
condition: append
# Rule : Read environment variable from /proc files
- list: known_binaries_to_read_environment_variables_from_proc_files
items: [
Agent.Worker,
.NET ThreadPool,
.NET Sockets,
]
override:
items: append
# Rule : Launch Ingress Remote File Copy Tools in Container
- list: user_known_ingress_remote_file_copy_images
items: [
mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
iwlsacr.azurecr.io/ubuntu-agent,
docker.io/library/busybox,
]
- macro: user_known_ingress_remote_file_copy_activities
condition: (container.image.repository in (user_known_ingress_remote_file_copy_images))
override:
condition: replace
# Rule : Unexpected UDP Traffic
- rule: Unexpected UDP Traffic
enabled: false
override:
enabled: replace
# Rule : Contact EC2 Instance Metadata Service From Container
- rule: Contact EC2 Instance Metadata Service From Container
enabled: false
override:
enabled: replace
# Rule : Change namespace privileges via unshare
- rule: Change namespace privileges via unshare
enabled: false
override:
enabled: replace
# Rule : Contact cloud metadata service from container
# disable rule since missing some field data
- rule: Contact cloud metadata service from container
enabled: false
override:
enabled: replace
# Rule : Read ssh information
- list: user_known_read_ssh_information_activities_images
items: [
quay.io/argoproj/argocd,
]
- macro: user_known_read_ssh_information_activities
condition: (container.image.repository in (user_known_read_ssh_information_activities_images))
override:
condition: replace
# Rule : Create Hardlink Over Sensitive Files
- list: pname_create_hardlink_over_sensitive_files
items: [
buildkit-runc,
]
- rule: Create Hardlink Over Sensitive Files
condition: and not proc.pname in (pname_create_hardlink_over_sensitive_files)
override:
condition: append
# Rule: Schedule Cron Jobs
- list: user_known_cron_jobs_images
items: [
mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
]
- macro: user_known_cron_jobs
condition: (container.image.repository in (user_known_cron_jobs_images)
or container.id="")
override:
condition: replace
# Rule: Exfiltrating Artifacts via Kubernetes Control Plane
- list: exfiltrating_artifacts_via_kubernetes_control_plane_images
items: [
iwlsacr.azurecr.io/ubuntu-agent,
apscommonacr.azurecr.io/aks/ubuntu-agent,
]
- rule: Exfiltrating Artifacts via Kubernetes Control Plane
condition: and not container.image.repository in (exfiltrating_artifacts_via_kubernetes_control_plane_images)
override:
condition: append
# Rule: Launch Ingress Remote File Copy Tools in Container
- list: user_known_ingress_remote_file_copy_activities_images
items: [
docker.io/moby/buildkit,
]
- macro: user_known_ingress_remote_file_copy_activities
condition: (container.image.repository in (user_known_ingress_remote_file_copy_activities_images))
override:
condition: replace
# Rule: Packet socket created in container
- list: packet_socket_created_in_container_images
items: [
ghcr.io/inspektor-gadget/inspektor-gadget,
]
- rule: Packet socket created in container
condition: and not container.image.repository in (packet_socket_created_in_container_images)
override:
condition: append
Environment: k8s, Falco in container. Falco version:
System info:
Cloud provider: AKS
Kernel: |
@jemag can you try to use jemalloc as allocator on the faulty node? |
@FedeDP unfortunately yesterday we cycled through nodes during an AKS node image update and the problematic node is now gone. I cannot reproduce the problem now. I don't know what kind of specific circumstances could have caused this, especially since all those nodes were part of an Azure VM scale set and should all be similar. If I notice the problem happening again, I will make sure to try to use jemalloc. |
Describe the bug
On 0.34.x releases we experience a memory leak on physical instances, while the same setup on AWS is fine. It could be due to node workload, but it is still clearly a memory leak.
As of now, the root cause has not been identified.
How to reproduce it
This is a bit of a customised deployment (not Helm, etc.).
This is the config Falco is given (we do use more rules, but the problem happens with only the upstream ones, now the rules from the rules repo):
Expected behaviour
Memory drops at regular intervals (instead of growing until OOM).
Screenshots
Cloud instances of Falco on AWS (OK behaviour; the screenshot is, IMO, from a 0.33.x version):
Instances on physical servers (OOM, on 0.34.1). The nodes in the cluster are exactly the same, yet only 2 of 4 are affected by the memory increase (which could be due to specific workloads). Surprisingly, the same metric does not match the pattern from the AWS/GCP nodes (above).
Environment
K8s, falco in container
Physical server, under load
/etc/os-release
Not relevant, it's basically CentOS but customised.
K8s, custom manifests - described in an older issue here: Falco runtime error in k8s_replicationcontroller_handler_state for large k8s clusters (400+ nodes) #1909 (comment)