
OOM on physical servers #2495

Open
epcim opened this issue Apr 14, 2023 · 58 comments

Comments

@epcim

epcim commented Apr 14, 2023

Describe the bug

On 0.34.x releases we experience a memory leak on physical instances, while the same setup on AWS is fine. It could be due to node workload, but it is still clearly a memory leak.

As of now the root cause has not been identified:

  • looking for help to do some memory profiling or to debug the issue
  • anyone seeing similar behavior?

How to reproduce it

This is a somewhat customised deployment (not Helm, etc.).

This is the config Falco is given (we do use more rules, but the problem happens with upstream rules only, i.e. the rules from the rules repo):

data:
  falco.yaml: |
    rules_file:
      - /etc/falco-upstream/falco_rules.yaml                          
      - /etc/falco/rules.d
    
    plugins:
    - name: json
      library_path: libjson.so
      init_config: ""
    
    load_plugins: []
    watch_config_files: true
    time_format_iso_8601: false
    
    
    json_include_output_property: true
    json_include_tags_property: true
    json_output: true
    log_stderr: true
    log_syslog: false
    # "alert", "critical", "error", "warning", "notice", "info", "debug".
    log_level: error
    libs_logger:
      enabled: false
      severity: debug # "info", "debug", "trace".
    priority: warning
    
    buffered_outputs: false
    syscall_buf_size_preset: 4
    syscall_event_drops:
      threshold: 0.1
      actions:
        - log
      rate: 0.03333
      max_burst: 1
      simulate_drops: false
    
    
    syscall_event_timeouts:
      max_consecutives: 1000
    
    webserver:
      enabled: true
      k8s_healthz_endpoint: /healthz
      listen_port: 64765
      ssl_enabled: false
      ssl_certificate: /volterra/secrets/identity/server.crt
      threadiness: 0
      #k8s_audit_endpoint: /k8s-audit
    
    output_timeout: 2000
    outputs:
      rate: 1
      max_burst: 1000
    syslog_output:
      enabled: false
    file_output:
      enabled: false
      keep_alive: false
      filename: ./events.txt
    stdout_output:
      enabled: true
    program_output:
      enabled: false
      keep_alive: false
      program: "jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXX"
    http_output:
      enabled: true
      url: "http://falco-sidekick.monitoring.svc.cluster.local:64801/"
      user_agent: falcosecurity/falco
    grpc:
      enabled: false
      bind_address: unix:///run/falco/falco.sock
      threadiness: 0
    grpc_output:
      enabled: false
    
    metadata_download:
      max_mb: 100
      chunk_wait_us: 1000
      watch_freq_sec: 1
    
    modern_bpf:
      cpus_for_each_syscall_buffer: 2"

Expected behaviour

Memory should drop / be released at regular intervals

Screenshots

Cloud instances of Falco on AWS (OK behaviour; the screenshot is, IMO, of the 0.33.x version):
image

Instances on physical servers (OOM on 0.34.1; the nodes in the cluster are exactly the same, though only 2 of 4 are affected by the memory increase, which could be due to specific workload). Surprisingly, the same metric does not match the pattern from the AWS/GCP nodes above:
image

Environment

K8s, falco in container
Physical server, under load

  • Falco version:
{"default_driver_version":"4.0.0+driver","driver_api_version":"3.0.0","driver_schema_version":"2.0.0","engine_version":"16","falco_version":"0.34.1","libs_version":"0.10.4","plugin_api_version":"2.0.0"}
  • System info:
{
  "machine": "x86_64",
  "nodename": "master-1",
  "release": "4.18.0-240.10.1.ves1.el7.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Tue Mar 30 15:02:49 UTC 2021"
}
  • Cloud provider or hardware configuration:
  • OS: /etc/os-release is not relevant; it's basically CentOS but customised
  • Kernel:
root@master-1:/# uname -a
Linux master-1 4.18.0-240.10.1.ves1.el7.x86_64 #1 SMP Tue Mar 30 15:02:49 UTC 2021 x86_64 GNU/Linux
@epcim epcim added the kind/bug label Apr 14, 2023
@Andreagit97
Member

Hey @epcim, thank you for reporting this! Of course, I don't have the right answer here, but I have some questions that could help us dig into it:

  1. Which engine are you using: bpf, kernel module, or modern bpf? From your config I can see that you are not using plugins, is that right?
  2. If I understood correctly, you are experiencing the memory leak only on physical servers, while with the same/similar setup Falco works well on AWS. Have you ever seen this kind of issue with previous Falco versions (0.33.0, for example)?
  3. You said that you are able to reproduce the issue with just the default ruleset. It would help us a lot if you were able to find a minimal set of rules that reproduces the issue. The ruleset is huge and involves many syscalls, so working on a minimal set would speed up the troubleshooting.

@epcim
Author

epcim commented Apr 14, 2023

Ah, that I forgot to mention, @Andreagit97.

We experience the issue with the kernel module in the production env, and it was reproduced with both the kernel module and bpf in the test envs.

Haven't tried modern bpf.

This is the image change (upstream):

-FROM falcosecurity/falco-no-driver:0.32.2
+FROM falcosecurity/falco-no-driver:0.33.1
  1. I believe the issue has been there since 0.33.0; before that it's hard to say, as it was not visible in metrics (the container could be killed early). From metrics (from the dev env):
    1y view:
    image

The interesting part is this one: memory hiked significantly
image

This part corresponds to the 0.32.2 -> 0.33.1 upgrade, on 29.11.2022
image

A more detailed view shows it was quite OK for 9 days and then started to climb
image

Cross-verified with the prod env: the version landed on 6.12.2022, but the issue was first visible on 24.1.2023 (it appears on 1 node out of dozens), and it has the same pattern:
image

For comparison, the period 12/2022 - 04/2023 with all nodes on prod: it is hard to read/identify the issue just from metrics, as pods are killed early, etc. Later, in January, we increased the memory limit for the pod and the memory climb started to be recognisable in the metrics

image

  1. Minimise the ruleset
  • Yes, that's what I am thinking of as the next step.
  • Another question is what could internally live for such a long time (from my side it's visible that the issue appears on nodes that are frequently accessed over ssh, so shell sessions and started docker containers are the syscalls I would blame first).
  • At best, if you could assess what changes happened in the codebase between 0.32.2 -> 0.33.1.

@Andreagit97
Member

Thank you, this is really interesting info that could help us with the troubleshooting!
We will start taking a look; keep us updated if you find anything new!

@epcim
Author

epcim commented Apr 17, 2023

Surprisingly, there were not many changes in the rules since October 2022; the new version is on the right side, while on the left is my November version (which is basically 1:1 with October).

These are the ones that changed, as far as I know (ignoring tags, lists, proc names in .. (new binaries), etc., and everything removed):

(screenshots of the changed rule diffs, and similar)

Not in my current code base (which has the issue):

(screenshots of rule changes not present in my current code base)

@Andreagit97
Member

Hey, thank you for the update! The bad news is that, since the underlying Falco code has changed, the OOM issue could also be caused by an already existing rule :/ so unfortunately we cannot restrict the investigation scope.

@jasondellaluce
Contributor

/milestone 0.35.0

@epcim
Author

epcim commented Jun 1, 2023

@jasondellaluce will you try to simulate the issue on your side and collect metrics before 0.35?

@jasondellaluce
Contributor

@epcim This issue is hard to reproduce on our side, and I think deeper testing on this specific path will not happen before 0.35. However, we're testing the latest dev Falco also with tools like Valgrind and in the most common deployment scenarios, so my suggestion will be to try out 0.35 once it's out and see if the issue still occurs. It's hard to tell if the issue is caused by your rule setup, your workload, or by a mix of the two. The most likely thing is that this could be happening within libsinsp, and that very specific workloads force the library to grow its internal state unbounded. This will definitely require further investigation.

@sboschman
Contributor

sboschman commented Jun 20, 2023

Same ever-increasing memory consumption with 0.35 (upgraded from 0.33), but our Falco setup is a bit different from the one described in this issue. Deployed as a systemd unit on a VM (own hosts, so no cloud stuff), syscalls disabled (--disable-source syscall), only the k8saudit/json plugins enabled, default syscall rules removed, grpc/http/syslog output enabled.

Can't make a memory dump, because Falco claims 132G virt (the VM has 6GB RAM and a 30GB disk... no idea why it needs this much virt) and it seems a memory dump would try to write 132G to disk, which obviously fails on a 30GB disk.

@jasondellaluce
Contributor

@sboschman do you also reproduce this kind of memory usage when running Falco for syscalls collection, without plugins?

@sboschman
Contributor

@jasondellaluce we do not run falco with syscalls collection enabled at all, so not a use-case I can comment on.

@jasondellaluce
Contributor

/milestone 0.36.0

@poiana poiana added this to the 0.36.0 milestone Jun 21, 2023
@incertum
Contributor

@epcim would you be in a position to re-run some tests with eBPF and the libbpf stats kernel setting enabled, using Falco's new experimental native metrics? I'm asking because I would be curious to see whether spikes in memory correlate with surges in event rates (both at the tracepoints, aka the libbpf stats, and also in userspace, which obviously depends on the syscalls you enable). Please feel free to anonymize logs and/or share an anonymized version on Slack in a DM. What we unfortunately don't yet have in the metrics feature are the detailed syscall counters and some other internal-state-related stats we aim to add for the next Falco release.
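For reference, a minimal sketch of what such a test setup could look like. The key names below follow the metrics section of recent falco.yaml templates and the standard kernel sysctl; treat the exact keys and values as assumptions to verify against your Falco version:

# falco.yaml (sketch): enable the experimental native metrics snapshots
metrics:
  enabled: true
  interval: 15m                       # how often to emit a snapshot
  output_rule: true                   # emit snapshots as "Falco internal" rule outputs
  resource_utilization_enabled: true  # CPU/memory of the Falco process and the host
  kernel_event_counters_enabled: true
  libbpf_stats_enabled: true          # eBPF engines only; also requires, on the host:
                                      #   sysctl -w kernel.bpf_stats_enabled=1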

@incertum
Contributor

incertum commented Jul 12, 2023

In addition, @epcim, could we get more information about the cgroups version on these machines? Memory accounting in the kernel can in many cases simply be wrong; for example, see kubernetes-sigs/kind#421, and I have also heard rumors about cgroups leaking memory. cgroups v2 has superior memory management, hence I would be curious to know which cgroups version you are dealing with.

Plus, you also have an on-host deployment; mind getting me up to speed on the exact memory metrics you base the OOM on in those cases (aka the non container_memory_working_set_bytes cases)? Apologies if you already posted that above and I just couldn't read everything.

Thanks in advance!

@emilgelman

I'm seeing the same behavior using Falco 0.35.1.
Running on AKS, a single node using Ubuntu 22.04.
Falco deployed using Helm with the default rule set. Pod memory is constantly increasing, while the cluster is practically idle (no other workloads running).
falco.yaml:

falco:
  log_level: debug
  syscall_event_drops:
    rate: 1000000000
    max_burst: 1000000000
  json_output: true
  json_include_output_property: true
  file_output:
    keep_alive: false
    enabled: false
  grpc_output:
    enabled: true
  grpc:
    enabled: true
driver:
  enabled: true
  kind: modern-bpf
image

@incertum
Contributor

incertum commented Jul 14, 2023

Just out of curiosity, is this particular host running cgroups v1 or cgroups v2? Thank you!

We will investigate further the cgroups-related memory metrics the OOM killer uses, and also @sboschman's use case where the binary is only used for k8saudit log filtering, meaning that in that scenario most of the libs code is not used (no kernel driver, no sinsp state, no container engine, basically no allocations, etc.).

Edit:

And maybe also show the RSS memory metric over time.

@emilgelman

emilgelman commented Jul 14, 2023

@incertum the host is running cgroups v2:

# stat -fc %T /sys/fs/cgroup/
cgroup2fs

I am experimenting with the effect of rules configuration on this. It seems that disabling all rules doesn't reproduce the issue, so I'm trying to understand if I can isolate it to specific rule/s.

@incertum
Contributor

incertum commented Jul 14, 2023

Hi @emilgelman, thanks, it's great news that you have cgroups v2. By the way, we now also have the base_syscalls config in falco.yaml for radical control over syscall monitoring; check it out.

However, I think we need to investigate in different places more drastically here (meaning going back to the drawing board), as this has also been reported for a plugins-only setup. In that case we merely do event filtering in libsinsp, so most of the libsinsp complexity does not apply, which kind of narrows down the search space.

I am going to prioritize looking 👀 into it; it will likely take some time.


In addition, in case you are curious to learn more about the underlying libs and kernel drivers with respect to memory:

  • Yes we do build up a process cache table in libsinsp, but we also hook into the scheduler process exit tracepoint to purge items from the table again, else the memory would skyrocket in no time.
  • The same applies for the container engine, therefore I suspect it must be something much more subtle while still being event driven.
  • Then there is the discussion around absolute memory usage, regardless of drifts over time. For example, we learned the hard way that the new eBPF ring buffer wrongly accounts memory twice; check out our conversation on the kernel mailing list. Adjusting parameters such as syscall_buf_size_preset and modern_bpf.cpus_for_each_syscall_buffer can help (see the sketch right after this list); again, this is just some extra insight, a bit unrelated to the subtle drifts over time we are investigating in this issue. I am also still hoping to one day meet someone who knows all the answers regarding Linux kernel memory management and accounting; often it's not even clear what the right metric is and whether it accounts memory in a meaningful way.
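To make those tuning knobs concrete, here is a small sketch of the relevant falco.yaml fragment (the same keys already appear in the config in the opening post; the exact meaning of the preset values is version dependent, so verify against your falco.yaml template):

# falco.yaml (sketch): ring buffer sizing
syscall_buf_size_preset: 4          # per-buffer size preset; 4 is the shipped default
modern_bpf:
  # with the modern eBPF probe, one buffer is shared by every N CPUs;
  # a larger N lowers total buffer memory at the cost of more contention
  cpus_for_each_syscall_buffer: 2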

@incertum
Contributor

Simulated a noisy Falco config on my developer Linux box. Enabling most supported syscalls was sufficient to simulate memory issues:

- rule: test
  desc: test
  condition: evt.type!=close
  enabled: true
  output: '%evt.type %evt.num %proc.aname[5] %proc.name %proc.tty %proc.exepath %fd.name'
  priority: NOTICE

Using valgrind massif heap profiler:

# load the kernel module driver
sudo insmod driver/falco.ko
# run Falco under the massif heap profiler
sudo valgrind --tool=massif \
         userspace/falco/falco -c ../../falco.yaml -r ../../falco_rules_test.yaml > /tmp/out

# inspect the recorded heap profile
massif-visualizer massif.out.$PID

image

Reading the tbb API docs (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Concurrent_Queue_Classes.html), we use the following variant: "By default, a concurrent_bounded_queue is unbounded. It may hold any number of values, until memory runs out." Currently we do not set a safety capacity, let alone expose it as a parameter.

Here is a staging branch to correct this: https://github.com/incertum/falco/tree/queue-capacity-outputs, what do you all think?

However, the root cause is rather that the entire event flow is too slow: basically we don't get to pop from the queue in time in these extreme cases, because we are seeing timeouts and also noticed heavy kernel-side drops. The pipe is just not holding up when trying to monitor so many syscalls, even on a more or less idle laptop. I would suggest we re-audit the entire Falco processing and outputs engine and look for areas of improvement, because when I did the same profiling with the libs sinsp-example binary, memory and output logs were pretty stable over time ...

@leogr
Member

leogr commented Jul 21, 2023

Reading the tbb API docs (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Concurrent_Queue_Classes.html), we use the following variant: "By default, a concurrent_bounded_queue is unbounded. It may hold any number of values, until memory runs out." Currently we do not set a safety capacity, let alone expose it as a parameter.

The rationale for an unbounded queue was that the output consumer must be responsive enough to accept all the alerts produced by Falco. When the output consumer is too slow, a dedicated watchdog emits an error message on stderr to notify the user that the configured output consumer is too slow or blocked. By design, this is the only case in which the queue can grow indefinitely.

Otherwise, if the memory is growing but the queue is not, there might be just an implementation bug. Have you checked that? 🤔

@incertum
Contributor

Thanks @leogr, all of the above is true. And for everyone reading this: unbounded queues can be a good choice, and more efficient anyway, if you have other controls in place beforehand.

The queue filling up is one very likely cause of memory growth in real-life production. At the same time, there can always be more bugs in other places. Using the heap profiler on my laptop added enough overhead/slowness to show these symptoms with that one noisy Falco rule. I have yet to get deeper into profiling.

My current recommendations:

Here I would expose a queue capacity to the end user and add a default value. Have it "Experimental" so we could remove it again should we find much better ways of handling heavy event pipes in future Falco releases.

We still need to discuss the recovery strategy:

  • exit, aka a self-imposed OOM kill?
  • swap the queue, aka drop, hoping we recover from some outlier bursts? I suspect that if we get here we are likely already heavily dropping kernel-side because of a full kernel->userspace buffer (meaning Falco is already not working), and there are existing notifications, as @leogr highlighted. I would try this option.

Sadly, none of this is a solution to get Falco to work on such heavier production servers or workload types. Opening a new ticket to discuss a re-audit of the Falco-specific outputs handling: #2691. The pragmatic expected outcome is that perhaps we can improve things; however, I doubt all problems will magically disappear, because we can't scale horizontally (throw more compute at the problem, as is typically done in, for example, big data stream processing). In fact, folks want a security tool that consumes almost no CPU and memory, but never drops events.

Considering Falco's primary use of alerting on abnormal behavior, I project that smarter, more advanced anomaly detection approaches could be more promising, to avoid having to deal with bursty outputs in the first place, but maybe I am biased 🙃.

Meanwhile, adopters can re-audit the syscalls they monitor (using the new base_syscalls option) and consider tuning Falco rules more. It may help with the problems described here.

@incertum
Contributor

incertum commented Aug 1, 2023

I opened the PR to expose the configs to set a custom capacity.
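For readers following along, the new setting is expected to look roughly like the fragment below; the key name and semantics are assumptions based on this discussion and may change before release, so check the PR for the actual syntax:

# falco.yaml (sketch): bound the internal outputs queue instead of leaving it unbounded
outputs_queue:
  capacity: 0   # 0 keeps the current unbounded behaviour; a positive value caps queued alerts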

@epcim
Author

epcim commented Aug 1, 2023

I was busy the last few weeks, but I plan to reconfigure/test all the findings from this thread in the coming weeks.

@incertum
Contributor

incertum commented Aug 1, 2023

Perfect. Yes, I would suggest first trying the option of setting a queue capacity; after test deployments we shall see whether there are still other issues in terms of memory actually leaking / increasing radically over time beyond expected limits. At least the simulation above shows that this is something that can currently happen, whereas with the capacity set I didn't observe memory leaking in the simulation.

At the same time, a reminder that this does not fix the root cause, see #2495 (comment)

In addition, we may need to experiment with the best default values across the various settings that control the outputs ...

@FedeDP
Contributor

FedeDP commented Aug 24, 2023

@epcim Can you try your initial config (the one pasted in the opening post), but with http_output disabled?
This is a wild guess, but it's worth a try!
Thank you (also, we are working hard to understand and reproduce the issue :) )
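For reference, relative to the falco.yaml in the opening post, that change would simply be:

http_output:
  enabled: false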

@LucaGuerra LucaGuerra added this to the 0.39.0 milestone May 30, 2024
@poiana
Contributor

poiana commented Aug 28, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Contributor

FedeDP commented Aug 29, 2024

/remove-lifecycle stale

@FedeDP
Contributor

FedeDP commented Sep 16, 2024

/milestone 0.40.0

We will release Falco 0.39.0 in a couple of weeks; we would really appreciate feedback on this issue with latest Falco if possible 🙏 thanks everyone!

@poiana poiana modified the milestones: 0.39.0, 0.40.0 Sep 16, 2024
@chenliu1993

chenliu1993 commented Oct 1, 2024

/milestone 0.40.0

We will release Falco 0.39.0 in a couple of weeks; we would really appreciate feedback on this issue with latest Falco if possible 🙏 thanks everyone!

This is still reproducible on 0.38.2 on physical servers. The memory increases until OOM.

The higher the event rate is, the faster the memory increases.

However, it's been hard in our env to play with the output queue option, and we don't want to drop events.


Update: the cgroup filesystem is:
stat -fc %T /sys/fs/cgroup/
tmpfs

@leogr
Member

leogr commented Oct 2, 2024

/milestone 0.40.0
We will release Falco 0.39.0 in a couple of weeks; we would really appreciate feedback on this issue with latest Falco if possible 🙏 thanks everyone!

This is still reproducible on 0.38.2 on physical servers. The memory increases until OOM.

The higher the event rate is, the faster the memory increases.

However, it's been hard in our env to play with the output queue option, and we don't want to drop events.

Update: the cgroup filesystem is stat -fc %T /sys/fs/cgroup/ → tmpfs

Thank you for reporting this. We really appreciate 🙏

If you have any chance to try Falco 0.39 (released yesterday), please let us know.

@chenliu1993

chenliu1993 commented Oct 2, 2024

@leogr sure, thanks for helping out. May I know what changes were introduced in this new version (0.39.0) to mitigate the memory increase issue?

@leogr
Member

leogr commented Oct 2, 2024

@leogr sure, thanks for helping out. May I know what changes were introduced in this new version (0.39.0) to mitigate the memory increase issue?

There were no specific changes for mem, because we couldn't reproduce the issue. Still, since the dev cycle lasted 4 months and we merged more than 100 PRs in libs, there may be a chance of improvements anyway (due to indirect fixes).

Also a further question:

The higher the event rate is, the faster the memory increases.

Are you referring to alerts (i.e. events that match a rule and output an alert) or just syscall events?

I'm asking this because I'm a bit skeptical that the root problem resides in the output queue.

@chenliu1993

chenliu1993 commented Oct 3, 2024

@leogr sure, thanks for helping out. May I know what changes were introduced in this new version (0.39.0) to mitigate the memory increase issue?

There were no specific changes for mem, because we couldn't reproduce the issue. Still, since the dev cycle lasted 4 months and we merged more than 100 PRs in libs, there may be a chance of improvements anyway (due to indirect fixes).

Also a further question:

The higher the event rate is, the faster the memory increases.

Are you referring to alerts (i.e. events that match a rule and output an alert) or just syscall events?

I'm asking this because I'm a bit skeptical that the root problem resides in the output queue.

Thanks

No, not alerts; this event rate I get from the metrics snapshot, i.e. the scap.evts_rate_sec value. And we are under heavy network traffic; could that be a reason for the OOM? Also, the more containers are running, the faster the increase.

And we are using http_output to push alerts/logs to sidekick.

But since we only push pretty high-level logs/alerts to sidekick, sidekick is basically sending nothing.

Some memory increase trend: after we updated the version to 0.38.0, it began to increase. Same for 0.38.2.

(screenshot)

@leogr
Member

leogr commented Oct 3, 2024

@chenliu1993

No, not alerts; this event rate I get from the metrics snapshot, i.e. the scap.evts_rate_sec value. And we are under heavy network traffic; could that be a reason for the OOM? Also, the more containers are running, the faster the increase.

👍

And we are using http_output to push alerts/logs to sidekick.

But since we only push pretty high-level logs/alerts to sidekick, sidekick is basically sending nothing.

I agree the http_output is not the issue.

Some memory increase trend: after we updated the version to 0.38.0, it began to increase. Same for 0.38.2.

Good to know, thanks. I was expecting that, since no significant changes were introduced in 0.38.2.

Further questions:

  • Are you using custom rules? Or just the default Falco ruleset?
  • Are you using plugins?
  • Have you customized the base_syscalls option?
  • May you share the full Falco configuration (ie. falco.yaml) you use?
  • System information (os, kernel version, driver used) would be useful too

Thanks in advance

@chenliu1993

chenliu1993 commented Oct 3, 2024

Are you using custom rules? Or just the default Falco ruleset?

Yes, there are custom rules; we are heavily using proc.xname, container, and k8s fields.

Are you using plugins?

No

Have you customized the base_syscalls option?

This we did try on some low-load physical servers: no significant change in memory, but evts.rate dropped; memory is still increasing slowly. It does, however, reduce memory on cloud servers.

May you share the full Falco configuration (ie. falco.yaml) you use?

config.txt

System information (os, kernel version, driver used) would be useful too

CentOS with 4.18.0-240.10.1.el7.x86_64 (memory consumption is higher than on RHEL)
RHEL with 5.14.0-427.22.1.el9_4.x86_64

These are the args:

 - args:
        - /usr/bin/falco
        - --cri
        - /run/containerd/containerd.sock
        - --cri
        - /run/crio/crio.sock
        - -pk
        - -o
        - engine.kind=kmod

Also, I can share some logs from the node where memory increases the fastest:

`{"hostname":"master-4","output":"Falco metrics snapshot","output_fields":{"evt.hostname":"master-4","evt.source":"syscall","evt.time":1727960245486161869,"falco.container_memory_used_mb":858.1,"falco.cpu_usage_perc":14.1,"falco.duration_sec":542699,"falco.evts_rate_sec":107172.8,"falco.host_boot_ts":1711092869000000000,"falco.host_cpu_usage_perc":33.2,"falco.host_memory_used_mb":319066.5,"falco.host_num_cpus":48,"falco.host_open_fds":136128,"falco.host_procs_running":22,"falco.kernel_release":"4.18.0-240.10.1.ves3.el7.x86_64","falco.memory_pss_mb":826.1,"falco.memory_rss_mb":826.0,"falco.memory_vsz_mb":5751.1,"falco.n_added_fds":17038456397,"falco.n_added_threads":149146559,"falco.n_cached_fd_lookups":36451412404,"falco.n_cached_thread_lookups":44814709948,"falco.n_containers":284,"falco.n_drops_full_threadtable":0,"falco.n_failed_fd_lookups":858849095,"falco.n_failed_thread_lookups":1208309236,"falco.n_fds":3760493,"falco.n_missing_container_images":0,"falco.n_noncached_fd_lookups":19109637694,"falco.n_noncached_thread_lookups":22658314331,"falco.n_removed_fds":3874486584,"falco.n_removed_threads":159875055,"falco.n_retrieve_evts_drops":5198866225,"falco.n_retrieved_evts":5104539237,"falco.n_store_evts_drops":0,"falco.n_stored_evts":5212691647,"falco.n_threads":16093,"falco.num_evts":60253340929,"falco.num_evts_prev":60156885388,"falco.outputs_queue_num_drops":0,"falco.start_ts":1727417545878155283,"falco.version":"0.38.2","scap.engine_name":"kmod","scap.evts_drop_rate_sec":0.0,"scap.evts_rate_sec":106467.0,"scap.n_drops":12727847,"scap.n_drops_buffer_clone_fork_enter":0,"scap.n_drops_buffer_clone_fork_exit":11,"scap.n_drops_buffer_close_exit":840,"scap.n_drops_buffer_connect_enter":13,"scap.n_drops_buffer_connect_exit":13,"scap.n_drops_buffer_dir_file_enter":0,"scap.n_drops_buffer_dir_file_exit":0,"scap.n_drops_buffer_execve_enter":1,"scap.n_drops_buffer_execve_exit":1,"scap.n_drops_buffer_open_enter":285,"scap.n_drops_buffer_open_exit":11,"scap.n_drops_buffer_other_interest_enter":0,"scap.n_drops_buffer_other_interest_exit":0,"scap.n_drops_buffer_proc_exit":0,"scap.n_drops_buffer_total":12726640,"scap.n_drops_bug":0,"scap.n_drops_page_faults":1207,"scap.n_drops_perc":2.0872408656547624e-06,"scap.n_drops_prev":12727845,"scap.n_evts":59903219517,"scap.n_evts_prev":59807399239,"scap.n_preemptions":0},"priority":"Informational","rule":"Falco internal: metrics snapshot","source":"internal","time":"2024-10-03T12:57:25.486161869Z"}`

And some from a node where memory increases slowly:

{"hostname":"master-14","output":"Falco metrics snapshot","output_fields":{"evt.hostname":"master-14","evt.source":"syscall","evt.time":1727962043929149577,"falco.container_memory_used_mb":214.4,"falco.cpu_usage_perc":2.3,"falco.duration_sec":544499,"falco.evts_rate_sec":49971.3,"falco.host_boot_ts":1711094152000000000,"falco.host_cpu_usage_perc":19.9,"falco.host_memory_used_mb":158874.1,"falco.host_num_cpus":48,"falco.host_open_fds":115072,"falco.host_procs_running":33,"falco.kernel_release":"4.18.0-240.10.1.ves3.el7.x86_64","falco.memory_pss_mb":197.3,"falco.memory_rss_mb":197.2,"falco.memory_vsz_mb":5128.6,"falco.n_added_fds":1089162999,"falco.n_added_threads":7080596,"falco.n_cached_fd_lookups":16972056897,"falco.n_cached_thread_lookups":25709477955,"falco.n_containers":53,"falco.n_drops_full_threadtable":0,"falco.n_failed_fd_lookups":375818295,"falco.n_failed_thread_lookups":44083880,"falco.n_fds":914543,"falco.n_missing_container_images":0,"falco.n_noncached_fd_lookups":11857527076,"falco.n_noncached_thread_lookups":5174432536,"falco.n_removed_fds":783105364,"falco.n_removed_threads":7078333,"falco.n_retrieve_evts_drops":1415126062,"falco.n_retrieved_evts":1281757179,"falco.n_store_evts_drops":0,"falco.n_stored_evts":1300682956,"falco.n_threads":2261,"falco.num_evts":29953801359,"falco.num_evts_prev":29908827163,"falco.outputs_queue_num_drops":0,"falco.start_ts":1727417544287913581,"falco.version":"0.38.2","scap.engine_name":"kmod","scap.evts_drop_rate_sec":0.0,"scap.evts_rate_sec":49978.1,"scap.n_drops":155,"scap.n_drops_buffer_clone_fork_enter":0,"scap.n_drops_buffer_clone_fork_exit":0,"scap.n_drops_buffer_close_exit":0,"scap.n_drops_buffer_connect_enter":0,"scap.n_drops_buffer_connect_exit":0,"scap.n_drops_buffer_dir_file_enter":0,"scap.n_drops_buffer_dir_file_exit":0,"scap.n_drops_buffer_execve_enter":0,"scap.n_drops_buffer_execve_exit":0,"scap.n_drops_buffer_open_enter":0,"scap.n_drops_buffer_open_exit":0,"scap.n_drops_buffer_other_interest_enter":0,"scap.n_drops_buffer_other_interest_exit":0,"scap.n_drops_buffer_proc_exit":0,"scap.n_drops_buffer_total":153,"scap.n_drops_bug":0,"scap.n_drops_page_faults":2,"scap.n_drops_perc":0.0,"scap.n_drops_prev":155,"scap.n_evts":29956317650,"scap.n_evts_prev":29911337389,"scap.n_preemptions":0},"priority":"Informational","rule":"Falco internal: metrics snapshot","source":"internal","time":"2024-10-03T13:27:23.929149577Z"}

@leogr
Member

leogr commented Oct 3, 2024

Just noticed:

"falco.n_added_fds": 17038456397,
    ....
"falco.n_removed_fds": 3874486584,

The delta is greater than 13 billion. cc @falcosecurity/libs-maintainers does it ring a bell? 🤔

@chenliu1993

Hi team, do we have any update? We see this kind of OOM on heavy-load nodes (more containers, like 200+) more often than on low-load nodes (fewer containers, usually less than 100). And sometimes, even when it is not reaching the memory limit, it also crashes.

@leogr
Member

leogr commented Oct 8, 2024

Hi team, do we have any update? We see this kind of OOM on heavy-load nodes (more containers, like 200+) more often than on low-load nodes (fewer containers, usually less than 100). And sometimes, even when it is not reaching the memory limit, it also crashes.

Not yet, sorry. We are investigating.

Could you share the log of the crashing Falco instance?

Also, let us know if you notice any difference with 0.39.

@chenliu1993

chenliu1993 commented Oct 9, 2024

Many thanks.

Not yet tried with 0.39.0; it's hard to upgrade to a new version at once.

But yes, here is the log from the crashed instance:
falco_restart.log

But as it shows, the output is not complete when the crash happens.

@leogr
Member

leogr commented Oct 9, 2024

Thank you!

I still notice the same pattern:

    "falco.n_added_fds": 22318709858,
    "falco.n_removed_fds": 5825272052,

22.3 billion added, only 5.8 billion removed. This seems strange to me.

@chenliu1993

chenliu1993 commented Oct 10, 2024

Thank you!

I still notice the same pattern:

    "falco.n_added_fds": 22318709858,
    "falco.n_removed_fds": 5825272052,

22.3 billion added, only 5.8 billion removed. This seems strange to me.

Thank you for looking into this so quickly.

Besides a possible libs issue, is there a possibility that a misconfiguration of Falco, or something else, may also be causing this issue?
And would you mind telling me what these values refer to, i.e. which part of Falco's functionality they track?

@leogr
Member

leogr commented Oct 10, 2024

And would you mind telling me what these values refer to, i.e. which part of Falco's functionality they track?

Those are the counters of the file descriptors being tracked by Falco. Unless I'm missing something, it looks like fds are added to the state maintained by the libs, but not cleaned up (I don't know why). This may explain the continuous memory increase.

Besides a possible libs issue, is there a possibility that a misconfiguration of Falco, or something else, may also be causing this issue?

One configuration option that comes to mind is base_syscalls. If Falco does not receive the syscall events that signal when a file descriptor is freed, it might keep track of it forever. I can't think of any other option right now, but there might be.
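To make that concrete: if base_syscalls is ever used to shrink the monitored set, the syscalls that clean up state (close, procexit, ...) must stay included, or repair mode should be enabled. A minimal sketch, assuming the base_syscalls options documented in recent falco.yaml templates (the custom_set below is purely illustrative, not a recommendation):

# falco.yaml (sketch)
base_syscalls:
  custom_set: [open, openat, connect, close, procexit]  # illustrative only
  repair: true   # let Falco add back the syscalls it needs to keep its internal state consistent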

@chenliu1993

chenliu1993 commented Oct 10, 2024

We have not set any base_syscalls config yet, but from the debug logs I can see these are the syscalls enabled by default:

accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execve, execveat, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, memfd_create, mount, open, open_by_handle_at, openat, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmsg, sendmsg, sendto, setgid, setpgid, setresgid, setresuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, timerfd_create, umount, umount2, userfaultfd, vfork

@dungngo4520

This problem is still reproducible on the current latest version (0.39.1).

After investigating, I found that Falco uses glibc's malloc/free. If the memory allocated is smaller than 128 kB, free does not actually return the memory to the system when called. See this Stack Overflow question.

If I use jemalloc to override the default glibc malloc/free, the memory still increases in a high-load environment, but DROPS after 20 min. And when I stop making it "high load", the memory returns to normal, with no sign of a memory leak.

For now, I will continue to use jemalloc to work around this problem, but I think that in the future Falco should switch to this lib instead of the "leaky" glibc malloc/free.
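For anyone who wants to try the same workaround in a containerized deployment, one option is to preload jemalloc via the environment. This is only a sketch: the library path is an assumption and jemalloc must actually be present in the Falco image:

# pod spec fragment (sketch): preload jemalloc for the Falco container
containers:
  - name: falco
    env:
      - name: LD_PRELOAD
        value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2   # assumed path inside the image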

@FedeDP
Contributor

FedeDP commented Nov 15, 2024

That's an interesting discovery, @dungngo4520!
I recently pushed an upstream branch in libs to test the usage of tbb malloc library: https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Linux_C_Dynamic_Memory_Interface_Replacement.html.
It should also help a bit with perf: https://github.com/falcosecurity/libs/tree/wip/test_tbbmem

We will definitely check jemalloc too and test which is the best one; will keep you all updated :)

@FedeDP
Contributor

FedeDP commented Nov 20, 2024

I opened a PR to port Falco release artifacts to jemalloc: #3406 🚀

@jemag

jemag commented Dec 2, 2024

After discussing on slack, I was told to provide information about a similar issue on our side (https://kubernetes.slack.com/archives/CMWH3EH32/p1733150481383569?thread_ts=1732623114.184849&cid=CMWH3EH32)


We seem to be experiencing a memory leak happening only on one of our nodes.

This only seems to happen under very particular circumstances, as we have only seen it on one out of 5 clusters, and only on a specific node.
Our clusters are fairly small (3-4 nodes) and the daemonset is not collecting audit logs (a separate deployment does so). We can see that even after increasing memory to 1536MiB, it just keeps increasing until it crashes:
image

Comparatively, pods on the other nodes and in other clusters run with very low memory (80-100MiB)

image
So this is definitely abnormal for us.

Here is another test we did increasing memory limit further to 2000MiB. We can see the linear increase of memory over time quite clearly:
image

Reminder that for all other nodes this is only around 80-100MiB instead.


Additional information:
We use helm to deploy with the following values:

falco:
  podSecurityContext:
    appArmorProfile:
      type: Unconfined
  serviceMonitor:
    create: true
    labels:
      app: kube-prometheus-stack
      release: kube-prometheus-stack
  extra:
    args:
      - --disable-cri-async
  resources:
    requests:
      cpu: 100m
      memory: 300Mi
    limits:
      cpu: "4"
      memory: 2000Mi
  tty: false
  controller:
    kind: daemonset
  driver:
    enabled: true
    kind: modern_ebpf
    modernEbpf:
      leastPrivileged: true
  collectors:
    enabled: true
    kubernetes:
      enabled: false
  metrics:
    enabled: true
  falco:
    grpc:
      enabled: true
    grpc_output:
      enabled: true
    rules_files:
      - /etc/falco/falco_rules.yaml
      - /etc/falco/falco-incubating_rules.yaml
      - /etc/falco/rules.d
    webserver:
      prometheus_metrics_enabled: true
  falcoctl:
    artifact:
      install:
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
          limits:
            memory: 100Mi
        # -- Enable the init container. We do not recommend installing (or following) plugins for security reasons since they are executable objects.
        enabled: true
      follow:
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
          limits:
            memory: 100Mi
        # -- Enable the sidecar container. We do not support it yet for plugins. It is used only for rules feed such as k8saudit-rules rules.
        enabled: true
    config:
      artifact:
        install:
          # -- List of artifacts to be installed by the falcoctl init container.
          refs: [falco-rules:3, falco-incubating-rules:3]
        follow:
          # -- List of artifacts to be installed by the falcoctl init container.
          refs: [falco-rules:3, falco-incubating-rules:3]
  falcosidekick:
    resources:
      requests:
        cpu: 50m
        memory: 40Mi
      limits:
        memory: 150Mi
    enabled: true
    webui:
      initContainer:
        resources:
          limits:
            memory: 128Mi
          requests:
            cpu: 10m
            memory: 50Mi
      ingress:
        enabled: true
        annotations:
          cert-manager.io/issuer: aks-dns-issuer
          nginx.ingress.kubernetes.io/auth-signin: https://$host/oauth2/start?rd=$escaped_request_uri
          nginx.ingress.kubernetes.io/auth-url: https://$host/oauth2/auth
          nginx.ingress.kubernetes.io/proxy-buffer-size: 8k
          nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
        hosts:
          - host: falco-ui.aks.azure.cloud-nuage.dfo-mpo.gc.ca
            paths:
              - path: /
        tls:
          - hosts:
              - falco-ui.aks.azure.cloud-nuage.dfo-mpo.gc.ca
            secretName: falco-sidekick-ui-tls-secret
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
        limits:
          memory: 100Mi
      ttl: 7d
      enabled: true
      replicaCount: 1
      redis:
        storageSize: "10Gi"
        resources:
          requests:
            cpu: 10m
            memory: 500Mi
          limits:
            memory: 600Mi
      disableauth: true
    config:
      teams:
        webhookurl: "https://086gc.webhook.office.com/webhookb2/536cb545-ca6a-457d-953d-e51eba402c16@1594fdae-a1d9-4405-915d-011467234338/IncomingWebhook/4679627f3f784184b5df1eca332ccc03/046b0d7c-7e51-4be5-96ee-a07207ea78ad"
        activityimage: ""
        outputformat: "all"
        minimumpriority: "notice"
    serviceMonitor:
      # -- enable the deployment of a Service Monitor for the Prometheus Operator.
      enabled: true
      # -- specify Additional labels to be added on the Service Monitor.
      additionalLabels:
        app: kube-prometheus-stack
        release: kube-prometheus-stack

  k8s-metacollector:
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
      limits:
        memory: 100Mi
    containerSecurityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      seccompProfile:
        type: RuntimeDefault
    serviceMonitor:
      create: true
      # -- path at which the metrics are expose by the k8s-metacollector.
      path: /metrics
      # -- labels set of labels to be applied to the ServiceMonitor resource.
      labels:
        app: kube-prometheus-stack
        release: kube-prometheus-stack
    grafana:
      dashboards:
        enabled: true

  #TODO: add exception for grafana unexpected connection
  customRules:
    custom.local.yaml: |-

      - list: devops_agents
        items: [
          iwlsacr.azurecr.io/ubuntu-agent,
          dmpacr.azurecr.io/azure-build-agent,
          ]

      # Rule : Change thread namespace

      - list: user_change_thread_namespace_images
        items: [
          docker.io/library/alpine,
          ghcr.io/inspektor-gadget/inspektor-gadget,
          ]

      - macro: weaveworks_kured
        condition: (container.image.repository=weaveworks/kured and proc.pname=kured)

      - rule: Change thread namespace
        condition: and (not weaveworks_kured) and (not container.id=host) and (not proc.name=calico-node) and (not container.image.repository in (user_change_thread_namespace_images))
        override:
          condition: append

      # Rule : Launch Privileged Container and Launch Excessively Capable Container

      - list: user_privileged_images
        items: [
          mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
          mcr.microsoft.com/aks/hcp/hcp-tunnel-front,
          mcr.microsoft.com/oss/calico/node,
          mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver,
          mcr.microsoft.com/oss/kubernetes/kube-proxy,
          weaveworks/kured,
          mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi,
          mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi,
          mcr.microsoft.com/mirror/docker/library/busybox,
          mcr.microsoft.com/aks/ip-masq-agent-v2,
          ghcr.io/inspektor-gadget/inspektor-gadget,
          docker.io/library/alpine,
          mcr.microsoft.com/oss/calico/pod2daemon-flexvol,
          docker.io/calico/node-driver-registrar,
          docker.io/calico/csi,
          ]

      - macro: user_privileged_containers
        condition: (container.image.repository in (user_privileged_images))
        override:
          condition: replace

      # Rule : Mount Launched in Privileged Container

      - list: user_privileged_mount_images
        items: [
          mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver,
          mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi,
          ]

      - macro: user_privileged_mount_containers
        condition: (container.image.repository in (user_privileged_mount_images))

      - rule: Mount Launched in Privileged Container
        condition: and not user_privileged_mount_containers
        override:
          condition: append

      # Rule : Terminal shell in container

      - list: user_expected_terminal_shell_images
        items: [
          devops_agents,
          docker.io/library/busybox,
          apscommonacr.azurecr.io/aks/ubuntu-agent,
          docker.io/grafana/promtail,
          ]

      - macro: user_expected_terminal_shell_in_container_conditions
        condition: (container.image.repository in (user_expected_terminal_shell_images))
        override:
          condition: replace

      # Rule : Delete or rename shell history

      - list: user_known_rename_shell_history_images
        items: [
          devops_agents,
          apscommonacr.azurecr.io/aks/ubuntu-agent,
          ]

      - macro: user_known_rename_shell_history_containers
        condition: (container.image.repository in (user_known_rename_shell_history_images))

      - macro: user_known_rename_shell_history_command_and_file
        condition: ((proc.cmdline startswith containerd and evt.arg.name startswith /var/lib/containerd/)
                    or (proc.cmdline startswith bash and evt.arg.name startswith /root/.bash_history))

      - rule: Delete or rename shell history
        condition: and not user_known_rename_shell_history_containers and not user_known_rename_shell_history_command_and_file
        override:
          condition: append

      # Rule : Contact K8S API Server From Container

      - list: user_known_contact_k8s_api_server_images
        items: [
          docker.io/weaveworks/kured,
          quay.io/kiwigrid/k8s-sidecar,
          quay.io/argoproj/argocd,
          ghcr.io/kyverno/kyverno,
          ghcr.io/kyverno/kyverno-cli,
          ghcr.io/kyverno/kyvernopre,
          ghcr.io/dexidp/dex,
          ghcr.io/argoproj-labs/argocd-extensions,
          ghcr.io/kyverno/policy-reporter-kyverno-plugin,
          ghcr.io/kyverno/policy-reporter,
          quay.io/argoproj/argocli,
          quay.io/kubescape/kubescape,
          quay.io/kubescape/storage,
          ghcr.io/argoproj-labs/argocd-extensions,
          docker.io/bitnami/kubectl,
          oci.external-secrets.io/external-secrets/external-secrets,
          ghcr.io/kyverno/background-controller,
          ghcr.io/aquasecurity/trivy-operator,
          ghcr.io/kyverno/cleanup-controller,
          ghcr.io/kyverno/reports-controller,
          docker.io/grafana/promtail,
          docker.io/falcosecurity/k8s-metacollector,
          ghcr.io/fjogeleit/trivy-operator-polr-adapter,
          ghcr.io/opencost/opencost,
          quay.io/jetstack/cert-manager-startupapicheck,
          ghcr.io/inspektor-gadget/inspektor-gadget,
          docker.io/falcosecurity/falcosidekick,
          ghcr.io/aquasecurity/node-collector,
          ghcr.io/stakater/reloader,
        ]

      - macro: user_known_contact_k8s_api_server_activities
        condition: (proc.cmdline startswith cainjector or
                    proc.cmdline startswith controller or
                    proc.cmdline startswith fluent-bit or
                    proc.cmdline startswith keda or
                    proc.cmdline startswith keda-adapter or
                    proc.cmdline startswith kube-state-metr or
                    proc.cmdline startswith kured or
                    container.image.repository in (user_known_contact_k8s_api_server_images) or
                    proc.cmdline startswith nginx-ingress or
                    proc.cmdline startswith prometheus or
                    proc.cmdline startswith operator or
                    proc.cmdline startswith secrets-store-c or
                    proc.cmdline startswith velero or
                    proc.cmdline startswith webhook)
        override:
          condition: replace

      # Rule : Launch Package Management Process in Container

      - list: user_known_package_manager_in_images
        items: [
          mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
          dmpacr.azurecr.io/azure-build-agent,
          docker.io/moby/buildkit,
          quay.io/kubecost1/opencost-ui,
          iwlsacr.azurecr.io/ubuntu-agent,
          apscommonacr.azurecr.io/aks/ubuntu-agent,
          docker.io/library/python,
        ]

      - macro: user_known_package_manager_in_container
        condition: (container.image.repository in (user_known_package_manager_in_images))
        override:
          condition: replace

      # Rule : Set Setuid or Setgid bit

      - list: user_known_set_setuid_or_setgid_images
        items: [
          devops_agents,
          docker.io/moby/buildkit,
        ]

      - macro: user_known_set_setuid_or_setgid_bit_conditions
        condition: (container.image.repository in (user_known_set_setuid_or_setgid_images) or
                    (evt.arg.filename startswith /var/lib/kubelet/pods/) or
                    (evt.arg.filename startswith /var/lib/containerd/io) or
                    (evt.arg.filename startswith /var/lib/containerd/tmpmounts) or
                    container.id="")
        override:
          condition: replace

      # Rule : Redirect STDOUT/STDIN to Network Connection in Container

      - list: user_known_redirect_stdin_stdout_network_images
        items: [
          mcr.microsoft.com/aks/hcp/hcp-tunnel-front,
          quay.io/argoproj/argocd,
          iwlsacr.azurecr.io/ubuntu-agent,
          apscommonacr.azurecr.io/aks/ubuntu-agent,
          registry.k8s.io/ingress-nginx/controller,
          ghcr.io/aquasecurity/trivy,
          docker.io/moby/buildkit,
        ]

      - list: user_known_redirect_stdin_stdout_network_processes
        items: [
          calico-node,
          kube-proxy,
          buildkitd,
        ]

      - macro: user_known_stand_streams_redirect_activities
        condition: (container.image.repository in (user_known_redirect_stdin_stdout_network_images)
                   or proc.name in (user_known_redirect_stdin_stdout_network_processes))
        override:
          condition: replace

      - rule: Redirect STDOUT/STDIN to Network Connection in Container
        output: Redirect stdout/stdin to network connection (gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4] fd.sip=%fd.sip connection=%fd.name lport=%fd.lport rport=%fd.rport fd_type=%fd.type fd_proto=fd.l4proto evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty %container.info container_duration=%container.duration)
        override:
          output: append

      # Rule : Read sensitive file untrusted

      - macro: user_read_sensitive_file_conditions
        condition: >
          (container.id=host) and
          (fd.name in (/etc/shadow))

      # Rule : Modify Shell Configuration File

      - list: user_known_modify_shell_config_images
        items: [
          docker.io/moby/buildkit,
          mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
        ]

      - macro: user_known_modify_shell_config_containers
        condition: (container.image.repository in (user_known_modify_shell_config_images))

      - list: user_know_modify_shell_config_host_cmdline_allowed
        items: [
          containerd,
        ]

      - macro: user_known_modify_shell_config_host_cmdline
        condition: (container.id=host and
                   (proc.cmdline in (user_know_modify_shell_config_host_cmdline_allowed)
                   or fd.name startswith /var/lib/containerd/tmpmounts
                   or fd.name startswith /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs)
                   or container.id="")

      - rule: Modify Shell Configuration File
        condition: and not user_known_modify_shell_config_host_cmdline and not user_known_modify_shell_config_containers
        override:
          condition: append

      # Rule : Clear Log Activities

      - list: user_known_clear_log_files_images
        items: [
          docker.io/moby/buildkit,
        ]

      - macro: allowed_clear_log_files
        condition: (container.id=host and
                   (fd.name startswith /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
                   or fd.name startswith /var/lib/containerd/tmpmounts
                   or fd.name startswith /var/log/azure-vnet.log)
                   or (container.image.repository in (user_known_clear_log_files_images))
                   or container.id="")
        override:
          condition: replace

      - macro: containerd_activities
        condition: or (proc.name=containerd and fd.name startswith "/var/log/azure-vnet.log")
        override:
          condition: append

      # Rule : DB program spawned process

      - list: user_known_db_spawned_process_images
        items: [
          dmpacr.azurecr.io/azure-build-agent,
        ]

      - macro: user_known_db_spawned_processes
        condition: (container.image.repository in (user_known_db_spawned_process_images))
        override:
          condition: replace

      # Rule : System procs network activity

      - list: user_expected_system_procs_network_activity_images
        items: [
          dmpacr.azurecr.io/azure-build-agent,
          quay.io/argoproj/argocd,
          ghcr.io/aquasecurity/trivy,
          docker.io/bitnami/kubectl,
          iwlsacr.azurecr.io/ubuntu-agent,
          apscommonacr.azurecr.io/aks/ubuntu-agent,
        ]

      - list: user_expected_system_procs_network_activity_conditions_pnames
        items: [
          azure-vnet,
          aks-log-collect,
          ]

      - macro: user_expected_system_procs_network_activity_conditions
        condition: ((container.image.repository in (user_expected_system_procs_network_activity_images))
                   or (proc.pname in (user_expected_system_procs_network_activity_conditions_pnames)))
        override:
          condition: replace

      # Rule : Launch Suspicious Network Tool in Container

      - list: user_known_network_tool_images
        items: [
          dmpacr.azurecr.io/zookeeper,
          dmpacr.azurecr.io/busybox-custom,
          docker.io/library/busybox,
        ]

      - macro: user_known_network_tool_activities
        condition: (container.image.repository in (user_known_network_tool_images))
        override:
          condition: replace

      # Rule : Non sudo setuid

      - rule: Non sudo setuid
        enabled: false
        override:
          enabled: replace

      # Rule : Create files below dev
      # NOTE: temporarity disable until missing image issue is fixed by Falco

      # - list: user_known_create_files_below_dev_images
      #   items: [
      #     mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
      #   ]
      #
      # - macro: user_known_create_files_below_dev_activities
      #   condition: (container.image.repository in (user_known_create_files_below_dev_images))

      - rule: Create files below dev
        enabled: false
        override:
          enabled: replace

      # Rule : Drop and execute new binary in container

      - list: known_drop_and_execute_containers
        items: [
          iwlsacr.azurecr.io/ubuntu-agent,
          docker.io/moby/buildkit,
          mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
        ]
        override:
          items: append

      - list: drop_and_execute_new_binary_in_container_cmdline
        items: [
          "crond -n -s",
        ]

      - rule: Drop and execute new binary in container
        condition: and not proc.cmdline in (drop_and_execute_new_binary_in_container_cmdline)
        override:
          condition: append

      # Rule : Read environment variable from /proc files

      - list: known_binaries_to_read_environment_variables_from_proc_files
        items: [
          Agent.Worker,
          .NET ThreadPool,
          .NET Sockets,
        ]
        override:
          items: append

      # Rule : Launch Ingress Remote File Copy Tools in Container

      - list: user_known_ingress_remote_file_copy_images
        items: [
          mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
          iwlsacr.azurecr.io/ubuntu-agent,
          docker.io/library/busybox,
        ]

      - macro: user_known_ingress_remote_file_copy_activities
        condition: (container.image.repository in (user_known_ingress_remote_file_copy_images))
        override:
          condition: replace

      # Rule : Unexpected UDP Traffic

      - rule: Unexpected UDP Traffic
        enabled: false
        override:
          enabled: replace

      # Rule : Contact EC2 Instance Metadata Service From Container

      - rule: Contact EC2 Instance Metadata Service From Container
        enabled: false
        override:
          enabled: replace

      # Rule : Change namespace privileges via unshare

      - rule: Change namespace privileges via unshare
        enabled: false
        override:
          enabled: replace

      # Rule : Contact cloud metadata service from container

      # disable rule since missing some field data
      - rule: Contact cloud metadata service from container
        enabled: false
        override:
          enabled: replace

      # Rule : Read ssh information

      - list: user_known_read_ssh_information_activities_images
        items: [
          quay.io/argoproj/argocd,
          ]

      - macro: user_known_read_ssh_information_activities
        condition: (container.image.repository in (user_known_read_ssh_information_activities_images))
        override:
          condition: replace

      # Rule : Create Hardlink Over Sensitive Files

      - list: pname_create_hardlink_over_sensitive_files
        items: [
          buildkit-runc,
          ]

      - rule: Create Hardlink Over Sensitive Files
        condition: and not proc.pname in (pname_create_hardlink_over_sensitive_files)
        override:
          condition: append

      # Rule: Schedule Cron Jobs

      - list: user_known_cron_jobs_images
        items: [
          mcr.microsoft.com/azuremonitor/containerinsights/ciprod,
          ]

      - macro: user_known_cron_jobs
        condition: (container.image.repository in (user_known_cron_jobs_images)
                    or container.id="")
        override:
          condition: replace

      # Rule: Exfiltrating Artifacts via Kubernetes Control Plane

      - list: exfiltrating_artifacts_via_kubernetes_control_plane_images
        items: [
          iwlsacr.azurecr.io/ubuntu-agent,
          apscommonacr.azurecr.io/aks/ubuntu-agent,
          ]

      - rule: Exfiltrating Artifacts via Kubernetes Control Plane
        condition: and not container.image.repository in (exfiltrating_artifacts_via_kubernetes_control_plane_images)
        override:
          condition: append

      # Rule: Launch Ingress Remote File Copy Tools in Container

      - list: user_known_ingress_remote_file_copy_activities_images
        items: [
          docker.io/moby/buildkit,
        ]

      - macro: user_known_ingress_remote_file_copy_activities
        condition: (container.image.repository in (user_known_ingress_remote_file_copy_activities_images))
        override:
          condition: replace


      # Rule: Packet socket created in container

      - list: packet_socket_created_in_container_images
        items: [
          ghcr.io/inspektor-gadget/inspektor-gadget,
          ]

      - rule: Packet socket created in container
        condition: and not container.image.repository in (packet_socket_created_in_container_images)
        override:
          condition: append

Environment

k8s, falco in container

Falco version:

Mon Dec  2 15:05:25 2024: Falco version: 0.39.2 (x86_64)
Mon Dec  2 15:05:25 2024: Falco initialized with configuration files:
Mon Dec  2 15:05:25 2024:    /etc/falco/falco.yaml | schema validation: ok
Mon Dec  2 15:05:25 2024: System info: Linux version 5.15.167.1-2.cm2 (root@CBL-Mariner) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Oct 29 03:13:52 UTC 2024
{"default_driver_version":"7.3.0+driver","driver_api_version":"8.0.0","driver_schema_version":"2.0.0","engine_version":"43","engine_version_semver":"0.43.0","falco_version":"0.39.2","libs_version":"0.18.2","plugin_api_version":"3.7.0"}

System info:

Mon Dec  2 15:06:10 2024: Falco version: 0.39.2 (x86_64)
Mon Dec  2 15:06:10 2024: Falco initialized with configuration files:
Mon Dec  2 15:06:10 2024:    /etc/falco/falco.yaml | schema validation: ok
Mon Dec  2 15:06:10 2024: System info: Linux version 5.15.167.1-2.cm2 (root@CBL-Mariner) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Oct 29 03:13:52 UTC 2024
Mon Dec  2 15:06:10 2024: Loading rules from:
Mon Dec  2 15:06:10 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
Mon Dec  2 15:06:10 2024:    /etc/falco/falco-incubating_rules.yaml | schema validation: ok
Mon Dec  2 15:06:10 2024:    /etc/falco/rules.d/custom.local.yaml | schema validation: ok
Mon Dec  2 15:06:10 2024: /etc/falco/rules.d/custom.local.yaml: Ok, with warnings
4 Warnings:
In rules content: (/etc/falco/falco-incubating_rules.yaml:0:0)
    rule 'System procs network activity': (/etc/falco/falco-incubating_rules.yaml:663:2)
------
- rule: System procs network activity
  ^
------
LOAD_NO_EVTTYPE (Condition has no event-type restriction): Rule matches too many evt.type values. This has a significant performance penalty.
In rules content: (/etc/falco/falco-incubating_rules.yaml:0:0)
    rule 'Unexpected UDP Traffic': (/etc/falco/falco-incubating_rules.yaml:746:2)
------
- rule: Unexpected UDP Traffic
  ^
------
LOAD_NO_EVTTYPE (Condition has no event-type restriction): Rule matches too many evt.type values. This has a significant performance penalty.
In rules content: (/etc/falco/falco-incubating_rules.yaml:0:0)
    rule 'Network Connection outside Local Subnet': (/etc/falco/falco-incubating_rules.yaml:1108:2)
------
- rule: Network Connection outside Local Subnet
  ^
------
LOAD_NO_EVTTYPE (Condition has no event-type restriction): Rule matches too many evt.type values. This has a significant performance penalty.
In rules content: (/etc/falco/rules.d/custom.local.yaml:0:0)
    list 'user_known_ingress_remote_file_copy_images': (/etc/falco/rules.d/custom.local.yaml:388:2)
------
- list: user_known_ingress_remote_file_copy_images
  ^
------
LOAD_UNUSED_LIST (Unused list): List not referred to by any other rule/macro
{
  "machine": "x86_64",
  "nodename": "falco-k65rc",
  "release": "5.15.167.1-2.cm2",
  "sysname": "Linux",
  "version": "#1 SMP Tue Oct 29 03:13:52 UTC 2024"

Cloud provider: AKS
OS:

NAME="Common Base Linux Mariner"
VERSION="2.0.20241029"
ID=mariner
VERSION_ID="2.0"
PRETTY_NAME="CBL-Mariner/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://aka.ms/cbl-mariner"
BUG_REPORT_URL="https://aka.ms/cbl-mariner"
SUPPORT_URL="https://aka.ms/cbl-mariner"

Kernel: Linux aks-default-24804127-vmss00000A 5.15.167.1-2.cm2 #1 SMP Tue Oct 29 03:13:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Installation method: helm

FedeDP (Contributor) commented Dec 6, 2024

@jemag can you try using jemalloc as the allocator on the faulty node?
See #2495 (comment). Jemalloc wiki for reference: https://github.com/jemalloc/jemalloc/wiki/Getting-Started
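
A minimal sketch of what that could look like for a Helm-based deployment, assuming the chart version in use exposes an extra.env list for injecting environment variables (check the chart's values.yaml) and that the Falco image actually ships libjemalloc (the upstream image may not, in which case a custom image layer or a hostPath mount of the library would be needed); the library path and the MALLOC_CONF tuning below are illustrative assumptions, not values taken from this thread:

    # values.yaml fragment -- illustrative sketch only
    extra:
      env:
        - name: LD_PRELOAD
          value: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2   # adjust to wherever the image installs jemalloc
        - name: MALLOC_CONF
          value: "background_thread:true,dirty_decay_ms:1000"  # optional: return dirty pages to the OS sooner

With LD_PRELOAD set, the dynamic linker loads jemalloc before glibc's malloc, so Falco's allocations are interposed without rebuilding the binary.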

jemag commented Dec 6, 2024

@FedeDP unfortunately yesterday we cycled through nodes during an AKS node image update and the problematic node is now gone.

I cannot reproduce the problem now. I don't know what specific circumstances could have caused this, especially since all those nodes were part of an Azure VM scale set and should all be similar. If I notice the problem happening again, I will make sure to try jemalloc.
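
If it does recur, one quick way to confirm the preload took effect (assuming a shell and pidof are available inside the Falco container) is to check that the library shows up in the process's memory maps, for example with grep jemalloc /proc/$(pidof falco)/maps.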
