new(driver, libsinsp, libscap): Add kernel signals exe_ino, exe_ino_ctime, exe_ino_mtime, pidns_init_start_ts + derived filter fields #595

Merged
merged 23 commits into from
Dec 5, 2022

Conversation

incertum
Contributor

@incertum incertum commented Sep 12, 2022

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Dropping an implant, making the file executable, and executing it is among the oldest tricks. While memory-based attacks mostly avoid touching disk, reliably detecting drift, that is, the execution of a suspicious new executable, is often considered a crucial baseline detection.

Falco's upstream rules "Container Drift Detected (chmod)" and "Container Drift Detected (open+create)" aim to detect the creation of a new executable in a container (drift). However, both rules are disabled by default because they can be noisy in un-profiled environments and workloads. In addition, there is currently no easy or robust mechanism to correlate these rules, which are based on file operation events, with the events where the executable is run (execve).

This PR attempts to address this gap by adding enhanced kernel signals to spawned processes. While the proposed signals won't replace the need to monitor file operation events, they can help reduce the search space for tracking spawned processes where, for example, chmod +x was run against the executable file on disk prior to execution (this causes the inode's ctime to change, but we don't know whether it was chmod related or a different status change operation). In addition, end users could use these fields in selected rules to augment the information available for incident response.
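The ctime behaviour mentioned above can be observed from user space. The following is a minimal sketch in Python, assuming a Linux filesystem with sub-second timestamps; the helper name `stat_times_after_chmod` is hypothetical and not part of this PR.

```python
import os
import stat
import tempfile
import time


def stat_times_after_chmod():
    """Create a file, make it executable, and return os.stat() before and after."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "a.sh")
        with open(path, "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        os.chmod(path, 0o600)
        before = os.stat(path)
        # Give the filesystem clock time to advance past its timestamp granularity.
        time.sleep(0.05)
        # Equivalent of `chmod u+x`: only inode metadata changes, not file content.
        os.chmod(path, before.st_mode | stat.S_IXUSR)
        after = os.stat(path)
    return before, after


before, after = stat_times_after_chmod()
print("inode unchanged:", before.st_ino == after.st_ino)
print("mtime unchanged:", before.st_mtime_ns == after.st_mtime_ns)
print("ctime delta (ns):", after.st_ctime_ns - before.st_ctime_ns)
```

The inode number and mtime stay the same while ctime moves forward, which is why ctime alone cannot tell a chmod apart from other status changes such as chown or a link count update.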

New derived filter fields based on new kernel signals

"proc.exe_ino", "Inode number of executable image file on disk", "The inode number of the executable image file on disk. Can be correlated with fd.ino."

"proc.exe_ino.ctime", "Last status change time (ctime - epoch ns) of exe file on disk", "Last status change time (ctime - epoch nanoseconds) of executable image file on disk (inode->ctime). Time is changed by writing or by setting inode information e.g. owner, group, link count, mode etc."

"proc.exe_ino.mtime", "Last modification time (mtime - epoch ns) of exe file on disk", "Last modification time (mtime - epoch nanoseconds) of executable image file on disk (inode->mtime). Time is changed by file modifications, e.g. by mknod, truncate, utime, write of more than zero bytes etc. For tracking changes in owner, group, link count or mode, use proc.exe_ino.ctime instead."

"proc.exe_ino.ctime_duration_proc_start", "Number of nanoseconds between ctime exe file and proc clone ts", "Number of nanoseconds between modifying status of executable image and spawning a new process using the changed executable image."

"proc.exe_ino.ctime_duration_pidns_start", "Number of nanoseconds between pidns start ts and ctime exe file", "Number of nanoseconds between pid namespace start ts and ctime exe file if pidns start predates ctime."

"proc.pidns_init_start_ts", "Start ts of pid namespace (epoch ns)", "Start ts (epoch ns) of pid namespace; approximate start ts of container if pid in container or start ts of host if pid in host namespace."

"container.start_ts", "Container start ts (epoch in ns)", "Container start ts (epoch in ns) based on proc.pidns_init_start_ts."

"container.duration", "Number of nanoseconds since the container start ts", "Number of nanoseconds since the container start ts."
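The field descriptions above suggest how the derived values relate to the raw kernel signals. The following is a rough sketch in Python, not the libsinsp implementation: the function names are hypothetical, all timestamps are assumed to be epoch nanoseconds, and returning 0 when the expected ordering does not hold is an assumption made for illustration.

```python
NS_PER_SEC = 1_000_000_000


def ctime_duration_proc_start(exe_ino_ctime: int, clone_ts: int) -> int:
    """Nanoseconds between the exe file's ctime and the process clone ts.

    Returns 0 if ctime does not predate the clone ts (assumed fallback).
    """
    return clone_ts - exe_ino_ctime if clone_ts > exe_ino_ctime else 0


def ctime_duration_pidns_start(pidns_init_start_ts: int, exe_ino_ctime: int) -> int:
    """Nanoseconds between pid namespace start and the exe file's ctime.

    Only meaningful if the pid namespace start predates ctime; otherwise 0.
    """
    return exe_ino_ctime - pidns_init_start_ts if exe_ino_ctime > pidns_init_start_ts else 0


def container_duration(pidns_init_start_ts: int, evt_ts: int) -> int:
    """Nanoseconds since the (approximate) container start ts."""
    return evt_ts - pidns_init_start_ts if evt_ts > pidns_init_start_ts else 0


# Made-up example timeline: the container starts, a file's status changes 5 s
# later, it is executed 2 s after that, and the event is observed 1 s later.
pidns_start = 1_670_000_000 * NS_PER_SEC
ctime = pidns_start + 5 * NS_PER_SEC
clone = ctime + 2 * NS_PER_SEC
evt = clone + 1 * NS_PER_SEC

print(ctime_duration_proc_start(ctime, clone))         # 2000000000
print(ctime_duration_pidns_start(pidns_start, ctime))  # 5000000000
print(container_duration(pidns_start, evt))            # 8000000000
```

A small `ctime_duration_proc_start` combined with a non-zero `ctime_duration_pidns_start` would describe an executable whose status changed after the container started and shortly before it was run.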

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Includes cleanup, mainly making the sched_prog_exec_4 and execve_family_flags fillers alike in terms of style. Refactored (no logic changes) get_exe_writable to avoid a few redundant _READ()s on the same kernel structures within the same filler (@LucaGuerra).

This PR is not yet ready. Hoping for some early feedback to make these new signals better :)

Checklist (this PR)

Checklist (future PR)

  • Initial attempt for on host anomaly detection for container drift use case (expectation would be to first PR a proposal doc as this would mark a significant new feature)

Does this PR introduce a user-facing change?:

new: kernel signals `exe_ino`, `exe_ino_ctime`, `exe_ino_mtime`, `pidns_init_start_ts`, plus derived filter fields

@FedeDP
Contributor

FedeDP commented Sep 12, 2022

This is a huge PR @incertum!
A small note that must be addressed before leaving "wip" state:

  • I think scap_procs must be updated to fetch the new info from proc, if we can, right?

Aside from this, it looks really cool, thank you!

@loresuso
Member

Hello @incertum! The container drift detection seems to be something really significant, thanks for spending time on it and trying to detect this kind of behavior!
Since I see in your checklist that you are open to discuss also different ideas, although this one is really cool, I want to ask you what you think about this other approach that I came up with and described here:

#287

The main problem of this approach is that it relies on overlayfs and so it cannot work with old kernels and container runtimes that do not use it. It also needs to be tested across a wider variety of kernels to be sure that it's working, since it was like an experiment for me. I would be happy to know what you think about it!

@incertum
Contributor Author

@FedeDP ty let me look into scap-procs - once feature complete will implement this for modern_bpf, scap file and kmod, always leave the kmod fun for the end :) Also still need to test this on more kernel versions and distros than just the one I was quickly developing on ...

@loresuso was actually lurking around that overlayfs PR a good while ago. Thanks for experimenting ❤️! In general I believe more and stronger kernel signals just like the one you proposed are needed, let's chat more.
What is needed to merge it? I approve, really nice work and think this is an excellent feature that adds even more signal for the container use case and I think it's ok that it doesn't work for super old kernels etc. Besides containers would also be interested in nailing this for bare-metal hosts.

@loresuso more signals are needed for detecting memory attacks or RCE in a more general and robust way (executables are just one aspect), one step at a time though. And I saw you also refactored the get_exe_writable and created similar get_exe_inode lol, we can sync on how to merge this cleanup into one approach.
Also once all new kernel signals we can come up with at the moment are merged, wanna team up on creating a strong and robust userspace logic to nail it? Would be amazing if some rules come out at the other end that can be enabled by default aka they can work in unknown environments. Called it anomaly detection, but we can also call it advanced signal correlation etc 🙃.


Re fetching the container start time: the pid namespace creation time works too. Still monkeying around to see if this is best implemented kernel side. Something like somehow fetching the start time of pid=1 as seen from the process namespace or the creation ts of the pid namespace the process belongs to ... would you have any thoughts on this?

@LucaGuerra
Contributor

Hey folks, I'd like to add my thoughts to the discussion since I originally introduced the is_exe_writable flag for this purpose, discussed its evolution, is_exe_upper_layer, at length with Lorenzo, and am very interested in catching suspicious executions. While it's true that attacks can be fully in memory (which would bypass any file-based rule of course), we all know that a defense-in-depth strategy needs to consider many cases. Also, I expect the most common attacks to be indeed file based. This is a bit of a larger discussion that we may want to expand somewhere.

Regarding attack scenarios, the proposed fields would allow us to add another way to filter events to try and reduce the noise from this kind of rule. I would love for Falco to be able to have a set of rules to deal with the standard "drop + execute" case. This is what comes to mind:

  • In containers you can do, depending on how your container is built, one of two things: you either use is_exe_writable with containers that run as a regular user but have executable files normally owned by root (this is the default if you run as user!) or is_exe_upper_layer, which works with containers executed as root as well 😎 This alerts for new executables at all times.
  • On hosts in my opinion the best bet is is_exe_writable and inspect non-root users I think because root on a host does way too many things 😭 . Installing and updating software is common, downloading and running software happens often during normal deployments ... So many regular actions would trigger drop+execute that may make this pretty useless :/ But in some deployments it's not expected for regular users to bring their own binaries, and that's what I would want to catch. Also, remember that true root can change mtime and ctime of all files if it wants.

@incertum 's idea I think is definitely clever, as it allows us to add the time dimension to the above. You can say "If a regular user is running an executable that they can modify AND it has been modified 'recently', then alert". This allows us to detect drops without drowning in noise caused by system-wide software updates and new deployments. Same goes for containers. In that case I like the stronger properties of is_exe_upper_layer because you can't easily evade it if you're inside a container. Even if you drop a file today and schedule its execution at some other time it will be caught.

In conclusion, I probably want all of these fields 😎 I actually wanted is_exe_upper_layer in 0.33.0 but there's so much content that is going into that release that we probably want to merge it right after the release so we have time to test it and see that it doesn't break too many things (every new thing happening at process start as you could see is a little tricky...). Does it make sense to you? The first step as you mentioned could be to refactor and generalize the exe inode data collection in the kernel and ebpf.
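The "modifiable AND modified recently" condition described above could be prototyped in userspace along these lines. This is only a sketch under stated assumptions: `ExecEvent`, `is_suspicious_drop_and_execute`, and the 60-second window are hypothetical and not part of Falco or this PR, and a duration of 0 is treated as "unknown".

```python
from dataclasses import dataclass

NS_PER_SEC = 1_000_000_000


@dataclass
class ExecEvent:
    """Hypothetical view of an execve event carrying the fields discussed here."""
    uid: int
    is_exe_writable: bool
    ctime_duration_proc_start_ns: int  # 0 is treated as "unknown" in this sketch


def is_suspicious_drop_and_execute(evt: ExecEvent,
                                   recent_window_ns: int = 60 * NS_PER_SEC) -> bool:
    """Flag a regular user running an executable they can modify and that
    changed status within `recent_window_ns` before the process was spawned."""
    if evt.uid == 0:
        # Root is skipped: installs and updates make root far too noisy on hosts.
        return False
    if not evt.is_exe_writable:
        return False
    return 0 < evt.ctime_duration_proc_start_ns <= recent_window_ns


# Executable changed 5 s before being run by uid 1000: flagged.
print(is_suspicious_drop_and_execute(ExecEvent(1000, True, 5 * NS_PER_SEC)))
# Same file, but last changed a day before execution: not flagged.
print(is_suspicious_drop_and_execute(ExecEvent(1000, True, 86_400 * NS_PER_SEC)))
```

The window length is the main tuning knob: a short window reduces noise from deployments but misses an attacker who drops a file and schedules its execution for later, which is the case is_exe_upper_layer is meant to cover in containers.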

@incertum
Contributor Author

@LucaGuerra ❤️ 😎 as always a fantastic summary and technical assessment of what the actual problem here is. Fully agree that all these signals combined will be super valuable in addition to existing metadata fields. It's nice to see three folks having come to similar conclusions, that is, (1) it is at process startup where we need to fetch better kernel signals and (2) this old problem "drop+execute" has not yet been well addressed.

Of course the "host" is the more tricky one, doesn't change the fact that I have been asked to fix / solve this ... so thinking we won't get away without determining a pattern of past behavior of the applications that are running, and analyze behaviors outside the past behavior. There will be both data modeling challenges and software implementation challenges, the good news is similar problems have been solved in the industry before and we can build upon this. Needless to say let's start more basic and iterate.

How about first merging @loresuso PR that features is_exe_upper_layer after the upcoming release freeze, I'll continue monkeying around a bit for next 2 weeks and see if maybe there are more kernel signals that could be valuable. Perhaps you stumble across something new as well 🙃 that would be cool.
After everything is merged we collaborate on a fresh PR that just does userspace modeling? Also happy to offer deploying a prototype to production to be able to better assess how well it may work and also check that Falco does not deteriorate in case we introduce some significant new userspace features.


... Also, I expect the most common attacks to be indeed file based. This is a bit of a larger discussion that we may want to expand somewhere.

Would you have ideas re what the best forum would be to expand on those Threat Modeling discussions?

Also, remember that true root can change mtime and ctime of all files if it wants.

Yeah you can never just have nice things in security, hence why I am a big fan of multi-signal correlations.

@loresuso
Member

loresuso commented Sep 15, 2022

Thanks @incertum @LucaGuerra, this conversation is getting more and more interesting!
I strongly agree that all these signals combined together are needed to improve the detection capabilities of the drop+execute pattern. So, soon after the release, I'll try my best to get the exe_upper_layer merged. Some help in testing it better before the merge would be really appreciated!
Also, I wanted to say that I am thrilled to team up altogether to discuss how to improve the detection capabilities of Falco with these new signals.

I also believe that we have to expand the conversation (maybe in Slack or a GitHub issue?) to other attack patterns. I think we may want to research a bit on fileless execution (especially the one implemented with memfd_create, or execution from tmpfs) and post-container-escape behaviors (like accessing files outside overlayfs from a not mounted fs). I think these patterns are widespread too nowadays and I have some ideas that I would love to share with you!

@incertum
Contributor Author

incertum commented Sep 15, 2022

Edited: We have moved all brainstorming to #615 in order to keep this PR focused.

@poiana poiana added size/XL and removed size/L labels Sep 19, 2022
@incertum
Contributor Author

Kernel side solution for robustness reasons: Add pid namespace init task start ts to generically approx container or host start ts and compute time deltas useful for detections, such as container duration or duration between pidns start ts and ctime exe file if pidns ts predates ctime. A general detection use case can be that if suspicious events happen in multiple containers of a deployment near container start it's more likely to be "normal". The longer a container runs the longer it is "exposed".

What questions do you have re the proposed approach to solve the above? Would it be possible to check the soundness of this approach? This would be much appreciated. Initial experimentation seemed to yield correct ts values for various scenarios, but I will continue testing.

@incertum
Contributor Author

Another kernel side signal that I would like to look into and possibly add to this PR would be:

"Interpreter scripts" aka text files with execute permissions (see https://man7.org/linux/man-pages/man2/execve.2.html)
For example, chmod +x a.sh && ./a.sh or chmod +x a.sh && exec ./a.sh is currently logged as "proc.exepath":"/tmp/a.sh","proc.name":"a.sh","proc.cmdline":"a.sh ./a.sh", even though the interpreter was configured as #! /bin/sh. We wouldn't know which interpreter binary ran the script, or even that the file was not a binary, without inferring it from the file extension (if one is available at all), and we know how fragile that is.

Please note, not talking about the use case where you run the interpreter and pass the script, like /bin/sh a.sh would give "proc.exepath":"/bin/sh","proc.name":"sh","proc.cmdline":"sh a.sh".
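For illustration, the interpreter of such a script can be recovered from the file itself by reading the `#!` line that execve(2) documents. The following is a userspace sketch in Python only; the helper name `shebang_interpreter` is hypothetical, and a kernel side signal would have to obtain this information differently.

```python
import os
import tempfile
from typing import Optional


def shebang_interpreter(path: str) -> Optional[str]:
    """Return the interpreter path from a '#!' first line, or None if absent."""
    with open(path, "rb") as f:
        first_line = f.readline(256)
    if not first_line.startswith(b"#!"):
        return None
    # Whitespace after '#!' is allowed; anything after the path is an optional argument.
    parts = first_line[2:].strip().split(None, 1)
    return parts[0].decode(errors="replace") if parts else None


with tempfile.TemporaryDirectory() as tmpdir:
    script = os.path.join(tmpdir, "a.sh")
    with open(script, "w") as f:
        f.write("#! /bin/sh\necho hello\n")
    fake_binary = os.path.join(tmpdir, "a.bin")
    with open(fake_binary, "wb") as f:
        f.write(b"\x7fELF\x02\x01\x01")  # only the ELF magic, not a real binary

    script_result = shebang_interpreter(script)
    binary_result = shebang_interpreter(fake_binary)

print(script_result)  # /bin/sh
print(binary_result)  # None
```

Unlike the file extension, the first line is what the kernel actually uses to pick the interpreter, so it is a more reliable indicator that the executed file was a script.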

Any thoughts on above? @LucaGuerra @loresuso @FedeDP @Andreagit97

After that this PR should be feature complete and I can start finalizing it, followed by a code optimization review.

@incertum incertum force-pushed the new-executable-enhanced-signal branch from d3d1c9d to a5510e7 Compare September 23, 2022 05:27
@incertum incertum changed the title [WIP] - Add kernel signals exe_ino, exe_ino_ctime, exe_ino_mtime [WIP] - Add kernel signals exe_ino, exe_ino_ctime, exe_ino_mtime, pidns_init_start_ts + derived filter fields Sep 23, 2022
@incertum incertum force-pushed the new-executable-enhanced-signal branch from a5510e7 to b4b54f4 Compare September 24, 2022 00:15
@poiana poiana removed the size/XL label Sep 24, 2022
incertum and others added 11 commits December 2, 2022 11:12
Consistently have constant m_boot_ts_epoch for pidns_init_start_ts when vpid != pid.

Signed-off-by: Melissa Kilby <[email protected]>
* Add pidns_init_start_time to sched_prog_fork_3.
* Ensure consistent unsigned long long usage and init variable properly.

Signed-off-by: Melissa Kilby <[email protected]>
…to sched bpfs

* cleanup some debugging leftovers.

Signed-off-by: Melissa Kilby <[email protected]>
* address minor reviewers comments
* properly init some variables to 0 that were overlooked
* use new macro CHECK_RES(res)
* perform pidns start ts lookup only when in childtid (raw syscall tracepoints)
* formalize consistent helper function epoch_ns_from_time also in modern_bpf
* minor modern_bpf refactor based on reviewers comments
* additional cleanup after a fresh look

Co-authored-by: Andrea Terzolo <[email protected]>
Signed-off-by: Melissa Kilby <[email protected]>
* remove redundant CHECK_RES(res) when possible
* cleanup epoch_ns_from_time helper function
* modern_bpf rename function variable for extract__task_pidns_start_time

Co-authored-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Melissa Kilby <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
@incertum incertum force-pushed the new-executable-enhanced-signal branch from 18f1676 to 140f612 Compare December 2, 2022 19:12
@Andreagit97 Andreagit97 force-pushed the new-executable-enhanced-signal branch from 4693f3a to db9270e Compare December 3, 2022 22:49
Signed-off-by: Andrea Terzolo <[email protected]>
@Andreagit97 Andreagit97 force-pushed the new-executable-enhanced-signal branch from db9270e to 074625a Compare December 3, 2022 22:59
@Andreagit97
Member

The last commit should fix windows CI, just removed scap_get_host_boot_time_ns() helper since we already have scap_get_boot_time() :)

@incertum
Contributor Author

incertum commented Dec 4, 2022

The last commit should fix windows CI, just removed scap_get_host_boot_time_ns() helper since we already have scap_get_boot_time() :)

lol classic, thanks for keeping the new more reliable method to get a constant boot ts :)

Signed-off-by: Andrea Terzolo <[email protected]>
Contributor

@FedeDP FedeDP left a comment

/approve

@poiana
Contributor

poiana commented Dec 5, 2022

LGTM label has been added.

Git tree hash: 7f7068dcd291f5ed76d4ec430bd06adb37f263bf

@poiana poiana added the approved label Dec 5, 2022
@poiana
Contributor

poiana commented Dec 5, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, incertum, LucaGuerra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 4e52eed into falcosecurity:master Dec 5, 2022
@incertum incertum deleted the new-executable-enhanced-signal branch December 8, 2023 20:42
7 participants