Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement containers metadata support as a plugin #3403

Open
FedeDP opened this issue Nov 13, 2024 · 6 comments
Open

Reimplement containers metadata support as a plugin #3403

FedeDP opened this issue Nov 13, 2024 · 6 comments
Milestone

Comments

@FedeDP
Copy link
Contributor

FedeDP commented Nov 13, 2024

Motivation

Historically, Falco container's information retrieval has always caused headaches to users; moreover, the current code does cause headaches to developers too.
While it's not always working fine, it also has some limitations:

  • There is no support for multiple container runtimes on a single node: Support multiple container runtime in one node #3279
  • Since the container info retrieval is done asynchronously at first clone/execve/fork event, we lose container metadata information for the very first syscall events generated by threads spawned in a new container
  • As previously stated, the code is also very hard to extend; there has been some previous attempts to improve it but for now that's what we got

Feature

Rewrite container info support as a plugin, and completely drop libsinsp internal implementation.
The plugin will leverage container engines SDKs to get notified when new containers are created/deleted; the container creation happens before the container is even started thus it should give us a time advantage and should avoid event without container metadata.
It will also be responsible of extracting all container and remaining k8s related filterchecks.

Main idea is to drop container manager from sinsp, and container_id from threadinfo.
The container_id will be added as a foreign key leveraging the libsinsp state table API by the plugin, that will take care of filling it.
Basically, sinsp without the plugin loaded will be completely unaware of containers.

All of the above should be done without breaking changes for the end users. In Falco 0.39 we added a container_engines configuration key: https://github.com/falcosecurity/falco/blob/master/falco.yaml#L1312; since it is in incubating state, we can drop it (since we will rely on the plugin init_config of course). That should be the only user-facing change.

The container plugin github link: https://github.com/FedeDP/container_plugin
Its readme has the architecture explanation and diagram: https://github.com/FedeDP/container_plugin/raw/main/architecture.svg
Also, there is a TODO file with stuff letf to do: https://github.com/FedeDP/container_plugin/blob/main/TODO.md

Ongoing libs PR with big container related stuff cleanup: falcosecurity/libs#2207
See the PR for relevant TODOs.

Alternatives

No clear alternative.

Additional context

Linking required PRs:

Listing related issues/PRs:

Next step would be to move user/group manager to a plugin too.

@FedeDP
Copy link
Contributor Author

FedeDP commented Nov 13, 2024

/milestone 0.40.0

@poiana poiana added this to the 0.40.0 milestone Nov 13, 2024
@leogr
Copy link
Member

leogr commented Nov 13, 2024

Thank you @FedeDP , this is awesome! 🤩

@Issif
Copy link
Member

Issif commented Nov 13, 2024

😁

@FedeDP
Copy link
Contributor Author

FedeDP commented Dec 9, 2024

An update about where i am.
Last week i patched sinsp-example to support plugins (falcosecurity/libs#2179); this allowed me to run a simple test to check new container plugin performances against sinsp internal implementation:

sudo ./libsinsp/examples/sinsp-example -p "/home/federico/Work/container/libcontainer.so" -m -f "container_plugin.id!=host" -o "%evt.num) [%evt.time] %evt.type | id: %container_plugin.id | image_plugin: %container_plugin.image | image_sinsp: %container.image" &> out.txt

As you can see, i just changed a couple of fields exposed by the plugin (container.id -> container_plugin.id and container.image -> container_plugin.image) to be able to check which metadata comes sooner, ie: the one from the plugin vs the one from sinsp.

Results were great, as expected :)

Wed Dec 4 10:59:02 2024: [warning] [container] [EXPERIMENTAL] This plugin is in active development and may undergo changes in behavior without prioritizing backward compatibility.
-- Try to open: 'modern_bpf' engine.
-- Engine 'modern_bpf' correctly opened.
-- Start capture
213418) [10:59:04.082391254] clone | id: 6ae75f90719b | image_plugin: ubuntu | image_sinsp: 
213424) [10:59:04.082400231] page_fault | id: 6ae75f90719b | image_plugin: ubuntu | image_sinsp: 

The very first event in the container (ie: the clone) is already enriched with metadata from the plugin, while sinsp has just started the async lookup for the docker infos.
It takes ~2800 events in the container (mostly schedswitches) until sinsp is finally able to retrieve container's metadata:

254332) [10:59:04.173562276] openat | id: 6ae75f90719b | image_plugin: ubuntu | image_sinsp: 
254340) [10:59:04.173575088] switch | id: 6ae75f90719b | image_plugin: ubuntu | image_sinsp: 
261002) [10:59:04.183296185] openat | id: 6ae75f90719b | image_plugin: ubuntu | image_sinsp: ubuntu
261004) [10:59:04.183299313] futex | id: 6ae75f90719b | image_plugin: ubuntu | image_sinsp: ubuntu

Time-wise, it took 10:59:04.183296185 - 10:59:04.082391254 ie: ~100ms for the sinsp implementation to catch up!

Also, my upstream cleanup/drop_container_manager branch is now building fine, libsinsp unit tests pass fine, python e2e test framework passes fine, and sinsp-example is able to load the container plugin in LIVE mode, and filtering/extracting fields works fine.
I am currently fixing a sinsp bug that does not let me load PARSE capability plugins while replaying scap files, so that replaying scap files will work too (i already have a local patch and then i am able to load a scap file and filtering on container's metadata fields).

@FedeDP
Copy link
Contributor Author

FedeDP commented Dec 16, 2024

Here is the new timeline for this one:

  • we will release it for Falco 0.41.
  • we will try to merge libs cleanup right after the 0.20 tag, ie: ~february
  • we will donate the plugin to falcosecurity in a similar time frame
  • once we merge libs PR and the plugin is available upstream, we will port it on Falco master
  • once we test a bit Falco master, we will tag a 0.40+container_plugin release that will basically be 0.40 + bumped libs + container plugin
  • Finally, we will ask the community to grab and test it as much as they can :)

In the end, the plan is to have this in time officially for 0.41:
/milestone 0.41.0

@FedeDP
Copy link
Contributor Author

FedeDP commented Dec 20, 2024

Opened the huge libs cleanup PR: falcosecurity/libs#2207
The CI is all green except for an asan-related issue (it spots a memleak) in the run-e2e-tests-amd64 (asan) CI.
I am still investigating it.

Also, the plugin gained support for with_size init configuration that enables the image size request; by default it is off (like it was in libs, ie: Falco never used it and had no way to enable it, but other consumers could) because it is inherently slow; it also gained support for containerd matcher (stolen from falcosecurity/libs#2195) and i completed write_thread_category implementation.

Will update issue body with latest info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants