TLS: use fewer inotify instances and/or be more resilient to them not working #2513
Comments
@regevran please assign |
Is it possible the inotify instances are leaking somehow when running the tests via test.py? |
No. The problem is that we use a few per shard - one for each file referenced, typically 3-4 per TLS config, with up to 3-4 TLS configs in scylla.yaml - then run 8 shards, and then run tests in parallel. This adds up: 4 files × 4 configs × 8 shards is already 128 inotify instances for a single run, so with the ludicrously small default settings we hit the ceiling. I've done some work to maybe make the TLS usage in scylla use a shard-0-only solution for TLS inotify, but I got sidetracked. In any case, however you design it, it will be slightly hackish... |
…ose rebuild

Refs #2513

Adds a more advanced callback type, which takes the actual reloading builder as an argument (into which new files are loaded) and allows proper future-waiting in the callback. Exposes certificate rebuilding (via the builder) to allow a "manual", quick reload of certs.

The point of these seemingly small changes is to allow client code to, for example, limit actual reloadable_certs (and by extension inotify watches) to shard 0 (or whatever), and simply use this as a trigger for a manual reload of the other shards.

Note: we cannot do any magical "shard-0-only" file monitor in the objects themselves, not without making the certs/builders sharded or similarly stored (which contradicts the general design of light objects, copyable between shards, etc). But with this, in a calling app in which certs _are_ held in a sharded manner, we can fairly easily delegate non-shard-0 ops in a way that fits that topology.

Note: a builder can be _called_ from any shard (as long as it is safe in its originating shard), but the objects returned are only valid on the current shard. Similarly, it is safe to share the reloading builder across shards _in the callback_, since rebuilding is blocked for the duration of the call.

Closes #2573
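As an illustration of the usage pattern this enables, here is a rough sketch. It is illustrative only: the shard_creds/creds_holder scaffolding is invented for the example, and the extended-callback and rebuild() signatures are paraphrased from the commit message above rather than copied from seastar/net/tls.hh. Shard 0 owns the only reloadable (inotify-backed) credentials, and its callback triggers a manual rebuild on every other shard:

```cpp
#include <unordered_set>

#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/smp.hh>
#include <seastar/net/tls.hh>

using namespace seastar;

// Hypothetical per-shard holder: every shard keeps its own credentials
// object, since built credentials are only valid on the shard that
// created them.
struct shard_creds {
    tls::credentials_builder builder;                  // copied to each shard
    shared_ptr<tls::certificate_credentials> creds;
    future<> stop() { return make_ready_future<>(); }
};

sharded<shard_creds> creds_holder;  // assumed start()ed and populated elsewhere

// Run on shard 0 only: build the one reloadable credentials object, so
// inotify instances exist on a single shard. The extended-callback and
// rebuild() signatures below are approximations of what the commit
// message describes, not verified Seastar API.
future<> start_shard0_watcher() {
    auto& h = creds_holder.local();
    return h.builder.build_reloadable_certificate_credentials(
        [](const tls::credentials_builder& reloaded,   // builder with new files loaded
           const std::unordered_set<sstring>& files,   // what changed
           std::exception_ptr ep) -> future<> {
            if (ep) {
                return make_ready_future<>();          // keep old certs; log in real code
            }
            // Sharing the reloading builder across shards is safe for the
            // duration of the callback (rebuilding is blocked meanwhile),
            // so fan a "manual" rebuild out to all other shards.
            return smp::invoke_on_others(0, [&reloaded] {
                return reloaded.rebuild(*creds_holder.local().creds);
            });
        }).then([&h](shared_ptr<tls::certificate_credentials> c) {
            h.creds = std::move(c);
        });
}
```

This mirrors the notes in the commit message: the reloading builder is shared across shards only while the callback runs, while each shard's credentials object stays strictly local to its own shard.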
6f39b89 is merged in Seastar, what else do we need? |
The issue cannot really be solved at the Seastar level. 6f39b89 does, however, make it possible for calling code to avoid shard-multiplication of the inotify usage. |
…ds' from Calle Wilund

Refs scylladb/seastar#2513

Reloadable certificates use inotify instances. On a loaded test (CI) server, we've seen cases where we literally run out of capacity. This patch uses the extended callback and reload capability of Seastar TLS to create actual reloadable certificate objects only on shard 0 for our main TLS points (encryption only does TLS on shard 0 already).

Closes #22425

* github.com:scylladb/scylladb:
  alternator: Make server peering sharded and reuse reloadable certs
  messaging_service: Share reloadability of certificates across shards
  redis/controller: Reuse shard 0 reloadable certificates for all shards
  controller: Reuse shard 0 reloadable certificates for all shards
  generic_server: Allow sharing reloadability of certificates across shards
#22425 was merged. It might fix this issue "enough"... |
The correct link is scylladb/scylladb#22425. However, this link is to a patch in Scylla, not in Seastar. I was under the impression (which is why I opened this issue in the Seastar repository) that it was Seastar which is opening these inotify instances, and that it was Seastar that does it on every shard. It seems the solution in scylladb/scylladb#22425 was to only load the certificates on shard 0 because "encryption only does TLS on shard 0 already". Is this fact determined by Seastar or Scylla? |
The problem cannot really be solved at the Seastar level, since the objects in question do not have any "shard-sharing" attributes - they are not sharded<> objects, or even services of any kind. In fact, they have no knowledge of their counterparts/copies across shards. #2573 adds some support for users of the objects to reduce the inotify footprint (typically by doing the actual inotify listening only on shard 0) by delegating some event processing to smp callbacks or whatnot, and scylladb/scylladb#22425 uses it at the Scylla level. |
Recently some ScyllaDB test runs failed (scylladb/scylladb#21199) with a "Too many open files" error.

It turns out that Seastar's TLS implementation internally uses inotify to automatically recognize when the certificate files have changed. The error message is misleading: for inotify_init(), an EMFILE errno (which gets printed as "Too many open files") does not refer to the limit on the number of open files, but to a separate per-user limit on the number of inotify instances. This limit is configured in /proc/sys/fs/inotify/max_user_instances and is often fairly low - e.g., on my Fedora 40 it defaults to just 128.

It appears that Seastar creates an inotify instance for each shard, the ScyllaDB test framework ran many tests in parallel, and the result was running out of inotify instances. The ScyllaDB testers easily solved this problem by increasing /proc/sys/fs/inotify/max_user_instances, but I'm worried that this problem can hit other Seastar users as well, who won't be aware that Seastar TLS even uses inotify, or that /proc/sys/fs/inotify/max_user_instances is so low.

I want to propose that we consider two options, perhaps even both: use fewer inotify instances, and/or be more resilient when creating them fails.

CC @elcallio
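To see why the error message misleads, here is a minimal standalone reproducer (plain Linux C++, no Seastar), assuming the default limits of 128 inotify instances per user and a higher RLIMIT_NOFILE: it exhausts the instance limit while holding far fewer than the allowed number of open file descriptors, and still gets EMFILE, printed as "Too many open files":

```cpp
#include <sys/inotify.h>
#include <cerrno>
#include <cstdio>

// Create inotify instances until the kernel refuses. With the default
// fs.inotify.max_user_instances of 128, this fails long before the
// process's open-file limit (RLIMIT_NOFILE, typically 1024) is reached,
// yet the errno is EMFILE - the same value strerror() renders as
// "Too many open files".
int main() {
    int n = 0;
    while (inotify_init1(IN_NONBLOCK) >= 0) {
        ++n;  // fds deliberately leaked; the process exits right after
    }
    std::perror("inotify_init1");
    std::printf("created %d inotify instances before failing (EMFILE: %s)\n",
                n, errno == EMFILE ? "yes" : "no");
    return 0;
}
```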