rate-limited kernel GC approach: persistent maybeFreeKrefs, gradual GC actions #8930

warner · 2024-02-15T22:21:17Z

What is the Problem Being Solved?

If a vat drops a lot of objects at the same time, we can wind up with a large number of "GC Actions" in the kernel's gcAction queue. This can result in large GC deliveries to other vats, which may later cause a BOYD to perform more work than we can accomodate.

#8928 is a fix to make sure that vat termination doesn't cause this problem, but it could also be caused by a vat (somehow, deliberately?) performing a single syscall.dropImports() call with a large number of vrefs, like clearing or deleting a large virtual/durable collection all at once. This might be prevented by other means (#8417, or simply the vat being terminated by some kernel-side "you've done too much work" policy).

But if those fixes don't work, this is an approach that might spread the GC work out over time.

Description of the Design

the old design

As a crank is processed, syscalls like syscall.dropImports and syscall.retireExports() will change (decrement) the koNN.refcount kvStore entries, and/or delete c-list entries. We keep an in-RAM Set named maybeFreeKrefs which is added to each time a refcount touches zero. At end-of-crank, in kernelKeeper.processRefCounts(), we check the krefs to see if their refcount is still zero, and then compute GC actions to be taken. For example, when the refcount drops to zero, we want to send a dispatch.dropExports() to the owning vat. These actions are added to a durable set named the "GC Action Set", stored with kvStore.set('gcActions'), whose value is a JSON-serialized array of strings like "v12 dropExport ko345".

Then, at the start of the next crank, the kernel queries the gcActions set by calling processGCActionSet. This unserializes the kvStore key, collates the entries by vatID and type, re-examines the refcounts and c-list states to see which actions must still be done (and which are negated by eg a kref being re-exported), writes the modified gcActions set back to the kvStore, and returns a single batch of actions to take (one vatID, one type, multiple krefs). The kernel then performs this delivery (eg dispatch.dropExports(krefs)). This makes gcActions behave like a queue, which is serviced to completion before we service the normal run-queue. The runPolicy is consulted each time, and the computrons consumed by GC deliveries are submitted to the runPolicy, so runs/blocks will end prematurely if we do too much GC work.

problems with the current approach

The costs of doing this cleanup are multiplied by the amount of data (and references) that was held by the old vat. The points of concern are:

when terminating a large vat, the kernel must enumerate and delete every vatstore entry for the dead vat (all kvStore keys that start with vNN.vs.). This requires a kvStore.getNextKey() plus a kvStore.delete() for each one
the kernel must enumerate and remove every c-list entry, which entails decrementing a reachable and/or recognizable refcount for each import, in addition to deleting the two (bidirectional) c-list kvStore entries themselves
- the function which decrements the refcount must read the old value, and then write the new value (or possibly delete the entry, if the counts reach zero)
- that function usually pushes the kref into the in-RAM maybeFreeKrefs Set (since most objects have only a single importer)
at end-of-crank, processRefcounts must deserialize the old set (possibly large)
- then for everything in maybeFreeKrefs it does about three kvStore reads, to deduce a GC action
- the proposed GC action strings are pushed into an in-RAM Set, which can get large
- finally the Set is serialized (once) and written back to gcActions
then, at the start of every crank until the gcActions set is empty, we:
- read and deserialize the gcActions key (possibly large)
- collate and examine everything in it (multiple kvStore reads per item)
- discard everything but the highest-priority batch
- serialize and write back the remaining gcActions (perhaps most of them)

In the cases we're worried about, this produces two large actions: a dropExports(krefs) for everything that was once imported by the terminated vat, followed by a retireExports() for the same set. krefs.length could easily be 200k items.

the new approach

The first idea is to move the "what actions should we take" out of processRefcounts and into processGCActionSet (or some renamed equivalent). processRefcounts should only record krefs that need further investigation. This minimizes the work we do "inline", during crank that does syscall.dropImports (or termination/etc). We're effectively taking the in-RAM maybeFreeKrefs Set and making it persistent.

The second idea is for (the replacement for) processGCActionSet to do its work one kref at a time, and to have a limit on how many krefs it processes in a single call. For each kref, it should do the same refcount/c-list checks that processRefcounts used to do, and produce a GC action to take. It should try to batch work together to produce a moderate-sized GC action (eg one dispatch.dropExports() with 100 krefs, rather than 100 calls each with a single kref). Each kref it scans may cause a different GC action, so we'll need some heuristics to decide how far to look before choosing a single action to take.

The third idea is to improve the way that the persistent maybeFreeKrefs is stored, so it can accomodate 200k+ entries without causing O(N^2) behavior in the addition or removal APIs. We can safely tolerate extra items in this set, because our new processGCActionSet makes its decisions late, just before performing the GC delivery. So we can afford for maybeFreeKrefs to store the same kref multiple times, or to hold a kref that was already processed earlier: it will be ignored the second time it comes around.

So instead of storing a single large set, we store a moderate number of moderate-sized sets. I'm thinking 1k krefs per chunk, a list of "closed" chunks (lists of actions) with exactly 1k krefs, and a single "open" chunk (set of krefs) on top. Every time we add an item, we look at the open chunk/set and add the item if it is not already present. If that causes the set to be full, we convert the open chunk into a closed chunk/list, and open a new empty chunk. When we want to remove a kref, we pull it from the open chunk, and if that was empty, we convert the latest closed chunk back into a set. If there are no closed chunks, and the open chunk is empty, then the overall set is empty, and there is no GC work that needs to be done.

The add/remove operations on the chunks are still O(N^2), but we're limiting N to 1k instead of 200k. The tradeoff is, of course, more kvStore entries. If we stored every maybe-free kref in its own kvStore entry, add/delete would be cheap, but we'd have an explosion of kvStore entries, which is part of the IAVL problem that we're trying to avoid. It might still be a win, though, if the vatstore or c-list contribution to kvStore is already a constant factor larger than the number of maybe-free krefs.

The fourth part is to have processGCActionSet do a limited amount of work before yielding to "normal" vat work, and/or have a way for the host application to tell SwingSet that "now is a good time to do cleanup work". For this, I'm thinking changes to runPolicy() like the ones described in #8928, where we don't do any GC work at all until the host decides that we're at the end of a block (and still have computrons left in our budget), and then the host does a special controller.run() that is allowed to start with some finite amount of GC work.

Security Considerations

This should help with the other-vat consequences of a vat dropping/retiring a large number of krefs all at once. It doesn't help with the work done during the crank that drops those krefs: we need other means to reduce that work (like #8417).

Other vats should not be able to observe the consequences of spreading this GC out work over time. Those vats will receive dropExports more slowly than before, and their WeakMaps will be pruned at different times, but liveslots prevents userspace from observing those changes.

Scaling Considerations

This should help, somewhat, with scaling problems caused by vats accumulating large numbers of imports.

Test Plan

Unit tests on all the kernel changes.

Upgrade Considerations

At kernel startup time, it will need to look at the kvStore and notice that there is no persistent maybeFreeKrefs set (or list of chunks), and initialize one.

If gcActions is non-empty, we should extract their krefs and add them to maybeFreeKrefs, then delete gcActions. This should accomodate the (unlikely but can't be ruled out) possibility that the kernel was in the middle of doing GC work when it was halted for upgrade. We'll need unit tests to demonstrate that this case works correctly.

The text was updated successfully, but these errors were encountered:

warner added enhancement New feature or request SwingSet package: SwingSet performance Performance related issues labels Feb 15, 2024

mhofman mentioned this issue May 8, 2024

Leak of durable objects through old virtual and heap references after upgrade #9338

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rate-limited kernel GC approach: persistent maybeFreeKrefs, gradual GC actions #8930

rate-limited kernel GC approach: persistent maybeFreeKrefs, gradual GC actions #8930

warner commented Feb 15, 2024

rate-limited kernel GC approach: persistent maybeFreeKrefs, gradual GC actions #8930

rate-limited kernel GC approach: persistent maybeFreeKrefs, gradual GC actions #8930

Comments

warner commented Feb 15, 2024

What is the Problem Being Solved?

Description of the Design

the old design

problems with the current approach

the new approach

Security Considerations

Scaling Considerations

Test Plan

Upgrade Considerations