rate-limited kernel GC approach: persistent maybeFreeKrefs, gradual GC actions #8930
Labels
enhancement
New feature or request
performance
Performance related issues
SwingSet
package: SwingSet
What is the Problem Being Solved?
If a vat drops a lot of objects at the same time, we can wind up with a large number of "GC Actions" in the kernel's gcAction queue. This can result in large GC deliveries to other vats, which may later cause a BOYD to perform more work than we can accomodate.
#8928 is a fix to make sure that vat termination doesn't cause this problem, but it could also be caused by a vat (somehow, deliberately?) performing a single
syscall.dropImports()
call with a large number of vrefs, like clearing or deleting a large virtual/durable collection all at once. This might be prevented by other means (#8417, or simply the vat being terminated by some kernel-side "you've done too much work" policy).But if those fixes don't work, this is an approach that might spread the GC work out over time.
Description of the Design
the old design
As a crank is processed, syscalls like
syscall.dropImports
andsyscall.retireExports()
will change (decrement) thekoNN.refcount
kvStore entries, and/or delete c-list entries. We keep an in-RAM Set namedmaybeFreeKrefs
which is added to each time a refcount touches zero. At end-of-crank, inkernelKeeper.processRefCounts()
, we check the krefs to see if their refcount is still zero, and then compute GC actions to be taken. For example, when the refcount drops to zero, we want to send adispatch.dropExports()
to the owning vat. These actions are added to a durable set named the "GC Action Set", stored withkvStore.set('gcActions')
, whose value is a JSON-serialized array of strings like"v12 dropExport ko345"
.Then, at the start of the next crank, the kernel queries the
gcActions
set by callingprocessGCActionSet
. This unserializes the kvStore key, collates the entries by vatID and type, re-examines the refcounts and c-list states to see which actions must still be done (and which are negated by eg a kref being re-exported), writes the modifiedgcActions
set back to the kvStore, and returns a single batch of actions to take (one vatID, one type, multiple krefs). The kernel then performs this delivery (egdispatch.dropExports(krefs)
). This makesgcActions
behave like a queue, which is serviced to completion before we service the normal run-queue. TherunPolicy
is consulted each time, and the computrons consumed by GC deliveries are submitted to therunPolicy
, so runs/blocks will end prematurely if we do too much GC work.problems with the current approach
The costs of doing this cleanup are multiplied by the amount of data (and references) that was held by the old vat. The points of concern are:
kvStore
keys that start withvNN.vs.
). This requires akvStore.getNextKey()
plus akvStore.delete()
for each onemaybeFreeKrefs
Set (since most objects have only a single importer)processRefcounts
must deserialize the old set (possibly large)maybeFreeKrefs
it does about three kvStore reads, to deduce a GC actiongcActions
gcActions
set is empty, we:gcActions
key (possibly large)gcActions
(perhaps most of them)In the cases we're worried about, this produces two large actions: a
dropExports(krefs)
for everything that was once imported by the terminated vat, followed by aretireExports()
for the same set.krefs.length
could easily be 200k items.the new approach
The first idea is to move the "what actions should we take" out of
processRefcounts
and intoprocessGCActionSet
(or some renamed equivalent).processRefcounts
should only record krefs that need further investigation. This minimizes the work we do "inline", during crank that doessyscall.dropImports
(or termination/etc). We're effectively taking the in-RAMmaybeFreeKrefs
Set and making it persistent.The second idea is for (the replacement for)
processGCActionSet
to do its work one kref at a time, and to have a limit on how many krefs it processes in a single call. For each kref, it should do the same refcount/c-list checks thatprocessRefcounts
used to do, and produce a GC action to take. It should try to batch work together to produce a moderate-sized GC action (eg onedispatch.dropExports()
with 100 krefs, rather than 100 calls each with a single kref). Each kref it scans may cause a different GC action, so we'll need some heuristics to decide how far to look before choosing a single action to take.The third idea is to improve the way that the persistent
maybeFreeKrefs
is stored, so it can accomodate 200k+ entries without causing O(N^2) behavior in the addition or removal APIs. We can safely tolerate extra items in this set, because our newprocessGCActionSet
makes its decisions late, just before performing the GC delivery. So we can afford formaybeFreeKrefs
to store the same kref multiple times, or to hold a kref that was already processed earlier: it will be ignored the second time it comes around.So instead of storing a single large set, we store a moderate number of moderate-sized sets. I'm thinking 1k krefs per chunk, a list of "closed" chunks (lists of actions) with exactly 1k krefs, and a single "open" chunk (set of krefs) on top. Every time we add an item, we look at the open chunk/set and add the item if it is not already present. If that causes the set to be full, we convert the open chunk into a closed chunk/list, and open a new empty chunk. When we want to remove a kref, we pull it from the open chunk, and if that was empty, we convert the latest closed chunk back into a set. If there are no closed chunks, and the open chunk is empty, then the overall set is empty, and there is no GC work that needs to be done.
The add/remove operations on the chunks are still O(N^2), but we're limiting N to 1k instead of 200k. The tradeoff is, of course, more
kvStore
entries. If we stored every maybe-free kref in its ownkvStore
entry, add/delete would be cheap, but we'd have an explosion of kvStore entries, which is part of the IAVL problem that we're trying to avoid. It might still be a win, though, if the vatstore or c-list contribution to kvStore is already a constant factor larger than the number of maybe-free krefs.The fourth part is to have
processGCActionSet
do a limited amount of work before yielding to "normal" vat work, and/or have a way for the host application to tell SwingSet that "now is a good time to do cleanup work". For this, I'm thinking changes torunPolicy()
like the ones described in #8928, where we don't do any GC work at all until the host decides that we're at the end of a block (and still have computrons left in our budget), and then the host does a specialcontroller.run()
that is allowed to start with some finite amount of GC work.Security Considerations
This should help with the other-vat consequences of a vat dropping/retiring a large number of krefs all at once. It doesn't help with the work done during the crank that drops those krefs: we need other means to reduce that work (like #8417).
Other vats should not be able to observe the consequences of spreading this GC out work over time. Those vats will receive
dropExports
more slowly than before, and their WeakMaps will be pruned at different times, but liveslots prevents userspace from observing those changes.Scaling Considerations
This should help, somewhat, with scaling problems caused by vats accumulating large numbers of imports.
Test Plan
Unit tests on all the kernel changes.
Upgrade Considerations
At kernel startup time, it will need to look at the kvStore and notice that there is no persistent
maybeFreeKrefs
set (or list of chunks), and initialize one.If
gcActions
is non-empty, we should extract their krefs and add them tomaybeFreeKrefs
, then deletegcActions
. This should accomodate the (unlikely but can't be ruled out) possibility that the kernel was in the middle of doing GC work when it was halted for upgrade. We'll need unit tests to demonstrate that this case works correctly.The text was updated successfully, but these errors were encountered: