fix(swingset): use "dirt" to schedule vat reap/bringOutYourDead
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat had a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 200.

This caused BOYD to happen once every 200 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.
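
As a rough illustration of the scheduling idea described above, the sketch below accumulates dirt per counter and reports when a threshold has been reached. `makeDirtTracker` and `getDirt` are hypothetical names, and the `>=` comparison is an assumption; this is not the kernelKeeper implementation, only a minimal model of `addDirt` and `clearReapDirt`.

```javascript
// Hypothetical stand-in for a vat's dirt bookkeeping (not the kernelKeeper code).
function makeDirtTracker(threshold) {
  let dirt = {}; // e.g. { deliveries: 3, gcKrefs: 4 }
  return {
    // Add new dirt. Returns true when any counter has reached its
    // threshold, meaning a BOYD should be scheduled for this vat.
    addDirt(moreDirt) {
      let reapNeeded = false;
      for (const [key, amount] of Object.entries(moreDirt)) {
        dirt[key] = (dirt[key] || 0) + amount;
        // ASSUMPTION: "reaching" the threshold means >=
        if (key in threshold && dirt[key] >= threshold[key]) {
          reapNeeded = true;
        }
      }
      return reapNeeded;
    },
    // A BOYD resets every counter.
    clearReapDirt() {
      dirt = {};
    },
    getDirt() {
      return { ...dirt };
    },
  };
}

const tracker = makeDirtTracker({ deliveries: 200, gcKrefs: 20 });
const krefs = ['ko21', 'ko22']; // made-up krefs for a dropImports delivery
const needsReap = tracker.addDirt({ gcKrefs: krefs.length });
```

Here `tracker.getDirt().gcKrefs` is 2 and `needsReap` is false, since two krefs are well under the threshold.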

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, each kref is counted twice (once when dropped, once when
retired), so this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.
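
The arithmetic behind "20 means ten krefs" can be checked with a small simulation. It assumes each kref shows up in exactly one drop delivery and one later retire delivery; `krefsUntilBoyd` is a hypothetical helper, not part of the kernel.

```javascript
// Count how many krefs can be dropped and then retired before the
// gcKrefs dirt counter reaches the given threshold.
function krefsUntilBoyd(gcKrefsThreshold) {
  let gcKrefsDirt = 0;
  let krefsProcessed = 0;
  while (gcKrefsDirt < gcKrefsThreshold) {
    gcKrefsDirt += 1; // the kref appears in a drop delivery
    gcKrefsDirt += 1; // the same kref appears in a later retire delivery
    krefsProcessed += 1;
  }
  return krefsProcessed;
}

const krefsPerBoyd = krefsUntilBoyd(20); // 10
```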

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`.

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`
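
A config record using the new knobs might look like the sketch below. The vat name, `sourceSpec` path, and numeric values are made up, and the `vats.NAME.creationOptions` layout is assumed. `makeDefaultReapDirtThreshold` is a hypothetical helper that mirrors how `initializeKernel` in this commit folds the two defaults into one threshold record; the fallback constants reuse the default values stated earlier in this message.

```javascript
// Fallbacks as described in this commit message (assumed values).
const DEFAULT_DELIVERIES_PER_BOYD = 1;
const DEFAULT_GC_KREFS_PER_BOYD = 20;

// Hypothetical kernel config fragment.
const config = {
  defaultReapInterval: 200,
  defaultReapGCKrefs: 20,
  vats: {
    exampleVat: {
      sourceSpec: './vat-example.js',
      creationOptions: { reapInterval: 50, reapGCKrefs: 'never' },
    },
  },
};

// Fold the kernel-wide defaults into a single threshold record.
function makeDefaultReapDirtThreshold(cfg) {
  const {
    defaultReapInterval = DEFAULT_DELIVERIES_PER_BOYD,
    defaultReapGCKrefs = DEFAULT_GC_KREFS_PER_BOYD,
  } = cfg;
  return {
    deliveries: defaultReapInterval,
    gcKrefs: defaultReapGCKrefs,
    computrons: 'never', // no external knob for this yet
  };
}

const defaultThreshold = makeDefaultReapDirtThreshold(config);
```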

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`
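
The effect of a userspace `reapInterval` change on the stored threshold can be sketched as follows. `withReapInterval` is a hypothetical helper loosely modeled on the `setKernelVatOption` change in this commit, and the validity check is a simplified stand-in for `isNat`.

```javascript
// Return a copy of the threshold with only `deliveries` replaced.
// Invalid values are reported and the threshold is returned unchanged.
function withReapInterval(threshold, reapInterval) {
  const valid =
    reapInterval === 'never' ||
    (Number.isSafeInteger(reapInterval) && reapInterval >= 0);
  if (!valid) {
    console.log('WARNING: invalid reapInterval value', reapInterval);
    return threshold;
  }
  return { ...threshold, deliveries: reapInterval };
}

const before = { deliveries: 200, gcKrefs: 20, computrons: 'never' };
const after = withReapInterval(before, 1000);
```

Only `after.deliveries` changes; `gcKrefs` and `computrons` keep whatever the vat already had.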

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in its `vNN.options` key instead of
`vNN.reapInterval`. The current dirt level is recorded in
`vNN.reapDirt`.

The kernel will automatically upgrade both the kernel-wide and the
per-vat state upon the first reboot with the new kernel code. The old
`reapCountdown` value is used to initialize the vat's
`reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the
existing schedule. Vats which used `reapInterval = 'never'` (e.g. comms)
will get a `reapDirtThreshold` of all 'never' values, so they continue
to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20.
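
A minimal sketch of the per-vat upgrade step, under stated assumptions: `upgradeVatReapState` is a hypothetical name, the old countdown is assumed to have started at `reapInterval` and decremented once per delivery (so the dirt so far is the difference), and `computrons` is assumed to stay `'never'`. The real upgrade code may differ.

```javascript
const DEFAULT_GC_KREFS_PER_BOYD = 20;

// oldState: { reapInterval: number | 'never', reapCountdown: number | 'never' }
function upgradeVatReapState(oldState) {
  const { reapInterval, reapCountdown } = oldState;
  if (reapInterval === 'never') {
    // e.g. comms: every threshold is 'never', so BOYD stays inhibited
    return {
      reapDirtThreshold: {
        deliveries: 'never',
        gcKrefs: 'never',
        computrons: 'never',
      },
      reapDirt: {},
    };
  }
  // ASSUMPTION: the countdown began at reapInterval and counted down
  const deliveriesSoFar = Math.max(reapInterval - reapCountdown, 0);
  return {
    reapDirtThreshold: {
      deliveries: reapInterval,
      gcKrefs: DEFAULT_GC_KREFS_PER_BOYD,
      computrons: 'never',
    },
    reapDirt: { deliveries: deliveriesSoFar },
  };
}

const upgraded = upgradeVatReapState({ reapInterval: 200, reapCountdown: 150 });
const upgradedComms = upgradeVatReapState({
  reapInterval: 'never',
  reapCountdown: 'never',
});
```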

We do not track dirt when the corresponding threshold is 'never', to
avoid incrementing the comms dirt counters forever.
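
One way to express this rule is to filter incoming dirt before it is added. `trackableDirt` is a hypothetical helper, shown only to illustrate the rule; it is not how the kernel necessarily structures the check.

```javascript
// Keep only the dirt entries whose threshold is not 'never'.
function trackableDirt(threshold, moreDirt) {
  const tracked = {};
  for (const [key, amount] of Object.entries(moreDirt)) {
    if (threshold[key] !== 'never') {
      tracked[key] = amount;
    }
  }
  return tracked;
}

const commsThreshold = {
  deliveries: 'never',
  gcKrefs: 'never',
  computrons: 'never',
};
// Nothing is tracked for comms, so its counters never grow.
const commsDirt = trackableDirt(commsThreshold, { deliveries: 1, gcKrefs: 3 });

// A vat with a mixed threshold tracks only the enabled counters.
const mixedDirt = trackableDirt(
  { deliveries: 200, gcKrefs: 'never' },
  { deliveries: 1, gcKrefs: 3 },
);
```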

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, à la #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
warner committed Apr 15, 2024
1 parent 53cff42 commit 967e458
Showing 17 changed files with 789 additions and 113 deletions.
33 changes: 30 additions & 3 deletions packages/SwingSet/src/controller/initializeKernel.js
@@ -9,14 +9,32 @@ import { insistVatID } from '../lib/id.js';
import { makeVatSlot } from '../lib/parseVatSlots.js';
import { insistStorageAPI } from '../lib/storageAPI.js';
import { makeVatOptionRecorder } from '../lib/recordVatOptions.js';
import makeKernelKeeper from '../kernel/state/kernelKeeper.js';
import makeKernelKeeper, {
DEFAULT_DELIVERIES_PER_BOYD,
DEFAULT_GC_KREFS_PER_BOYD,
} from '../kernel/state/kernelKeeper.js';
import { exportRootObject } from '../kernel/kernel.js';
import { makeKernelQueueHandler } from '../kernel/kernelQueue.js';

/**
* @typedef { import('../types-external.js').SwingSetKernelConfig } SwingSetKernelConfig
* @typedef { import('../types-external.js').SwingStoreKernelStorage } SwingStoreKernelStorage
* @typedef { import('../types-internal.js').InternalKernelOptions } InternalKernelOptions
* @typedef { import('../types-internal.js').ReapDirtThreshold } ReapDirtThreshold
*/

function makeVatRootObjectSlot() {
return makeVatSlot('object', true, 0);
}

/*
* @param {SwingSetKernelConfig} config
* @param {SwingStoreKernelStorage} kernelStorage
* @param {*} [options]
* @returns {Promise<string | undefined>} KPID of the bootstrap message
* result promise
*/

export async function initializeKernel(config, kernelStorage, options = {}) {
const {
verbose = false,
@@ -33,14 +51,22 @@ export async function initializeKernel(config, kernelStorage, options = {}) {
assert(!wasInitialized);
const {
defaultManagerType,
defaultReapInterval,
defaultReapInterval = DEFAULT_DELIVERIES_PER_BOYD,
defaultReapGCKrefs = DEFAULT_GC_KREFS_PER_BOYD,
relaxDurabilityRules,
snapshotInitial,
snapshotInterval,
} = config;
/** @type { ReapDirtThreshold } */
const defaultReapDirtThreshold = {
deliveries: defaultReapInterval,
gcKrefs: defaultReapGCKrefs,
computrons: 'never', // TODO no knob?
};
/** @type { InternalKernelOptions } */
const kernelOptions = {
defaultManagerType,
defaultReapInterval,
defaultReapDirtThreshold,
relaxDurabilityRules,
snapshotInitial,
snapshotInterval,
@@ -86,6 +112,7 @@ export async function initializeKernel(config, kernelStorage, options = {}) {
'useTranscript',
'critical',
'reapInterval',
'reapGCKrefs',
'nodeOptions',
]);
const vatID = kernelKeeper.allocateVatIDForNameIfNeeded(name);
1 change: 1 addition & 0 deletions packages/SwingSet/src/controller/initializeSwingset.js
@@ -398,6 +398,7 @@ export async function initializeSwingset(
managerType: 'local',
useTranscript: false,
reapInterval: 'never',
reapGCKrefs: 'never',
},
};
}
80 changes: 63 additions & 17 deletions packages/SwingSet/src/kernel/kernel.js
@@ -361,6 +361,7 @@ export default function buildKernel(
*
* @typedef { import('@agoric/swingset-liveslots').MeterConsumption } MeterConsumption
* @typedef { import('../types-internal.js').MeterID } MeterID
* @typedef { import('../types-internal.js').Dirt } Dirt
*
* Any delivery crank (send, notify, start-vat.. anything which is allowed
* to make vat delivery) emits one of these status events if a delivery
@@ -379,7 +380,7 @@ export default function buildKernel(
* didDelivery?: VatID, // we made a delivery to a vat, for run policy and save-snapshot
* computrons?: BigInt, // computron count for run policy
* meterID?: string, // deduct those computrons from a meter
* decrementReapCount?: { vatID: VatID }, // the reap counter should decrement
* measureDirt?: [ VatID, Dirt ], // the dirt counter should increment
* terminate?: { vatID: VatID, reject: boolean, info: SwingSetCapData }, // terminate vat, notify vat-admin
* vatAdminMethargs?: RawMethargs, // methargs to notify vat-admin about create/upgrade results
* } } CrankResults
@@ -446,16 +447,16 @@ export default function buildKernel(
* event handler.
*
* Two flags influence this:
* `decrementReapCount` is used for deliveries that run userspace code
* `measureDirt` is used for non-BOYD deliveries
* `meterID` means we should check a meter
*
* @param {VatID} vatID
* @param {DeliveryStatus} status
* @param {boolean} decrementReapCount
* @param {boolean} measureDirt
* @param {MeterID} [meterID]
* @returns {CrankResults}
*/
function deliveryCrankResults(vatID, status, decrementReapCount, meterID) {
function deliveryCrankResults(vatID, status, measureDirt, meterID) {
let meterUnderrun = false;
let computrons;
if (status.metering?.compute) {
@@ -499,8 +500,13 @@ export default function buildKernel(
results.terminate = { vatID, ...status.vatRequestedTermination };
}

if (decrementReapCount && !(results.abort || results.terminate)) {
results.decrementReapCount = { vatID };
if (measureDirt && !(results.abort || results.terminate)) {
const dirt = { deliveries: 1 };
if (computrons) {
// this is BigInt, but we use plain Number in Dirt records
dirt.computrons = Number(computrons);
}
results.measureDirt = [vatID, dirt];
}

// We leave results.consumeMessage up to the caller. Send failures
@@ -601,6 +607,8 @@ export default function buildKernel(
if (!vatWarehouse.lookup(vatID)) {
return NO_DELIVERY_CRANK_RESULTS; // can't collect from the dead
}
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
vatKeeper.addDirt({ gcKrefs: krefs.length });
/** @type { KernelDeliveryDropExports | KernelDeliveryRetireExports | KernelDeliveryRetireImports } */
const kd = harden([type, krefs]);
if (type === 'retireExports') {
@@ -613,7 +621,7 @@ export default function buildKernel(
}
const vd = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd);
const status = await deliverAndLogToVat(vatID, kd, vd);
return deliveryCrankResults(vatID, status, false); // no meterID
return deliveryCrankResults(vatID, status, true); // no meterID
}

/**
@@ -628,11 +636,13 @@ export default function buildKernel(
if (!vatWarehouse.lookup(vatID)) {
return NO_DELIVERY_CRANK_RESULTS; // can't collect from the dead
}
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
/** @type { KernelDeliveryBringOutYourDead } */
const kd = harden([type]);
const vd = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd);
const status = await deliverAndLogToVat(vatID, kd, vd);
return deliveryCrankResults(vatID, status, false); // no meter
vatKeeper.clearReapDirt(); // BOYD zeros out the when-to-BOYD counters
return deliveryCrankResults(vatID, status, false); // no meter, BOYD clears dirt
}

/**
@@ -739,9 +749,17 @@ export default function buildKernel(
function setKernelVatOption(vatID, option, value) {
switch (option) {
case 'reapInterval': {
// This still controls reapDirtThreshold.deliveries, and we do not
// yet offer controls for the other limits (gcKrefs or computrons).
if (value === 'never' || isNat(value)) {
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
vatKeeper.updateReapInterval(value);
const threshold = { ...vatKeeper.getReapDirtThreshold() };
if (value === 'never') {
threshold.deliveries = value;
} else {
threshold.deliveries = Number(value);
}
vatKeeper.setReapDirtThreshold(threshold);
} else {
console.log(`WARNING: invalid reapInterval value`, value);
}
@@ -877,6 +895,7 @@ export default function buildKernel(
const boydVD = vatWarehouse.kernelDeliveryToVatDelivery(vatID, boydKD);
const boydStatus = await deliverAndLogToVat(vatID, boydKD, boydVD);
const boydResults = deliveryCrankResults(vatID, boydStatus, false);
vatKeeper.clearReapDirt();

// we don't meter bringOutYourDead since no user code is running, but we
// still report computrons to the runPolicy
@@ -951,7 +970,7 @@ export default function buildKernel(
startVatKD,
startVatVD,
);
const startVatResults = deliveryCrankResults(vatID, startVatStatus, false);
const startVatResults = deliveryCrankResults(vatID, startVatStatus, true);
computrons = addComputrons(computrons, startVatResults.computrons);

if (startVatResults.terminate) {
@@ -1292,13 +1311,11 @@ export default function buildKernel(
}
}
}
if (crankResults.decrementReapCount) {
if (crankResults.measureDirt) {
// deliveries cause garbage, garbage needs collection
const { vatID } = crankResults.decrementReapCount;
const [vatID, dirt] = crankResults.measureDirt;
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
if (vatKeeper.countdownToReap()) {
kernelKeeper.scheduleReap(vatID);
}
vatKeeper.addDirt(dirt); // might schedule a reap for that vat
}

// Vat termination (during delivery) is triggered by an illegal
@@ -1572,10 +1589,12 @@ export default function buildKernel(
'bundleID',
'enablePipelining',
'reapInterval',
'reapGCKrefs',
]);
const {
bundleID = 'b1-00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
reapInterval = 'never',
reapGCKrefs = 'never',
enablePipelining,
} = creationOptions;
const vatID = kernelKeeper.allocateVatIDForNameIfNeeded(name);
@@ -1587,6 +1606,7 @@ export default function buildKernel(
const options = {
name,
reapInterval,
reapGCKrefs,
enablePipelining,
managerType,
};
@@ -1728,14 +1748,38 @@ export default function buildKernel(
}

function changeKernelOptions(options) {
assertKnownOptions(options, ['defaultReapInterval', 'snapshotInterval']);
assertKnownOptions(options, [
'defaultReapInterval',
'defaultReapGCKrefs',
'snapshotInterval',
]);
kernelKeeper.startCrank();
try {
for (const option of Object.getOwnPropertyNames(options)) {
const value = options[option];
switch (option) {
case 'defaultReapInterval': {
kernelKeeper.setDefaultReapInterval(value);
if (typeof value === 'number') {
assert(value > 0, `defaultReapInterval = ${value}`);
} else {
assert.equal(value, 'never', `defaultReapInterval = ${value}`);
}
kernelKeeper.setDefaultReapDirtThreshold({
...kernelKeeper.getDefaultReapDirtThreshold(),
deliveries: value,
});
break;
}
case 'defaultReapGCKrefs': {
if (typeof value === 'number') {
assert(value > 0, `defaultReapGCKrefs = ${value}`);
} else {
assert.equal(value, 'never', `defaultReapGCKrefs = ${value}`);
}
kernelKeeper.setDefaultReapDirtThreshold({
...kernelKeeper.getDefaultReapDirtThreshold(),
gcKrefs: value,
});
break;
}
case 'snapshotInterval': {
@@ -1768,6 +1812,7 @@ export default function buildKernel(
if (!started) {
throw Error('must do kernel.start() before step()');
}
kernelKeeper.maybeUpgradeKernelState();
kernelKeeper.startCrank();
await null;
try {
@@ -1805,6 +1850,7 @@ export default function buildKernel(
let count = 0;
await null;
for (;;) {
kernelKeeper.maybeUpgradeKernelState();
kernelKeeper.startCrank();
try {
kernelKeeper.establishCrankSavepoint('start');