Add cosmos-sdk vat snapshot/transcript retention configuration #10032

Merged
merged 7 commits into from
Sep 6, 2024
9 changes: 9 additions & 0 deletions golang/cosmos/util/util.go
@@ -4,6 +4,15 @@ import (
"github.com/spf13/viper"
)

func IndexOf[T comparable](a []T, x T) int {
for i, s := range a {
if s == x {
return i
}
}
return -1
}

func NewFileOnlyViper(v1 *viper.Viper) (*viper.Viper, error) {
v2 := viper.New()
v2.SetConfigFile(v1.ConfigFileUsed())
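
The `IndexOf` helper added in this file is used later in the PR for membership checks against the lists of allowed retention values. A minimal sketch of that usage follows; `validateOption` is a hypothetical wrapper written for illustration and is not part of the PR. On Go 1.21 or later, the standard library's `slices.Index` offers the same lookup for comparable element types.

```go
package main

import "fmt"

// IndexOf mirrors the generic helper added to util.go in this PR:
// it returns the index of the first element equal to x, or -1.
func IndexOf[T comparable](a []T, x T) int {
	for i, s := range a {
		if s == x {
			return i
		}
	}
	return -1
}

// validateOption is a hypothetical wrapper (not in the PR) that rejects
// any value missing from the allowed list.
func validateOption(name string, allowed []string, value string) error {
	if IndexOf(allowed, value) == -1 {
		return fmt.Errorf("value for %s must be in %q", name, allowed)
	}
	return nil
}

func main() {
	allowed := []string{"debug", "operational"}
	fmt.Println(validateOption("vat-snapshot-retention", allowed, "debug"))
	fmt.Println(validateOption("vat-snapshot-retention", allowed, "archival"))
}
```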
119 changes: 107 additions & 12 deletions golang/cosmos/x/swingset/config.go
@@ -7,6 +7,8 @@ import (
"github.com/spf13/viper"

"github.com/cosmos/cosmos-sdk/client/flags"
pruningtypes "github.com/cosmos/cosmos-sdk/pruning/types"
serverconfig "github.com/cosmos/cosmos-sdk/server/config"
servertypes "github.com/cosmos/cosmos-sdk/server/types"

"github.com/Agoric/agoric-sdk/golang/cosmos/util"
@@ -15,8 +17,24 @@ import (
const (
ConfigPrefix = "swingset"
FlagSlogfile = ConfigPrefix + ".slogfile"

SnapshotRetentionOptionDebug = "debug"
SnapshotRetentionOptionOperational = "operational"

TranscriptRetentionOptionArchival = "archival"
TranscriptRetentionOptionOperational = "operational"
)

var snapshotRetentionValues []string = []string{
SnapshotRetentionOptionDebug,
SnapshotRetentionOptionOperational,
}

var transcriptRetentionValues []string = []string{
TranscriptRetentionOptionArchival,
TranscriptRetentionOptionOperational,
}

// DefaultConfigTemplate defines a default TOML configuration section for the SwingSet VM.
// Values are pulled from a "Swingset" property, in accord with CustomAppConfig from
// ../../daemon/cmd/root.go.
@@ -27,30 +45,74 @@ const DefaultConfigTemplate = `
###############################################################################

[swingset]
# slogfile is the path at which a SwingSet log "slog" file should be written.
# The path at which a SwingSet log "slog" file should be written.
# If relative, it is interpreted against the application home directory
# (e.g., ~/.agoric).
# May be overridden by a SLOGFILE environment variable, which if relative is
Member

overridden? I thought the config took precedence if present?

Member Author

Nope, CLI option > environment variable > config file: https://github.com/spf13/viper?tab=readme-ov-file#why-viper

Viper uses the following precedence order. Each item takes precedence over the item below it:

  • explicit call to Set
  • flag
  • env
  • config
  • key/value store
  • default

# interpreted against the working directory.
slogfile = "{{ .Swingset.SlogFile }}"

# The maximum number of vats that the SwingSet kernel will bring online. A lower number
# requires less memory but may have a negative performance impact if vats need to
# be frequently paged out to remain under this limit.
max_vats_online = {{ .Swingset.MaxVatsOnline }}
max-vats-online = {{ .Swingset.MaxVatsOnline }}
Contributor

Good catch!

# Retention of vat snapshots, with values analogous to those of export
# 'artifactMode' (cf.
# https://github.com/Agoric/agoric-sdk/blob/master/packages/swing-store/docs/data-export.md#optional--historical-data ).
# * "debug": keep all snapshots
# * "operational": keep only the last snapshot
vat-snapshot-retention = "{{ .Swingset.VatSnapshotRetention }}"

# Retention of vat transcript spans, with values analogous to those of export
# 'artifactMode' (cf.
# https://github.com/Agoric/agoric-sdk/blob/master/packages/swing-store/docs/data-export.md#optional--historical-data ).
# * "archival": keep all transcript spans
# * "operational": keep only necessary transcript spans (i.e., since the
# last snapshot of their vat)
# * "default": determined by 'pruning' ("archival" if 'pruning' is "nothing",
# otherwise "operational")
vat-transcript-retention = "{{ .Swingset.VatTranscriptRetention }}"
`

// SwingsetConfig defines configuration for the SwingSet VM.
// "mapstructure" tag data is used to direct reads from app.toml;
// "json" tag data is used to populate init messages for the VM.
// This should be kept in sync with SwingsetConfigShape in
// ../../../../packages/cosmic-swingset/src/chain-main.js.
// TODO: Consider extensions from docs/env.md.
type SwingsetConfig struct {
// SlogFile is the absolute path at which a SwingSet log "slog" file should be written.
// SlogFile is the path at which a SwingSet log "slog" file should be written.
// If relative, it is interpreted against the application home directory
SlogFile string `mapstructure:"slogfile" json:"slogfile,omitempty"`

// MaxVatsOnline is the maximum number of vats that the SwingSet kernel will have online
// at any given time.
MaxVatsOnline int `mapstructure:"max_vats_online" json:"maxVatsOnline,omitempty"`
MaxVatsOnline int `mapstructure:"max-vats-online" json:"maxVatsOnline,omitempty"`

// VatSnapshotRetention controls retention of vat snapshots,
// and has values analogous to those of export `artifactMode` (cf.
// ../../../../packages/swing-store/docs/data-export.md#optional--historical-data ).
// * "debug": keep all snapshots
// * "operational": keep only the last snapshot
VatSnapshotRetention string `mapstructure:"vat-snapshot-retention" json:"vatSnapshotRetention,omitempty"`

// VatTranscriptRetention controls retention of vat transcript spans,
// and has values analogous to those of export `artifactMode` (cf.
// ../../../../packages/swing-store/docs/data-export.md#optional--historical-data ).
// * "archival": keep all transcript spans
// * "operational": keep only necessary transcript spans (i.e., since the
// last snapshot of their vat)
// * "default": determined by `pruning` ("archival" if `pruning` is
// "nothing", otherwise "operational")
VatTranscriptRetention string `mapstructure:"vat-transcript-retention" json:"vatTranscriptRetention,omitempty"`
Member

I'm not sure omitempty makes sense here. I'd argue this ends up required at the JS/golang interface.

Member Author

Eh, I'd rather not send an empty string only for this field if that ever comes up. If it truly were required, we'd enforce that on the JS side.

Member

I don't see how it could ever be empty given the resolution logic

}

var DefaultSwingsetConfig = SwingsetConfig{
SlogFile: "",
MaxVatsOnline: 50,
SlogFile: "",
MaxVatsOnline: 50,
VatSnapshotRetention: "operational",
VatTranscriptRetention: "default",
}

func SwingsetConfigFromViper(resolvedConfig servertypes.AppOptions) (*SwingsetConfig, error) {
@@ -66,11 +128,44 @@ func SwingsetConfigFromViper(resolvedConfig servertypes.AppOptions) (*SwingsetCo
return nil, nil
}
v.MustBindEnv(FlagSlogfile, "SLOGFILE")
wrapper := struct{ Swingset SwingsetConfig }{}
if err := v.Unmarshal(&wrapper); err != nil {
// See CustomAppConfig in ../../daemon/cmd/root.go.
type ExtendedConfig struct {
serverconfig.Config `mapstructure:",squash"`
Swingset SwingsetConfig `mapstructure:"swingset"`
Member

Oh yikes I missed the missing mapstructure in the last PR review, how did it even work, does it default to a lowercasing of the struct property?

Member Author

Yeah, I think so.

}
extendedConfig := ExtendedConfig{}
if err := v.Unmarshal(&extendedConfig); err != nil {
return nil, err
}
ssConfig := &extendedConfig.Swingset

// Validate vat snapshot retention only if non-empty (because otherwise it
// will be omitted, leaving the VM to apply its own defaults).
if ssConfig.VatSnapshotRetention != "" {
if util.IndexOf(snapshotRetentionValues, ssConfig.VatSnapshotRetention) == -1 {
err := fmt.Errorf(
"value for vat-snapshot-retention must be in %q",
snapshotRetentionValues,
)
return nil, err
}
}

// Default/validate vat transcript retention.
if ssConfig.VatTranscriptRetention == "" || ssConfig.VatTranscriptRetention == "default" {
if extendedConfig.Pruning == pruningtypes.PruningOptionNothing {
ssConfig.VatTranscriptRetention = TranscriptRetentionOptionArchival
} else {
ssConfig.VatTranscriptRetention = TranscriptRetentionOptionOperational
}
}
if util.IndexOf(transcriptRetentionValues, ssConfig.VatTranscriptRetention) == -1 {
err := fmt.Errorf(
"value for vat-transcript-retention must be in %q",
transcriptRetentionValues,
)
return nil, err
}
config := &wrapper.Swingset

// Interpret relative paths from config files against the application home
// directory and from other sources (e.g. env vars) against the current
@@ -101,11 +196,11 @@ func SwingsetConfigFromViper(resolvedConfig servertypes.AppOptions) (*SwingsetCo
return filepath.Abs(path)
}

resolvedSlogFile, err := resolvePath(config.SlogFile, FlagSlogfile)
resolvedSlogFile, err := resolvePath(ssConfig.SlogFile, FlagSlogfile)
if err != nil {
return nil, err
}
config.SlogFile = resolvedSlogFile
ssConfig.SlogFile = resolvedSlogFile

return config, nil
return ssConfig, nil
}
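
The defaulting and validation of `vat-transcript-retention` in `SwingsetConfigFromViper` can be isolated into a small pure function for study or testing. The sketch below restates that logic using only the standard library; `resolveTranscriptRetention` is a hypothetical name, and the `pruningNothing` constant is assumed to carry the same `"nothing"` value as `pruningtypes.PruningOptionNothing`.

```go
package main

import "fmt"

// pruningNothing is assumed to equal pruningtypes.PruningOptionNothing.
const pruningNothing = "nothing"

// transcriptRetentionValues mirrors the allowed values listed in config.go.
var transcriptRetentionValues = []string{"archival", "operational"}

// resolveTranscriptRetention is a hypothetical restatement of the PR's logic:
// an empty or "default" setting is derived from the pruning strategy, and any
// other setting must be one of the allowed values.
func resolveTranscriptRetention(configured, pruning string) (string, error) {
	if configured == "" || configured == "default" {
		if pruning == pruningNothing {
			return "archival", nil
		}
		return "operational", nil
	}
	for _, allowed := range transcriptRetentionValues {
		if configured == allowed {
			return configured, nil
		}
	}
	return "", fmt.Errorf(
		"value for vat-transcript-retention must be in %q",
		transcriptRetentionValues,
	)
}

func main() {
	cases := [][2]string{
		{"default", "nothing"},
		{"", "everything"},
		{"archival", "everything"},
		{"debug", "nothing"},
	}
	for _, c := range cases {
		resolved, err := resolveTranscriptRetention(c[0], c[1])
		fmt.Printf("configured=%q pruning=%q -> %q, err=%v\n", c[0], c[1], resolved, err)
	}
}
```

Because the resolved value is always either "archival" or "operational" when no error is returned, the field sent to the VM is never empty after this step, which is the point raised in the `omitempty` review thread.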
7 changes: 4 additions & 3 deletions packages/cosmic-swingset/package.json
@@ -22,7 +22,6 @@
"author": "Agoric",
"license": "Apache-2.0",
"dependencies": {
"@endo/errors": "^1.2.5",
"@agoric/builders": "^0.1.0",
"@agoric/cosmos": "^0.34.1",
"@agoric/deploy-script-support": "^0.10.3",
@@ -34,15 +33,17 @@
"@agoric/vm-config": "^0.1.0",
"@endo/bundle-source": "^3.4.0",
"@endo/env-options": "^1.1.6",
"@endo/far": "^1.1.5",
"@endo/errors": "^1.2.5",
"@endo/import-bundle": "^1.2.2",
"@endo/init": "^1.1.4",
"@endo/far": "^1.1.5",
"@endo/marshal": "^1.5.3",
"@endo/nat": "^5.0.10",
"@endo/patterns": "^1.4.3",
"@endo/promise-kit": "^1.1.5",
"@iarna/toml": "^2.2.3",
"@opentelemetry/sdk-metrics": "~1.9.0",
"@opentelemetry/api": "~1.3.0",
"@opentelemetry/sdk-metrics": "~1.9.0",
"anylogger": "^0.21.0",
"deterministic-json": "^1.0.5",
"import-meta-resolve": "^2.2.1",
59 changes: 38 additions & 21 deletions packages/cosmic-swingset/src/chain-main.js
@@ -12,6 +12,9 @@ import { fork } from 'node:child_process';

import { Fail, q } from '@endo/errors';
import { E } from '@endo/far';
import { makeMarshal } from '@endo/marshal';
import { isNat } from '@endo/nat';
import { M, mustMatch } from '@endo/patterns';
import engineGC from '@agoric/internal/src/lib-nodejs/engine-gc.js';
import { waitUntilQuiescent } from '@agoric/internal/src/lib-nodejs/waitUntilQuiescent.js';
import {
@@ -25,7 +28,6 @@ import {
makeChainStorageRoot,
makeSerializeToStorage,
} from '@agoric/internal/src/lib-chainStorage.js';
import { makeMarshal } from '@endo/marshal';
import { makeShutdown } from '@agoric/internal/src/node/shutdown.js';

import * as STORAGE_PATH from '@agoric/internal/src/chain-storage-paths.js';
@@ -72,7 +74,27 @@ const toNumber = specimen => {
* @typedef {object} CosmosSwingsetConfig
* @property {string} [slogfile]
* @property {number} [maxVatsOnline]
* @property {'debug' | 'operational'} [vatSnapshotRetention]
* @property {'archival' | 'operational'} [vatTranscriptRetention]
*/
const SwingsetConfigShape = M.splitRecord(
// All known properties are optional, but unknown properties are not allowed.
{},
{
slogfile: M.string(),
maxVatsOnline: M.number(),
vatSnapshotRetention: M.or('debug', 'operational'),
vatTranscriptRetention: M.or('archival', 'operational'),
},
{},
);
const validateSwingsetConfig = swingsetConfig => {
mustMatch(swingsetConfig, SwingsetConfigShape);
const { maxVatsOnline } = swingsetConfig;
maxVatsOnline === undefined ||
(isNat(maxVatsOnline) && maxVatsOnline > 0) ||
Fail`maxVatsOnline must be a positive integer`;
};

/**
* A boot message consists of cosmosInitAction fields that are subject to
@@ -100,19 +122,6 @@ const makeBootMsg = initAction => {
};
};

/**
* Extract local Swingset-specific configuration which is
* not part of the consensus.
*
* @param {CosmosSwingsetConfig} [resolvedConfig]
*/
const makeSwingsetConfig = resolvedConfig => {
const { maxVatsOnline } = resolvedConfig || {};
return {
maxVatsOnline,
};
};

/**
* @template {unknown} [T=unknown]
* @param {(req: string) => string} call
@@ -301,13 +310,24 @@ export default async function main(progname, args, { env, homedir, agcc }) {
// here so 'sendToChainStorage' can close over the single mutable instance,
// when we updated the 'portNums.storage' value each time toSwingSet was called.
async function launchAndInitializeSwingSet(initAction) {
const { XSNAP_KEEP_SNAPSHOTS, NODE_HEAP_SNAPSHOTS = -1 } = env;

/** @type {CosmosSwingsetConfig} */
const swingsetConfig = harden(initAction.resolvedConfig || {});
validateSwingsetConfig(swingsetConfig);
const { slogfile, vatSnapshotRetention, vatTranscriptRetention } =
swingsetConfig;
const keepSnapshots = vatSnapshotRetention
? vatSnapshotRetention !== 'operational'
: ['1', 'true'].includes(XSNAP_KEEP_SNAPSHOTS);
const keepTranscripts = vatTranscriptRetention
? vatTranscriptRetention !== 'operational'
: false;

// As a kludge, back-propagate selected configuration into environment variables.
const { slogfile } = initAction.resolvedConfig || {};
// eslint-disable-next-line dot-notation
if (slogfile) env['SLOGFILE'] = slogfile;

const swingsetConfig = makeSwingsetConfig(initAction.resolvedConfig);

const sendToChainStorage = msg => chainSend(portNums.storage, msg);
// this object is used to store the mailbox state.
const fromBridgeMailbox = data => {
@@ -442,7 +462,6 @@ export default async function main(progname, args, { env, homedir, agcc }) {
serviceName: TELEMETRY_SERVICE_NAME,
});

const { XSNAP_KEEP_SNAPSHOTS, NODE_HEAP_SNAPSHOTS = -1 } = env;
const slogSender = await makeSlogSender({
stateDir: stateDBDir,
env,
@@ -455,9 +474,6 @@ export default async function main(progname, args, { env, homedir, agcc }) {
trueValue: pathResolve(stateDBDir, 'store-trace.log'),
});

const keepSnapshots =
XSNAP_KEEP_SNAPSHOTS === '1' || XSNAP_KEEP_SNAPSHOTS === 'true';

const nodeHeapSnapshots = Number.parseInt(NODE_HEAP_SNAPSHOTS, 10);

let lastCommitTime = 0;
@@ -539,6 +555,7 @@ export default async function main(progname, args, { env, homedir, agcc }) {
swingStoreExportCallback,
swingStoreTraceFile,
keepSnapshots,
keepTranscripts,
afterCommitCallback,
swingsetConfig,
});
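
The mapping from retention settings to the `keepSnapshots` and `keepTranscripts` flags in `launchAndInitializeSwingSet` is compact enough to restate as a truth-table-style function. The following is a Go restatement for illustration only; the PR implements this in JavaScript, and the `keepFlags` type and `deriveKeepFlags` function are hypothetical names. An empty string stands in for an absent setting or an unset `XSNAP_KEEP_SNAPSHOTS` variable.

```go
package main

import "fmt"

// keepFlags is a hypothetical container for the two derived booleans.
type keepFlags struct {
	KeepSnapshots   bool
	KeepTranscripts bool
}

// deriveKeepFlags restates the chain-main.js logic from this PR:
//   - a configured snapshot retention wins over XSNAP_KEEP_SNAPSHOTS;
//   - any configured value other than "operational" means "keep";
//   - with no transcript retention configured, transcripts are not kept.
func deriveKeepFlags(snapshotRetention, transcriptRetention, xsnapKeepSnapshots string) keepFlags {
	var flags keepFlags
	if snapshotRetention != "" {
		flags.KeepSnapshots = snapshotRetention != "operational"
	} else {
		flags.KeepSnapshots = xsnapKeepSnapshots == "1" || xsnapKeepSnapshots == "true"
	}
	if transcriptRetention != "" {
		flags.KeepTranscripts = transcriptRetention != "operational"
	}
	return flags
}

func main() {
	fmt.Printf("%+v\n", deriveKeepFlags("debug", "archival", ""))
	fmt.Printf("%+v\n", deriveKeepFlags("", "", "1"))
	fmt.Printf("%+v\n", deriveKeepFlags("operational", "operational", "true"))
}
```

Note that the environment variable is consulted only as a fallback, so a configured `vat-snapshot-retention = "operational"` disables snapshot retention even when `XSNAP_KEEP_SNAPSHOTS` is set.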
2 changes: 2 additions & 0 deletions packages/cosmic-swingset/src/launch-chain.js
@@ -331,6 +331,7 @@ export async function launch({
swingStoreTraceFile,
swingStoreExportCallback,
keepSnapshots,
keepTranscripts,
afterCommitCallback = async () => ({}),
swingsetConfig,
}) {
@@ -373,6 +374,7 @@ export async function launch({
traceFile: swingStoreTraceFile,
exportCallback: swingStoreExportSyncCallback,
keepSnapshots,
keepTranscripts,
});
const { kvStore, commit } = hostStorage;

12 changes: 6 additions & 6 deletions packages/swing-store/docs/data-export.md
@@ -188,13 +188,13 @@ Once the new SwingStore is fully populated with the previously-exported data, th

Some of the data maintained by SwingStore is not strictly necessary for kernel execution, at least under normal circumstances. For example, once a vat worker performs a heap snapshot, we no longer need the transcript entries from before the snapshot was taken, since vat replay will start from the snapshot point. We split each vat's transcript into "spans", delimited by heap snapshot events, and the "current span" is the most recent one (still growing), whereas the "historical spans" are all closed and immutable. Likewise, we only really need the most recent heap snapshot for each vat: older snapshots might be interesting for experiments that replay old transcripts with different versions of the XS engine, but no normal kernel will ever need them.

Most validators would prefer to prune this data, to reduce their storage needs. But we can imagine some [extreme upgrade scenarios](https://github.com/Agoric/agoric-sdk/issues/1691) that would require access to these historical transcript spans. Our compromise is to record *validation data* for these historical spans in the export data, but omit the spans themselves from the export artifacts. Validators can delete the old spans at will, and if we ever need them in the future, we can add code that will fetch copies from an archive service, validate them against the export data hashes, and re-insert the relevant entries into the SwingStore.
Most blockchain validator nodes would prefer to prune this data, to reduce their storage needs. But we can imagine some [extreme upgrade scenarios](https://github.com/Agoric/agoric-sdk/issues/1691) that would require access to these historical transcript spans. Our compromise is to record *validation data* for these historical spans in the export data, but omit the spans themselves from the export artifacts. Validators can delete the old spans at will, and if we ever need them in the future, we can add code that will fetch copies from an archive service, validate them against the export data hashes, and re-insert the relevant entries into the SwingStore.

Likewise, each time a heap snapshot is recorded, we cease to need any previous snapshot. And again, as a hedge against even more drastic recovery scenarios, we strike a compromise between minimizing retained data and the ability to validate old snapshots, by retaining only their hashes.

As a result, for each active vat, the first-stage Export Data contains a record for every old transcript span, plus one for the current span. It also contains a record for every old heap snapshot, plus one for the most recent heap snapshot, plus a `.current` record that points to the most recent snapshot. However the exported artifacts may or may not include blobs for the old transcript spans, or for the old heap snapshots.
As a result, for each active vat, the first-stage Export Data contains a record for every old heap snapshot, plus one for the most recent heap snapshot, plus a `.current` record that points to the most recent snapshot. It also contains a record for every old transcript span, plus one for the current span. However the exported artifacts may or may not include blobs for the old heap snapshots, or for the old transcript spans.

The `openSwingStore()` function has an option named `keepTranscripts` (which defaults to `true`), which causes the transcriptStore to retain the old transcript items. A second option named `keepSnapshots` (which defaults to `false`) causes the snapStore to retain the old heap snapshots. Opening the swingStore with a `false` option does not necessarily delete the old items immediately, but they'll probably get deleted the next time the kernel triggers a heap snapshot or transcript-span rollover. Validators who care about minimizing their disk usage will want to set both to `false`. In the future, we will arrange the SwingStore SQLite tables to provide easy `sqlite3` CLI commands that will delete the old data, so validators can also periodically use the CLI command to prune it.
The `openSwingStore()` function has an option named `keepSnapshots` (which defaults to `false`), which causes the snapStore to retain the old heap snapshots. A second option named `keepTranscripts` (which defaults to `true`) causes the transcriptStore to retain the old transcript items. Opening the swingStore with a `false` option does not necessarily delete the old items immediately, but they may get deleted the next time the kernel triggers a heap snapshot or transcript-span rollover. Hosts who care about minimizing their disk usage will want to set both to `false`. In the future, we will arrange the SwingStore SQLite tables to provide easy `sqlite3` CLI commands that will delete the old data, for use in periodic pruning.

When exporting, the `makeSwingStoreExporter()` function takes an `artifactMode` option (in an options bag). This serves to both limit, and provide some minimal guarantees about, the set of artifacts that will be provided in the export. The defined values of `artifactMode` each build upon the previous one:

@@ -218,9 +218,9 @@ While `importSwingStore()`'s options bag accepts the same options as `openSwingS
So, to avoid pruning current-incarnation historical transcript spans when exporting from one swingstore to another, you must set (or avoid overriding) the following options along the way:

* the original swingstore must not be opened with `{ keepTranscripts: false }`, otherwise the old spans will be pruned immediately
* the export must use `makeSwingStoreExporter(dirpath, { artifactMode: 'replay'})`, otherwise the export will omit the old spans
* the import must use `importSwingStore(exporter, dirPath, { artifactMode: 'replay'})`, otherwise the import will ignore the old spans
* subsequent `openSwingStore` calls must not use `keepTranscripts: false`, otherwise the new swingstore will prune historical spans as new ones are created (during `rolloverSpan`).
* the export must use `makeSwingStoreExporter(dirpath, { artifactMode: 'replay' })`, otherwise the export will omit the old spans
* the import must use `importSwingStore(exporter, dirPath, { artifactMode: 'replay' })`, otherwise the import will ignore the old spans
* subsequent `openSwingStore` calls must not use `keepTranscripts: false`, otherwise the new swingstore will prune historical spans as they are replaced during `rolloverSpan`.

## Implementation Details
