Implement framework for benchmarking operations on chain #8327

FUDCo · 2023-09-12T23:32:06Z

What is the Problem Being Solved?

Right now we don't really have a good way to get performance measurements for contracts or any other code running in a vat or vats on chain, short of examining the operation of the production chain itself. Clearly it's not practical or wise to just try things out in production, which makes performance engineering of our overall system challenging. We want to fix this by providing tooling for developers to write performance benchmarks, execute them in a mostly realistic environment, and then measure their performance, all in support of a normal development code-test-debug-try-again lifecycle, except focused on performance engineering.

Description of the design

To this end, we've identified a couple of different strategies:

Swingset-runner is capable of running arbitrary swingsets and can be adapted to run realistic benchmarks by adding code to emulate the function of the bridge device and other chain-specific machinery, such as that which cosmic-swingset provides. Moreover, swingset-runner already has support for benchmark orchestration and data collection. Since its means of dynamically loading code is by launching vats, a benchmark has to be run from within the swingset, via a driver vat that implements the benchmark logic. Note that this is quite different from how Ava tests are written, even though we strongly suspect that quite a few existing tests could be the seeds of benchmarks for the functionality they exercise. These tests would need substantial adaptation to run from inside a vat. On the other hand, the resulting performance simulation should be quite accurate. We call this approach the "inside view".

Tests written using Ava are capable of driving a swingset from the outside, but Ava itself is not really architected to be a benchmark driver (though we have made a preliminary step in that direction: see #7960). However, a simple benchmark driver framework inspired by Ava but specifically intended for the implementation of benchmarks instead of correctness tests should, in principle, be relatively straightforward to construct. This framework would take care of setting up all the basic chain infrastructure (e.g., by executing the chain bootstrap that gets the vats and devices that constitute the basic Agoric ecosystem up and running), leaving the benchmark authors to only have to implement the parts of the benchmark that involve the specific functionality being measured. This framework would also take care of measuring timing values & other resource usage, then collecting and recording this data, in much the same manner as swingset-runner already does. The principal benefits of this strategy are speed and simplicity from the perspective of the benchmark authors. We call this approach the "outside view".
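The division of labor described for the outside view can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual framework: `makeBenchmarkDriver`, `addBenchmark`, `executeRound`, and the injected `bootstrap` function are hypothetical names, and the only resource measured here is wall-clock time.

```javascript
// Hypothetical sketch of an "outside view" benchmark driver.
// All names here are illustrative assumptions, not the real tooling's API.
const makeBenchmarkDriver = ({ bootstrap }) => {
  const benchmarks = new Map();
  return {
    // The benchmark author supplies only the logic being measured.
    addBenchmark(name, benchmark) {
      benchmarks.set(name, benchmark);
    },
    // The driver owns environment setup, timing, and data collection.
    async run({ rounds = 1 } = {}) {
      // Stand-in for the chain bootstrap that would bring up vats and devices.
      const context = await bootstrap();
      const results = [];
      for (const [name, { setup, executeRound, finish }] of benchmarks) {
        const state = setup ? await setup(context) : undefined;
        const roundTimesMs = [];
        for (let round = 1; round <= rounds; round += 1) {
          const start = performance.now();
          await executeRound(context, round, state);
          roundTimesMs.push(performance.now() - start);
        }
        if (finish) {
          await finish(context, state);
        }
        const totalMs = roundTimesMs.reduce((sum, t) => sum + t, 0);
        results.push({
          name,
          rounds,
          roundTimesMs,
          totalMs,
          meanMs: totalMs / rounds,
        });
      }
      return results;
    },
  };
};
```

In this shape a benchmark author writes only `setup`, `executeRound`, and `finish`. Everything about the environment and the measurements stays in the driver, which is the convenience the outside view is meant to provide.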

In principle these two approaches are complementary, though it seems likely that one or the other will become the dominant form (I'd bet on that being the outside view approach due to developer convenience, though I personally like the inside view approach more).

Other considerations

This issue is an epic to track our work on these frameworks. Note that as of this writing, substantial development work on both fronts has already happened (in particular, the first pass at the inside view has already landed in the form of PR #8239). This issue is backfilling the informal plan that we have already been following, so that it can be properly tracked and monitored in our project management system.

Tasks

- Get rid of passableEncoding.js in favor of kmarshal
- Refactor benchmark driver and bootstrap test support code into a separate library
- Document benchmark data format
- Take the next step along the learning curve by implementing a couple more benchmarks along with whatever improvements they demand

FUDCo · 2023-09-21T07:49:15Z

Here is a list of plausible benchmarking tool features which have not yet been implemented.

One major reason these haven't been implemented is that we lack enough operational experience with this very immature tooling to know if these features are things we actually need or want yet. But somebody thought of them and so I'm collecting them here so the ideas don't get lost.

- Per-round setup and teardown functions, to complement the overall benchmark setup and teardown functions
- Alternate stop criteria instead of or in addition to the number of rounds
  - Total elapsed time running benchmark rounds
  - A custom "stop?" predicate provided by the benchmark author as part of the benchmark definition
- "Agent" definitions selected via a benchmark configuration option
- Swingset configuration and/or chain configuration selected from a palette of pre-configured choices or specified directly by providing paths to configuration files
- Use globbing instead of regexps for benchmark selection filters
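To make the first two items concrete, here is a minimal sketch of how per-round hooks and alternate stop criteria might fit into a round loop. None of this exists in the tooling: `runRounds`, `setupRound`, `finishRound`, `shouldStop`, `maxRounds`, and `maxElapsedMs` are hypothetical names chosen for illustration, and the injectable `now` clock is there only so the loop can be exercised deterministically.

```javascript
// Hypothetical sketch: these option and hook names are assumptions,
// not part of the existing benchmark tooling.
const runRounds = async (benchmark, options = {}) => {
  const {
    maxRounds = Infinity,
    maxElapsedMs = Infinity,
    now = () => Date.now(), // injectable clock
  } = options;
  const { setupRound, executeRound, finishRound, shouldStop } = benchmark;
  if (maxRounds === Infinity && maxElapsedMs === Infinity && !shouldStop) {
    throw new Error('at least one stop criterion is required');
  }
  let roundsDone = 0;
  let measuredMs = 0; // time spent inside executeRound only
  while (roundsDone < maxRounds) {
    // Stop criterion: total elapsed time running benchmark rounds.
    if (measuredMs >= maxElapsedMs) {
      break;
    }
    const round = roundsDone + 1;
    // Per-round setup and teardown are kept outside the timed region.
    const roundState = setupRound ? await setupRound(round) : undefined;
    const t0 = now();
    const result = await executeRound(round, roundState);
    measuredMs += now() - t0;
    if (finishRound) {
      await finishRound(round, roundState);
    }
    roundsDone += 1;
    // Stop criterion: custom "stop?" predicate supplied by the author.
    if (shouldStop && shouldStop({ roundsDone, measuredMs, result })) {
      break;
    }
  }
  return { roundsDone, measuredMs };
};
```

One design question this raises is whether per-round setup time should count toward the elapsed-time limit. The sketch excludes it, on the assumption that only the measured work should drive the stop decision.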

`PassableEncoding` (not to be confused with `encodePassable`) was a non-standard serialization scheme used in some tests, which encoded remote references in-band using Symbols with magic names rather than using the normal marshal package machinery that puts these into the 'slots' element of the standard capdata format. This bypassed various message filtering and transformation logic in the kernel and also required special methods to be present in the bootstrap vat to translate this encoding and relay messages to their actual intended destinations.

This has now been removed. The relatively small number of tests which used `passableEncoding` have been updated to use `kmarshal` instead. Messages and data are now encoded in a form that all the other code understands. Test messages are also now delivered directly to their destinations without having to count on the existence of a relayer. In support of this, the controller's `queueToVatRoot` method has been augmented by the addition of a `queueToVatObject` method, allowing tests to send messages to specific objects, targeted using remotable references of the sort returned by `kunser`. The test support library that a lot of the bootstrap tests use has been updated to use this improved mechanism.

In addition, `kmarshal` itself has been upgraded using a trick that MarkM provided for tagging promises, which allows `kmarshal` to be truly stateless. The (former) statefulness of `kmarshal` caused problems when the module was imported into different compartments, as each compartment ended up with its own module instance and thus its own version of the state. This in turn caused these compartments to have different beliefs about how particular promises were represented, which caused various things to break. That's all fixed now.

One wart which has NOT been taken care of in this PR, but which will be addressed in a follow-on PR that we were already planning for, is the duplication of `kmarshal.js` in both the SwingSet package and the liveslots package. The forthcoming PR will rename and relocate files to put support tooling used by both benchmarks and tests into a package of its own, thereby eliminating a lot of weird dependencies and files in places they don't belong. As part of this I plan to relocate `kmarshal` into a package of its own that can then be cleanly imported by the kernel, liveslots, and the various tests and test support tooling. All this is in support of issue #8327.
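The difference between in-band references and the 'slots' element of capdata can be illustrated with a toy encoder. This is emphatically not the real `kmarshal` or marshal wire format: `ToyRef`, `toyMarshal`, `toyUnmarshal`, `translateSlots`, the `@slot` marker, and the reference strings are all made up for this sketch. The only thing taken from the description above is the idea that references travel out-of-band in a `slots` array alongside a `body`.

```javascript
// Toy illustration only: NOT the actual kmarshal/marshal encoding.
// A stand-in for a remote reference, identified by an opaque string.
class ToyRef {
  constructor(ref) {
    this.ref = ref;
    Object.freeze(this);
  }
}

// Encode a value as { body, slots }: references are replaced in the body
// by an index into the slots array rather than being embedded in-band.
const toyMarshal = value => {
  const slots = [];
  const body = JSON.stringify(value, (_key, v) => {
    if (v instanceof ToyRef) {
      let index = slots.indexOf(v.ref);
      if (index < 0) {
        index = slots.length;
        slots.push(v.ref);
      }
      return { '@slot': index };
    }
    return v;
  });
  return { body, slots };
};

// Decode by looking each slot index back up in the slots array.
const toyUnmarshal = ({ body, slots }) =>
  JSON.parse(body, (_key, v) =>
    v !== null && typeof v === 'object' && typeof v['@slot'] === 'number'
      ? new ToyRef(slots[v['@slot']])
      : v,
  );

// An intermediary can rewrite references without ever parsing the body.
const translateSlots = ({ body, slots }, translate) => ({
  body,
  slots: slots.map(translate),
});
```

Because the references sit in `slots`, an intermediary can inspect or rewrite them without understanding the body, which is the kind of processing an in-band scheme sidesteps. Note also that these functions keep no module-level state, so two separately loaded copies would encode the same value identically.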


FUDCo added the enhancement New feature or request label Sep 12, 2023

FUDCo assigned warner and FUDCo Sep 12, 2023

FUDCo mentioned this issue Sep 12, 2023

Introducing The Benchmarkerator, a platform for standalone performance benchmarks #8312

Merged

warner added the performance Performance related issues label Sep 14, 2023

FUDCo mentioned this issue Sep 22, 2023

Eliminate the passableEncoding hack #8373

Merged

FUDCo mentioned this issue Sep 28, 2023

Package kmarshal #8399

Merged

erights mentioned this issue Sep 28, 2023

fix(SwingSet): adapt to promise tagging support #8403

Merged

FUDCo mentioned this issue Oct 2, 2023

Move benchmarkerator and some test support code out of the boot package #8421

Merged

toliaqat mentioned this issue Oct 5, 2023

CI job to run benchmarks and publish data to a dashboard #7964

Closed

ivanlei unassigned warner Nov 27, 2023

FUDCo mentioned this issue Dec 11, 2023

Create a "blog post" working through use of Benchmarkerator #8647

Closed

warner assigned warner and unassigned FUDCo Jan 29, 2024
