Implement framework for benchmarking operations on chain #8327

Open
1 of 4 tasks
FUDCo opened this issue Sep 12, 2023 · 1 comment
Labels: enhancement (New feature or request), performance (Performance related issues)

Comments


FUDCo commented Sep 12, 2023

What is the Problem Being Solved?

Right now we don't really have a good way to get performance measurements regarding the operation of contracts or any other code that is running in a vat or vats on chain, short of examining the operation of the production chain itself. Clearly it's not practical or wise to just try things out in production, which makes performance engineering our overall system challenging. We want to fix this by providing tooling for developers to write performance benchmarks, execute them in a mostly realistic environment, and then measure their performance, all in support of a normal development code-test-debug-try-again lifecycle except focused on performance engineering.

Description of the design

To this end, we've identified a couple of different strategies:

Swingset-runner is capable of running arbitrary swingsets and can be adapted to run realistic benchmarks by adding code to emulate the function of the bridge device and other chain-specific machinery, such as that which cosmic-swingset provides. Moreover, swingset-runner already has support for benchmark orchestration and data collection. Since its means of dynamically loading code is to launch vats, a benchmark has to be run from within the swingset via a driver vat that implements the benchmark logic. Note that this is quite different from how Ava tests are written, even though we strongly suspect that quite a few existing tests are likely seeds for benchmarks of the functionality they exercise. However, these tests would need substantial adaptation to be run from inside a vat. On the other hand, the resulting performance simulation should be quite accurate. We call this approach the "inside view".

Tests written using Ava are capable of driving a swingset from the outside, but Ava itself is not really architected to be a benchmark driver (though we have made a preliminary step in that direction: see #7960). However, a simple benchmark driver framework inspired by Ava but specifically intended for the implementation of benchmarks instead of correctness tests should, in principle, be relatively straightforward to construct. This framework would take care of setting up all the basic chain infrastructure (e.g., by executing the chain bootstrap that gets the vats and devices that constitute the basic Agoric ecosystem up and running), leaving the benchmark authors to only have to implement the parts of the benchmark that involve the specific functionality being measured. This framework would also take care of measuring timing values & other resource usage, then collecting and recording this data, in much the same manner as swingset-runner already does. The principal benefits of this strategy are speed and simplicity from the perspective of the benchmark authors. We call this approach the "outside view".
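To make the "outside view" a bit more concrete, here is a minimal sketch of what such a driver loop might look like. Everything in it is hypothetical: `runBenchmark`, the `setup`/`executeRound`/`finish` hooks, and the shape of the report are invented for illustration and are not the actual framework API. The chain bootstrap that the real framework would perform is stood in for by the benchmark's own `setup` function.

```javascript
// Hypothetical driver sketch; none of these names come from the real framework.
const runBenchmark = async (benchmark, { rounds = 1 } = {}) => {
  const { name, setup, executeRound, finish } = benchmark;
  // One-time setup. A real driver would bring up the chain infrastructure
  // before this point; here the benchmark supplies its own context.
  const context = setup ? await setup() : {};
  const roundTimesMs = [];
  for (let round = 1; round <= rounds; round += 1) {
    const start = performance.now();
    await executeRound(context, round);
    roundTimesMs.push(performance.now() - start);
  }
  // One-time teardown, outside the timed region.
  if (finish) await finish(context);
  const totalMs = roundTimesMs.reduce((sum, t) => sum + t, 0);
  const meanMs = rounds > 0 ? totalMs / rounds : 0;
  return { name, rounds, totalMs, meanMs, roundTimesMs };
};

// Usage: the benchmark author only writes the part being measured.
const exampleBenchmark = {
  name: 'counter increments',
  setup: async () => ({ count: 0 }),
  executeRound: async (context, _round) => {
    context.count += 1;
  },
};

runBenchmark(exampleBenchmark, { rounds: 5 }).then(report => {
  console.log(`${report.name}: ${report.rounds} rounds measured`);
});
```

Only the rounds are timed in this sketch; setup and teardown are deliberately left outside the measured region.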

In principle these two approaches are complementary, though it seems likely that one or the other will become the dominant form (I'd bet on that being the outside view approach due to developer convenience, though I personally like the inside view approach more).

Other considerations

This issue is an epic to track our work on these frameworks. Note that as of this writing, substantial development work on both fronts has already happened (in particular, the first pass at the inside view has already landed in the form of PR #8239). This issue is backfilling the informal plan that we have already been following, so that it can be properly tracked and monitored in our project management system.

Tasks


FUDCo commented Sep 21, 2023

Here is a list of plausible benchmarking tool features which have not yet been implemented.

One major reason these haven't been implemented is that we lack enough operational experience with this very immature tooling to know if these features are things we actually need or want yet. But somebody thought of them and so I'm collecting them here so the ideas don't get lost.

  • Per round setup and teardown functions, to complement the overall benchmark setup and teardown functions
  • Alternate stop criteria instead of or in addition to the number of rounds
    • Total elapsed time running benchmark rounds
    • A custom "stop?" predicate provided by the benchmark author as part of the benchmark definition
  • "Agent" definitions selected via a benchmark configuration option
  • Swingset configuration and/or chain configuration selected from a palette of pre-configured choices or specified directly by providing paths to configuration files
  • Use globbing instead of regexps for benchmark selection filters
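As a rough illustration of how the first two ideas in this list could fit together, the sketch below combines per-round setup/teardown hooks with three stop criteria (round count, elapsed time, and a custom predicate). None of this is implemented in the tooling; `makeStopCheck`, `runRounds`, and all of the option names are made up here, and the loop is synchronous purely to keep the example short.

```javascript
// Hypothetical sketch; option and function names are invented for illustration.
const makeStopCheck = ({ maxRounds = Infinity, maxElapsedMs = Infinity, shouldStop } = {}) => {
  if (maxRounds === Infinity && maxElapsedMs === Infinity && !shouldStop) {
    // Without any criterion the round loop would never terminate.
    throw new Error('at least one stop criterion is required');
  }
  return state =>
    state.roundsCompleted >= maxRounds ||
    state.elapsedMs >= maxElapsedMs ||
    (shouldStop ? Boolean(shouldStop(state)) : false);
};

const runRounds = (executeRound, stopOptions, hooks = {}) => {
  const { setupRound, teardownRound, now = () => performance.now() } = hooks;
  const stop = makeStopCheck(stopOptions);
  let roundsCompleted = 0;
  let elapsedMs = 0;
  while (!stop({ roundsCompleted, elapsedMs })) {
    const round = roundsCompleted + 1;
    if (setupRound) setupRound(round); // not counted in elapsed time
    const start = now();
    executeRound(round);
    elapsedMs += now() - start;
    if (teardownRound) teardownRound(round); // not counted in elapsed time
    roundsCompleted += 1;
  }
  return { roundsCompleted, elapsedMs };
};

// Usage: stop after 100 rounds or when the custom predicate says so.
const result = runRounds(
  () => {},
  { maxRounds: 100, shouldStop: ({ roundsCompleted }) => roundsCompleted >= 10 },
);
console.log(`stopped after ${result.roundsCompleted} rounds`);
```

The criteria are combined with "or", so whichever one trips first ends the run. Whether that is the right composition rule is exactly the kind of question operational experience would need to answer.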

FUDCo added a commit that referenced this issue Sep 22, 2023
`PassableEncoding` (not to be confused with `encodePassable`) was a non-standard
serialization scheme used in some tests, which encoded remote references in-band
using Symbols with magic names rather than using the normal marshal package
machinery that puts these into the 'slots' element of the standard capdata
format.  This bypassed various message filtering and transformation logic in the
kernel and also required special methods to be present in the bootstrap vat to
translate this encoding and relay messages to their actual intended
destinations.
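For readers unfamiliar with the distinction, the toy encoder below shows the general idea of carrying references out-of-band in a `slots` array next to a serialized `body`. It is NOT the real marshal or `kmarshal` wire format: the `@slot` marker, the `encodeOutOfBand` name, and the reference identifiers are all invented for illustration.

```javascript
// Toy encoder for illustration only; not the actual marshal/kmarshal format.
// refIds maps each reference object to a (made-up) identifier string.
const encodeOutOfBand = (value, refIds) => {
  const slots = [];
  const body = JSON.stringify(value, (_key, v) => {
    if (typeof v === 'object' && v !== null && refIds.has(v)) {
      const id = refIds.get(v);
      let index = slots.indexOf(id);
      if (index < 0) {
        index = slots.length;
        slots.push(id);
      }
      // Hypothetical marker: the body only records a position in `slots`.
      return { '@slot': index };
    }
    return v;
  });
  return { body, slots };
};

// Usage: two distinct references, one of them mentioned twice.
const alice = {};
const bob = {};
const refIds = new Map([
  [alice, 'ko21'],
  [bob, 'ko22'],
]);
const capdata = encodeOutOfBand({ method: 'deposit', args: [alice, bob, alice] }, refIds);
console.log(capdata.slots); // every reference in the message, without parsing body
```

Because every reference appears in `slots`, intermediate code can inspect or translate references without understanding the body, which is what an in-band scheme sidesteps.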

This has now been removed.  The relatively small number of tests which used
`passableEncoding` have been updated to use `kmarshal` instead.  Messages and
data are now encoded in a form that all the other code understands.  Test
messages are also now delivered directly to their destinations without having to
count on the existence of a relayer.

In support of this, the controller's `queueToVatRoot` method has been augmented
by the addition of a `queueToVatObject` method, allowing tests to send messages
to specific objects, targeted using remotable references of the sort returned
by `kunser`.  The test support library that a lot of the bootstrap tests use
has been updated to use this improved mechanism.

In addition, `kmarshal` itself has been upgraded using a trick that MarkM
provided for tagging promises, which allows `kmarshal` to be truly stateless.
The (former) statefulness of `kmarshal` caused problems when the module was
imported into different compartments, as each compartment ended up with its own
module instance and thus its own version of the state. This in turn caused these
compartments to have different beliefs about how particular promises were
represented, which caused various things to break.  That's all fixed now.
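The failure mode described above can be reproduced in miniature. In the sketch below, each call to `makeStatefulInstance` stands in for one copy of a stateful module loaded into its own compartment, while the stateless pair of functions carries the identifier on the promise itself. The tagging mechanism shown (an own `Symbol.toStringTag` property) is just one way to do this and is an assumption, since the message does not spell out the trick; all function names and the `kp40` identifier are hypothetical.

```javascript
// Stateful version: each call models one module instance (one per
// compartment) with its own private table of stand-in promises.
const makeStatefulInstance = () => {
  const krefOf = new WeakMap();
  return {
    makeStandin: kref => {
      const p = Promise.resolve(`${kref} stand-in`);
      krefOf.set(p, kref);
      return p;
    },
    getKref: p => krefOf.get(p),
  };
};

// Stateless version: the identifier is stored on the promise, so any copy
// of these functions agrees about what a given promise represents.
const makeStandin = kref => {
  const p = Promise.resolve(`${kref} stand-in`);
  Object.defineProperty(p, Symbol.toStringTag, { value: kref, enumerable: false });
  return p;
};
const getKref = p => {
  // Only an own property counts; ordinary promises inherit theirs.
  const desc = Object.getOwnPropertyDescriptor(p, Symbol.toStringTag);
  return desc ? desc.value : undefined;
};

// Usage: two "compartments" disagree in the stateful case.
const instanceA = makeStatefulInstance();
const instanceB = makeStatefulInstance();
const statefulPromise = instanceA.makeStandin('kp40');
console.log(instanceA.getKref(statefulPromise), instanceB.getKref(statefulPromise));
console.log(getKref(makeStandin('kp40')));
```

In the stateful case the second instance has no record of a promise created by the first, which is the "different beliefs" problem; the stateless functions have no table to disagree about.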

One wart which has NOT been taken care of in this PR, but which will be
addressed in a follow-on PR that we were already planning for, is the
duplication of `kmarshal.js` in both the SwingSet package and the liveslots
package.  The forthcoming PR will perform a bunch of file renaming and
relocation to put a bunch of support tooling, used by both benchmarks and tests,
into a package of its own, thereby eliminating a lot of weird dependencies and
files in places they don't belong.  As part of this I plan to relocate
`kmarshal` into a package of its own that can then be cleanly imported by the
kernel, liveslots, and the various tests and test support tooling.

All this is in support of issue #8327
FUDCo added a commit that referenced this issue Sep 24, 2023
turadg pushed a commit that referenced this issue Sep 26, 2023
@warner warner assigned warner and unassigned FUDCo Jan 29, 2024
anilhelvaci pushed a commit to Jorge-Lopes/agoric-sdk that referenced this issue Feb 16, 2024