Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(sequencer)!: exclusively use Borsh encoding for stored data (ENG-768) #1492

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Fraser999
Copy link
Contributor

Summary

This PR primarily changes the encoding format of all data being written to storage to Borsh.

Background

We currently have a variety of different encoding formats, and this can be confusing, sub-optimal (in terms of storage space consumed and serialization/deserialization performance) and potentially problematic as e.g. JSON-encoding leaves a lot of flexibility in how the actual serialized data can look.

This PR is part one of three which aim to improve the performance and quality of the storage component. As such, the APIs of the various StateReadExt and StateWriteExt extension traits were updated slightly in preparation for the upcoming changes. In broad terms, for getters this meant having ref parameters rather than value ones (even for copyable types like [u8; 32] this is significantly more performant), and for "putters", parameters which are used for DB keys are refs, while the DB value parameters become values, since in the next PR these values will be added to a cache.

Changes

  • Added a new storage module. This will ultimately contain our own equivalents of the cnidarium types, but for now consists only of a collection of submodules for types currently being written to storage. There is a top-level enum StoredValue which becomes the only type being written to storage by our own code.
  • To accommodate for converting between the storage types and the corresponding domain types defined in astria-core and astria-merkle, some of these have been provided with constructors named unchecked_from_parts. This allows the type to be constructed from a trusted source like our own DB, skipping any validation steps which might otherwise be done.
  • Updated all StateReadExt and StateWriteExt traits to use the new StoredValue, which internally uses Borsh encoding.
  • Updated the APIs of all these extension traits as described above. This change resulted in a slew of similar updates to avoid needless copying [u8; 32] and [u8; 20] arrays, and that has unfortunately made the PR larger than expected.
  • A few core types which currently derive or implement serde traits have had that removed, since it only existed to support JSON-encoding the type for storing. In one case it was for JSON-encoding to a debug log line, but I replaced that with a Display impl.

Testing

  • Existing unit tests of the StateReadExt and StateWriteExt traits already had almost full coverage. Where coverage was missing, I added round trip tests.

Breaking Changelist

  • The on-disk format of stored data has changed.

Related Issues

Closes #1434.

@github-actions github-actions bot added conductor pertaining to the astria-conductor crate sequencer pertaining to the astria-sequencer crate composer pertaining to composer labels Sep 11, 2024
Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level comment on the approach taken in this design:

At a high level, sequencer is defined by a set of components, each of which provides ways to change and read their respective state. So far this also meant that each component had full control over how the objects written to its state are converted to/from the domain types.

The storage and stored_value modules invert this logic by defining and passing in the on-disk types into the components.

I am ambivalent about this change since on the one hand it potentially allows avoiding code duplication. But on the other hand it breaks with the separation between the components we had so far.

@Fraser999
Copy link
Contributor Author

So far this also meant that each component had full control over how the objects written to its state are converted to/from the domain types.

I see that as one of the main problems with the existing approach. It's because we've allowed this flexibility that different components use different serialization for the same domain types. In a lot of cases, there is only the domain type - the component would serialize directly from the domain type into bytes. I'm very keen to remove that flexibility so that all components have to use Borsh and we have a complete collection of storage types which map to/from domain types, both serving different purposes (domain types supporting business logic only, storage types supporting being held in a DB only).

The storage and stored_value modules invert this logic by defining and passing in the on-disk types into the components.

This is only temporary until #1436 is completed. When that work is completed, the components will be dealing with the domain types only.

But on the other hand it breaks with the separation between the components we had so far.

Well, I don't think there's complete separation currently. For example, accounts::StateReadExt is used in a few other components. But this isn't really changed with the new approach IMO. Ultimately the components will continue to access/modify state relating to other components via their state extension traits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
composer pertaining to composer conductor pertaining to the astria-conductor crate sequencer pertaining to the astria-sequencer crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use single serialization format for all block state data
2 participants