
Declarative RLP Encoding/Decoding #7975

Draft · wants to merge 104 commits into base: master
Conversation


@emlautarom1 emlautarom1 commented Dec 26, 2024

Changes

  • Introduce an alternative approach to RLP encoding and decoding, based on a declarative API with support for code generation through Source Generators

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

The core library has 100% test coverage. Source generated code might not be fully covered.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

Remarks

When we started working on refactoring our TxDecoder, one thing that came up was how unergonomic it is to work with our current RLP API. We even have comments in the code itself mentioning these difficulties, for example:

/// <summary>
/// We pay a high code quality tax for the performance optimization on RLP.
/// Adding more RLP decoders is costly (time wise) but the path taken saves a lot of allocations and GC.
/// Shall we consider code generation for this? We could potentially generate IL from attributes for each
/// RLP serializable item and keep it as a compiled call available at runtime.
/// It would be slightly slower but still much faster than what we would get from using dynamic serializers.
/// </summary>

/// <summary>
/// We pay a big copy-paste tax to maintain ValueDecoders but we believe that the amount of allocations saved
/// make it worth it. To be reviewed periodically.
/// Question to Lukasz here -> would it be fine to always use ValueDecoderContext only?
/// I believe it cannot be done for the network items decoding and is only relevant for the DB loads.
/// </summary>

This PR introduces a new RLP API based on #7334 (comment) with several improvements:

  • Describe the structure of a record and get encoding and decoding for free. No code duplication required.
  • Records can be described in terms of other records. Supports conditionals, exceptions, function calls, etc.
  • Decoding and encoding are extensible through classes that can be defined anywhere, plus some extension methods.
  • Minimal core library with 100% code coverage.
  • Supports backtracking.
  • All function calls are known ahead of time (no virtual or override). Interfaces are only used to enforce implementations.
  • Despite the extensive usage of lambdas, no closures are required (all lambdas are static). You can still use them if you want to, but overloads are provided to avoid them.
  • Automatically generate the required code through Source Generators.
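Since the generated encoders ultimately emit standard RLP, a minimal self-contained sketch of the short-payload encoding rules may help readers unfamiliar with the format (this follows the Ethereum RLP spec and is background only, not the API introduced in this PR; the long-form prefixes for payloads over 55 bytes are omitted):

```csharp
using System;
using System.Linq;

// Minimal sketch of RLP encoding for short payloads (< 56 bytes).
static class MiniRlp
{
    // A single byte < 0x80 encodes as itself; otherwise short strings
    // get a (0x80 + length) prefix.
    public static byte[] EncodeString(byte[] payload)
    {
        if (payload.Length == 1 && payload[0] < 0x80) return payload;
        if (payload.Length > 55) throw new NotSupportedException("long form omitted in this sketch");
        return new[] { (byte)(0x80 + payload.Length) }.Concat(payload).ToArray();
    }

    // A list whose concatenated payload is < 56 bytes gets a (0xc0 + length) prefix.
    public static byte[] EncodeList(params byte[][] items)
    {
        byte[] payload = items.SelectMany(i => EncodeString(i)).ToArray();
        if (payload.Length > 55) throw new NotSupportedException("long form omitted in this sketch");
        return new[] { (byte)(0xc0 + payload.Length) }.Concat(payload).ToArray();
    }
}
```

For example, `EncodeString("dog")` yields `0x83 'd' 'o' 'g'`, a well-known RLP test vector.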

@emlautarom1
Contributor Author

Updated generators to use string interpolation. There are some places where we still use StringBuilder though.

@LukaszRozmej
Member

This is what the generated code looks like for the Rose Tree (formatted):

Ok but can we do something real-life for us? Like BlockHeader for example?

Member

@LukaszRozmej LukaszRozmej left a comment


What about RlpBehaviors?


public int Length { get; private set; }

private byte[] _buffer;
Member


We often write into Netty arena-based buffers to avoid allocations, would be good to support that.

Contributor

@Scooletz Scooletz left a comment


What an interesting take on this topic 😍 A few remarks provided as comments. General:

  1. Probably [SkipLocalsInit] would be beneficial.
  2. Benchmarking, with potential ASM output.
  3. Converting some of the existing decoders and performing a mano-a-mano comparison.

public static int Read(ReadOnlySpan<byte> source)
{
Span<byte> buffer = stackalloc byte[sizeof(Int32)];
source.CopyTo(buffer[^source.Length..]);
Contributor


Unsafe.ReadUnaligned? Why copy?

Contributor Author


This method was added because BinaryPrimitives.ReadInt32BigEndian requires exactly 4 bytes to read an Int32, so we pad source with enough zeros so it can properly decode.

I've never used Unsafe.ReadUnaligned, or unsafe code in general. How would that look?

Member


Why not use Bytes class we already have?

Contributor Author


Didn't think about it, but it seems the implementations are the same:

Span<byte> fourBytes = stackalloc byte[4];
bytes.CopyTo(fourBytes[(4 - bytes.Length)..]);
return BinaryPrimitives.ReadInt32BigEndian(fourBytes);

Seems a bit overkill to add a project reference for 3 LOC.
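For reference, the padded-copy approach under discussion can be written as one self-contained helper; since BinaryPrimitives.ReadInt32BigEndian requires exactly 4 bytes, shorter inputs are right-aligned in a zeroed stack buffer (a sketch mirroring the snippets above, not necessarily the PR's final code):

```csharp
using System;
using System.Buffers.Binary;

static class RlpInt
{
    // Reads a big-endian Int32 from an RLP payload of 0..4 bytes by
    // right-aligning it in a 4-byte stack buffer (zero-initialized by default).
    public static int Read(ReadOnlySpan<byte> source)
    {
        if (source.Length > sizeof(int)) throw new ArgumentException("payload too long", nameof(source));
        Span<byte> buffer = stackalloc byte[sizeof(int)];
        source.CopyTo(buffer[^source.Length..]); // left-pad with zeros
        return BinaryPrimitives.ReadInt32BigEndian(buffer);
    }
}
```

Note that an empty payload (RLP's encoding of zero) falls out naturally: the copy is a no-op and the zeroed buffer decodes to 0.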

action(ref lengthWriter, ctx);
var serialized = new byte[lengthWriter.Length];
var contentWriter = RlpWriter.ContentWriter(serialized);
action(ref contentWriter, ctx);
Contributor


When is it used? Do we write the data twice?

Contributor Author


When is what used? We "write" the data twice: first to compute the length (LengthWriter), then to actually write the bytes into a buffer (ContentWriter).
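The two-pass pattern described here can be sketched end to end: the same write action runs twice, first against a writer that only counts bytes, then against a writer backed by an exactly-sized buffer. The names mirror the PR's snippets, but the struct layout below is an illustrative assumption, not the actual implementation:

```csharp
using System;

public ref struct RlpWriter
{
    private readonly byte[] _buffer; // null => length-measuring pass
    public int Length { get; private set; }

    private RlpWriter(byte[] buffer) { _buffer = buffer; Length = 0; }

    public static RlpWriter LengthWriter() => new(null);
    public static RlpWriter ContentWriter(byte[] buffer) => new(buffer);

    public void Write(byte b)
    {
        if (_buffer is not null) _buffer[Length] = b;
        Length++;
    }
}

public static class Rlp
{
    public delegate void WriteAction<TCtx>(ref RlpWriter writer, TCtx ctx);

    public static byte[] Serialize<TCtx>(WriteAction<TCtx> action, TCtx ctx)
    {
        var lengthWriter = RlpWriter.LengthWriter();
        action(ref lengthWriter, ctx);                 // pass 1: compute length only
        var serialized = new byte[lengthWriter.Length];
        var contentWriter = RlpWriter.ContentWriter(serialized);
        action(ref contentWriter, ctx);                // pass 2: write the bytes
        return serialized;
    }
}
```

Passing the context explicitly (rather than capturing it) is what allows the write actions to stay `static` and closure-free, as the PR description notes.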

};
}

public static RlpWriter ContentWriter(byte[] buffer)
Contributor


Have you considered adding an option aligned with Utf8JsonWriter, which has a ctor that accepts IBufferWriter<byte>? This would unfortunately mean that discontiguous chunks would need to be supported, but it could allow writing over anything. Maybe this could help address the Netty comment from @LukaszRozmej

https://learn.microsoft.com/en-us/dotnet/api/system.text.json.utf8jsonwriter.-ctor?view=net-9.0#system-text-json-utf8jsonwriter-ctor(system-buffers-ibufferwriter((system-byte))-system-text-json-jsonwriteroptions)

Contributor Author

@emlautarom1 emlautarom1 Jan 2, 2025


Interesting. I picked byte[] as a safe default, but there is no reason other types could not be used.

  • IBufferWriter<T> is quite small, it's part of the standard library, and it supports Span-based APIs.

  • NettyRlpStream is based on IByteBuffer, which is defined in DotNetty.Buffers. It supports the operations we need, but it's quite large.

Now, the issue is that IByteBuffer and IBufferWriter are unrelated, so we would need to pick one (unless we start doing conversions).

Member


IBufferWriter is probably more useful; we can try making an adapter to IByteBuffer if we want.
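A rough shape for such an adapter (DotNetty's IByteBuffer is stubbed down here to the one member the sketch needs, so treat the interface as an illustrative stand-in, not the real DotNetty surface): callers write into a pooled scratch array via GetSpan/GetMemory, and Advance flushes the written bytes into the underlying buffer.

```csharp
using System;
using System.Buffers;

// Stand-in for the relevant slice of DotNetty's IByteBuffer.
public interface IByteBufferLike
{
    void WriteBytes(byte[] src, int srcIndex, int length);
}

// Exposes an IByteBuffer-like sink through IBufferWriter<byte>.
public sealed class ByteBufferWriterAdapter : IBufferWriter<byte>
{
    private readonly IByteBufferLike _sink;
    private byte[] _scratch = Array.Empty<byte>();

    public ByteBufferWriterAdapter(IByteBufferLike sink) => _sink = sink;

    public Memory<byte> GetMemory(int sizeHint = 0) { EnsureScratch(sizeHint); return _scratch; }
    public Span<byte> GetSpan(int sizeHint = 0) { EnsureScratch(sizeHint); return _scratch; }

    // Per the IBufferWriter contract, Advance(count) commits the first
    // `count` bytes written since the last GetSpan/GetMemory call.
    public void Advance(int count) => _sink.WriteBytes(_scratch, 0, count);

    private void EnsureScratch(int sizeHint)
    {
        int size = Math.Max(sizeHint, 256);
        if (_scratch.Length < size)
        {
            if (_scratch.Length > 0) ArrayPool<byte>.Shared.Return(_scratch);
            _scratch = ArrayPool<byte>.Shared.Rent(size);
        }
    }
}

// Minimal in-memory sink used for demonstration.
public sealed class ListSink : IByteBufferLike
{
    public readonly System.Collections.Generic.List<byte> Bytes = new();
    public void WriteBytes(byte[] src, int srcIndex, int length)
    {
        for (int i = 0; i < length; i++) Bytes.Add(src[srcIndex + i]);
    }
}
```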


public void Initialize(IncrementalGeneratorInitializationContext context)
{
var provider = context.SyntaxProvider.CreateSyntaxProvider(
Contributor


Consider attribute-based creation using ForAttributeWithMetadataName. It should be much cheaper than scanning all records and only then selecting on an attribute basis.
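The suggested pipeline could take roughly this shape. ForAttributeWithMetadataName is a real Roslyn API (4.3+), but the attribute name used below is a hypothetical placeholder for illustration, and the fragment needs the Microsoft.CodeAnalysis packages to compile:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public void Initialize(IncrementalGeneratorInitializationContext context)
{
    var provider = context.SyntaxProvider.ForAttributeWithMetadataName(
        "Nethermind.Serialization.Rlp.RlpSerializableAttribute", // hypothetical name
        predicate: static (node, _) => node is RecordDeclarationSyntax,
        transform: static (ctx, _) => (RecordDeclarationSyntax)ctx.TargetNode);

    context.RegisterSourceOutput(provider, static (spc, record) =>
    {
        // emit the encoder/decoder for `record` here
    });
}
```

Roslyn indexes attribute names up front, so only syntax nodes carrying the attribute ever reach the predicate and transform, which is where the cost saving over CreateSyntaxProvider comes from.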

@emlautarom1
Contributor Author

@LukaszRozmej RlpBehaviors are not explicitly supported by the API, but you can get the same behavior by manually passing any "context" when reading/writing.

Note that we recently changed the Rlp interface so trailing bytes throw by default. If you want more control over what happens before or after reading/writing you can use the RlpReader and RlpWriter APIs directly.

@emlautarom1
Contributor Author

I've added a benchmark that encodes and decodes an AccessList as defined in:

public class AccessList : IEnumerable<(Address Address, AccessList.StorageKeysEnumerable StorageKeys)>

Results on my machine are the following:

| Method  | Mean     | Error   | StdDev  | Ratio |
|-------- |---------:|--------:|--------:|------:|
| Current | 343.9 us | 1.43 us | 1.34 us |  1.00 |
| Fluent  | 834.9 us | 2.34 us | 2.19 us |  2.43 |

There is room for a possible optimization: some records like Address have a known, fixed byte size, which we can leverage to avoid traversing the value twice: once to compute the length and once to actually write the bytes.

@emlautarom1
Contributor Author

Replacing Marshal.SizeOf<T>() with sizeof(T) and some unsafe annotations gives quite a boost at no cost:

| Method  | Mean     | Error   | StdDev  | Ratio | RatioSD |
|-------- |---------:|--------:|--------:|------:|--------:|
| Current | 359.8 us | 5.03 us | 4.70 us |  1.00 |    0.02 |
| Fluent  | 626.2 us | 2.90 us | 2.42 us |  1.74 |    0.02 |

var size = sizeof(T);
Span<byte> bigEndian = stackalloc byte[size];
value.WriteBigEndian(bigEndian);
bigEndian = bigEndian.TrimStart((byte)0);
Contributor Author


TrimStart does not seem to be heavily optimized. There might be something better we can use, especially considering that we're removing leading zeros.
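One candidate worth benchmarking (an assumption, not a measured result): MemoryExtensions.IndexOfAnyExcept, available since .NET 7 and vectorized, directly expresses "skip leading zeros":

```csharp
using System;

static class Leading
{
    // Skips leading zero bytes of a big-endian encoding; returns an
    // empty span when every byte is zero (i.e. the value is zero).
    public static ReadOnlySpan<byte> TrimLeadingZeros(ReadOnlySpan<byte> bigEndian)
    {
        int firstNonZero = bigEndian.IndexOfAnyExcept((byte)0);
        return firstNonZero < 0 ? ReadOnlySpan<byte>.Empty : bigEndian[firstNonZero..];
    }
}
```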

@emlautarom1 emlautarom1 requested review from Scooletz and LukaszRozmej and removed request for Scooletz January 2, 2025 19:17
3 participants