Declarative RLP Encoding/Decoding #7975
base: master
- Test for Set Theoretical Representation
- Extend instances
- Good things happen to those who respect symmetry.
- Code works, but compiler complains
- Primitives are integers, byte sequences, and lists - No need for `IntRlpConverter` (covered by primitives)
- Replace virtual calls with conditional and inner state flag - Refactor call sites
- Rename to `FastRlp` to avoid conflicts
Updated generators to use string interpolation. There are some places where we still use |
…ature/declarative-rlp
Ok but can we do something real-life for us? Like |
What about `RlpBehaviors`?
public int Length { get; private set; }

private byte[] _buffer;
We often write into Netty arena-based buffers to avoid allocations, would be good to support that.
What an interesting take on this topic 😍 A few remarks provided as comments. General:
- `[SkipLocalsInit]` would probably be beneficial.
- Benchmarking with potential ASM output
- Converting some of the existing ones and performing a mano-a-mano comparison.
public static int Read(ReadOnlySpan<byte> source)
{
    Span<byte> buffer = stackalloc byte[sizeof(Int32)];
    source.CopyTo(buffer[^source.Length..]);
`Unsafe.ReadUnaligned`? Why copy?
This method was added because `BinaryPrimitives.ReadInt32BigEndian` requires exactly 4 bytes to read an `Int32`, so we pad `source` with enough leading `0` bytes for it to decode properly.
I've never used `Unsafe.ReadUnaligned`, or any `unsafe` code for that matter. How would that look?
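For illustration only, here is a rough sketch of what copy-free reads could look like. The class and method names (`Int32ReaderSketch`, `ReadShift`, `ReadUnaligned4`) are hypothetical and not part of this PR. Note that `Unsafe.ReadUnaligned<int>` always reads `sizeof(int)` bytes, so on its own it only applies when the source holds exactly 4 bytes; for shorter inputs a shift-and-accumulate loop avoids both the copy and an out-of-bounds read.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper, not the PR's code.
public static class Int32ReaderSketch
{
    // Accumulates 0..4 big-endian bytes without a temporary buffer.
    public static int ReadShift(ReadOnlySpan<byte> source)
    {
        if (source.Length > sizeof(int))
            throw new ArgumentException("At most 4 bytes expected", nameof(source));

        int result = 0;
        foreach (byte b in source)
        {
            result = (result << 8) | b;
        }
        return result;
    }

    // Reads exactly 4 bytes in one unaligned load, then fixes endianness.
    public static int ReadUnaligned4(ReadOnlySpan<byte> source)
    {
        if (source.Length != sizeof(int))
            throw new ArgumentException("Exactly 4 bytes expected", nameof(source));

        int raw = Unsafe.ReadUnaligned<int>(ref MemoryMarshal.GetReference(source));
        return BitConverter.IsLittleEndian ? BinaryPrimitives.ReverseEndianness(raw) : raw;
    }
}
```

Whether either variant is actually faster than the padded copy would need to be measured; this only shows the shape of the code.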
Why not use the `Bytes` class we already have?
Didn't think about it, but seems like implementations are the same:
nethermind/src/Nethermind/Nethermind.Core/Extensions/Bytes.cs
Lines 439 to 441 in c874877
Span<byte> fourBytes = stackalloc byte[4];
bytes.CopyTo(fourBytes[(4 - bytes.Length)..]);
return BinaryPrimitives.ReadInt32BigEndian(fourBytes);
Seems a bit overkill to add a project reference for 3 LOCs.
action(ref lengthWriter, ctx);
var serialized = new byte[lengthWriter.Length];
var contentWriter = RlpWriter.ContentWriter(serialized);
action(ref contentWriter, ctx);
When is it used? Do we write the data twice?
When is what used? We "write" the data twice to first compute the length (`LengthWriter`), and then we actually write the bytes into a buffer (`ContentWriter`).
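As a simplified sketch of the two-pass idea described here: the same action runs once against a writer that only counts bytes and once against a writer that copies them into an exactly sized array. `SketchWriter` and `WriteAction` are hypothetical stand-ins and this is not the PR's `RlpWriter`; in particular, it writes raw bytes and performs no RLP length-prefix encoding.

```csharp
#nullable enable
using System;

// Hypothetical delegate shape, assumed for illustration.
public delegate void WriteAction<TContext>(ref SketchWriter writer, TContext ctx);

public struct SketchWriter
{
    // null means "length mode": bytes are counted, not stored.
    private readonly byte[]? _buffer;

    public int Length { get; private set; }

    private SketchWriter(byte[]? buffer)
    {
        _buffer = buffer;
        Length = 0;
    }

    public static SketchWriter LengthWriter() => new SketchWriter(null);

    public static SketchWriter ContentWriter(byte[] buffer) => new SketchWriter(buffer);

    public void Write(ReadOnlySpan<byte> bytes)
    {
        if (_buffer is not null)
        {
            // Content mode: copy at the current offset.
            bytes.CopyTo(_buffer.AsSpan(Length));
        }
        Length += bytes.Length;
    }

    public static byte[] Serialize<TContext>(TContext ctx, WriteAction<TContext> action)
    {
        // Pass 1: compute the total length without allocating.
        SketchWriter lengthWriter = LengthWriter();
        action(ref lengthWriter, ctx);

        // Pass 2: write into a buffer of exactly that size.
        byte[] serialized = new byte[lengthWriter.Length];
        SketchWriter contentWriter = ContentWriter(serialized);
        action(ref contentWriter, ctx);
        return serialized;
    }
}
```

The trade-off is that the action executes twice in exchange for a single, exactly sized allocation.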
};
}

public static RlpWriter ContentWriter(byte[] buffer)
Have you considered adding an option aligned with `Utf8JsonWriter`, where there's a ctor that accepts `IBufferWriter<byte>`? Unfortunately, this would mean that discontinuous chunks have to be supported, but it would allow providing a writer over anything. Maybe this could help to address the Netty comment from @LukaszRozmej.
Interesting. I picked `byte[]` as a safe default but there is no reason why another type could not be used.
- `IBufferWriter<T>` is quite small, it's part of the std lib and it supports `Span`-based APIs.
- `NettyRlpStream` is based on `IByteBuffer`, which is defined in `DotNetty.Buffers`. It supports the operations that we need but it's quite large.
Now, the issue is that `IByteBuffer` and `IBufferWriter` are unrelated so we would need to pick one (unless we start doing conversions).
`IBufferWriter` is probably more useful; we can try making an adapter to `IByteBuffer` if we want.
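To make the `IBufferWriter<byte>` option concrete, a minimal sketch of writing through that interface follows. `BufferWriterSketch.WriteBytes` is a hypothetical helper, not an API from this PR; only `GetSpan`, `Advance` and `ArrayBufferWriter<byte>` are standard library members.

```csharp
using System;
using System.Buffers;

// Hypothetical helper, assumed for illustration.
public static class BufferWriterSketch
{
    public static void WriteBytes(IBufferWriter<byte> output, ReadOnlySpan<byte> bytes)
    {
        // Ask the sink for at least bytes.Length of writable space.
        Span<byte> destination = output.GetSpan(bytes.Length);
        bytes.CopyTo(destination);
        // Tell the sink how many bytes were actually written.
        output.Advance(bytes.Length);
    }

    public static byte[] Demo()
    {
        var sink = new ArrayBufferWriter<byte>();
        WriteBytes(sink, new byte[] { 0xC2, 0x01, 0x02 });
        WriteBytes(sink, new byte[] { 0x80 });
        return sink.WrittenSpan.ToArray();
    }
}
```

Any sink that implements `IBufferWriter<byte>` could be passed in place of `ArrayBufferWriter<byte>`, which is where an adapter over `IByteBuffer` would plug in.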
public void Initialize(IncrementalGeneratorInitializationContext context)
{
    var provider = context.SyntaxProvider.CreateSyntaxProvider(
Consider attribute-based creation using `ForAttributeWithMetadataName`. It should be much cheaper than scanning all the records and only then selecting on an attribute basis.
@LukaszRozmej Note that we recently changed the |
I've added a benchmark that encodes and decodes an
Results on my machine are the following:
There is room for a possible optimization: some records like |
- Nice speedup
Replacing
var size = sizeof(T);
Span<byte> bigEndian = stackalloc byte[size];
value.WriteBigEndian(bigEndian);
bigEndian = bigEndian.TrimStart((byte)0);
`TrimStart` does not seem to be heavily optimized. There might be something better that we can use, especially considering that we're removing leading zeros.
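Two possible alternatives, sketched here without any benchmarking behind them: `MemoryExtensions.IndexOfAnyExcept` (available since .NET 7) to locate the first non-zero byte, or, for fixed-width integers, `BitOperations.LeadingZeroCount` to compute the number of significant bytes directly from the value. The `LeadingZerosSketch` class and its method names are hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, assumed for illustration.
public static class LeadingZerosSketch
{
    // Span-based: slice from the first non-zero byte (requires .NET 7+).
    public static ReadOnlySpan<byte> TrimLeadingZeros(ReadOnlySpan<byte> bigEndian)
    {
        int firstNonZero = bigEndian.IndexOfAnyExcept((byte)0);
        return firstNonZero < 0 ? ReadOnlySpan<byte>.Empty : bigEndian[firstNonZero..];
    }

    // Value-based: bytes needed to represent the value with no leading zeros.
    public static int SignificantBytes(uint value) =>
        sizeof(uint) - BitOperations.LeadingZeroCount(value) / 8;
}
```

With the value-based variant, the big-endian span can be sliced as `bigEndian[^SignificantBytes(value)..]` without scanning it at all.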
Changes
Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
The core library has 100% test coverage. Source generated code might not be fully covered.
Documentation
Requires documentation update
Requires explanation in Release Notes
Remarks
When we started working on refactoring our `TxDecoder`, one thing that came up was how unergonomic it is to work with our current RLP API. We even have some comments in the code itself mentioning these difficulties, for example:
nethermind/src/Nethermind/Nethermind.Serialization.Rlp/Eip2930/AccessListDecoder.cs
Lines 17 to 23 in b81070d
nethermind/src/Nethermind/Nethermind.Serialization.Rlp/Eip2930/AccessListDecoder.cs
Lines 76 to 81 in b81070d
This PR introduces a new RLP API based on #7334 (comment) with several improvements:
- … `virtual` or `override`). Interfaces are only used to enforce implementations.
- … `static`). You can still use them if you want to, but overloads are provided to avoid them.