Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: RLP::decode() should return an empty string instead of "0" when decoding 0x80 #539

Merged
merged 8 commits into from
Nov 23, 2023

Conversation

Quentash
Copy link
Contributor

@Quentash Quentash commented Nov 14, 2023

The PR #500 brought a bug where the decoding of the input [0x80] returns an String containing the byte "0x00" while it should return an empty String instead.

Pull Request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build-related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Resolves: #538

What is the new behavior?

  • RLP::decode now return an empty string instead of a String containing "0x00" when decoding "0x80" byte.

Does this introduce a breaking change?

  • Yes
  • No

Some tests in eth_transaction_test.cairo uses test data that were constructed on top of this error, those tests should be corrected as well. Contacting @bajpai244 for more enlightment.

@Eikix
Copy link
Member

Eikix commented Nov 20, 2023

Is this ready to review?

@bajpai244
Copy link
Collaborator

@Quentash, so according to the website, both:

"" and 0x0 are [0x80]

image

Hence, our implementation isn't wrong I guess, is there some bug u have seen in some decoding with this approach? Would love to check that case.

Otherwise, at least from the website, this seems to be the default value fo 0x0 if u are decoding an int from the decoding.

@Quentash
Copy link
Contributor Author

Quentash commented Nov 20, 2023

@bajpai244
Those example are a bit confusing because indeed, it looks like you can encode an integer 0 and it shows that the result should be 0x80. But what borther me in that case is that you could use both 0x00 and 0x80 to decode a "0" and that doesn't seem right.
If you look at the decoding script example in the doc :
image
you can see that decoding a "0x80" prefix would return a strLen set to 0, which sounds more like an empty string than an integer(0).
Also, if 0x80 is decoded as a 0, how could we decode empty string ? We wouldn't have that possibility anymore, or am I missing something ?

And if you look at the encoding script ;
image
image

you can see that if we encode an 0, it will be encoded as 0x00, but if we encode an empty string, it will be encoded as 0x80.

I am actually wondering if the example aren't a little bit incorrect or, at least, not consistent with the code examples they give.
I haven't met any transactions failing or whatsoever, I was just re-coding RLP for Alexandria and noticed this inconsistency(imo).
It just doesn't feel right that you could either use 0x00 or 0x80 to represent a "0x00".

Don't hestitate to ping me on TG/disc if you wanna dive deeper into this topic !

@bajpai244
Copy link
Collaborator

bajpai244 commented Nov 20, 2023

@Quentash I get what u mean here and I can see the inconsistency as well, I think we can put decoding of 0x80 as empty string if u want, but then we also have to make sure that we convert decode "" to 0!

Otherwise we will fail the RLP tests { which already seems to be the case in this PR, the tests are failing }.

Can u investigate whether there are some official Ethereum tests for this this? We can then confirm whether it should be decoded "" or 0.

Alloy can be a great place to check for this!


So, I guess action steps can be:

  • Investigate Alloy-rs to check what they do in their RLP implementation.
  • Investigate if the ethereum foundation has some official tests we can test against.

After this, we should have enough evidence to take some action with confidence.

@Quentash
Copy link
Contributor Author

The only failing tests are those that require a valid signature. Because I have changed the transactions to fit the modifications ( replacing "80" by "00" ) but therefore, those transactions need to be signed again because the r, s, v must be different. ( And this is where I initially needed your help )

But

In those tests : https://github.com/ethereum/tests/blob/develop/RLPTests/rlptest.json

You can notice that both an empty string and 0 as input expects 0x80 as output. ( Alloy-rlp also do a test where the input is 0 as u_64 and it expects 0x80 as output ).

I feel like they have different implementations that would either take integers or strings as input ( https://github.com/alloy-rs/rlp/blob/main/crates/rlp/src/decode.rs ).
But in KKT implementation, we decode/encode some Span which we apparently interpret as String by default.

I think our implementation is also different because we return RLPItems. Maybe deciding that a RLPItem::String with a len() = 0 should be interpreted as a empty string or a 0 should be done at a "higher level" ? ( like in eth_transaction.cairo for example )

Because if we never return an empty string, how could we differentiate an empty string and a string containing "0" ? (0x80 from 0x8100)

@bajpai244
Copy link
Collaborator

The only failing tests are those that require a valid signature. Because I have changed the transactions to fit the modifications ( replacing "80" by "00" ) but therefore, those transactions need to be signed again because the r, s, v must be different. ( And this is where I initially needed your help )

But

In those tests : https://github.com/ethereum/tests/blob/develop/RLPTests/rlptest.json

You can notice that both an empty string and 0 as input expects 0x80 as output. ( Alloy-rlp also do a test where the input is 0 as u_64 and it expects 0x80 as output ).

I feel like they have different implementations that would either take integers or strings as input ( https://github.com/alloy-rs/rlp/blob/main/crates/rlp/src/decode.rs ). But in KKT implementation, we decode/encode some Span which we apparently interpret as String by default.

I think our implementation is also different because we return RLPItems. Maybe deciding that a RLPItem::String with a len() = 0 should be interpreted as a empty string or a 0 should be done at a "higher level" ? ( like in eth_transaction.cairo for example )

Because if we never return an empty string, how could we differentiate an empty string and a string containing "0" ? (0x80 from 0x8100)

@Quentash , going through this it looks like that the interpretation can depend on what we expect as output! I think depending on the type be expect to decode, if we are decoding for string, then 0x80 should be "", otherwise it looks like it should be 0 if are doing in case of uint.

Let's decode 0x80, add a comment that if u are expecting an int, then "" means 0x0.

I am sending u the key I used to sign this data, so that u can update ur test!

Also, good research :)

@Quentash
Copy link
Contributor Author

@Quentash , going through this it looks like that the interpretation can depend on what we expect as output! I think depending on the type be expect to decode, if we are decoding for string, then 0x80 should be "", otherwise it looks like it should be 0 if are doing in case of uint.

Let's decode 0x80, add a comment that if u are expecting an int, then "" means 0x0.

I am sending u the key I used to sign this data, so that u can update ur test!

Also, good research :)

This is also what I understood. Let's do that, I'm on it, and thanks a lot Ser !

@Quentash
Copy link
Contributor Author

As discussed,

  • I've added a comment in decode() explaining that en empty string will be parsed as a 0 for an integer.
  • I've modified the parser functions in RLPHelperImpl to parse empty strings as a 0.
  • Test datas are back to what they were ( 0x80 representing a 0 ).

I think this is ready to be reviewed, don't hesitate to ping me if I missed something.
And thanks again for your help !

@Quentash Quentash marked this pull request as ready for review November 21, 2023 14:46
crates/utils/src/rlp.cairo Outdated Show resolved Hide resolved
Copy link
Collaborator

@enitrat enitrat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall

@Quentash
Copy link
Contributor Author

I've added a verification on the payload length.
Indeed, if the payload length si so big that its value holds on more than 4 bytes, let string_len: u32 = U32Trait::from_bytes(string_len_bytes).unwrap(); would fail.
So I've implemented a new error PayloadTooLong being returned when the payload length holds on more than 4 bytes + 2 tests to proc the error with both a long string and a long list.

@bajpai244
Copy link
Collaborator

@Quentash this looks good to me, approving { + I guess u can change the position of the comment, although I don't have a very hard take on that. }

@bajpai244 bajpai244 self-requested a review November 22, 2023 14:50
bajpai244
bajpai244 previously approved these changes Nov 22, 2023
@bajpai244
Copy link
Collaborator

@enitrat I guess u can also do another review of this, and if it works for u then we can merge!

gg @Quentash :)

crates/utils/src/rlp.cairo Outdated Show resolved Hide resolved
crates/utils/src/rlp.cairo Outdated Show resolved Hide resolved
@Quentash
Copy link
Contributor Author

  • Unboyscouted the PayloadTooLong error
  • Finally removed the comment about empty string

Copy link
Collaborator

@enitrat enitrat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 💪

@enitrat enitrat added this pull request to the merge queue Nov 23, 2023
Merged via the queue into kkrt-labs:main with commit 876ccef Nov 23, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: RLP::decode() should return an empty string instead of "0" when decoding 0x80
4 participants