Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Uint8Array when applicable #42

Open
mschneider opened this issue May 30, 2022 · 8 comments
Open

Support Uint8Array when applicable #42

mschneider opened this issue May 30, 2022 · 8 comments

Comments

@mschneider
Copy link
Contributor

Hey @kklas , thank you for your help with the bytes type. I am currently working on a project, where I would like to use this for more data types, specifically:

  1. Vec<u8>
  2. fixed size arrays [u8; 32]

I had a look at the code already and was able to get it to work on a local branch: https://github.com/mschneider/anchor-client-gen/tree/max/uint8array2 but curious to hear your thoughts as this somewhat breaks with the current abstraction of the library on a few levels.

My ideal outcome would be:

  1. always use the right native array type
  2. always verify size when initializing from user input
  3. possibly pad when initializing from too short user input

Let me know what you think about these changes.

@kklas
Copy link
Owner

kklas commented May 30, 2022

Isn't Vec<u8> already encoded as Uint8Array after that PR?

In any case, technically for "bytes" fields it makes sense to be Uint8Array because:

  • there is a specific type in the IDL for it (I think only Vec<u8> gets parsed into this type)
  • the project-serum/borsh lib which this package (and also anchor's ts client) uses supports vecU8 layouts already (link), which returns Buffer when decoded

For others you need to rely on, for example borsh.vec(borsh.u16()) which will return Array<number> in this case.

Why handle Vec specifically as bytes in the IDL and decode into Buffer in borsh-ts and others as Array? IDK ask Armani.

Could this be done? Sure it could. Do we want this? IDK maybe, but I think probably not.

Uint32 is more difficult to work with than Array. Calling .values() on Uint64Array will return an array of bigint while we normally use BN for u64 so this will be a mess. Uint128Array doesn't even exists so what do you use for Vec<u128>?
For Vec<u8> it kinda makes sense to specially mark them as bytes in the IDL and decode them as Buffer/Uint8Array because you most often represent bytes like that especially when you want to encode some raw data in your account. But for others like Vec<u64> / [u64; n] you are most often storing just a series of numbers so it makes sense to me to just keep those as Array<BN>.

IDK TBH. You might as well ask why Vec gets a special encoding as "bytes" in the IDL and not as vec of u8? This might be a valid question for the anchor guys.

But what is your use case? What do you need "native array types" for that you cant do with just arrays?

As for

  1. always verify size when initializing from user input

This makes sense to do.

  1. possibly pad when initializing from too short user input

Isn't this already done automatically when encoding the layout with the serum borsh package?

@mschneider
Copy link
Contributor Author

mschneider commented May 31, 2022

Correct Vec is already encoded as bytes, hence Uint8Array. I'm currently looking at fixed length u8 arrays:

I'm mainly concerned with input validation for overflows, when assigning to Vec<u8> & [u8; 4]:

  1. [1, 2, 3]
  2. [1, 2, 3, 256]
  3. [1, 2, 3, 4, 5]

How are these currently handled?

@mschneider
Copy link
Contributor Author

mschneider commented May 31, 2022

First case, would return obscure errors like this one, which make it really hard to trace, which field actually caused the error:

    TypeError: Blob.encode[data] requires (length 32) Buffer as src
      at Blob.encode (node_modules/buffer-layout/lib/Layout.js:2321:13)
      at Structure.encode (node_modules/buffer-layout/lib/Layout.js:1263:26)
      at WrappedLayout.encode (node_modules/@project-serum/borsh/src/index.ts:106:24)
      at Structure.encode (node_modules/buffer-layout/lib/Layout.js:1263:26)
      at Structure.encode (node_modules/buffer-layout/lib/Layout.js:1263:26)

I'm concerned that the second case will trigger an equally obscure error message, while choosing a more explicit type for the generated structure / interfaces will make the issue more transparent to the programmer using the generated code as the type checker can identify which argument potentially causes an overflow.

@mschneider
Copy link
Contributor Author

The above error comes from passing an Uint8array for a blob(32) struct field, apparently we won't be able to support Uint8Array at all and always need to pass Buffer when interacting with borsh due to this check https://github.com/pabigot/buffer-layout/blob/main/lib/Layout.js#L2319

I'll send a new PR that first updates all usage of Uint8Array to Buffer, so we don't run into this issue anywhere

@kklas
Copy link
Owner

kklas commented May 31, 2022

apparently we won't be able to support Uint8Array at all and always need to pass Buffer when interacting with borsh

We're not sending Uint8Array anywhere to borsh. Buffer is sent.

@kklas
Copy link
Owner

kklas commented May 31, 2022

How are these currently handled?

They're handled as array of numbers both in IDL, borsh layout, and the generator.

I'm mainly concerned with input validation for overflows, when assigning to Vec & [u8; 4]

Javascript is not a typed language. Typescript helps a lot but it's not perfect, the typing is only present at compile time. Fixed sized arrays don't exist neither in javascript nor typescript, so not sure what we can expect there. The only thing we can maybe do is check for array length when constructing types or call instructions.

Keep in mind that the code generated by anchor-client-gen is a bit lower level-ish and you will probably want to create a lib that wraps around it and is more semantic and include your checks there. See #5 (comment)

@mschneider
Copy link
Contributor Author

Yeah, so the idl does preserve array length and type bounds, so it should be possible. I'll follow your advise and create a nice lib around the generated code to solve some of those issues on our end so that it is very integration friendly

@mschneider
Copy link
Contributor Author

Also dropped the constant array size binding to Buffer approach, turned out to cause some nasty issues when trying to use Anchor's fetch helpers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants