Add stateless alternative to buffered's Reader class #112
Comments
I definitely agree with the general idea, but I'd prefer to see a signature like this:
fun box peek_u32_be(seq: ReadSeq[U8] box, offset: USize): U32 ?
fun box peek_u32_le(seq: ReadSeq[U8] box, offset: USize): U32 ?
I don't think this is actually a common use case? I think pretty much everyone using a feature like this in Pony is using it to read either a network protocol or a file format, which is designed either as big-endian or little-endian as part of the protocol/format specification, regardless of the system architecture it's running on. So I'd be really surprised to see any code "in the wild" that doesn't make a static choice for big-endian vs little-endian.
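For illustration, here is a minimal sketch of how the two statically chosen variants might be written. The primitive name `BytePeek` and the `Main` actor are assumptions made for this example and are not part of the proposal; only the two function signatures come from the comment above.

```pony
// Hypothetical holder for the stateless functions; the name is an assumption.
primitive BytePeek
  fun peek_u32_be(seq: ReadSeq[U8] box, offset: USize): U32 ? =>
    // Big-endian: the byte at `offset` is the most significant one.
    (seq(offset)?.u32() << 24) or
    (seq(offset + 1)?.u32() << 16) or
    (seq(offset + 2)?.u32() << 8) or
    seq(offset + 3)?.u32()

  fun peek_u32_le(seq: ReadSeq[U8] box, offset: USize): U32 ? =>
    // Little-endian: the byte at `offset` is the least significant one.
    seq(offset)?.u32() or
    (seq(offset + 1)?.u32() << 8) or
    (seq(offset + 2)?.u32() << 16) or
    (seq(offset + 3)?.u32() << 24)

actor Main
  new create(env: Env) =>
    let data: Array[U8] val = recover val [as U8: 0x12; 0x34; 0x56; 0x78] end
    try
      // Reads the four bytes as 0x12345678.
      env.out.print(BytePeek.peek_u32_be(data, 0)?.string())
      // Reads the same four bytes as 0x78563412.
      env.out.print(BytePeek.peek_u32_le(data, 0)?.string())
    else
      env.out.print("not enough bytes")
    end
```

Because the functions take a `ReadSeq[U8] box` and keep no state, they raise an error when fewer than four bytes are available at `offset` rather than buffering, which is the main behavioural difference from `Reader`.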
Totally agree with 1.
About
To give some context, this is a case where I would have needed it if I wanted to support multiple architectures for the same Pony app: I am writing a wrapper for SDL2. To deal with events (and emulate polymorphism in C), SDL2 returns a union of different structs. The first byte gives the event type and can be used to cast the union to the correct structure. What I did (following the approach of the Rust wrapper) was to generate a byte array big enough to fit all the structs of the union and pass it to SDL2 via FFI. Then, once SDL2 has filled it, I read the first byte to know the event type and then read the content using functions like
To sum up, I think the following functions cover all cases while maintaining a simple API (except maybe for the function names), with good performance for the common cases.
// The functions you probably want to use in 99% of cases.
fun box peek_u32_be(seq: ReadSeq[U8] box, offset: USize): U32 ?
fun box peek_u32_le(seq: ReadSeq[U8] box, offset: USize): U32 ?
// One that just calls one of the two above, depending on a parameter
type EndianMode is (BigEndian | LittleEndian)
fun box peek_u32(array: ReadSeq[U8] box, endianMode: EndianMode, offset: USize) : U32 ?
// And one that retrieves the processor's EndianMode and calls the previous one
fun box peek_u32_processor_endian_mode(seq: ReadSeq[U8] box, offset: USize): U32 ?
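The layering described above could be sketched roughly as follows. `BytePeek`, `native_mode`, and the `Main` actor are hypothetical names introduced for this example, and the use of `ifdef bigendian` to detect the target's byte order is an assumption about how the processor mode would be obtained.

```pony
primitive BigEndian
primitive LittleEndian
type EndianMode is (BigEndian | LittleEndian)

// Hypothetical holder for the stateless functions; the name is an assumption.
primitive BytePeek
  fun peek_u32_be(seq: ReadSeq[U8] box, offset: USize): U32 ? =>
    (seq(offset)?.u32() << 24) or
    (seq(offset + 1)?.u32() << 16) or
    (seq(offset + 2)?.u32() << 8) or
    seq(offset + 3)?.u32()

  fun peek_u32_le(seq: ReadSeq[U8] box, offset: USize): U32 ? =>
    seq(offset)?.u32() or
    (seq(offset + 1)?.u32() << 8) or
    (seq(offset + 2)?.u32() << 16) or
    (seq(offset + 3)?.u32() << 24)

  fun peek_u32(array: ReadSeq[U8] box, endianMode: EndianMode, offset: USize)
    : U32 ?
  =>
    // Runtime dispatch to one of the two static variants.
    match endianMode
    | BigEndian => peek_u32_be(array, offset)?
    | LittleEndian => peek_u32_le(array, offset)?
    end

  fun native_mode(): EndianMode =>
    // Assumption: the byte order of the compilation target is what is wanted.
    ifdef bigendian then BigEndian else LittleEndian end

  fun peek_u32_processor_endian_mode(seq: ReadSeq[U8] box, offset: USize)
    : U32 ?
  =>
    peek_u32(seq, native_mode(), offset)?

actor Main
  new create(env: Env) =>
    let data: Array[U8] val = recover val [as U8: 0x12; 0x34; 0x56; 0x78] end
    try
      env.out.print(BytePeek.peek_u32(data, BigEndian, 0)?.string())
      env.out.print(
        BytePeek.peek_u32_processor_endian_mode(data, 0)?.string())
    end
```

Callers that know the byte order statically keep using the `_be` and `_le` variants directly and never go through the `match`.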
@codec-abc - Thanks for giving those concrete examples (TIFF and SDL2) to justify the motivation. You convinced me 😉. I like the idea you shared of providing both variants so that those who know the choice statically pay no extra perf cost, and get a more succinct syntax. 👍
Whoops, that was a bit of a typo. I should have said "I'd prefer to see it as a type parameter rather than a runtime parameter". Something like this:
fun box peek_u32[E: EndianMode](array: Array[U8] box, offset: USize) : U32 ?
However, I'm okay with the approach you shared above in your latest comment, so I don't think that's necessary.
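One possible way to realise the type-parameter form is with Pony's `iftype` expression, so the byte order is resolved at compile time instead of through a runtime argument. This is only a sketch: `BytePeek` and the `Main` actor are hypothetical names, and selecting the branch with `iftype` is an assumption, not something stated in the thread.

```pony
primitive BigEndian
primitive LittleEndian
type EndianMode is (BigEndian | LittleEndian)

// Hypothetical holder for the stateless functions; the name is an assumption.
primitive BytePeek
  fun peek_u32[E: EndianMode](array: Array[U8] box, offset: USize): U32 ? =>
    let b0 = array(offset)?.u32()
    let b1 = array(offset + 1)?.u32()
    let b2 = array(offset + 2)?.u32()
    let b3 = array(offset + 3)?.u32()
    // The branch is chosen from the type argument, not from a runtime value.
    iftype E <: BigEndian then
      (b0 << 24) or (b1 << 16) or (b2 << 8) or b3
    else
      (b3 << 24) or (b2 << 16) or (b1 << 8) or b0
    end

actor Main
  new create(env: Env) =>
    let data: Array[U8] val = recover val [as U8: 0x12; 0x34; 0x56; 0x78] end
    try
      env.out.print(BytePeek.peek_u32[BigEndian](data, 0)?.string())
      env.out.print(BytePeek.peek_u32[LittleEndian](data, 0)?.string())
    end
```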
Since this issue is focused on a very small point, it shouldn't be too hard to write a proper RFC (and even the implementation after that), which I will try to do in the near future. But before writing a proper one, I would like your opinion (and that of anyone else interested) on the following points:
It's unclear to me what the actual changes are. I use Reader in a lot of high performance code so I'm very interested in the actual changes to it and the possible performance ramifications. I'd rather see this done as a separate package/class that could eventually be brought together if we think that is appropriate. Ideally, I'd like to see these exist as a 3rd party library first, evolve there for a while (as I suspect they are going to be prone to some API changes early on), and then be moved over for inclusion in the standard library.
The changes are to add the following stateless functions to the standard library:
fun peek_X_be(seq: ReadSeq[U8] box, offset: USize): X?
fun peek_X_le(seq: ReadSeq[U8] box, offset: USize): X?
fun peek_X[E: EndianMode](array: ReadSeq[U8] box, offset: USize): X?
fun peek_X_processor_endian_mode(seq: ReadSeq[U8] box, offset: USize): X?
for any X in (U16 | I16 | U32 | I32 | U64 | I64 | U128 | I128 | F32 | F64 | String), and nothing more. Obviously, the Reader class could use them internally (as it provides more abstraction for reading data from a chunk of bytes), but that is not even necessary.
Reader is stateful. If you are proposing adding stateless methods to it, while leaving the rest of Reader stateful, I think that's a confusing API. I think stateless methods such as these would make sense in a primitive of their own.
I do agree. Also, I think it makes sense to add the reverse ones at the same time, i.e.:
fun x_to_U8_be(x: X): ReadSeq[U8]
fun x_to_U8_le(x: X): ReadSeq[U8]
fun x_to_U8[E: EndianMode](x: X): ReadSeq[U8]
fun x_to_U8_processor_endian_mode(x: X): ReadSeq[U8]
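As a rough sketch of the reverse direction for `X = U32`: the primitive name `ByteWrite`, the function names, and the concrete return type `Array[U8] val` (which can be used wherever a `ReadSeq[U8]` is expected) are assumptions made for this example.

```pony
// Hypothetical holder for the reverse functions; the name is an assumption.
primitive ByteWrite
  fun u32_to_u8_be(x: U32): Array[U8] val =>
    // Most significant byte first; u8() keeps only the low 8 bits.
    recover val
      [as U8: (x >> 24).u8(); (x >> 16).u8(); (x >> 8).u8(); x.u8()]
    end

  fun u32_to_u8_le(x: U32): Array[U8] val =>
    // Least significant byte first.
    recover val
      [as U8: x.u8(); (x >> 8).u8(); (x >> 16).u8(); (x >> 24).u8()]
    end

actor Main
  new create(env: Env) =>
    // Prints the four bytes of 0x12345678 in big-endian order, one per line.
    for b in ByteWrite.u32_to_u8_be(0x12345678).values() do
      env.out.print(b.string())
    end
```

Returning a fresh immutable array keeps the functions stateless, at the cost of one allocation per call; an implementation that writes into a caller-supplied buffer would avoid that but would need a different signature from the one proposed above.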
@SeanTAllen - As I understood it, the proposal was not to change the way That is, the idea is that you should be able to, for example, read a big-endian I32 from a particular offset in a
The Reader class of the buffered package contains functions that are useful for reading common basic types (such as F32, F64, U32, etc.) from a byte array. Unfortunately, those functions are not stateless and cannot be used in some wider contexts (my use case was an Array[U8 val] ref). It would be great to add stateless functions that Reader can be built on and that Pony users can reuse too. They would have a signature similar to this one:
I suggest that it would also be a great addition to provide an endian-less function that automatically picks one of the two methods depending on which architecture it is running on, to avoid code like this: