use experimental :pack should be removed #442

Open
lizmat opened this issue Oct 4, 2024 · 24 comments
Labels
language Changes to the Raku Programming Language

Comments

@lizmat
Collaborator

lizmat commented Oct 4, 2024

The P5pack module covers most of the functionality needed.

And the use experimental :pack functionality has bugs: rakudo/rakudo#1875

@lizmat lizmat added the language Changes to the Raku Programming Language label Oct 4, 2024
@lizmat
Collaborator Author

lizmat commented Oct 4, 2024

Case in point: use experimental :pack doesn't even export unpack, and nobody has noticed for at least 6 years?

% raku -e 'use experimental :pack; unpack(42)'
===SORRY!=== Error while compiling -e
Undeclared routine:
    unpack used at line 1. Did you mean 'pack'?

@jonathanstowe

The unpack is provided by Buf.unpack . Never been quite sure about that asymmetry.

The Net::AMQP is quite a heavy user of pack/unpack so will need a rework but that isn't a showstopper with sufficient warning.

But IIRC pack was made experimental retrospectively so there may well be a lot of older modules using it which have just had the use experimental "pack" slapped on them to keep them working.

@coke
Contributor

coke commented Oct 4, 2024

https://gist.github.com/Whateverable/cb0ceefdb6724aa0181b4b5ea19b00d0

@jonathanstowe

https://gist.github.com/Whateverable/cb0ceefdb6724aa0181b4b5ea19b00d0

Yeah, "modules of a certain age" 😺

@lizmat
Collaborator Author

lizmat commented Oct 5, 2024

The Net::AMQP is quite a heavy user of pack/unpack so will need a rework but that isn't a showstopper with sufficient warning.

I thought that P5pack was a drop-in replacement. I guess the Buf.unpack doesn't make it so. Guess I'll need to look at it again to make that work

@jonathanstowe

FWIW some of the examples above ☝️ probably don't even need that. The Net::FTP, for example, has instances of:

$_.unpack("A*");

(and no other uses of the pack functionality)

Which is basically:

$_.decode('ascii')

Or something.
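
A minimal sketch of the decode route for plain ASCII data. The sample reply bytes are made up, and whether unpack("A*") treats trailing padding the same way as decode is an assumption that has not been checked here.

```raku
# Hypothetical FTP-style reply line, held as raw bytes.
my $reply = "220 Service ready".encode('ascii');

# decode turns the Blob back into a Str without `use experimental :pack`.
my $text = $reply.decode('ascii');

say $text;
```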

@jonathanstowe

The Net::DNS has something on the lines of:

            $inc-size = $client.read(2);
            $inc-size = $inc-size.unpack('n');
            $incoming = $client.read($inc-size);

Which is basically :

            $inc-size = $client.read(2);
            $incoming = $client.read($inc-size.read-uint16(0, BigEndian));
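
A standalone check of the network-order read, as a sketch. The two sample bytes stand in for what the socket would return. read-uint16 takes a byte offset and defaults to NativeEndian, so BigEndian is passed explicitly to match the 'n' template.

```raku
# Two bytes as they might arrive from the socket: 0x01 0x2C is 300 in network order.
my $prefix = blob8.new(0x01, 0x2C);

# Offset 0, big-endian (network) byte order.
my $length = $prefix.read-uint16(0, BigEndian);

say $length;
```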

@lizmat
Collaborator Author

lizmat commented Oct 5, 2024

Which makes one wonder whether $client's class should have read-uint16 (and friends) methods.

@lizmat
Collaborator Author

lizmat commented Oct 6, 2024

P5pack 0.0.15 is now drop-in compatible with use experimental :pack.

As an additional comment: P5pack should probably get a complete makeover building code with RakuAST, similar to printf.

@lizmat lizmat changed the title Experimental :pack should be removed use experimental :pack should be removed Oct 7, 2024
@Leont

Leont commented Oct 12, 2024

In my experience (implementing binary protocols), this general problem area is not well solved right now. read-uint16 and friends are clunky, but pack and friends are too Perlish and not nearly Rakuish. It's kind of begging for a more comprehensive module (I have some ideas, but they're not developed enough to start writing such a module).

@lizmat
Collaborator Author

lizmat commented Oct 12, 2024

@Leont Could P5pack serve as an "nqp" in your opinion for such a module?

@raiph

raiph commented Oct 12, 2024

@Leont

(implementing binary protocols) ...

What did/do you make of @alabamenhu's Binex?

(Iirc they paused their work on it until RakuAST had sufficiently matured and Rakoons were sufficiently interested in there being a better solution in this space. Or perhaps they came up with another solution; if so I guess it would be good to hear from them what that was.)

@Leont

Leont commented Oct 12, 2024

@Leont Could P5pack serve as an "nqp" in your opinion for such a module?

No, I think that's wiring things exactly the wrong way around. Though I do think RakuAST is the right way to go about these things.

@Leont

Leont commented Oct 12, 2024

What did/do you make of @alabamenhu's Binex?

It doesn't look like it's a good solution for my problem. Binary formats don't generally involve things like backtracking, so I don't think regexes are quite the right paradigm.

What I really want that pack doesn't offer includes:

  • Mapping high level types to low level ones. Often I don't want to deal with primitives, I want to deal with higher level types (enums in particular are common).
  • Support optional/dynamic fields ("if bit X is set, we expect an integer here, otherwise we don't")
  • Thorough support for bitfields (including items taking up multiple bits).
  • More sensible encoding support

Ideally it would also be usable with streams instead of bufs, but that may be a little too ambitious for now.
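
To make the first three bullets concrete, here is a hand-rolled sketch using only core Blob methods. The message layout (a flags byte whose low nibble is a kind and whose high bit announces an optional 16-bit id), the Kind enum, and parse-message are all invented for illustration; a real module would presumably generate this kind of code from a declaration.

```raku
# Hypothetical message kinds, mapped from a 4-bit field.
enum Kind (Ping => 1, Data => 2);

sub parse-message(Blob $data) {
    my $flags  = $data.read-uint8(0);
    my $kind   = Kind($flags +& 0x0F);     # low nibble -> enum value
    my $has-id = so $flags +& 0x80;        # high bit: is the optional id present?

    # The id is only read when the flag says it is there; otherwise it stays undefined.
    my $id = $has-id ?? $data.read-uint16(1, BigEndian) !! Int;

    %( kind => $kind, id => $id )
}

my %with-id    = parse-message(blob8.new(0x82, 0x00, 0x2A));
my %without-id = parse-message(blob8.new(0x01));
say %with-id;
say %without-id;
```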

@lizmat
Collaborator Author

lizmat commented Oct 13, 2024

@Leont feels to me you're advocating a superset of the functionality pack/unpack provide, with a more object oriented syntax for specifying the format?

@lizmat
Collaborator Author

lizmat commented Oct 13, 2024

usable with streams instead of bufs

Are you thinking along the lines of: a format eating N bytes from a stream, then N again, and exposing that as a Supply ?

@niner

niner commented Oct 13, 2024

Personally, I have always felt pack/unpack to be a bit of an anachronism in a language with (native) types. When I needed something to inspect MoarVM's bytecode files I therefore just wrote this little ByteReader that inspects struct like types to figure out what to read: https://gist.github.com/niner/63a718023aba72e0dffc39c1ccd84e32

It's very little code that lets me declare structs as simple classes with natively typed members and lets me read those structs via e.g. $reader.read-struct(Header);. The two major features this is missing for a generically useful binary format parser are support for variable length fields and support for type/kind fields, i.e. data that tells you what exact struct to expect. I guess inheritance and default values or the like would easily take care of the latter.
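
The general idea can be sketched as follows. This is not the code from the gist: the Header class, its fields, the little-endian byte order, and the standalone read-struct sub are assumptions made for illustration, and only two native types are handled.

```raku
# Hypothetical fixed-layout header, declared as a plain class with native types.
class Header {
    has uint16 $.version;
    has uint32 $.body-size;
}

# Walk the class's attributes in declaration order and read each one
# according to its native type, then build an instance from the values.
sub read-struct(Blob $data, Mu $type) {
    my $pos = 0;
    my %args;
    for $type.^attributes -> $attr {
        my $name = $attr.name.substr(2);    # '$!version' -> 'version'
        given $attr.type.^name {
            when 'uint16' { %args{$name} = $data.read-uint16($pos, LittleEndian); $pos += 2 }
            when 'uint32' { %args{$name} = $data.read-uint32($pos, LittleEndian); $pos += 4 }
            default       { die "unsupported attribute type: $_" }
        }
    }
    $type.new(|%args)
}

my $header = read-struct(blob8.new(0x02, 0x00, 0x10, 0x00, 0x00, 0x00), Header);
say $header.version;
say $header.body-size;
```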

@lizmat
Collaborator Author

lizmat commented Oct 13, 2024

Also MoarVM::Bytecode

@Leont

Leont commented Oct 13, 2024

@Leont feels to me you're advocating a superset of the functionality pack/unpack provide, with a more object oriented syntax for specifying the format?

Yes, possibly something of a high level AST even (I mean, the problem involves loops, conditionals and state). Something like pack could easily be implemented on top of that for people who want a simple interface for simple problems.

Are you thinking along the lines of: a format eating N bytes from a stream, then N again, and exposing that as a Supply ?

Yeah something like that, except it wouldn't quite be a fixed number of bytes.

@lizmat
Collaborator Author

lizmat commented Oct 13, 2024

a format eating N bytes from a stream

A format eating the bytes from a stream that it needs, and again, exposing that as a Supply

:-)

@jonathanstowe

I don't think regexes are quite the paradigm

Yep. This was part of the conversation when pack was made experimental in the first place and it always seemed like it was a "when all you have to hand is a hammer, everything looks like a nail" sort of thing. Or more accurately a Swiss Army Knife where you have a bunch of tools which plausibly could cover all the bases but in practice turn out to be useful only if you're prepared to use them inappropriately.

Part of the problem is that a lot of the binary data that is in the wild is either in a format that seemed natural in the language that it was originally implemented in (probably extinct,) or seemed natural for the application and was specified in a way that could be easily implemented in any relatively low level language available at the time without the kind of abstractions we are used to. The former includes stuff like COBOL PIC(X) formatted data or the "dump a C struct" like utmp which is quite easy to deal with by pack or the read_foo methods, the latter maybe not so much.

@Leont

Leont commented Oct 13, 2024

or the "dump a C struct" like utmp which is quite easy to deal with by pack or the read_foo methods

Yeah, pack is generally pretty good at that iff you know the exact format. IME binary protocols tend to be more complicated. Conditionals are common (e.g. mqtt or diameter), and tend to have a bunch of special cases (e.g. in Postgres a length marker of -1 means the equivalent of an undefined value).
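
The length-marker special case is the sort of thing a fixed template cannot express. A minimal sketch, assuming a simplified framing of a big-endian signed 32-bit length followed by that many UTF-8 bytes; read-value is a made-up helper and this is not a full implementation of the Postgres wire format.

```raku
# Reads one length-prefixed value starting at byte offset $pos.
# Returns the Str type object (undefined) when the length marker is -1.
sub read-value(Blob $data, Int $pos) {
    my $len = $data.read-int32($pos, BigEndian);
    return Str if $len == -1;
    $data.subbuf($pos + 4, $len).decode('utf8');
}

my $present = read-value(blob8.new(0, 0, 0, 2, 0x68, 0x69), 0);
my $missing = read-value(blob8.new(0xFF, 0xFF, 0xFF, 0xFF), 0);
say $present;
say $missing.defined;
```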

@jonathanstowe

a more object oriented syntax for specifying the format

In the simple (static,) case this could work well with, say, some role to allow for construction from a Blob and some traits to specify the translation to Raku typed attributes. But in the dynamic case where the structure of subsequent data is dependent on earlier values then we're going into DSL territory rather than something that can be simply declarative.

A relatively simple example of the more complex case might be a dBase .dbf file where the header part might specify the number of fields, followed by that number of fixed length field descriptions, then followed by the data of the actual rows whose length and structure is determined by the previous field descriptions.

@alabamenhu

What did/do you make of @alabamenhu's Binex?

It doesn't look like it's a good solution for my problem. Binary formats don't generally involve things like backtracking, so I don't think regexes are quite the right paradigm.

What I really want that pack doesn't offer includes:

* Mapping high level types to low level ones. Often I don't want to deal with primitives, I want to deal with higher level types (enums in particular are common).

* Support optional/dynamic fields ("if bit X is set, we expect an integer here, otherwise we don't")

* Thorough support for bitfields (including items taking up multiple bits).

* More sensible encoding support

Ideally it would also be usable with streams instead of bufs, but that may be a little too ambitious for now.

So when I'm done, you should be able to do most of this stuff. I didn't really imagine Binex with backtracking in mind -- when I use grammars, I'm almost always using tokens as most common formats are generally designed carefully to avoid needing any kind of backtracking. But I did imagine being able to do basically everything you've pointed up there except for encoding (which I figured would just pass a blob to a decode during the action phase). As raiph pointed out, though, I did hold off on developing it further. I think RakuAST is sufficiently mature I could get back to it, though.

But it doesn't really create a pack equivalent, so it probably wouldn't fit your use case.


7 participants