Textual representation of the wire format of a type (recursively) #143

ia0 · 2024-05-13T12:51:00Z

I would like to have a textual representation of the wire format of a type. I would use it as a generated file in my repository during reviews. The format could look like an annotated grammar of the wire format. Here is an example of the workflow I have in mind:

Let's assume a crate foo in a repository, that contains a few serializable types like Foo and Bar.
Next to that crate, I would have some foo.postcard file containing a textual representation of Foo in wire format (essentially the language recognized by deserialization, which is a bit more than the one produced by serialization because postcard is not canonical).
I have a CI test to make sure that file is in sync with the code.
During review, if that CI test is green, then I can review the postcard file to see how the wire format is changing, and if I consider it an acceptable change or not (and also how it impacts versioning).
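The CI check in this workflow could be sketched roughly as follows, using only the standard library. The names `SyncStatus` and `check_in_sync` are hypothetical, and the generated text is assumed to come from some generator that is not shown here. This is only an illustration of the "file is in sync with the code" test, not an existing postcard API.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of comparing the checked-in file with freshly generated text.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub enum SyncStatus {
    InSync,
    OutOfSync { expected: String, actual: String },
}

/// Compares `generated` (the text produced from the current code) with the
/// content of the checked-in file at `path`. A missing file counts as empty.
pub fn check_in_sync(path: &Path, generated: &str) -> io::Result<SyncStatus> {
    let actual = match fs::read_to_string(path) {
        Ok(content) => content,
        Err(e) if e.kind() == io::ErrorKind::NotFound => String::new(),
        Err(e) => return Err(e),
    };
    if actual == generated {
        Ok(SyncStatus::InSync)
    } else {
        Ok(SyncStatus::OutOfSync {
            expected: generated.to_string(),
            actual,
        })
    }
}
```

A test in the crate would call `check_in_sync` on foo.postcard and fail on `OutOfSync`, so a green CI run means the reviewed file matches the code.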

The textual representation can simply be some annotated grammar of the language recognized by postcard for the type. For example:

enum Foo { A(FooA), B { b1: FooB1, b2: FooB2 } }
struct Bar { a: BarA, b: BarB }

Would become something like this:

Foo |=
| A=0x00 FooA
| B=0x01 b1=FooB1 b2=FooB2

Bar &=
& a=BarA
& b=BarB

Note how named things are prefixed with name=. Those annotations do not affect the wire format (the language recognized). Terminals are just bytes (0x00 to 0xff). Identifiers (Rust paths) are non-terminals unless they are a name. Also note the difference between "unions" using |= and "sequences" using &= (the symbol is repeated on each line to support empty unions and sequences). Ideally the file would recursively contain all definitions (here FooA, FooB1, etc).
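As a rough illustration of this notation, here is a minimal pretty-printer sketch. The types `Symbol`, `Item`, and `Rule` and the function `render` are made up for this example (they are not part of postcard); they only model bytes, non-terminals, and optional `name=` annotations as described above.

```rust
use std::fmt::Write;

/// One symbol on the right-hand side of a rule (hypothetical type).
pub enum Symbol {
    /// A terminal byte, printed as `0x..`.
    Byte(u8),
    /// A non-terminal, i.e. the path of another rule.
    Rule(&'static str),
}

/// A symbol with an optional `name=` annotation (no effect on the wire format).
pub struct Item {
    pub name: Option<&'static str>,
    pub symbol: Symbol,
}

pub enum Rule {
    /// Alternatives, one per `|` line; each alternative is a list of items.
    Union(Vec<Vec<Item>>),
    /// Concatenation, one item per `&` line.
    Sequence(Vec<Item>),
}

fn render_item(out: &mut String, item: &Item) {
    if let Some(name) = item.name {
        write!(out, "{name}=").unwrap();
    }
    match &item.symbol {
        Symbol::Byte(b) => write!(out, "0x{b:02x}").unwrap(),
        Symbol::Rule(path) => out.push_str(path),
    }
}

/// Renders one rule; the symbol is repeated on every line so that empty
/// unions and sequences still print a well-formed header.
pub fn render(name: &str, rule: &Rule) -> String {
    let mut out = String::new();
    match rule {
        Rule::Union(alternatives) => {
            writeln!(out, "{name} |=").unwrap();
            for alternative in alternatives {
                out.push('|');
                for item in alternative {
                    out.push(' ');
                    render_item(&mut out, item);
                }
                out.push('\n');
            }
        }
        Rule::Sequence(items) => {
            writeln!(out, "{name} &=").unwrap();
            for item in items {
                out.push_str("& ");
                render_item(&mut out, item);
                out.push('\n');
            }
        }
    }
    out
}
```

Rendering an empty `Rule::Union` prints only the header line, which is why the symbol is carried by `|=` rather than inferred from the lines that follow.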

In my case, I would consider the following change acceptable (it forgets a name, thus doesn't affect the wire format):

-enum Foo { A(FooA), B { b1: FooB1, b2: FooB2 } }
+enum Foo { A(FooA), B(FooB1, FooB2) }

Would result in the following diff:

Foo |=
| A=0x00 FooA
-| B=0x01 b1=FooB1 b2=FooB2
+| B=0x01 FooB1 FooB2

I would also consider the following diff acceptable (when deprecating a variant):

-enum Foo { A(FooA), B { b1: FooB1, b2: FooB2 } }
+enum Foo { _A(Infallible), B { b1: FooB1, b2: FooB2 } }

Foo |=
-| A=0x00 FooA
+| _A=0x00 Infallible
| B=0x01 b1=FooB1 b2=FooB2

Infallible |=

I suspect the experimental "schema" feature could be useful, however I see at least 2 problems:

One that I'm trying to fix in Fix schema implementation for slice references #142.
The fact that schema is not exactly the wire format, but something a bit more high-level. It's possible to write the function as a user, but that would encode a bit of the postcard format logic in user code. I think it would be better for postcard to directly provide a wire-level description like this:

pub type WireSchema = Named<WireRule>; // name is the type name (could be required)
pub struct Named<T> {
    name: Option<&'static str>,
    object: T,
}
pub enum WireRule {
    Union(&'static [Named<WireSequence>]), // name is the variant name (could be required)
    Sequence(WireSequence),
}
pub type WireSequence = &'static [Named<WireToken>]; // name is the field name (optional)
pub enum WireToken {
    Varint(Varint),
    U8, I8, F32, F64, // notice that Char is missing (it's defined)
    Seq(WireSequence), // length-encoded sequence, pretty-printed as `n*(...)`
    Constant(u8),
    Schema(&'static str),
}

Note that maps are really just n*(k v) where k and v are schema names. And byte sequences are just n*u8. By definition of Seq, n is varint(usize).
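To make the `n*u8` case concrete, here is a small standalone sketch of a length-prefixed byte sequence, where `n` is written as an unsigned LEB128 varint (the encoding postcard uses for varints). The function names are hypothetical, and the code does not call into postcard; it only illustrates the byte layout.

```rust
/// Appends `value` as an unsigned LEB128 varint: 7 bits per byte, least
/// significant group first, high bit set on every byte except the last.
pub fn write_varint(out: &mut Vec<u8>, mut value: usize) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Reads a varint and returns it with the remaining input.
/// Sketch only: bits that do not fit in a `usize` are silently dropped.
pub fn read_varint(input: &[u8]) -> Option<(usize, &[u8])> {
    let mut value: usize = 0;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= usize::BITS {
            return None; // too many continuation bytes for a usize
        }
        value |= ((byte & 0x7f) as usize) << shift;
        if byte & 0x80 == 0 {
            return Some((value, &input[i + 1..]));
        }
    }
    None // input ended before the final varint byte
}

/// Encodes a byte sequence as `n*u8`: the varint length `n`, then `n` bytes.
pub fn write_bytes(out: &mut Vec<u8>, bytes: &[u8]) {
    write_varint(out, bytes.len());
    out.extend_from_slice(bytes);
}

/// Decodes an `n*u8` sequence, returning the payload and the remaining input.
pub fn read_bytes(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (len, rest) = read_varint(input)?;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

A map would follow the same shape: the varint count `n`, then `n` key/value pairs, each encoded according to its own schema.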


jamesmunns · 2024-05-13T13:26:05Z

Hey @ia0, I'm definitely open to a separate tool that consumes the Schema information and outputs some consistent grammar or other reviewable output. If you do this, please feel free to open a PR to the README to link to it, and I would be open to potentially upstreaming it in the future.

At the moment I believe you could print with Debug, or serialize the schema to some format, such as JSON, and you could check that with something like insta, but I am open to a more purpose-built tool.

I think this would be a useful stepping stone towards also generating ser/de impls in languages other than Rust (and not with Serde) in the future potentially.

ia0 · 2024-05-13T15:20:50Z

I'll try to get something working on my side first, since the wire format is stable (and I don't expect it to change in the parts that I'm using; I'm not using char, so I'm not worried about #101). If I'm satisfied with what I have and believe it makes sense to be part of postcard, I'll update this issue.

jamesmunns · 2024-05-13T16:15:12Z

For what it's worth, I plan to release postcard 2.0 soon, BUT I plan to keep the 1.0 wire format, e.g. I won't address #101 in postcard 2.0, but rather at the next breaking wire format.

I'm definitely interested in seeing what you build! Part of the intent of the Schema derive was to be able to do these kinds of things, I just hadn't gotten to it yet.

ia0 · 2024-05-14T17:48:13Z

I went with this wire representation and convert from SdmTy here. Here is what the output looks like:

% cd crates/protocol
% cargo run --example=schema --features=_schema
DeviceError=0: {} -> (space:u8 code:u16)
AppletRequest=1: (applet_id:{Default:(0:u32)} request:(n:usize u8^n)) -> ()
AppletResponse=2: {Default:(0:u32)} -> (response:{None:(0:u32) Some:(1:u32 (n:usize u8^n))})
PlatformReboot=3: () -> {}
AppletTunnel=4: (applet_id:{Default:(0:u32)} delimiter:(n:usize u8^n)) -> ()
PlatformInfo=5: () -> (serial:(n:usize u8^n) version:(n:usize u8^n))
PlatformVendor=6: (n:usize u8^n) -> (n:usize u8^n)

The general format is <variant>=<discriminant>: <request> -> <response> where only request and response are a pretty-print of the wire format. The pretty print uses parentheses for concatenation and curly braces for disjoint union (using a u32 discriminant). Names are optional and prefixed as <name>:. The special (n:usize <wire>^n) is a length-encoded array. Maybe I'll use [<wire>] instead.
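The pretty-print rules described here can be sketched with a small recursive printer. The `Wire` enum below is a hypothetical stand-in for the actual wire representation (which is not shown in this thread); only the output syntax is taken from the description above.

```rust
/// Hypothetical model of the wire format, for illustration only.
pub enum Wire {
    /// A primitive such as `u8`, `u16`, or `usize`, printed by name.
    Primitive(&'static str),
    /// Concatenation of optionally named parts, printed as `(a:x b:y)`.
    Struct(Vec<(Option<&'static str>, Wire)>),
    /// Disjoint union with a u32 discriminant per variant, printed as
    /// `{Name:(tag:u32 fields...)}`.
    Enum(Vec<(&'static str, u32, Vec<Wire>)>),
    /// Length-encoded array, printed as `(n:usize <wire>^n)`.
    Array(Box<Wire>),
}

pub fn render(wire: &Wire) -> String {
    match wire {
        Wire::Primitive(name) => name.to_string(),
        Wire::Struct(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(name, field)| match name {
                    Some(name) => format!("{name}:{}", render(field)),
                    None => render(field),
                })
                .collect();
            format!("({})", parts.join(" "))
        }
        Wire::Enum(variants) => {
            let parts: Vec<String> = variants
                .iter()
                .map(|(name, tag, fields)| {
                    // Each variant starts with its discriminant, then its fields.
                    let mut inner = vec![format!("{tag}:u32")];
                    inner.extend(fields.iter().map(render));
                    format!("{name}:({})", inner.join(" "))
                })
                .collect();
            format!("{{{}}}", parts.join(" "))
        }
        Wire::Array(item) => format!("(n:usize {}^n)", render(item)),
    }
}
```

With this model, an empty `Struct` prints as `()` and an empty `Enum` prints as `{}`, matching the PlatformReboot line in the output above.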

ia0 · 2024-07-31T14:11:49Z

> I'm definitely interested in seeing what you build!

Ok, I finally have something that I can show. I later learned about your postcard-rpc work (through the RustNL video on YouTube), so I'll try to use that as a comparison point, since that's definitely something I would have tried if I had known about it. Here are the main points:

There is a Wire trait which is essentially a mix of serde::Serialize, serde::Deserialize, and postcard::Schema. See below for the differences.
The serde::Serializer and serde::Deserializer equivalents are an internal detail and not exposed (except for the derive macro). Users only need to care about &[u8] and Box<[u8]>.
The postcard::Schema equivalent is not a constant but a function, because constants prevent recursive types from having a schema. The function simply builds a graph of the schema where nodes are types and edges point to their sub-types.
The main reasons not to use postcard over serde or protocol buffers are described in the documentation and essentially come down to the narrow use case of host-device RPC communication. I describe the most important differences and assumptions below.
Variants support explicit tags, which makes it possible to have "holes". This is used to deprecate variants on device. This is related to #109 (Specify the discriminant for enum variants).
The Wire trait is only implemented for covariant types. This avoids issues like serde-rs/serde#2190 (Why "The 'de lifetime should not appear in the type to which the Deserialize impl applies"?), which are due to users confusing data types and views into serialized data.
This essentially makes it possible to define an RPC API like this, which generates something like this.
The RPC protocol doesn't support out-of-order messages or topics, unlike postcard-rpc. Users have to implement those themselves with 2 RPC calls (AppletRequest and AppletResponse) to dissociate requests from responses (with a "pull" strategy for the response).
The schema makes it possible to generate reviewable output like this and automated checks like this, which essentially test on each PR that the set of RPC calls is monotonic (only new calls can be added, except on device where calls can be removed), to make sure that a recent host can always communicate with an old device.
EDIT: Forgot to say, but the wire format should be compatible with postcard. In particular, Wire is not implemented for char which means that it might also be compatible with postcard v2.
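The monotonicity check mentioned in the list above could look roughly like the following sketch. Everything here is an assumption made for illustration: the `Calls` map, the `Side` enum, and the rule that an existing call must keep exactly the same textual wire description are not taken from the actual implementation.

```rust
use std::collections::BTreeMap;

/// Maps each RPC call name to the textual description of its wire format.
/// (Hypothetical representation.)
pub type Calls = BTreeMap<String, String>;

#[derive(Clone, Copy, PartialEq)]
pub enum Side {
    Host,
    Device,
}

/// Returns the violations found when going from `old` to `new`.
/// Assumed rules: new calls may always be added; an existing call must keep
/// its wire description; only the device side may remove calls.
pub fn check_monotonic(side: Side, old: &Calls, new: &Calls) -> Vec<String> {
    let mut errors = Vec::new();
    for (name, old_wire) in old {
        match new.get(name) {
            Some(new_wire) if new_wire == old_wire => {}
            Some(_) => errors.push(format!("{name}: wire format changed")),
            None if side == Side::Device => {} // devices may drop calls
            None => errors.push(format!("{name}: removed")),
        }
    }
    errors
}
```

Running such a check on each PR against the previously released set of calls is one way to keep a recent host able to talk to an old device.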

Hope this helps!

jamesmunns · 2024-07-31T17:19:06Z

Neat, thanks! I'll take a look at this soon. Btw @ia0 if you want to have a call to chat about this some time, I'd be interested, my email is on my profile!

ia0 mentioned this issue May 13, 2024

derive(Schema) introduces incorrect bounds instead of doing perfect derive #153

Open
