-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust support #22
Comments
Wow, that's cool :) May be you should post it to some Rust-related mailing lists / communities, to attract more peer review? I don't know a thing about Rust myself ;) |
After working with the language a bit, I've come to realise how difficult of a task this would be. Rust is certainly an interesting language to work with, but it comes with a huge list of new challenges that don't exist in other languages. I don't see myself being able to overcome them, so I'm going to quit. There's a whole stack of architecture questions that are easy to answer in other languages, but actually very complex in Rust. The design of code generated by KS follows the easy patterns because that's how pretty much every other language works. But Rust is unique in many areas. Rust has what they call a foreign function interface which lets you interact with C libraries, perhaps that's a more reasonable approach for Rust. But for now, I'm going to close this because I'm not able to make any more progress. (I enjoyed experimenting with Rust: it sort of feels like a modern approach to C. The memory safety syntax is very complex for beginners, but the rest of language is quite simple and expressive. The tools are a pleasure to use as well. Would definitely recommend anyone to spend a weekend with it if they've ever been curious about it.) |
Maybe someone would return to Rust implementation in future, so I thought it might be interesting to document your findings or at least map out possible obstacles on the path. What does the language lack that impedes KS implementation? Something about the type system? Memory management? Strict compile-time checks? Lack of "everything is an expression"? Something else? |
The main pain point I discovered is around the references to If instance methods required these three pointers to be passed as arguments, then it might be more possible. For example: // hard to do
fn get_calculated_instance(self) { ... }
// probably easier, but consumer must pass root/parent manually
fn get_calculated_instance(self, io: Stream, parent: Struct, root: Struct) { ... } From what it seems, Rust is best approached by creating an architecture that specifically fits Rust's "memory safety first" style. Trying to modify an existing architecture so that it works in Rust is very difficult. |
That's kind of strange. I guess that Rust has something that resembles an object (i.e. bundle of data + methods) and I don't see any difference in storing some "useful" data, i.e. attributes / instances vs storing a reference (pointer?) to But that said, I know almost nothing about Rust, so I can only make theories. |
I'm reorganizing all "new language support" issues, adding relevant tag. I'll reopen this one, as a few people asked for Rust support again, and it's better to keep it open, so it would be easier to find and people won't create duplicates again and again. |
Yes, please, very much need Rust support from Kaitai! Happy to help with discussions and possible implementation if needed. |
@berkus You probably know of http://doc.kaitai.io/new_language.html — let's just follow. I'd be happy to launch a Rust-targeting branch as soon as we'll get basic |
No, this is new to me - I just started actually using Kaitai today, loving it so far. Will take a look! |
I'm going to start tentatively taking a second look at this one because the second edition of the Rust book has a dedicated chapter on reference cycles which might help me solve the roadblocks I was hitting last year. I still have no experience with actual proper Rust code so if anybody has more experience with the language, I'd love some assistance! Question for @GreyCat: I've manually assembled the code for a For the time being, it'd be good to get some feedback so here's what I've got so far for hello_world: extern crate kaitai_struct;
use kaitai_struct::{Stream, Struct};
use std::io::{Cursor, Read, Seek};
use std::fs::File;
struct HelloWorld {
one: u8,
}
impl Struct for HelloWorld {
// Rust requires all fields to be initialised so we auto-generate an empty ctor
fn empty() -> HelloWorld {
HelloWorld { one: 0 }
}
// Passes any io::Error through to the caller using the ? operator
fn from_file(path: &str) -> std::result::Result<HelloWorld, std::io::Error> {
let mut buf = File::open(path)?;
Ok(HelloWorld::new(
&mut buf,
&None::<Box<Struct>>,
&None::<Box<Struct>>,
))
}
// Constructor from a stream
// Box<> References to parent and root are temporary and the types will change.
fn new<S: Stream>(
stream: &mut S,
parent: &Option<Box<Struct>>,
root: &Option<Box<Struct>>,
) -> HelloWorld {
let mut this = HelloWorld::empty();
this.read(stream, &parent, &root);
this
}
// Read takes a reference to stream/parent/root, this may change later.
fn read<S: Stream>(
&mut self,
stream: &mut S,
parent: &Option<Box<Struct>>,
root: &Option<Box<Struct>>,
) {
self.one = stream.read_u1().unwrap();
}
}
#[cfg(test)]
mod tests {
use kaitai_struct::Struct;
use HelloWorld;
#[test]
fn test_hello_world() {
let hw = HelloWorld::from_file("./src/fixed_struct.bin")
.expect("File not found!");
assert_eq!(hw.one, 0x50);
}
} I'm going to start working out the |
Looks really cool :) As for code location, I'd just leave it here at the moment. It's not like putting it somewhere into the tests repo would help anything — we need test specs, we need the parser code to be generated, etc, so the hand-written generation result doesn't really help much in terms of CI or stuff like that. May be we could ask some Rust online communities for some input / suggestions on this one? |
Neat, you wouldn't by any chance know a good way to represent the parent/root references in Rust? I have looked at it a few times and it just seems to create a painful chain of types like |
It's pretty painful because of borrow checker. Usually peeps do it via a Vec and integer indices pointing to parent or children from a T. Probably some rust tree package like ego_tree or idtree may give you help? Otherwise I believe |
Since I don't know the status of this, I decided to start an alternate implementation. Still very much a WIP, but hello world now successfully compiles to this: // This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild
use std::{
option::Option,
boxed::Box,
io::Result
};
use kaitai_struct::{
KaitaiStream,
KaitaiStruct
};
pub struct HelloWorld {
pub one: u8,
}
impl KaitaiStruct for HelloWorld {
fn new<S: KaitaiStream>(stream: &mut S,
_parent: &Option<Box<KaitaiStruct>>,
_root: &Option<Box<KaitaiStruct>>)
-> Result<Self>
where Self: Sized {
let mut s = Self {
one: 0,
};
s.read(stream, _parent, _root)?;
Ok(s)
}
fn read<S: KaitaiStream>(&mut self,
stream: &mut S,
_parent: &Option<Box<KaitaiStruct>>,
_root: &Option<Box<KaitaiStruct>>)
-> Result<()>
where Self: Sized {
self.one = stream.read_u1()?;
Ok(())
}
} I've tested this and confirmed it all working. My next step is to port some of the test suite over, writing test harnesses for Rust. From there, I can get them all passing, one by one. |
@CWood1 Thanks, looks really cool! Note that:
|
Oh wow, that all looks much easier than I'd anticipated! Is there any documentation on generating and running the tests for a given language automatically? Tried to figure it out from travis.yml, but that does strange things with packages which won't work on my Arch installation. Thanks for the help, I'll take a look at getting the test translator ported next. |
Unfortunately, not now, as this whole thing was literally implemented a month or so ago. Travis yaml won't help much here, as specs are to autogenerated on the fly during normal build process. To run test translator, you'd generally want to:
|
So, I've got the translator outputting tests now. Few compilation issues here and there due to incorrect syntax, nothing too extreme. Next question: how exactly does one run the compiler against the test suite, then the tests against the compiled specs? Once I get these compiler errors resolved, that's my next goal is to have 50something failing tests I can then make pass. |
@CWood1 Could you show me what exactly have you achieved so far? |
Okay, where we're at (I'll be pushing all this shortly, soon as I've got a couple compiler errors ironed out), I've got a rust library project under Therefore, open questions:
Doing the above manually is relatively easy, though tedious. Are there ways by which I can script them, under existing frameworks? Or would that need to be developed from scratch? |
There is a huge mess with terminology we have here ;) If by "compiler", you mean ksc (i.e. app that translates ksy files into parsing source code in target language), then normally, it is run with build-formats script, it runs the compiler on all formats for all target languages, and outputs results into If by "compiler", you mean kst (i.e. app that translates kst files into test specs in target languages), then it is right now runnable by invoking I suspect there's slight misunderstanding on what needs to be done. Actually we have 3 "places" to update:
That's why I asked you to show what you've got, so I could understand which of these 3 things you were working on :)
No equivalent of these exists so far in any other languages. C++ might benefit from similar automation, but so far we've just added all tests manually to CMake project. This is actually a somewhat sloppy moment, but for compiled languages you don't usually have an option to go with "build everything that would build, and run the tests with everything's that we've built". Normally (for C#, C++, Java, Go), a single broken file ruins the build, so we employ various tricks to reduce that risk. Having a manually maintained list of tests that "should run ok" is a first step to mitigate that problem.
ci-all script invokes all |
Architecture question: current implementations don't seem to handle "we ran out of data" well. Is it just up to the user to reset the stream position in these scenarios? |
Fwiw the most elegant solution that immediately comes to mind would be to
reset the stream position ourselves, then error with a predefined
"insufficient data" error. There are cases - network streams for instance -
in which the right response is wait for data and try again, and other
instances - files, for example - where an error should be returned from the
tool using whatever grammar. Wdyt?
I've seen your PR, and while I haven't had time to look through it in
depth, I like the way you handle lifetimes at first glance. I'm not
entirely sure how you handle streams yet, that bit isn't clear to me
without closer inspection which I haven't been able to do.
I'm not entirely sure I agree with having the stream of the parsed struct
be tied to the lifetime of the stream though. Not unless we make `_root`
the owner of the stream. Otherwise the user will be required to keep dead
streams around for no reason. Say, loading in a file, parsing, but then
needing to keep it around for the lifetime of the entire tool running.
Given kaitai is run once to parse and there's your structure, I'm not sure
I agree with tying the lifetime of the struct to the lifetime of the
stream. Wdyt?
…On Wed, 24 Apr 2019, 20:15 bspeice, ***@***.***> wrote:
Architecture question: current implementations don't seem to handle "we
ran out of data" well. Is it just up to the user to reset the stream
position in these scenarios?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAMY5CSPIVO6Y7Z6XERBJKTPSCWUPANCNFSM4CMVPRSQ>
.
|
On stream recovery: had a quick conversation with @GreyCat on Gitter, and that was where I came to; it's probably up to the user to save the stream position (using And I spoke incorrectly earlier on lifetimes, the borrow checker thoroughly convinced me of the error of my ways. What I should have said was the lifetime of the struct must be bounded by the lifetime of the stream (that is, the stream must live at least as long as the struct). I'm against |
Point taken on one stream being used for multiple structs. I'm still not
entirely sold on having the struct lifetime bounded by the stream lifetime
though. If, say, a struct gets returned across a library interface, we now
need another return value, which is just "you don't need to touch this,
just save it somewhere", which seems a touch inelegant and abstraction
leaky. It may not be possible to avoid it, I'm not sure without digging
into your design more, but I'd rather avoid it if possible for elegance and
better abstractions, if that makes sense. Thoughts?
…On Wed, 24 Apr 2019, 20:51 bspeice, ***@***.***> wrote:
On stream recovery: had a quick conversation with @GreyCat
<https://github.com/GreyCat> on Gitter, and that was where I came to;
it's probably up to the user to save the stream position (using _io.pos
or similar) before parsing, and then seeking back to that if desired. Think
we're both agreed.
And I spoke incorrectly earlier on lifetimes, the borrow checker
thoroughly convinced me of the error of my ways. What I should have said
was the lifetime of the struct must be bounded by the lifetime of the
stream (that is, the stream must live at least as long as the struct). I'm
against _root owning the stream, as very often the same stream will be
used to parse multiple root structures (network streams).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAMY5CRIRDLN673WQSINYGDPSC223ANCNFSM4CMVPRSQ>
.
|
Yeah, returning across lib boundaries is going to be painful because the structs I've been creating don't own their data (unlike other Kaitai implementations). And now that I think about it, My personal use case is interested in having the zero-copy style, but maybe we make that a compile option to choose between owning and non-owning codegen? EDIT: It should be noted that structs owning their data would likely still have issues with lib boundaries because of |
A compile option seems like quite an elegant way to do this actually. I
take your point about zero copy too, so tradeoffs. But, if we can support
both, why not. Would that be baked into kaitai itself do you think, or as a
feature flag in the rust output?
On a related note, I have a vested interest in making the output support
nostd, as one of my plans for this would be in my kernel. Happy to talk
about this in more depth at a later date on im, and we can go over design,
but this will probably influence the design a little. For the most part it
will mostly be a feature switched reimport of std into core. Think nom does
the same?
…On Wed, 24 Apr 2019, 21:10 bspeice, ***@***.***> wrote:
Yeah, returning across lib boundaries is going to be painful because the
structs I've been creating don't own their data (unlike other Kaitai
implementations). And now that I think about it, #[derive(Clone)] won't
work because they'd just copy the _parent and _root references.
My personal use case is interested in having the zero-copy style, but
maybe we make that a compile option to choose between owning and non-owning
codegen?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAMY5CVT5HKEOYYNPERNBQLPSC5DLANCNFSM4CMVPRSQ>
.
|
On compile options, baked into the Kaitai compiler, it basically switches structs from using For Finally, having learned my lesson on the borrow checker coming out with some nasty surprises, I've got a pretty basic test case available here that doesn't yet compile; if we can get this working (currently has issues with proving validity of |
Re `no_std`, heap access is still possible with `alloc` crate. It's just a
little more limited than full `std`, but I honestly don't think it'll be a
problem. Might be necessary to have a slightly limited feature set to
support `no_std`, but that'll be it.
Re your test case, I'll pitch in again on Saturday if it still doesn't
compile. I can't get to my workstation until then I'm afraid, but I'm happy
to pitch in from there. I'm getting more and more sold on your approach to
solving the circular reference issue, I quite like the reference approach.
If there's a way I can sync my work into that, I will.
Would it be worth considering as well initialising member structs from the
parent read, rather than pushing that responsibility onto the caller?
Otherwise it's much more boilerplate, which I'm keen to avoid from an API
design standpoint. In my ideal world, it should be possible to say
Struct::read(std::io::Cursor), or similar, and that initialises the whole
Struct.
…On Wed, 24 Apr 2019, 22:40 bspeice, ***@***.***> wrote:
On compile options, baked into the Kaitai compiler, it basically switches
structs from using &[u8] in the non-owning case to Vec<u8> in the owning
case.
For nom's no_std support - sure looks like it
<https://github.com/Geal/nom/blob/5764a02d20bed7a8b44d0d014eb20806f8589117/src/lib.rs#L372-L399>.
no_std would be difficult (both process_* functions in the runtime and
conditional repeats in read need heap allocation), but we can
feature-gate those and let users deal with the compilation issues.
Finally, having learned my lesson on the borrow checker coming out with
some nasty surprises, I've got a pretty basic test case available here
<https://github.com/bspeice/kaitai_rust/blob/master/tests/lifetime_validation.rs>
that doesn't yet compile; if we can get this working (currently has issues
with proving validity of _parent and _root) then I think we're in the
clear. It may require some API changes (giving _root and _parent
references to read instead of storing in the struct seems like a good
idea), but still making progress.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAMY5CSKVYH3VFYIM4VJRP3PSDHTHANCNFSM4CMVPRSQ>
.
|
The more I work with it, the more I'm convinced we have to initialize members in the parent read. Because self-referencing is such an issue (to the point where I'm not sure it's possible) I think the current class needs to own its children, and the Should have some interesting stuff drawn up tomorrow. |
Alright, new issues. Because |
Would you be able to give an example of that? I'm not quite sure what the
conflict is right now.
…On Thu, 25 Apr 2019, 18:43 bspeice, ***@***.***> wrote:
Alright, new issues. Because instance definitions can use _parent and
_root, and we compute instances lazily, we've got conflicts with
lifetimes. It's not a pretty API, but my plan for now is to always require
_root and _parent being passed into the instance retrieval method, but
we'll need to figure out a better API for the future.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAMY5CV7BA3BWAIQ3A2KYZDPSHUT7ANCNFSM4CMVPRSQ>
.
|
The example I posted yesterday is good enough to demonstrate the issue. There's two things that are problematic:
The example fn read<'s: 'a, S: KStream>(&mut self, stream: &'s mut S) -> Result<(), KError<'s>> {
self.bytes = stream.read_bytes(1)?;
let mut child = TestChildStruct::new(Some(self), Some(self.root()));
child.read(stream)?;
self.child = Some(&child);
Ok(())
} The first issue has to do with error[E0499]: cannot borrow `*stream` as mutable more than once at a time
--> tests/lifetime_validation.rs:38:20
|
34 | fn read<'s: 'a, S: KStream>(&'a mut self, stream: &'s mut S) -> Result<(), KError<'s>> {
| -- lifetime `'s` defined here
35 | self.bytes = stream.read_bytes(1)?;
| ---------------------
| |
| first mutable borrow occurs here
| returning this value requires that `*stream` is borrowed for `'s`
...
38 | child.read(stream)?;
| ^^^^^^ second mutable borrow occurs here The second definitely makes sense as being illegal. When we create a new child struct, we borrow error[E0506]: cannot assign to `self.child` because it is borrowed
--> tests/lifetime_validation.rs:39:9
|
20 | impl<'a> KStruct<'a> for TestRootStruct<'a> {
| -- lifetime `'a` defined here
...
37 | let mut child = TestChildStruct::new(Some(self), self.root);
| -------------------------------------------
| | |
| | borrow of `self.child` occurs here
| argument requires that `*self` is borrowed for `'a`
38 | child.read(stream)?;
39 | self.child = Some(child);
| ^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `self.child` occurs here There's a total of seven errors that show up in the mini example I gave. I'm simultaneously glad because that example is catching a lot of issues early, but they're still problematic. |
OK, lots of updates and some good news. I have a definition that passes the borrow checker, works for all the simple structures I can think of, and lets us scale into the future. I'm documenting how I went through the process in case anyone else is crazy enough to try re-building this with a different structure in the future. First, a super-simple sketch of what we're ultimately after. We need to read things in, have nested structures (and thus deal with ownership) while maintain zero-copy access to the underlying stream. From there, we try to add a So, we need to make the streams immutable. Thankfully, Rust has a trick for dealing with this - the Finally, we add in the definition of The result is the API available here: kaitai_rust. It includes a couple things not available in the Playground (a definition of Now, there are a couple of issues that need to be addressed before it's considered
Those concerns aside, it's not unreasonable to think that there's a Rust implementation on the horizon. |
Big news! A simple Websockets-based example is now running, so at the very least, a PoC compiler is ready. Following up from the last post, Now, it's time for a couple last cleanup things. First, the code isn't something I'm super proud of, so a refactoring pass is in order. Second, we don't have a good way of handling calculated endianness, similar to Go. Finally, I have no idea what else will break as soon as we start trying other types, the Websocket message is a reasonably simple format. Still, progress is progress. |
Definitely. Now that I have a design that I know will work well, I think getting CI on it makes a ton of sense. For merge consensus, I'll delegate to you guys - is it worth looking to integrate the compiler/runtime I've been working on and then fix things in CI as we go? Or would we rather have testing done beforehand? |
@GreyCat followed up via chat; |
Why hasn't this been implemented yet? Seeing rust absent from the gallery of other languages makes me not exactly inclined to use this for anything. |
Describe maven and gradle plugins
In case anyone seeks to improve the Rust support or its documentation, funding is available: |
That's interesting. I know a contract developer who's done significant work
on this. How much funding is available and what's the process? Or, maybe
better, who should he contact to discuss further?
…On Sun, May 5, 2024 at 9:13 PM Daniel Maslowski ***@***.***> wrote:
In case anyone seeks to improve the Rust support or its documentation,
funding is available:
https://nlnet.nl/project/Kaitai-Rust/
—
Reply to this email directly, view it on GitHub
<#22 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABKVVIAAXVYRXLGOPQTCNLZA3KM5AVCNFSM4CMVPRS2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMBZGUYDIMRZGY3A>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
The most recent efforts to add Rust were from @Agile86 and @revitalyr in these pull requests:
The grant from NLnet has multiple parts, one of them is to follow up on these efforts with the goal of integrating them to Kaitai Struct. This includes adding missing pieces of test infrastructure (partial builder), reviewing the code, aligning it with KS standards and conventions, fixing issues etc. |
Yes, @Agile86 and @revitalyr are the developers I'm thinking about. :-D I'm sure they would appreciate the funding opportunity to complete their work on Rust support, and the reasons we've pursued it over the years are largely the same as stated by NGIO Entrust. We're using Kaitai for building digital forensics parsers. |
Adding this so it can be tracked, but I plan on doing most of this myself. I've not yet had a need for Rust but I've been wanting to try it out anyway, figured this is a good excuse.
I've created a runtime for Rust: https://github.com/kaitai-io/kaitai_struct_rust_runtime
I'll be adding matching tests and a compiler for Rust in the future.
Note that I don't know Rust at all so I'm learning as I go. If anybody has coded with it before I would very much appreciate advice, suggestions, or even a code review of the runtime!
The text was updated successfully, but these errors were encountered: