Add the different variants of the Unix ar format #126
This is a workaround. Only one of the three instances is present at any time, so a type switch would be more appropriate. However, when an instance's type is switched, it is impossible to refer to the fields of any specific type case (since it is not statically known if that case is actually the one that was chosen). By using three separate instances, we can refer to the fields of any of the instances, but have to manually check that the instance in question is actually present.
Hi. Thanks for your contribution.
Thank you for the feedback 😃
Good idea, will do.
I assume you mean the space-padded number fields ( Actually on second thought a separate type for "space-padded number literal" might not be a bad idea, because I could move the string-to-integer conversion into the type as well.
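To make the "space-padded number literal" idea concrete, here is a rough Python sketch of the string-to-integer conversion such a type would encapsulate. The function name and the `base` parameter are made up for illustration and are not taken from the specs in this PR.

```python
def parse_space_padded_number(raw: bytes, base: int = 10) -> int:
    """Convert a fixed-width, space-padded ASCII number field to an int.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Fields are left-aligned and padded on the right with ASCII spaces.
    text = raw.decode("ascii").rstrip(" ")
    if not text.isdigit():
        # Covers the all-padding case and any non-digit characters.
        raise ValueError(f"not a space-padded number: {raw!r}")
    # int() still rejects digits that are invalid for the base (e.g. "9" in octal).
    return int(text, base)
```

A decimal size field would use the default base, while an octal field such as the file mode would pass `base=8`.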
Again I'm guessing you mean the metadata part (modification time, user ID, etc.)? I don't know what else could be moved into a shared type/file, because of the subtle differences between the variants.
Yeah, the commit history here is not very nice, I can clean it up a little once I've fixed the other issues. (I don't like squashing away history when I'm still working on the PR - it makes it hard for others to see what the previous comments/reviews were talking about.) If this repo has it enabled, GitHub also lets you squash when merging the PR. That way a squashed commit is merged into the repo, but the real history is kept in the PR.
Both variants are supported actually 😃 But you're right, perhaps the text shouldn't be copy-pasted into each variant. Maybe I should change the I wanted to try writing a |
Right. There is a proposal to fix it: kaitai-io/kaitai_struct#88
Thanks for the info.
Yes, that is mostly what I mean for now.
I don't understand the purpose of writing a separate spec for deb, if its binary format is identical to |
The |
kaitai-io/kaitai_struct#81 may be useful for it when it is implemented.
meta:
  id: member_metadata
  title: Unix ar archive member metadata
  license: CC0-1.0
Shouldn't it import space_padded_number too?
Yup, it should. Why does this even compile? This seems like a bug...
This is clearer, because it makes it obvious that there is always exactly one active case. By using .as<...>, this can be used in all ar variants.
The deb format is based on the ar format.
Alright, I've tried, and I don't think the deb format can be specified in a useful way with Kaitai's current feature set (I would need at least some kind of simple |
The main advantage is that this makes the specs easier to use - most users only care about the parsed value, so it should be easily accessible without an extra .value each time. Those who need the unparsed text value can access it via the ..._raw fields. This way the four ar variants now also have the same interface. For example, a member's data size is now always accessed as member.size, where previously it was sometimes member.size (ar_bsd) and sometimes member.size.value (all other specs). This allows writing generic code that can work with any of the ar specs. Currently the compiler knows nothing about this common interface, which means that using the interface generically is only possible from dynamically-typed languages, such as Python. If something similar to kaitai-io/kaitai_struct#314 is implemented in the future, the interface could be properly specified in Kaitai Struct so that the compiler can understand, enforce and properly expose it.
Here is a very basic CLI tool to read ar archives using these specs. I mostly wrote it as a proof-of-concept to see if it's possible to write code that works generically with multiple specs. In this case it worked out very well - the differences between the format variants are almost completely abstracted away by the specs, and the application code only needs very few variant-specific branches. This sort of generic code is currently only possible with dynamically-typed languages like Python. Similar code in statically-typed languages wouldn't compile, because the classes generated by ksc for each spec don't implement a common base class/interface - this would require support from ksc, see kaitai-io/kaitai_struct#314.
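As a rough illustration of what "generic" means here, the sketch below uses two stand-in classes in place of the classes ksc would generate. The class names and the `name` attribute are assumptions made for this example (only `member.size` is described above), and this is not the CLI tool itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical stand-ins for member objects parsed by two different specs.
# The real generated classes share no base class; they only happen to
# expose attributes with the same names.
@dataclass
class BsdMember:
    name: str
    size: int


@dataclass
class SysvMember:
    name: str
    size: int


def total_size(members: Iterable) -> int:
    """Sum member sizes using duck typing: only `.size` is required."""
    return sum(m.size for m in members)


def list_members(members: Iterable) -> List[str]:
    """Format one line per member, regardless of which spec produced it."""
    return [f"{m.size:>10} {m.name}" for m in members]
```

Both functions accept any mix of the two classes because Python resolves `m.size` at run time. A statically-typed language would need a shared interface for the same code to compile.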
The ar format is not properly standardized, so there are a few different variants. Each one has its own spec - all variants use the same basic structure, but the details are different enough that they can't be merged into a single spec very well.
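For readers unfamiliar with the format, here is a minimal Python sketch of the shared basic structure: an 8-byte magic string followed by members, each with a 60-byte space-padded header and data aligned to an even offset. It is written from the commonly documented layout, not from the specs in this PR, and it deliberately leaves names raw, since name handling is where the variants differ. All function and class names are made up for illustration.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple

MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name 16, mtime 12, uid 6, gid 6, mode 8, size 10, end 2


class Member(NamedTuple):
    raw_name: bytes  # variant-specific; not interpreted here
    size: int
    data: bytes


def iter_members(stream: BinaryIO) -> Iterator[Member]:
    """Yield members of an ar archive without interpreting their names."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            return  # clean end of archive
        if len(header) != HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        raw_name = header[0:16].rstrip(b" ")
        # The size field is a space-padded decimal number.
        size = int(header[48:58].decode("ascii").rstrip(" "))
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated member data")
        if size % 2:
            stream.read(1)  # skip the padding byte after odd-sized data
        yield Member(raw_name, size, data)


def iter_members_from_bytes(archive: bytes) -> Iterator[Member]:
    """Convenience wrapper for in-memory archives."""
    return iter_members(io.BytesIO(archive))
```

In the BSD variant, the stored size of a member with a long name also covers the name bytes at the start of the data, so `data` here would still need variant-specific post-processing.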
Some parts of the ar format(s) don't translate very well to Kaitai (for example I had to use some substream and instance trickery to parse the name fields properly). If there's a cleaner or more efficient way to implement something, please let me know. (These are also the first larger Kaitai specs I've written, so there might be useful features that I don't know about 😃)