Add the different variants of the Unix ar format #126

Open
wants to merge 19 commits into
base: master

dgelessus
Contributor

The ar format is not properly standardized, so there are a few different variants. Each one has its own spec - all variants use the same basic structure, but the details are different enough that they can't be merged into a single spec very well.

Some parts of the ar format(s) don't translate very well to Kaitai (for example I had to use some substream and instance trickery to parse the name fields properly). If there's a cleaner or more efficient way to implement something, please let me know. (These are also the first larger Kaitai specs I've written, so there might be useful features that I don't know about 😃)
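
For readers unfamiliar with the format, the basic structure that all variants share can be sketched in a few lines of Python. This is only an illustration of the common layout (an 8-byte global magic followed by members, each with a 60-byte header and data padded to an even offset). It is not taken from the specs in this PR, the dictionary keys are made up for the example, and it deliberately leaves the name field unparsed because that is where the variants differ.

```python
GLOBAL_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60
HEADER_END = b"`\n"


def _number(field: bytes, base: int = 10):
    """Parse a space-padded ASCII number; return None if the field is blank."""
    text = field.decode("ascii").strip()
    return int(text, base) if text else None


def parse_member_header(header: bytes) -> dict:
    """Split a 60-byte member header into its fixed-width fields."""
    if len(header) != HEADER_SIZE or header[58:60] != HEADER_END:
        raise ValueError("invalid ar member header")
    return {
        "name_raw": header[0:16],  # variant-specific, left unparsed here
        "modified_timestamp": _number(header[16:28]),
        "user_id": _number(header[28:34]),
        "group_id": _number(header[34:40]),
        "mode": _number(header[40:48], 8),  # mode is stored in octal
        "size": _number(header[48:58]),
    }


def iter_members(archive: bytes):
    """Yield (header, data) pairs for each member of an in-memory archive."""
    if not archive.startswith(GLOBAL_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(GLOBAL_MAGIC)
    while pos < len(archive):
        header = parse_member_header(archive[pos:pos + HEADER_SIZE])
        pos += HEADER_SIZE
        size = header["size"]
        if size is None:
            raise ValueError("member header has no size")
        yield header, archive[pos:pos + size]
        # Member data is padded to an even length.
        pos += size + (size % 2)
```

The sketch ignores long-name handling entirely (the BSD variant, for instance, stores long names at the start of the member data), which is exactly the part that needs the substream and instance tricks in the specs.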

This is a workaround. Only one of the three instances is present at
any time, so a type switch would be more appropriate. However, when an
instance's type is switched, it is impossible to refer to the fields
of any specific type case (since it is not statically known if that
case is actually the one that was chosen).

By using three separate instances, we can refer to the fields of any
of the instances, but have to manually check that the instance in
question is actually present.
@KOLANICH
Contributor

KOLANICH commented Mar 8, 2019

Hi. Thanks for your contribution.

  1. I guess that the different variants should go into ar subdir.

  2. Types that differ only in string lengths can be parametrized using params.

  3. Chunks that are the same across several files can be moved into separate files and imported.

  4. If only a part of a type can be parametrized and reused, it can be moved into its own type, can't it?

  5. Please learn to use git commit --amend and git push --force and squash all the commits into a single one.

  6. The Debian package format (.deb) is also based on the ar format.

    Which one? Could you add deb to extension lists of the relevant formats and remove this note from irrelevant ones?

@dgelessus
Contributor Author

Thank you for the feedback 😃

I guess that different variants should go into ar subdir.

Good idea, will do.

Types that differ only in string lengths can be parametrized using params.

I assume you mean the space-padded number fields (modified_timestamp_dec, user_id_dec, etc.)? I could make a separate type for that, but that would create an extra nesting level, right? (members[0].user_id_dec would become members[0].user_id_dec.text for example.) Or does Kaitai have something like typedefs?

Actually on second thought a separate type for "space-padded number literal" might not be a bad idea, because I could move the string-to-integer conversion into the type as well.
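
To make the idea concrete, here is roughly what such a "space-padded number literal" type would do, written as a small Python function rather than a Kaitai type. The function name and the choice to return `None` for a blank field are assumptions made for this sketch, not something the specs define.

```python
def parse_space_padded_number(raw: bytes, base: int = 10):
    """Convert a fixed-width, space-padded ASCII field to an int.

    Returns None when the field contains only padding.
    """
    text = raw.decode("ascii").rstrip(" ")
    if not text:
        return None
    return int(text, base)


# A 12-byte timestamp field and an 8-byte octal mode field.
timestamp = parse_space_padded_number(b"1552003200  ")
mode = parse_space_padded_number(b"100644  ", base=8)
```

Keeping the conversion inside one helper (or one Kaitai type) means the field width is the only thing that varies between uses, which is what makes it a candidate for a parameter.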

Chunks that are the same across several files can be moved into separate files and imported.

If only a part of a type can be parametrized and reused, it can be moved into its own type, can't it?

Again I'm guessing you mean the metadata part (modification time, user ID, etc.)? I don't know what else could be moved into a shared type/file, because of the subtle differences between the variants.

Please learn to use git commit --amend and git push --force and squash all the commits into a single one.

Yeah, the commit history here is not very nice, I can clean it up a little once I've fixed the other issues. (I don't like squashing away history when I'm still working on the PR - it makes it hard for others to see what the previous comments/reviews were talking about.)

If this repo has it enabled, GitHub also lets you squash when merging the PR. That way a squashed commit is merged into the repo, but the real history is kept in the PR.

The Debian package format (.deb) is also based on the ar format.

Which one? Could you add deb to extension lists of the relevant formats and remove this note from irrelevant ones?

Both variants are supported actually 😃 debs don't require any "long file name" extensions (all file names are 15 bytes or shorter and don't contain any spaces), and trailing slashes are optional in debs. (See the deb(5) man page.)
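
As a rough illustration of why both variants can read a deb, the following Python sketch normalizes a short 16-byte name field so that the space-padded form and the form with a trailing slash give the same result. The function name is hypothetical, and the sketch only covers the short names described above. It does not handle the special or long-name entries of either variant.

```python
def normalize_short_name(name_field: bytes) -> str:
    """Return the member name from a 16-byte name field.

    Strips the space padding and at most one trailing slash, so
    b"debian-binary   " and b"debian-binary/  " compare equal.
    """
    if len(name_field) != 16:
        raise ValueError("name field must be exactly 16 bytes")
    name = name_field.decode("ascii").rstrip(" ")
    if name.endswith("/"):
        name = name[:-1]
    return name
```

Because deb member names contain no spaces and fit in 15 bytes, stripping the padding and the optional slash is unambiguous for them. That would not hold for arbitrary ar archives.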

But you're right, perhaps the text shouldn't be copy-pasted into each variant. Maybe I should change the ar_sysv and ar_bsd descriptions to say "see the ar_common description for general info" and only explain what the variants do differently.

I wanted to try writing a deb spec as well - based on the man page, it looks doable. But if that doesn't work, I'll list the deb extension in the ar specs.

@KOLANICH
Contributor

KOLANICH commented Mar 9, 2019

I could make a separate type for that, but that would create an extra nesting level, right?

Right. There is a proposal to fix it: kaitai-io/kaitai_struct#88

If this repo has it enabled, GitHub also lets you squash when merging the PR.

Thanks for the info.

Again I'm guessing you mean the metadata part (modification time, user ID, etc.)?

Yes, mostly that for now.

I wanted to try writing a deb spec as well - based on the man page, it looks doable. But if that doesn't work, I'll list the deb extension in the ar specs.

I don't understand the purpose of writing a separate spec for deb, if its binary format is identical to ar's.

@dgelessus
Contributor Author

I don't understand the purpose of writing a separate spec for deb, if its binary format is identical to ar's.

The deb format is basically a subset of ar format with very specific requirements on what archive members can/must exist, how they are named, and in which order they are stored. I want to see if those requirements can be specified in a useful way using Kaitai. If the deb spec ends up being too complicated or not useful, I'll leave it be, but I want to give it a try at least.
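
The kind of constraint meant here can be sketched outside of Kaitai as a plain check on the list of member names. This is a simplified reading of deb(5), assuming only that the first three members are `debian-binary`, a `control.tar` member and a `data.tar` member (each tar optionally with a compression suffix). It does not check which suffixes are allowed or any of the finer rules, and the function name is made up for the example.

```python
from typing import List

# Expected leading members, in order (simplified, see the note above).
EXPECTED_PREFIXES = ["debian-binary", "control.tar", "data.tar"]


def check_deb_member_order(names: List[str]) -> List[str]:
    """Return a list of problems found in the member order (empty if none)."""
    problems = []
    if len(names) < len(EXPECTED_PREFIXES):
        problems.append("expected at least 3 members, found %d" % len(names))
    for index, prefix in enumerate(EXPECTED_PREFIXES):
        if index >= len(names):
            break
        name = names[index]
        if index == 0:
            matches = name == prefix
        else:
            # Allow a compression suffix such as ".gz" or ".xz".
            matches = name == prefix or name.startswith(prefix + ".")
        if not matches:
            problems.append(
                "member %d is %r, expected %r" % (index, name, prefix)
            )
    return problems
```

Checks like these depend on the values and order of parsed members rather than on the byte layout, which is why they are hard to express in a spec without some form of assertion.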

@KOLANICH
Contributor

kaitai-io/kaitai_struct#81 may be useful for it when it is implemented.

meta:
  id: member_metadata
  title: Unix ar archive member metadata
  license: CC0-1.0
Contributor

Shouldn't it import space_padded_number too?

Contributor Author

Yup, it should. Why does this even compile? This seems like a bug...

This is clearer, because it makes it obvious that there is always
exactly one active case. By using .as<...>, this can be used in all
ar variants.
The deb format is based on the ar format.
@dgelessus
Contributor Author

Alright, I've tried, and I don't think the deb format can be specified in a useful way with Kaitai's current feature set (I would need at least some kind of simple assert for a few of the constraints). I've added the deb and udeb extensions to the relevant ar specs.

The main advantage is that this makes the specs easier to use - most
users only care about the parsed value, so it should be easily
accessible without an extra .value each time. Those who need the
unparsed text value can access it via the ..._raw fields.

This way the four ar variants now also have the same interface.
For example, a member's data size is now always accessed as
member.size, where previously it was sometimes member.size (ar_bsd) and
sometimes member.size.value (all other specs). This allows writing
generic code that can work with any of the ar specs.

Currently the compiler knows nothing about this common interface, which
means that using the interface generically is only possible from
dynamically-typed languages, such as Python. If something similar to
kaitai-io/kaitai_struct#314 is implemented in the future, the interface
could be properly specified in Kaitai Struct so that the compiler can
understand, enforce and properly expose it.
@dgelessus
Contributor Author

Here is a very basic CLI tool to read ar archives using these specs. I mostly wrote it as a proof-of-concept to see if it's possible to write code that works generically with multiple specs. In this case it worked out very well - the differences between the format variants are almost completely abstracted away by the specs, and the application code only needs very few variant-specific branches.

This sort of generic code is currently only possible with dynamically-typed languages like Python. Similar code in statically-typed languages wouldn't compile, because the classes generated by ksc for each spec don't implement a common base class/interface - this would require support from ksc, see kaitai-io/kaitai_struct#314.
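
To show what "generic" means here without depending on the generated parsers, the sketch below uses two hand-written stand-in classes. They are hypothetical and are not the classes ksc produces. They only mimic the shared attribute names (`name`, `size`, `size_raw`) that the specs expose. The functions work on either class purely through duck typing.

```python
from dataclasses import dataclass


@dataclass
class SysvMemberStandIn:
    """Hypothetical stand-in for a member parsed with the ar_sysv spec."""
    name: str
    size: int
    size_raw: str


@dataclass
class BsdMemberStandIn:
    """Hypothetical stand-in for a member parsed with the ar_bsd spec."""
    name: str
    size: int
    size_raw: str


def total_data_size(members) -> int:
    """Sum member sizes; relies only on each member having a .size attribute."""
    return sum(member.size for member in members)


def format_listing(members) -> list:
    """Produce one 'size name' line per member, independent of the variant."""
    return ["%10d %s" % (member.size, member.name) for member in members]


sysv_members = [SysvMemberStandIn("a.o", 120, "120       ")]
bsd_members = [BsdMemberStandIn("b.o", 64, "64        ")]
combined_size = total_data_size(sysv_members + bsd_members)
```

The two stand-in classes share no base class, so the equivalent code in a statically-typed language would need a common interface declared somewhere, which is the gap described above.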

2 participants