Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WAV and add RIFF #229

Merged
merged 14 commits into from
May 30, 2020
Merged

Fix WAV and add RIFF #229

merged 14 commits into from
May 30, 2020

Conversation

generalmimon
Copy link
Member

Wave sample files: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Samples.html
WAV with bext chunk: http://www.noiseofnorway.net/wp-content/uploads/2011/11/170139-Printer-Scanner-close-scanner-lid-Schoeps.wav

RIFF description can be tested with any of its subtypes: WAV, AVI, RMID, DLS, SF2, ...

common/riff.ksy Outdated Show resolved Hide resolved
@KOLANICH
Copy link
Contributor

KOLANICH commented Sep 22, 2019

Thank you for the contribution. It is a long-lasting tech debt to unify all riff-based formats under a riff.ksy spec, but it requires multiple features lacking from KSC currently: kaitai-io/kaitai_struct#81, kaitai-io/kaitai_struct#458, and maybe kaitai-io/kaitai_struct#135 and kaitai-io/kaitai_struct#196.

You may also want to add https://www.johnloomis.org/cpe102/asgn/asgn1/riff.html into doc-ref.

and Fixes: #230 into commit message. And please, don't use master branch for pull requests. Create an own branch for each PR.

common/riff.ksy Outdated
- id: form_type
type: str
size: 4
pad-right: 0x20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that we need this pad-right. pad-right is the contract that all the strings have padding stripped here. But the strings are 4 bytes, 4 bytes is 1 dword, one dword is a unit for reading memory on 32-bit archs and is faster than reading 3 bytes. The strings are fixed length. There is no chance that there will be arbitrary long padding. So IMHO pad-right here doesn't make any sense and would have prevened KSC from optimizing the code if such optimizations ever were introduced.

There are two types of RIFF structs: with asserts (chunk and
parent_chunk_data) and without asserts (chunk_generic and
parent_chunk_data_generic). Those without asserts won't be
needed when kaitai-io/kaitai_struct#435 (comment) is resolved.
@generalmimon
Copy link
Member Author

@KOLANICH
Could you please take a look at the new approach I take? I've already explained it on #230 (comment), just split generic RIFF types to those with assertions and without assertions (sometimes you want to force chunk to be of specific chunk ID, sometimes not).

I think it's now quite readable and less error-prone. Silly mistakes like this one (the marked line also causes kaitai-io/kaitai_struct#604) should no longer happen if all RIFF-based formats use the common classes as containers for specific data. I would continue with the other RIFF formats if nobody objects.

The riff.ksy format can be used for exploring chunk structure of any RIFF-compliant binary file. It is generic, it only implements INFO list chunk for which RIFF specification says the following: "The ‘INFO’ list is a registered global chunk that can be used within any RIFF file." I don't expect that someone would use it in production, but it is quite handy for development.

@KOLANICH
Copy link
Contributor

KOLANICH commented Nov 26, 2019

I wonder if we really need to assert chunk ids instead of just switching on them everywhere? I mean just read a chunk id and interpret the blob based on the read chunk id only, without any checks wheter this chunk type is allowed to be here.

@generalmimon
Copy link
Member Author

You are right, thank you the suggestion! I haven't realized that the assertions are not vital. Is it now OK?

@generalmimon
Copy link
Member Author

@KOLANICH, can this PR be already merged?

@KOLANICH
Copy link
Contributor

KOLANICH commented Dec 22, 2019

I'm sorry, but I am not the one who merges PRs into this repo. @GreyCat is the one who decides and merges.

@generalmimon
Copy link
Member Author

Actually, I can merge it myself, because @GreyCat trusts me that I won't do silly things 😏, but I hesitate to do so when I'm not 100% sure that it is OK. I wonder why you don't want to enter the Kaitai team, you belong there more than everyone else. I understand that you might not want to feel the pressure to somehow deserve the membership (e.g. at least one contribution per week 😂), but I think everyone understands that you're busy person and you can participate only once in a while.

@KOLANICH
Copy link
Contributor

KOLANICH commented Dec 22, 2019

I wonder why you don't want to enter the Kaitai team, you belong there more than everyone else.

  • I cannot be sure in cybersecurity of my devices. In fact I am sure in their insecurity. I don't want other's repos being compromised because of this.
  • GreyCat doesn't trust me because I am a nonconformist and don't follow the style guide I personally consider defective :) It is probably the reason why my PRs are ignored for such a long time. It is completely OK ;)

at least one contribution per week

AFAIK there are no such requirements.

@generalmimon
Copy link
Member Author

I cannot be sure in cybersecurity of my devices. (...) I don't want other's repos being compromised because of this.

Nothing is ever secure, but I can't imagine the degree of boredom of a hacker that would bother with screwing up an open-source repo like this, but thanks for the info 🙂

@KOLANICH
Copy link
Contributor

KOLANICH commented Dec 22, 2019

open-source repo like this

access to an org gives access to all the repos. this repo is probably safe, ksys have no io, but if an adversary have managed to backdoor the KSC repo or builds, it would be devastating.

@generalmimon generalmimon linked an issue Apr 7, 2020 that may be closed by this pull request
@generalmimon
Copy link
Member Author

@GreyCat, can it be finally merged? It's blocking my master branch (which is of course my fault, but it quite sucks), every time I want to contribute something I must be careful because the branch master isn't master. And I actually don't see a place for improvement here.

common/riff/chunk.ksy Outdated Show resolved Hide resolved
common/riff/chunk.ksy Outdated Show resolved Hide resolved
@GreyCat
Copy link
Member

GreyCat commented Apr 13, 2020

Apologies for not reviewing this earlier. I totally agree on the motivation and approach, but there are two things that I've flagged:

  • Naming/file placement - that's minor and likely easy to fix. Actually, nowadays we might even just keep these pieces as subtypes inside riff.ksy and address them using something like type: 'riff::chunk'
  • Using strings to work with RIFF chunk identifiers. That's probably not too major too - we can have it declared as an u4, as then in every specific format we can convert it to .wav-specific, .avi-specific, etc, map of enum values.

What do you think?

@generalmimon
Copy link
Member Author

generalmimon commented Apr 13, 2020

@GreyCat

Actually, nowadays we might even just keep these pieces as subtypes inside riff.ksy and address them using something like type: 'riff::chunk'

At first, I meant the riff spec to be primarily for debugging and reverse-engineering (and it also can be used for extracting metadata from the INFO chunk), but yeah, I agree with this approach. It makes perfect sense to include the chunk and parent_chunk_data as subtypes, we get rid of two import-only files which are not meaningful on their own. And it's better to write riff::parent_chunk_data than riff_parent_chunk_data.

  • Using strings to work with RIFF chunk identifiers. That's probably not too major too - we can have it declared as an u4, as then in every specific format we can convert it to .wav-specific, .avi-specific, etc, map of enum values.

At time of writing the specs, I was experimenting with the enum implementation as well. The problem was it didn't come to my mind that I can use enum key in value instances as well (the docs don't mention it at all), so I thought that the only use enums is to introduce an enum on the riff KSY level with all possible RIFF chunk IDs that can ever occur, which just isn't possible. So I chose the only viable approach besides enums with respect to people (people typically don't know that number 0x45564157 means "WAVE" in little-endian), and that are plain text strings.

But yeah, you're right, string conversion and comparison is inefficent compared to a primitive integer read, and the inconvenience of working with plain numbers can be gracefully solved with converting the integer to format-specific (wav, avi, ...) chunk ID dictionary enum just using a value instance with the enum key.

I'll go ahead and try to incorporate the suggestions.

@generalmimon generalmimon requested a review from GreyCat May 12, 2020 19:52
@GreyCat
Copy link
Member

GreyCat commented May 30, 2020

All the latest changes look great, thanks!

However, there are currently 2 conflicting fragment. Can I ask you to take a look and resolve these conflicts, so we can merge this one?

@generalmimon
Copy link
Member Author

generalmimon commented May 30, 2020

@GreyCat

However, there are currently 2 conflicting fragment. Can I ask you to take a look and resolve these conflicts, so we can merge this one?

Yeah, I no longer hoped that this PR can be merged in a reasonable amount of time, so I've recently fixed the WAV format manually so that it at least works (it was broken due to kaitai-io/kaitai_struct#604 and merging this PR would also fix it).

Thanks for the review, please merge it if all looks good to you 🙂

@GreyCat GreyCat merged commit d353072 into kaitai-io:master May 30, 2020
@GreyCat
Copy link
Member

GreyCat commented May 30, 2020

Thanks for this great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource Interchange File Format
3 participants