Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New public API #20

Open
ikrivosheev opened this issue Aug 8, 2022 · 2 comments
Open

New public API #20

ikrivosheev opened this issue Aug 8, 2022 · 2 comments

Comments

@ikrivosheev
Copy link

@tim-weis hello! Thank you for the great library! I want to explain my use case of this library.

I write tool set for analyze malware files. I parse doc (xls, ppt, ...) file using: https://github.com/mdsteele/rust-cfb. Then I need to get macros from file. I try to using your library. but I need new public API. I don't want to parse the same file twice and I cannot load file into memory because I work on large file stream.
In doc file, macros are not at the root of the cfb container.

I create PR: #19, where I add new API:

  1. Change Vec<u8> to fn open_project<R: Read + Seek>(reader: R) - this is needed for CompoundFile.
  2. Add fn open_project_with_path<P: AsRef<Path>, R: Read + Seek>(root: P, reader: R) - find VBA project along the path different from the root one
  3. fn open_project_with_path_container<P: AsRef<Path>, R: Read + Seek>(root: P, mut container: CompoundFile<R>) - like previous method, but I already parsed cfb container
@tim-weis
Copy link
Owner

Quick update here, and apologies for the delay, Ivan.

First up, thanks for taking the time to explain the issue. If I understand, there are actually two issues here:

  1. Accessing a VBA project that's located in a stream other than the default / stream.
    The current implementation doesn't allow clients to specify a non-default location here. Bullet point 2. in the issue report seems to address this, providing a way to optionally pass this information down. That looks reasonable.
  2. Allow the library to operate on a client-controlled CFB.
    That's a valid use case, too. What's causing me grief is that this necessitates moving a foreign type into the public API, something I'd like to avoid if at all possible.

While I do wish to address both issues, I've been literally losing a bit of sleep over how to specifically deal with the latter. I'd been tossing around a few ideas, such as:

  • Taking a step back and exposing the decoders rather than the surrounding utility functionality.
    That'd put control over (and responsibility for) the storage container back to clients, basing the entire API off of Read + Seek. Utility support could be added back in, possibly under a feature gate, but I haven't actually found time to try how that would work out.
  • Hide the concrete CFB access implementation behind an Adapter trait, and use that in the public API instead.
    The idea here is to keep the API generic, and not over-constrain it to a particular CFB implementation. Clients would have to provide an implementation that encapsulates the access details, possibly with the library supplying a "standard" implementation based off of the cfb crate. Again, I haven't had time to play with this idea to find out whether it is suitable.

Either way, a breaking API change is imminent. If you can keep working off of your local fork that'd buy me some time to think this through. If you'd rather have me publish an interim version to crates.io, I can do that, too. Though the expectation should be that a loud, possibly disruptive API break is upcoming.

@ikrivosheev
Copy link
Author

@tim-weis hello! Thank you for response.

I can wait, at the moment I copy some piece of code into my project.
I think is better to develop new API for the library. I can close my PR and create a new one with a new API with whom I would be comfortable to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants