How to read images from the Compound File? #13
Comments
Here's an example tool I wrote that uses this library: https://github.com/richardlehane/comdump
Thank you, I'll give it a try.
Unfortunately, the image cannot be exported correctly.
The Microsoft Compound File Binary (MS-CFB) file format is a container format with a file-system-like structure. All this library does is let you traverse that structure and access the contents of the files contained within. That is, it will export the content of all contained files correctly, but it won't interpret those files for you. How to interpret them comes down to what type of file you are dealing with: lots of different applications have used MS-CFB as a container format (e.g. the old MS Office family). If you can provide more details about the types of files you are trying to access, or provide a sample, I may be able to help further.
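Since the container and its contents are separate concerns, a useful first step is confirming that a file is a compound file at all. The sketch below is a minimal, standard-library-only illustration and not part of this library: it checks a buffer for the 8-byte signature that opens every compound file. The names `cfbSignature` and `isCompoundFile` are hypothetical, chosen for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// cfbSignature is the 8-byte header signature at offset 0 of a compound file.
var cfbSignature = []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}

// isCompoundFile (hypothetical helper) reports whether header starts with the
// compound file signature. It says nothing about what the container holds.
func isCompoundFile(header []byte) bool {
	return bytes.HasPrefix(header, cfbSignature)
}

func main() {
	// Synthetic headers for illustration; in practice, read the first
	// 8 bytes of the file being examined.
	cfb := []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00, 0x00}
	fmt.Println(isCompoundFile(cfb))
	fmt.Println(isCompoundFile([]byte("%PDF-1.7")))
}
```

A positive match only identifies the container. Whether the streams inside belong to Word, Excel, or another application still has to be worked out separately.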
So it looks like the file you are working with is an MS Word document. I used comdump to unpack the document and listed its streams. There does appear to be an image inside the "Data" stream (I've highlighted the start of a JPG header in that stream), but simply cutting the first 478 bytes off the Data file didn't produce a working JPG file. My library just interprets the underlying container format (MS-CFB); it isn't capable of parsing the structures within the Word streams (or those of any other file format based on MS-CFB). To write a program that parses these structures, you could refer to this MS documentation: https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-doc/ccd7b486-7881-484c-a137-51170af7cc22 Or you could use a library that does parse Word, like Aspose: https://products.aspose.app/words/parser If it is just a one-off and you have access to MS Office, you could also follow this guide to extracting embedded images, which is to save the file as an HTML page: https://support.microsoft.com/en-au/topic/wd-how-to-extract-embedded-images-from-a-word-document-f478bf7f-3bba-6afb-6ddc-3eeb284af36b
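For anyone who wants to experiment before reaching for the MS-DOC specification, here is a rough sketch of a slightly more careful carve than cutting a fixed number of bytes. It assumes the stream has already been exported to a byte slice (for example with comdump), searches for the JPEG start-of-image marker, and trims everything after the last end-of-image marker. This is a heuristic and not how Word stores pictures: it does not parse the surrounding structures, and it may well fail on the file in this issue. `carveJPEG` is a hypothetical name, and the input in `main` is synthetic.

```go
package main

import (
	"bytes"
	"fmt"
)

// jpegSOI is the JPEG start-of-image marker followed by the 0xFF that
// begins the next marker segment.
var jpegSOI = []byte{0xFF, 0xD8, 0xFF}

// jpegEOI is the JPEG end-of-image marker.
var jpegEOI = []byte{0xFF, 0xD9}

// carveJPEG (hypothetical helper) returns the bytes from the first SOI marker
// up to and including the last EOI marker that follows it. It reports false
// if either marker is missing. The last EOI is used because embedded
// thumbnails can contain earlier EOI markers; if the stream holds several
// images, the result will span all of them.
func carveJPEG(stream []byte) ([]byte, bool) {
	start := bytes.Index(stream, jpegSOI)
	if start < 0 {
		return nil, false
	}
	end := bytes.LastIndex(stream[start:], jpegEOI)
	if end < 0 {
		return nil, false
	}
	return stream[start : start+end+len(jpegEOI)], true
}

func main() {
	// Synthetic stand-in for an exported stream: 4 leading bytes, a tiny
	// fake JPEG body, then 2 trailing bytes.
	stream := []byte{0x01, 0x02, 0x03, 0x04, 0xFF, 0xD8, 0xFF, 0xE0, 0x10, 0xFF, 0xD9, 0x00, 0x00}
	img, ok := carveJPEG(stream)
	fmt.Println(len(img), ok)
}
```

If the carved bytes still don't open as an image, that is a sign the picture data needs to be located through the Word structures themselves, as described in the MS documentation linked above.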