how to read images from the Compound File? #13

jilieryuyi · 2023-06-13T08:46:29Z

How can I read images from a Compound File?

richardlehane · 2023-06-13T10:17:01Z

Here's an example tool I wrote that uses this library: https://github.com/richardlehane/comdump
This will dump out the contents of a compound file to disk and is perhaps what you want?

jilieryuyi · 2023-06-14T08:55:13Z

Thank you, I'll give it a try

jilieryuyi · 2023-06-14T10:12:13Z

Unfortunately, the image cannot be exported correctly

richardlehane · 2023-06-14T10:28:10Z

The Microsoft Compound File Binary File Format (MS-CFB) is a container format with a file-system-like structure. All this library does is let you traverse that structure and access the contents of the files contained within. That is, it will export the content of every contained file correctly, but it won't interpret those files for you. Interpretation comes down to what type of file you are dealing with (many different applications have used MS-CFB as a container format, e.g. the old MS Office family). If you can provide more details about the types of files you are trying to access, or provide a sample, I may be able to help further.

jilieryuyi · 2023-06-15T01:25:11Z

Attached sample: 1.zip

richardlehane · 2023-06-15T09:27:52Z

So it looks like the file you are working with is an MS Word document. I used comdump to unpack the document and saw these streams (screenshot not preserved):

It does appear there is an image inside the "Data" stream (I've highlighted the start of a JPG header in that stream), but simply cutting the first 478 bytes off the Data file didn't produce a working JPG file.

My library just interprets the underlying container format (MS-CFB); it isn't capable of parsing the structures within the Word streams (or those of any other file format based on MS-CFB). To write a program that parses these structures, you could refer to this MS documentation: https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-doc/ccd7b486-7881-484c-a137-51170af7cc22 Or you could use a library that does parse Word, like Aspose: https://products.aspose.app/words/parser

If it is just a one-off and you have access to MS Office, you could also follow this guide to extracting embedded images, which is to save the file as an HTML page: https://support.microsoft.com/en-au/topic/wd-how-to-extract-embedded-images-from-a-word-document-f478bf7f-3bba-6afb-6ddc-3eeb284af36b
