Undetected PDF that can still be opened in pdf reader. #25

simmisj · 2019-12-12T14:22:14Z

Hi.
Recently I received a PDF document that was not corrupt and could be opened in a PDF reader, but Mime-Detective did not detect it as a PDF.
The PDF standard says that a PDF document should start with the magic number followed by a version number (see 'Technical overview - File structure' at https://en.wikipedia.org/wiki/PDF). The document I received instead started with a newline and the characters òÀ�, followed by the magic number and version number. You can replicate this by taking any working PDF document and adding those characters to the beginning of the file in a text editor. Setting the PDF type's offset to 4 makes Mime-Detective detect the file as a PDF, since it then skips the added gibberish.
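To make the reproduction concrete, here is a rough Python sketch (not Mime-Detective code, which is C#). The junk prefix is a placeholder: the exact bytes in my file are an assumption here, and only the prefix length of 4 matters. `has_magic_at` is a hypothetical helper that mimics a signature check at one fixed offset.

```python
PDF_MAGIC = b"%PDF-"


def has_magic_at(data: bytes, offset: int) -> bool:
    """Return True if the PDF magic number starts exactly at `offset`."""
    return data[offset:offset + len(PDF_MAGIC)] == PDF_MAGIC


# Minimal stand-in for a real PDF; only the header matters for this check.
valid_pdf = b"%PDF-1.4\n%%EOF\n"

# Assumed junk prefix: a newline plus three arbitrary non-ASCII bytes.
junk_prefix = b"\n\xf2\xc0\xff"
prefixed_pdf = junk_prefix + valid_pdf

print(has_magic_at(valid_pdf, 0))     # header at the very start of the file
print(has_magic_at(prefixed_pdf, 0))  # strict check on the prefixed file
print(has_magic_at(prefixed_pdf, 4))  # same check after skipping the prefix
```

The strict check at offset 0 fails on the prefixed data, while the same check at offset 4 succeeds, which matches what I saw when changing the offset.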
The question is: since PDF readers can safely open such documents, shouldn't Mime-Detective detect them as valid PDF documents?
The problem seems to be in the GetFileMatchingCount method in the MimeTypes class. It expects the header to be the first thing in the file and breaks out immediately when it is not.
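One possible direction, sketched in Python rather than in the library's C#, is to search for the magic number inside a small leading window instead of requiring it at offset 0. The function name `find_pdf_header` and the 1024-byte window are assumptions for illustration; the window size follows the leniency Adobe documents for its own viewers, and nothing here reflects how GetFileMatchingCount is actually written.

```python
PDF_MAGIC = b"%PDF-"
SEARCH_WINDOW = 1024  # assumed limit on how far into the file to look


def find_pdf_header(data: bytes, window: int = SEARCH_WINDOW) -> int:
    """Return the offset of the first PDF magic number that lies entirely
    within the first `window` bytes of `data`, or -1 if there is none."""
    return data.find(PDF_MAGIC, 0, window)


def looks_like_pdf(data: bytes) -> bool:
    """Tolerant check: accept leading junk before the header."""
    return find_pdf_header(data) != -1
```

The trade-off is that a tolerant scan is more likely to produce false positives than a strict offset-0 match, for example on a text file that merely mentions `%PDF-` near the top, so it might be better offered as an opt-in behaviour.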
Cheers!
