Outlook Message Parser is a small open source Java library that parses Outlook .msg files.
<dependency>
<groupId>org.simplejavamail</groupId>
<artifactId>outlook-message-parser</artifactId>
<version>1.7.9</version>
</dependency>
Outlook Message Parser is a continuation (or fork if that project independently continues) of msgparser.
Under the hood it uses the Apache POI - POIFS library to parse the message files which use the OLE 2 Compound Document format. Thus, it is merely a convenience library that covers the details of the .msg file. The implementation is based on the information provided at fileformat.info.
v1.7.9 (10-October-2020)
v1.7.8 (4-August-2020)
- #35 Clarify permission to publish project using Apache v2 license
v1.7.0 - v1.7.7 (9-January-2020 - 17-July-2020)
- v1.7.7 - #34 Wrong encoding for bodyHTML
- v1.7.5 - #31 Bugfix for attachments with special characters in the name
- v1.7.4 - #27 Same as 1.7.3, but now also for chinese senders
- v1.7.3 - #27 When from name/address are not available (unsent emails), these fields are filled with binary garbage
- v1.7.2 - #26 To email address is not handled properly when name is omitted
- v1.7.1 - #25 NPE on ClientSubmitTime when original message has not been sent yet
- v1.7.1 - #23 Bug: _nameid directory should not be parsed (and causing invalid HTML body)
- v1.7.0 - #18 Upgrade Apache POI 3.9 -> 4.x
Note: Apache POI requires minimum Java 8
v1.6.0 (8-January-2020)
- #21 Multiple TO recipients are not handles properly
v1.5.0 (18-December-2019)
- #20 CC and BCC recipients are not parsed properly
- #19 Use real Outlook ContentId Attribute to resolve CID Attachments
v1.4.1 (22-October-2019)
- #17 Fixed encoding error for UTF-8's Windows legacy name (cp)65001
v1.4.0 (13-October-2019)
- #9 Replaced the RFC to HTML converter with a brand new RFC-compliant convert! (thanks to @fadeyev!)
v1.3.0 (4-October-2019)
- #14 Dependency problem with Java9+, missing Jakarta Activation Framework
- #13 HTML start tags with extra space not handled correctly
- #11 SimpleRTF2HTMLConverter inserts too many
tags - #10 Embedded images with DOS-like names are classified as attachments
- #9 SimpleRTF2HTMLConverter removes some valid tags during conversion
v1.2.1 (12-May-2019)
- Ignore non S/MIME related content types when extracting S/MIME metadata
- Added toString and equals methods to the S/MIME data classes
v1.1.21 (4-May-2019)
- Upgraded mediatype recognition based on file extension for incomplete attachments
- Added / improved support for public S/MIME meta data
v1.1.20 (14-April-2019)
- #7 Fix missing S/MIME header details that are needed to determine the type of S/MIME application
v1.1.19 (10-April-2019)
- Log rtf compression error, but otherwise ignore it and keep going and extract what we can.
v1.1.18 (5-April-2019)
- #6 Missing mimeTag for attachments should be guessed based on file extension
v1.1.17 (19-August-2018)
- #3 implemented robust support for character sets / code pages in RTF to HTML conversion (fixes chinese support #3)
- fixed bug where too much text was cleaned up as part of superfluous RTF cleanup step when converting to HTML
- Performance boost in the RTF -> HTML converter
v1.1.16 (~28-Februari-2017)
- First Maven deployment, continuing version number from 1.1.15 of msgparser (https://github.com/bbottema/msgparser)
v1.16
- Added support for replyTo name and address
- cleaned up code (1st wave)