Version 2.2.0, 2022-06-13

@MartinThoma MartinThoma released this 13 Jun 19:53
2.2.0
f0cd829

What's Changed

The 2.2.0 release improves text extraction (#969 - again by @pubpub-zz 🙏):

  • Improvements around /Encoding / /ToUnicode
  • Extraction of CMaps improved
  • Fallback when the font definition is missing
  • Support for /Identity-H and /Identity-V: utf-16-be
  • Support for /GB-EUC-H / /GB-EUC-V / /GBpc-EUC-H / /GBpc-EUC-V (beta release for evaluation)
  • Support for Arabic (for evaluation)
  • Whitespace extraction improvements

These changes should mainly improve text extraction for non-ASCII scripts,
e.g. Russian / Chinese / Japanese / Korean / Arabic.
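
To give a rough idea of what the encoding names in the list mean at the byte level, here is a minimal sketch using only the Python standard library. It is not PyPDF2's internal code. The helper names `decode_identity` and `decode_gb_euc` are hypothetical, and the sketch assumes that /Identity-H text is read as UTF-16 big-endian code units (as the list item suggests) and that the /GB-EUC-* encodings correspond to Python's `gb2312` codec.

```python
def decode_identity(raw: bytes) -> str:
    """Interpret 2-byte codes as UTF-16 big-endian (illustrative assumption)."""
    return raw.decode("utf-16-be", errors="replace")


def decode_gb_euc(raw: bytes) -> str:
    """Interpret bytes as EUC-encoded GB 2312 (illustrative assumption)."""
    return raw.decode("gb2312", errors="replace")


# Sample byte strings built by encoding known text, so the round trip is visible.
russian_bytes = "Привет".encode("utf-16-be")
chinese_bytes = "中文".encode("gb2312")

print(decode_identity(russian_bytes))
print(decode_gb_euc(chinese_bytes))
```

In a real PDF the bytes come from the page's content stream, and a /ToUnicode CMap, when present, generally determines the mapping, so this only illustrates the fallback idea.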

Full Changelog: 2.1.1...2.2.0