Version 2.1.0, 2022-06-06 #955
MartinThoma announced in Announcements
-
Making a bit of advertisement that PyPDF2 is now better than ever: https://www.linkedin.com/posts/martin-thoma_github-py-pdfbenchmarks-benchmarking-activity-6939589461190062080-TBwr and https://twitter.com/_martinthoma/status/1533823440758661121 😄 @pubpub-zz pushed PyPDF2 from 86% to 96% in my benchmark - according to the benchmark, we are now better than pdftotext and pdfminer.six in text extraction 😄
-
What's Changed
The highlight of the 2.1.0 release is the biggest improvement to the
text extraction capabilities of PyPDF2 since 2016 🥳🎊 A very big thank you goes
to pubpub-zz, who invested a lot of time and
knowledge about the PDF format to finally get those improvements into PyPDF2.
Thank you 🤗💚
In case the new function causes any issues, you can use _extract_text_old
for the old functionality. Please also open a bug ticket in that case.
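A minimal sketch of how such a fallback could look. It assumes that _extract_text_old is a method on the page object, next to extract_text (the note above only names the function), and the helper name extract_page_text is made up for illustration:

```python
def extract_page_text(page, use_old=False):
    """Return the text of a page-like object.

    `page` is assumed to behave like a PyPDF2 page: it offers
    `extract_text()` and, in 2.1.0, possibly `_extract_text_old()`.
    """
    if use_old:
        # Assumption: the old implementation lives on the page object.
        # If it is missing, fall through to the new extraction.
        old = getattr(page, "_extract_text_old", None)
        if callable(old):
            return old()
    return page.extract_text()


# Possible usage (file name is hypothetical):
#
#   from PyPDF2 import PdfReader
#   reader = PdfReader("example.pdf")
#   text = extract_page_text(reader.pages[0], use_old=True)
```

Since the name starts with an underscore, it is probably best treated as a temporary escape hatch rather than a stable API.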
Several people have attempted to bring similar improvements to
PyPDF2. All of those were valuable. The main reason why they didn't get merged
is the large number of open PRs / issues. The PR by pubpub-zz was the most
comprehensive one and also incorporated the latest changes of PyPDF2 2.0.0.
Thank you to VictorCarlquist for #858 and
asabramo for #464 🤗
New Features (ENH)
Bug Fixes (BUG)
Robustness (ROB)
Documentation (DOC)
Developer Experience (DEV)
Testing (TST)
Code Style (STY)
Full Changelog: 2.0.0...2.1.0
New Contributors
This discussion was created from the release Version 2.1.0, 2022-06-06.