Releases: py-pdf/pypdf
Version 5.1.0, 2024-10-27
What's new
New Features (ENH)
- Add layout_mode_font_height_weight argument to PageObject.extract_text() (#2920) by @hpierre001
Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei
Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846
Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1
Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz
Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma
Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma
Code Style (STY)
Version 5.0.1, 2024-09-29
New Features (ENH)
- Add full parameter to PdfWriter constructor (#2865)
Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)
Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)
Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)
Maintenance (MAINT)
Version 5.0.0, 2024-09-17
This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).
Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)
New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)
Bug Fixes (BUG)
- Fix sheared image (#2801)
Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)
Developer Experience (DEV)
Version 4.3.1, 2024-07-21
Version 4.3.0, 2024-07-14
What's new
New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- Add context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow to pass input file without named argument (#2576) by @pubpub-zz
Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju
Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1
Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz
Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1
Code Style (STY)
Version 4.2.0, 2024-04-07
What's new
New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (#2585) by @pubpub-zz
- Add support for /Kids in page labels (#2562) by @stefan6419846
- Allow to update fields on many pages (#2571) by @pubpub-zz
- Tolerate PDF with invalid xref pointed objects (#2335) by @pubpub-zz
- Add Enforce from PDF2.0 in viewer_preferences (#2511) by @pubpub-zz
- Add += and -= operators to ArrayObject (#2510) by @pubpub-zz
Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (#2588) by @rfotino
- Fix fields update where annotations are kids of field (#2570) by @pubpub-zz
- Process CMYK images without a filter correctly (#2557) by @pubpub-zz
- Extract text in layout mode without finding resources (#2555) by @pubpub-zz
- Prevent recursive loop in some PDF files (#2505) by @pubpub-zz
Robustness (ROB)
- Tolerate "truncated" xref (#2580) by @pubpub-zz
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334) by @pubpub-zz
- Rebuild xref table if one entry is invalid (#2528) by @pubpub-zz
- Robustify stream extraction (#2526) by @pubpub-zz
Documentation (DOC)
- Update release process for latest changes (#2564) by @stefan6419846
- Encryption/decryption: Clone document instead of copying all pages (#2546) by @redfast00
- Minor improvements (#2542) by @j-t-1
- Update annotation list (#2534) by @j-t-1
- Update references and formatting (#2529) by @j-t-1
- Correct threads reference, plus minor changes (#2521) by @j-t-1
- Minor readability increases (#2515) by @j-t-1
- Simplify PaperSize examples (#2504) by @j-t-1
- Minor improvements (#2501) by @j-t-1
Developer Experience (DEV)
- Remove unused dependencies (#2572) by @stefan6419846
- Remove page labels PR link from message (#2561) by @stefan6419846
- Fix changelog generator regarding whitespace and handling of "Other" group (#2492) by @stefan6419846
- Add REL to known PR prefixes (#2554) by @stefan6419846
- Release using the REL commit instead of git tag (#2500) by @MartinThoma
- Unify code between PdfReader and PdfWriter (#2497) by @pubpub-zz
- Bump softprops/action-gh-release from 1 to 2 (#2514) by @dependabot[bot]
Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (#2550) by @pubpub-zz
- Fix typos found by codespell (#2549) by @stefan6419846
- Update Read the Docs configuration (#2538) by @j-t-1
- Add root_object, _info and _ID to PdfReader (#2495) by @pubpub-zz
Testing (TST)
- Allow loading truncated images if required (#2586) by @stefan6419846
- Fix download issues from #2562 (#2578) by @pubpub-zz
- Improve test_get_contents_from_nullobject to show real use-case (#2524) by @stefan6419846
- Add missing test annotations (#2507) by @stefan6419846
Version 4.1.0, 2024-03-03
What's new
Generating name objects (NameObject) without a leading slash is considered deprecated now. Previously, just a plain warning would be logged, leading to possibly invalid PDF files. According to our deprecation policy, this will log a DeprecationWarning for now.
New Features (ENH)
- Add get_pages_from_field (#2494) by @pubpub-zz
- Add reattach_fields function (#2480) by @pubpub-zz
- Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz
Bug Fixes (BUG)
- missing error on name without leading / (#2387) by @Rak424
- encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon
- BI in text content identified as image tag (#2459) by @pubpub-zz
Robustness (ROB)
- Missing basefont entry in type 3 font (#2469) by @pubpub-zz
Documentation (DOC)
Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462) by @stefan6419846
Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz
- Remove user assignment for feature requests (#2483) by @stefan6419846
- Remove reference to old 2.0.0 branch (#2482) by @stefan6419846
Testing (TST)
- Fix benchmark failures (#2481) by @stefan6419846
- Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon
Version 4.0.2, 2024-02-18
What's new
Bug Fixes (BUG)
- Use NumberObject for /Border elements of annotations (#2451) by @rsinger417
Documentation (DOC)
- Document easier way to update metadata (#2454) by @stefan6419846
- Typo Polyline → PolyLine in adding-pdf-annotations.md (#2426) by @CWKSC
Developer Experience (DEV)
- Bump codecov/codecov-action from 3 to 4 (#2430) by @dependabot[bot]
Testing (TST)
- Avoid catching not emitted warnings (#2429) by @stefan6419846
Version 4.0.1, 2024-01-28
What's new
Bug Fixes (BUG)
Testing (TST)
- Skip tests using fpdf2 if it's not installed (#2419) by @MartinThoma
Version 4.0.0, 2024-01-19
What's new
pypdf==4.0.0 is a big milestone forward:
- We finally have a layout-mode text extraction. This enables users who want to detect / extract tables with heuristics to give it a try.
- We deprecated a lot of the old PyPDF2 API that was either not following PEP8 naming styles or was not using a property. Users coming from PyPDF2 might want to switch to pypdf<4.0.0 first to get helpful error messages that show the new API in their specific cases.
A big 'Thank you!' to the whole pypdf community for your work. Thanks to you, pypdf is better than ever.
Kudos to @shartzog who added the layout-mode with his first contribution!
Deprecations (DEP)
- Drop Python 3.6 support (#2369) by @MartinThoma
- Remove deprecated code (#2367) by @MartinThoma
- Remove deprecated XMP properties (#2386) by @stefan6419846
New Features (ENH)
- Add "layout" mode for text extraction (#2388) by @shartzog
- Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
- Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846
Bug Fixes (BUG)
- PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
- Add support for GBK2K cmaps (#2385) by @stefan6419846
Documentation (DOC)
- Add pmiller66 for #2406 as a contributor by @MartinThoma
- Add missing expand parameter (#2393) by @Atomnp
- Resolve build warnings (#2380) by @stefan6419846
- Fix testing prerequisites (#2381) by @stefan6419846
- Improve formatting of contributors page (#2383) by @stefan6419846
- Add Tobeabellwether as a contributor for #2341 by @MartinThoma
Developer Experience (DEV)
- Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
- Fail on Sphinx issues (#2405) by @stefan6419846
- Move title check to own workflow (#2384) by @MasterOdin
- Write to temporary files instead of the working directory (#2379) by @stefan6419846
- Ensure that the PR titles have the correct format (#2378) by @stefan6419846
Maintenance (MAINT)
- Return None instead of -1 when page is not attached (#2376) by @MartinThoma
- Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
- Replace warning with logging.error (#2377) by @MartinThoma
Testing (TST)
- Add missing pytest.mark.samples annotations (#2412) by @kitterma
- Correctly close temporary files (#2396) by @stefan6419846
- Fix side effect #2379 (#2395) by @pubpub-zz
- Add test for layout extraction mode (#2390) by @MartinThoma
Code Style (STY)
- Use the UserAccessPermissions enum (#2398) by @MartinThoma
- Run black (#2370) by @MartinThoma