Releases: openzim/python-scraperlib

4.0.0

05 Aug 09:33
6489f2a

Added

  • Add utility function to compute ZIM Tags #164, including deduplication #156
  • Metadata now automatically drops control characters #159
  • New indexing.IndexData class to hold title, content and keywords to pass to libzim to index an item
  • Automatically index PDF documents content #167
  • Automatically set proper title on PDF documents #168
  • Add optimization.get_optimization_method to get the proper optimization method to call for a given image format
  • New creator.Creator.convert_and_check_metadata to convert metadata to bytes or str for known use cases and check proper type is passed to libzim
  • Add conversion.convert_svg2png image conversion function + support for SVG in probing.format_for #113
  • Add i18n.Lang class used as typed result of i18n operations #151
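
To illustrate the tag deduplication mentioned in the list above, here is a minimal sketch in plain Python. The helper name `dedupe_tags` is hypothetical and is not the library's API. Stripping whitespace, dropping empty entries, and joining with a semicolon are assumptions made for the example.

```python
def dedupe_tags(tags: list[str]) -> list[str]:
    """Return tags stripped of surrounding whitespace, without empties or repeats.

    Hypothetical helper: the first occurrence of each tag wins, order is kept.
    """
    seen: set[str] = set()
    result: list[str] = []
    for tag in tags:
        cleaned = tag.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Join with ";" (assumed separator for the example)
print(";".join(dedupe_tags(["wikipedia", " wikipedia", "_category:other", ""])))
# prints: wikipedia;_category:other
```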

Changed

  • BREAKING Renamed zimscraperlib.image.convertion to zimscraperlib.image.conversion to fix typo
  • BREAKING Many changes in type hints to match the real underlying code
  • BREAKING Force all boolean arguments (and some other non-obvious parameters) to be keyword-only in function calls for clarity / disambiguation (see ruff rule FBT002)
  • Prefer IO[bytes] over io.BytesIO when possible, since it is more generic
  • BREAKING i18n.NotFound renamed i18n.NotFoundError
  • BREAKING types.get_mime_for_name now returns str | None
  • BREAKING creator.Creator.add_metadata and creator.Creator.validate_metadata now only accept bytes | str as value (it must have been converted before the call)
  • BREAKING second argument of creator.Creator.add_metadata has been renamed to value instead of content to align with other methods
  • When a type issue arises in metadata checks, the wrong value type is now displayed in the exception
  • BREAKING i18n.get_language_details(), i18n.get_iso_lang_data(), i18n.find_language_names() and i18n.update_with_macro now process / return a new typed Lang class #151
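
The keyword-only change for boolean arguments can be pictured with a small sketch. The function `save_image` and its `optimize` parameter are hypothetical and do not come from the library. The only point is that the bare `*` in the signature forces callers to name the flag.

```python
def save_image(path: str, *, optimize: bool = False) -> str:
    """Hypothetical function: everything after the bare `*` is keyword-only."""
    mode = "optimized" if optimize else "as-is"
    return f"{path} ({mode})"


# Naming the flag makes the call site self-explanatory.
print(save_image("logo.png", optimize=True))

# Passing the boolean positionally is now rejected by Python itself.
try:
    save_image("logo.png", True)  # type: ignore[misc]
except TypeError as exc:
    print(f"rejected: {exc}")
```

Code that passed such flags positionally to a pre-4.0.0 function would need to switch to the keyword form.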

Removed

  • BREAKING Remove translation features in i18n: Locale class + _ and setlocale functions #134

Fixed

  • Metadata length validation is buggy for unicode strings #158
  • Pillow 10.4.0 reveals improper type hints for image probing functions #177
  • Improve the error raised when the locale fails to set up #157
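
The notes do not say what the Unicode length bug was. One common source of such bugs is that code points, bytes, and visible characters give different counts. The sketch below only shows that difference. `approx_char_count` is a hypothetical helper that ignores combining marks after NFC normalization. It is an approximation, not the library's method, and it does not handle sequences such as joined emoji.

```python
import unicodedata


def approx_char_count(text: str) -> int:
    """Roughly count visible characters: normalize, then skip combining marks."""
    normalized = unicodedata.normalize("NFC", text)
    return sum(1 for ch in normalized if unicodedata.combining(ch) == 0)


title = "Cafe\u0301"  # "Café" written with a combining acute accent

print(len(title))                  # 5 code points
print(len(title.encode("utf-8")))  # 6 bytes
print(approx_char_count(title))    # 4 visible characters
```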

v3.4.0

21 Jun 11:26
8b040a6

Added

  • zim.creator.Creator._log_metadata() to log (DEBUG) all metadata set on _metadata (prior to start()) #155
  • New utility function to confirm ZIM can be created at given location / name #163
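
A check of this kind could look like the sketch below. The name `can_create_file` and its behaviour are assumptions for illustration, not the function added in this release. It probes the folder by creating and then removing the target file, and it leaves an already existing file untouched.

```python
import os
import tempfile
from pathlib import Path


def can_create_file(folder: str, filename: str) -> bool:
    """Return True if `filename` can be created (or overwritten) inside `folder`."""
    directory = Path(folder)
    if not directory.is_dir():
        return False
    target = directory / filename
    if target.exists():
        # Do not touch an existing file; only report whether it is writable.
        return target.is_file() and os.access(target, os.W_OK)
    try:
        target.touch(exist_ok=False)  # probe: actually try to create the file
    except OSError:
        return False
    target.unlink()  # clean up the probe file
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(can_create_file(tmp, "example.zim"))  # writable temporary folder
    print(can_create_file(os.path.join(tmp, "missing"), "example.zim"))  # no such folder
```

Running the check before a long scrape avoids discovering an unwritable output location only at the end.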

Changed

  • Migrate the VideoWebmLow and VideoWebmHigh presets to VP9 for smaller file size #79
    • New preset versions are v3 and v2 respectively
  • Simplify type annotations by replacing Union and Optional with pipe character ("|") for improved readability and clarity #150
  • Calling Creator._log_metadata() on Creator.start() if running in DEBUG #155

Fixed

  • Add back the --runinstalled flag for test execution to allow smooth testing on other build chains #139

3.3.2

25 Mar 09:16
2923655

Added

  • Add support for disable_metadata_checks and ignore_duplicates arguments in make_zim_file function ("zimwritefs-mode")

Changed

  • Relaxed constraints on Python dependencies
  • Upgraded optional dependencies used for test and QA

3.3.1

27 Feb 14:21
e31f5ed

Added

  • Set a user-agent for handle_user_provided_file #103

Changed

  • Migrate to generic syntax in all std collections #140

Fixed

  • Do not modify the ffmpeg_args in reencode function #144
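
The kind of bug fixed here is easy to reproduce in any function that appends to a list it received. The sketch below uses a hypothetical `build_ffmpeg_command` helper, not the library's `reencode`, and copies the caller's list before adding options. The appended `-threads 1` is only an example of an extra option.

```python
def build_ffmpeg_command(src: str, dst: str, ffmpeg_args: list[str]) -> list[str]:
    """Build an ffmpeg command line without mutating the caller's argument list."""
    args = list(ffmpeg_args)       # work on a copy, never on the caller's list
    args += ["-threads", "1"]      # example of an option added internally
    return ["ffmpeg", "-y", "-i", src, *args, dst]


preset_args = ["-codec:v", "libvpx-vp9"]
command = build_ffmpeg_command("in.mp4", "out.webm", preset_args)

print(command)
print(preset_args)  # unchanged: ['-codec:v', 'libvpx-vp9']
```

Without the copy, a preset reused across several calls would accumulate the extra options on every call.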

3.3.0

14 Feb 09:42
04181e9

Added

  • New disable_metadata_checks parameter in the zimscraperlib.zim.creator.Creator initializer, which disables metadata checks at startup (assuming the user will validate the metadata on their own) #119

Changed

  • Rework the VideoWebmLow preset for faster encoding and smaller file size #122
    • preset has been bumped to version 2
    • when using an S3 cache, all videos using this preset will be reencoded and uploaded to cache again (it will replace the same file encoded with preset version 1)
  • When reencoding a video, ffmpeg now uses only 1 CPU thread by default (new arg to reencode allows to override this default value)
  • Using openZIM Python bootstrap conventions (including hatch-openzim plugin) #120
  • Add support for Python 3.12, drop Python 3.7 support #118
  • Replace "iso-639" by "iso639-lang" library
  • Replace "file-magic" by "python-magic" library for Alpine Linux support and better maintenance

Fixed

  • Fixed type hints of zimscraperlib.zim.Item and subclasses, and zimscraperlib.image.optimization:convert_image

3.2.0

16 Dec 18:25
4dc3012

Added

  • Add utility function to compute/check ZIM descriptions #110
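
As a rough idea of what computing descriptions can involve, the sketch below derives a short and a long description from one text. The function name, the truncation with an ellipsis, and the limits of 80 and 4000 characters are assumptions for illustration (the limits reflect my reading of the openZIM metadata conventions). This is not the library's implementation, and `len` here counts code points.

```python
from typing import Optional

MAX_DESCRIPTION_LENGTH = 80         # assumed limit for the short description
MAX_LONG_DESCRIPTION_LENGTH = 4000  # assumed limit for the long description


def truncate(text: str, limit: int) -> str:
    """Cut text to `limit` characters, ending with an ellipsis when shortened."""
    return text if len(text) <= limit else text[: limit - 1] + "…"


def split_description(text: str) -> tuple[str, Optional[str]]:
    """Return (description, long_description or None) for a single input text."""
    text = text.strip()
    if len(text) <= MAX_DESCRIPTION_LENGTH:
        return text, None  # short enough: no long description needed
    return (
        truncate(text, MAX_DESCRIPTION_LENGTH),
        truncate(text, MAX_LONG_DESCRIPTION_LENGTH),
    )


short, long = split_description("A very detailed summary of the content. " * 5)
print(len(short), short)
print(long)
```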

Changed

  • Using pylibzim 3.4.0

Removed

  • Support for Python 3.7 (EOL)

3.1.1

18 Jul 18:19
4f8c3cc

Changed

  • Fixed declared (hint) return type of download.stream_file #104
  • Fixed declared (hint) type of content param for Creator.add_item_for #107

3.1.0

05 May 10:17

Changed

  • Using pylibzim 3.1.0
  • ZIM metadata check now allows multiple values (comma-separated) for Language
  • Using yt_dlp instead of youtube_dl

Removed

  • Dropped support for Python 3.6

3.0.0

31 Mar 11:07

⚠️ Warning: this release introduces several API changes to zim.creator.Creator and zim.filesystem.make_zim_file

Added

  • zim.creator.Creator.config_metadata method (returning Self) exposing all mandatory metadata, all standard ones, and allowing extra text metadata.
  • zim.creator.Creator.config_dev_metadata method setting stub metadata for all mandatory ones (allowing overrides)
  • zim.metadata module with a list of per-metadata validation functions
  • zim.creator.Creator.validate_metadata (called on start) to verify metadata respects the spec (and its recommendations)
  • zim.filesystem.make_zim_file accepts a new optional long_description param.
  • i18n.is_valid_iso_639_3 to check ISO-639-3 codes
  • image.probing.is_valid_image to check Image format and size

Changed

  • zim.creator.Creator main_path argument now mandatory
  • zim.creator.Creator.start now fails on missing required or invalid metadata
  • zim.creator.Creator.add_metadata now enforces validation checks
  • zim.filesystem.make_zim_file renamed its favicon_path param to illustration_path
  • zim.creator.Creator.config_indexing language argument now optional when indexing=False
  • zim.creator.Creator.config_indexing now validates language is ISO-639-3 when indexing=True

Removed

  • zim.creator.Creator.update_metadata. See .config_metadata() instead
  • zim.creator.Creator language argument. See .config_metadata() instead
  • zim.creator.Creator keyword arguments. See .config_metadata() instead
  • zim.creator.Creator.add_default_illustration. See .config_metadata() instead
  • zim.archive.Archive.media_counter (deprecated in 2.0.0)

2.1.0

06 Mar 16:32

Added

  • zim.creator.Creator(language=) can be specified as List[str]. ["eng", "fra"], ["eng"], "eng,fra", "eng" are all valid values.
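
The four accepted forms listed above can all be reduced to one comma-separated string. The sketch below shows one way to do that. `normalize_language` is a hypothetical helper, not part of the library, and it does not check that the codes are valid ISO-639-3.

```python
from typing import Union


def normalize_language(language: Union[str, list[str]]) -> str:
    """Turn a string or a list of codes into a single comma-separated string."""
    codes = language.split(",") if isinstance(language, str) else language
    return ",".join(code.strip() for code in codes if code.strip())


for value in (["eng", "fra"], ["eng"], "eng,fra", "eng"):
    print(repr(value), "->", normalize_language(value))
```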

Changed

  • Fixed zim.providers.URLProvider returning incomplete streams under certain circumstances (from openzim/kolibri#40)
  • Fixed zim.creator.Creator not supporting multiple values for Language metadata, as required by the spec