0.8.0 (2023-05-31)

@dolfinus dolfinus released this 31 May 12:34
9c6b44c

Breaking Changes

  • Rename methods of FileConnection classes:

    • get_directory → resolve_dir
    • get_file → resolve_file
    • listdir → list_dir
    • mkdir → create_dir
    • rmdir → remove_dir

    The new names are more consistent with each other.
    The old methods were undocumented in previous versions, but some users may rely on them, so this is a breaking change. (#36)

  • Deprecate onetl.core.FileFilter class, replace it with new classes:

    • onetl.file.filter.Glob
    • onetl.file.filter.Regexp
    • onetl.file.filter.ExcludeDir

    Old class will be removed in v1.0.0. (#43)

  • Deprecate onetl.core.FileLimit class, replace it with new class onetl.file.limit.MaxFilesCount.

    Old class will be removed in v1.0.0. (#44)

  • Change behavior of BaseFileLimit.reset method.

    This method should now return self instead of None. The return value can be the same limit object or a copy; this is an implementation detail. (#44)
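To illustrate the new reset contract, here is a minimal stand-alone sketch. CountLimit is a hypothetical class that does not inherit from onetl's BaseFileLimit; the stops_at method, the is_reached property and the counting semantics are assumptions made for illustration. Only the fact that reset returns the limit object comes from the note above.

```python
class CountLimit:
    """Hypothetical limit: signal a stop once more than `limit` paths were seen."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._seen = 0

    def reset(self) -> "CountLimit":
        # Return the limit itself (a copy would also be acceptable) instead of None.
        self._seen = 0
        return self

    def stops_at(self, path: str) -> bool:
        # Assumed method name: count the path and report whether to stop.
        if self.is_reached:
            return True
        self._seen += 1
        return self.is_reached

    @property
    def is_reached(self) -> bool:
        # Assumed semantics: the limit is reached after `limit` paths have passed.
        return self._seen > self.limit


limit = CountLimit(2)
first_run = [limit.stops_at(p) for p in ("a.txt", "b.txt", "c.txt")]

# Because reset() returns the limit, it can be chained into the next call.
second_run = limit.reset().stops_at("a.txt")
```

Returning the object rather than None is what makes the chained `limit.reset().stops_at(...)` call possible.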

  • Replaced FileDownloader.filter and .limit with new options .filters and .limits:

    Before:

    FileDownloader(
        ...,
        filter=FileFilter(glob="*.txt", exclude_dir="/path"),
        limit=FileLimit(count_limit=10),
    )

    After:

    FileDownloader(
        ...,
        filters=[Glob("*.txt"), ExcludeDir("/path")],
        limits=[MaxFilesCount(10)],
    )

    This allows developers to implement their own filter and limit classes and combine them with the existing ones.

    The old behavior is still supported, but it will be removed in v1.0.0. (#45)
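The idea of combining several small filters can be sketched without onetl at all. GlobFilter, ExcludeDirFilter and match_all below are hypothetical stand-ins written for this example, and the match method name is an assumption; they are not the library's Glob, ExcludeDir or match_all_filters implementations.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


class GlobFilter:
    """Hypothetical filter: keep files whose name matches a glob pattern."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern

    def match(self, path: PurePosixPath) -> bool:
        return fnmatch(path.name, self.pattern)


class ExcludeDirFilter:
    """Hypothetical filter: drop everything located under a directory."""

    def __init__(self, directory: str) -> None:
        self.directory = PurePosixPath(directory)

    def match(self, path: PurePosixPath) -> bool:
        return self.directory not in path.parents


def match_all(path: PurePosixPath, filters) -> bool:
    # A path is kept only if every filter in the list accepts it.
    return all(f.match(path) for f in filters)


filters = [GlobFilter("*.txt"), ExcludeDirFilter("/path")]
paths = [PurePosixPath(p) for p in ("/data/a.txt", "/data/b.csv", "/path/c.txt")]
kept = [str(p) for p in paths if match_all(p, filters)]
```

With a list of independent objects, a user-defined filter only needs to provide the same method to take part in the combination.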

  • Removed the default value for FileDownloader.limits; users should pass the limits list explicitly. (#45)

  • Move classes from module onetl.core:

    from onetl.core import DBReader
    from onetl.core import DBWriter
    from onetl.core import FileDownloader
    from onetl.core import FileUploader
    from onetl.core import FileResult
    from onetl.core import FileSet

    to new modules onetl.db and onetl.file:

    from onetl.db import DBReader
    from onetl.db import DBWriter
    
    from onetl.file import FileDownloader
    from onetl.file import FileUploader
    
    # not a public interface
    from onetl.file.file_result import FileResult
    from onetl.file.file_set import FileSet

    Imports from the old module onetl.core can still be used, but they are marked as deprecated. The module will be removed in v1.0.0. (#46)
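For projects with many files, the import change can be applied mechanically. The following is a rough sketch of such a rewrite, assuming every old import is a single-name `from onetl.core import X` line; rewrite_imports and NEW_MODULES are hypothetical helpers for this example, not part of onetl. The mapping itself is taken from the lists above.

```python
import re

# Class name -> new module, as listed in the release notes above.
NEW_MODULES = {
    "DBReader": "onetl.db",
    "DBWriter": "onetl.db",
    "FileDownloader": "onetl.file",
    "FileUploader": "onetl.file",
    "FileResult": "onetl.file.file_result",
    "FileSet": "onetl.file.file_set",
}

# Matches only whole lines of the form "from onetl.core import Name".
OLD_IMPORT = re.compile(r"^from onetl\.core import (\w+)$", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Rewrite known single-name imports; leave every other line unchanged."""

    def replace(match):
        name = match.group(1)
        module = NEW_MODULES.get(name)
        if module is None:
            return match.group(0)  # unknown name: keep the original line
        return f"from {module} import {name}"

    return OLD_IMPORT.sub(replace, source)


old = "from onetl.core import DBReader\nfrom onetl.core import FileSet\n"
new = rewrite_imports(old)
```

Multi-name or aliased imports are deliberately not handled here; they would need a real parser rather than a regular expression.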

Features

  • Add rename_dir method.

    Method was added to following connections:

    • FTP
    • FTPS
    • HDFS
    • SFTP
    • WebDAV

    It renames or moves a directory, with all its content, to a new path.

    S3 does not have directories, so there is no such method in that class. (#40)

  • Add onetl.file.FileMover class.

    It moves files between directories of a remote file system. The signature is almost the same as in FileDownloader, but without HWM support. (#42)

Improvements

  • Document all public methods in FileConnection classes:

    • download_file
    • resolve_dir
    • resolve_file
    • get_stat
    • is_dir
    • is_file
    • list_dir
    • create_dir
    • path_exists
    • remove_file
    • rename_file
    • remove_dir
    • upload_file
    • walk (#39)

  • Update documentation of check method of all connections - add usage example and document result type. (#39)

  • Add new exception type FileSizeMismatchError.

    Methods connection.download_file and connection.upload_file now raise this exception instead of RuntimeError if the target file has a different size than the source after the download or upload. (#39)
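The kind of check described here can be sketched on local files. SizeMismatchError below is a local stand-in defined only for this example, not onetl's FileSizeMismatchError, and ensure_same_size is a hypothetical helper; how the library performs the comparison internally is not stated in these notes.

```python
import shutil
import tempfile
from pathlib import Path


class SizeMismatchError(Exception):
    """Local stand-in for this sketch only."""


def ensure_same_size(source: Path, target: Path) -> None:
    # Compare sizes after the transfer and fail loudly if they differ.
    source_size = source.stat().st_size
    target_size = target.stat().st_size
    if source_size != target_size:
        raise SizeMismatchError(
            f"{target} has size {target_size}, expected {source_size}"
        )


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.bin"
    dst = Path(tmp) / "target.bin"
    src.write_bytes(b"12345")

    shutil.copyfile(src, dst)
    ensure_same_size(src, dst)  # passes: both files are 5 bytes long
    copied_size = dst.stat().st_size
```

A dedicated exception type lets callers catch a truncated transfer specifically, instead of catching every RuntimeError.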

  • Add new exception type DirectoryExistsError - it is raised if target directory already exists. (#40)

  • Improved FileDownloader / FileUploader exception logging.

    If DEBUG logging is enabled, print exception with stacktrace instead of printing only exception message. (#42)
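A minimal sketch of this logging behavior with the standard logging module follows. log_failure and capture are hypothetical helpers, and the message text is made up for the example; the actual onetl log format is not shown in these notes.

```python
import io
import logging


def log_failure(logger: logging.Logger, exc: Exception) -> None:
    if logger.isEnabledFor(logging.DEBUG):
        # DEBUG enabled: include the full traceback of the exception.
        logger.error("File transfer failed", exc_info=exc)
    else:
        # Otherwise log only the exception message.
        logger.error("File transfer failed: %s", exc)


def capture(level: int) -> str:
    """Log one failure at the given logger level and return the output."""
    stream = io.StringIO()
    logger = logging.getLogger(f"sketch.{level}")
    logger.setLevel(level)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    try:
        raise ValueError("size mismatch")
    except ValueError as exc:
        log_failure(logger, exc)
    return stream.getvalue()


debug_output = capture(logging.DEBUG)  # message followed by a traceback
info_output = capture(logging.INFO)    # single line with the message only
```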

  • Updated documentation of FileUploader.

    • Class does not support read strategies, added note to documentation.
    • Added examples of using run method with explicit files list passing, both absolute and relative paths.
    • Fix outdated imports and class names in examples. (#42)

  • Updated documentation of DownloadResult class - fix outdated imports and class names. (#42)

  • Improved file filters documentation section.

    Document interface class onetl.base.BaseFileFilter and function match_all_filters. (#43)

  • Improved file limits documentation section.

    Document interface class onetl.base.BaseFileLimit and functions limits_stop_at / limits_reached / reset_limits. (#44)

  • Added changelog.

    Changelog is generated from separate news files using towncrier. (#47)

Misc

  • Improved CI workflow for tests.
    • If a developer has not changed the source code of a specific connector or its dependencies, run tests only against the maximum supported versions of Spark, Python, Java and db/file server.
    • If a developer has made changes in a specific connector, in core classes, or in dependencies, run tests for both minimal and maximum versions.
    • Once a week, run all tests against both minimal and latest versions to detect breaking changes in dependencies.
    • Minimal tested Spark version is 2.3.1 instead of 2.4.8. (#32)

Full Changelog: 0.7.2...0.8.0