0.8.0 (2023-05-31)
Breaking Changes
- Rename methods of FileConnection classes:
  - get_directory → resolve_dir
  - get_file → resolve_file
  - listdir → list_dir
  - mkdir → create_dir
  - rmdir → remove_dir

  The new naming should be more consistent. These methods were undocumented in previous versions, but some users may already rely on them, so this is a breaking change. (#36)
- Deprecate the onetl.core.FileFilter class and replace it with new classes:
  - onetl.file.filter.Glob
  - onetl.file.filter.Regexp
  - onetl.file.filter.ExcludeDir

  The old class will be removed in v1.0.0. (#43)
- Deprecate the onetl.core.FileLimit class and replace it with the new class onetl.file.limit.MaxFilesCount. The old class will be removed in v1.0.0. (#44)
- Change behavior of the BaseFileLimit.reset method. This method should now return self instead of None. The return value may be the same limit object or a copy; this is an implementation detail. (#44)
- Replaced FileDownloader.filter and .limit with new options .filters and .limits. Before:

      FileDownloader(
          ...,
          filter=FileFilter(glob="*.txt", exclude_dir="/path"),
          limit=FileLimit(count_limit=10),
      )

  After:

      FileDownloader(
          ...,
          filters=[Glob("*.txt"), ExcludeDir("/path")],
          limits=[MaxFilesCount(10)],
      )

  This allows developers to implement their own filter and limit classes and combine them with existing ones. The old behavior is still supported, but it will be removed in v1.0.0. (#45)
- Removed the default value for FileDownloader.limits; users should pass the limits list explicitly. (#45)
- Move classes from module onetl.core:

      from onetl.core import DBReader
      from onetl.core import DBWriter

      from onetl.core import FileDownloader
      from onetl.core import FileUploader

      from onetl.core import FileResult
      from onetl.core import FileSet

  to new modules onetl.db and onetl.file:

      from onetl.db import DBReader
      from onetl.db import DBWriter

      from onetl.file import FileDownloader
      from onetl.file import FileUploader

      # not a public interface
      from onetl.file.file_result import FileResult
      from onetl.file.file_set import FileSet

  Imports from the old module onetl.core can still be used, but they are marked as deprecated. The module will be removed in v1.0.0. (#46)
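
As a rough illustration of the filters/limits change and of reset returning an object instead of None, the sketch below combines two filters and one limit in plain Python. It does not import onetl: GlobSketch, ExcludeDirSketch, MaxFilesCountSketch and select_files are hypothetical stand-ins, and the match and stops_at method names are assumptions made for this example, not a description of the library's interfaces.

```python
import fnmatch
from pathlib import PurePosixPath


class GlobSketch:
    """Hypothetical filter: keeps files whose name matches a glob pattern."""

    def __init__(self, pattern):
        self.pattern = pattern

    def match(self, path):
        return fnmatch.fnmatchcase(PurePosixPath(path).name, self.pattern)


class ExcludeDirSketch:
    """Hypothetical filter: drops files located under a given directory."""

    def __init__(self, directory):
        self.directory = PurePosixPath(directory)

    def match(self, path):
        return self.directory not in PurePosixPath(path).parents


class MaxFilesCountSketch:
    """Hypothetical limit: stops after a fixed number of accepted files."""

    def __init__(self, limit):
        self.limit = limit
        self.handled = 0

    def reset(self):
        self.handled = 0
        # Return a limit object rather than None, as the 0.8.0 notes require.
        return self

    def stops_at(self, path):
        if self.handled >= self.limit:
            return True
        self.handled += 1
        return False


def select_files(paths, filters, limits):
    """Apply every filter to each path, then stop once any limit is hit."""
    # Use the objects returned by reset(), since a copy may be returned.
    limits = [limit.reset() for limit in limits]
    selected = []
    for path in paths:
        if not all(file_filter.match(path) for file_filter in filters):
            continue
        # Evaluate every limit so that each one sees the accepted path.
        stops = [limit.stops_at(path) for limit in limits]
        if any(stops):
            break
        selected.append(path)
    return selected


paths = [
    "/data/a.txt",
    "/data/b.csv",
    "/data/skip/c.txt",
    "/data/d.txt",
    "/data/e.txt",
]
selected = select_files(
    paths,
    filters=[GlobSketch("*.txt"), ExcludeDirSketch("/data/skip")],
    limits=[MaxFilesCountSketch(2)],
)
```

Because the caller works with whatever reset returns, a limit implementation is free to hand back either itself or a fresh copy.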
Features
- Add rename_dir method. The method was added to the following connections:
  - FTP
  - FTPS
  - HDFS
  - SFTP
  - WebDAV

  It allows renaming or moving a directory to a new path together with all its content. S3 does not have directories, so there is no such method in that class. (#40)
- Add onetl.file.FileMover class. It allows moving files between directories of a remote file system. The signature is almost the same as in FileDownloader, but without HWM support. (#42)
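
To make the semantics of a directory rename concrete, here is a minimal local sketch built only on the Python standard library. It is not the onetl implementation: rename_dir_sketch is a hypothetical helper working on the local file system, and raising FileExistsError when the target exists is an assumption chosen for this example.

```python
import shutil
import tempfile
from pathlib import Path


def rename_dir_sketch(source, target):
    """Move a directory, with everything inside it, to a new path."""
    source, target = Path(source), Path(target)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    if target.exists():
        # Assumption for this sketch: never overwrite an existing target.
        raise FileExistsError(f"{target} already exists")
    # Create missing parent directories of the target path.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target


with tempfile.TemporaryDirectory() as root:
    old = Path(root, "incoming")
    (old / "nested").mkdir(parents=True)
    (old / "nested" / "report.txt").write_text("data")

    new = rename_dir_sketch(old, Path(root, "archive", "2023"))

    # Nested content travels with the directory; the old path disappears.
    moved_ok = (new / "nested" / "report.txt").read_text() == "data"
    source_gone = not old.exists()
```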
Improvements
- Document all public methods in FileConnection classes:
  - download_file
  - resolve_dir
  - resolve_file
  - get_stat
  - is_dir
  - is_file
  - list_dir
  - create_dir
  - path_exists
  - remove_file
  - rename_file
  - remove_dir
  - upload_file
  - walk

  (#39)
- Update documentation of the check method of all connections: add a usage example and document the result type. (#39)
- Add new exception type FileSizeMismatchError. Methods connection.download_file and connection.upload_file now raise this exception instead of RuntimeError if the target file has a different size than the source after download/upload. (#39)
- Add new exception type DirectoryExistsError. It is raised if the target directory already exists. (#40)
- Improved FileDownloader / FileUploader exception logging. If DEBUG logging is enabled, the exception is printed with its stack trace instead of only the exception message. (#42)
- Updated documentation of FileUploader:
  - The class does not support read strategies; added a note to the documentation.
  - Added examples of using the run method with an explicit list of files, with both absolute and relative paths.
  - Fixed outdated imports and class names in examples.

  (#42)
- Updated documentation of the DownloadResult class: fixed outdated imports and class names. (#42)
- Improved file filters documentation section. Document interface class onetl.base.BaseFileFilter and function match_all_filters. (#43)
- Improved file limits documentation section. Document interface class onetl.base.BaseFileLimit and functions limits_stop_at / limits_reached / reset_limits. (#44)
- Added changelog. The changelog is generated from separate news files using towncrier. (#47)
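
The size check behind the new mismatch exception can be sketched with local files. This is an illustration under stated assumptions, not the library's code: FileSizeMismatchSketch, verify_same_size and download_file_sketch are hypothetical names, and deriving the exception from OSError is a choice made for this example.

```python
import shutil
import tempfile
from pathlib import Path


class FileSizeMismatchSketch(OSError):
    """Hypothetical stand-in for a dedicated size-mismatch exception."""


def verify_same_size(source, target):
    """Raise if the two local files differ in size."""
    source_size = Path(source).stat().st_size
    target_size = Path(target).stat().st_size
    if source_size != target_size:
        raise FileSizeMismatchSketch(
            f"{target} has size {target_size}, expected {source_size}"
        )


def download_file_sketch(source, target):
    """Copy a file, then check the result before reporting success."""
    shutil.copyfile(source, target)
    verify_same_size(source, target)
    return Path(target)


with tempfile.TemporaryDirectory() as root:
    source = Path(root, "source.bin")
    source.write_bytes(b"0123456789")
    target = download_file_sketch(source, Path(root, "target.bin"))
    copied_size = target.stat().st_size

    # Simulate a truncated transfer and check that it is detected.
    target.write_bytes(b"01234")
    try:
        verify_same_size(source, target)
        mismatch_detected = False
    except FileSizeMismatchSketch:
        mismatch_detected = True
```

A dedicated exception type lets callers handle a truncated transfer separately from unrelated runtime failures, for example by retrying only that case.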
Misc
- Improved CI workflow for tests.
  - If a developer hasn't changed the source code of a specific connector or its dependencies, run tests only against the maximum supported versions of Spark, Python, Java and db/file server.
  - If a developer made changes in a specific connector, in core classes, or in dependencies, run tests for both minimal and maximum versions.
  - Once a week, run all tests against both minimal and latest versions to detect breaking changes in dependencies.
  - The minimal tested Spark version is 2.3.1 instead of 2.4.8. (#32)
Full Changelog: 0.7.2...0.8.0