Feat/1730 extend filesystem sftp #1769

Merged
rudolfix merged 45 commits into devel from feat/1730-extend-filesystem-sftp
Sep 14, 2024

Collaborator

@donotpush donotpush commented Aug 29, 2024

Description

Extend filesystem source and destination to work with sftp.

As a user, I want to be able to load data from and to an sftp server. This can probably be done already with fsspec.

  • test that it could be done with existing filesystem source
  • create a credentials type for sftp
  • add instructions and docs on how to do it
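
As a rough illustration of the second task, a credentials type for sftp could be little more than a container whose fields are forwarded to fsspec. The sketch below is an assumption, not the code from this PR: the class name `SftpCredentialsSketch`, its field names, and `to_fsspec_kwargs` are hypothetical. It relies only on fsspec's sftp implementation passing extra keyword arguments on to paramiko's `SSHClient.connect`, which accepts `port`, `username`, `password`, and `key_filename`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SftpCredentialsSketch:
    """Hypothetical container for sftp connection settings (not dlt's actual class)."""

    username: Optional[str] = None
    password: Optional[str] = None
    key_filename: Optional[str] = None
    port: int = 22

    def to_fsspec_kwargs(self) -> Dict[str, Any]:
        """Return only the settings that were provided, keyed by paramiko argument names."""
        candidates = {
            "port": self.port,
            "username": self.username,
            "password": self.password,
            "key_filename": self.key_filename,
        }
        return {key: value for key, value in candidates.items() if value is not None}


creds = SftpCredentialsSketch(username="foo", password="pass")
kwargs = creds.to_fsspec_kwargs()
# With a reachable server and paramiko installed, one could then call, for example:
#   fsspec.filesystem("sftp", host="localhost", **kwargs)
```

Dropping unset fields keeps the call to fsspec free of `None` values, so paramiko's own defaults apply for anything the user did not configure.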

A few implementation hints. Look in fsspec_filesystem:

  1. we use last modified / created timestamp for incremental loading. use MTIME_DISPATCH to define a mapping (I hope sftp fsspec has it - upload time should be available)
  2. check FilesystemConfiguration on how we add credentials for particular filesystems (based on protocol)
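
To illustrate hint 1, a per-protocol mapping from a file-info entry to its modification time could look like the sketch below. This is an assumption about the shape of such a mapping, not the contents of `MTIME_DISPATCH` in dlt: the names `MTIME_DISPATCH_SKETCH`, `_to_utc`, and `modified_at`, and the sample info dict, are hypothetical. The `sftp` entry assumes the info dict exposes an `mtime` key that may be either a datetime or a Unix timestamp.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def _to_utc(value: Any) -> datetime:
    """Normalize a datetime or Unix timestamp to a timezone-aware UTC datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Treat naive datetimes as UTC (an assumption made for this sketch).
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


# Hypothetical dispatch table: protocol name -> function extracting mtime from file info.
MTIME_DISPATCH_SKETCH: Dict[str, Callable[[Dict[str, Any]], datetime]] = {
    "file": lambda info: _to_utc(info["mtime"]),
    "sftp": lambda info: _to_utc(info["mtime"]),
}


def modified_at(protocol: str, info: Dict[str, Any]) -> datetime:
    """Look up the extractor for the protocol and apply it to a file-info dict."""
    return MTIME_DISPATCH_SKETCH[protocol](info)


sample_info = {"name": "/data/a.csv", "mtime": 0}
sample_mtime = modified_at("sftp", sample_info)
```

Normalizing to UTC in one place keeps incremental comparisons consistent regardless of how each filesystem reports the timestamp.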

tests:

  • we have a fixture (a set of files) that we use for testing on all filesystems
  • look for where glob_files is tested to start
  • probably a local sftp server for testing is good
  • I expect some problems with https certs... how self-signed certs are handled.

Related Issues


netlify bot commented Aug 29, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 7a43d37
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66e54a2f7450020008f6e6a8
😎 Deploy Preview https://deploy-preview-1769--dlt-hub-docs.netlify.app

@donotpush donotpush force-pushed the feat/1730-extend-filesystem-sftp branch from 03393ce to 5fd34be Compare September 2, 2024 16:02
@donotpush donotpush requested a review from rudolfix September 5, 2024 12:04
@donotpush donotpush marked this pull request as ready for review September 5, 2024 15:47
Collaborator

@rudolfix rudolfix left a comment

high quality PR! code and sftp specific tests LGTM. what we need to do is to integrate those tests with other filesystem tests. I think we won't need a separate github ci workflow. we'll also reuse a lot of existing general purpose tests that e.g. test the glob functionality

please try the following:

  1. in tests/.dlt/config.toml add bucket url to the test sftp server
  2. add sftp to ALL_FILESYSTEM_DRIVERS and WITH_GDRIVE_BUCKETS (tests/load/utils)

now all tests that test buckets directly will see sftp.

e.g.

@pytest.mark.parametrize("write_disposition", ("replace", "append", "merge"))
@pytest.mark.parametrize("layout", TEST_FILE_LAYOUTS)
def test_successful_load(write_disposition: str, layout: str, with_gdrive_buckets_env: str) -> None:

there are plenty of tests that run filesystem as destination. those are also enabled by the above.
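
The effect of those two steps can be pictured as follows: the list of active drivers decides which bucket URLs the parametrized tests receive. This is a sketch under assumptions, not the code in tests/load/utils; the function name `buckets_for_drivers`, the sample bucket URLs, and the way the driver list is parsed are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlparse


def buckets_for_drivers(bucket_urls: List[str], drivers_json: str) -> List[str]:
    """Keep only the bucket URLs whose scheme is among the active drivers."""
    active = set(json.loads(drivers_json))
    selected = []
    for url in bucket_urls:
        # URLs without a scheme are treated as local paths (an assumption of this sketch).
        scheme = urlparse(url).scheme or "file"
        if scheme in active:
            selected.append(url)
    return selected


# Hypothetical bucket URLs, one per driver.
candidate_buckets = ["memory://m", "file:///tmp/_storage", "sftp://localhost/data"]
without_sftp = buckets_for_drivers(candidate_buckets, '["memory", "file"]')
with_sftp = buckets_for_drivers(candidate_buckets, '["memory", "file", "sftp"]')
```

With a filter like this, a test parametrized over the resulting list picks up sftp as soon as the driver is listed, without any change to the test itself.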

you can skip the whole module (I'm referring to test_filesystem_sftp). see how we skip inactive destinations here:

def skip_if_not_active(destination: str) -> None:

and do the same for filesystems.
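
Doing the same for filesystems could amount to a check like the one below. It is a minimal sketch, assuming the active drivers are available as a list; `is_filesystem_active` and `KNOWN_DRIVERS_SKETCH` are hypothetical names, not helpers from the dlt test suite.

```python
from typing import Sequence

# Hypothetical set of drivers the test suite knows about.
KNOWN_DRIVERS_SKETCH = ("memory", "file", "sftp")


def is_filesystem_active(driver: str, active_drivers: Sequence[str]) -> bool:
    """Return True when tests for the given driver should run."""
    if driver not in KNOWN_DRIVERS_SKETCH:
        # Fail loudly on typos instead of silently skipping tests.
        raise ValueError(f"Unknown filesystem driver {driver!r}")
    return driver in active_drivers


run_sftp_tests = is_filesystem_active("sftp", ["memory", "file"])
```

In a pytest module, a `False` result would typically be followed by `pytest.skip(..., allow_module_level=True)` at import time, so the whole module is skipped when the driver is not enabled.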

when this is done you can merge your test_destination_sftp CI workflow with test_local_destinations.

just enable sftp by adding it to this list:

ALL_FILESYSTEM_DRIVERS: "[\"memory\", \"file\"]"

start the sftp server and stuff should run

@donotpush
Collaborator Author

@rudolfix thanks for the review! setting up the sftp server was fun. I've addressed all the requested changes - could you take another look and let me know if I missed anything?

@donotpush donotpush requested a review from rudolfix September 13, 2024 10:00
Collaborator

@rudolfix rudolfix left a comment

LGTM! thanks for the tests and docs. I added the extra for sftp to follow our conventions (and also fsspec)

@rudolfix rudolfix merged commit 4e45ea4 into devel Sep 14, 2024
59 of 61 checks passed
@rudolfix rudolfix deleted the feat/1730-extend-filesystem-sftp branch September 14, 2024 11:08
Development

Successfully merging this pull request may close these issues.

Extend filesystem source and destination to work with sftp.
2 participants