Add sftp option for filesystem source
VioletM committed Sep 19, 2024
1 parent c96ce7b commit 4e56372
Showing 5 changed files with 42 additions and 6 deletions.
7 changes: 5 additions & 2 deletions docs/website/docs/dlt-ecosystem/destinations/filesystem.md
@@ -279,7 +279,7 @@ bucket_url='\\?\UNC\localhost\c$\a\b\c'
:::

### SFTP
- Run `pip install "dlt[sftp]` which will install the `paramiko` package alongside `dlt`, enabling secure SFTP transfers.
+ Run `pip install "dlt[sftp]"`, which installs the `paramiko` package alongside `dlt`, enabling secure SFTP transfers.

Configure your SFTP credentials by editing the `.dlt/secrets.toml` file. By default, the file contains placeholders for AWS credentials. You should replace these with your SFTP credentials.
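For instance, a minimal `secrets.toml` sketch for password-based SFTP access might look like this (the section layout mirrors the other bucket examples; the server URL and values are placeholders):

```toml
[destination.filesystem]
bucket_url = "sftp://sftp.example.com/data"   # placeholder server and path

[destination.filesystem.credentials]
sftp_username = "foo"
sftp_password = "your_password"
```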

@@ -304,7 +304,10 @@ sftp_gss_deleg_creds # Delegate credentials with GSS-API, defaults to True
sftp_gss_host # Host for GSS-API, defaults to None
sftp_gss_trust_dns # Trust DNS for GSS-API, defaults to True
```
- > For more information about credentials parameters: https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect

+ :::info
+ For more information about credential parameters, see https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect
+ :::
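For a feel of what these options do under the hood, here is a rough `paramiko` sketch — the hostname, key path, and passphrase are placeholders, and the mapping of `sftp_*` options to `SSHClient.connect` keywords is annotated in the comments:

```python
import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()  # trust hosts already in ~/.ssh/known_hosts

client.connect(
    hostname="sftp.example.com",      # host part of the sftp:// bucket_url
    username="foo",                   # sftp_username
    key_filename="/path/to/id_rsa",   # sftp_key_filename
    passphrase="your_passphrase",     # sftp_key_passphrase
    gss_auth=False,                   # sftp_gss_auth
    gss_deleg_creds=True,             # sftp_gss_deleg_creds
    gss_trust_dns=True,               # sftp_gss_trust_dns
)

sftp = client.open_sftp()             # the SFTP session used for transfers
print(sftp.listdir("."))
client.close()
```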

### Authentication Methods

@@ -6,7 +6,7 @@ keywords: [readers source and filesystem, files, filesystem, readers source, clo
import Header from '../_source-info-header.md';
<Header/>

- Filesystem source allows loading files from remote locations (AWS S3, Google Cloud Storage, Google Drive, Azure) or the local filesystem seamlessly. Filesystem source natively supports `csv`, `parquet`, and `jsonl` files and allows customization for loading any type of structured files.
+ Filesystem source seamlessly loads files from remote locations (AWS S3, Google Cloud Storage, Google Drive, Azure, SFTP server) or the local filesystem. It natively supports `csv`, `parquet`, and `jsonl` files and allows customization for loading any type of structured files.

To load unstructured data (`.pdf`, `.txt`, e-mail), please refer to the [unstructured data source](https://github.com/dlt-hub/verified-sources/tree/master/sources/unstructured_data).

@@ -75,6 +75,7 @@ To get started with your data pipeline, follow these steps:
{"label": "AWS S3", "value": "aws"},
{"label": "GCS/GDrive", "value": "gcp"},
{"label": "Azure", "value": "azure"},
{"label": "SFTP", "value": "sftp"},
{"label": "Local filesystem", "value": "local"},
]}>

@@ -122,6 +123,18 @@ For more info, see

</TabItem>

<TabItem value="sftp">

dlt supports several authentication methods:

1. Key-based authentication
2. SSH Agent-based authentication
3. Username/Password authentication
4. GSS-API authentication

Learn more about SFTP authentication options in the [SFTP section](../../destinations/filesystem#sftp). To obtain credentials, contact your server administrator.
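
For example, with SSH agent-based authentication you may only need a username, since `paramiko` looks up keys from the running agent by default — a sketch under that assumption:

```toml
# .dlt/secrets.toml — sketch: SSH agent authentication
# (assumes the agent already holds your key; no key file or password needed)
[sources.filesystem.credentials]
sftp_username = "foo"
```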
</TabItem>

<TabItem value="local">
You don't need any credentials for the local filesystem.
</TabItem>
@@ -143,6 +156,7 @@ a bucket, can be specified in `config.toml`.
{"label": "AWS S3", "value": "aws"},
{"label": "GCS/GDrive", "value": "gcp"},
{"label": "Azure", "value": "azure"},
{"label": "SFTP", "value": "sftp"},
{"label": "Local filesystem", "value": "local"},
]}>

@@ -195,6 +209,24 @@ bucket_url="gs://<bucket_name>/<path_to_files>/"
```
</TabItem>

<TabItem value="sftp">

Learn how to set up SFTP credentials for each authentication method in the [SFTP section](../../destinations/filesystem#sftp).
For example, for key-based authentication, you can configure the source as follows:

```toml
# secrets.toml
[sources.filesystem.credentials]
sftp_username = "foo"
sftp_key_filename = "/path/to/id_rsa" # Replace with the path to your private key file
sftp_key_passphrase = "your_passphrase" # Optional: passphrase for your private key

# config.toml
[sources.filesystem] # use [sources.readers] for the "readers" source
bucket_url = "sftp://[hostname]/[path]"
```
</TabItem>

<TabItem value="local">

You can use both native local filesystem paths and `file://` URIs. Absolute, relative, and UNC Windows paths are supported.
@@ -219,7 +251,7 @@ bucket_url='~\Documents\csv_files\'
</Tabs>
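
Once credentials are in place, reading files looks the same for every backend. A sketch for the SFTP case (the server URL and file layout are assumptions), using the `filesystem` resource together with the `read_csv` transformer:

```python
import dlt
from dlt.sources.filesystem import filesystem, read_csv

# List CSV files on the SFTP server and parse them with the read_csv transformer
files = filesystem(bucket_url="sftp://sftp.example.com/data", file_glob="*.csv")

pipeline = dlt.pipeline(pipeline_name="sftp_example", destination="duckdb")
info = pipeline.run((files | read_csv()).with_name("csv_data"))
print(info)
```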

You can also specify the credentials using Environment variables. The name of the corresponding environment
- variable should be slightly different than the corresponding name in the `toml` file. Simply replace dots `.` with double
+ variable should be slightly different from the corresponding name in the `toml` file. Simply replace dots `.` with double
underscores `__`:

```sh
# Hypothetical example: sftp_username from [sources.filesystem.credentials]
export SOURCES__FILESYSTEM__CREDENTIALS__SFTP_USERNAME="foo"
```
@@ -9,6 +9,7 @@ The Filesystem source allows seamless loading of files from the following locations:
* Google Cloud Storage
* Google Drive
* Azure
* remote filesystem (via SFTP)
* local filesystem

The Filesystem source natively supports `csv`, `parquet`, and `jsonl` files and allows customization for loading any type of structured files.
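
As a sketch of that customization point, you can pipe the listed files into your own transformer — the reader below (name and line-based parsing are hypothetical) emits one record per line of each matched file:

```python
import dlt
from dlt.sources.filesystem import filesystem

@dlt.transformer(standalone=True)
def read_lines(items, encoding: str = "utf-8"):
    """Hypothetical custom reader: one record per line of each file."""
    for item in items:               # each item is a FileItemDict
        with item.open() as f:       # opens the remote file in binary mode
            for line in f:
                yield {"file": item["file_name"], "line": line.decode(encoding).rstrip()}

pipeline = dlt.pipeline(pipeline_name="custom_reader", destination="duckdb")
files = filesystem(bucket_url="sftp://sftp.example.com/logs", file_glob="*.txt")
print(pipeline.run((files | read_lines()).with_name("log_lines")))
```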
2 changes: 1 addition & 1 deletion docs/website/docs/tutorial/filesystem.md
@@ -4,7 +4,7 @@ description: Learn how to load data files like JSON, JSONL, CSV, and Parquet fro
keywords: [dlt, tutorial, filesystem, cloud storage, file system, python, data pipeline, incremental loading, json, jsonl, csv, parquet, duckdb]
---

- This tutorial is for you if you need to load data files like JSONL, CSV, and Parquet from either Cloud Storage (ex. AWS S3, Google Cloud Storage, Google Drive, Azure Blob Storage) or a local file system.
+ This tutorial is for you if you need to load data files like JSONL, CSV, and Parquet from cloud storage (e.g., AWS S3, Google Cloud Storage, Google Drive, Azure Blob Storage), a remote file system (via SFTP), or a local file system.

## What you will learn

2 changes: 1 addition & 1 deletion docs/website/sidebars.js
@@ -67,7 +67,7 @@ const sidebars = {
{
type: 'category',
label: 'Filesystem & cloud storage',
- description: 'AWS S3, Google Cloud Storage, Azure Blob Storage, local file system',
+ description: 'AWS S3, Google Cloud Storage, Azure, SFTP, local file system',
link: {
type: 'doc',
id: 'dlt-ecosystem/verified-sources/filesystem/index',
