Docs: Add sftp option for filesystem source #1845

Merged · 3 commits · Oct 1, 2024

`docs/website/docs/dlt-ecosystem/destinations/filesystem.md` (5 changes: 4 additions & 1 deletion)

@@ -302,7 +302,10 @@
sftp_gss_deleg_creds # Delegate credentials with GSS-API, defaults to True
sftp_gss_host # Host for GSS-API, defaults to None
sftp_gss_trust_dns # Trust DNS for GSS-API, defaults to True
```
-> For more information about credentials parameters: https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect
+
+:::info
+For more information about credentials parameters: https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect
+:::
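
As a concrete illustration of the GSS-API options above, here is a minimal `secrets.toml` sketch for the filesystem destination. This is an assumption-laden sketch, not part of this PR: the `sftp_gss_auth` flag is assumed to exist alongside the `sftp_gss_*` options listed above, mirroring paramiko's `gss_auth` argument to `SSHClient.connect`.

```toml
# Hypothetical sketch: GSS-API (Kerberos) authentication for the SFTP destination
[destination.filesystem.credentials]
sftp_username = "foo"
sftp_gss_auth = true               # assumed flag, mirrors paramiko's gss_auth
sftp_gss_host = "sftp.example.com" # host for GSS-API, defaults to None
sftp_gss_deleg_creds = true        # delegate credentials with GSS-API, defaults to True
```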

### Authentication methods

@@ -6,7 +6,7 @@ keywords: [readers source and filesystem, files, filesystem, readers source, cloud storage]
import Header from '../_source-info-header.md';
<Header/>

-Filesystem source allows loading files from remote locations (AWS S3, Google Cloud Storage, Google Drive, Azure) or the local filesystem seamlessly. Filesystem source natively supports `csv`, `parquet`, and `jsonl` files and allows customization for loading any type of structured files.
+Filesystem source allows loading files from remote locations (AWS S3, Google Cloud Storage, Google Drive, Azure Blob Storage, SFTP server) or the local filesystem seamlessly. Filesystem source natively supports `csv`, `parquet`, and `jsonl` files and allows customization for loading any type of structured files.

To load unstructured data (`.pdf`, `.txt`, e-mail), please refer to the [unstructured data source](https://github.com/dlt-hub/verified-sources/tree/master/sources/unstructured_data).

@@ -75,6 +75,7 @@ To get started with your data pipeline, follow these steps:
{"label": "AWS S3", "value": "aws"},
{"label": "GCS/GDrive", "value": "gcp"},
{"label": "Azure", "value": "azure"},
{"label": "SFTP", "value": "sftp"},
{"label": "Local filesystem", "value": "local"},
]}>

@@ -122,6 +123,18 @@ For more info, see

</TabItem>

<TabItem value="sftp">

dlt supports several authentication methods:

1. Key-based authentication
2. SSH Agent-based authentication
3. Username/Password authentication
4. GSS-API authentication

Learn more about SFTP authentication options in [SFTP section](../../destinations/filesystem#sftp). To obtain credentials, contact your server administrator.
</TabItem>
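
To make the username/password option concrete, here is a minimal sketch; the `sftp_password` field is an assumption that follows the `sftp_*` naming of the other credential options and mirrors paramiko's `password` argument:

```toml
# secrets.toml — hypothetical sketch for username/password authentication
[sources.filesystem.credentials]
sftp_username = "foo"
sftp_password = "your_password"  # assumed field name, mirroring paramiko's password argument
```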

<TabItem value="local">
You don't need any credentials for the local filesystem.
</TabItem>
@@ -143,6 +156,7 @@ a bucket, can be specified in `config.toml`.
{"label": "AWS S3", "value": "aws"},
{"label": "GCS/GDrive", "value": "gcp"},
{"label": "Azure", "value": "azure"},
{"label": "SFTP", "value": "sftp"},
{"label": "Local filesystem", "value": "local"},
]}>

@@ -195,6 +209,24 @@
bucket_url="gs://<bucket_name>/<path_to_files>/"
```
</TabItem>

<TabItem value="sftp">

Learn how to set up SFTP credentials for each authentication method in the [SFTP section](../../destinations/filesystem#sftp).
For example, in case of key-based authentication, you can configure the source the following way:

```toml
# secrets.toml
[sources.filesystem.credentials]
sftp_username = "foo"
sftp_key_filename = "/path/to/id_rsa" # Replace with the path to your private key file
sftp_key_passphrase = "your_passphrase" # Optional: passphrase for your private key

# config.toml
[sources.filesystem] # use [sources.readers.credentials] for the "readers" source
bucket_url = "sftp://[hostname]/[path]"
```
</TabItem>
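
Once the credentials resolve, an SFTP bucket reads like any other location. A minimal pipeline sketch, assuming the `filesystem` resource and `read_csv` transformer shipped in `dlt.sources.filesystem` and a reachable `example.com` host:

```python
import dlt
from dlt.sources.filesystem import filesystem, read_csv

# List CSV files on the SFTP server and pipe them through the CSV reader.
files = filesystem(bucket_url="sftp://example.com/data", file_glob="*.csv")
pipeline = dlt.pipeline(pipeline_name="sftp_demo", destination="duckdb")
info = pipeline.run((files | read_csv()).with_name("sftp_table"))
print(info)
```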

<TabItem value="local">

You can use both native local filesystem paths and the `file://` URI. Absolute, relative, and UNC Windows paths are supported.

@@ -219,7 +251,7 @@
bucket_url='~\Documents\csv_files\'
</Tabs>

You can also specify the credentials using Environment variables. The name of the corresponding environment
-variable should be slightly different than the corresponding name in the `toml` file. Simply replace dots `.` with double
+variable should be slightly different from the corresponding name in the `toml` file. Simply replace dots `.` with double
underscores `__`:

```sh
# one hypothetical mapping, following the dot → double-underscore rule above
export SOURCES__FILESYSTEM__CREDENTIALS__SFTP_USERNAME="foo"
```
@@ -1,14 +1,15 @@
---
-title: Filesystem & Buckets
-description: dlt-verified source for Filesystem & Buckets
+title: Filesystem & cloud storage
+description: dlt-verified source for Filesystem & cloud storage
keywords: [readers source and filesystem, files, filesystem, readers source, cloud storage]
---

The Filesystem source allows seamless loading of files from the following locations:
* AWS S3
* Google Cloud Storage
* Google Drive
-* Azure
+* Azure Blob Storage
+* remote filesystem (via SFTP)
* local filesystem

The Filesystem source natively supports `csv`, `parquet`, and `jsonl` files and allows customization for loading any type of structured files.
`docs/website/docs/tutorial/filesystem.md` (2 changes: 1 addition & 1 deletion)

@@ -4,7 +4,7 @@ description: Learn how to load data files like JSON, JSONL, CSV, and Parquet fro
keywords: [dlt, tutorial, filesystem, cloud storage, file system, python, data pipeline, incremental loading, json, jsonl, csv, parquet, duckdb]
---

-This tutorial is for you if you need to load data files like JSONL, CSV, and Parquet from either Cloud Storage (e.g., AWS S3, Google Cloud Storage, Google Drive, Azure Blob Storage) or a local file system.
+This tutorial is for you if you need to load data files like JSONL, CSV, and Parquet from either Cloud Storage (e.g., AWS S3, Google Cloud Storage, Google Drive, Azure Blob Storage), a remote file system (via SFTP), or a local file system.

## What you will learn

`docs/website/sidebars.js` (2 changes: 1 addition & 1 deletion)

@@ -67,7 +67,7 @@ const sidebars = {
{
type: 'category',
label: 'Filesystem & cloud storage',
-description: 'AWS S3, Google Cloud Storage, Azure Blob Storage, local file system',
+description: 'AWS S3, Google Cloud Storage, Azure, SFTP, local file system',
link: {
type: 'doc',
id: 'dlt-ecosystem/verified-sources/filesystem/index',