From edaa4bb677ee3ab99ad66803748e4725c5cb69ed Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Fri, 24 May 2024 12:04:46 +0000
Subject: [PATCH 1/2] Updated as per comments and resolved merge conflicts

---
 docs/website/docs/dlt-ecosystem/destinations/filesystem.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/website/docs/dlt-ecosystem/destinations/filesystem.md b/docs/website/docs/dlt-ecosystem/destinations/filesystem.md
index 3e2e08013c..9ac8ad20b8 100644
--- a/docs/website/docs/dlt-ecosystem/destinations/filesystem.md
+++ b/docs/website/docs/dlt-ecosystem/destinations/filesystem.md
@@ -379,6 +379,11 @@ Please note:

 The filesystem destination configuration supports advanced layout customization and the inclusion of additional placeholders. This can be done through `config.toml` or programmatically when initializing via a factory method.

+To set using environment variables:
+```sh
+export DESTINATION__FILESYSTEM__KWARGS = '{"auto_mkdir": true/false}'
+```
+
 :::tip
 For handling deeply nested layouts, consider enabling automatic directory creation for the local filesystem destination. This can be done by setting `kwargs = '{"auto_mkdir": true}'` to facilitate the creation of directories automatically.
 :::
@@ -391,6 +396,8 @@ To configure the layout and placeholders using `config.toml`, use the following
 layout = "{table_name}/{test_placeholder}/{YYYY}-{MM}-{DD}/{ddd}/{mm}/{load_id}.{file_id}.{ext}"
 extra_placeholders = { "test_placeholder" = "test_value" }
 current_datetime="2024-04-14T00:00:00"
+# for automatic directory creation in the local filesystem
+kwargs = '{"auto_mkdir": true}'
 ```

 :::note

From 2672dba4933be214b4ec156c8a76714682c27a21 Mon Sep 17 00:00:00 2001
From: Violetta Mishechkina
Date: Mon, 27 May 2024 13:23:36 +0200
Subject: [PATCH 2/2] Updated layout for filesystem docs

---
 .../dlt-ecosystem/destinations/filesystem.md | 52 ++++++++++---------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/docs/website/docs/dlt-ecosystem/destinations/filesystem.md b/docs/website/docs/dlt-ecosystem/destinations/filesystem.md
index 9ac8ad20b8..9c7d961d3a 100644
--- a/docs/website/docs/dlt-ecosystem/destinations/filesystem.md
+++ b/docs/website/docs/dlt-ecosystem/destinations/filesystem.md
@@ -21,9 +21,7 @@ pip install s3fs
 so pip does not fail on backtracking.
 :::

-## Setup Guide
-
-### 1. Initialise the dlt project
+## Initialise the dlt project

 Let's start by initializing a new dlt project as follows:
 ```sh
@@ -33,9 +31,9 @@ Let's start by initializing a new dlt project as follows:
 This command will initialize your pipeline with chess as the source and the AWS S3 filesystem as the destination.
 :::

-### 2. Set up bucket storage and credentials
+## Set up bucket storage and credentials

-#### AWS S3
+### AWS S3
 The command above creates a sample `secrets.toml` and requirements file for AWS S3 bucket. You can install those dependencies by running:
 ```sh
 pip install -r requirements.txt
@@ -100,7 +98,7 @@ You need to create an S3 bucket and a user who can access that bucket. `dlt` doe
 5. To grab the access and secret key for the user. Go to IAM > Users and in the “Security Credentials”, click on “Create Access Key”, and preferably select “Command Line Interface” and create the access key.
 6. Grab the “Access Key” and “Secret Access Key” created that are to be used in "secrets.toml".

-##### Using S3 compatible storage
+#### Using S3 compatible storage

 To use an S3 compatible storage other than AWS S3 like [MinIO](https://min.io/) or [Cloudflare R2](https://www.cloudflare.com/en-ca/developer-platform/r2/), you may supply an `endpoint_url` in the config. This should be set along with AWS credentials:

@@ -114,7 +112,7 @@ aws_secret_access_key = "please set me up!" # copy the secret access key here
 endpoint_url = "https://.r2.cloudflarestorage.com" # copy your endpoint URL here
 ```

-##### Adding Additional Configuration
+#### Adding Additional Configuration

 To pass any additional arguments to `fsspec`, you may supply `kwargs` and `client_kwargs` in the config as a **stringified dictionary**:

@@ -124,7 +122,7 @@ kwargs = '{"use_ssl": true, "auto_mkdir": true}'
 client_kwargs = '{"verify": "public.crt"}'
 ```

-#### Google Storage
+### Google Storage
 Run `pip install "dlt[gs]"` which will install the `gcfs` package.

 To edit the `dlt` credentials file with your secret info, open `.dlt/secrets.toml`.
@@ -147,14 +145,14 @@ if you have default google cloud credentials in your environment (i.e. on cloud

 Use **Cloud Storage** admin to create a new bucket. Then assign the **Storage Object Admin** role to your service account.

-#### Azure Blob Storage
+### Azure Blob Storage
 Run `pip install "dlt[az]"` which will install the `adlfs` package to interface with Azure Blob Storage.

 Edit the credentials in `.dlt/secrets.toml`, you'll see AWS credentials by default replace them with your Azure credentials.

 Two forms of Azure credentials are supported:

-##### SAS token credentials
+#### SAS token credentials

 Supply storage account name and either sas token or storage account key

@@ -174,7 +172,7 @@ If you have the correct Azure credentials set up on your machine (e.g. via azure
 you can omit both `azure_storage_account_key` and `azure_storage_sas_token` and `dlt` will fall back to the available default.
 Note that `azure_storage_account_name` is still required as it can't be inferred from the environment.

-##### Service principal credentials
+#### Service principal credentials

 Supply a client ID, client secret and a tenant ID for a service principal authorized to access your container

@@ -188,7 +186,7 @@ azure_client_secret = "client_secret"
 azure_tenant_id = "tenant_id" # please set me up!
 ```

-#### Local file system
+### Local file system
 If for any reason you want to have those files in a local folder, set up the `bucket_url` as follows (you are free to use `config.toml` for that as there are no secrets required)

 ```toml
@@ -196,10 +194,24 @@ If for any reason you want to have those files in a local folder, set up the `bu
 bucket_url = "file:///absolute/path" # three / for an absolute path
 ```

-`dlt` correctly handles the native local file paths. Indeed, using the `file://` schema may be not intuitive especially for Windows users.
+:::tip
+For handling deeply nested layouts, consider enabling automatic directory creation for the local filesystem destination. This can be done by setting `kwargs` in `secrets.toml`:

 ```toml
 [destination.filesystem]
+kwargs = '{"auto_mkdir": true}'
+```
+
+Or by setting environment variable:
+```sh
+export DESTINATION__FILESYSTEM__KWARGS = '{"auto_mkdir": true/false}'
+```
+:::
+
+`dlt` correctly handles the native local file paths. Indeed, using the `file://` schema may be not intuitive especially for Windows users.
+
+```toml
+[destination.unc_destination]
 bucket_url = 'C:\a\b\c'
 ```

@@ -379,24 +391,16 @@ Please note:

 The filesystem destination configuration supports advanced layout customization and the inclusion of additional placeholders. This can be done through `config.toml` or programmatically when initializing via a factory method.

-To set using environment variables:
-```sh
-export DESTINATION__FILESYSTEM__KWARGS = '{"auto_mkdir": true/false}'
-```
-
-:::tip
-For handling deeply nested layouts, consider enabling automatic directory creation for the local filesystem destination. This can be done by setting `kwargs = '{"auto_mkdir": true}'` to facilitate the creation of directories automatically.
-:::
-
 #### Configuration via `config.toml`

 To configure the layout and placeholders using `config.toml`, use the following format:

 ```toml
+[destination.filesystem]
 layout = "{table_name}/{test_placeholder}/{YYYY}-{MM}-{DD}/{ddd}/{mm}/{load_id}.{file_id}.{ext}"
 extra_placeholders = { "test_placeholder" = "test_value" }
 current_datetime="2024-04-14T00:00:00"
-# for automatic directory creation in the local filesystem
+# for automatic directory creation in the local filesystem
 kwargs = '{"auto_mkdir": true}'
 ```

@@ -485,4 +489,4 @@ managed in the regular way by the final destination you have configured.

 You will also notice `init` files being present in the root folder and the special `dlt` folders. In the absence of the concepts of schemas and tables
 in blob storages and directories, `dlt` uses these special files to harmonize the behavior of the `filesystem` destination with the other implemented destinations.
-
+
\ No newline at end of file