credentials moved to configuration, added configuration pages #703

Merged
merged 53 commits into from
Oct 31, 2023
Commits
e33c2e5
credentials moved to configuration, added configuration pages
AstrakhantsevaAA Oct 20, 2023
f88e483
add description
AstrakhantsevaAA Oct 20, 2023
a89fdf2
del description
AstrakhantsevaAA Oct 20, 2023
fe2f9c2
fix paths
AstrakhantsevaAA Oct 20, 2023
3a3b0cc
fix broken links
AstrakhantsevaAA Oct 20, 2023
a7001c2
refactored secrets and configs
AstrakhantsevaAA Oct 20, 2023
713f55b
refactored config providers
AstrakhantsevaAA Oct 20, 2023
54f4750
refactored config specs
AstrakhantsevaAA Oct 20, 2023
382185f
return "credentials" as a section slug
AstrakhantsevaAA Oct 25, 2023
27a7142
refactor links
AstrakhantsevaAA Oct 25, 2023
41f9ea8
updates config docs, requests changes
rudolfix Oct 25, 2023
2d64dfa
add built in creds
AstrakhantsevaAA Oct 25, 2023
41cdb18
refactor specs
AstrakhantsevaAA Oct 25, 2023
cab66ef
add explanation for secrets and config
AstrakhantsevaAA Oct 25, 2023
e84792f
refactor configuration
AstrakhantsevaAA Oct 26, 2023
be9f7ab
refactor configuration
AstrakhantsevaAA Oct 26, 2023
18802c5
move add creds to how to
AstrakhantsevaAA Oct 26, 2023
20536fb
convert comments to text
AstrakhantsevaAA Oct 26, 2023
af65c64
rename
AstrakhantsevaAA Oct 26, 2023
c55b8d9
del imports
AstrakhantsevaAA Oct 26, 2023
acc695a
Update docs/website/docs/walkthroughs/add_credentials.md
AstrakhantsevaAA Oct 27, 2023
475049f
Update docs/website/docs/walkthroughs/add_credentials.md
AstrakhantsevaAA Oct 27, 2023
6b8c062
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
22f1dfb
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
9e06d4a
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
dc2021c
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
4eccfd1
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
5a2371c
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
6ca4cc4
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
91a2162
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
eb60449
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
fe489bc
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
56d3021
add more details about secrets and config
AstrakhantsevaAA Oct 27, 2023
8acf9bc
intro for providers
AstrakhantsevaAA Oct 27, 2023
aa52d9a
intro for specs
AstrakhantsevaAA Oct 27, 2023
2d3cd43
small changes
AstrakhantsevaAA Oct 27, 2023
1ff254b
spec examples with sources
AstrakhantsevaAA Oct 27, 2023
74a9ade
small changes
AstrakhantsevaAA Oct 27, 2023
16feb84
delete link to name convention
AstrakhantsevaAA Oct 30, 2023
64d2229
refactor
AstrakhantsevaAA Oct 30, 2023
286662a
refactor
AstrakhantsevaAA Oct 30, 2023
7f5195b
refactor
AstrakhantsevaAA Oct 30, 2023
2bec2b7
refactor
AstrakhantsevaAA Oct 30, 2023
5c7a233
fix typo
AstrakhantsevaAA Oct 30, 2023
733d926
add info about home dir
AstrakhantsevaAA Oct 30, 2023
c00273f
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 31, 2023
6d212bf
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 31, 2023
c6da5fc
Update docs/website/docs/general-usage/credentials/config_specs.md
AstrakhantsevaAA Oct 31, 2023
3b51ae9
wip
AstrakhantsevaAA Oct 31, 2023
6a2ff9f
more info about Configuration classes
AstrakhantsevaAA Oct 31, 2023
340fb9b
more about tomls
AstrakhantsevaAA Oct 31, 2023
b0006c6
fix link
AstrakhantsevaAA Oct 31, 2023
5648034
fix layout
AstrakhantsevaAA Oct 31, 2023
8 changes: 4 additions & 4 deletions docs/technical/secrets_and_config.md
@@ -77,14 +77,14 @@ You should type your function signatures! The effort is very low and it gives `d

```python
@dlt.source
def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpClientCredentialsWithDefault = dlt.secrets.value, only_strings: bool = False):
def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpServiceAccountCredentials = dlt.secrets.value, only_strings: bool = False):
...
```
Now:
1. you are sure that you get a list of strings as `tab_names`
2. you will get actual google credentials (see `CredentialsConfiguration` later) and your users can pass them in many different forms.

In case of `GcpClientCredentialsWithDefault`
In case of `GcpServiceAccountCredentials`
* you may just pass the `service_json` as string or dictionary (in code and via config providers)
* you may pass a connection string (used in sql alchemy) (in code and via config providers)
* or default credentials will be used
@@ -331,7 +331,7 @@ It tells you exactly which paths `dlt` looked at, via which config providers and

## Working with credentials (and other complex configuration values)

`GcpClientCredentialsWithDefault` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It also allows to parse various native representations of the configuration. Credentials marked with `WithDefaults` mixin are also to instantiate itself from the machine/user default environment ie. googles `default()` or AWS `.aws/credentials`.
`GcpServiceAccountCredentials` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It also allows to parse various native representations of the configuration. Credentials marked with `WithDefaults` mixin are also to instantiate itself from the machine/user default environment ie. googles `default()` or AWS `.aws/credentials`.

As an example, let's use `ConnectionStringCredentials` which represents a database connection string.

@@ -421,7 +421,7 @@ In fact for each decorated function a spec is synthesized. In case of `google_sh
@configspec
class GoogleSheetsConfiguration:
tab_names: List[str] = None # mandatory
credentials: GcpClientCredentialsWithDefault = None # mandatory secret
credentials: GcpServiceAccountCredentials = None # mandatory secret
only_strings: Optional[bool] = False
```

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/duckdb.md
@@ -84,7 +84,7 @@ p = dlt.pipeline(pipeline_name='chess', destination='duckdb', dataset_name='ches

This destination accepts database connection strings in format used by [duckdb-engine](https://github.com/Mause/duckdb_engine#configuration).

You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials.md) (e.g. using a `secrets.toml` file)
You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials) (e.g. using a `secrets.toml` file)
```toml
destination.duckdb.credentials=duckdb:///_storage/test_quack.duckdb
```
4 changes: 0 additions & 4 deletions docs/website/docs/general-usage/configuration.md

This file was deleted.

146 changes: 146 additions & 0 deletions docs/website/docs/general-usage/credentials/config_providers.md
@@ -0,0 +1,146 @@
---
title: Configuration Providers
description: dlt Configuration Providers
keywords: [credentials, secrets.toml, secrets, config, configuration, environment
variables, provider]
---

# Configuration Providers


Configuration Providers in the context of the `dlt` library
refer to different sources from which configuration values
and secrets can be retrieved for a data pipeline.
These providers form a hierarchy, with each having its own
priority in determining the values for function arguments.

## The provider hierarchy

If a function signature has arguments that may be injected, `dlt` looks for the argument values in
providers.

### Providers

1. **Environment Variables**: At the top of the hierarchy are environment variables.
If a value for a specific argument is found in an environment variable,
dlt will use it and will not proceed to search in lower-priority providers.

2. **Vaults (Airflow/Google/AWS/Azure)**: These are specialized providers that come
after environment variables. They can provide configuration values and secrets.
However, they typically focus on handling sensitive information.

3. **`secrets.toml` and `config.toml` Files**: These files are used for storing both
configuration values and secrets. `secrets.toml` is dedicated to sensitive information,
while `config.toml` contains non-sensitive configuration data.

4. **Default Argument Values**: These are the values specified in the function's signature.
They have the lowest priority in the provider hierarchy.

### Example

```python
@dlt.source
def google_sheets(
    spreadsheet_id=dlt.config.value,
    tab_names=dlt.config.value,
    credentials=dlt.secrets.value,
    only_strings=False
):
    sheets = build('sheets', 'v4', credentials=Services.from_json(credentials))
    tabs = []
    for tab_name in tab_names:
        data = sheets.get(spreadsheet_id, tab_name).execute().values()
        tabs.append(dlt.resource(data, name=tab_name))
    return tabs
```

For `google_sheets()`, `dlt` will look
for `spreadsheet_id`, `tab_names` and `credentials`.

Each provider has its own key naming convention, and dlt is able to translate between them.

**The argument name is a key in the lookup**.

At the top of the hierarchy are Environment Variables, then `secrets.toml` and
`config.toml` files. Providers like Airflow/Google/AWS/Azure Vaults will be inserted **after** the Environment
provider but **before** TOML providers.

For example, if `spreadsheet_id` is found in environment variable `SPREADSHEET_ID`, `dlt` will not look in TOML files
and below.

Values passed **explicitly** in code have the **highest** priority in the provider hierarchy. The **default values**
of the arguments have the **lowest** priority.

:::info
Explicit Args **>** ENV Variables **>** Vaults: Airflow etc. **>** `secrets.toml` **>** `config.toml` **>** Default Arg Values
:::
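
The priority order above can be sketched in plain Python. This is only an illustration of "first provider that has the key wins", not how `dlt` implements it: the `resolve` function, the `Provider` alias and the sample values are all hypothetical, and each provider is reduced to a name plus a dictionary.

```python
from typing import Any, Mapping, Sequence, Tuple

# Hypothetical stand-in for a provider: a name plus the values it holds.
Provider = Tuple[str, Mapping[str, Any]]

# Sentinel so that None and False can still be passed as real values.
_MISSING = object()


def resolve(
    key: str,
    providers: Sequence[Provider],
    explicit: Any = _MISSING,
    default: Any = _MISSING,
) -> Tuple[Any, str]:
    """Return (value, origin) following the priority order described above."""
    # 1. A value passed explicitly in code always wins.
    if explicit is not _MISSING:
        return explicit, "explicit"
    # 2. Providers are consulted in order; the first hit stops the search.
    for name, values in providers:
        if key in values:
            return values[key], name
    # 3. The default from the function signature is the last resort.
    if default is not _MISSING:
        return default, "default"
    raise KeyError(f"{key!r} not found in any provider")


# Ordered from highest to lowest priority (sample data, assumed for the sketch).
providers = [
    ("env", {"spreadsheet_id": "from-env"}),
    ("vault", {}),
    ("secrets.toml", {"spreadsheet_id": "from-secrets", "credentials": "s3cr3t"}),
    ("config.toml", {"tab_names": ["tab1"]}),
]

print(resolve("spreadsheet_id", providers))  # ('from-env', 'env')
print(resolve("tab_names", providers))  # (['tab1'], 'config.toml')
print(resolve("only_strings", providers, default=False))  # (False, 'default')
```

Because the environment entry is listed first, the `spreadsheet_id` stored in the `secrets.toml` stand-in is never reached.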

Secrets are handled only by the providers supporting them. Some providers support only
secrets (to reduce the number of requests done by `dlt` when searching sections).

1. `secrets.toml` and environment variables may hold both config and secret values.
1. `config.toml` may hold only config values, no secrets.
1. Vault providers hold only secrets; `dlt` skips them when looking for values that are not
secrets.
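
A minimal sketch of these three rules, assuming a hypothetical `SketchProvider` class with two flags. The class, the flag names and the `lookup` helper are invented for illustration and are not `dlt` classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchProvider:
    """Hypothetical provider model used only for this illustration."""
    name: str
    supports_secrets: bool
    secrets_only: bool = False
    values: Dict[str, Any] = field(default_factory=dict)


def lookup(key: str, is_secret: bool, providers: List[SketchProvider]) -> Optional[str]:
    """Return the name of the first provider allowed to serve `key` that has it."""
    for provider in providers:
        if is_secret and not provider.supports_secrets:
            continue  # e.g. config.toml is never asked for secrets
        if not is_secret and provider.secrets_only:
            continue  # e.g. vaults are skipped for plain config values
        if key in provider.values:
            return provider.name
    return None


providers = [
    SketchProvider("env", supports_secrets=True),
    SketchProvider("vault", supports_secrets=True, secrets_only=True,
                   values={"credentials": "x", "tab_names": "never read"}),
    SketchProvider("secrets.toml", supports_secrets=True),
    SketchProvider("config.toml", supports_secrets=False,
                   values={"tab_names": ["tab1"], "credentials": "never read"}),
]

print(lookup("credentials", is_secret=True, providers=providers))  # vault
print(lookup("tab_names", is_secret=False, providers=providers))  # config.toml
```

Skipping the vault for non-secret keys is what saves the extra requests mentioned above.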

:::info
Context-aware providers will activate in the right environments, e.g. on Airflow or on AWS/GCP virtual machines.
:::

## Provider key formats

### TOML vs. Environment Variables

Providers may use different formats for the keys. `dlt` will translate the standard format where
sections and key names are separated by "." into the provider-specific formats.

1. For TOML, names are case-sensitive and sections are separated with ".".
1. For Environment Variables, all names are upper-cased and sections are separated with a double
underscore "__".

Example: When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]` it must find
the `private_key` for Google credentials. It will look

1. first in env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found,
1. in `secrets.toml` with key `my_section.gcp_credentials.private_key`.
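
The translation between the two key formats can be sketched with two small helpers. `to_env_key` and `find_in_toml` are hypothetical names written for this example, and the nested dictionary stands in for an already parsed `secrets.toml`.

```python
from typing import Any, Mapping, Optional


def to_env_key(dotted_key: str) -> str:
    """Upper-case every part and join sections with a double underscore."""
    return "__".join(part.upper() for part in dotted_key.split("."))


def find_in_toml(dotted_key: str, document: Mapping[str, Any]) -> Optional[Any]:
    """Walk a parsed TOML document section by section (names are case-sensitive)."""
    node: Any = document
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


key = "my_section.gcp_credentials.private_key"
print(to_env_key(key))  # MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY

# Assumed content of a parsed secrets.toml, for illustration only.
parsed_secrets = {"my_section": {"gcp_credentials": {"private_key": "abc"}}}
print(find_in_toml(key, parsed_secrets))  # abc
```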

### Environment provider

Looks for the values in the environment variables.

### TOML provider

The TOML provider in dlt utilizes two TOML files:

- `secrets.toml` - This file is intended for storing sensitive information, often referred to as "secrets".
- `config.toml` - This file is used for storing configuration values.

By default, the `.gitignore` file in the project prevents `secrets.toml` from being added to
version control and pushed. However, `config.toml` can be freely added to version control.

:::info
**The TOML provider always loads those files from the `.dlt` folder**, which is looked up **relative to the
current working directory**.
:::

Example: If your working directory is `my_dlt_project` and your project has the following structure:

```
my_dlt_project:
  |
  pipelines/
    |---- .dlt/secrets.toml
    |---- google_sheets.py
```

and you run `python pipelines/google_sheets.py` then `dlt` will look for `secrets.toml` in
`my_dlt_project/.dlt/secrets.toml` and ignore the existing
`my_dlt_project/pipelines/.dlt/secrets.toml`.

If you change your working directory to `pipelines` and run `python google_sheets.py` it will look for
`my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected.
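
The same rule can be written down with `pathlib`. This is a sketch of the path arithmetic only: `local_secrets_path` is a hypothetical helper, not a `dlt` function, and it only builds the path without reading any file.

```python
from pathlib import Path
from typing import Optional


def local_secrets_path(working_dir: Optional[Path] = None) -> Path:
    """Build the secrets.toml path relative to the working directory, not the script."""
    base = working_dir if working_dir is not None else Path.cwd()
    return base / ".dlt" / "secrets.toml"


project = Path("my_dlt_project")

# Running `python pipelines/google_sheets.py` from the project root:
print(local_secrets_path(project).as_posix())  # my_dlt_project/.dlt/secrets.toml

# Running `python google_sheets.py` from inside `pipelines`:
print(local_secrets_path(project / "pipelines").as_posix())  # my_dlt_project/pipelines/.dlt/secrets.toml
```

Note that the location of `google_sheets.py` never enters the calculation; only the working directory does.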

:::caution
The TOML provider also reads files from `~/.dlt/` (located in the user's home directory)
in addition to the local project-specific `.dlt` folder.
:::