Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent a7001c2, commit 713f55b
Showing 1 changed file with 79 additions and 32 deletions.
111 changes: 79 additions & 32 deletions
docs/website/docs/general-usage/configuration/config_providers.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,61 +1,108 @@ | ||
--- | ||
title: Secrets and Config Providers | ||
description: Secrets and Config Providers | ||
keywords: [credentials, secrets.toml, environment variables] | ||
title: Config Providers | ||
description: Configuration dlt Providers | ||
keywords: [credentials, secrets.toml, secrets, config, configuration, environment | ||
variables, provider] | ||
--- | ||
|
||
## Providers | ||
If function signature has arguments that may be injected, `dlt` looks for the argument values in providers. **The argument name is a key in the lookup**. In case of `google_sheets()` it will look for: `tab_names`, `credentials` and `strings_only`. | ||
# Config Providers | ||
## The provider hierarchy | ||
|
||
Each provider has its own key naming convention and dlt is able to translate between them. | ||
If function signature has arguments that may be injected, `dlt` looks for the argument values in | ||
providers. **The argument name is a key in the lookup**. | ||
|
||
Providers form a hierarchy. At the top are environment variables, then `secrets.toml` and `config.toml` files. Providers like google, aws, azure vaults can be inserted after the environment provider. | ||
Example: | ||
|
||
```python | ||
import dlt | ||
|
||
|
||
@dlt.source | ||
def google_sheets( | ||
spreadsheet_id, | ||
tab_names=dlt.config.value, | ||
credentials=dlt.secrets.value, | ||
only_strings=False | ||
): | ||
sheets = build('sheets', 'v4', credentials=Services.from_json(credentials)) | ||
tabs = [] | ||
for tab_name in tab_names: | ||
data = sheets.get(spreadsheet_id, tab_name).execute().values() | ||
tabs.append(dlt.resource(data, name=tab_name)) | ||
return tabs | ||
``` | ||
|
||
For example if `spreadsheet_id` is in environment, dlt does not look into other providers. | ||
In case of `google_sheets()` it will look | ||
for: `tab_names`, `credentials` and `only_strings`. | ||
|
||
The values passed in the code explitly are the **highest** in provider hierarchy. | ||
The default values of the arguments have the **lowest** priority in the provider hierarchy. | ||
Each provider has its own key naming convention, and dlt is able to translate between them. | ||
|
||
> **Summary of the hierarchy** | ||
> explicit args > env variables > ...vaults, airflow etc > secrets.toml > config.toml > default arg values | ||
Providers form a hierarchy. At the top are environment variables, then `secrets.toml` and | ||
`config.toml` files. Providers like Google/AWS/Azure Vaults can be inserted after the environment | ||
provider. | ||
|
||
Secrets are handled only by the providers supporting them. Some of the providers support only secrets (to reduce the number of requests done by `dlt` when searching sections) | ||
1. `secrets.toml` and environment may hold both config and secret values | ||
2. `config.toml` may hold only config values, no secrets | ||
3. various vaults providers hold only secrets, `dlt` skips them when looking for values that are not secrets. | ||
For example, if `spreadsheet_id` is in environment, dlt does not look into other providers. | ||
|
||
⛔ Context aware providers will activate in right environments ie. on Airflow or AWS/GCP VMachines | ||
The values passed in the code **explicitly** are the **highest** in provider hierarchy. The **default values** | ||
of the arguments have the **lowest** priority in the provider hierarchy. | ||
|
||
### Provider key formats. toml vs. environment variable | ||
> **Summary of the hierarchy:** | ||
> | ||
> explicit args > env variables > ...vaults, airflow etc. > secrets.toml > config.toml > default arg values | ||
Providers may use diffent formats for the keys. `dlt` will translate the standard format where sections and key names are separated by "." into the provider specific formats. | ||
Secrets are handled only by the providers supporting them. Some providers support only | ||
secrets (to reduce the number of requests done by `dlt` when searching sections). | ||
|
||
1. for `toml` names are case sensitive and sections are separated with "." | ||
2. for environment variables all names are capitalized and sections are separated with double underscore "__" | ||
1. `secrets.toml` and environment may hold both config and secret values. | ||
1. `config.toml` may hold only config values, no secrets. | ||
1. Various vaults providers hold only secrets, `dlt` skips them when looking for values that are not | ||
secrets. | ||
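
The lookup order summarized above can be illustrated with a minimal sketch. This is not `dlt`'s implementation: the `Provider` class, the `resolve` function, and the sample values are hypothetical names introduced only to show how a hierarchy with secrets-aware providers might behave.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel: "no value was given"


@dataclass
class Provider:
    """Hypothetical stand-in for a config provider (not a dlt class)."""
    name: str
    values: Dict[str, Any]
    supports_secrets: bool = True   # config.toml would set this to False
    only_secrets: bool = False      # vault-like providers would set this to True


def resolve(key: str, providers: List[Provider], explicit: Any = _MISSING,
            default: Any = _MISSING, is_secret: bool = False) -> Tuple[Any, str]:
    """Return (value, origin) following: explicit > providers in order > default."""
    if explicit is not _MISSING:
        return explicit, "explicit"
    for provider in providers:
        if is_secret and not provider.supports_secrets:
            continue  # e.g. config.toml never holds secrets
        if not is_secret and provider.only_secrets:
            continue  # e.g. vaults are skipped for plain config values
        if key in provider.values:
            return provider.values[key], provider.name
    if default is not _MISSING:
        return default, "default"
    raise KeyError(key)


# Assumed ordering, mirroring the summary: env > vault > secrets.toml > config.toml
providers = [
    Provider("env", {"spreadsheet_id": "from_env"}),
    Provider("vault", {"credentials": "from_vault", "tab_names": "ignored"}, only_secrets=True),
    Provider("secrets.toml", {"credentials": "from_secrets_toml"}),
    Provider("config.toml", {"tab_names": ["a"]}, supports_secrets=False),
]
```

With this setup, `resolve("spreadsheet_id", providers)` stops at the environment provider, while `resolve("tab_names", providers)` skips the vault because the value is not a secret.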
|
||
Example: | ||
When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]` it must find the `private_key` for google credentials. It will look | ||
1. first in env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found | ||
2. in `secrets.toml` with key `my_section.gcp_credentials.private_key` | ||
⛔ Context-aware providers will activate in the right environments, i.e. on Airflow or AWS/GCP virtual machines. | ||
|
||
## Provider key formats | ||
|
||
### `toml` vs. Environment Variables | ||
|
||
Providers may use different formats for the keys. `dlt` will translate the standard format where | ||
sections and key names are separated by "." into the provider-specific formats. | ||
|
||
1. For `toml`, names are case-sensitive and sections are separated with ".". | ||
1. For Environment Variables, all names are capitalized and sections are separated with double | ||
underscore "\_\_". | ||
|
||
Example: When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]` it must find | ||
the `private_key` for Google credentials. It will look | ||
|
||
1. first in env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found, | ||
1. in `secrets.toml` with key `my_section.gcp_credentials.private_key`. | ||
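
The key translation described above can be sketched in a few lines. The helper name `to_env_key` is hypothetical and not part of `dlt`; it only illustrates the stated convention (upper-case names, sections joined with a double underscore).

```python
def to_env_key(dotted_key: str) -> str:
    """Translate a dotted key into the environment variable naming convention.

    Hypothetical helper for illustration only: sections are split on "."
    and joined with "__", and every part is upper-cased.
    """
    return "__".join(part.upper() for part in dotted_key.split("."))


# The secrets.toml key from the example and its environment variable form
toml_key = "my_section.gcp_credentials.private_key"
env_key = to_env_key(toml_key)
```

Because the environment form is upper-cased, the translation is one-way: the original case of a `toml` key cannot be recovered from the environment variable name.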
|
||
### Environment provider | ||
Looks for the values in the environment variables | ||
|
||
Looks for the values in the environment variables. | ||
|
||
### Toml provider | ||
Tomls provider uses two `toml` files: `secrets.toml` to store secrets and `config.toml` to store configuration values. The default `.gitignore` file prevents secrets from being added to source control and pushed. The `config.toml` may be freely added. | ||
|
||
**Toml provider always loads those files from `.dlt` folder** which is looked **relative to the current working directory**. Example: | ||
if your working dir is `my_dlt_project` and you have: | ||
The Toml provider uses two `toml` files: `secrets.toml` to store secrets and `config.toml` to store | ||
configuration values. The default `.gitignore` file prevents secrets from being added to source | ||
control and pushed. The `config.toml` file may be freely added to source control. | ||
|
||
> **The Toml provider always loads those files from the `.dlt` folder**, which is looked up **relative to the | ||
> current working directory**. | ||
Example: If your working directory is `my_dlt_project` and your project has the following structure: | ||
|
||
``` | ||
my_dlt_project: | ||
| | ||
pipelines/ | ||
|---- .dlt/secrets.toml | ||
|---- google_sheets.py | ||
``` | ||
in it and you run `python pipelines/google_sheets.py` then `dlt` will look for `secrets.toml` in `my_dlt_project/.dlt/secrets.toml` and ignore the existing `my_dlt_project/pipelines/.dlt/secrets.toml` | ||
|
||
if you change your working dir to `pipelines` and run `python google_sheets.py` it will look for `my_dlt_project/pipelines/.dlt/secrets.toml` a (probably) expected. | ||
and you run `python pipelines/google_sheets.py` then `dlt` will look for `secrets.toml` in | ||
`my_dlt_project/.dlt/secrets.toml` and ignore the existing | ||
`my_dlt_project/pipelines/.dlt/secrets.toml`. | ||
|
||
*that was common problem on our workshop - but believe me all other layouts are even worse I've tried* | ||
If you change your working directory to `pipelines` and run `python google_sheets.py` it will look for | ||
`my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected. |
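
The working-directory behavior described above can be sketched as follows. The function `settings_path` is a hypothetical helper, not a `dlt` API; it only shows that the `.dlt` folder is located from the current working directory rather than from the location of the script being run.

```python
from pathlib import Path
from typing import Optional, Union


def settings_path(file_name: str, cwd: Optional[Union[str, Path]] = None) -> Path:
    """Return where a settings file would be looked up (illustrative only).

    The base is the current working directory, not the script's directory.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    return base / ".dlt" / file_name


# Running `python pipelines/google_sheets.py` from my_dlt_project
from_project_root = settings_path("secrets.toml", cwd="my_dlt_project")

# Running `python google_sheets.py` from my_dlt_project/pipelines
from_pipelines = settings_path("secrets.toml", cwd="my_dlt_project/pipelines")
```

Passing `cwd` explicitly keeps the sketch deterministic; omitting it falls back to `Path.cwd()`, which matches the behavior the text describes.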