credentials moved to configuration, added configuration pages #703

Merged
merged 53 commits into from
Oct 31, 2023
Changes from 14 commits
Commits
53 commits
e33c2e5
credentials moved to configuration, added configuration pages
AstrakhantsevaAA Oct 20, 2023
f88e483
add description
AstrakhantsevaAA Oct 20, 2023
a89fdf2
del description
AstrakhantsevaAA Oct 20, 2023
fe2f9c2
fix paths
AstrakhantsevaAA Oct 20, 2023
3a3b0cc
fix broken links
AstrakhantsevaAA Oct 20, 2023
a7001c2
refactored secrets and configs
AstrakhantsevaAA Oct 20, 2023
713f55b
refactored config providers
AstrakhantsevaAA Oct 20, 2023
54f4750
refactored config specs
AstrakhantsevaAA Oct 20, 2023
382185f
return "credentials" as a section slug
AstrakhantsevaAA Oct 25, 2023
27a7142
refactor links
AstrakhantsevaAA Oct 25, 2023
41f9ea8
updates config docs, requests changes
rudolfix Oct 25, 2023
2d64dfa
add built in creds
AstrakhantsevaAA Oct 25, 2023
41cdb18
refactor specs
AstrakhantsevaAA Oct 25, 2023
cab66ef
add explanation for secrets and config
AstrakhantsevaAA Oct 25, 2023
e84792f
refactor configuration
AstrakhantsevaAA Oct 26, 2023
be9f7ab
refactor configuration
AstrakhantsevaAA Oct 26, 2023
18802c5
move add creds to how to
AstrakhantsevaAA Oct 26, 2023
20536fb
convert comments to text
AstrakhantsevaAA Oct 26, 2023
af65c64
rename
AstrakhantsevaAA Oct 26, 2023
c55b8d9
del imports
AstrakhantsevaAA Oct 26, 2023
acc695a
Update docs/website/docs/walkthroughs/add_credentials.md
AstrakhantsevaAA Oct 27, 2023
475049f
Update docs/website/docs/walkthroughs/add_credentials.md
AstrakhantsevaAA Oct 27, 2023
6b8c062
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
22f1dfb
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
9e06d4a
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
dc2021c
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
4eccfd1
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
5a2371c
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
6ca4cc4
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
91a2162
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 27, 2023
eb60449
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
fe489bc
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 27, 2023
56d3021
add more details about secrets and config
AstrakhantsevaAA Oct 27, 2023
8acf9bc
intro for providers
AstrakhantsevaAA Oct 27, 2023
aa52d9a
intro for specs
AstrakhantsevaAA Oct 27, 2023
2d3cd43
small changes
AstrakhantsevaAA Oct 27, 2023
1ff254b
spec examples with sources
AstrakhantsevaAA Oct 27, 2023
74a9ade
small changes
AstrakhantsevaAA Oct 27, 2023
16feb84
delete link to name convention
AstrakhantsevaAA Oct 30, 2023
64d2229
refactor
AstrakhantsevaAA Oct 30, 2023
286662a
refactor
AstrakhantsevaAA Oct 30, 2023
7f5195b
refactor
AstrakhantsevaAA Oct 30, 2023
2bec2b7
refactor
AstrakhantsevaAA Oct 30, 2023
5c7a233
fix typo
AstrakhantsevaAA Oct 30, 2023
733d926
add info about home dir
AstrakhantsevaAA Oct 30, 2023
c00273f
Update docs/website/docs/general-usage/credentials/configuration.md
AstrakhantsevaAA Oct 31, 2023
6d212bf
Update docs/website/docs/general-usage/credentials/config_providers.md
AstrakhantsevaAA Oct 31, 2023
c6da5fc
Update docs/website/docs/general-usage/credentials/config_specs.md
AstrakhantsevaAA Oct 31, 2023
3b51ae9
wip
AstrakhantsevaAA Oct 31, 2023
6a2ff9f
more info about Configuration classes
AstrakhantsevaAA Oct 31, 2023
340fb9b
more about tomls
AstrakhantsevaAA Oct 31, 2023
b0006c6
fix link
AstrakhantsevaAA Oct 31, 2023
5648034
fix layout
AstrakhantsevaAA Oct 31, 2023
8 changes: 4 additions & 4 deletions docs/technical/secrets_and_config.md
@@ -77,14 +77,14 @@ You should type your function signatures! The effort is very low and it gives `d

```python
@dlt.source
def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpClientCredentialsWithDefault = dlt.secrets.value, only_strings: bool = False):
def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpServiceAccountCredentials = dlt.secrets.value, only_strings: bool = False):
...
```
Now:
1. you are sure that you get a list of strings as `tab_names`
2. you will get actual Google credentials (see `CredentialsConfiguration` later) and your users can pass them in many different forms.

In case of `GcpClientCredentialsWithDefault`
In case of `GcpServiceAccountCredentials`
* you may just pass the `service_json` as string or dictionary (in code and via config providers)
* you may pass a connection string (as used in SQLAlchemy) (in code and via config providers)
* or default credentials will be used
@@ -331,7 +331,7 @@ It tells you exactly which paths `dlt` looked at, via which config providers and

## Working with credentials (and other complex configuration values)

`GcpClientCredentialsWithDefault` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It also allows to parse various native representations of the configuration. Credentials marked with `WithDefaults` mixin are also to instantiate itself from the machine/user default environment ie. googles `default()` or AWS `.aws/credentials`.
`GcpServiceAccountCredentials` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It can also parse various native representations of the configuration. Credentials marked with the `WithDefaults` mixin can also instantiate themselves from the machine/user default environment, i.e. Google's `default()` or AWS `.aws/credentials`.

As an example, let's use `ConnectionStringCredentials` which represents a database connection string.

@@ -421,7 +421,7 @@ In fact for each decorated function a spec is synthesized. In case of `google_sh
@configspec
class GoogleSheetsConfiguration:
tab_names: List[str] = None # mandatory
credentials: GcpClientCredentialsWithDefault = None # mandatory secret
credentials: GcpServiceAccountCredentials = None # mandatory secret
only_strings: Optional[bool] = False
```

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/duckdb.md
@@ -84,7 +84,7 @@ p = dlt.pipeline(pipeline_name='chess', destination='duckdb', dataset_name='ches

This destination accepts database connection strings in format used by [duckdb-engine](https://github.com/Mause/duckdb_engine#configuration).

You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials.md) (e.g. using a `secrets.toml` file)
You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials) (e.g. using a `secrets.toml` file)
```toml
destination.duckdb.credentials="duckdb:///_storage/test_quack.duckdb"
```
4 changes: 0 additions & 4 deletions docs/website/docs/general-usage/configuration.md

This file was deleted.

@@ -1,10 +1,19 @@
---
title: Credentials
title: Adding credentials
description: How to use dlt credentials
keywords: [credentials, secrets.toml, environment variables]
---

# Credentials

@Alena - either convert this page to a how-to, delete it, or add a reference to the other pages. Everything below duplicates things that already exist, except this one, which we can use as an example of passing configs explicitly:

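```python
import dlt
from airflow.hooks.base import BaseHook  # Airflow 2.x import path
from pipedrive import pipedrive_source  # assumes the pipedrive verified source was added with `dlt init`

api_key = BaseHook.get_connection('pipedrive_api_key').extra  # get it from airflow or other credential store
pipeline = dlt.pipeline(pipeline_name='pipedrive', destination='duckdb', dataset_name='pipedrive_data')  # example names
load_info = pipeline.run(pipedrive_source(pipedrive_api_key=api_key))  # explicit value beats all config providers
```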

# Adding credentials

## Adding credentials locally

@@ -27,10 +36,10 @@ client_email = "client_email" # please set me up!
```
> Note that for `toml`, names are case-sensitive and sections are separated with ".".

For destination credentials, read the [documentation pages for each destination](../dlt-ecosystem/destinations) to create and configure
For destination credentials, read the [documentation pages for each destination](../../dlt-ecosystem/destinations) to create and configure
credentials.

For Verified Source credentials, read the [Setup Guides](../dlt-ecosystem/verified-sources) for each source to find how to get credentials.
For Verified Source credentials, read the [Setup Guides](../../dlt-ecosystem/verified-sources) for each source to find how to get credentials.

Once you have credentials for the source and destination, add them to the file above and save them.

@@ -98,7 +107,7 @@ If dlt tries to read this from environment variables, it will use a different na

For environment variables all names are capitalized and sections are separated with double underscore "\_\_".

For example for the above secrets, we would need to put into environment:
For example, for the above secrets, we would need to put into environment:

```shell
SOURCES__PIPEDRIVE__PIPEDRIVE_API_KEY
109 changes: 109 additions & 0 deletions docs/website/docs/general-usage/credentials/config_providers.md
@@ -0,0 +1,109 @@
---
title: Configuration Providers
description: Configuration providers in dlt
keywords: [credentials, secrets.toml, secrets, config, configuration, environment
variables, provider]
---

# Configuration Providers
## The provider hierarchy

If a function signature has arguments that may be injected, `dlt` looks for the argument values in
providers. **The argument name is a key in the lookup**.

Example:

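```python
import dlt
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build


@dlt.source
def google_sheets(
    spreadsheet_id,
    tab_names=dlt.config.value,
    credentials=dlt.secrets.value,
    only_strings=False
):
    # `credentials` is injected by dlt; assumed here to hold the service account info as a dict
    service = build(
        "sheets", "v4", credentials=Credentials.from_service_account_info(credentials)
    )
    tabs = []
    for tab_name in tab_names:
        # read all cell values of one tab; rows come back under the "values" key
        data = (
            service.spreadsheets().values()
            .get(spreadsheetId=spreadsheet_id, range=tab_name)
            .execute()
            .get("values", [])
        )
        tabs.append(dlt.resource(data, name=tab_name))
    return tabs
```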

In the case of `google_sheets()`, it will look
for `tab_names`, `credentials` and `only_strings`.
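
The "argument name is the lookup key" rule can be illustrated with Python's standard `inspect` module. This is a simplified sketch, not dlt's implementation: it assumes that every parameter with a default value is a candidate for injection, and the helper `injectable_keys` is hypothetical.

```python
import inspect
from typing import Callable, List


def injectable_keys(func: Callable) -> List[str]:
    """Return the names of parameters that have a default value.

    Hypothetical helper: approximates which arguments a config
    injection layer would try to look up by name.
    """
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    ]


# Stand-in for the decorated source above: plain None defaults replace
# dlt.config.value / dlt.secrets.value so the sketch runs without dlt.
def google_sheets(spreadsheet_id, tab_names=None, credentials=None, only_strings=False):
    ...


print(injectable_keys(google_sheets))  # ['tab_names', 'credentials', 'only_strings']
```

`spreadsheet_id` has no default, so in this sketch it is treated as a required argument that the caller passes explicitly.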

Each provider has its own key naming convention, and dlt is able to translate between them.

Providers form a hierarchy. At the top are environment variables, then `secrets.toml` and
`config.toml` files. Providers like Airflow/Google/AWS/Azure Vaults will be inserted **after** the environment
provider but **before** `toml` providers.

For example, if `spreadsheet_id` is found in the environment variable `SPREADSHEET_ID`, `dlt` will not look in the `toml` files
or any provider below them.

The values passed in the code **explicitly** are the **highest** in the provider hierarchy. The **default values**
of the arguments have the **lowest** priority in the provider hierarchy.

> **Summary of the hierarchy:**
>
> explicit args > env variables > ...vaults, airflow etc. > secrets.toml > config.toml > default arg values
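
The "first hit wins" behavior of this hierarchy can be sketched as a lookup over an ordered list of providers. This is a simplification under stated assumptions: each provider is modeled as a plain dict keyed by argument name (real providers use their own key formats), and the `resolve` helper is hypothetical, not part of dlt.

```python
from typing import Any, Mapping, Sequence

_MISSING = object()


def resolve(key: str, providers: Sequence[Mapping[str, Any]], default: Any = _MISSING) -> Any:
    """Return the value from the first provider that contains `key`.

    `providers` must be ordered from highest to lowest priority, e.g.
    [explicit_args, env_vars, vaults, secrets_toml, config_toml].
    The argument's default value is used only when no provider has the key.
    """
    for provider in providers:
        if key in provider:
            return provider[key]
    if default is _MISSING:
        raise KeyError(f"{key} not found in any provider")
    return default


explicit_args: dict = {}
env_vars = {"tab_names": ["from_env"]}
secrets_toml: dict = {}
config_toml = {"tab_names": ["from_config_toml"], "only_strings": True}

chain = [explicit_args, env_vars, secrets_toml, config_toml]
print(resolve("tab_names", chain))                     # ['from_env'] (env beats config.toml)
print(resolve("only_strings", chain, default=False))   # True (config.toml beats the default)
```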

Secrets are handled only by the providers supporting them. Some providers support only
secrets (to reduce the number of requests done by `dlt` when searching sections).

1. `secrets.toml` and environment may hold both config and secret values.
1. `config.toml` may hold only config values, no secrets.
1. Vault providers hold only secrets; `dlt` skips them when looking for values that are not
secrets.
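
The rules above can be modeled by giving each provider two capability flags and skipping providers that cannot hold the requested kind of value. The `Provider` class and `lookup` function below are hypothetical and only mirror the rules listed here; they are not dlt's provider classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Provider:
    """Hypothetical model of a provider and the kinds of values it may hold."""
    name: str
    values: Dict[str, Any] = field(default_factory=dict)
    supports_secrets: bool = True
    supports_config: bool = True


def lookup(key: str, is_secret: bool, providers: List[Provider]) -> Optional[str]:
    """Return the name of the first eligible provider that holds `key`, or None."""
    for p in providers:
        eligible = p.supports_secrets if is_secret else p.supports_config
        if not eligible:
            continue  # e.g. a vault is never queried for plain config values
        if key in p.values:
            return p.name
    return None


providers = [
    Provider("environment"),
    Provider("vault", {"tab_names": ["x"], "credentials": "..."}, supports_config=False),
    Provider("secrets.toml", {"credentials": "..."}),
    Provider("config.toml", {"tab_names": ["y"]}, supports_secrets=False),
]

print(lookup("tab_names", is_secret=False, providers=providers))   # config.toml (vault skipped)
print(lookup("credentials", is_secret=True, providers=providers))  # vault
```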

⛔ Context-aware providers activate only in the right environments, i.e. on Airflow or on AWS/GCP virtual machines.

## Provider key formats

### `toml` vs. Environment Variables

Providers may use different formats for the keys. `dlt` will translate the standard format where
sections and key names are separated by "." into the provider-specific formats.

1. For `toml`, names are case-sensitive and sections are separated with ".".
1. For Environment Variables, all names are capitalized and sections are separated with double
underscore "\_\_".

Example: When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]`, it must find
the `private_key` for the Google credentials. It will look:

1. first in the env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and, if not found,
1. then in `secrets.toml` with the key `my_section.gcp_credentials.private_key`.
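
The translation from the standard dotted key to the environment-variable name follows the two rules above (upper case, sections joined with a double underscore). The helper `to_env_var_name` is hypothetical and only reproduces that naming rule.

```python
def to_env_var_name(key: str) -> str:
    """Translate a dotted key such as 'my_section.gcp_credentials.private_key'
    into environment-variable form: upper case, sections joined by '__'.

    Hypothetical helper mirroring the naming rule described above.
    """
    return "__".join(part.upper() for part in key.split("."))


print(to_env_var_name("my_section.gcp_credentials.private_key"))
# MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY
```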

### Environment provider

Looks for the values in the environment variables.

### Toml provider

The Toml provider uses two `toml` files: `secrets.toml` to store secrets and `config.toml` to store
configuration values. The default `.gitignore` file prevents secrets from being added to source
control and pushed. The `config.toml` may be freely added.

> **The Toml provider always loads those files from the `.dlt` folder**, which is looked up **relative to the
> current working directory**.

Example: If your working directory is `my_dlt_project` and your project has the following structure:

```
my_dlt_project:
|
pipelines/
|---- .dlt/secrets.toml
|---- google_sheets.py
```

and you run `python pipelines/google_sheets.py`, then `dlt` will look for `secrets.toml` in
`my_dlt_project/.dlt/secrets.toml` and ignore the existing
`my_dlt_project/pipelines/.dlt/secrets.toml`.

If you change your working directory to `pipelines` and run `python google_sheets.py`, it will look for
`my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected.
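
The resolution rule in this example can be expressed with `pathlib`: the `.dlt` folder is joined onto the current working directory, not onto the folder of the script being run. The helper `toml_paths` is hypothetical and only restates that rule; it does not read any files.

```python
from pathlib import Path
from typing import Optional, Tuple


def toml_paths(cwd: Optional[str] = None) -> Tuple[Path, Path]:
    """Return where secrets.toml and config.toml are expected, per the rule above:
    `.dlt` is resolved against the working directory, not the script's directory."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    return base / ".dlt" / "secrets.toml", base / ".dlt" / "config.toml"


# `python pipelines/google_sheets.py` run from my_dlt_project:
print(toml_paths("my_dlt_project")[0])            # my_dlt_project/.dlt/secrets.toml
# `python google_sheets.py` run from my_dlt_project/pipelines:
print(toml_paths("my_dlt_project/pipelines")[0])  # my_dlt_project/pipelines/.dlt/secrets.toml
```

The printed paths use POSIX separators; on Windows, `pathlib` prints backslashes instead.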