credentials moved to configuration, added configuration pages
AstrakhantsevaAA committed Oct 20, 2023
1 parent 1fb60d7 commit e33c2e5
Showing 36 changed files with 499 additions and 42 deletions.
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/duckdb.md
@@ -84,7 +84,7 @@ p = dlt.pipeline(pipeline_name='chess', destination='duckdb', dataset_name='ches

This destination accepts database connection strings in format used by [duckdb-engine](https://github.com/Mause/duckdb_engine#configuration).

You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials.md) (e.g. using a `secrets.toml` file)
You can configure a DuckDB destination with [secret / config values](../../general-usage/configuration/credentials.md) (e.g. using a `secrets.toml` file)
```toml
destination.duckdb.credentials=duckdb:///_storage/test_quack.duckdb
```
@@ -104,7 +104,7 @@ For more information, read the
> Optionally, you can also input "base_id" and "table_names" in the script, as in the pipeline
> example.
For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/asana.md
@@ -89,7 +89,7 @@ For more information, read the
[destination documentation](../../dlt-ecosystem/destinations) to add credentials for your chosen
destination. This will ensure that your data is properly routed to its final destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/chess.md
@@ -60,7 +60,7 @@ To add credentials to your destination, follow the instructions in the
[destination](../../dlt-ecosystem/destinations) documentation. This will ensure that your data is
properly routed to its final destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -154,7 +154,7 @@ For more information, read the

1. Replace the value of the "account id" with the one [copied above](#grab-account-id).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/github.md
@@ -105,7 +105,7 @@ For more information, read the
add credentials for your chosen destination, ensuring proper routing of your data to the final
destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -209,7 +209,7 @@ For more information, read the
1. To use queries from `.dlt/config.toml`, run the `simple_load_config()` function in
[pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics_pipeline.py).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -292,7 +292,7 @@ For more information, read the
> Note: You have an option to pass "range_names" and "spreadsheet_identifier" directly to the
> google_spreadsheet function or in ".dlt/config.toml"
For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -110,7 +110,7 @@ For more information, read the

1. Enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/jira.md
@@ -97,7 +97,7 @@ For more information, read the
add credentials for your chosen destination, ensuring proper routing of your data to the final
destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/matomo.md
@@ -98,7 +98,7 @@ For more information, read the

1. To monitor live events on a website, enter the `live_event_site_id` (usually it is same as `site_id`).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -169,7 +169,7 @@ For more information, read the
1. Replace the value of the "database" and "collections_names" with the ones
[copied above](#grab-database-and-collections).
For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)
## Run the pipeline
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/mux.md
@@ -83,7 +83,7 @@ For more information, read the

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/notion.md
@@ -88,7 +88,7 @@ For more information, read the
your chosen destination. This will ensure that your data is properly routed to its final
destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -88,7 +88,7 @@ For more information, read the

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -106,7 +106,7 @@ For more information, read the
add credentials for your chosen destination, ensuring proper routing of your data to the final
destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -98,7 +98,7 @@ For more information, read the
add credentials for your chosen destination, ensuring proper routing of your data to the final
destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/slack.md
@@ -101,7 +101,7 @@ For more information, read the

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -146,7 +146,7 @@ For more information, read the
1. Finally, follow the instructions in [Destinations](../destinations/) to add credentials for your
chosen destination. This will ensure that your data is properly routed to its final destination.

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/strapi.md
@@ -90,7 +90,7 @@ For more information, read the

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/verified-sources/stripe.md
@@ -90,7 +90,7 @@ For more information, read the

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -111,7 +111,7 @@ For more information, read the

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

@@ -209,7 +209,7 @@ For more information, read the
1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).

For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md)

## Run the pipeline

2 changes: 1 addition & 1 deletion docs/website/docs/examples/incremental_loading/index.md
@@ -20,7 +20,7 @@ In this example, you'll find a Python script that interacts with the Zendesk Sup

We'll learn:

- How to pass [credentials](../../general-usage/credentials) as dict and how to type the `@dlt.source` function arguments.
- How to pass [credentials](../../general-usage/configuration/credentials.md) as dict and how to type the `@dlt.source` function arguments.
- How to set [the nesting level](../../general-usage/source#reduce-the-nesting-level-of-generated-tables).
- How to enable [incremental loading](../../general-usage/incremental-loading) for efficient data extraction.
- How to specify [the start and end dates](../../general-usage/incremental-loading#using-dltsourcesincremental-for-backfill) for the data loading and how to [opt-in to Airflow scheduler](../../general-usage/incremental-loading#using-airflow-schedule-for-backfill-and-incremental-loading) by setting `allow_external_schedulers` to `True`.
4 changes: 0 additions & 4 deletions docs/website/docs/general-usage/configuration.md

This file was deleted.

61 changes: 61 additions & 0 deletions docs/website/docs/general-usage/configuration/config_providers.md
@@ -0,0 +1,61 @@
---
title: Secrets and Config Providers
description:
keywords: [credentials, secrets.toml, environment variables]
---

## Providers
If a function signature has arguments that may be injected, `dlt` looks for the argument values in providers. **The argument name is a key in the lookup**. In the case of `google_sheets()`, it will look for: `tab_names`, `credentials` and `strings_only`.

Each provider has its own key naming convention, and `dlt` is able to translate between them.

Providers form a hierarchy. At the top are environment variables, then `secrets.toml` and `config.toml` files. Providers like Google, AWS, and Azure vaults can be inserted after the environment provider.

For example, if `spreadsheet_id` is in the environment, `dlt` does not look into other providers.

The values passed explicitly in the code are the **highest** in the provider hierarchy.
The default values of the arguments have the **lowest** priority in the provider hierarchy.

> **Summary of the hierarchy**
>
> explicit args > env variables > ...vaults, Airflow, etc. > secrets.toml > config.toml > default arg values

Secrets are handled only by the providers supporting them. Some of the providers support only secrets (to reduce the number of requests made by `dlt` when searching sections):

1. `secrets.toml` and the environment may hold both config and secret values.
2. `config.toml` may hold only config values, no secrets.
3. Various vault providers hold only secrets; `dlt` skips them when looking for values that are not secrets.
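
The lookup order can be modeled in a few lines of plain Python. This is a minimal sketch, assuming providers are simple in-memory dicts; `Provider`, `resolve` and the sample values are hypothetical and are not `dlt`'s actual resolver. It encodes the rules above: explicit arguments win, providers are consulted from highest to lowest priority, secrets-only providers are skipped for non-secret values, `config.toml` is skipped for secrets, and the argument default is the last resort.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Provider:
    """Hypothetical stand-in for a config/secrets provider (not dlt's class)."""
    name: str
    values: Dict[str, Any] = field(default_factory=dict)
    supports_config: bool = True   # may hold plain config values
    supports_secrets: bool = True  # may hold secret values


def resolve(key: str, is_secret: bool, providers: List[Provider],
            explicit: Optional[Any] = None, default: Optional[Any] = None) -> Any:
    """Return the value for `key`, walking providers from highest to lowest priority."""
    # Values passed explicitly in code always win (None means "not passed" here).
    if explicit is not None:
        return explicit
    for provider in providers:
        # Skip providers that cannot hold this kind of value.
        if is_secret and not provider.supports_secrets:
            continue
        if not is_secret and not provider.supports_config:
            continue
        if key in provider.values:
            # The first hit wins; lower-priority providers are never consulted.
            return provider.values[key]
    # Nothing found: fall back to the argument's default value.
    return default


# Providers ordered as in the hierarchy summary above (sample values are made up).
providers = [
    Provider("environment", {"spreadsheet_id": "env-id"}),
    Provider("vault", {"credentials": "vault-key"}, supports_config=False),
    Provider("secrets.toml", {"credentials": "toml-key", "tab_names": ["A"]}),
    Provider("config.toml", {"tab_names": ["B"]}, supports_secrets=False),
]

print(resolve("spreadsheet_id", False, providers))               # env-id
print(resolve("credentials", True, providers))                   # vault-key
print(resolve("tab_names", False, providers))                    # ['A']
print(resolve("only_strings", False, providers, default=False))  # False
```

Note that `tab_names` resolves from `secrets.toml` rather than `config.toml`, because the former sits higher in the hierarchy.
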

⛔ Context-aware providers will activate in the right environments, i.e., on Airflow or AWS/GCP virtual machines.

### Provider key formats: toml vs. environment variables

Providers may use different formats for the keys. `dlt` will translate the standard format, where sections and key names are separated by ".", into the provider-specific formats.

1. For `toml`, names are case-sensitive and sections are separated with ".".
2. For environment variables, all names are capitalized and sections are separated with a double underscore "__".

Example:
When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]`, it must find the `private_key` for Google credentials. It will look:
1. first in the env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and, if not found,
2. in `secrets.toml` with the key `my_section.gcp_credentials.private_key`.
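
The key translation and the environment-first lookup from this example can be sketched as follows. The helpers `to_env_key` and `lookup` are hypothetical illustrations, assuming environment variables arrive as a flat mapping and `secrets.toml` has already been parsed into nested dicts; `dlt`'s own providers are more involved.

```python
from typing import Any, Mapping, Optional


def to_env_key(dotted_key: str) -> str:
    """Translate a dotted key into the env-variable form: upper case, '__' separators."""
    return "__".join(part.upper() for part in dotted_key.split("."))


def lookup(dotted_key: str, env: Mapping[str, str], toml: Mapping[str, Any]) -> Optional[Any]:
    """Look in the environment first, then walk the (case-sensitive) toml tables."""
    env_key = to_env_key(dotted_key)
    if env_key in env:
        return env[env_key]
    node: Any = toml
    for part in dotted_key.split("."):
        # Stop as soon as a section or key is missing (exact case required).
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


secrets_toml = {"my_section": {"gcp_credentials": {"private_key": "from-toml"}}}
key = "my_section.gcp_credentials.private_key"

print(to_env_key(key))                # MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY
print(lookup(key, {}, secrets_toml))  # from-toml
print(lookup(key, {"MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY": "from-env"}, secrets_toml))  # from-env
```
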


### Environment provider
Looks for the values in environment variables.

### Toml provider
The toml provider uses two `toml` files: `secrets.toml` to store secrets and `config.toml` to store configuration values. The default `.gitignore` file prevents secrets from being added to source control and pushed. The `config.toml` may be freely added.

**The toml provider always loads those files from the `.dlt` folder**, which is looked up **relative to the current working directory**. Example:
If your working dir is `my_dlt_project` and you have:
```
my_dlt_project:
|
pipelines/
|---- .dlt/secrets.toml
|---- google_sheets.py
```
in it, and you run `python pipelines/google_sheets.py`, then `dlt` will look for `secrets.toml` in `my_dlt_project/.dlt/secrets.toml` and ignore the existing `my_dlt_project/pipelines/.dlt/secrets.toml`.

If you change your working dir to `pipelines` and run `python google_sheets.py`, it will look for `my_dlt_project/pipelines/.dlt/secrets.toml`, as (probably) expected.

*That was a common problem at our workshop, but believe me: all the other layouts I've tried are even worse.*
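
The rule boils down to joining the working directory with `.dlt/`. The sketch below, built on a hypothetical `toml_locations` helper, makes the point explicit: the script's own location is not an input at all. In real code you would pass `Path.cwd()`; `PurePosixPath` is used here only so the printed paths are predictable.

```python
from pathlib import PurePosixPath
from typing import Dict


def toml_locations(cwd: PurePosixPath) -> Dict[str, PurePosixPath]:
    """Where the toml files are expected: `.dlt/` under the working directory."""
    dlt_dir = cwd / ".dlt"
    return {"secrets": dlt_dir / "secrets.toml", "config": dlt_dir / "config.toml"}


project = PurePosixPath("my_dlt_project")
# `python pipelines/google_sheets.py`, started from my_dlt_project:
print(toml_locations(project)["secrets"])                # my_dlt_project/.dlt/secrets.toml
# `python google_sheets.py`, started from my_dlt_project/pipelines:
print(toml_locations(project / "pipelines")["secrets"])  # my_dlt_project/pipelines/.dlt/secrets.toml
```
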
107 changes: 107 additions & 0 deletions docs/website/docs/general-usage/configuration/config_specs.md
@@ -0,0 +1,107 @@
---
title: Configuration specs
description:
keywords: [credentials, secrets.toml, environment variables]
---

## Working with credentials (and other complex configuration values)

`GcpClientCredentialsWithDefault` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It can also parse various native representations of the configuration. Credentials marked with the `WithDefaults` mixin can also instantiate themselves from the machine/user default environment, i.e., Google's `default()` or AWS `.aws/credentials`.

As an example, let's use `ConnectionStringCredentials` which represents a database connection string.

```python
import dlt
from dlt.common.configuration.specs import ConnectionStringCredentials

@dlt.source
def query(sql: str, dsn: ConnectionStringCredentials = dlt.secrets.value):
    ...
```

The source above executes the `sql` against the database defined in `dsn`. `ConnectionStringCredentials` makes sure you get the correct values with the correct types and understands the relevant native form of the credentials.
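
To make "understands the native form" concrete, here is a minimal sketch of parsing a connection string into typed fields. `ConnectionStringSketch` is a hypothetical stand-in built on a plain `dataclass` and `urllib.parse.urlsplit`; it is not how `ConnectionStringCredentials` is implemented, and its field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class ConnectionStringSketch:
    """Hypothetical stand-in for a connection-string spec; not dlt's implementation."""
    drivername: str
    database: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None

    @classmethod
    def from_native(cls, dsn: str) -> "ConnectionStringSketch":
        """Parse the native form, e.g. postgres://user:password@host:port/database."""
        parts = urlsplit(dsn)
        return cls(
            drivername=parts.scheme,
            database=parts.path.lstrip("/") or None,
            username=parts.username,
            password=parts.password,
            host=parts.hostname,
            port=parts.port,  # urlsplit already converts the port to int
        )


# The native form and the dictionary form describe the same credentials.
native = ConnectionStringSketch.from_native("postgres://loader:loader@localhost:5432/dlt_data")
as_dict = ConnectionStringSketch(drivername="postgres", database="dlt_data", username="loader",
                                 password="loader", host="localhost", port=5432)
print(native == as_dict)  # True
```
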


Example 1: use the dictionary form
```toml
[dsn]
database="dlt_data"
password="loader"
username="loader"
host="localhost"
```

Example 2: use the native form
```toml
dsn="postgres://loader:loader@localhost:5432/dlt_data"
```

Example 3: use the mixed form: the password is missing from the explicit `dsn` and will be taken from `secrets.toml`
```toml
dsn.password="loader"
```
```python
query("SELECT * FROM customers", "postgres://loader@localhost:5432/dlt_data")
# or
query("SELECT * FROM customers", {"database": "dlt_data", "username": "loader"})  # ...and any other fields
```
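
The mixed form amounts to "explicit values first, missing fields from the providers". A minimal sketch, assuming the `dsn` section of `secrets.toml` has already been read into a dict; `resolve_mixed` is a hypothetical helper, not part of `dlt`.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def resolve_mixed(explicit_dsn: str, secrets: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Parse an explicit (partial) dsn, then fill the fields it lacks from secrets."""
    parts = urlsplit(explicit_dsn)
    fields: Dict[str, Optional[Any]] = {
        "drivername": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }
    for name, value in fields.items():
        # Explicit values win; only missing fields are looked up in secrets.
        if value is None and name in secrets:
            fields[name] = secrets[name]
    return fields


# `dsn.password="loader"` from secrets.toml, as the dsn section's values
dsn_secrets = {"password": "loader"}
resolved = resolve_mixed("postgres://loader@localhost:5432/dlt_data", dsn_secrets)
print(resolved["password"])  # loader
```
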

☮️ We will implement more credentials and let people reuse them when writing pipelines:
- to represent OAuth credentials
- API key + API secret
- AWS credentials


### Working with alternatives of credentials (Union types)
If your source/resource allows for many authentication methods, you can support them seamlessly for your user. The user just passes the right credentials and `dlt` will inject the right type into your decorated function.

Example:

> Read the whole [test](/tests/common/configuration/test_spec_union.py): it shows how to create unions of credentials that derive from a common class, so you can handle them seamlessly in your code.

```python
import os
from typing import Union

import dlt

# ZenApiKeyCredentials and ZenEmailCredentials are defined in the test linked above
@dlt.source
def zen_source(credentials: Union[ZenApiKeyCredentials, ZenEmailCredentials, str] = dlt.secrets.value, some_option: bool = False):
    # depending on what the user provides in config, ZenApiKeyCredentials or ZenEmailCredentials will be injected in the `credentials` argument
    # both classes implement `auth` so you can always call it
    credentials.auth()
    return dlt.resource([credentials], name="credentials")

# pass native value
os.environ["CREDENTIALS"] = "email:mx:pwd"
assert list(zen_source())[0].email == "mx"

# pass explicit native value
assert list(zen_source("secret:🔑:secret"))[0].api_secret == "secret"

# pass explicit dict
assert list(zen_source(credentials={"email": "emx", "password": "pass"}))[0].email == "emx"
```
> This applies not only to credentials but to all specs (see the next section).
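
Conceptually, resolving a `Union` means trying each alternative until one accepts the provided value. The sketch below illustrates that idea with two hypothetical dataclasses standing in for `ZenApiKeyCredentials` and `ZenEmailCredentials`; the `email:` and `secret:` native formats are assumptions inferred from the values in the example above, and the linked test may implement parsing differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class ApiKeyCredentialsSketch:
    """Hypothetical stand-in for ZenApiKeyCredentials."""
    api_key: str
    api_secret: str


@dataclass
class EmailCredentialsSketch:
    """Hypothetical stand-in for ZenEmailCredentials."""
    email: str
    password: str


AnyCredentials = Union[ApiKeyCredentialsSketch, EmailCredentialsSketch]


def parse_credentials(value: Union[str, Dict[str, Any]]) -> AnyCredentials:
    """Pick the first alternative that accepts the value (dict or native string)."""
    if isinstance(value, dict):
        for cls in (ApiKeyCredentialsSketch, EmailCredentialsSketch):
            try:
                return cls(**value)
            except TypeError:  # wrong or missing fields for this alternative
                continue
        raise ValueError(f"no credentials type accepts fields {sorted(value)}")
    # Assumed native format "<kind>:<first>:<second>", inferred from the example above.
    kind, first, second = value.split(":", 2)
    if kind == "email":
        return EmailCredentialsSketch(email=first, password=second)
    if kind == "secret":
        return ApiKeyCredentialsSketch(api_key=first, api_secret=second)
    raise ValueError(f"unknown credentials kind: {kind}")


print(parse_credentials("email:mx:pwd"))  # EmailCredentialsSketch(email='mx', password='pwd')
print(parse_credentials({"email": "emx", "password": "pass"}))  # EmailCredentialsSketch(email='emx', password='pass')
```
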
## Writing own specs

**specs** let you take full control over the function arguments:
- decide which values should be injected, their types and default values
- specify optional and final fields
- form hierarchical configurations (specs in specs)
- provide your own handlers for `on_error` or `on_resolved`
- provide your own native value parsers
- provide your own default credentials logic
- get all the Python dataclass goodies
- get all the Python `dict` goodies (`specs` instances can be created from dicts and serialized to dicts)

This is used a lot in the `dlt` core and may become useful for complicated sources.
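
Several of the points above (typed fields with defaults, specs nested in specs, creating from and serializing to dicts) can be approximated with plain `dataclasses`. This is a hypothetical sketch for illustration only: it does not use `@configspec` and has no injection, `on_resolved` hooks or native value parsing.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DatabaseCredentialsSketch:
    """Hypothetical nested spec: a spec inside a spec."""
    host: str = "localhost"
    port: int = 5432
    password: Optional[str] = None  # a secret; in dlt it would come from a secrets provider


@dataclass
class MySourceConfigSketch:
    """Hypothetical top-level spec with typed fields and defaults."""
    tab_names: List[str] = field(default_factory=list)
    only_strings: bool = False
    credentials: DatabaseCredentialsSketch = field(default_factory=DatabaseCredentialsSketch)

    @classmethod
    def from_dict(cls, values: Dict[str, Any]) -> "MySourceConfigSketch":
        """Build the spec (including the nested one) from a plain dict."""
        values = dict(values)
        credentials = DatabaseCredentialsSketch(**values.pop("credentials", {}))
        return cls(credentials=credentials, **values)


config = MySourceConfigSketch.from_dict({"tab_names": ["Sheet1"], "credentials": {"password": "secret"}})
# Serialize back to a dict, nested spec included.
print(asdict(config))
```
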

In fact, a spec is synthesized for each decorated function. In the case of `google_sheets`, the following class is created:
```python
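from typing import List, Optional

# `configspec` and `GcpClientCredentialsWithDefault` come from dlt's configuration modules
@configspec
class GoogleSheetsConfiguration:
    tab_names: List[str] = None  # mandatory
    credentials: GcpClientCredentialsWithDefault = None  # mandatory secret
    only_strings: Optional[bool] = False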
```

> All specs derive from [BaseConfiguration](/dlt/common/configuration/specs/base_configuration.py).
>
> All credentials derive from [CredentialsConfiguration](/dlt/common/configuration/specs/base_configuration.py).
>
> Read the docstrings in the code above.