Commit

small changes
AstrakhantsevaAA committed Oct 27, 2023
1 parent aa52d9a commit 2d3cd43
Showing 3 changed files with 20 additions and 23 deletions.
10 changes: 4 additions & 6 deletions docs/website/docs/general-usage/credentials/config_providers.md
@@ -8,7 +8,7 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen
# Configuration Providers


**Configuration Providers** in the context of the `dlt` library
Configuration Providers in the context of the `dlt` library
refer to different sources from which configuration values
and secrets can be retrieved for a data pipeline.
These providers form a hierarchy, with each having its own
@@ -19,7 +19,7 @@ priority in determining the values for function arguments.
If a function signature has arguments that may be injected, `dlt` looks for the argument values in
the providers.

### Configuration Providers
### Providers

1. **Environment Variables**: At the top of the hierarchy are environment variables.
If a value for a specific argument is found in an environment variable,
@@ -36,8 +36,6 @@ providers.
4. **Default Argument Values**: These are the values specified in the function's signature.
They have the lowest priority in the provider hierarchy.

**The argument name is a key in the lookup**.

### Example

```python
@@ -61,6 +59,8 @@ for: `spreadsheet_id`, `tab_names` and `credentials`.

Each provider has its own key naming convention, and dlt is able to translate between them.

**The argument name is a key in the lookup**.
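
The key translation described above can be illustrated with a minimal sketch. It assumes the environment provider joins upper-cased section names with double underscores, as the `SOURCES__GOOGLE_SHEETS__CREDENTIALS__CLIENT_EMAIL` example in `configuration.md` suggests, and that the TOML providers use dotted section paths. The helpers `to_env_key` and `to_toml_key` are hypothetical names for illustration and are not part of the `dlt` API.

```python
from typing import Sequence


def to_env_key(path: Sequence[str]) -> str:
    """Hypothetical helper: build an environment-variable style key."""
    # Upper-case each section and join with a double underscore.
    return "__".join(part.upper() for part in path)


def to_toml_key(path: Sequence[str]) -> str:
    """Hypothetical helper: build a dotted TOML-style key."""
    return ".".join(path)


# The last element is the argument name; the rest are sections.
path = ("sources", "google_sheets", "credentials", "client_email")

print(to_env_key(path))   # SOURCES__GOOGLE_SHEETS__CREDENTIALS__CLIENT_EMAIL
print(to_toml_key(path))  # sources.google_sheets.credentials.client_email
```

In both conventions the final path element is the argument name, which is why it serves as the lookup key.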

At the top of the hierarchy are Environment Variables, then `secrets.toml` and
`config.toml` files. Providers like Airflow/Google/AWS/Azure Vaults will be inserted **after** the Environment
provider but **before** TOML providers.
@@ -72,8 +72,6 @@ The values passed in the code **explicitly** are the **highest** in provider hie
of the arguments have the **lowest** priority in the provider hierarchy.

:::info
Summary of the hierarchy:

Explicit Args **>** ENV Variables **>** Vaults: Airflow etc. **>** `secrets.toml` **>** `config.toml` **>** Default Arg Values
:::
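
The ordering in the summary can be sketched as a simple first-match lookup. This is a plain-Python illustration of the priority rule only, not how `dlt` implements it: the `resolve` function, the provider labels, and the sample values are all hypothetical, and the sketch ignores details such as `config.toml` being skipped for secret values.

```python
from typing import Any, Mapping, Sequence, Tuple

_MISSING = object()  # sentinel: "no default given"


def resolve(
    key: str,
    explicit: Mapping[str, Any],
    providers: Sequence[Tuple[str, Mapping[str, Any]]],
    default: Any = _MISSING,
) -> Tuple[Any, str]:
    """Return (value, origin) for `key`, honoring the provider order."""
    # 1. Values passed explicitly in code win over everything else.
    if key in explicit:
        return explicit[key], "explicit"
    # 2. Providers are tried in order; the first one holding the key wins.
    for name, values in providers:
        if key in values:
            return values[key], name
    # 3. The default from the function signature has the lowest priority.
    if default is not _MISSING:
        return default, "default"
    raise KeyError(key)


# Hypothetical provider contents, listed from highest to lowest priority.
providers = [
    ("env", {"tab_names": ["from_env"]}),
    ("vault", {}),
    ("secrets.toml", {"credentials": "secret"}),
    ("config.toml", {"tab_names": ["from_config"], "spreadsheet_id": "abc"}),
]

print(resolve("tab_names", {}, providers))                    # (['from_env'], 'env')
print(resolve("spreadsheet_id", {}, providers))               # ('abc', 'config.toml')
print(resolve("only_strings", {}, providers, default=False))  # (False, 'default')
```

A sentinel object is used instead of `None` so that `None` and `False` remain valid default values.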

4 changes: 2 additions & 2 deletions docs/website/docs/general-usage/credentials/config_specs.md
@@ -92,8 +92,8 @@ We have some ready-made credentials you can reuse:
from dlt.sources.credentials import ConnectionStringCredentials
from dlt.sources.credentials import OAuth2Credentials
from dlt.sources.credentials import GcpServiceAccountCredentials, GcpOAuthCredentials
from dlt.common.configuration.specs import AwsCredentials
from dlt.common.configuration.specs import AzureCredentials
from dlt.sources.credentials import AwsCredentials
from dlt.sources.credentials import AzureCredentials
```

### ConnectionStringCredentials
29 changes: 14 additions & 15 deletions docs/website/docs/general-usage/credentials/configuration.md
@@ -9,15 +9,15 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen

Secrets and configs are two types of sensitive and non-sensitive information used in a data pipeline:

1. Configs:
1. **Configs**:
- Configs refer to non-sensitive configuration data. These are settings, parameters, or options that define the behavior of a data pipeline.
- They can include things like file paths, database connection strings, API endpoints, or any other settings that affect the pipeline's behavior.
2. Secrets:
2. **Secrets**:
- Secrets are sensitive information that should be kept confidential, such as passwords, API keys, private keys, and other confidential data.
- It's crucial to never hard-code secrets directly into the code, as it can pose a security risk. Instead, they should be stored securely and accessed via a secure mechanism.


Design Principles:
**Design Principles**:

1. Adding configuration and secrets to sources and resources should be no-effort.
2. You can reconfigure the pipeline for production after it is deployed. Deployed and local code should
@@ -49,11 +49,10 @@ def google_sheets(
return tabs
```

`spreadsheet_id`: The unique identifier of the Google Sheets document.

`tab_names`: A list of tab names to read from the spreadsheet.

`credentials`: Google Sheets credentials as a dictionary ({"private_key": ...}).
`spreadsheet_id`: The unique identifier of the Google Sheets document.\
`tab_names`: A list of tab names to read from the spreadsheet.\
`credentials`: Google Sheets credentials as a dictionary ({"private_key": ...}).\
`only_strings`: Flag to specify if only string data should be retrieved.

`spreadsheet_id` and `tab_names` are configuration values that can be provided directly
when calling the function. `credentials` is a sensitive piece of information.
@@ -99,21 +98,21 @@ or pass everything via configuration.

2. Option B
```python
# `only_strings` will get the default value False
data_source = google_sheets()
```
In this case `credentials` value will be injected by the `@source` decorator (e.g. from `secrets.toml`),
In this case `credentials` value will also be injected by the `@source` decorator (e.g. from `secrets.toml`),
`spreadsheet_id` and `tab_names` will be injected by the `@source` decorator (e.g. from `config.toml`) as well.

We use `dlt.secrets.value` and `dlt.config.value` to set secrets and configurations via:
- [toml files](config_providers#toml-provider) (secrets.toml & config.toml):
- [TOML files](config_providers#toml-provider) (`secrets.toml` & `config.toml`):
```toml
# google sheet credentials
[sources.google_sheets.credentials]
client_email = <client_email from services.json>
private_key = <private_key from services.json>
project_id = <project_id from services.json>
```
Read more about [toml layouts](#secret-and-config-values-layout-and-name-lookup).
Read more about [TOML layouts](#secret-and-config-values-layout-and-name-lookup).
- [Environment Variables](config_providers#environment-provider):
```python
SOURCES__GOOGLE_SHEETS__CREDENTIALS__CLIENT_EMAIL
@@ -144,7 +143,7 @@ Doing so provides several benefits:
```python
@dlt.source
def google_sheets(
spreadsheet_id: str,
spreadsheet_id: str = dlt.config.value,
tab_names: List[str] = dlt.config.value,
credentials: GcpServiceAccountCredentials = dlt.secrets.value,
only_strings: bool = False
@@ -155,7 +154,7 @@ def google_sheets(
Now:

1. You are sure that you get a list of strings as `tab_names`.
1. You will get actual Google credentials (see [Credentials Configuration](config_specs)), and your users can
1. You will get actual Google credentials (see [GCP Credential Configuration](config_specs#gcp-credentials)), and your users can
pass them in many different forms.

In case of `GcpServiceAccountCredentials`:
@@ -455,4 +454,4 @@ In the example above:
1. First it looks into `environ` then in `secrets.toml`. It displays the exact keys tried.
1. Note that `config.toml` was skipped! It may not contain any secrets.

Read more about [Providers](./config_providers).
Read more about [Provider Hierarchy](./config_providers).
