Update structure based on comments
VioletM committed Jul 29, 2024
1 parent 5761b43 commit 4c8ed58
Showing 12 changed files with 319 additions and 175 deletions.
@@ -8,7 +8,7 @@
In this example, you'll find a Python script that demonstrates how to load to BigQuery with the custom destination.
We'll learn how to:
- Use [built-in credentials.](../general-usage/credentials/prebuilt_dlt_credential_types#gcp-credentials)
- Use [built-in credentials.](../general-usage/credentials/prebuilt_types#gcp-credentials)
- Use the [custom destination.](../dlt-ecosystem/destinations/destination.md)
- Use pyarrow tables to create complex column types on BigQuery.
- Use BigQuery `autodetect=True` for schema inference from parquet files.
4 changes: 2 additions & 2 deletions docs/examples/google_sheets/google_sheets.py
@@ -8,8 +8,8 @@
In this example, you'll find a Python script that demonstrates how to load Google Sheets data using the `dlt` library.
We'll learn how to:
- use [built-in credentials](../general-usage/credentials/prebuilt_dlt_credential_types#gcp-credentials);
- use [union of credentials](../general-usage/credentials/prebuilt_dlt_credential_types#working-with-alternatives-of-credentials-union-types);
- use [built-in credentials](../general-usage/credentials/prebuilt_types#gcp-credentials);
- use [union of credentials](../general-usage/credentials/prebuilt_types#working-with-alternatives-of-credentials-union-types);
- create [dynamically generated resources](../general-usage/source#create-resources-dynamically).
:::tip
@@ -1,19 +1,19 @@
---
title: Using configs and secrets in code
description: How to use configuration and secrets values in code.
title: Secrets with custom sources
description: How to use secrets inside sources and destinations.
keywords: [credentials, secrets.toml, secrets, config, configuration, environment variables, provider]
---

`dlt` provides a lot of flexibility for managing credentials and configuration. In this section, you will learn how to correctly manage credentials in your custom sources and destinations, how the `dlt` injection mechanism works, and how to get access to configurations managed by `dlt`.

## Injection mechanism

`dlt` has a special treatment for functions decorated with `@dlt.source`, `@dlt.resource`, and `@dlt.destination`. When such a function is called, `dlt` takes the argument names in the signature and supplies (`injects`) the required values by looking for them in [various config providers](how_to_set_up_credentials.md).
`dlt` has a special treatment for functions decorated with `@dlt.source`, `@dlt.resource`, and `@dlt.destination`. When such a function is called, `dlt` takes the argument names in the signature and supplies (`injects`) the required values by looking for them in [various config providers](setup).
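
To make the injection rules below concrete, here is a minimal, stdlib-only sketch of the idea. It is not `dlt`'s implementation: `inject`, `REQUIRED`, and `ConfigMissing` are hypothetical names, and a plain dictionary stands in for the config providers.

```python
import functools
import inspect
from typing import Any, Callable, Dict


class _Required:
    """Hypothetical sentinel playing the role of `dlt.secrets.value` / `dlt.config.value`."""

    def __repr__(self) -> str:
        return "REQUIRED"


REQUIRED: Any = _Required()


class ConfigMissing(Exception):
    """Raised when a REQUIRED argument is neither passed nor found in the provider."""


def inject(provider: Dict[str, Any]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that fills arguments not passed by the caller, looked up by name in `provider`."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = sig.bind_partial(*args, **kwargs)
            for name, param in sig.parameters.items():
                if name in bound.arguments:
                    continue  # rule 1: explicitly passed arguments are never injected
                if param.default is inspect.Parameter.empty:
                    continue  # rule 2: required arguments are never injected (the call fails below)
                if name in provider:
                    bound.arguments[name] = provider[name]  # rule 3: provider value beats the default
                elif param.default is REQUIRED:
                    raise ConfigMissing(name)  # rule 4: must be injected or passed explicitly
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


# Usage: a plain dict stands in for environment variables and TOML files.
@inject({"api_key": "key-from-provider", "page_size": 50})
def slack_source(channels_list, api_key=REQUIRED, page_size=100, start_date="2024-01-01"):
    return channels_list, api_key, page_size, start_date
```

Here `slack_source(["general"])` receives `api_key` and `page_size` from the provider and keeps the signature default for `start_date`, while calling it without `channels_list` fails with a `TypeError`.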

The injection rules are:
### Injection rules

1. The arguments that are passed explicitly are **never injected**. This makes the injection mechanism optional. For example, for the pipedrive source:
```python
```py
@dlt.source(name="pipedrive")
def pipedrive_source(
pipedrive_api_key: str = dlt.secrets.value,
@@ -24,20 +24,20 @@ The injection rules are:
my_key = os.environ["MY_PIPEDRIVE_KEY"]
my_source = pipedrive_source(pipedrive_api_key=my_key)
```
`dlt` allows the user to specify the argument `pipedrive_api_key` explicitly if, for some reason, they do not want to use dlt [out-of-the-box options](how_to_set_up_credentials.md) for credentials management.
`dlt` allows the user to specify the argument `pipedrive_api_key` explicitly if, for some reason, they do not want to use [out-of-the-box options](setup) for credentials management.

1. Required arguments (without default values) **are never injected** and must be specified when calling. For example, for the source:

```python
```py
@dlt.source
def slack_data(channels_list: List[str], api_key: str = dlt.secrets.value):
...
```
The argument `channels_list` is not injected, and an error is raised if it is not specified explicitly.

1. Arguments with default values are injected if present in config providers; otherwise, defaults from the function signature are used. For example, for the source:
1. Arguments with default values are injected if present in config providers. Otherwise, defaults from the function signature are used. For example, for the source:

```python
```py
@dlt.source
def slack_source(
page_size: int = MAX_PAGE_SIZE,
@@ -46,7 +46,7 @@ The injection rules are:
):
...
```
`dlt` firstly searches for all three arguments: `page_size`, `access_token`, and `start_date` in config providers in a [specific order](how_to_set_up_credentials.md). If it cannot find them, it will use the default values.
`dlt` first searches for all three arguments, `page_size`, `access_token`, and `start_date`, in config providers in a [specific order](setup). If it cannot find them, it uses the default values.

1. Arguments with the special default value `dlt.secrets.value` and `dlt.config.value` **must be injected**
(or explicitly passed). If they are not found by the config providers, the code raises an
@@ -62,12 +62,14 @@ information on what source/resource expects.

Doing so provides several benefits:

1. You'll never receive invalid data types in your code.
1. `dlt` will automatically parse and coerce types for you. In our example, you do not need to parse a list of tabs or a credentials dictionary yourself.
1. We can generate nice sample config and secret files for your source.
1. You can request [built-in and custom credentials](prebuilt_dlt_credential_types) (i.e., connection strings, AWS / GCP / Azure credentials).
1. You'll never receive invalid data types in your code.
1. `dlt` will automatically parse and coerce types for you, so you don't need to parse them yourself.
1. `dlt` can generate sample config and secret files for your source automatically.
1. You can request [built-in and custom credentials](prebuilt_types) (i.e., connection strings, AWS / GCP / Azure credentials).
1. You can specify a set of possible types via `Union`, i.e., OAuth or API Key authorization.
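
Parsing and coercion matter because providers such as environment variables only deliver strings. The sketch below shows the general idea with a hypothetical `coerce` helper that parses a string as JSON when the declared type is not `str`; it is an illustration under that assumption, not `dlt`'s coercion logic.

```python
import json
from typing import Any, Dict, List, get_origin


def coerce(value: Any, hint: Any) -> Any:
    """Hypothetical helper: convert a raw provider value into the type named by `hint`.

    Only the outer type (list, dict, int, ...) is checked, not the element types.
    """
    if hint is str or not isinstance(value, str):
        return value  # strings for `str` hints, and non-string values, are assumed usable as-is
    parsed = json.loads(value)  # '["a", "b"]' -> ['a', 'b'], '50' -> 50
    expected = get_origin(hint) or hint  # List[str] -> list, int -> int
    if not isinstance(parsed, expected):
        raise TypeError(f"expected {hint}, got {type(parsed).__name__}")
    return parsed
```

For example, an environment variable holding `["Sheet1", "Sheet2"]` becomes a real Python list when the argument is annotated as `List[str]`.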

Let's consider the example:

```py
@dlt.source
def google_sheets(
@@ -79,20 +81,20 @@ def google_sheets(
...
```

Now:
Now,

1. You are sure that you get a list of strings as `tab_names`.
1. You will get actual Google credentials (see [GCP Credential Configuration](prebuilt_dlt_credential_types#gcp-credentials)), and your users can
pass them in many different forms.
1. You will get actual Google credentials (see [GCP Credential Configuration](prebuilt_types#gcp-credentials)), and users can
pass them in many different forms:

In the case of `GcpServiceAccountCredentials`:
* `service.json` as a string or dictionary (in code and via config providers).
* connection string (used in SQL Alchemy) (in code and via config providers).
* if nothing is passed, the default credentials are used (i.e., those present on Cloud Function runner)

- You may just pass the `service.json` as a string or dictionary (in code and via config providers).
- You may pass a connection string (used in SQL Alchemy) (in code and via config providers).
- If you do not pass any credentials, the default credentials are used (i.e., those present on the Cloud Function runner).
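
A rough sketch of how one argument can accept several input forms: the hypothetical `resolve_service_account` below takes `service.json` content as a string or a dictionary and falls back to a caller-supplied loader for default credentials. `ServiceAccount` is a simplified stand-in with only three fields, not `GcpServiceAccountCredentials`, and the connection-string form is left out.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Union


@dataclass
class ServiceAccount:
    """Hypothetical, simplified stand-in for GcpServiceAccountCredentials."""

    project_id: str
    client_email: str
    private_key: str


def resolve_service_account(
    value: Union[str, Dict[str, Any], None],
    default_loader: Callable[[], Optional[ServiceAccount]],
) -> ServiceAccount:
    if value is None:
        # Nothing passed: ask the environment (e.g. a Cloud Function runner) for defaults.
        creds = default_loader()
        if creds is None:
            raise ValueError("no credentials passed and no default credentials found")
        return creds
    if isinstance(value, str):
        value = json.loads(value)  # `service.json` content passed as a string
    return ServiceAccount(
        project_id=value["project_id"],
        client_email=value["client_email"],
        private_key=value["private_key"],
    )
```

The string and dictionary forms produce equal objects, so the code that consumes the credentials does not need to know which form the user chose.
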
## Advanced: Read configs and secrets manually

## Read configs and secrets yourself
`dlt.secrets` and `dlt.config` provide dictionary-like access to configuration values and secrets, respectively.
`dlt` handles credentials and configuration automatically, but also offers flexibility for manual processing.
`dlt.secrets` and `dlt.config` provide dictionary-like access to configuration values and secrets. This allows you to retrieve the necessary information manually if needed.

```py
# use `dlt.secrets` and `dlt.config` to explicitly take
@@ -106,21 +108,23 @@ data_source = google_sheets(
data_source.run(destination="bigquery")
```

`dlt.config` and `dlt.secrets` behave like dictionaries from which you can request a value with any key name. `dlt` will look in all [config providers](how_to_set_up_credentials.md) - env variables, TOML files, etc., to create these dictionaries. You can also use `dlt.config.get()` or `dlt.secrets.get()` to
`dlt.config` and `dlt.secrets` behave like dictionaries from which you can request a value with any key name. `dlt` will look in all [config providers](setup) - env variables, TOML files, etc. to create these dictionaries. You can also use `dlt.config.get()` or `dlt.secrets.get()` to
request a value cast to a desired type. For example:

```py
credentials = dlt.secrets.get("my_section.gcp_credentials", GcpServiceAccountCredentials)
```
Creates a `GcpServiceAccountCredentials` instance out of values (typically a dictionary) under the **my_section.gcp_credentials** key.
Creates a `GcpServiceAccountCredentials` instance out of values (typically a dictionary) under the `my_section.gcp_credentials` key.
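
A dotted key such as `my_section.gcp_credentials` can be pictured as a path into nested dictionaries, checked in each provider in priority order. The hypothetical `lookup` function below sketches that idea; it returns the whole subtree from the first provider that has the key and does not merge values across providers, which may differ from what `dlt` does.

```python
from typing import Any, List, Mapping

_MISSING = object()


def lookup(providers: List[Mapping[str, Any]], dotted_key: str, default: Any = None) -> Any:
    """Return the value for `dotted_key` from the first provider (highest priority first) that has it."""
    for provider in providers:
        node: Any = provider
        for part in dotted_key.split("."):
            if not isinstance(node, Mapping) or part not in node:
                node = _MISSING
                break
            node = node[part]
        if node is not _MISSING:
            return node
    return default
```

With an environment provider listed before a TOML provider, a key present in both resolves to the environment value.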

## Advanced: Write configs and secrets in code

### Write configs and secrets in code
**dlt.config** and **dlt.secrets** can also be used as setters. For example:
`dlt.config` and `dlt.secrets` objects can also be used as setters. For example:
```py
dlt.config["sheet_id"] = "23029402349032049"
dlt.secrets["destination.postgres.credentials"] = BaseHook.get_connection('postgres_dsn').extra
```
Will mock the **toml** provider to desired values.

This will mock the `toml` provider with the desired values.
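
The observable effect of the setters can be sketched as a layer of values written in code that shadows the other providers. The hypothetical `ConfigAccessor` below keeps written values in a separate dictionary consulted first; this is an assumption made for illustration and not how `dlt` stores them internally.

```python
from typing import Any, Dict, List


class ConfigAccessor:
    """Hypothetical dict-like accessor over a list of providers (highest priority first)."""

    def __init__(self, providers: List[Dict[str, Any]]) -> None:
        self._written: Dict[str, Any] = {}  # values set in code
        self._providers = providers

    def __setitem__(self, key: str, value: Any) -> None:
        # Writes never touch the underlying providers; they shadow them instead.
        self._written[key] = value

    def __getitem__(self, key: str) -> Any:
        for source in (self._written, *self._providers):
            if key in source:
                return source[key]
        raise KeyError(key)
```

After `config["sheet_id"] = "23029402349032049"`, reads return the new value while the original TOML dictionary stays unchanged.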

## Example

6 changes: 3 additions & 3 deletions docs/website/docs/general-usage/credentials/index.md
@@ -5,13 +5,13 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen
---
import DocCardList from '@theme/DocCardList';

`dlt` supports two main types of configurations: configs and secrets. Both configs and secrets can be set up in [various ways](how_to_set_up_credentials.md):
`dlt` pipelines usually require configurations and credentials. These can be set up in [various ways](setup):

1. Environment variables
2. Configuration files (`secrets.toml` and `config.toml`)
3. Configuration or secrets provider
3. Key managers and Vaults

`dlt` automatically extracts configuration settings and secrets based on flexible [naming conventions](how_to_set_up_credentials/#naming-convention). It then [injects](using_config_in_code/#injection-mechanism) these values where needed in code.
`dlt` automatically extracts configuration settings and secrets based on flexible [naming conventions](setup/#naming-convention). It then [injects](config_in_code/#injection-mechanism) these values where needed in code.
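
As an illustration of a section-based naming convention, the sketch below builds candidate environment variable names in upper case with double underscores between sections (as in `SOURCES__PIPEDRIVE__PIPEDRIVE_API_KEY`). The order used here, most specific path first and then dropping one section at a time, is a simplifying assumption; the setup page describes the actual search order. Both functions are hypothetical.

```python
from typing import List, Mapping, Optional, Sequence


def env_var_candidates(sections: Sequence[str], key: str) -> List[str]:
    """Hypothetical: env variable names to try, from the most to the least specific section path."""
    names = []
    for depth in range(len(sections), -1, -1):
        names.append("__".join([*sections[:depth], key]).upper())
    return names


def find_in_env(env: Mapping[str, str], sections: Sequence[str], key: str) -> Optional[str]:
    """Return the value of the first candidate name present in `env` (e.g. os.environ)."""
    for name in env_var_candidates(sections, key):
        if name in env:
            return env[name]
    return None
```

Passing `os.environ` as `env` lets a section-specific variable win over a generic one with the same key.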

# Learn Details About

@@ -5,21 +5,19 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen
variables, specs]
---

Often, credentials do not consist of just one `api_key`, but instead can be quite a complex structure. In this section, you'll learn how dlt supports different credential types and authentication options.
## Overview

Often, credentials do not consist of just one `api_key`, but instead can be quite a complex structure. In this section, you'll learn how `dlt` supports different credential types and authentication options.

:::tip
Learn about the authentication methods supported by the `dlt` RestAPI Client in detail in the [RESTClient section](../http/rest-client.md#authentication).
:::

`dlt` supports different credential types by providing various Python data classes called Configuration Specs. These classes define how complex configuration values, particularly credentials, should be handled. They specify the types, defaults, and parsing methods for these values.

## Working with credentials (and other complex configuration values)

For example, a spec like `GcpServiceAccountCredentials` manages Google Cloud Platform service account credentials, while `ConnectionStringCredentials` handles database connection strings.

### Example
## Example with ConnectionStringCredentials

As an example, let's use `ConnectionStringCredentials` which represents a database connection string.
`ConnectionStringCredentials` handles database connection strings:

```py
from dlt.sources.credentials import ConnectionStringCredentials
@@ -33,7 +31,7 @@ The source above executes the `sql` against the database defined in `dsn`. `Conn

Below are examples of how you can set credentials in `secrets.toml` and `config.toml` files.

Example 1. Use the **dictionary** form.
### Dictionary form

```toml
[dsn]
@@ -43,13 +41,15 @@ username="loader"
host="localhost"
```

Example 2. Use the **native** form.
### Native form

```toml
dsn="postgres://loader:loader@localhost:5432/dlt_data"
```
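
The native form packs the same fields as the dictionary form into one string. A minimal sketch of unpacking it with the standard library follows; the hypothetical `parse_dsn` helper and its output keys are chosen for illustration and are not `ConnectionStringCredentials` itself.

```python
from typing import Any, Dict
from urllib.parse import unquote, urlsplit


def parse_dsn(dsn: str) -> Dict[str, Any]:
    """Hypothetical helper: split a native connection string into dictionary-form fields."""
    parts = urlsplit(dsn)
    return {
        "drivername": parts.scheme,
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,  # parsed as an int by urlsplit
        "database": parts.path.lstrip("/") or None,
    }
```

Applied to the `dsn` above, this yields `loader` as username and password, `localhost` as host, `5432` as port, and `dlt_data` as database.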

Example 3. Use the **mixed** form: the password is missing in the explicit dsn and will be taken from the `secrets.toml`.
### Mixed form

If all credentials except the password are provided explicitly in the code, `dlt` will look for the password in `secrets.toml`.

```toml
dsn.password="loader"
@@ -65,7 +65,7 @@ query("SELECT * FROM customers", {"database": "dlt_data", "username": "loader"})

## Built-in credentials

We have some ready-made credentials you can reuse:
`dlt` offers some ready-made credentials you can reuse:

```py
from dlt.sources.credentials import ConnectionStringCredentials
@@ -127,10 +127,14 @@ credentials.add_scopes(["scope3", "scope4"])

### GCP Credentials

- [GcpServiceAccountCredentials](#gcpserviceaccountcredentials).
- [GcpOAuthCredentials](#gcpoauthcredentials).
#### Examples
* [Google Analytics verified source](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics/__init__.py): an example of how to use GCP Credentials.
* [Google Analytics example](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics/setup_script_gcp_oauth.py): how you can get the refresh token using `dlt.secrets.value`.

#### Types

[Google Analytics verified source](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics/__init__.py): the example of how to use GCP Credentials.
* [GcpServiceAccountCredentials](#gcpserviceaccountcredentials).
* [GcpOAuthCredentials](#gcpoauthcredentials).

#### GcpServiceAccountCredentials

@@ -235,10 +239,10 @@ property_id = "213025502"

In order for the `auth()` method to succeed:

- You must provide valid `client_id` and `client_secret`, `refresh_token`, and `project_id to get a current **access token** and authenticate with OAuth. Keep in mind that the `refresh_token` must contain all the scopes that you require for your access.
- You must provide valid `client_id`, `client_secret`, `refresh_token`, and `project_id` to get a current **access token** and authenticate with OAuth. Keep in mind that the `refresh_token` must contain all the scopes that are required for your access.
- If the `refresh_token` is not provided, and you run the pipeline from a console or a notebook, `dlt` will use InstalledAppFlow to run the desktop authentication flow.

[Google Analytics example](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics/setup_script_gcp_oauth.py): how you can get the refresh token using `dlt.secrets.value`.


#### Defaults

@@ -370,33 +374,34 @@ assert list(zen_source())[0].email == "mx"

# Pass explicit native value
assert list(zen_source("secret:🔑:secret"))[0].api_secret == "secret"
```# pass explicit dict
# pass explicit dict
assert list(zen_source(credentials={"email": "emx", "password": "pass"}))[0].email == "emx"

```

> This applies not only to credentials but to all specs (see next chapter).
:::info
This applies not only to credentials but to [all specs](#writing-custom-specs).
:::

Read the [whole test](https://github.com/dlt-hub/dlt/blob/devel/tests/common/configuration/test_spec_union.py), it shows how to create unions
:::tip
Check out the [complete example](https://github.com/dlt-hub/dlt/blob/devel/tests/common/configuration/test_spec_union.py) to learn how to create unions
of credentials that derive from a common class, so you can handle them seamlessly in your code.
:::
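
The union idea can be sketched without `dlt`: several credential classes share a common base, and a resolver tries each variant until one accepts the given fields. `BaseCredentials`, `ApiKeyCredentials`, `EmailPasswordCredentials`, and `resolve_union` are hypothetical names; the linked test shows how `dlt` itself does it.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, Sequence, Type


class BaseCredentials:
    """Hypothetical common base class: callers only rely on `auth_header()`."""

    def auth_header(self) -> Dict[str, str]:
        raise NotImplementedError


@dataclass
class ApiKeyCredentials(BaseCredentials):
    api_key: str

    def auth_header(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.api_key}"}


@dataclass
class EmailPasswordCredentials(BaseCredentials):
    email: str
    password: str

    def auth_header(self) -> Dict[str, str]:
        token = base64.b64encode(f"{self.email}:{self.password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}


def resolve_union(value: Dict[str, Any], variants: Sequence[Type[BaseCredentials]]) -> BaseCredentials:
    """Try each variant in order and return the first one the value fits."""
    for variant in variants:
        try:
            return variant(**value)
        except TypeError:
            continue  # missing or unexpected fields: try the next variant
    raise ValueError(f"value matches none of {[v.__name__ for v in variants]}")
```

Because every variant implements `auth_header()`, the calling code stays the same whichever form of credentials the user provides.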

## Writing custom specs

**specs** let you take full control over the function arguments:
**Custom specifications** let you take full control over the function arguments. You can

- Which values should be injected, the types, default values.
- You can specify optional and final fields.
- Control which values are injected, along with their types and default values.
- Specify optional and final fields.
- Form hierarchical configurations (specs in specs).
- Provide your own handlers for `on_partial` (called before failing on a missing config key) or `on_resolved`.
- Provide your own native value parsers.
- Provide your own default credentials logic.
- Adds all Python dataclass goodies to it.
- Adds all Python `dict` goodies to it (`specs` instances can be created from dicts and serialized
- Utilise Python dataclass functionality.
- Utilise Python `dict` functionality (`specs` instances can be created from dicts and serialized
from dicts).

This is used a lot in the `dlt` core and may become useful for complicated sources.
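
Since specs build on Python dataclasses, a few of the listed features can be illustrated with plain `dataclasses`: typed defaults, an optional field, a spec nested in a spec, and round-tripping through dictionaries. `DatabaseSpec`, `SourceSpec`, and `from_dict` are hypothetical and do not use `dlt`'s own spec base classes.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass
from typing import Any, Dict, List, Optional, get_type_hints


@dataclass
class DatabaseSpec:
    host: str = "localhost"
    port: int = 5432
    password: Optional[str] = None  # optional field


@dataclass
class SourceSpec:
    sheet_id: str = ""
    tab_names: List[str] = field(default_factory=list)
    database: DatabaseSpec = field(default_factory=DatabaseSpec)  # spec inside a spec


def from_dict(spec_type: Any, data: Dict[str, Any]) -> Any:
    """Hypothetical helper: build a spec from a dict, keeping declared defaults for missing keys."""
    hints = get_type_hints(spec_type)
    kwargs = {}
    for f in fields(spec_type):
        if f.name not in data:
            continue  # keep the declared default
        value = data[f.name]
        if is_dataclass(hints[f.name]) and isinstance(value, dict):
            value = from_dict(hints[f.name], value)  # build nested specs recursively
        kwargs[f.name] = value
    return spec_type(**kwargs)
```

`asdict()` serializes a spec back into nested dictionaries, so `from_dict(SourceSpec, asdict(spec))` reproduces an equal spec.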

In fact, for each decorated function a spec is synthesized. In the case of `google_sheets`, the following
In fact, `dlt` synthesizes a unique spec for each decorated function. For example, in the case of `google_sheets`, the following
class is created:

```py