diff --git a/docs/website/docs/dlt-ecosystem/destinations/duckdb.md b/docs/website/docs/dlt-ecosystem/destinations/duckdb.md index c5e9dd1f14..f167db6bc6 100644 --- a/docs/website/docs/dlt-ecosystem/destinations/duckdb.md +++ b/docs/website/docs/dlt-ecosystem/destinations/duckdb.md @@ -84,7 +84,7 @@ p = dlt.pipeline(pipeline_name='chess', destination='duckdb', dataset_name='ches This destination accepts database connection strings in format used by [duckdb-engine](https://github.com/Mause/duckdb_engine#configuration). -You can configure a DuckDB destination with [secret / config values](../../general-usage/credentials.md) (e.g. using a `secrets.toml` file) +You can configure a DuckDB destination with [secret / config values](../../general-usage/configuration/credentials.md) (e.g. using a `secrets.toml` file) ```toml destination.duckdb.credentials=duckdb:///_storage/test_quack.duckdb ``` diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md b/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md index 6718ff15c2..6597022ba9 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md @@ -104,7 +104,7 @@ For more information, read the > Optionally, you can also input "base_id" and "table_names" in the script, as in the pipeline > example. 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/asana.md b/docs/website/docs/dlt-ecosystem/verified-sources/asana.md index 6a66f9c739..e2e58361c3 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/asana.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/asana.md @@ -89,7 +89,7 @@ For more information, read the [destination documentation](../../dlt-ecosystem/destinations) to add credentials for your chosen destination. This will ensure that your data is properly routed to its final destination. -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/chess.md b/docs/website/docs/dlt-ecosystem/verified-sources/chess.md index e528f57d87..269e7fcb3a 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/chess.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/chess.md @@ -60,7 +60,7 @@ To add credentials to your destination, follow the instructions in the [destination](../../dlt-ecosystem/destinations) documentation. This will ensure that your data is properly routed to its final destination. 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md b/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md index db3c6e0b81..a71c17466f 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md @@ -154,7 +154,7 @@ For more information, read the 1. Replace the value of the "account id" with the one [copied above](#grab-account-id). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/github.md b/docs/website/docs/dlt-ecosystem/verified-sources/github.md index 539d6131ae..d5d01ffba9 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/github.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/github.md @@ -105,7 +105,7 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md index cf2a6c7a4a..a46d08fbd9 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md @@ -209,7 +209,7 @@ For more information, read the 1. 
To use queries from `.dlt/config.toml`, run the `simple_load_config()` function in [pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics_pipeline.py). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md index bddfcd3e9e..35cbf8a331 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md @@ -292,7 +292,7 @@ For more information, read the > Note: You have an option to pass "range_names" and "spreadsheet_identifier" directly to the > google_spreadsheet function or in ".dlt/config.toml" -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md b/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md index c305b5b842..ddf3e92067 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md @@ -110,7 +110,7 @@ For more information, read the 1. Enter credentials for your chosen destination as per the [docs](../destinations/). 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/jira.md b/docs/website/docs/dlt-ecosystem/verified-sources/jira.md index cbc24d2056..a20c150a27 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/jira.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/jira.md @@ -97,7 +97,7 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md b/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md index 374ec92ab9..a8debbbf80 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md @@ -98,7 +98,7 @@ For more information, read the 1. To monitor live events on a website, enter the `live_event_site_id` (usually it is same as `site_id`). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md b/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md index 35f042f969..83bef0f878 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md @@ -169,7 +169,7 @@ For more information, read the 1. 
Replace the value of the "database" and "collections_names" with the ones [copied above](#grab-database-and-collections). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/mux.md b/docs/website/docs/dlt-ecosystem/verified-sources/mux.md index 5fb794628e..d900213309 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/mux.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/mux.md @@ -83,7 +83,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/notion.md b/docs/website/docs/dlt-ecosystem/verified-sources/notion.md index 358ba8c547..1ee1f73373 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/notion.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/notion.md @@ -88,7 +88,7 @@ For more information, read the your chosen destination. This will ensure that your data is properly routed to its final destination. 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md b/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md index 689f8f7808..2eaefa23ca 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md @@ -88,7 +88,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md b/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md index c819c8120b..b9b618108e 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md @@ -106,7 +106,7 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md b/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md index ee91cdc0ea..efeb47e71b 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md @@ -98,7 +98,7 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md index fd25d7818b..e43f0ffb61 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -101,7 +101,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md index f219d7c0be..e7fb2337f3 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md @@ -146,7 +146,7 @@ For more information, read the 1. 
Finally, follow the instructions in [Destinations](../destinations/) to add credentials for your chosen destination. This will ensure that your data is properly routed to its final destination. -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md b/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md index 7a9161d380..3947d2784e 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md @@ -90,7 +90,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md b/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md index a25768981f..b52f06a367 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md @@ -90,7 +90,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/workable.md b/docs/website/docs/dlt-ecosystem/verified-sources/workable.md index 14530e081e..66ee74ba8b 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/workable.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/workable.md @@ -111,7 +111,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). -For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/zendesk.md b/docs/website/docs/dlt-ecosystem/verified-sources/zendesk.md index 17fb366371..13dce0b898 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/zendesk.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/zendesk.md @@ -209,7 +209,7 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). 
-For more information, read the [General Usage: Credentials.](../../general-usage/credentials) +For more information, read the [General Usage: Credentials.](../../general-usage/configuration/credentials.md) ## Run the pipeline diff --git a/docs/website/docs/examples/incremental_loading/index.md b/docs/website/docs/examples/incremental_loading/index.md index 629138dc11..81efd47bb2 100644 --- a/docs/website/docs/examples/incremental_loading/index.md +++ b/docs/website/docs/examples/incremental_loading/index.md @@ -20,7 +20,7 @@ In this example, you'll find a Python script that interacts with the Zendesk Sup We'll learn: -- How to pass [credentials](../../general-usage/credentials) as dict and how to type the `@dlt.source` function arguments. +- How to pass [credentials](../../general-usage/configuration/credentials.md) as dict and how to type the `@dlt.source` function arguments. - How to set [the nesting level](../../general-usage/source#reduce-the-nesting-level-of-generated-tables). - How to enable [incremental loading](../../general-usage/incremental-loading) for efficient data extraction. - How to specify [the start and end dates](../../general-usage/incremental-loading#using-dltsourcesincremental-for-backfill) for the data loading and how to [opt-in to Airflow scheduler](../../general-usage/incremental-loading#using-airflow-schedule-for-backfill-and-incremental-loading) by setting `allow_external_schedulers` to `True`. diff --git a/docs/website/docs/general-usage/configuration.md b/docs/website/docs/general-usage/configuration.md deleted file mode 100644 index d72c7976f2..0000000000 --- a/docs/website/docs/general-usage/configuration.md +++ /dev/null @@ -1,4 +0,0 @@ -# Configuration - -This page is a work in progress. If you have a question about configuration, please send us an email -at community@dlthub.com. We'd be happy to help you! 
diff --git a/docs/website/docs/general-usage/configuration/config_providers.md b/docs/website/docs/general-usage/configuration/config_providers.md new file mode 100644 index 0000000000..e9a85e2212 --- /dev/null +++ b/docs/website/docs/general-usage/configuration/config_providers.md @@ -0,0 +1,61 @@ +--- +title: Secrets and Config Providers +description: +keywords: [credentials, secrets.toml, environment variables] +--- + +## Providers +If a function signature has arguments that may be injected, `dlt` looks for the argument values in providers. **The argument name is a key in the lookup**. In case of `google_sheets()` it will look for: `tab_names`, `credentials` and `only_strings`. + +Each provider has its own key naming convention and dlt is able to translate between them. + +Providers form a hierarchy. At the top are environment variables, then `secrets.toml` and `config.toml` files. Providers like google, aws, azure vaults can be inserted after the environment provider. + +For example if `spreadsheet_id` is in environment, dlt does not look into other providers. + +The values passed in the code explicitly are the **highest** in provider hierarchy. +The default values of the arguments have the **lowest** priority in the provider hierarchy. + +> **Summary of the hierarchy** +> explicit args > env variables > ...vaults, airflow etc > secrets.toml > config.toml > default arg values + +Secrets are handled only by the providers supporting them. Some of the providers support only secrets (to reduce the number of requests done by `dlt` when searching sections) +1. `secrets.toml` and environment may hold both config and secret values +2. `config.toml` may hold only config values, no secrets +3. various vaults providers hold only secrets, `dlt` skips them when looking for values that are not secrets. + +⛔ Context aware providers will activate in right environments ie. on Airflow or AWS/GCP VMachines + +### Provider key formats. toml vs. 
environment variable + +Providers may use different formats for the keys. `dlt` will translate the standard format where sections and key names are separated by "." into the provider specific formats. + +1. for `toml` names are case sensitive and sections are separated with "." +2. for environment variables all names are capitalized and sections are separated with double underscore "__" + +Example: +When `dlt` evaluates the request `dlt.secrets["my_section.gcp_credentials"]` it must find the `private_key` for google credentials. It will look +1. first in env variable `MY_SECTION__GCP_CREDENTIALS__PRIVATE_KEY` and if not found +2. in `secrets.toml` with key `my_section.gcp_credentials.private_key` + + +### Environment provider +Looks for the values in the environment variables + +### Toml provider +The Toml provider uses two `toml` files: `secrets.toml` to store secrets and `config.toml` to store configuration values. The default `.gitignore` file prevents secrets from being added to source control and pushed. The `config.toml` may be freely added. + +**Toml provider always loads those files from `.dlt` folder** which is looked up **relative to the current working directory**. Example: +if your working dir is `my_dlt_project` and you have: +``` +my_dlt_project: + | + pipelines/ + |---- .dlt/secrets.toml + |---- google_sheets.py +``` +in it and you run `python pipelines/google_sheets.py` then `dlt` will look for `secrets.toml` in `my_dlt_project/.dlt/secrets.toml` and ignore the existing `my_dlt_project/pipelines/.dlt/secrets.toml` + +if you change your working dir to `pipelines` and run `python google_sheets.py` it will look for `my_dlt_project/pipelines/.dlt/secrets.toml` as (probably) expected. 
+ +*that was a common problem at our workshop - but believe me, all the other layouts I've tried are even worse* diff --git a/docs/website/docs/general-usage/configuration/config_specs.md b/docs/website/docs/general-usage/configuration/config_specs.md new file mode 100644 index 0000000000..a8748ef051 --- /dev/null +++ b/docs/website/docs/general-usage/configuration/config_specs.md @@ -0,0 +1,107 @@ +--- +title: Configuration specs +description: +keywords: [credentials, secrets.toml, environment variables] +--- + +## Working with credentials (and other complex configuration values) + +`GcpClientCredentialsWithDefault` is an example of a **spec**: a Python `dataclass` that describes the configuration fields, their types and default values. It also allows parsing various native representations of the configuration. Credentials marked with `WithDefaults` mixin are also able to instantiate themselves from the machine/user default environment ie. Google's `default()` or AWS `.aws/credentials`. + +As an example, let's use `ConnectionStringCredentials` which represents a database connection string. + +```python +@dlt.source +def query(sql: str, dsn: ConnectionStringCredentials = dlt.secrets.value): + ... +``` + +The source above executes the `sql` against the database defined in `dsn`. `ConnectionStringCredentials` makes sure you get the correct values with correct types and understands the relevant native form of the credentials. 
+ + +Example 1: use the dictionary form +```toml +[dsn] +database="dlt_data" +password="loader" +username="loader" +host="localhost" +``` + +Example 2: use the native form +```toml +dsn="postgres://loader:loader@localhost:5432/dlt_data" +``` + +Example 3: use mixed form: the password is missing in explicit dsn and will be taken from the `secrets.toml` +```toml +dsn.password="loader" +``` +```python +query("SELECT * FROM customers", "postgres://loader@localhost:5432/dlt_data") +# or +query("SELECT * FROM customers", {"database": "dlt_data", "username": "loader"...}) +``` + +☮️ We will implement more credentials and let people reuse them when writing pipelines: +- to represent oauth credentials +- api key + api secret +- AWS credentials + + +### Working with alternatives of credentials (Union types) +If your source/resource allows for many authentication methods you can support those seamlessly for your user. The user just passes the right credentials and `dlt` will inject the right type into your decorated function. + +Example: + +> read the whole [test](/tests/common/configuration/test_spec_union.py), it shows how to create unions of credentials that derive from the common class so you can handle it seamlessly in your code. 
+ +```python +@dlt.source +def zen_source(credentials: Union[ZenApiKeyCredentials, ZenEmailCredentials, str] = dlt.secrets.value, some_option: bool = False): + # depending on what the user provides in config, ZenApiKeyCredentials or ZenEmailCredentials will be injected in `credentials` argument + # both classes implement `auth` so you can always call it + credentials.auth() + return dlt.resource([credentials], name="credentials") + +# pass native value +os.environ["CREDENTIALS"] = "email:mx:pwd" +assert list(zen_source())[0].email == "mx" + +# pass explicit native value +assert list(zen_source("secret:🔑:secret"))[0].api_secret == "secret" + +# pass explicit dict +assert list(zen_source(credentials={"email": "emx", "password": "pass"}))[0].email == "emx" + +``` +> This applies not only to credentials but to all specs (see next chapter) + +## Writing own specs + +**specs** let you take full control over the function arguments: +- which values should be injected, the types, default values. +- you can specify optional and final fields +- form hierarchical configurations (specs in specs). +- provide own handlers for `on_error` or `on_resolved` +- provide own native value parsers +- provide own default credentials logic +- adds all Python dataclass goodies to it +- adds all Python `dict` goodies to it (`specs` instances can be created from dicts and serialized from dicts) + +This is used a lot in the `dlt` core and may become useful for complicated sources. + +In fact for each decorated function a spec is synthesized. In case of `google_sheets` following class is created. 
+```python +@configspec +class GoogleSheetsConfiguration: + tab_names: List[str] = None # mandatory + credentials: GcpClientCredentialsWithDefault = None # mandatory secret + only_strings: Optional[bool] = False +``` + +> all specs derive from [BaseConfiguration](/dlt/common/configuration/specs//base_configuration.py) + +> all credentials derive from [CredentialsConfiguration](/dlt/common/configuration/specs//base_configuration.py) + +> Read the docstrings in the code above \ No newline at end of file diff --git a/docs/website/docs/general-usage/configuration/configuration.md b/docs/website/docs/general-usage/configuration/configuration.md new file mode 100644 index 0000000000..92cceabc64 --- /dev/null +++ b/docs/website/docs/general-usage/configuration/configuration.md @@ -0,0 +1,278 @@ +--- +title: Secrets and Config +description: +keywords: [credentials, secrets.toml, environment variables] +--- + + +# Secrets and Config + +## Overview + +### General Usage and an Example + +The way config values and secrets are handled should promote correct behavior + +1. secret values should never be present in the pipeline code +2. pipeline may be reconfigured for production after it is deployed. deployed and local code should be identical +3. still it must be easy and intuitive + +For the source extractor function below (reads selected tab from google sheets) we can pass config values in following ways: + +```python + +import dlt + + +@dlt.source +def google_sheets(spreadsheet_id, tab_names=dlt.config.value, credentials=dlt.secrets.value, only_strings=False): + sheets = build('sheets', 'v4', credentials=Services.from_json(credentials)) + tabs = [] + for tab_name in tab_names: + data = sheets.get(spreadsheet_id, tab_name).execute().values() + tabs.append(dlt.resource(data, name=tab_name)) + return tabs + +# WRONG: provide all values directly - wrong but possible. secret values should never be present in the code! 
+google_sheets("23029402349032049", ["tab1", "tab2"], credentials={"private_key": ""}).run(destination="bigquery") + +# OPTION A: provide config values directly and secrets via automatic injection mechanism (see later) +# `credentials` value will be injected by the `source` decorator +# `spreadsheet_id` and `tab_names` take values from the arguments below +# `only_strings` will be injected by the source decorator or will get the default value False +google_sheets("23029402349032049", ["tab1", "tab2"]).run(destination="bigquery") + + +# OPTION B: use `dlt.secrets` and `dlt.config` to explicitly take those values from providers from the explicit keys +google_sheets(dlt.config["sheet_id"], dlt.config["my_section.tabs"], dlt.secrets["my_section.gcp_credentials"]).run(destination="bigquery") +``` + +> one of the principles is that configuration, credentials and secret values may be passed explicitly as arguments to the functions. this makes the injection behavior optional. + +### Injection mechanism +Config and secret values are injected to the function arguments if the function is decorated with `@dlt.source` or `@dlt.resource` (also `@with_config` which you can apply to any function - used heavily in the dlt core) + +The signature of the function `google_sheets` is **explicitly accepting all the necessary configuration and secrets in its arguments**. During runtime, `dlt` tries to supply (`inject`) the required values via various config providers. The injection rules are: +1. if you call the decorated function, the arguments that are passed explicitly are **never injected** +this makes injection mechanism optional + +2. required arguments (ie. `spreadsheet_id`) are not injected +3. arguments with default values are injected if present in config providers +4. arguments with the special default value `dlt.secrets.value` and `dlt.config.value` **must be injected** (or explicitly passed). 
If they are not found by the config providers the code raises an exception. The code in the functions always receives those arguments. + +additionally `dlt.secrets.value` tells `dlt` that supplied value is a secret and it will be injected only from secure config providers + +### Passing config values and credentials explicitly + +```python +# OPTION B: use `dlt.secrets` and `dlt.config` to explicitly take those values from providers from the explicit keys +google_sheets(dlt.config["sheet_id"], dlt.config["tabs"], dlt.secrets["my_section.gcp_credentials"]).run(destination="bigquery") +``` + +[See example](/docs/examples/credentials/explicit.py) + +### Typing the source and resource signatures + +You should type your function signatures! The effort is very low and it gives `dlt` much more information on what source/resource expects. +1. You'll never receive invalid type signatures +2. We can generate nice sample config and secret files for your source +3. You can request dictionaries or special values (ie. connection strings, service json) to be passed +4. ☮️ you can specify a set of possible types via `Union` ie. OAUTH or Api Key authorization + +```python +@dlt.source +def google_sheets(spreadsheet_id: str, tab_names: List[str] = dlt.config.value, credentials: GcpClientCredentialsWithDefault = dlt.secrets.value, only_strings: bool = False): + ... +``` +Now: +1. you are sure that you get a list of strings as `tab_names` +2. you will get actual google credentials (see `CredentialsConfiguration` later) and your users can pass them in many different forms. + +In case of `GcpClientCredentialsWithDefault` +* you may just pass the `service_json` as string or dictionary (in code and via config providers) +* you may pass a connection string (used in sql alchemy) (in code and via config providers) +* or default credentials will be used + + +## Secret and config values layout. + +`dlt` uses a layout of hierarchical sections to organize the config and secret values. 
This makes configurations and secrets easy to manage and disambiguates values with the same keys by placing them in the different sections + +> if you know how `toml` files are organized -> this is the same concept! + +> a lot of config values are dictionaries themselves (ie. most of the credentials) and you want the values corresponding to one component to be close together. + +> you can have separate credentials for your destinations and for each source your pipeline uses; if you have many pipelines in a single project, you can have separate sections corresponding to them. + +Here is the simplest default layout for our `google_sheets` example. + +### OPTION A (default layout) + +**secrets.toml** +```toml +[credentials] +client_email = +private_key = +project_id = +``` +**config.toml** +```toml +tab_names=["tab1", "tab2"] +``` + +As you can see the details of gcp credentials are placed under `credentials` which is the argument name of the source function + +### OPTION B (explicit layout) + +Here the user has full control over the layout + +**secrets.toml** +```toml +[my_section] + + [my_section.gcp_credentials] + client_email = + private_key = +``` +**config.toml** +```toml +[my_section] +tabs=["tab1", "tab2"] + + [my_section.gcp_credentials] + project_id = # I prefer to keep my project id in config file and private key in secrets +``` + +### Default layout and default key lookup during injection + +`dlt` arranges the sections into a **default layout** that is used by the injection mechanism. This layout makes it easy to configure simple cases but also provides room for more explicit sections and complex cases ie. having several sources with different credentials or even hosting several pipelines in the same project sharing the same config and credentials. + +``` +pipeline_name + | + |-sources + |- + |- + |- {all source and resource options and secrets} + |- + |- {all source and resource options and secrets} + |- + |... + + |-extract + |- extract options for resources ie. 
parallelism settings, maybe retries + |-destination + |- + |- {destination options} + |-credentials + |-{credentials options} + |-schema + |- + |-schema settings: not implemented but I'll let people set nesting level, name convention, normalizer etc. here + |-load + |-normalize +``` + +Lookup rules: + +**Rule 1** All the sections above are optional. You are free to arrange your credentials and config without any additional sections. +Example: OPTION A (default layout) + +**Rule 2** The lookup starts with the most specific possible path and, if the value is not found there, it removes the right-most section and tries again. +Example: In the case of option A we have just one set of credentials. But what if the `bigquery` credentials are different from the `google sheets` ones? Then we need to allow some sections to separate them. + +```toml +# google sheet credentials +[credentials] +client_email = +private_key = +project_id = + +# bigquery credentials +[destination.credentials] +client_email = +private_key = +project_id = +``` +Now when `dlt` looks for destination credentials, it will encounter the `destination` section and stop there. +When looking for `sources` credentials it will go directly to the `credentials` key (corresponding to the function argument). + +> we could also rename the argument in the source function! but then we are **forcing** the user to have two copies of credentials. + +Example: let's be even more explicit and use the fullest section path possible +```toml +# google sheet credentials +[sources.google_sheets.credentials] +client_email = +private_key = +project_id = + +# bigquery credentials +[destination.bigquery.credentials] +client_email = +private_key = +project_id = +``` +Here we add the destination and source names to be very explicit. + +**Rule 3** You can use your pipeline name to have separate configurations for each pipeline in your project. + +A pipeline created/obtained with `dlt.pipeline()` creates a global and optional namespace with the value of `pipeline_name`. 
All config values will be looked up with the pipeline name first and then again without it. + +Example: the pipeline is named `ML_sheets` +```toml +[ML_sheets.credentials] +client_email = +private_key = +project_id = +``` + +or maximum path: +```toml +[ML_sheets.sources.google_sheets.credentials] +client_email = +private_key = +project_id = +``` + +### The `sources` section +Config and secrets for decorated sources and resources are kept in `sources..` section. **All sections are optional**. For example, if the source module is named +`pipedrive` and the function decorated with `@dlt.source` is `deals(api_key: str=...)` then `dlt` will look for the api key in: +1. `sources.pipedrive.deals.api_key` +2. `sources.pipedrive.api_key` +3. `sources.api_key` +4. `api_key` + +Step 2 in the search path allows all the sources/resources in a module to share the same set of credentials. + +Also look at the following [test](/tests/extract/test_decorators.py) : `test_source_sections` + + +## Understanding the exceptions +Now we can finally understand the `ConfigFieldMissingException`. Let's run the `chess.py` example without providing the password: + +``` +$ CREDENTIALS="postgres://loader@localhost:5432/dlt_data" python chess.py +... +dlt.common.configuration.exceptions.ConfigFieldMissingException: Following fields are missing: ['password'] in configuration with spec PostgresCredentials + for field "password" config providers and keys were tried in following order: + In Environment Variables key CHESS_GAMES__DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found. + In Environment Variables key CHESS_GAMES__DESTINATION__CREDENTIALS__PASSWORD was not found. + In Environment Variables key CHESS_GAMES__CREDENTIALS__PASSWORD was not found. + In secrets.toml key chess_games.destination.postgres.credentials.password was not found. + In secrets.toml key chess_games.destination.credentials.password was not found. + In secrets.toml key chess_games.credentials.password was not found. 
+ In Environment Variables key DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found. + In Environment Variables key DESTINATION__CREDENTIALS__PASSWORD was not found. + In Environment Variables key CREDENTIALS__PASSWORD was not found. + In secrets.toml key destination.postgres.credentials.password was not found. + In secrets.toml key destination.credentials.password was not found. + In secrets.toml key credentials.password was not found. +Please refer to https://dlthub.com/docs/general-usage/credentials for more information +``` + +It tells you exactly which paths `dlt` looked at, via which config providers and in which order. In the example above: +1. First it looked in a big section `chess_games`, which is the name of the pipeline +2. In each case it starts with full paths and goes to the minimum path `credentials.password` +3. First it looks into `environ`, then in `secrets.toml`. It displays the exact keys tried. +4. Note that `config.toml` was skipped! It may not contain any secrets. diff --git a/docs/website/docs/general-usage/credentials.md b/docs/website/docs/general-usage/configuration/credentials.md similarity index 90% rename from docs/website/docs/general-usage/credentials.md rename to docs/website/docs/general-usage/configuration/credentials.md index d0627ca527..4c2843aa9b 100644 --- a/docs/website/docs/general-usage/credentials.md +++ b/docs/website/docs/general-usage/configuration/credentials.md @@ -1,10 +1,10 @@ --- -title: Credentials +title: Adding credentials description: How to use dlt credentials keywords: [credentials, secrets.toml, environment variables] --- -# Credentials +# Adding credentials ## Adding credentials locally @@ -27,10 +27,10 @@ client_email = "client_email" # please set me up! ``` > Note that for toml names are case-sensitive and sections are separated with ".". 
-For destination credentials, read the [documentation pages for each destination](../dlt-ecosystem/destinations) to create and configure +For destination credentials, read the [documentation pages for each destination](../../dlt-ecosystem/destinations) to create and configure credentials. -For Verified Source credentials, read the [Setup Guides](../dlt-ecosystem/verified-sources) for each source to find how to get credentials. +For Verified Source credentials, read the [Setup Guides](../../dlt-ecosystem/verified-sources) for each source to find how to get credentials. Once you have credentials for the source and destination, add them to the file above and save them. @@ -98,7 +98,7 @@ If dlt tries to read this from environment variables, it will use a different na For environment variables all names are capitalized and sections are separated with double underscore "\_\_". -For example for the above secrets, we would need to put into environment: +For example, for the above secrets, we would need to put into environment: ```shell SOURCES__PIPEDRIVE__PIPEDRIVE_API_KEY diff --git a/docs/website/docs/general-usage/glossary.md b/docs/website/docs/general-usage/glossary.md index 38bf4ee01b..240807e6fa 100644 --- a/docs/website/docs/general-usage/glossary.md +++ b/docs/website/docs/general-usage/glossary.md @@ -53,11 +53,11 @@ Describes the structure of normalized data (e.g. unpacked tables, column types, instructions on how the data should be processed and loaded (i.e. it tells `dlt` about the content of the data and how to load it into the destination). -## [Config](configuration.md) +## [Config](configuration/configuration.md) A set of values that are passed to the pipeline at run time (e.g. to change its behavior locally vs. in production). -## [Credentials](credentials.md) +## [Credentials](configuration/credentials.md) A subset of configuration whose elements are kept secret and never shared in plain text. 
diff --git a/docs/website/docs/general-usage/resource.md b/docs/website/docs/general-usage/resource.md index 77df24d592..eedd93a12c 100644 --- a/docs/website/docs/general-usage/resource.md +++ b/docs/website/docs/general-usage/resource.md @@ -70,7 +70,7 @@ accepts following arguments: > hint value. This let's you create table and column schemas depending on the data. See example in > next section. -> 💡 You can mark some resource arguments as configuration and [credentials](credentials.md) +> 💡 You can mark some resource arguments as configuration and [credentials](configuration/credentials.md) > values so `dlt` can pass them automatically to your functions. ### Define a schema with Pydantic @@ -174,7 +174,7 @@ for row in generate_rows(20): print(row) ``` -You can mark some resource arguments as configuration and [credentials](credentials.md) values +You can mark some resource arguments as configuration and [credentials](configuration/credentials.md) values so `dlt` can pass them automatically to your functions. ### Process resources with `dlt.transformer` diff --git a/docs/website/docs/general-usage/schema.md b/docs/website/docs/general-usage/schema.md index e27a87e803..30f0a5e285 100644 --- a/docs/website/docs/general-usage/schema.md +++ b/docs/website/docs/general-usage/schema.md @@ -64,7 +64,7 @@ The default naming convention: > 💡 Use simple, short small caps identifiers for everything! -The naming convention is [configurable](configuration.md) and users can easily create their own +The naming convention is [configurable](configuration/configuration.md) and users can easily create their own conventions that i.e. pass all the identifiers unchanged if the destination accepts that (i.e. DuckDB). 
diff --git a/docs/website/docs/getting-started.md b/docs/website/docs/getting-started.md index cd3f2cc69d..97464b690f 100644 --- a/docs/website/docs/getting-started.md +++ b/docs/website/docs/getting-started.md @@ -637,7 +637,7 @@ If you want to take full advantage of the `dlt` library, then we strongly sugges - [Transform your data before loading](general-usage/resource#customize-resources) and see some [examples of customizations like column renames and anonymization](general-usage/customising-pipelines/renaming_columns). - [Set up "last value" incremental loading](general-usage/incremental-loading#incremental_loading-with-last-value). - [Set primary and merge keys, define the columns nullability and data types](general-usage/resource#define-schema). -- [Pass config and credentials into your sources and resources](general-usage/credentials). +- [Pass config and credentials into your sources and resources](general-usage/configuration/credentials.md). - [Use built-in requests client](reference/performance#using-the-built-in-requests-client). - [Run in production: inspecting, tracing, retry policies and cleaning up](running-in-production/running). - [Run resources in parallel, optimize buffers and local storage](reference/performance.md) diff --git a/docs/website/docs/running-in-production/running.md b/docs/website/docs/running-in-production/running.md index 96f9f7e071..ed441a44f3 100644 --- a/docs/website/docs/running-in-production/running.md +++ b/docs/website/docs/running-in-production/running.md @@ -111,7 +111,7 @@ load.delete_completed_jobs=true ## Using slack to send messages `dlt` provides basic support for sending slack messages. You can configure Slack incoming hook via -[secrets.toml or environment variables](../general-usage/credentials.md). Please note that **Slack +[secrets.toml or environment variables](../general-usage/configuration/credentials.md). 
Please note that **Slack incoming hook is considered a secret and will be immediately blocked when pushed to github repository**. In `secrets.toml`: diff --git a/docs/website/docs/walkthroughs/add-a-verified-source.md b/docs/website/docs/walkthroughs/add-a-verified-source.md index ed3701d8b5..d565aa3a60 100644 --- a/docs/website/docs/walkthroughs/add-a-verified-source.md +++ b/docs/website/docs/walkthroughs/add-a-verified-source.md @@ -76,7 +76,7 @@ the supported locations. ## 2. Adding credentials For adding them locally or on your orchestrator, please see the following guide -[credentials](../general-usage/credentials.md). +[credentials](../general-usage/configuration/credentials.md). ## 3. Customize or write a pipeline script diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index befbfc0b2d..373ae1c83c 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -105,7 +105,22 @@ const sidebars = { 'general-usage/full-loading', 'general-usage/credentials', 'general-usage/schema', - 'general-usage/configuration', + { + type: 'category', + label: 'Configuration', + link: { + type: 'generated-index', + title: 'Configuration', + description: '', + slug: 'dlt-ecosystem/configuration', + }, + items: [ + 'dlt-ecosystem/configuration/configuration', + 'dlt-ecosystem/configuration/credentials', + 'dlt-ecosystem/configuration/config_providers', + 'dlt-ecosystem/configuration/config_specs', + ] + }, 'reference/performance', { type: 'category',