diff --git a/docs/website/docs/walkthroughs/adjust-a-schema.md b/docs/website/docs/walkthroughs/adjust-a-schema.md
index d62dc215d9..c1c6189ac4 100644
--- a/docs/website/docs/walkthroughs/adjust-a-schema.md
+++ b/docs/website/docs/walkthroughs/adjust-a-schema.md
@@ -28,7 +28,7 @@ dlt.pipeline(
 )
 ```
 
-Following folder structure in project root folder will be created:
+The following folder structure in the project root folder will be created:
 
 ```
 schemas
@@ -46,11 +46,10 @@ import_schema_path="schemas/import"
 ## 2. Run the pipeline to see the schemas
 
 To see the schemas, you must run your pipeline again. The `schemas` and `import`/`export`
-directories will be created. In each directory, you'll see a `yaml` file with a file
-`chess.schema.toml`.
+directories will be created. In each directory, you'll see a `yaml` file (e.g. `chess.schema.yaml`).
 
 Look at the export schema (in the export folder): this is the schema that got inferred from the data
-and was used to load it into the destination (i.e `duckdb`).
+and was used to load it into the destination (e.g. `duckdb`).
 
 ## 3. Make changes in import schema
 
@@ -59,20 +58,36 @@ hints that were explicitly declared in the `chess` source. You'll use this schem
 modifications, typically by pasting relevant snippets from your export schema and modifying them.
 You should keep the import schema as simple as possible and let `dlt` do the rest.
 
-> 💡 How importing a schema works:
->
-> 1. When a new pipeline is created and the source function is extracted for the first time, a new
->    schema is added to pipeline. This schema is created out of global hints and resource hints
->    present in the source extractor function.
-> 1. Every such new schema will be saved to the `import` folder (if it does not exist there already)
->    and used as the initial version for all future pipeline runs.
-> 1. Once a schema is present in `import` folder, **it is writable by the user only**.
-> 1. Any change to the schemas in that folder are detected and propagated to the pipeline
->    automatically on the next run. It means that after an user update, the schema in `import`
->    folder reverts all the automatic updates from the data.
-
-In next steps we'll experiment a lot, you will be warned to set `full_refresh=True` in the
-`dlt.pipeline` until we are done experimenting.
+💡 How importing a schema works:
+
+1. When a new pipeline is created and the source function is extracted for the first time, a new
+   schema is added to the pipeline. This schema is created out of global hints and resource hints
+   present in the source extractor function.
+1. Every such new schema will be saved to the `import` folder (if it does not exist there already)
+   and used as the initial version for all future pipeline runs.
+1. Once a schema is present in the `import` folder, **it is writable by the user only**.
+1. Any changes to the schemas in that folder are detected and propagated to the pipeline
+   automatically on the next run. It means that after a user update, the schema in the `import`
+   folder reverts all the automatic updates from the data.
+
+In the next steps we'll experiment a lot, so be sure to set `full_refresh=True` until we are done
+experimenting.
+
+:::caution
+`dlt` will **not modify** tables after they are created.
+So if you change a `yaml` schema file (e.g. change a data type or add a hint),
+you need to **delete the dataset**
+or set `full_refresh=True`:
+```python
+dlt.pipeline(
+    import_schema_path="schemas/import",
+    export_schema_path="schemas/export",
+    pipeline_name="chess_pipeline",
+    destination='duckdb',
+    dataset_name="games_data",
+    full_refresh=True,
+)
+```
+:::
 
 ### Change the data type
 
@@ -97,7 +112,7 @@ players_games:
       data_type: timestamp
 ```
 
-Run the pipeline script again and make sure that the change is visible in export schema. Then,
+Run the pipeline script again and make sure that the change is visible in the export schema. Then,
 [launch the Streamlit app](../dlt-ecosystem/visualizations/exploring-the-data.md) to see the changed data.
 
 :::note
@@ -122,7 +137,7 @@ white__aid:
   data_type: text
 ```
 
-For some reason you'd rather deal with a single JSON (or struct) column. Just declare the `white`
+Suppose you'd rather deal with a single JSON (or struct) column. Just declare the `white`
 column as `complex`, which will instruct `dlt` not to flatten it (or not convert into child table
 in case of a list). Do the same with `black` column:
@@ -166,5 +181,5 @@ players_games:
 
 ## 4. Keep your import schema
 
-Just add and push the import folder to git. It will be used automatically when cloned. Alternatively
+Just add and push the import folder to git. It will be used automatically when cloned. Alternatively,
 [bundle such schema with your source](../general-usage/schema.md#attaching-schemas-to-sources).
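Taken together, the schema tweaks this walkthrough makes could be collected into a single snippet of the import schema `yaml`. This is a sketch only: the exact nesting and surrounding keys come from your own generated export schema, and `end_time`, `white`, and `black` are the column names from the chess example.

```yaml
players_games:
  columns:
    end_time:
      data_type: timestamp  # override the inferred data type
    white:
      data_type: complex    # keep as a single JSON/struct column, don't flatten
    black:
      data_type: complex
```

On the next run, `dlt` would pick this snippet up from the `import` folder and apply it over the automatically inferred schema (remember the caution above: set `full_refresh=True` or delete the dataset if the tables already exist).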