Replaced broken links #1166

Merged 2 commits on Apr 8, 2024
10 changes: 5 additions & 5 deletions docs/website/blog/2023-06-10-schema-evolution.md
@@ -136,10 +136,10 @@ business-logic tests, you would still need to implement them in a custom way.
## The implementation recipe

1. Use `dlt`. It will automatically infer and version schemas, so you can simply check if there are
changes. You can just use the [normaliser + loader](https://dlthub.com/docs/general-usage/pipeline.md) or
[build extraction with dlt](https://dlthub.com/docs/general-usage/resource.md). If you want to define additional
constraints, you can do so in the [schema](https://dlthub.com/docs/general-usage/schema.md).
1. [Define your slack hook](https://dlthub.com/docs/running-in-production/running.md#using-slack-to-send-messages) or
changes. You can just use the [normaliser + loader](/docs/general-usage/pipeline) or
[build extraction with dlt](/docs/general-usage/resource). If you want to define additional
constraints, you can do so in the [schema](/docs/general-usage/schema).
1. [Define your slack hook](/docs/running-in-production/running#using-slack-to-send-messages) or
create your own notification function. Make sure the slack channel contains the data producer and
any stakeholders.
1. [Capture the load job info and send it to the hook](https://dlthub.com/docs/running-in-production/running#inspect-save-and-alert-on-schema-changes).
1. [Capture the load job info and send it to the hook](/docs/running-in-production/running#inspect-save-and-alert-on-schema-changes).
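
Steps 2 and 3 of the recipe above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not `dlt`'s own API: the shape of `schema_updates`, the function names `format_schema_changes` and `send_to_slack`, and the example data are hypothetical stand-ins for whatever your load info actually exposes.

```python
import json
import urllib.request


def format_schema_changes(schema_updates: dict) -> str:
    """Build a readable message from a mapping of
    table name -> {column name -> data type} (hypothetical shape)."""
    lines = []
    for table_name, columns in sorted(schema_updates.items()):
        for column_name, data_type in sorted(columns.items()):
            lines.append(
                f"Table updated: {table_name}: "
                f"column added: {column_name} ({data_type})"
            )
    return "\n".join(lines)


def send_to_slack(hook_url: str, message: str) -> None:
    """POST the message to a Slack incoming webhook (URL supplied by you)."""
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        hook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        response.read()


# Hypothetical schema changes captured after a load.
example_updates = {"users": {"signup_source": "text", "age": "bigint"}}
message = format_schema_changes(example_updates)
```

Calling `send_to_slack(hook_url, message)` only when `message` is non-empty keeps the channel quiet on loads that did not change the schema.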
12 changes: 6 additions & 6 deletions docs/website/blog/2023-08-21-dlt-lineage-support.md
@@ -20,21 +20,21 @@ TL;DR: By linking each load's metadata to the schema evolution event or schema v

Load IDs are crucial in `dlt` and are present in all the top tables (`_dlt_loads`, `load_id`, etc.). Each pipeline run creates one or more load packages, which can be identified by their `load_id`. A load package typically contains data from all resources of a particular source. The `load_id` of a particular package is added to the top data tables and to the `_dlt_loads` table with a status 0 (when the load process is fully completed).

For more details, refer to the [Load IDs](https://dlthub.com/docs/dlt-ecosystem/visualizations/understanding-the-tables#load-ids) section of the documentation.
For more details, refer to the [Load IDs](/docs/general-usage/destination-tables#load-ids) section of the documentation.

### Schema Versioning
### Schema Versioning https://dlthub.com/

Each schema file in `dlt` contains a content-based hash `version_hash` that is used to detect manual changes to the schema (i.e., user edits content) and to detect if the destination database schema is synchronized with the file schema. Each time the schema is saved, the version hash is updated.

For more details, refer to the [Schema content hash and version](https://dlthub.com/docs/general-usage/schema#schema-content-hash-and-version) section of the documentation.
For more details, refer to the [Schema content hash and version](/docs/general-usage/schema#schema-content-hash-and-version) section of the documentation.
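
The idea of a content-based hash can be illustrated with a small sketch. This is not `dlt`'s actual hashing algorithm: the use of SHA-256 over canonical JSON, the function name `content_hash`, and the schema dictionaries are assumptions made purely for illustration.

```python
import hashlib
import json


def content_hash(schema: dict) -> str:
    """Hash a schema by its content: key order does not matter,
    but any change to a name or value produces a different hash."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical stored schema and the hash saved alongside it.
stored = {"tables": {"users": {"columns": {"id": "bigint"}}}}
stored_hash = content_hash(stored)

# A manual edit (a new column) changes the content, so the hash differs.
edited = {"tables": {"users": {"columns": {"id": "bigint", "email": "text"}}}}
was_edited = content_hash(edited) != stored_hash
```

Comparing a freshly computed hash against the stored one is enough to tell whether the content changed, without diffing the whole file.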

### Data Lineage

Data lineage can be super relevant for architectures like the data vault architecture or when troubleshooting. Using the pipeline name and `load_id` provided out of the box by `dlt`, you are able to identify the source and time of data.

You can save complete lineage info for a particular `load_id` including a list of loaded files, error messages (if any), elapsed times, schema changes. This can be helpful, for example, when troubleshooting problems.

For more details, refer to the [Data lineage](https://dlthub.com/docs/dlt-ecosystem/visualizations/understanding-the-tables#data-lineage) section of the documentation.
For more details, refer to the [Data lineage](/docs/general-usage/destination-tables#data-lineage) section of the documentation.

By combining the use of `load_id` and schema versioning, you can achieve a robust system for row and column level lineage in your data pipelines with `dlt`.

@@ -47,15 +47,15 @@ Row level lineage refers to the ability to track data from its source to its des

In `dlt`, each row in all (top level and child) data tables created by `dlt` contains a unique column named `_dlt_id`. Each child table contains a foreign key column `_dlt_parent_id` linking to a particular row (`_dlt_id`) of a parent table. This allows you to trace the lineage of each row back to its source.

For more details, refer to the [Child and parent tables](https://dlthub.com/docs/dlt-ecosystem/visualizations/understanding-the-tables#child-and-parent-tables) section of the documentation.
For more details, refer to the [Child and parent tables](/docs/general-usage/destination-tables#child-and-parent-tables) section of the documentation.
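
Tracing a child row back to its parent comes down to a join on `_dlt_parent_id` = `_dlt_id`. The sketch below uses an in-memory SQLite database as a stand-in for a destination; the table names `users` and `users__orders`, the `_dlt_load_id` column name, and all row values are hypothetical examples, not output from a real pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical parent table with a load identifier column.
    CREATE TABLE users (_dlt_id TEXT PRIMARY KEY, _dlt_load_id TEXT, name TEXT);
    -- Hypothetical child table pointing at the parent row.
    CREATE TABLE users__orders (
        _dlt_id TEXT PRIMARY KEY,
        _dlt_parent_id TEXT,
        item TEXT
    );
    INSERT INTO users VALUES ('u1', '1690000000.1', 'alice');
    INSERT INTO users VALUES ('u2', '1690000000.1', 'bob');
    INSERT INTO users__orders VALUES ('o1', 'u1', 'book');
    INSERT INTO users__orders VALUES ('o2', 'u2', 'pen');
    INSERT INTO users__orders VALUES ('o3', 'u1', 'lamp');
    """
)

# For each child row, look up the parent row and the load it arrived in.
rows = conn.execute(
    """
    SELECT child.item, parent.name, parent._dlt_load_id
    FROM users__orders AS child
    JOIN users AS parent ON child._dlt_parent_id = parent._dlt_id
    ORDER BY child._dlt_id
    """
).fetchall()
```

Because the parent row carries the load identifier, the same join also tells you which load a nested record came from.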

### Column Level Lineage

Column level lineage refers to the ability to track how each column in your data has been transformed or manipulated from source to destination. This can be important for understanding how your data has been processed, ensuring data integrity, and validating data transformations.

In `dlt`, a column schema contains properties such as `name`, `description`, `data_type`, and `is_variant`, which provide information about the column and its transformations. The `is_variant` property, for example, tells you if a column was generated as a variant of another column.

For more details, refer to the [Tables and columns](https://dlthub.com/docs/dlt-ecosystem/visualizations/understanding-the-tables#table-and-column-names) section of the documentation.
For more details, refer to the [Tables and columns](/docs/general-usage/destination-tables#table-and-column-names) section of the documentation.

By combining row and column level lineage, you can have an easy overview of where your data is coming from and when changes in its structure occur.

2 changes: 1 addition & 1 deletion docs/website/blog/2023-10-06-dlt-holistics.md
@@ -340,7 +340,7 @@ If you compare the ddl against the sample document in MongoDB you will notice th

`dlt` normalises nested data by populating them in separate tables and creates relationships between the tables, so they can be combined together using normal SQL joins. All this is taken care of by `dlt` and we need not worry about how transformations are handled. In short, the transformation steps we discussed in [Why is dlt useful when you want to ingest data from a production database such as MongoDB?](#why-is-dlt-useful-when-you-want-to-ingest-data-from-a-production-database-such-as-mongodb) are taken care of by dlt, making the data analyst's life easier.

To better understand how `dlt` does this transformation, refer to the [docs](https://dlthub.com/docs/dlt-ecosystem/visualizations/understanding-the-tables#child-and-parent-tables).
To better understand how `dlt` does this transformation, refer to the [docs](/docs/general-usage/destination-tables#child-and-parent-tables).

### 3. Self-service analytics for MongoDB with Holistics.
