Revise and reconcile pipeline graphics #858

Merged on Sep 9, 2024 (1 commit).

_data/nav.yml (8 changes: 4 additions & 4 deletions)
@@ -29,13 +29,13 @@
 subcategories:
 - subtitle: Use case guides
 page: soda/use-case-guides.md
-- subtitle: Test data in an Airflow pipeline
+- subtitle: Test data in Airflow
 page: soda/quick-start-prod.md
-- subtitle: Test data in an ADF pipeline
+- subtitle: Test data in ADF
 page: soda/quick-start-adf.md
-- subtitle: Test data in a Dagster pipeline
+- subtitle: Test data in Dagster
 page: soda/quick-start-dagster.md
-- subtitle: Test data in a Databricks pipeline
+- subtitle: Test data in Databricks
 page: soda/quick-start-databricks-pipeline.md
 - subtitle: Test before data migration
 page: soda/quick-start-migration.md
Binary file added assets/images/airflow-pipeline.png
Binary file modified assets/images/dagster-flow.png
Binary file removed assets/images/data-pipeline.png
Binary file modified assets/images/databricks-workflow.png
Binary file added assets/images/soda-adf-ingest.png
Binary file modified assets/images/soda-adf-pipeline.png
Binary file added assets/images/soda-adf-reconcile.png
soda/product-overview.md (2 changes: 1 addition & 1 deletion)
@@ -91,7 +91,7 @@ Soda makes the results available in the command-line and in your online account,

 You can programmatically embed Soda scan executions in your data pipeline after ingestion and transformation to get early and precise warnings in Soda about data quality issues before they have a downstream impact. Upon receiving a data quality alert in Slack, for example, your team can take quick action in Soda Cloud to identify the issue and open an incident to investigate the root cause. See [Test data quality in a data pipeline]({% link soda/quick-start-prod.md %}).

-![data-pipeline](/assets/images/data-pipeline.png){:width="700px"}
+![airflow-pipeline](/assets/images/airflow-pipeline.png){:width="700px"}

 <br />

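The `soda/product-overview.md` paragraph above describes embedding scans after ingestion and after transformation so that data quality problems surface before they reach downstream consumers. The following is a minimal, library-free sketch of that gating pattern, assuming rows are plain Python dicts. `quality_gate`, `run_pipeline`, `DataQualityError`, and the example checks are hypothetical names for illustration only; they are not Soda Library's API, where checks would instead be written in SodaCL and evaluated by a Soda scan.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


class DataQualityError(Exception):
    """Hypothetical error raised when a check fails, so later pipeline steps never run."""


def quality_gate(stage: str, rows: List[Row], checks: Dict[str, Check]) -> List[Row]:
    # Evaluate every check and collect the names of the ones that fail.
    failed = [name for name, check in checks.items() if not check(rows)]
    if failed:
        # A real pipeline would also send an alert here (for example, to Slack).
        raise DataQualityError(f"{stage}: failed checks {failed}")
    return rows


def run_pipeline(raw: List[Row]) -> List[Row]:
    # Checks after ingestion: data arrived and every row has an id.
    ingested = quality_gate("after ingestion", raw, {
        "row_count > 0": lambda rs: len(rs) > 0,
        "missing_count(id) = 0": lambda rs: all(r.get("id") is not None for r in rs),
    })
    # Hypothetical transformation: normalize country codes to upper case.
    transformed = [{**r, "country": str(r["country"]).upper()} for r in ingested]
    # Checks after transformation: every country code has exactly two letters.
    return quality_gate("after transformation", transformed, {
        "valid country codes": lambda rs: all(len(r["country"]) == 2 for r in rs),
    })


if __name__ == "__main__":
    print(run_pipeline([{"id": 1, "country": "be"}, {"id": 2, "country": "nl"}]))
```

Raising an exception, rather than only logging, is the design choice that keeps bad data from flowing to later steps: an orchestrator such as Airflow marks the task as failed and skips its dependents.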
soda/quick-start-adf.md (4 changes: 4 additions & 0 deletions)
@@ -63,6 +63,8 @@ To validate your account license or free trial, Soda Library must communicate wi

 This example executes checks which, after a data migration, validate that the source and target data are matching. The first ADF Notebook Activity links to a notebook which contains the Soda connection details, the check definitions, and the script to run a Soda scan for data quality which executes the [reconciliation checks]({% link soda-cl/recon.md %}).

+![soda-adf-reconcile](/assets/images/soda-adf-reconcile.png){:width="700px"}
+
 Download the notebook: <a href="/assets/soda-synapse-recon-notebook.ipynb" download>Soda Synapse Recon notebook</a>

 1. In the ADF pipeline, the Data Engineer <a href="https://learn.microsoft.com/en-us/azure/data-factory/transform-data-synapse-notebook#add-a-notebook-activity-for-synapse-to-a-pipeline-with-ui" target="_blank">adds a Notebook activity</a> for Synapse to a pipeline. In the Settings tab, they name the notebook `Reconciliation Checks`.
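The reconciliation example above validates that source and target data match after a migration. As a rough, library-free illustration of the idea (not how Soda implements reconciliation checks), the sketch below compares aggregate metrics, row count and per-column missing count, between two in-memory datasets and reports every metric that differs. It assumes both datasets fit in memory as lists of dicts; `collect_metrics` and `reconcile` are hypothetical helpers.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def collect_metrics(rows: List[Row], columns: List[str]) -> Dict[str, int]:
    # Aggregate metrics to compare: total rows plus missing values per column.
    metrics = {"row_count": len(rows)}
    for column in columns:
        metrics[f"missing_count({column})"] = sum(1 for row in rows if row.get(column) is None)
    return metrics


def reconcile(source: List[Row], target: List[Row], columns: List[str]) -> Dict[str, int]:
    """Return each metric whose value differs, mapped to (target - source)."""
    src = collect_metrics(source, columns)
    tgt = collect_metrics(target, columns)
    return {name: tgt[name] - src[name] for name in src if tgt[name] != src[name]}


if __name__ == "__main__":
    source = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
    target = [{"id": 1, "email": "a@example.com"}]
    # An empty result would mean source and target match on every metric.
    print(reconcile(source, target, ["id", "email"]))
```

Comparing aggregates rather than individual rows keeps the check cheap on large tables, at the cost of missing differences that happen to cancel out; a row-by-row comparison would catch those but requires scanning both datasets in full.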
@@ -161,6 +163,8 @@ soda-sqlserver

 Beyond reconciling the copied data, the Data Engineer uses SodaCL checks to gauge the completeness of data. In a new ADF Notebook Activity, they follow the same pattern as the reconciliation check notebook in which they configured connections to Soda Cloud and the data source, defined SodaCL checks, then prepared a script to run the scan and execute the checks.

+![soda-adf-ingest](/assets/images/soda-adf-ingest.png){:width="700px"}
+
 Download the notebook: <a href="/assets/soda-synapse-ingest-notebook.ipynb" download>Soda Synapse Ingest notebook</a>

 ```python
soda/quick-start-prod.md (4 changes: 2 additions & 2 deletions)
@@ -13,7 +13,7 @@ Use this guide as an example for how to set up and use Soda to test the quality

 (Not quite ready for this big gulp of Soda? 🥤Try [taking a sip]({% link soda/quick-start-sip.md %}), first.)

-![data-pipeline](/assets/images/data-pipeline.png){:width="700px"}
+![airflow-pipeline](/assets/images/airflow-pipeline.png){:width="700px"}

 [About this guide](#about-this-guide) <br />
 [Install Soda from the command-line](#install-soda-from-the-command-line)<br />
@@ -87,7 +87,7 @@ A check is a test that Soda executes when it scans a dataset in your data source

 In this example, the Data Engineer creates multiple checks after ingestion, after initial transformation, and before pushing the information to a visualization or reporting tool.

-![data-pipeline](/assets/images/data-pipeline.png){:width="700px"}
+![airflow-pipeline](/assets/images/airflow-pipeline.png){:width="700px"}

 ### Transform checks

soda/use-case-guides.md (2 changes: 1 addition & 1 deletion)
@@ -14,7 +14,7 @@ Use the following guides as example implementations based on how you intend to u
 | ----- | ----------- | ------------ |
 | [Test data in an Airflow pipeline]({% link soda/quick-start-prod.md %}) | Use this guide as an example for how to set up Soda to test the quality of your data in an Airflow pipeline that uses dbt transformations.| Soda Library<br /> Soda Cloud |
 | [Test data quality in an ADF pipeline]({% link soda/quick-start-adf.md %}) | Learn how to invoke Soda data quality tests in an ETL pipeline in Azure Data Factory. | Soda Library<br /> Soda Cloud |
-| [Test data quality in a Dagster pipeline]({% link soda/quick-start-adf.md %}) | Learn how to invoke Soda data quality tests in a Dagster pipeline. | Soda Library<br /> Soda Cloud |
+| [Test data quality in a Dagster pipeline]({% link soda/quick-start-dagster.md %}) | Learn how to invoke Soda data quality tests in a Dagster pipeline. | Soda Library<br /> Soda Cloud |
 | [Test data quality in Databricks pipeline]({% link soda/quick-start-databricks-pipeline.md %}) | Learn how to use Databricks notesbooks with Soda to test data quality before feeding a machine learning model. | Soda Library<br /> Soda Cloud |
 | [Test data before migration]({% link soda/quick-start-migration.md %}) | Use this guide to set up Soda to test before and after data migration between data sources. | Soda Library<br /> Soda Cloud |
 | [Self-serve Soda]({% link soda/quick-start-end-user.md %}) | Use this guide to set up Soda Cloud to enable users across your organization to serve themselves when it comes to testing data quality. | Soda Cloud<br /> Soda Agent |