Data contracts experimental clarification #821

Merged 3 commits on Jul 5, 2024

4 changes: 4 additions & 0 deletions _includes/banner-experimental.md
@@ -0,0 +1,4 @@
+ <div class="info">
+ <span class="closebtn" onclick="this.parentElement.style.display='none';">&times;</span>
+ Data contracts is an experimental project in Soda Core.<br /><br />As the development team explores data contracts, expect minor imperfections, inconsistencies, and limited support, compatibility, and functionality if you download and use the <code>soda-core-contracts</code> package.
+ </div>
2 changes: 1 addition & 1 deletion _sass/color_schemes/soda.scss
@@ -912,7 +912,7 @@ body {
/* The info message box */
.info {
padding: 20px 20px 20px 20px;
- background-color: #F8F9F9;
+ background-color: #e7e8e8;
color: black;
margin-bottom: 15px;
position: relative;
Binary file modified assets/images/experimental.png
6 changes: 4 additions & 2 deletions soda/data-contracts-checks.md
@@ -9,10 +9,12 @@ parent: Create a data contract
![experimental](/assets/images/experimental.png){:height="400px" width="400px"} <br />
*Last modified on {% last_modified_at %}*

- Soda data contracts is a Python library that verifies data quality standards as early and often as possible in a data pipeline so as to prevent negative downstream impact. Learn more [About Soda data contracts]({% link soda/data-contracts.md %}#about-data-contracts).
+ {% include banner-experimental.md %}
+
+ Soda data contracts is a Python library that verifies data quality standards as early and often as possible in a data pipeline so as to prevent negative downstream impact. Be aware, Soda data contracts checks do not use SodaCL. Learn more [About Soda data contracts]({% link soda/data-contracts.md %}#about-data-contracts).

<small>✖️ &nbsp;&nbsp; Requires Soda Core Scientific</small><br />
- <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
+ <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Snowflake, and Spark</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Core CLI</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Library + Soda Cloud</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Cloud Agreements + Soda Agent</small><br />
4 changes: 3 additions & 1 deletion soda/data-contracts-verify.md
@@ -9,12 +9,14 @@ parent: Create a data contract
![experimental](/assets/images/experimental.png){:height="400px" width="400px"} <br />
*Last modified on {% last_modified_at %}*

+ {% include banner-experimental.md %}
+
To verify a **Soda data contract** is to scan the data in a data source to execute the data contract checks you defined in a contracts YAML file. Available as a Python library, you run the scan programmatically, invoking Soda data contracts in a CI/CD workflow when you create a new pull request, or in a data pipeline after importing or transforming new data.

When deciding when to verify a data contract, consider that contract verification works best on new data as soon as it is produced so as to limit its exposure to other systems or users who might access it. The earlier in a pipeline or workflow, the better! Further, best practice suggests that you store batches of new data in a temporary table, verify a contract on the batches, then append the data to a larger table.

<small>✖️ &nbsp;&nbsp; Requires Soda Core Scientific</small><br />
- <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
+ <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Snowflake, and Spark</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Core CLI</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Library + Soda Cloud</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Cloud Agreements + Soda Agent</small><br />
6 changes: 5 additions & 1 deletion soda/data-contracts-write.md
@@ -9,8 +9,12 @@ parent: Create a data contract
![experimental](/assets/images/experimental.png){:height="400px" width="400px"} <br />
*Last modified on {% last_modified_at %}*

+ {% include banner-experimental.md %}
+
**Soda data contracts** is a Python library that uses checks to verify data. Contracts enforce data quality standards in a data pipeline so as to prevent negative downstream impact. To verify the data quality standards for a dataset, you prepare a data **contract YAML file**, which is a formal description of the data. In the data contract, you use checks to define your expectations for good-quality data. Using the Python API, you can add data contract verification ideally right after new data has been produced.

+ Be aware, Soda data contracts checks do not use SodaCL.
+
In your data pipeline, add a data contract after data has been produced or transformed so that when you programmatically run a scan via the Python API, Soda data contracts verifies the contract, executing the checks contained within the contract and producing results which indicate whether the checks passed or failed.

```yaml
@@ -53,7 +57,7 @@ checks:
```

<small>✖️ &nbsp;&nbsp; Requires Soda Core Scientific</small><br />
- <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
+ <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Snowflake, and Spark</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Core CLI</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Library + Soda Cloud</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Cloud Agreements + Soda Agent</small><br />
4 changes: 3 additions & 1 deletion soda/data-contracts.md
@@ -12,6 +12,8 @@ redirect_from:
<!--Linked to UI, access Shlink-->
*Last modified on {% last_modified_at %}*

+ {% include banner-experimental.md %}
+
Use **Soda data contracts** to set data quality standards for data products. In a programmatic Soda scan, Soda executes the standards as data quality checks.
{% include code-header.html %}
```yaml
@@ -42,7 +44,7 @@ checks:
- type: row_count
```
<small>✖️ &nbsp;&nbsp; Requires Soda Core Scientific</small><br />
- <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
+ <small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Snowflake, and Spark</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Core CLI</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Library + Soda Cloud</small><br />
<small>✖️ &nbsp;&nbsp; Supported in Soda Cloud Agreements + Soda Agent</small><br />