Sync from soda-core #8

Open
wants to merge 187 commits into
base: main
5164d34
Catch exceptions while building results file (#1936)
m1n0 Sep 13, 2023
71dfe19
[pre-commit.ci] pre-commit autoupdate (#1935)
pre-commit-ci[bot] Sep 14, 2023
8fa452e
Reference check: support must NOT exist (#1937)
m1n0 Sep 18, 2023
995b4ac
Bump to 3.0.49
m1n0 Sep 19, 2023
67597f2
Add thresholds and diagnostics to scan result (#1939)
m1n0 Sep 21, 2023
8e74d93
Fix databricks numeric types profiling (#1941)
m1n0 Sep 27, 2023
67111aa
Bump to 3.0.50
m1n0 Sep 27, 2023
f743fc7
Allow to specify virtual file name for add sodacl string (#1943)
m1n0 Oct 2, 2023
3fdac3c
Feature/add more file formats for duckdb (#1942)
PaoloLeonard Oct 6, 2023
b34e271
added BigQuery Job Labels (#1947)
m1n0 Oct 10, 2023
d25316f
Bump to 3.0.51
m1n0 Oct 11, 2023
2f67adb
Distribution: compute value counts in DB rather than in python
baturayo Oct 13, 2023
fe27fc3
Fix 3.8 compatibility
m1n0 Oct 17, 2023
431a0ee
feat: Add Dask/Pandas configurable data source naming support (#1951)
dirkgroenen Oct 25, 2023
5312c43
Bump to 3.0.52
dirkgroenen Oct 25, 2023
f6505f0
Freshness: support mixed thresholds (#1957)
m1n0 Oct 31, 2023
7affe19
Add License to every package (#1958)
m1n0 Nov 1, 2023
b3c112e
Bump to 3.0.53
m1n0 Nov 1, 2023
2c9cde9
Failed rows check: support thresholds (#1960)
m1n0 Nov 3, 2023
59191bf
Updated install doc to include MotherDuck support via DuckDB (#1963)
janet-can Nov 7, 2023
c7182b1
remove % from pattern (#1956)
chuwangBA Nov 9, 2023
7505aa3
Sqlserver: support quoting tables with brackets, "quote_tables" mode …
m1n0 Nov 14, 2023
644546d
Bump to 3.0.54
m1n0 Nov 14, 2023
5f268b8
Contracts
tombaeyens Nov 15, 2023
6ffddd9
Fix check source payload (#1966)
m1n0 Nov 15, 2023
2a142e7
Bump to 3.1.0
m1n0 Nov 16, 2023
3f8fcc7
Update python api docs (#1967)
m1n0 Nov 16, 2023
88640a9
Make custom identity fixed as v4 (#1968)
m1n0 Nov 20, 2023
09c00a2
Freshness: support in-check filters (#1970)
m1n0 Dec 1, 2023
ae8d325
Bump to 3.1.1
m1n0 Dec 2, 2023
8249949
Adding support for authentication via a chained list of delegate acco…
nathadfield Dec 15, 2023
17c67cf
fix anomaly detection frequency aggregation bug (#1975)
baturayo Dec 15, 2023
46206eb
upgrade pydantic from v1 to v2 (#1974)
baturayo Dec 15, 2023
cb950c9
[pre-commit.ci] pre-commit autoupdate (#1938)
pre-commit-ci[bot] Dec 15, 2023
b7103e1
Bump to 3.1.2
m1n0 Dec 15, 2023
e80f118
feat: implement warn_only for anomaly score (#156) (#1980)
baturayo Dec 27, 2023
3c05346
Bump to 3.1.3
m1n0 Jan 3, 2024
1a44ce0
Dbt: improve parsing logs (#1981)
m1n0 Jan 4, 2024
2bde90c
Sampler: fix link href (#1983)
m1n0 Jan 5, 2024
c3c9521
Document group by example for Soda Core with failed rows check (#1984)
janet-can Jan 5, 2024
45a5a74
Schema check: support custom identity (#1988)
m1n0 Jan 16, 2024
34d65af
Add semver release with major, minor, latest (#1993)
dirkgroenen Jan 23, 2024
036204b
bug: handle null values for continuous dist (#165) (#1994)
baturayo Jan 23, 2024
55b85f5
[pre-commit.ci] pre-commit autoupdate (#1977)
pre-commit-ci[bot] Jan 23, 2024
ceab226
feat: implement new anomaly detection in soda core (#1995)
baturayo Jan 24, 2024
9445d1e
feat: support built-in prophet public holidays (#1997)
baturayo Jan 24, 2024
64bc338
Bump to 3.1.4
m1n0 Jan 24, 2024
b6f4329
Hive data source improvements (#1982)
robertomorandeira Jan 24, 2024
79b513a
feat: implement migrate from anomaly score check config (#168) (#1998)
baturayo Jan 25, 2024
311f1f2
Bump Prophet (#2000)
m1n0 Jan 25, 2024
89da879
Tests: use approx comparison for floats (#1999)
m1n0 Jan 25, 2024
8e0ae62
hive: add configuration parameters (#36)
vijaykiran Jul 3, 2023
2d00558
Bump to 3.1.5
m1n0 Jan 26, 2024
594d026
feat: implement severity level paramaters (#2001)
baturayo Jan 29, 2024
339309f
Always use datasource specifis COUNT expression (#2003)
m1n0 Jan 29, 2024
51a30fb
fix: anomaly detection feedbacks (#2005)
baturayo Jan 31, 2024
70b8753
[pre-commit.ci] pre-commit autoupdate (#2002)
pre-commit-ci[bot] Feb 2, 2024
1d2e8ac
feat: anomaly detection simulator (#163) (#2010)
baturayo Feb 6, 2024
e172b7d
feat: added dremio token support (#2009)
JorisTruong Feb 7, 2024
fc8e191
Bump to 3.2.0
m1n0 Feb 8, 2024
68d44b3
feat: correctly identified anomalies are excluded from training data …
baturayo Feb 9, 2024
1a211f5
fix: show more clearly the detected frequency using warning message f…
baturayo Feb 9, 2024
16ea0b9
Fix simulator import and streamlit path (#2017)
m1n0 Feb 12, 2024
a02f463
[pre-commit.ci] pre-commit autoupdate (#2016)
pre-commit-ci[bot] Feb 13, 2024
2c3ce9d
Update oracle_data_source.py (#2012)
vinod901 Feb 13, 2024
eb2abf9
Oracle: cast config to str/int to prevent oracledb errors (#2018)
m1n0 Feb 13, 2024
dd63d9e
Bump to 3.2.1
m1n0 Feb 13, 2024
ea5831e
Fix assets folder (#2020)
m1n0 Feb 14, 2024
f47801c
fix timezone issue and log messages (#188) (#2023)
baturayo Feb 21, 2024
fe70d82
feat: in anomaly detection simulator use soda core historic check res…
baturayo Feb 28, 2024
7d2ed7b
Update dask-sql (#2026)
m1n0 Feb 29, 2024
f07eba9
Add dask-sql version comment
m1n0 Feb 29, 2024
97c3545
Bump to 3.2.2
m1n0 Feb 29, 2024
6245a4c
feat: implement daily and monthly seasonality to external regressor ……
baturayo Feb 29, 2024
b62550e
Dremio: fix token support (#2028)
m1n0 Mar 6, 2024
8179c50
Bump to 3.2.3
m1n0 Mar 6, 2024
8e41a2c
[pre-commit.ci] pre-commit autoupdate (#2022)
pre-commit-ci[bot] Mar 11, 2024
91dd60f
bugfix: support attributes on multiple checks (#2032)
milanaleksic Mar 12, 2024
e3787d1
Use dbt's new access_url pattern to access cloud API (#2035)
bastienboutonnet Mar 14, 2024
c25a872
Bump to 3.2.4
m1n0 Mar 16, 2024
98c52ce
Contracts 2nd iteration (#2006)
tombaeyens Mar 16, 2024
bd04e84
Bump to 3.3.0
m1n0 Mar 16, 2024
a1a2008
feat: improved wording and tooltip formatting in simulator (#2038)
bastienboutonnet Mar 19, 2024
c20eb59
Failed rows: fix warn/fail thresholds (#2042)
m1n0 Mar 22, 2024
de1d4b4
Bump opentelemetry to 1.22 (#2043)
m1n0 Mar 22, 2024
d4b8183
Bump dev requirements (#2045)
m1n0 Mar 23, 2024
ae33e9f
Bump to 3.3.1
m1n0 Mar 24, 2024
aee8045
Rename argument in set_scan_results_file method (#2047)
ozgenbaris1 Apr 9, 2024
2e40e45
Dremio: support disableCertificateVerification option (#2049)
m1n0 Apr 9, 2024
9e95906
[pre-commit.ci] pre-commit autoupdate (#2037)
pre-commit-ci[bot] Apr 16, 2024
1d21a34
Denodo: fix connection timeout attribute (#2065)
m1n0 Apr 23, 2024
34ace6a
Update db2_data_source.py (#2063)
4rahulae Apr 23, 2024
c046af0
Bump to 3.3.2
m1n0 Apr 24, 2024
76159ca
Update autoflake precommit (#2070)
m1n0 Apr 30, 2024
062b1e2
Contracts v3 (#2067)
tombaeyens Apr 30, 2024
5e51e69
Bump to 3.3.3
tombaeyens Apr 30, 2024
31b1ab3
Fix automated monitoring, prevent duplicate queries (#2075)
m1n0 May 3, 2024
cc02c01
Hive: support scheme (#2077)
m1n0 May 7, 2024
63c73f8
Bump dev requirements (#2078)
m1n0 May 7, 2024
7866d27
Bump deps (#2079)
m1n0 May 7, 2024
8a1ce04
Bump to 3.3.4
m1n0 May 7, 2024
1819347
Failed rows: fix warn/fail thresholds for fail condition (#2084)
m1n0 May 16, 2024
09262b0
upgrade to latest version of ibm-db python client (#2076)
Antoninj May 17, 2024
5d1163c
User defined metric fail query (#2089)
m1n0 May 23, 2024
b014718
Bump to 3.3.5
m1n0 May 23, 2024
4e09b27
CLOUD-7708 - Add Snowflake CI account to pipeline for soda-core (#2088)
dakue-soda May 27, 2024
5776b5e
[CLOUD-7400] Improve memory usage (#2081)
dirkgroenen May 29, 2024
c3dc141
lower pre-commit version to support py38
dirkgroenen May 30, 2024
7e631d5
Duplicate check: fail gracefully in case of error in query (#2093)
m1n0 Jun 5, 2024
552a716
Bump requests and tox/docker (#2094)
m1n0 Jun 5, 2024
af649b9
Duplicate check: support sample exclude columns fully (#2096)
m1n0 Jun 7, 2024
4a87865
[pre-commit.ci] pre-commit autoupdate (#2069)
pre-commit-ci[bot] Jun 17, 2024
41e2d74
Spark: profiling support more text types (#2099)
m1n0 Jun 18, 2024
16fa2c7
Spark: profiling support more numeric types (#2100)
m1n0 Jun 18, 2024
bcfb8c2
Oracle: fix profiling/discovery queries, add numeric profiling (#2101)
m1n0 Jun 20, 2024
46cd46d
Bump to 3.3.6
m1n0 Jun 20, 2024
d70f765
Updated readme for a little more incentive to migrate to soda-library…
janet-can Jun 25, 2024
fca862a
Oracle: fix formats, freshness, other minor fixes (#2106)
m1n0 Jun 26, 2024
8dff97b
Fix CI (#2112)
m1n0 Jun 27, 2024
375adea
Http no failed rows (#2115)
jzalucki Jun 27, 2024
799a94b
Wrong sample query missing count (#2114)
jzalucki Jun 27, 2024
d404bb5
Between threshold error with variables (#2113)
jzalucki Jun 27, 2024
87cc985
Bump to 3.3.7
m1n0 Jun 28, 2024
7262ded
Duplicate check: remove unused aggregated query (#2118)
m1n0 Jun 28, 2024
ea74f4d
Bump to 3.3.8
m1n0 Jun 28, 2024
d377e91
Contracts4 (#2116)
tombaeyens Jun 28, 2024
abd3dfd
Added atlan package to tbump.toml
tombaeyens Jun 28, 2024
9236f35
Bump to 3.3.9
tombaeyens Jun 28, 2024
43b6d08
Add scan_time to HttpSampler payload (#2121)
m1n0 Jun 29, 2024
dd8fdb1
Fixing the default_data_source_properties for Atlan integration (#2127)
tombaeyens Jul 5, 2024
d529c2d
Bump to 3.3.10
tombaeyens Jul 6, 2024
d877fbd
Date format regex improvements (#2128)
pholser Jul 8, 2024
c510e0b
Duckdb: fix schema check for db in file (#2130)
m1n0 Jul 9, 2024
3ace4c9
Add ability to specify custom hostname for Snowflake connection (#2109)
whummer Jul 11, 2024
69007f6
Better user provided queries sanitize. (#2131)
jzalucki Jul 16, 2024
26763eb
Add Scan Context to read/write data from/to a scan (#2134)
m1n0 Jul 16, 2024
2293890
Add sslmode support to pg and denodo (#2066)
m1n0 Jul 16, 2024
f13d268
Add support for custom sqlserver multi_subnet_failover parameter. (#2…
jzalucki Jul 17, 2024
34a05ff
Ensure check identity is part of query name for failed rows queries. …
jzalucki Jul 17, 2024
5d3577b
Bump to 3.3.11
m1n0 Jul 17, 2024
4fc27bb
Scan context: support list keys in getter (#2135)
m1n0 Jul 18, 2024
2921a0d
Bump to 3.3.12
m1n0 Jul 18, 2024
90488ac
Always reset logger when new Scan instance is created. (#2136)
jzalucki Jul 23, 2024
0d08f8c
chore: fix main pipeline, update healthcheck command after sqlserver …
jzalucki Jul 26, 2024
90e7732
Bump to 3.3.13
jzalucki Jul 26, 2024
d97b704
Cross row count check should support custom identity. (#2139)
jzalucki Jul 29, 2024
9ac3303
Handle sql exception nicely for failed rows and user-defined check. (…
jzalucki Aug 1, 2024
8741aac
Spark: send discovery data despite errors. (#2142)
jzalucki Aug 1, 2024
0563049
Spark: failed rows metric result should not be limited to max 100 tot…
jzalucki Aug 1, 2024
ad9db79
Freshness: support variables in thresholds (#2146)
m1n0 Aug 14, 2024
124e475
Spark: replicate implicit 'include all' in profiling consistently wit…
m1n0 Aug 15, 2024
7aa2b3e
Bump to 3.3.14
m1n0 Aug 15, 2024
2a8f08d
Comparison check - fix other table filter (#2149)
jzalucki Aug 19, 2024
f6f7bf2
[pre-commit.ci] pre-commit autoupdate (#2141)
pre-commit-ci[bot] Aug 19, 2024
5dff022
Contracts7 : Fixing the integration correlation issue (#2148)
tombaeyens Aug 20, 2024
06cfa11
Bump to 3.3.15
tombaeyens Aug 20, 2024
62ee1e3
creating a tbump hack
tombaeyens Aug 20, 2024
ec42e43
Resetting contracts to 3.3.14 so tbump can work
tombaeyens Aug 20, 2024
bfc4c68
Revert "Resetting contracts to 3.3.14 so tbump can work"
tombaeyens Aug 20, 2024
73f18da
Bump to 3.3.16
tombaeyens Aug 20, 2024
88234fa
Fixed the Atlan integration glue db-schema switch (#2152)
tombaeyens Aug 21, 2024
892c326
Bump to 3.3.17
tombaeyens Aug 21, 2024
f99eeb6
Fixing atlan source contract yaml and lacking schema error message (#…
tombaeyens Aug 21, 2024
da66239
Bump to 3.3.18
tombaeyens Aug 21, 2024
9147255
Fixing contract test library dependencies for test table creation (#2…
tombaeyens Sep 5, 2024
5cf12cb
Bump to 3.3.19
tombaeyens Sep 5, 2024
8adf017
Fixing the lacking data source error message on contract build (#2158)
tombaeyens Sep 9, 2024
8c42941
Bump to 3.3.20
tombaeyens Sep 9, 2024
0937aa0
Fixing Spark session API (#2159)
tombaeyens Sep 11, 2024
a6f85fe
Bump to 3.3.21
tombaeyens Sep 11, 2024
39ff346
Removing the data source name lower case requirement (#2161)
tombaeyens Sep 11, 2024
2c8c5bd
Bump to 3.3.22
tombaeyens Sep 11, 2024
8586ed4
Add page to docs folder for data contracts language reference (#2166)
janet-can Sep 23, 2024
52dc476
Add hiring banner to README (#2179)
dirkgroenen Oct 15, 2024
a08bbcc
Add support for Azure SQL, Synapse, and Microsoft Fabric and extend s…
sdebruyn Oct 21, 2024
c70fb78
Bump to 3.4.0
m1n0 Oct 21, 2024
a59292a
Add documentation for MS fabric package install + config (#2180)
janet-can Oct 22, 2024
fb60b83
[pre-commit.ci] pre-commit autoupdate (#2177)
pre-commit-ci[bot] Oct 22, 2024
b91b1a9
Comparison row count check secondary datasource filter fix (#2165)
asantoz Oct 22, 2024
bbe338b
Bump to 3.4.1
m1n0 Oct 22, 2024
0ecbec4
Foreach: resolve vars in queries (#2183)
m1n0 Nov 4, 2024
a8c7d34
Chore: use jinja sandbox for templates (#2185)
m1n0 Nov 14, 2024
b605610
Bump to 3.4.2
m1n0 Nov 28, 2024
067c535
[pre-commit.ci] pre-commit autoupdate (#2182)
pre-commit-ci[bot] Nov 28, 2024
d030911
Add include null to valid_count and invalid_count and percentage vers…
jzalucki Nov 28, 2024
9d4b349
Yaml: read and parse files thread-safe (#2188)
m1n0 Nov 28, 2024
5021b74
Bump to 3.4.3
m1n0 Dec 2, 2024
10 changes: 10 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -44,3 +44,13 @@ ATHENA_SCHEMA=***

# Create and test checks with views instead of tables
TEST_WITH_VIEWS=false

CONTRACTS_POSTGRES_HOST=***
CONTRACTS_POSTGRES_USERNAME=***
CONTRACTS_POSTGRES_PASSWORD=***
CONTRACTS_POSTGRES_DATABASE=***

ATLAN_API_KEY=***

FABRIC_ENDPOINT=***
FABRIC_DWH=***
31 changes: 12 additions & 19 deletions .github/workflows/build-docker.yml
@@ -8,35 +8,30 @@ jobs:
docker:
runs-on: ubuntu-latest
steps:
-
name: check if a version tag
- name: check if a version tag
id: check-version-tag
run: |
if [[ ${{ github.event.client_payload.tag }} =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo ::set-output name=match::true
fi
-
name: Sleep for 900s
- name: Sleep for 900s
if: steps.check-version-tag.outputs.match == 'true'
uses: juliangruber/sleep-action@v1
with:
time: 900s
-
name: check if a version tag in ref
- name: check if a version tag in ref
if: steps.check-version-tag.outputs.match == 'true'
id: get-version-tag-in-ref
run: |
if [[ ${{ github.event.client_payload.ref }} =~ ^refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo ::set-output name=versiontag::$(echo "${{github.event.client_payload.ref}}" | cut -d / -f 3)
fi
-
name: Checkout
- name: Checkout
if: github.event.client_payload.tag == steps.get-version-tag-in-ref.outputs.versiontag
uses: actions/checkout@v3
with:
ref: ${{ github.event.client_payload.ref }}
-
name: Docker meta
- name: Docker meta
if: github.event.client_payload.tag == steps.get-version-tag-in-ref.outputs.versiontag
id: meta
uses: docker/metadata-action@v4
@@ -45,27 +40,25 @@ jobs:
sodadata/soda-core
tags: |
type=raw,value=${{ github.event.client_payload.tag }}
-
name: Set up QEMU
type=semver,pattern=v{{major}}.{{minor}},value=${{ github.event.client_payload.tag }}
type=semver,pattern=v{{major}},value=${{ github.event.client_payload.tag }}
- name: Set up QEMU
if: github.event.client_payload.tag == steps.get-version-tag-in-ref.outputs.versiontag
uses: docker/setup-qemu-action@v2
-
name: Set up Docker Buildx
- name: Set up Docker Buildx
if: github.event.client_payload.tag == steps.get-version-tag-in-ref.outputs.versiontag
uses: docker/setup-buildx-action@v2
-
name: Login to DockerHub
- name: Login to DockerHub
if: github.event.client_payload.tag == steps.get-version-tag-in-ref.outputs.versiontag
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Build and push
- name: Build and push
if: github.event.client_payload.tag == steps.get-version-tag-in-ref.outputs.versiontag
uses: docker/build-push-action@v3
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
labels: ${{ steps.meta.outputs.labels }}
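Reviewer note on the tagging change in `build-docker.yml`: the sketch below mirrors the version-tag regex from the `check-version-tag` step and lists the image tags that the three `tags:` entries would be expected to yield for a release. It is an illustration only. `derived_image_tags` is a hypothetical helper, not part of the workflow, and it assumes `docker/metadata-action` renders `v{{major}}.{{minor}}` and `v{{major}}` literally from the parsed version. Any `latest` tag the action may add on its own is not modeled.

```python
import re
from typing import List

# Same shape as the workflow's bash test: v<major>.<minor>.<patch>, nothing else.
VERSION_TAG = re.compile(r"v([0-9]+)\.([0-9]+)\.([0-9]+)")


def derived_image_tags(tag: str) -> List[str]:
    """Hypothetical helper: expected image tags for a release tag.

    Returns an empty list when the tag is not a plain version tag,
    which corresponds to the workflow skipping the build steps.
    """
    match = VERSION_TAG.fullmatch(tag)
    if match is None:
        return []
    major, minor, _patch = match.groups()
    # type=raw keeps the full tag; the two type=semver entries (assumed
    # behavior) add the minor-level and major-level aliases.
    return [tag, f"v{major}.{minor}", f"v{major}"]


if __name__ == "__main__":
    print(derived_image_tags("v3.4.3"))
    print(derived_image_tags("3.4.3"))
```

Under those assumptions, the `v3.4.3` release from this PR would be published as `v3.4.3`, `v3.4`, and `v3`, while a tag without the leading `v` would produce no image at all.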
40 changes: 35 additions & 5 deletions .github/workflows/main.workflow.yml
@@ -17,6 +17,8 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
with:
python-version: '3.11.x'
- uses: pre-commit/[email protected]
with:
extra_args: --all-files
@@ -49,11 +51,10 @@ jobs:
env:
DATA_SOURCE: ${{ matrix.data-source }}
PYTHON_VERSION: ${{ matrix.python-version }}
SNOWFLAKE_HOST: ${{ secrets.SNOWFLAKE_HOST }}
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_CI_ACCOUNT }}
SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_CI_USERNAME }}
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }}
SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_CI_DATABASE }}
SNOWFLAKE_SCHEMA: "public"
BIGQUERY_ACCOUNT_INFO_JSON: ${{ secrets.BIGQUERY_ACCOUNT_INFO_JSON }}
BIGQUERY_DATASET: "test"
@@ -169,6 +170,35 @@ jobs:
- name: Test with tox
run: |
tox -- soda -k soda/scientific
test-contracts:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version:
- "3.9"

env:
PYTHON_VERSION: ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y libsasl2-dev
python -m pip install --upgrade pip
cat dev-requirements.in | grep tox | xargs pip install

- name: Test with tox
run: |
tox -- soda -k soda/contracts

publish-pypi:
name: Build & Publish Package
if: contains(github.ref, 'refs/tags/')
51 changes: 40 additions & 11 deletions .github/workflows/pr.workflow.yml
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
with:
python-version: '3.11.x'
- uses: pre-commit/[email protected]
with:
extra_args: --all-files
@@ -35,15 +37,13 @@ jobs:
- "duckdb"
- "dask"


env:
DATA_SOURCE: ${{ matrix.data-source }}
PYTHON_VERSION: ${{ matrix.python-version }}
SNOWFLAKE_HOST: ${{ secrets.SNOWFLAKE_HOST }}
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_CI_ACCOUNT }}
SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_CI_USERNAME }}
SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }}
SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_CI_DATABASE }}
SNOWFLAKE_SCHEMA: "public"
BIGQUERY_ACCOUNT_INFO_JSON: ${{ secrets.BIGQUERY_ACCOUNT_INFO_JSON }}
BIGQUERY_DATASET: "test"
@@ -61,7 +61,7 @@ jobs:
MYSQL_PASSWORD: sodacore
MYSQL_ROOT_PASSWORD: sodacore
SPARK_DF_HOST: ${{ secrets.SPARK_DF_HOST }}

steps:
- uses: actions/checkout@v3

@@ -81,8 +81,8 @@ jobs:

- name: Test with tox
run: |
tox --exit-and-dump-after 3600 -- soda -k soda/core
tox --exit-and-dump-after 3600 -- soda -k soda/${{ matrix.data-source }}
tox -- soda -k soda/core
tox -- soda -k soda/${{ matrix.data-source }}
env:
test_data_source: ${{ matrix.data-source }}

@@ -113,7 +113,7 @@ jobs:

- name: Test with tox
run: |
tox --exit-and-dump-after 3600 -- soda -k soda/core
tox -- soda -k soda/core
env:
test_data_source: postgres
WESTMALLE: BETTER_THAN_LA_TRAPPE
@@ -145,4 +145,33 @@ jobs:

- name: Test with tox
run: |
tox --exit-and-dump-after 3600 -- soda -k soda/scientific
tox -- soda -k soda/scientific

test-contracts:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version:
- "3.9"

env:
PYTHON_VERSION: ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y libsasl2-dev
python -m pip install --upgrade pip
cat dev-requirements.in | grep tox | xargs pip install

- name: Test with tox
run: |
tox -- soda -k soda/contracts
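Reviewer note on the `Install dependencies` step of the new `test-contracts` job: `cat dev-requirements.in | grep tox | xargs pip install` installs only the requirement lines that contain `tox`. The sketch below reproduces that selection in Python for a few lines taken from the updated `dev-requirements.in` in this PR. `select_tox_requirements` is a hypothetical name, and the whitespace split is only an approximation of how `xargs` tokenizes its input (quoting rules are ignored).

```python
from typing import List

# A few lines from the updated dev-requirements.in in this PR.
DEV_REQUIREMENTS = """\
pip-tools~=7.3
pytest~=7.0
python-dotenv~=1.0
tox~=4.12
tox-docker~=5.0
pytest-html~=3.1
"""


def select_tox_requirements(requirements_text: str) -> List[str]:
    """Hypothetical helper: arguments that `grep tox | xargs pip install` would pass on."""
    selected: List[str] = []
    for line in requirements_text.splitlines():
        if "tox" in line:  # grep keeps every line containing the substring
            selected.extend(line.split())  # xargs splits on whitespace (approximation)
    return selected


if __name__ == "__main__":
    print(select_tox_requirements(DEV_REQUIREMENTS))
```

One consequence of the substring match is that any future requirement whose line merely mentions `tox`, including in a trailing comment, would also be handed to `pip install`, comment words included.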
15 changes: 8 additions & 7 deletions .pre-commit-config.yaml
@@ -4,7 +4,7 @@ files: ^soda/
exclude: antlr/
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: check-added-large-files
@@ -18,24 +18,25 @@ repos:
- id: debug-statements
- id: detect-private-key
- id: end-of-file-fixer
- repo: https://github.com/humitos/mirrors-autoflake.git
rev: v1.1
- repo: https://github.com/PyCQA/autoflake
rev: v2.3.1
hooks:
- id: autoflake
args: ["--in-place", "--remove-all-unused-imports"]
- repo: https://github.com/asottile/pyupgrade
rev: v3.10.1
rev: v3.19.0
hooks:
- id: pyupgrade
args: [--py37-plus]
exclude: _models?\.py$
args: [--py38-plus, --keep-runtime-typing]
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
additional_dependencies: [toml]
name: Sort imports using isort
- repo: https://github.com/psf/black
rev: 23.7.0
rev: 24.10.0
hooks:
- id: black
name: Run black formatter
6 changes: 6 additions & 0 deletions .streamlit/config.toml
@@ -0,0 +1,6 @@
[theme]
primaryColor = "#00D891" # Primary color
backgroundColor = "#F5F7F7" # Background color
# secondaryBackgroundColor = "#00D891" # Color for the sidebar and other secondary backgrounds
textColor = "#262730" # Primary text color
font = "sans serif" # Font style (e.g., "sans serif", "serif", "monospace")
14 changes: 12 additions & 2 deletions README.md
@@ -7,8 +7,13 @@
<a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a>
<a href="#"><img src="https://static.pepy.tech/personalized-badge/soda-core?period=total&units=international_system&left_color=black&right_color=green&left_text=Downloads"></a>
</p>
<br />

<hr />

> [!IMPORTANT]
> **🚀 We're hiring! Are you passionate about open-source and love working on projects like Soda Core? Join our team as a Software Engineer and help shape the future of data quality tools. [Apply now!](https://careers.soda.io/o/software-engineer-data-testing-python-data-engineering-mediorsenior?source=gh-core)**

<hr />

&#10004; An open-source, CLI tool and Python library for data quality testing<br />
&#10004; Compatible with the <a href="https://docs.soda.io/soda-cl/soda-cl-overview.html" target="_blank">Soda Checks Language (SodaCL)</a> <br />
@@ -22,7 +27,12 @@ When it runs a scan on a dataset, Soda Core executes the checks to find invalid,

#### Soda Library

Consider using **[Soda Library](https://docs.soda.io/soda/quick-start-sip.html)**, an extension of Soda Core that offers more features and functionality, and enables you to connect to a [Soda Cloud](https://docs.soda.io/soda-cloud/overview.html) account to collaborate with your team on data quality.
Consider migrating to **[Soda Library](https://docs.soda.io/soda/quick-start-sip.html)**, an extension of Soda Core that offers more features and functionality, and enables you to connect to a [Soda Cloud](https://docs.soda.io/soda-cloud/overview.html) account to collaborate with your team on data quality.
* Use [Group by](https://docs.soda.io/soda-cl/group-by.html) and [Group Evolution](https://docs.soda.io/soda-cl/group-evolution.html) configurations to intelligently group check results
* Leverage [Reconciliation checks](https://docs.soda.io/soda-cl/recon.html) to compare data between data sources for data migration projects.
* Use [Schema Evolution](https://docs.soda.io/soda-cl/schema.html#define-schema-evolution-checks) checks to automatically validate schemas.
* Set up [Anomaly Detection](https://docs.soda.io/soda-cl/anomaly-detection.html) checks to automatically learn patterns and discover anomalies in your data.

[Install Soda Library](https://docs.soda.io/soda-library/install.html) and get started with a 45-day free trial.

<br />
12 changes: 8 additions & 4 deletions dev-requirements.in
@@ -1,16 +1,20 @@
pip-tools~=6.5
pip-tools~=7.3
pytest~=7.0
python-dotenv~=1.0
tox~=4.6
tox-docker~=4.1
tox~=4.12
tox-docker~=5.0
pytest-html~=3.1
pytest-cov~=3.0
faker~=13.3
tbump~=6.7
tbump~=6.11
black==22.6.0
typing_extensions>=4.3.0,<5
urllib3~=1.26
pygments~=2.11
readme-renderer~=32.0
certifi>=2022.12.07
wheel>=0.38.1
docutils<0.21 # 0.21 dropped py38 support, remove this after py38 support is gone
pre-commit<3.6 # 3.6 dropped py38, remove this after py38 support is gone
requests>=2.32.3
