Releases · databrickslabs/dqx
v0.3.0
- Added sampling to the profiler (#303). The profiler's performance has been significantly improved in this release by sampling and limiting the input data. The profiler now samples input data with a 30% sampling factor and caps the number of records at 1000 by default, reducing the amount of data processed. Both settings are configurable. This resolves issue #215.
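A minimal sketch of overriding these defaults is shown below; the `options` keyword and the `sample_fraction` and `limit` keys are assumptions based on this description, so consult the documentation for the confirmed API.

```python
# Minimal sketch: overriding the profiler's sampling defaults.
# Assumes a Databricks notebook where `spark` is available; the `options`
# keyword and its keys are assumptions, not confirmed API.
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.profiler.profiler import DQProfiler

input_df = spark.read.table("main.default.orders")  # any input DataFrame

profiler = DQProfiler(WorkspaceClient())
summary_stats, profiles = profiler.profile(
    input_df,
    options={"sample_fraction": 0.5, "limit": 5000},  # override the 30% / 1000 defaults
)
```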
- Added support for complex column types such as struct, map, and array by extending `col_name` to accept expressions (#214). Comprehensive examples of applying checks to complex types have been included in the demo and documentation.
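For illustration, a hedged sketch of metadata-defined checks targeting nested fields; the exact expression syntax accepted by `col_name` is an assumption, so see the demo and documentation for confirmed examples.

```python
# Sketch: checks on nested and complex columns via expressions in col_name
# (hypothetical expressions; consult the DQX demo for the exact syntax).
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

checks = [
    {   # struct field access
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"col_name": "address.city"}},
    },
    {   # map lookup via a SQL expression
        "criticality": "warn",
        "check": {"function": "is_not_null", "arguments": {"col_name": "attributes['source']"}},
    },
]

dq_engine = DQEngine(WorkspaceClient())
checked_df = dq_engine.apply_checks_by_metadata(input_df, checks)  # input_df: any DataFrame
```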
- Fixed profiler bug when trying to cast a decimal string to int (#211). This resolves issue #172 and ensures decimal strings are handled properly during casting, improving the profiler's robustness when processing integer and decimal data types.
- Renamed DQRule to DQColRule, and DQRuleColSet to DQColSetRule (#300). In this release, the classes `DQRule` and `DQRuleColSet` have been renamed to `DQColRule` and `DQColSetRule`, respectively, to support the addition of more rule types in the future, such as `DQDatasetRule`. The renaming includes corresponding changes in imports and method calls throughout the codebase, and a deprecation warning has been added to the old classes. In addition, the `col_functions` module has been renamed to `col_check_functions`. This introduces a breaking change! It is recommended to update any references to the old class names in your code to ensure a smooth transition (see the sketch after the next item).
- Trim autogenerated check name to 255 chars (#301). Auto-generated check names are now truncated to 255 characters, avoiding potential issues caused by overly long names.
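A hedged sketch of the renamed classes in use; the module paths and constructor arguments below are assumptions for illustration, not confirmed API.

```python
# Sketch: defining rules with the renamed classes (assumed module path and
# constructor arguments; the old names still work but emit a deprecation warning).
from databricks.labs.dqx.rule import DQColRule  # formerly DQRule
from databricks.labs.dqx import col_check_functions as cf  # formerly col_functions

checks = [
    DQColRule(
        name="id_is_not_null",
        criticality="error",
        check_func=cf.is_not_null,
        col_name="id",
    ),
]
```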
- Updated sql expression logic (#212). In this release, the SQL expression logic in the data quality library has been updated so that the `sql_expression` check fails if the condition is not met, introducing a potential breaking change (see the sketch after the next item).
- Added context info to output (#206). Additional context information is now added to the results of quality checks, including name, message, column name, filter, function, runtime, and user-provided metadata for every failed check. This allows users to provide custom metadata that is stored in the reporting columns for failed checks. This is a breaking change for checks defined using classes! Consult the latest documentation for the updated syntax of defining checks using DQX classes.
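For the `sql_expression` change, a hedged sketch of a metadata-defined check under the new semantics; the argument names are assumptions.

```python
# Sketch: a sql_expression check under the updated semantics, where the check
# fails for rows that do not satisfy the condition (argument names assumed).
checks = [
    {
        "criticality": "error",
        "check": {
            "function": "sql_expression",
            "arguments": {
                "expression": "amount >= 0",           # condition rows must meet
                "msg": "amount must be non-negative",  # surfaced in the reporting columns
            },
        },
    },
]
```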
Contributors: @mwojtyczka, @ghanse, @pierre-monnet
v0.2.0
- Added uniqueness check (#200). A uniqueness check has been added that reports an issue for each row containing a duplicate value in the specified column. This resolves issue #154.
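A minimal sketch, assuming the new check is exposed as an `is_unique` function for metadata-defined checks.

```python
# Sketch: flagging every row that carries a duplicate value in a column
# (the function name is_unique is an assumption based on this note).
checks = [
    {
        "criticality": "error",
        "check": {"function": "is_unique", "arguments": {"col_name": "order_id"}},
    },
]
```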
- Added sql expression support for limits in not less and not greater than checks, and updated docs (#200). This commit introduces several changes to simplify and enhance data quality checking in PySpark workloads for both streaming and batch data. The naming conventions of rule functions have been unified, and the `is_not_less_than` and `is_not_greater_than` functions now accept column names or expressions as limits. The input parameters for range checks have been unified, and the logic of `is_not_in_range` has been updated to be inclusive of the boundaries. The project's documentation has been improved with comprehensive examples, and the contribution guidelines have been clarified. This is a breaking change for some of the checks; review and test the changes before implementation to ensure compatibility and avoid disruptions. Resolves issues #131, #197, #175, and #205.
- Include predefined check functions by default when applying custom checks by metadata (#203). The data quality engine has been updated to include predefined check functions by default when applying custom checks using metadata in the form of YAML or JSON. This simplifies defining custom checks, as users no longer need to specify `globals()`; all predefined checks are now imported by default. The `validate_checks` method has been updated to accept a dictionary of custom check functions instead of global variables, although `globals()` can still be passed for backward compatibility. This resolves issue #48.
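A hedged sketch of the updated validation flow with a custom check function passed as a dictionary; the custom function's return convention and the second argument of `validate_checks` are assumptions.

```python
# Sketch: validating checks that mix predefined and custom functions.
# The custom function's return convention (message column when failing, null
# otherwise) and the second validate_checks argument are assumptions.
import pyspark.sql.functions as F
from pyspark.sql import Column
from databricks.labs.dqx.engine import DQEngine

def ends_with_foo(col_name: str) -> Column:
    # Hypothetical custom check: emits a message for rows that fail.
    column = F.col(col_name)
    return F.when(~column.endswith("foo"), F.lit(f"{col_name} does not end with foo"))

checks = [
    # Predefined checks are resolved automatically; no globals() needed.
    {"criticality": "error",
     "check": {"function": "is_not_null", "arguments": {"col_name": "id"}}},
    {"criticality": "warn",
     "check": {"function": "ends_with_foo", "arguments": {"col_name": "tag"}}},
]

status = DQEngine.validate_checks(checks, {"ends_with_foo": ends_with_foo})
```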
Contributors: @mwojtyczka
v0.1.13
- Fixed cli installation and demo (#177). In this release, the dashboard name has been adjusted to comply with new API naming rules: it now contains only alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. The demo has also been updated to work regardless of whether a path or a UC table is provided in the config, and the documentation has been refactored and updated for clarity. This closes issues #171 and #198. It may be necessary to uninstall and reinstall DQX to redeploy the dashboard.
- [Feature] Update is_(not)_in_range (#87) to support max/min limits from col (#153). In this release, the `is_in_range` and `is_not_in_range` quality rule functions have been updated to support a column as the minimum or maximum limit, in addition to a literal value. This is accomplished through the optional `min_limit_col_expr` and `max_limit_col_expr` arguments, which allow a column expression to be specified as the minimum or maximum limit. Extensive unit and integration testing has been conducted to ensure the correct behavior of the new functionality. These enhancements offer increased flexibility when defining quality rules, catering to a broader range of use cases and scenarios.
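A hedged sketch using the new arguments in a metadata-defined check; passing plain column names as shown is an assumption.

```python
# Sketch: range check bounded by other columns via the new optional arguments.
checks = [
    {
        "criticality": "error",
        "check": {
            "function": "is_in_range",
            "arguments": {
                "col_name": "order_date",
                "min_limit_col_expr": "contract_start_date",  # lower bound from a column
                "max_limit_col_expr": "contract_end_date",    # upper bound from a column
            },
        },
    },
]
```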
Contributors: @karthik-ballullaya-db, @mwojtyczka
v0.1.12
- Fixed installation process for Serverless (#150). This commit removes the pyspark dependency from the library to avoid Spark version conflicts in Serverless and future DBR versions. The CLI has been updated to install pyspark for local command execution.
- Updated demos and documentation (#169). In this release, the quality checks in the demos have been updated to better showcase the capabilities of DQX, documentation has been updated in various places for increased clarity, and additional contributing guides have been added.
Contributors: @mwojtyczka
v0.1.11
What's Changed
- Provided option to customize reporting column names (#127). In this release, the DQEngine library has been enhanced to allow customizable reporting column names. A new constructor has been added to DQEngine that accepts an optional ExtraParams object for extra configuration, and a new Enum class, DefaultColumnNames, has been added to represent the columns used for error and warning reporting. New tests verify the application of checks with custom column naming. These changes improve the customizability, flexibility, and user experience of DQEngine by providing more control over the reporting columns, resolving issue #46. Contributors @hrfmartins @mwojtyczka
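A hedged sketch of the new constructor; the import path and the key names passed to ExtraParams are assumptions.

```python
# Sketch: customizing the error/warning reporting column names
# (import path and dictionary keys are assumptions for illustration).
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine, ExtraParams

extra_params = ExtraParams(column_names={"errors": "dq_errors", "warnings": "dq_warnings"})
dq_engine = DQEngine(WorkspaceClient(), extra_params=extra_params)
```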
- Fixed parsing error when loading checks from a file (#165). In this release, we have addressed a SQL expression parsing error that occurred when loading checks (data quality rules) from a file, fixing issue #162. The changes include refactoring tests to eliminate code duplication and improve maintainability, as well as renaming methods and variables to use `filepath` instead of `path`. New unit and integration tests have been added and manually verified to ensure the correct functionality of the updated code. Contributors @mwojtyczka
- Removed usage of try_cast spark function from the checks to make sure DQX can be run on more runtimes (#163). In this release, we have refactored the code to replace the `try_cast` Spark function with `cast` and `isNull` checks, improving compatibility with runtimes where `try_cast` is not available. The affected functionality includes null and empty column checks, checking whether a column value is in a list, and checking whether a column value is a valid date or timestamp. Unit and integration tests have been added to ensure the functionality works as intended. Contributors @mwojtyczka
- Added filter to rules so that you can make conditional checks (#141). The filter serves as a condition that data must meet to be evaluated by the check function, restricting evaluation to only the rows that satisfy it. This feature enhances the flexibility and customizability of data quality checks in the DQEngine (see the sketch below). Contributors @pierre-monnet @mwojtyczka
Full Changelog: v0.1.8...v0.1.11
v0.1.10
What's Changed
- Fixed docs-build by @mwojtyczka in #129
- Patch user agent by @sundarshankar89 in #121
- New dashboard query, Update to demos and docs by @mwojtyczka in #133
- Support datetime arguments for column range functions by @ghanse in #142
- DQX engine refactor and docs update by @mwojtyczka in #138
- Add column functions to check for valid date strings by @ghanse in #144
- Generate rules for DLT as Python dictionary by @alexott in #148
- Make DQX compatible with Serverless by @mwojtyczka in #147
Full Changelog: v0.1.8...v0.1.10
v0.1.9
What's Changed
- Fixed docs-build by @mwojtyczka in #129
- Patch user agent by @sundarshankar89 in #121
- New dashboard query, Update to demos and docs by @mwojtyczka in #133
Full Changelog: v0.1.8...v0.1.9
v0.1.8
What's Changed
- Updated docs by @mwojtyczka in #117
- added search for docs by @sundarshankar89 in #119
- ✨ improve docs styling by @renardeinside in #118
- Add Dashboard as Code, DQX Data Quality Summary Dashboard by @nehamilak-db in #86
- updated profiling documentation with cost consideration by @canan-girgin in #126
- Release v0.1.8 by @mwojtyczka in #128
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Set cache invalidation for pypi badge by @mwojtyczka in #102
- Correct handling of Decimal, Short and Byte types by @alexott in #103
- ✨ introduce docs by @renardeinside in #104
- Rollback for readme and contributing by @mwojtyczka in #112
- 🛠️ fix docs path by @renardeinside in #111
- Updated runner for docs release by @mwojtyczka in #113
- 🔧 fix runner for docs deployment by @renardeinside in #114
- Updated docs by @mwojtyczka in #115
- Release v0.1.7 by @mwojtyczka in #116
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- Fix for image links in README on PyPi by @alexott in #95
- added test methods for InstallationMixin.py, log.py and dlt_rules by @canan-girgin in #93
- issue 47 - new check is_not_null_and_not_empty_array and fixed timestamp mismatch issue in profiler by @dinbab1984 in #98
- Updated logo by @mwojtyczka in #96
- Release v0.1.6 by @mwojtyczka in #101
Full Changelog: v0.1.5...v0.1.6