[FEATURE] ColumnValuesNonNull and ColumnValuesNonNullCount metrics #10959

Conversation
Thanks for doing this! Overall it looks good. Do we need to delete between? That is, can we have the null metrics in addition to between?
```python
value = raw_metrics[metric_name][1]
# "condition" metrics return the domain and value kwargs
# we just want the value, which is the first item in the tuple
if metric_name.endswith(MetricNameSuffix.CONDITION) and isinstance(value, tuple):
    value = value[0]
```
nit: I'd move this logic into a helper method, e.g. `parse_metric_value`. I'd make an equivalent one for `parse_metric_config_id`. These seem like lower-level details than this method reads.
I think I understand the ask here. I refactored some of this into 2 helper methods and added `NamedTuple`s to make it clearer what's going on.
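For readers following along, one possible shape for that refactor, built from the snippet above. The names `ParsedMetric` and `parse_metric_value`, the `NamedTuple` fields, and the stubbed `MetricNameSuffix` value are illustrative assumptions, not necessarily what the PR landed:

```python
from typing import Any, NamedTuple


class MetricNameSuffix:
    # Stubbed for a self-contained sketch; the real enum lives in GX.
    CONDITION = ".condition"  # assumed suffix value


class ParsedMetric(NamedTuple):  # hypothetical name
    metric_name: str
    value: Any


def parse_metric_value(metric_name: str, raw_metrics: dict) -> ParsedMetric:
    """Pull a metric's value out of the raw results, unwrapping condition metrics."""
    value = raw_metrics[metric_name][1]
    # "condition" metrics return the domain and value kwargs;
    # we just want the value, which is the first item in the tuple
    if metric_name.endswith(MetricNameSuffix.CONDITION) and isinstance(value, tuple):
        value = value[0]
    return ParsedMetric(metric_name=metric_name, value=value)
```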
```diff
@@ -1,3 +1,3 @@
 from .batch.row_count import BatchRowCount
-from .column_values.between import ColumnValuesBetween
+from .column_values.non_null import ColumnValuesNonNull, ColumnValuesNonNullCount
```
Why are we deleting between? Isn't this a new metric in addition to between?
Between was there to demonstrate how it would all look, but then I learned we needed to add this (different) metric to support another epic. Between was never tested, which is really the bulk of the remaining work at this point.
```python
class ConditionValues(MetricResult[Union[pd.Series, "pyspark.sql.Column", "BinaryExpression"]]):
    @classmethod
    def validate_value_type(cls, value):
        if isinstance(value, pd.Series):
            return value

        try:
            from great_expectations.compatibility.pyspark import pyspark

            if isinstance(value, pyspark.sql.Column):
                return value
        except (ImportError, AttributeError):
            pass

        try:
            from great_expectations.compatibility.sqlalchemy import BinaryExpression

            if isinstance(value, BinaryExpression):
                return value
        except (ImportError, AttributeError):
            pass

        raise ConditionValuesValueError(type(value))

    @classmethod
    def __get_validators__(cls):
        yield cls.validate_value_type
```
👏 nice workaround!
looks good, thanks for fixing that gnarly type
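For context, the workaround being praised is pydantic v1's `__get_validators__` hook, which lets a custom type run its own validation instead of relying on a type annotation pydantic can't resolve (such as the optional pyspark/sqlalchemy types above). A minimal standalone sketch of the pattern; `Point` and `Model` here are made up for illustration:

```python
from pydantic import BaseModel  # pydantic v1 API (GX imports it via a compatibility layer)


class Point:
    """A custom type pydantic can't validate on its own."""

    def __init__(self, x: float, y: float) -> None:
        self.x, self.y = x, y

    @classmethod
    def __get_validators__(cls):
        # pydantic v1 calls this hook to collect validators for the custom type
        yield cls.validate

    @classmethod
    def validate(cls, value):
        if isinstance(value, Point):
            return value
        if isinstance(value, (list, tuple)) and len(value) == 2:
            return cls(*value)
        raise TypeError(f"cannot coerce {type(value)} to Point")


class Model(BaseModel):
    location: Point


m = Model(location=(1.0, 2.0))
assert isinstance(m.location, Point)
```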
```python
PANDAS_DATA_SOURCES: Sequence[DataSourceTestConfig] = [
    PandasFilesystemCsvDatasourceTestConfig(),
    PandasDataFrameDatasourceTestConfig(),
]

SPARK_DATA_SOURCES: Sequence[DataSourceTestConfig] = [
    SparkFilesystemCsvDatasourceTestConfig(),
]

SQL_DATA_SOURCES: Sequence[DataSourceTestConfig] = [
    BigQueryDatasourceTestConfig(),
    DatabricksDatasourceTestConfig(),
    MSSQLDatasourceTestConfig(),
    PostgreSQLDatasourceTestConfig(),
    SnowflakeDatasourceTestConfig(),
    SqliteDatasourceTestConfig(),
]
```
I suggest we move these to a common `conftest` module so we can test every metric against a standard set of backends. We should probably also restrict this list to only the data sources that are officially supported by core; as far as I know, MSSQL is not on that list.
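A sketch of what that suggestion could look like, assuming the lists move to a shared conftest. The module path and import location below are assumptions, and only MSSQL is dropped since it's the one backend the comment flags:

```python
# tests/integration/metrics/conftest.py (hypothetical location)
from typing import Sequence

from tests.integration.test_utils.data_source_config import (  # assumed import path
    BigQueryDatasourceTestConfig,
    DatabricksDatasourceTestConfig,
    DataSourceTestConfig,
    PandasDataFrameDatasourceTestConfig,
    PandasFilesystemCsvDatasourceTestConfig,
    PostgreSQLDatasourceTestConfig,
    SnowflakeDatasourceTestConfig,
    SparkFilesystemCsvDatasourceTestConfig,
    SqliteDatasourceTestConfig,
)

PANDAS_DATA_SOURCES: Sequence[DataSourceTestConfig] = [
    PandasFilesystemCsvDatasourceTestConfig(),
    PandasDataFrameDatasourceTestConfig(),
]

SPARK_DATA_SOURCES: Sequence[DataSourceTestConfig] = [
    SparkFilesystemCsvDatasourceTestConfig(),
]

# MSSQL dropped per the review comment above; the rest kept as-is.
SQL_DATA_SOURCES: Sequence[DataSourceTestConfig] = [
    BigQueryDatasourceTestConfig(),
    DatabricksDatasourceTestConfig(),
    PostgreSQLDatasourceTestConfig(),
    SnowflakeDatasourceTestConfig(),
    SqliteDatasourceTestConfig(),
]
```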
```python
    data_source_configs=PANDAS_DATA_SOURCES,
    data=DATA_FRAME,
)
def test_success_pandas(self, batch_for_datasource) -> None:
```
can we type these test params?
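One way that could look, assuming the fixture yields a fluent `Batch`; the import path and class name below are assumptions for illustration:

```python
from great_expectations.datasource.fluent.interfaces import Batch  # assumed import path


class TestColumnValuesNonNull:  # illustrative class name
    def test_success_pandas(self, batch_for_datasource: Batch) -> None: ...
```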
Summary of changes:

- Adds `ColumnValuesNonNull` and `ColumnValuesNonNullCount` metrics
- Parses `condition` metric values so they don't also return domain and value kwargs
- Adds integration tests under `tests/integration/metrics`, like the Expectation integration testing framework
- Passes `invoke lint` (uses `ruff format` + `ruff check`)