
Incorrect validation_result["results"]["exception_info"] structure when raised_exception == True #10849

Open · vasilijyaromenka opened this issue Jan 13, 2025 · 3 comments

@vasilijyaromenka

Describe the bug
When raised_exception == True, exception_info has an incorrect structure.
Instead of {'raised_exception': True, 'exception_traceback': 'The traceback', 'exception_message': 'some message'}, the same dict is nested under an extra, generated key:
{"additional_key": {'raised_exception': True, 'exception_traceback': 'The traceback', 'exception_message': 'some message'}}

To Reproduce

import great_expectations as gx

# `spark` and `context` are the active Databricks SparkSession and an
# existing GX data context

# df to validate
df = spark.sql("""
    SELECT id, CASE WHEN id % 4 = 0 THEN "NOT NULL" END AS colname
    FROM range(1, 100)""")

# update the expectation suite
suite_name = "e_simple_unit_test"

suite = context.suites.add_or_update(gx.ExpectationSuite(name=suite_name))

# passes: "colname" exists in the batch
correct_column_name = gx.expectations.ExpectColumnValuesToNotBeNull(
    column="colname", mostly=1, row_condition="id%2 = 0", condition_parser="spark")

# raises an exception: "___colname___" does not exist in the batch
incorrect_column_name = gx.expectations.ExpectColumnValuesToNotBeNull(
    column="___colname___", mostly=1, row_condition="id%2 = 0", condition_parser="spark")

suite.add_expectation(correct_column_name)
suite.add_expectation(incorrect_column_name)

suite.save()

# update the validation definition
# `data_source_configs` holds the names of an existing Spark data source,
# asset, and batch definition
data_source_name = data_source_configs["data_source_name"]
data_asset_name = data_source_configs["data_asset_name"]
batch_definition_name = data_source_configs["batch_definition_name"]

batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)
validation_definition_name = "unit_test_validation_definition"

validation_definition = gx.ValidationDefinition(
    data=batch_definition, suite=suite, name=validation_definition_name
)

unit_test_validation_definition = context.validation_definitions.add_or_update(validation_definition)

# run the ValidationDefinition
validation_results = unit_test_validation_definition.run(
    batch_parameters={"dataframe": df},
    result_format="COMPLETE")
results_dict = validation_results.to_json_dict()

for dct in results_dict["results"]:
    if "exception_message" in dct["exception_info"]:
        print("\nCorrect exception_info structure:")
    else:
        print("\nIncorrect exception_info structure:")

    print(dct["exception_info"])

Output:


Incorrect exception_info structure:
{"('column_values.nonnull.condition', '242ce27d28b7ac28fe08ad7be0377b1a', ())": {'exception_traceback': 'Traceback.......', 'exception_message': 'Error: The column "___colname___" in BatchData does not exist.', 'raised_exception': True}}

Correct exception_info structure:
{'raised_exception': False, 'exception_traceback': None, 'exception_message': None}
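
Until this is fixed, the nested shape can be normalized before parsing. A minimal workaround sketch, assuming the nested form always wraps exactly one inner dict as in the output above (normalize_exception_info is a hypothetical helper, not a GX API):

def normalize_exception_info(exception_info):
    # flat (documented) shape: nothing to do
    if "raised_exception" in exception_info:
        return exception_info
    # nested shape: unwrap the single generated metric key
    return next(iter(exception_info.values()))

for dct in results_dict["results"]:
    print(normalize_exception_info(dct["exception_info"]))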

Expected behavior

Correct exception_info structure:
{'raised_exception': True, 'exception_traceback': 'Traceback.......', 'exception_message': 'Error: The column "___colname___" in BatchData does not exist.'}

Correct exception_info structure:
{'raised_exception': False, 'exception_traceback': None, 'exception_message': None}

Environment (please complete the following information):

  • Great Expectations Version: [e.g. 1.3.1]
  • Data Source: Spark
  • Cloud environment: Databricks
@adeola-ak (Contributor)

Thanks for reaching out. I have shared this with my team -- please follow for updates.

@adeola-ak (Contributor)

Can you briefly share how this has affected you? Has it made it harder to pull keys out of the result?

@vasilijyaromenka (Author)

Hi @adeola-ak,

I don't use data docs. I parse and save GE results into my log table instead.

# run the ValidationDefinition
validation_results = my_validation_definition.run(
    batch_parameters={"dataframe": data_frame_to_check},
    result_format="COMPLETE")

# convert the results to a dictionary
validation_results_dict = validation_results.to_json_dict()

# create a DF to save later in a log table
# (one row per expectation result; ge_results_schema is my predefined log-table schema)
df = spark.createDataFrame(validation_results_dict["results"], schema=ge_results_schema)

I used a similar approach with v0.18 and was able to see which checks had failed because of errors and what exactly those errors were (column not found or something else).

Now, because this extra key is different every time, I cannot pull out the error values or the raised_exception boolean.
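
For reference, a sketch of how the varying key could be worked around when building the log rows (under the same single-entry assumption as above; the row fields are illustrative, not my actual ge_results_schema):

flat_rows = []
for result in validation_results_dict["results"]:
    info = result["exception_info"]
    if "raised_exception" not in info:
        # unwrap the generated key reported in this issue
        info = next(iter(info.values()))
    flat_rows.append({
        "expectation": result["expectation_config"]["type"],
        "raised_exception": info["raised_exception"],
        "exception_message": info["exception_message"],
    })

df = spark.createDataFrame(flat_rows, schema=ge_results_schema)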
