
Incorrect validation_result["results"]["exception_info"] structure when raised_exception == True #10849

Open · vasilijyaromenka opened this issue Jan 13, 2025 · 3 comments

@vasilijyaromenka

Describe the bug
When raised_exception == True, exception_info has an incorrect structure.
Instead of {'raised_exception': True, 'exception_traceback': 'The traceback', 'exception_message': 'some message'}, the same dict is nested under an extra, generated key:
{"additional_key": {'raised_exception': True, 'exception_traceback': 'The traceback', 'exception_message': 'some message'}}

To Reproduce

import great_expectations as gx

# `spark` and `context` are the active Databricks SparkSession and an
# existing GX data context

# df to validate
df = spark.sql("""
    SELECT id, CASE WHEN id % 4 = 0 THEN "NOT NULL" END AS colname
    FROM range(1, 100)""")

# update the expectation suite
suite_name = "e_simple_unit_test"

suite = context.suites.add_or_update(gx.ExpectationSuite(name=suite_name))

# passes: "colname" exists in the batch
correct_column_name = gx.expectations.ExpectColumnValuesToNotBeNull(
    column="colname", mostly=1, row_condition="id%2 = 0", condition_parser="spark")

# raises an exception: "___colname___" does not exist in the batch
incorrect_column_name = gx.expectations.ExpectColumnValuesToNotBeNull(
    column="___colname___", mostly=1, row_condition="id%2 = 0", condition_parser="spark")

suite.add_expectation(correct_column_name)
suite.add_expectation(incorrect_column_name)

suite.save()

# update the validation definition
# `data_source_configs` holds the names of an existing Spark data source,
# asset, and batch definition
data_source_name = data_source_configs["data_source_name"]
data_asset_name = data_source_configs["data_asset_name"]
batch_definition_name = data_source_configs["batch_definition_name"]

batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)
validation_definition_name = "unit_test_validation_definition"

validation_definition = gx.ValidationDefinition(
    data=batch_definition, suite=suite, name=validation_definition_name
)

unit_test_validation_definition = context.validation_definitions.add_or_update(validation_definition)

# run the ValidationDefinition
validation_results = unit_test_validation_definition.run(
    batch_parameters={"dataframe": df},
    result_format="COMPLETE")
results_dict = validation_results.to_json_dict()

for dct in results_dict["results"]:
    if "exception_message" in dct["exception_info"]:
        print("\nCorrect exception_info structure:")
    else:
        print("\nIncorrect exception_info structure:")

    print(dct["exception_info"])

Output:


Incorrect exception_info structure:
{"('column_values.nonnull.condition', '242ce27d28b7ac28fe08ad7be0377b1a', ())": {'exception_traceback': 'Traceback.......', 'exception_message': 'Error: The column "___colname___" in BatchData does not exist.', 'raised_exception': True}}

Correct exception_info structure:
{'raised_exception': False, 'exception_traceback': None, 'exception_message': None}
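
Until this is fixed, the nested shape can be normalized before parsing. A minimal workaround sketch, assuming the nested form always wraps exactly one inner dict as in the output above (normalize_exception_info is a hypothetical helper, not a GX API):

def normalize_exception_info(exception_info):
    # flat (documented) shape: nothing to do
    if "raised_exception" in exception_info:
        return exception_info
    # nested shape: unwrap the single generated metric key
    return next(iter(exception_info.values()))

for dct in results_dict["results"]:
    print(normalize_exception_info(dct["exception_info"]))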

Expected behavior

Correct exception_info structure:
{'raised_exception': True, 'exception_traceback': 'Traceback.......', 'exception_message': 'Error: The column "___colname___" in BatchData does not exist.'}

Correct exception_info structure:
{'raised_exception': False, 'exception_traceback': None, 'exception_message': None}

Environment (please complete the following information):

  • Great Expectations Version: [e.g. 1.3.1]
  • Data Source: Spark
  • Cloud environment: Databricks
@adeola-ak (Contributor)

Thanks for reaching out. I have shared this with my team -- please follow for updates.

@adeola-ak (Contributor)

Can you briefly share how this has affected you? Has it made it harder to pull keys out of the result?

@vasilijyaromenka (Author)

Hi @adeola-ak,

I don't use data docs. I parse and save GE results into my log table instead.

# run the ValidationDefinition
validation_results = my_validation_definition.run(
    batch_parameters={"dataframe": data_frame_to_check},
    result_format="COMPLETE")

# convert the results to a dictionary
validation_results_dict = validation_results.to_json_dict()

# create a DF to save later in a log table
# (one row per expectation result; ge_results_schema is my predefined log-table schema)
df = spark.createDataFrame(validation_results_dict["results"], schema=ge_results_schema)

I used a similar approach with v0.18 and was able to see which checks had failed because of errors and what exactly those errors were (column not found or something else).

Now, because this extra key is different every time, I cannot pull out the error values or the raised_exception boolean.
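
For reference, a sketch of how the varying key could be worked around when building the log rows (under the same single-entry assumption as above; the row fields are illustrative, not my actual ge_results_schema):

flat_rows = []
for result in validation_results_dict["results"]:
    info = result["exception_info"]
    if "raised_exception" not in info:
        # unwrap the generated key reported in this issue
        info = next(iter(info.values()))
    flat_rows.append({
        "expectation": result["expectation_config"]["type"],
        "raised_exception": info["raised_exception"],
        "exception_message": info["exception_message"],
    })

df = spark.createDataFrame(flat_rows, schema=ge_results_schema)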
