Adds ignore_metadata option to assert_approx_df_equality #100
Adds an ignore_metadata option to the assert_approx_df_equality function, to avoid some spurious DataFrame comparison failures.
For example, given two DataFrames with these schemas:
root
|-- FORMA_ID: decimal(38,0) (nullable = true)
|-- START_DATE: timestamp (nullable = true)
|-- TYPE: string (nullable = true)
|-- AMOUNT: decimal(38,10) (nullable = true)
|-- datalake_ingestion_timestamp: timestamp (nullable = false)
root
|-- FORMA_ID: decimal(38,0) (nullable = true)
|-- START_DATE: timestamp (nullable = true)
|-- TYPE: string (nullable = true)
|-- AMOUNT: decimal(38,10) (nullable = true)
|-- datalake_ingestion_timestamp: timestamp (nullable = true)
And I perform an approximate comparison as follows:
assert_approx_df_equality(df, expected_df, ignore_nullable=True, precision=1e-8)
Currently I get an error regarding nullable, despite the ignore_nullable option. Passing the ignore_metadata option makes the test pass.
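
To make the intent concrete, here is a minimal pure-Python sketch of a schema comparison that can skip the nullable flag and the field metadata independently. It does not use chispa or PySpark and is not the library's implementation: Field and schemas_equal are hypothetical names introduced for illustration, and the metadata value is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Field:
    """Hypothetical stand-in for one column of a DataFrame schema."""
    name: str
    dtype: str
    nullable: bool = True
    metadata: Dict[str, Any] = field(default_factory=dict)


def schemas_equal(left: List[Field], right: List[Field],
                  ignore_nullable: bool = False,
                  ignore_metadata: bool = False) -> bool:
    """Compare two schemas field by field, optionally skipping some attributes."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        # Name and type are always compared.
        if a.name != b.name or a.dtype != b.dtype:
            return False
        if not ignore_nullable and a.nullable != b.nullable:
            return False
        if not ignore_metadata and a.metadata != b.metadata:
            return False
    return True


def make_schema(ts_nullable: bool, ts_metadata: Dict[str, Any]) -> List[Field]:
    """Build the schema from the description, varying only the last column."""
    return [
        Field("FORMA_ID", "decimal(38,0)"),
        Field("START_DATE", "timestamp"),
        Field("TYPE", "string"),
        Field("AMOUNT", "decimal(38,10)"),
        Field("datalake_ingestion_timestamp", "timestamp",
              nullable=ts_nullable, metadata=ts_metadata),
    ]


actual_schema = make_schema(ts_nullable=False, ts_metadata={})
expected_schema = make_schema(ts_nullable=True, ts_metadata={})
# Assumed metadata difference, for illustration only.
expected_with_meta = make_schema(ts_nullable=True,
                                 ts_metadata={"comment": "ingestion time"})

print(schemas_equal(actual_schema, expected_schema))
print(schemas_equal(actual_schema, expected_schema, ignore_nullable=True))
print(schemas_equal(actual_schema, expected_with_meta, ignore_nullable=True))
print(schemas_equal(actual_schema, expected_with_meta,
                    ignore_nullable=True, ignore_metadata=True))
```

In this sketch the two flags are independent, so a test can relax exactly the attributes it does not care about while still checking column names and types.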