I am trying to write some test cases to validate the data between a .parquet file in S3 and the target (a Hive table). I have loaded the .parquet data into one DataFrame and the Hive table data into another DataFrame. When I try to compare the schemas of the two DataFrames using 'assertSmallDataFrameEquality', it returns false, even though the schemas are the same. I'm not sure why it is failing. Any suggestions would be helpful.
The spark-fast-tests library defines the assertSmallDataFrameEquality method, which checks that both the schema and the data in two DataFrames are equal. In your case, the schemas might be the same, but the data might be different.
This project, spark-daria, contains a validateSchema method to make sure two schemas are the same.
If you only want to confirm schema equality, then validateSchema will probably be more useful. You can always print out the schemas of both DataFrames with the printSchema method and manually compare the differences. Hopefully this helps!
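If comparing the printSchema output by eye gets tedious, a small helper along these lines might make the differences easier to spot. This is only a sketch: `SchemaDiff` and `schemaDiff` are hypothetical names, not part of spark-daria or spark-fast-tests, and it relies only on Spark's `StructType` and `StructField`. It matches columns by name and reports missing columns, unexpected columns, and differing types or nullability, but it does not check column order.

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper, not part of spark-daria or spark-fast-tests.
object SchemaDiff {

  // Returns one human-readable message per mismatch; empty if none are found.
  // Columns are matched by name, so column order is NOT compared.
  def schemaDiff(actual: StructType, expected: StructType): Seq[String] = {
    val actualByName: Map[String, StructField] =
      actual.fields.map(f => f.name -> f).toMap
    val expectedByName: Map[String, StructField] =
      expected.fields.map(f => f.name -> f).toMap

    // Columns present in expected but absent from actual.
    val missing = expected.fields.toSeq.collect {
      case f if !actualByName.contains(f.name) => s"missing column: ${f.name}"
    }

    // Columns present in actual but absent from expected.
    val unexpected = actual.fields.toSeq.collect {
      case f if !expectedByName.contains(f.name) => s"unexpected column: ${f.name}"
    }

    // Columns present in both, with a different data type or nullability.
    val changed = expected.fields.toSeq.flatMap { e =>
      actualByName.get(e.name).toSeq.flatMap { a =>
        val typeDiff =
          if (a.dataType != e.dataType)
            Seq(s"${e.name}: type ${a.dataType.simpleString} != ${e.dataType.simpleString}")
          else Seq.empty[String]
        val nullableDiff =
          if (a.nullable != e.nullable)
            Seq(s"${e.name}: nullable ${a.nullable} != ${e.nullable}")
          else Seq.empty[String]
        typeDiff ++ nullableDiff
      }
    }

    missing ++ unexpected ++ changed
  }
}
```

Assuming your two DataFrames are called `parquetDF` and `hiveDF` (placeholder names), you could run `SchemaDiff.schemaDiff(parquetDF.schema, hiveDF.schema).foreach(println)`. If nothing is printed and the column order also matches, the failure is more likely coming from the data than from the schema.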