-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iceberg fails ACID verification test #10454
Comments
@matthijseikelenboom can you please share details of what exactly is failing in which scenario for the people that aren't running the test? This makes it easier to reason about what the issue is and what your expectation within a particular scenario is. |
@nastra The README of the repo describes in detail what the test does. But in short: The test fails because what it expects to be present on disc, isn't there. On the term "scenario", there is only one. A transaction is generated, is picked up by a writer thread and is then attempted to be inserted in to the data lakehouse. The reader thread(s) then verify whether the transaction has happened and if it happened correctly. |
Is the Catalog Cache in use? |
SparkSessionProvider.java - line 68: If this is the property that you mean, then no, is has been disabled. Though, I've tested it with this property set to true as well, it doesn't make a difference. |
Looking briefly at the code it looks like the error may be in the retry logic for the transaction statements as written. From what I can tell the code does not handle Iceberg Validation exceptions , I had to put away my computer for a flight but will look again tomorrow. |
Yea , I think @RussellSpitzer has a good point. I think this code https://github.com/matthijseikelenboom/iceberg-acid-verification/blob/master/src/main/java/org/example/writer/TransactionWriter.java#L167 swallows the exception. Iceberg isolation was designed to work optimistically by throwing exceptions like ValidationException when it detects concurrent changes to the table that would break the given isolation level. The user then needs to retry the query. Running this test, I saw some instances of IllegalStateException
This is actually different than ValidationException, and is actually thrown because a certain optimization for MERGE INTO (runtime file-filtering) seems to requires the table to be completely up to date. This can be turned off if you wish: spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled=false if you want more lenient behavior that throws exception only if isolation constraints are violated. But for either case, the test should handle the exception and retry. |
In between flights I found another suspect thing which was when a row was deleted the transaction log still attempts to find it rather than check for its absence. Gonna sleep now look at this again later.
|
@szehon-ho I saw that I forgot to copy over the retry mechanism that we build. It has been included in the latest commit. Another interesting thing, is that when you set @RussellSpitzer interesting find, curious to what else you'll find |
I am pretty sure it’s the delete statement failing but I’m too tired, just got back to my house, to figure out why.
|
I didn't go to sleep, here is the error. This is what your delete statements look like:
|
The Delete verification is also wrong
This should be ExpectedRecordAbsence |
I fixed all the bugs and the test suite now passes. I would strongly encourage you to write unit tests for frameworks like this to make sure they behave correctly before submitting issues to the project. https://github.com/matthijseikelenboom/iceberg-acid-verification/pull/1/files |
Oh wow, I'll take this back to the team at work! Thanks for taking the time, but my issue wasn't important enough to not go to sleep I think 😅. About the unit tests, those where written, but I didn't copy all of them over to my repo. I should have, because the |
Closing as this has been resolved |
Apache Iceberg version
1.5.2 (latest release)
Query engine
Spark
Please describe the bug 🐞
Problem statement
For work we had needed to have a concurrent read/write support for our data lake, which uses Spark. We where noticing some inconsistencies, so we wrote a test that can verify whether something like Iceberg adheres to ACID. We did however find that Iceberg fails this test.
Now, it can be that we've wrongly configured Iceberg or that there is some mistake in the test code.
My question is if someone of you can take a look at it, and perhaps can explain what is going wrong here.
To Reproduce
How to run the test and it's findings are described in the README of the repository, but here is a short run down
Steps to reproduce the behavior:
Expected behavior
Environment Description
Additional context
It's worth noting that other solutions, Hudi and Delta Lake, have also been tested this way. Hudi also didn't pass this test, but it was resolved with a bug ticket on GitHub issue. Delta Lake did pass the test.
The text was updated successfully, but these errors were encountered: