-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core, Spark3.5: Fix tests failure due to timeout #11654
base: main
Are you sure you want to change the base?
Conversation
core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java
Outdated
Show resolved
Hide resolved
a43d09b
to
6f8fb28
Compare
core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java
Outdated
Show resolved
Hide resolved
6f8fb28
to
0d83f3e
Compare
daf014a
to
45e1cd6
Compare
@nastra Please take another look, thanks! |
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java
Outdated
Show resolved
Hide resolved
45e1cd6
to
9c3d071
Compare
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java
Outdated
Show resolved
Hide resolved
7961140
to
55f635a
Compare
@RussellSpitzer or @amogh-jahagirdar can any of you also take a look at this please? |
55f635a
to
b1a0395
Compare
@@ -1609,8 +1611,14 @@ public synchronized void testMergeWithSnapshotIsolation() | |||
createOrReplaceView("source", Collections.singletonList(1), Encoders.INT()); | |||
|
|||
sql( | |||
"ALTER TABLE %s SET TBLPROPERTIES('%s' '%s')", | |||
tableName, MERGE_ISOLATION_LEVEL, "snapshot"); | |||
"ALTER TABLE %s SET TBLPROPERTIES('%s', '%s', '%s', '%s', '%s', '%s')", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've got too many commas here. I believe this should be ('%s' '%s', '%s' '%s', '%s' '%s')
and same for the other Spark test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand right, to address flakiness of these tests that are timing out, we're reducing a lot of commit properties (like the max retry and the min retry wait), that way we just fail fast? Typically these should not exceed these values and any tests which do, would just end up failing fast.
Do we have any indication on what could be the root cause for the commits in these tests taking so long?
To be clear, I'm open to giving the fix in this PR a try but I think we should be a bit cautious that we're not introducing flakiness the other way where another set of tests end up failing more frequently due to not waiting enough for example.
@amogh-jahagirdar from what I saw in some local debugging is that these specific tests typically took quite a long time due to exponential retries and eventually timed out, hence this is lowering the min/max wait times for these retries in these 3 particular tests. Other tests shouldn't be affected by this |
b1a0395
to
04c4f71
Compare
Sounds good,
Yeah this was the part I was trying to unpack was why are these particular tests retrying a lot? Like I said though, that may be difficult to reason about and we may just want to tune these properties in the interim, so all good! |
This PR attempts to fix following tests failure due to timeout.
#11047
#11066
#11651