-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CASSANDRA-20248: Fix failing repair test #3828
base: cep-15-accord
Are you sure you want to change the base?
Conversation
Patch by Alex Petrov; reviewed by David Capwell for CASSANDRA-20248.
@@ -150,6 +150,8 @@ public static void setupCluster() throws IOException | |||
if (throwable.getClass().toString().contains("InstanceShutdown") || // can't check instanceof as it is thrown by a different classloader | |||
throwable.getMessage() != null && throwable.getMessage().contains("Parent repair session with id")) | |||
return true; | |||
if (throwable.getMessage().contains("Corrupted: Corrupted")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what i don't get, why is this passing in trunk and been stable there, but cep-15-accord is unstable? What is unique about cep-15-accord that makes this unstable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fact this error is making it to this code path means we are not handling the error, and we have been since 4.0... why is cep-15-accord not handling this error anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the root cause. CASSANDRA-20069 removed the error handling that this test depends on, so we need to revert that to get the test passing, not telling the test to ignore this exception.
jvm-dtest should do
# org.apache.cassandra.distributed.impl.AbstractCluster#uncaughtExceptions
InstanceClassLoader cl = (InstanceClassLoader) thread.getContextClassLoader();
get(cl.getInstanceId()).uncaughtException(thread, error);
this causes us to hit our failure detection logic, which leads to C* to halt the JVM due to stability issues (what this test is testing). Without that logic org.apache.cassandra.distributed.impl.InstanceKiller$InstanceShutdown
doesn't get thrown and this test fails (because we stopped handling the exception that this test is trying to make sure we do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test looks to show that we stopped handling an exception we are supposed to handle, so telling the test to ignore it seems like the wrong behavior.
@dcapwell could you take another look? We do of course trigger
|
d7764b3
to
fee0e64
Compare
fee0e64
to
a3a37f3
Compare
No description provided.