-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Severe thread contention while running large test with parallel methods #772
Comments
I took a look at the source and I'm not sure this is the correct interpretation. Do you have the full thread dump? Any way you can reproduce this on a small project? |
I think you're right, it is just taking a very long time for it to complete the test, and the threads are blocked because another thread is busy inserting into the graph. I looked at the log again and I see that it's only partially done. I'm not sure why it's so slow though, when I ran this in JUnit, the tests execute very quickly. (Full test runs in 1-2 hours, in testng, it's been running for 11+ hours now). So this is not a bug, feel free to close the issue. I'm sure it's something with my test. Thanks for the quick response. |
Thanks for the clarification. Not sure I want to close this just yet, I'm not convinced yet that this is not a bug in TestNG, especially if you say that the same test runs fast in JUnit. Can you double check this? I'm a bit confused, since if the same tests are running in JUnit, then they are all independent (since JUnit doesn't support dependent tests) which means that the DynamicGraph use in TestNG should be minimal (this graph is only used when you have dependent test methods). Can you report back on what you find? I want to make sure there's not some hidden slow algorithm in TestNG. |
Let me start the jvm with a code profiler and run again. Here are the particulars of the test:
I'll do a little more research and get back to you. |
Here is the execution profiler.
21.66% is spent on executing the test, and 61% is spent inserting into the graph set. (the rest is spent on our caching implementation, which suffered a disconnect to our dev environment and is busy rebuilding it's caches). Do you know why we would be inserting into that graph? |
It looks like the hashCode is based on the method, which would be the same for 100,000 entries. So each time it adds to the HashSet, it has to iterate over 100,000 instances in order to find that the object is not in the set? I will experiment with making the hashCode more unique... |
Can you turning off parallelism completely? I'm curious what this will change. |
I turned off parallelism and now it's running about 1000 tests every 15 seconds. I don't see any of the delays I was seeing before. |
Doesn't sound like deadlock so much as severe lock contention. On Tue, Aug 4, 2015 at 5:04 PM, Ben La Monica [email protected]
|
That was my intuition, glad to see it confirmed. Like @rschmitt is saying, it's most likely a contention problem. |
Here is a simplified example that illustrates the problem. import static org.junit.Assert.assertNotNull;
import java.util.concurrent.atomic.AtomicInteger;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Factory;
import org.testng.annotations.Test;
public class ContentionTest {
private Integer testData;
@Factory(dataProvider="data")
public ContentionTest(Integer data) {
testData = data;
}
@DataProvider(name="data", parallel=true)
public static Object[][] getData() {
Integer[][] data = new Integer[600000][1];
for(int i = 0; i < 600000; i++) {
data[i] = new Integer[] { i };
}
return data;
}
private static AtomicInteger testCount = new AtomicInteger(0);
@AfterMethod
public void countTests() {
int count = testCount.incrementAndGet();
if (count % 1000 == 0) {
System.out.println(count + " tests completed.");
}
}
@Test
public void one() {
assertNotNull("obj", testData);
}
@Test
public void two() {
assertNotNull("obj", testData);
}
@Test
public void three() {
assertNotNull("obj", testData);
}
@Test
public void four() {
assertNotNull("obj", testData);
}
@Test
public void five() {
assertNotNull("obj", testData);
}
@Test
public void six() {
assertNotNull("obj", testData);
}
} And the suite: <suite name="contention suite">
<test name="contention tests" parallel="methods" thread-count="6">
<classes>
<class name="ContentionTest"/>
</classes>
</test>
</suite> |
Unfortunately I don't have any more time to devote to this issue. I'm going to have to go and find a different solution. Let me know if I can contribute any more info to you. Thanks for the great testing framework! |
Thanks for the PR and your efforts, @benlamonica, I'll take it from there! |
@cbeust, @juherr the similar fix of #775 has been broken down into #848, #849, #850, #852. Each one of them improves different path of the code path that makes #772 to take such a long time to execute. If you run #772 tests on the code before and after the fix, you will see that the time it takes to execute the tests are significantly reduced. |
@testn The problem is we have no proof it or any idea how much we can. Then, it will take time to reproduce and test it on our side. The hardest and most important thing with perf issues are not the fix, but the way to reproduce and to measure it easily. But maybe you can help us? |
@juherr, just try it with a lower number. In my case, 6000 is already big enough to see the problem. If you want to start validating issues, you probably need to create a lot of test cases for performance testing in which I don't see it much in TestNG. |
@benlamonica , can you help provide the result of the test? |
v 6.9.13.3
|
@shaburov Do you have a way to reproduce it every time? |
@juherr But I can guarantee that it is reproduced with the following settings: |
Is the fix resolving your issue? You made the most difficult part of the job, why not proposing a pull request then? :) |
While running a large test (600,000+ test methods) I encountered a deadlock situation.
All of the threads are blocked with this stack trace:
And the thread that is holding the mutex:
The text was updated successfully, but these errors were encountered: