-
Notifications
You must be signed in to change notification settings - Fork 641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix for DocumentsWriter concurrency (fixes #935, closes #886) #940
BUG: Fix for DocumentsWriter concurrency (fixes #935, closes #886) #940
Conversation
…63e10c that made the documents writer non-concurrent (but threadsafe)
…): The DocumentsWriterThreadPool.Reset() method must be called within the context of a lock because there is a possible race condition with other callers of DocumentsWriterThreadPool.Reset(). See apache#935.
…ntLock.tryLock() method barges to the front of the queue instead of returning false like Monitor.TryEnter(). Use Monitor.Enter(object, ref bool) instead, which always returns true. We get locks in a different order, but I am not sure whether that matters. Fixes apache#935. Closes apache#886.
…at(1000)] to try to reproduce on Azure DevOps (to be reverted).
…readState() method because it has no callers. This simplifies the design of ReentrantLock, since we don't need to artificially keep track of "queued threads".
…ninterruptableMonitor on cascaded methods.
…er test (to be reverted)
… Enter(). Enter() causes deadlocks in other tests, so need to localize this change to DocumentsWriterFlushControl.
…lush(): Use ReentrantLock.Lock() instead of ReentrantLock.TryLock() because Java relies on "barging" behavior instead of returning false when the current thread isn't next in the queue. We cannot do that, but we can wait for the lock to become available instead.
…ryLock() that accepts a timeout (TimeSpan)
…ome time for threads to reach the beginning of the wait queue. In Java, they are automatically put at the beginning of the queue, but since we cannot do that in .NET, we wait a little bit.
…iseconds to wait on whether the current process is 64 or 32 bit.
…liseconds that is used by callers of ReentrantLock.TryLock() to set the default value.
…ng TryDequeue() and TryPeek() methods on netstandard2.0 and .NET Framework
…n to use unfair locking similar to how it was done in Java. We track the queue and use ManualResetEventSlim to control entry into the lock for queued tasks. Ported some of the ReentrantLock tests from Apache Harmony.
…he missing TryDequeue() and TryPeek() methods on netstandard2.0 and .NET Framework" This reverts commit e5a65e9cd8bbf996fb599ff76e8d6b9f90babe4b.
…leMonitor.TryEnter() instead of UninterruptableMonitor.Enter() so we can control what happens while the thread waits. We simply call Thread.Yield() to allow TryLock() to proceed before any waiting threads. Commented tests that depend on IsLocked because the property was removed.
…n a longer test (to be reverted)" This reverts commit b30e4abb576c8bfde3337a92de927a10747f88ae.
…ed [Repeat(1000)] to try to reproduce on Azure DevOps (to be reverted)." This reverts commit d8fca410dafd1bf5529e8200034e1e2e5be83f07.
…itorStateException() Removed unused exception variable and added a comment to indicate success so we don't have to suppress warnings
I'll try to free up some time so I can look through this towards the end of the week. I'm sure I'd learn a lot in the process, so looking forward to it. |
This week, I made good headway toward setting up an environment where I can benchmark the current master vs this PR to see if the proposed improvements do, in fact, speed up indexing through support for DocumentsWriter concurrency. For benchmarking I'm using an approach similar to the code provided in #935 to see the level of impact this PR has on the performance issue reported there. I will work on this more this next week and provide the results of the benchmarks once I have them. Then I will review the code in light of those benchmarks. That's my update for this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many method names in this file are camelCase. They should be changed to PascalCase as is the norm in the .NET world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further review this file appears that it may be a port of a file from OpenJdk. That codebase is GNU GPL 2 licensed which is not compatible with the Apache License that appears in this file's header. The file needs to be removed from the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this is from Apache Harmony, which is Apache 2.0 licensed: https://github.com/apache/harmony/blob/trunk/classlib/modules/concurrent/src/test/java/JSR166TestCase.java
Typically, we haven't been strictly converting method names in test classes except for test method names. But if we really want that, we should probably open a new issue to do it consistently across all test files. This feels like it is pretty low in priority, but if you insist I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, we haven't been strictly converting method names in test classes except for test method names
If that's the case then no worries. I was just calling out the case difference since I was unaware of that history. The casing is fine as submitted then.
I have reviewed each of the individual commits for this PR, but since many commits overlap and change the same areas of code I still want to review the aggregate effect on each changed file by comparing it to the Java Lucene 4.8.1 file. I plan to do that on Monday. I have also run performance tests on this PR vs the Master prior to the PR and the performance appears to be approximately the same. |
We also ran a test and did not see any performance improvements - CPU remained at the same level and was not elevated in anyway. @NightOwl888 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further review this file appears that it may be a port of a file from OpenJdk. That codebase is GNU GPL 2 licensed which is not compatible with the Apache License that appears in this file's header. The file needs to be removed from the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file appears it may be a port of a file from OpenJdk. That codebase is GNU GPL 2 licensed which is not compatible with the Apache License that appears in this file's header. The file needs to be removed from the PR. We should create our own ReentrantLockTest class that does not incorporate external code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this is from Apache Harmony, which is Apache 2.0 licensed: https://github.com/apache/harmony/blob/trunk/classlib/modules/concurrent/src/test/java/ReentrantLockTest.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification an citing the source. Great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes to ReentrantLock overall seem like an elegant solution to get close to parity to how the reentrantLock class works in Java (as far as I understand it based on a bunch of googling). So, in general, this feels like a great step forward. It is, however, imperative that the code be original and not incorporate any OpenJDK code or code from any non-Apache source. Please ensure that is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is all original. The JDK implementation is much more complex than this, the only thing that are copied are a few of the member signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These method should ideally all contain doc comments to explain what they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some docs and eliminated the recursive methods that could overflow the stack (in theory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
Today I made a private branch off of of this PR and I hacked in some concurrent logging to memory deep inside the indexing code at the dwpt level. Specifically I logged to memory each time a thread entered the In my observations during performance testing, I could never get my processor above 45% utilization, and then only for brief spikes, but that kinda makes sense to me given the fact that indexing is pretty clearly an IO bound operation not a CPU bound operation. Just for fun I created a parallel threading test where I had up to 20 threads reading text files in one format and writing them in another format via an input and output stream per thread. My local processor has 20 cores. This 2nd setup doesn't use Lucene.NET in any way but I wanted to see if outside of Lucene.NET I could get high utilization of my CPU when using lots of threads doing heavy IO work. And the answer, as I suspected, is no. My processor never got even above 35% utilization and that just for brief spikes. |
…nter() recursive methods to avoid overflowing the stack and moved the logic into the catch blocks. Also added documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your work on this @NightOwl888. This improved locking implementation tracks MUCH more closely with the original Java implementation. Super cool to see.
Summary of the changes (Less than 80 chars)
Lucene.Net.Support.Threading.ReentrantLock: Fixed the implementation so it prioritizes new threads obtaining the lock over waiting threads.
Fixes #935. Closes #886.
Description
This has been a known issue for some time, however as we have had the
DocumentsWriter
working reliably on a single thread most users have not worried about it until it was reported in #935.There were 2 issues that were causing test failures explained in #935 (comment). The second issue turned out to be much more involved to work out how to address even though the most recent solution is actually very simple and lightweight. Instead of calling
UninterruptableMonitor.Enter()
in theLock()
method, we callUninterruptableMonitor.TryEnter()
in a loop that executesThread.Yield()
. This allows other threads to acquire the lock even though there are waiting threads.Granted, while this approach seems to reliably pass the tests, it may be a bit naïve of an implementation. While it doesn't peg my CPU and seems to run well on Azure DevOps, I am not sure whether this is the appropriate solution for production scenarios. Suggestions are welcome as to how to improve this. Do note there were 2 prior attempts:
ManualResetEventSlim
and aQueue<T>
to manage the waiting threads. This is a more complete implementation and even passed many of the Apache Harmony tests, but it comes at a pretty steep performance cost. Maybe there is a way to improve this, though.