Investigate Failing Test: Lucene.Net.Index.TestIndexWriterOnJRECrash::TestNRTThreads_Mem() #894
Comments
…ed [AwaitsFix] attribute because this test fails intermittently (see apache#894).
…ed [AwaitsFix] attribute because this test fails intermittently (see #894).
I am raising the priority on this and adding it as a production blocker because we are missing durability (ACID). A commit done simultaneously with a power outage will result in a corrupt index about 1 time out of 200. This only occurs on .NET Core, not on .NET Framework. We looked at this a bit already. Now that we have completed fsync support, we expected this to work automatically. Since it doesn't fail on .NET Framework, it seems probable that something has changed at a very deep level inside .NET Core.
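For readers unfamiliar with what fsync contributes to durability, here is a minimal Java sketch of the general write, fsync, then atomic rename pattern. It is not Lucene's or Lucene.NET's actual commit code. The class name `DurableWriteSketch`, the method names, and the file name `segments_demo` are hypothetical, and the directory sync is best effort because opening a directory as a channel is platform dependent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from Lucene or Lucene.NET.
final class DurableWriteSketch {

    // Writes data to a temp file, forces it to disk, then renames it into place.
    static void writeDurably(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to flush file content and metadata to the storage device.
            ch.force(true);
        }
        // Publish the fully written file under its final name in one step.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);

        // Best effort: sync the parent directory so the rename itself is persisted.
        // Opening a directory this way fails on some platforms (e.g. Windows).
        try (FileChannel dir = FileChannel.open(
                target.toAbsolutePath().getParent(), StandardOpenOption.READ)) {
            dir.force(true);
        } catch (IOException ignored) {
            // Directory fsync is not supported everywhere; skip it in this sketch.
        }
    }

    // Small round trip used to exercise the sketch.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("durable-sketch");
            Path target = dir.resolve("segments_demo");
            writeDurably(target, "commit-1".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A crash before the rename leaves only the temp file behind, so a reader never sees a partially written target. Whether the runtime's flush call actually reaches the device is exactly the kind of platform difference the comment above is concerned with.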
If this is only happening on .NET Core and not .NET Framework, and you talked about #555, then the key may be that With So, if this is the case:
the lock does not cause database corruption.
In this case, could we use an OS-level disk monitoring and analysis tool to document the differences between .NET Framework and .NET Core, then try to modify the code to make the behavior consistent?
Thanks for putting some thought into this. Indeed these sound like 2 good places to start. I looked at I checked with ChatGPT to get some high-level analysis about how Lucene 4.8.0 manages transactional behavior:
I then went a little bit further to try to get some clues as to specifically what is failing. It gave us a few things we can experiment with to determine if they help to solve the problem.
Using OS-level disk monitoring might also help to narrow down the problem, but it does seem like it might be a shortcut to try these other suggestions first. Would you like to give it a shot to see whether one of these suggestions fixes the test?
Yes, but not until I'm done with the tricky stuff at hand. I haven't touched code in almost a year, but I'm happy to contribute to lucene.net as soon as I can on my end.
Task description
This test was ported and added in #786 (to close #768).
Unfortunately, it fails intermittently. A user reported in the StackOverflow question "Lucene.net corrupted index (segments.gen)" that this appears to be due to a real problem that happens in production.
The failure may or may not be related to the deprecation of thread interrupts (#555) in Lucene.NET, which were supported in the Java version. We make a best effort to support them by using `UninterruptableMonitor` instead of `lock` statements. However, a lock may be taken in any library we depend on, and the act of taking it may throw `ThreadInterruptedException`, so we cannot 100% guarantee that we catch every one of these exceptions in order to roll back an in-process commit. In Java, taking a lock does not throw an exception in any case.
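The Java side of that contrast can be illustrated with a small sketch. It assumes a hypothetical class `SynchronizedInterruptDemo` and is not code from Lucene. A worker thread is interrupted while it is blocked waiting to enter a `synchronized` block. The interrupt only sets the thread's interrupt flag. The worker still acquires the monitor normally once it is released, and no exception is thrown at the point of locking.

```java
// Hypothetical illustration only; not taken from Lucene or Lucene.NET.
final class SynchronizedInterruptDemo {
    private static final Object LOCK = new Object();

    // Returns true if the worker entered the contended synchronized block
    // with its interrupt flag still set (i.e. locking did not throw).
    static boolean acquireWhileInterrupted() {
        final boolean[] flagSeenInsideLock = new boolean[1];
        Thread worker = new Thread(() -> {
            synchronized (LOCK) { // blocks until the main thread releases LOCK
                flagSeenInsideLock[0] = Thread.currentThread().isInterrupted();
            }
        });
        try {
            synchronized (LOCK) {
                worker.start();
                // Wait until the worker is actually blocked on the monitor.
                while (worker.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                // Interrupt the worker while it is waiting for the lock.
                worker.interrupt();
            }
            // join() makes the worker's write to the array visible here.
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("main thread was interrupted", e);
        }
        return flagSeenInsideLock[0];
    }

    public static void main(String[] args) {
        System.out.println(acquireWhileInterrupted());
    }
}
```

In .NET, by comparison, `Thread.Interrupt` can surface as a `ThreadInterruptedException` when the target thread blocks while acquiring a monitor, which is why code ported from Java may see an exception at a point where the original never could.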