Valgrind: invalid reads & writes by compaction thread #110
Oops, how about a couple more details? eleveldb: commit 639f69c, plus the patches on the slf-faulterl-introduction2 branch. Not easily repeatable, sorry! |
This appears to be an error when shutting down a database that has an active compaction. Does that sound correct? If so, would you attempt your test again with lines 1426 through 1435 of db/db_impl.cc commented out? Those lines were added because of valgrind testing about a year ago. Something may have changed since then. |
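To make the suspected race concrete, here is a minimal sketch of a close path that waits for an in-flight compaction before any state is torn down. It is not the code in db/db_impl.cc: `ToyStore`, `StartCompaction`, and the `exited` flag are hypothetical names used only for illustration. If the destructor skipped the wait, the worker's later accesses to member state would be the kind of invalid read/write Valgrind reports.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a store with one background compaction thread.
class ToyStore {
 public:
  explicit ToyStore(std::atomic<bool>& exited) : exited_(exited) {}

  // Close path: flag the shutdown, then block until the worker is done.
  ~ToyStore() {
    {
      std::unique_lock<std::mutex> lock(mu_);
      shutting_down_ = true;
      done_cv_.wait(lock, [this] { return !compaction_running_; });
    }
    if (worker_.joinable()) worker_.join();
  }

  // Starts the single background compaction; returns false if one was
  // already started on this store.
  bool StartCompaction(int steps) {
    if (worker_.joinable()) return false;
    {
      std::lock_guard<std::mutex> lock(mu_);
      compaction_running_ = true;
    }
    worker_ = std::thread([this, steps] {
      for (int i = 0; i < steps; ++i) {
        {
          std::lock_guard<std::mutex> lock(mu_);
          if (shutting_down_) break;  // stop early once close has begun
          ++steps_done_;              // member access: needs *this alive
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
      exited_.store(true);
      std::lock_guard<std::mutex> lock(mu_);
      compaction_running_ = false;
      done_cv_.notify_all();
    });
    return true;
  }

 private:
  std::atomic<bool>& exited_;  // set by the worker just before it finishes
  std::mutex mu_;
  std::condition_variable done_cv_;
  bool shutting_down_ = false;
  bool compaction_running_ = false;
  long steps_done_ = 0;
  std::thread worker_;
};
```

With this shape, the `exited` flag is always true by the time the destructor returns, no matter how many steps the compaction had left.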
Quite likely. The valgrind errors happened while running the following test case 20 times in a row: https://gist.github.com/slfritchie/1d2900a7a53e53ef2e82 ... all 20 passed, and only 1 of the 20 runs managed to trigger any Valgrind complaints. There are 8 open() & 7 close() calls in that test case. The faulterl C code is calling |
While trying to reproduce this error according to Matthew's advice from 10 hours ago (about lines 1426-1435), I re-ran with the faulterl library LD_PRELOAD'ed but with the main fault injection configuration option disabling all fault injection. I managed to hit the same invalid reads & writes on the first try. So, fault injection appears not to be 100% necessary to hit this problem, yay. Cut-and-paste the contents of the gist above into an Erlang shell to bind the variable
|
How did valgrind change with the lines removed? |
Today's status:
It looks like the
And I saw this Valgrind complaint:
|
The first valgrind report was a nuisance issue. This second one is bad. The second one suggests the database actually closed before an iterator completed. I can think of two scenarios:
Do you know if your test intentionally creates scenario one? Independently, we need to watch whether the original valgrind report ever occurs again. |
There is now an eleveldb/mv-tuning7 branch (parallel to leveldb mv-tuning7) that addresses the likely scenario producing the second Valgrind report. This is by no means a comprehensive fix to the imagined "API calls after db close" scenario. It is more of a proof of concept / understanding. |
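For readers following along, one common way to guard against "iterator outlives the database" is to have each iterator hold a counted reference to the database object, so a close only drops the caller's reference. The sketch below illustrates that general idea only; it is not what the mv-tuning7 branch does, and `ToyDb`, `ToyIterator`, and `ScanAcrossClose` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical in-memory table standing in for an open database.
class ToyDb {
 public:
  explicit ToyDb(bool* destroyed) : destroyed_(destroyed) {}
  ~ToyDb() { *destroyed_ = true; }  // records when the storage really goes away
  std::map<std::string, std::string> rows;

 private:
  bool* destroyed_;
};

// The iterator shares ownership of the database it walks.
class ToyIterator {
 public:
  explicit ToyIterator(std::shared_ptr<ToyDb> db)
      : db_(std::move(db)), pos_(db_->rows.cbegin()) {}
  bool Valid() const { return pos_ != db_->rows.cend(); }
  const std::string& key() const { return pos_->first; }
  void Next() { ++pos_; }

 private:
  std::shared_ptr<ToyDb> db_;  // declared first so it is initialized first
  std::map<std::string, std::string>::const_iterator pos_;
};

// Usage: "close" the handle right after creating an iterator, then scan.
int ScanAcrossClose(bool* db_destroyed) {
  auto db = std::make_shared<ToyDb>(db_destroyed);
  db->rows = {{"a", "1"}, {"b", "2"}, {"c", "3"}};
  ToyIterator it(db);
  db.reset();  // caller's close; the iterator still holds a reference
  int seen = 0;
  for (; it.Valid(); it.Next()) {
    assert(!*db_destroyed);  // storage must still be alive during the scan
    ++seen;
  }
  return seen;  // the database is destroyed when `it` goes out of scope
}
```

The trade-off is that close becomes a request rather than a guarantee: the underlying storage is released only when the last iterator is gone.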
Not without an error in the Erlang compiler or VM that fails to execute code in the
That's a bit interesting. Lines 329-339 show that the fold is always running to completion/exception throwing before returning control to the QuickCheck test code. It seems like the signal from leveldb that the iterator has found the right edge of the keyspace by |
Theory of the moment is that I interpreted the second Valgrind stack incorrectly. My new interpretation is that "Invalid write of size 8" at refobjects.cc:51 is really an attempt to update the performance counters. The performance counter code is expected (by me) to create a shared memory map and never unmap it. Maybe something unmaps it implicitly in the test cycle? Continuing to research this interpretation for now. |
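As an illustration of that theory, here is a small sketch of 8-byte counters living in a shared mapping, where an increment after the mapping is gone would be exactly an invalid write of size 8. It assumes a POSIX system and uses an anonymous `mmap` region for simplicity; the real performance counter code may set up its shared memory differently, and `ToyCounters` and `CounterMap` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical counter block: eight 64-bit counters.
struct ToyCounters {
  std::uint64_t values[8];
};

class CounterMap {
 public:
  CounterMap() {
    void* p = mmap(nullptr, sizeof(ToyCounters), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    base_ = (p == MAP_FAILED) ? nullptr : static_cast<ToyCounters*>(p);
  }
  ~CounterMap() { Unmap(); }
  CounterMap(const CounterMap&) = delete;
  CounterMap& operator=(const CounterMap&) = delete;

  // Increments a counter; refuses once the mapping is gone instead of
  // writing through a stale pointer.
  bool Inc(std::size_t idx) {
    if (base_ == nullptr || idx >= 8) return false;
    ++base_->values[idx];  // the 8-byte write
    return true;
  }

  std::uint64_t Get(std::size_t idx) const {
    return (base_ != nullptr && idx < 8) ? base_->values[idx] : 0;
  }

  // Explicit unmap, standing in for "something unmaps it in the test cycle".
  void Unmap() {
    if (base_ != nullptr) {
      munmap(base_, sizeof(ToyCounters));
      base_ = nullptr;
    }
  }

 private:
  ToyCounters* base_ = nullptr;
};
```

Code that caches the raw pointer and skips the null check would keep writing into the unmapped region, which is the failure mode this interpretation assumes.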
New symptom, discussed this afternoon with @matthewvon:
|
Moving to 2.0.1. |
Moved back to 2.0 |
All of these messages appear to come from a single test case. I'll try to figure out if this is deterministic or not -- I haven't created a clever way to have QuickCheck recognize when valgrind spots an error. I have a manual way to do it, but it's manual and requires a huge amount of babysitting and is only worthwhile on 100% deterministic cases.
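One possible way to reduce the babysitting, sketched here as an assumption rather than something already in use: have the harness scan the Valgrind log for its final `ERROR SUMMARY:` line and fail the run when the count is nonzero. `ClassifyValgrindLog` is a hypothetical helper, and how the result would be wired into QuickCheck is left open. Valgrind's `--error-exitcode` option is another route to the same signal.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Classifies a Valgrind log by its last "ERROR SUMMARY:" line.
// Returns -1 if no summary line is present, 0 if the summary reports zero
// errors, and 1 if it reports at least one error.
int ClassifyValgrindLog(const std::string& log) {
  const std::string marker = "ERROR SUMMARY: ";
  int result = -1;
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t pos = line.find(marker);
    if (pos == std::string::npos) continue;
    std::size_t i = pos + marker.size();
    if (i >= line.size() ||
        !std::isdigit(static_cast<unsigned char>(line[i]))) {
      continue;  // marker without a number: ignore this line
    }
    // Any nonzero digit in the leading number means errors were reported.
    bool nonzero = false;
    for (; i < line.size() &&
           (std::isdigit(static_cast<unsigned char>(line[i])) || line[i] == ',');
         ++i) {
      if (line[i] != '0' && line[i] != ',') nonzero = true;
    }
    result = nonzero ? 1 : 0;
  }
  return result;
}
```

Only the last summary line counts, so a log that concatenates several runs would need to be split per run first.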