Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Deflake unit test
DBErrorHandlingFSTest.AtomicFlushNoSpaceError
(#1…
…3234) Summary: `DBErrorHandlingFSTest.AtomicFlushNoSpaceError` is flaky due to seg fault during error recovery: ``` ... frame #5: 0x00007f0b3ea0a9d6 librocksdb.so.9.10`rocksdb::VersionSet::GetObsoleteFiles(std::vector<rocksdb::ObsoleteFileInfo, std::allocator<rocksdb::ObsoleteFileInfo>>*, std::vector<rocksdb::ObsoleteBlobFileInfo, std::allocator<rocksdb::ObsoleteBlobFileInfo>>*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>*, unsigned long) [inlined] std::vector<rocksdb::ObsoleteFileInfo, std::allocator<rocksdb::ObsoleteFileInfo>>::begin(this=<unavailable>) at stl_vector.h:812:16 frame #6: 0x00007f0b3ea0a9d6 librocksdb.so.9.10`rocksdb::VersionSet::GetObsoleteFiles(this=0x0000000000000000, files=size=0, blob_files=size=0, manifest_filenames=size=0, min_pending_output=18446744073709551615) at version_set.cc:7258:18 frame #7: 0x00007f0b3e8ccbc0 librocksdb.so.9.10`rocksdb::DBImpl::FindObsoleteFiles(this=<unavailable>, job_context=<unavailable>, force=<unavailable>, no_full_scan=<unavailable>) at db_impl_files.cc:162:30 frame #8: 0x00007f0b3e85e698 librocksdb.so.9.10`rocksdb::DBImpl::ResumeImpl(this=<unavailable>, context=<unavailable>) at db_impl.cc:434:20 frame #9: 0x00007f0b3e921516 librocksdb.so.9.10`rocksdb::ErrorHandler::RecoverFromBGError(this=<unavailable>, is_manual=<unavailable>) at error_handler.cc:632:46 ``` I suspect this is due to DB being destructed and reopened during recovery. Specifically, the [ClearBGError() call](https://github.com/facebook/rocksdb/blob/c72e79a262bf696faf5f8becabf92374fc14b464/db/db_impl/db_impl.cc#L425) can release and reacquire mutex, and DB can be closed during this time. So it's not safe to access DB state after ClearBGError(). There was a similar story in #9496. [Moving the obsolete files logic after ClearBGError()](#11955) probably makes the seg fault more easily triggered. This PR updates `ClearBGError()` to guarantee that db close cannot finish until the method is returned and the mutex is released. So that we can safely access DB state after calling it. Pull Request resolved: #13234 Test Plan: I could not trigger the seg fault locally, will just monitor future test failures. Reviewed By: jowlyzhang Differential Revision: D67476836 Pulled By: cbi42 fbshipit-source-id: dfb3e9ccd4eb3d43fc596ec10e4052861eeec002
- Loading branch information