Flaky Test: solana-core banking_stage::consumer::tests::test_write_persist_transaction_status #4853

Closed
anza-team opened this issue Feb 7, 2025 · 8 comments

@anza-team

AUTO-GENERATED. DO NOT EDIT.

📝 Buildkite Analytics

@anza-team anza-team added the test label Feb 7, 2025
@steviez steviez marked this as a duplicate of #4855 Feb 7, 2025
@steviez

steviez commented Feb 7, 2025

This failure comes from a reliability issue with TransactionStatusService writing to the blockstore. Two PRs touched this service recently:
#4654
#4032

@ksolana

ksolana commented Feb 8, 2025

Running this test under ThreadSanitizer (tsan) shows a data race:

SUMMARY: ThreadSanitizer: data race /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/librocksdb-sys-0.16.0+8.10.0/rocksdb/logging/env_logger.h:143:14 in rocksdb::EnvLogger::Logv(char const*, __va_list_tag*)

WARNING: ThreadSanitizer: data race (pid=2998552)
  Read of size 8 at 0x720800ea7980 by thread T2:
    #0 vsnprintf /rustc/llvm/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:1660:1 (solana_core-651f1956f27a8fa6+0x11b8de8) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #1 rocksdb::EnvLogger::Logv(char const*, __va_list_tag*) /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/librocksdb-sys-0.16.0+8.10.0/rocksdb/logging/env_logger.h:143:14 (solana_core-651f1956f27a8fa6+0x2ef1325) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #2 rocksdb::db::DBCommon$LT$T$C$rocksdb..db..DBWithThreadModeInner$GT$::open_cf_descriptors::h004561e575d92139 /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rocksdb-0.22.0/src/db.rs:584:9 (solana_core-651f1956f27a8fa6+0x28a30ad) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #3 solana_ledger::blockstore_db::Rocks::open::h243a02a76d8c139d /home/sol/src/agave.1/ledger/src/blockstore_db.rs:124:17 (solana_core-651f1956f27a8fa6+0x28a30ad)
    #4 solana_ledger::blockstore::Blockstore::do_open::h355a1d10b828993b /home/sol/src/agave.1/ledger/src/blockstore.rs:400:27 (solana_core-651f1956f27a8fa6+0x283b7d9) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #5 solana_ledger::blockstore::Blockstore::open::hc5bf03bafb83ef99 /home/sol/src/agave.1/ledger/src/blockstore.rs:384:9 (solana_core-651f1956f27a8fa6+0x283b059) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #6 solana_core::banking_stage::consumer::tests::test_write_persist_transaction_status::h81ec02b0f80a09bb /home/sol/src/agave.1/core/src/banking_stage/consumer.rs:1969:30 (solana_core-651f1956f27a8fa6+0x15664b9) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)


  Previous write of size 8 at 0x720800ea7980 by thread T5:
    #0 memcpy /rustc/llvm/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115:5 (solana_core-651f1956f27a8fa6+0x11a0d1e) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #1 std::char_traits<char>::copy(char*, char const*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/char_traits.h:437:33 (solana_core-651f1956f27a8fa6+0x309db26) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy(char*, char const*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:359:4 (solana_core-651f1956f27a8fa6+0x309db26)
    #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy_chars(char*, char const*, char const*) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:406:9 (solana_core-651f1956f27a8fa6+0x309db26)
    #4 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:225:6 (solana_core-651f1956f27a8fa6+0x309db26)
    #5 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char const*>(char const*, char const*, std::__false_type) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:255:11 (solana_core-651f1956f27a8fa6+0x309db26)
    #6 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:274:4 (solana_core-651f1956f27a8fa6+0x309db26)
    #7 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:490:2 (solana_core-651f1956f27a8fa6+0x309db26)
    #8 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::substr(unsigned long, unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:2855:16 (solana_core-651f1956f27a8fa6+0x309db26)

    #9 rocksdb::AddProperty(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/librocksdb-sys-0.16.0+8.10.0/build_version.cc:38:46 (solana_core-651f1956f27a8fa6+0x309db26)
    #10 rocksdb::db::DBCommon$LT$T$C$rocksdb..db..DBWithThreadModeInner$GT$::open_cf_descriptors::h004561e575d92139 /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rocksdb-0.22.0/src/db.rs:584:9 (solana_core-651f1956f27a8fa6+0x28a30ad) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #11 solana_ledger::blockstore_db::Rocks::open::h243a02a76d8c139d /home/sol/src/agave.1/ledger/src/blockstore_db.rs:124:17 (solana_core-651f1956f27a8fa6+0x28a30ad)
    #12 solana_ledger::blockstore::Blockstore::do_open::h355a1d10b828993b /home/sol/src/agave.1/ledger/src/blockstore.rs:400:27 (solana_core-651f1956f27a8fa6+0x283b7d9) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)


  Location is heap block of size 20 at 0x720800ea7980 allocated by thread T5:
    #0 malloc /rustc/llvm/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:666:5 (solana_core-651f1956f27a8fa6+0x11a38d0) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #1 operator new(unsigned long) <null> (libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #2 rocksdb::db::DBCommon$LT$T$C$rocksdb..db..DBWithThreadModeInner$GT$::open_cf_descriptors::h004561e575d92139 /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rocksdb-0.22.0/src/db.rs:584:9 (solana_core-651f1956f27a8fa6+0x28a30ad) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #3 solana_ledger::blockstore_db::Rocks::open::h243a02a76d8c139d /home/sol/src/agave.1/ledger/src/blockstore_db.rs:124:17 (solana_core-651f1956f27a8fa6+0x28a30ad)
    #4 solana_ledger::blockstore::Blockstore::do_open::h355a1d10b828993b /home/sol/src/agave.1/ledger/src/blockstore.rs:400:27 (solana_core-651f1956f27a8fa6+0x283b7d9) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #5 solana_ledger::blockstore::Blockstore::open::hc5bf03bafb83ef99 /home/sol/src/agave.1/ledger/src/blockstore.rs:384:9 (solana_core-651f1956f27a8fa6+0x283b059) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #6 solana_core::banking_stage::consumer::tests::test_write_persist_loaded_addresses::hd3bed0de503db776 /home/sol/src/agave.1/core/src/banking_stage/consumer.rs:2116:30 (solana_core-651f1956f27a8fa6+0x1569444) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #7 solana_core::banking_stage::consumer::tests::repeat_these_tests::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::he3fcb6540945bbd6 /home/sol/src/agave.1/core/src/banking_stage/consumer.rs:1908:25 (solana_core-651f1956f27a8fa6+0x174585f) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)


  Thread T2 (tid=2998555, running) created by thread T1 at:
    #0 pthread_create /rustc/llvm/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1023:3 (solana_core-651f1956f27a8fa6+0x11a53f1) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #1 std::sys::pal::unix::thread::Thread::new::h3d0b518cde20be63 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:84:19 (solana_core-651f1956f27a8fa6+0x43d86e1) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #2 std::thread::Builder::spawn_unchecked_::h05d3dede5f472eb6 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:580:30 (solana_core-651f1956f27a8fa6+0x198c159) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #3 std::thread::Builder::spawn_unchecked::h1b5a88258016c5a2 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:450:32 (solana_core-651f1956f27a8fa6+0x198c159)
    #4 std::thread::Builder::spawn::h63cb056caf5657ca /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:383:18 (solana_core-651f1956f27a8fa6+0x150a37a) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #5 std::thread::spawn::hecaf0c7830713405 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:710:20 (solana_core-651f1956f27a8fa6+0x150a37a)
    #6 solana_core::banking_stage::consumer::tests::repeat_these_tests::_$u7b$$u7b$closure$u7d$$u7d$::h030818b39a4475bd /home/sol/src/agave.1/core/src/banking_stage/consumer.rs:1904:21 (solana_core-651f1956f27a8fa6+0x150a37a)
    #7 core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$mut$u20$F$GT$::call_once::h5c594ca89080bf81 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:305:13 (solana_core-651f1956f27a8fa6+0x150a37a)

  Thread T5 (tid=2998558, running) created by thread T1 at:
    #0 pthread_create /rustc/llvm/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1023:3 (solana_core-651f1956f27a8fa6+0x11a53f1) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #1 std::sys::pal::unix::thread::Thread::new::h3d0b518cde20be63 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:84:19 (solana_core-651f1956f27a8fa6+0x43d86e1) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #2 std::thread::Builder::spawn_unchecked_::ha56ae6deee266ff4 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:580:30 (solana_core-651f1956f27a8fa6+0x19a8269) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #3 std::thread::Builder::spawn_unchecked::heda8238dd3c56ae2 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:450:32 (solana_core-651f1956f27a8fa6+0x19a8269)
    #4 std::thread::Builder::spawn::h9580dd1001d03e02 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:383:18 (solana_core-651f1956f27a8fa6+0x150a3a8) (BuildId: 0bb0b66e10897f69e582dd0f77a8f0df0879931c)
    #5 std::thread::spawn::h20e7b7e7fd5ebbfd /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:710:20 (solana_core-651f1956f27a8fa6+0x150a3a8)
    #6 solana_core::banking_stage::consumer::tests::repeat_these_tests::_$u7b$$u7b$closure$u7d$$u7d$::h030818b39a4475bd /home/sol/src/agave.1/core/src/banking_stage/consumer.rs:1907:21 (solana_core-651f1956f27a8fa6+0x150a3a8)
    #7 core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$mut$u20$F$GT$::call_once::h5c594ca89080bf81 /home/sol/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:305:13 (solana_core-651f1956f27a8fa6+0x150a3a8)
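
The trace points at a `repeat_these_tests` helper that spawns the two tests on separate threads so that their `Blockstore::open` calls overlap. That helper's source is not shown in this thread, so the following is only a minimal sketch of the idea, assuming a hypothetical `repeat_concurrently` function and stand-in test bodies in place of the real `test_write_persist_transaction_status` and `test_write_persist_loaded_addresses`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical harness: runs every job on its own thread, `rounds` times,
/// and joins all threads before starting the next round.
/// Returns how many job runs completed in total.
fn repeat_concurrently(rounds: usize, jobs: &[fn(&AtomicUsize)]) -> usize {
    let completed = Arc::new(AtomicUsize::new(0));
    for _ in 0..rounds {
        // Spawn all jobs first so they actually overlap in time.
        let handles: Vec<_> = jobs
            .iter()
            .map(|&job| {
                let completed = Arc::clone(&completed);
                thread::spawn(move || job(&completed))
            })
            .collect();
        // Join every thread; a panic in a job fails the whole run.
        for handle in handles {
            handle.join().expect("job panicked");
        }
    }
    completed.load(Ordering::SeqCst)
}

// Stand-ins for the real test bodies (assumption: the real ones open a Blockstore).
fn stand_in_test_a(completed: &AtomicUsize) {
    completed.fetch_add(1, Ordering::SeqCst);
}

fn stand_in_test_b(completed: &AtomicUsize) {
    completed.fetch_add(1, Ordering::SeqCst);
}
```

Calling `repeat_concurrently(5, &jobs)` with `jobs` holding the two stand-ins runs each of them five times, two threads per round. Built with ThreadSanitizer enabled, a harness of this shape gives the sanitizer repeated chances to observe overlapping accesses.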

@jstarry

jstarry commented Feb 8, 2025

We should first revert the change that caused this regression rather than keeping it in master while we work on a fix

@steviez

steviez commented Feb 8, 2025

> We should first revert the change that caused this regression rather than keeping it in master while we work on a fix

In hindsight, yeah, we probably should have. In this specific case, I believe I have root-caused it and have a fix, so it would be more work to revert and then push the change back in.

For the general case, yes, I agree that we should be more aggressive about backing out changes that make CI flaky.

@jstarry

jstarry commented Feb 8, 2025

Revert is here: #4875
We can race our PRs through CI :)

@steviez

steviez commented Feb 8, 2025

See you at the finish line #4873 😉

@alexpyattaev

Please do not use rayon::spawn without manually joining. Those threads are impossible to keep accounted for (e.g. to ensure they finish in a reasonable time), especially on a global rayon pool.
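
To illustrate what "keeping threads accounted for" can look like, here is a minimal std-only sketch. It is not the TransactionStatusService code, and `run_with_deadline` is a hypothetical helper name. It keeps the `JoinHandle`, names the thread, and bounds how long the caller waits for a result:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on a named thread and waits at most
/// `deadline` for its result, joining the thread when it finishes.
fn run_with_deadline<T, F>(name: &str, deadline: Duration, work: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .name(name.to_string())
        .spawn(move || {
            // The receiver may already be gone if the caller timed out.
            let _ = tx.send(work());
        })
        .map_err(|e| format!("failed to spawn {name}: {e}"))?;

    match rx.recv_timeout(deadline) {
        Ok(value) => {
            // The result has arrived, so joining returns promptly.
            handle.join().map_err(|_| format!("{name} panicked"))?;
            Ok(value)
        }
        // Note: on timeout the handle is dropped and the thread keeps running
        // detached; the caller at least learns that the deadline was missed.
        Err(mpsc::RecvTimeoutError::Timeout) => Err(format!("{name} exceeded {deadline:?}")),
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            // The sender was dropped without sending, so the worker panicked.
            let _ = handle.join();
            Err(format!("{name} panicked before producing a result"))
        }
    }
}
```

Within rayon itself, `rayon::scope` is one way to get a join point, since it does not return until every task spawned inside the scope has finished.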

@steviez

steviez commented Feb 10, 2025

#4873 would have addressed the issue, but we decided to rework TransactionStatusService (TSS) a bit more. #4875 reflects the backout, and this test should no longer be flaky.

@steviez steviez closed this as completed Feb 10, 2025