Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[version3] Deadlock on node closing #417

Open
fpelliccioni opened this issue May 28, 2018 · 51 comments
Open

[version3] Deadlock on node closing #417

fpelliccioni opened this issue May 28, 2018 · 51 comments
Assignees
Milestone

Comments

@fpelliccioni
Copy link
Contributor

(Posted here by Eric indication).

Hello.
I am experimenting a deadlock when I try to close the node (bn) Ctrl-C (SIGINT, SIGTERM).

To get the bn executable I follow:

wget https://raw.githubusercontent.com/libbitcoin/libbitcoin-node/version3/install.sh
chmod +x install.sh
sudo ./install.sh --prefix=/opt/libbitcoin --build-boost --disable-shared
#Boost modified to use 1.64 because 1.57 has compilation errors (multiprecision)

Using the following configuration file:

# cat /home/ubuntu/fer/deadlock/data/deadlock-btc.cfg 

[log]
archive_directory = /home/ubuntu/fer/deadlock/data/log/archive-node-btc-testnet
debug_file = /home/ubuntu/fer/deadlock/data/log/node-btc-testnet-debug.log
error_file = /home/ubuntu/fer/deadlock/data/log/node-btc-testnet-error.log
rotation_size = 100000000
minimum_free_space = 0
verbose = true

[network]
protocol_maximum = 70013
protocol_minimum = 31402
identifier = 118034699
outbound_connections = 50
inbound_connections = 50
inbound_port = 18333
channel_expiration_minutes = 30
hosts_file = /home/ubuntu/fer/deadlock/data/hosts-btc-testnet.cache
seed = testnet-seed.bitcoin.jonasschnelli.ch:18333
seed = seed.tbtc.petertodd.org:18333
seed = testnet-seed.bluematt.me:18333
seed = testnet-seed.bitcoin.schildbach.de:18333
seed = testnet-seed.voskuil.org:18333
self = 0.0.0.0:0

[database]
directory = /home/ubuntu/fer/deadlock/data/database/btc-testnet
flush_writes = false

[blockchain]

# cores = 4

# Testnet-btc
checkpoint = 000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943:0
checkpoint = 00000000009e2958c15ff9290d571bf9459e93b19765c6801ddeccadbb160a1e:100000
checkpoint = 0000000000287bffd321963ef05feab753ebe274e1d78b2fd4e2bfe9ad3aa6f2:200000
checkpoint = 000000000000226f7618566e70a2b5e020e29579b46743f05348427239bf41a1:300000
checkpoint = 000000000598cbbb1e79057b79eef828c495d4fc31050e6b179c57d07d00367c:400000
checkpoint = 000000000001a7c0aaa2630fbb2c0e476aafffc60f82177375b2aaa22209f606:500000
checkpoint = 000000000000624f06c69d3a9fe8d25e0a9030569128d63ad1b704bbb3059a16:600000
checkpoint = 000000000000406178b12a4dea3b27e13b3c4fe4510994fd667d7c1e6a3f4dc1:700000
checkpoint = 0000000000209b091d6519187be7c2ee205293f25f9f503f90027e25abf8b503:800000
checkpoint = 0000000000356f8d8924556e765b7a94aaebc6b5c8685dcfa2b1ee8b41acd89b:900000
checkpoint = 0000000000478e259a3eda2fafbeeb0106626f946347955e99278fe6cc848414:1000000
checkpoint = 00000000001aa0b431dc7f8fa75179b8440bdb671db5ca79e1087faff00c19d8:1050000
checkpoint = 00000000001c2fb9880485b1f3d7b0ffa9fabdfd0cf16e29b122bb6275c73db0:1100000

[fork]
easy_blocks = true
# retarget = true
bip16 = true
bip30 = true
bip34 = true
bip66 = true
bip65 = true
bip90 = true

[node]
relay_transactions=true

... and I have the following Python script to run the node multiple times:

# cat multiple_launchs.py 

from subprocess import call
from subprocess import check_output
from random import randint
from subprocess import Popen
from subprocess import CalledProcessError

import subprocess
import signal
import os
import time

def get_pid(name):
    try:
        return map(int, check_output(["pidof",name]).split())
    except CalledProcessError:
        return []

def process_is_running(name):
    return len(get_pid(name)) > 0

def launch():
    sout = open('sout-fer.log', 'w')
    serr = open('serr-fer.log', 'w')
    return (sout, serr, Popen(["./bn", "-c", "/home/ubuntu/fer/deadlock/data/deadlock-btc.cfg"], stdout=sout, stderr=serr))

def finalize(p):
    pid = p.pid
    os.kill(pid, signal.SIGINT)

def finalize_and_wait(p):
    finalize(p)
    return p.wait()


def finalize_wait_measure(p):
    finalize(p)
    start_time = time.time()
    p.wait()
    elapsed_time = time.time() - start_time
    return elapsed_time


def wait_for_termination(name):
    while process_is_running(name):
        time.sleep(10)

def main():

    stats = []

    for i in range(0, 100):
        sout, serr, p = launch()
        seconds = randint(10, 120)
        print("******************************* Sleeping for %s seconds..." % (seconds))
        time.sleep(seconds)
        print("******************************* SLEEP FINALIZED *******************************************")


        elapsed_time = finalize_wait_measure(p)
        sout.close()
        serr.close()
        print("******************************* Closing procedure elapsed time: %s seconds..." % (elapsed_time))

        stats.append([i, seconds, elapsed_time])

    print("******************************* STATS *******************************************")
    print(stats)

main()

Node chain initialization and script running:

./bn -i -c /home/ubuntu/fer/deadlock/data/deadlock-btc.cfg
python multiple_launchs.py

(the python script have to be in the same directory as bn executable)

Thanks and regards,
Fernando.

@evoskuil evoskuil added the bug label Jun 3, 2018
@evoskuil evoskuil self-assigned this Jun 3, 2018
@evoskuil evoskuil changed the title Deadlock on node closing [version3] Deadlock on node closing Jun 3, 2018
@evoskuil
Copy link
Member

evoskuil commented Jun 3, 2018

#Boost modified to use 1.64 because 1.57 has compilation errors (multiprecision)

I don't get this, our CI builds use 1.57.

@thecodefactory
Copy link
Member

thecodefactory commented Jun 3, 2018

Using a gcc version later than 5.x has boost warnings that shake out as errors (known boost bug corrected in later versions). A solution is to specify gcc-5 such as:

CC=gcc-5 CXX=g++-5 ./install.sh [args]

UPDATE: https://svn.boost.org/trac10/ticket/12181

@evoskuil
Copy link
Member

evoskuil commented Jun 3, 2018

Seems like a good reason to bump our boost minimum.

@evoskuil
Copy link
Member

evoskuil commented Jun 5, 2018

The root cause appears to be that priority_dispatch_.concurrent(...) must succeed whether or not stop has occurred, but the priority thread pool is suspended upon stop, precluding the dispatch of any new work. However this is guarded by blocking stop for the validation mutex.

@evoskuil
Copy link
Member

evoskuil commented Jun 5, 2018

The problem is that the guard implementation is flawed. The organize(...) methods (tx and block) must check stopped() before invocation of the validator but do not.

@evoskuil
Copy link
Member

evoskuil commented Jun 5, 2018

@fpelliccioni Thanks for troubleshooting and posting this! Please re-verify once this is merged.

@thecodefactory
Copy link
Member

On a related note, is there a max version of boost that is acceptable for upgrade? Specifically, what restrictions are there keeping us from the latest version (1.67.0 as of now) assuming proper build and smoke testing for now?

@evoskuil
Copy link
Member

evoskuil commented Jun 7, 2018

There’s no strong reason for any particular version. The rationale has been that a lower version reduces the need to install. But the reality of boost install variation means this isn’t so important. As long as there is a NuGet package and all builds are good I don’t see a problem with a higher version.

@thecodefactory
Copy link
Member

Boost has been upgraded in master #420

@evoskuil
Copy link
Member

Status?

@fpelliccioni
Copy link
Contributor Author

fpelliccioni commented Aug 16, 2018 via email

@evoskuil
Copy link
Member

ok, hvae fun!

@evoskuil
Copy link
Member

Status?

@fpelliccioni
Copy link
Contributor Author

fpelliccioni commented Sep 18, 2018 via email

@fpelliccioni
Copy link
Contributor Author

The deadlock is still happening.
I have the process (bn) blocked for more than 20 hours.
To test it, I executed the same steps described above, in an Ubuntu 16.04 with GCC 7.3.0.
In successful shutdowns, the closing time is approximately 130 seconds.

@garceri (Gerardo) told me that he also experiences these deadlocks every day when he works on libbitcoin community servers.

@evoskuil
Copy link
Member

Odd that it is apparently common for you but that @thecodefactory has not seen it.

@thecodefactory
Copy link
Member

@garceri (Gerardo) told me that he also experiences these deadlocks every day when he works on libbitcoin community servers.

What does this mean?

@fpelliccioni
Copy link
Contributor Author

@garceri (Gerardo) told me that he also experiences these deadlocks every day when he works on libbitcoin community servers.

What does this mean?

Gerardo is our devops, he regularly configures environments and he has reported that he has experienced the deadlock many times.

@fpelliccioni
Copy link
Contributor Author

Odd that it is apparently common for you but that @thecodefactory has not seen it.

It is easy to reproduce with the instructions I gave.
@thecodefactory could you provide how did you test it? Thanks!

@thecodefactory
Copy link
Member

Gerardo is our devops, he regularly configures environments and he has reported that he has experienced the deadlock many times.

Ok, thanks. To eliminate the possibility that it's script related (to ensure python I/O deadlocks are not introduced), you have verified that this is easily reproducible by manually starting/stopping a version3 server?

@fpelliccioni
Copy link
Contributor Author

Gerardo is our devops, he regularly configures environments and he has reported that he has experienced the deadlock many times.

Ok, thanks. To eliminate the possibility that it's script related (to ensure python I/O deadlocks are not introduced), you have verified that this is easily reproducible by manually starting/stopping a version3 server?

Gerardo has verified it manually. (I have verified it using the script).
The problem is that doing it manually is tedious, because it is probably that the first, second or third execution are correct, sometimes it occurs in execution number 50 (for example).
Check the script to see if it finds any flaw, for me it is correct.

@evoskuil
Copy link
Member

How is it that he is working so frequently with libbitcoin server vs. bitprim server?

@garceri
Copy link

garceri commented Sep 20, 2018

I have been doing some tests with libbitcoin recently and found that if you increase the number of connections (both incoming and outgoing) that seems to start causing deadlocks.

@evoskuil
Copy link
Member

I’ve run thousands of connections and have never seen it. If I can’t trap it in a debugger it’s unlikely to be found.

@thecodefactory
Copy link
Member

I have been doing some tests with libbitcoin recently and found that if you increase the number of connections (both incoming and outgoing) that seems to start causing deadlocks.

Which values are you using? I'll try the same.

@fpelliccioni
Copy link
Contributor Author

I’ve run thousands of connections and have never seen it. If I can’t trap it in a debugger it’s unlikely to be found.

You can use my script and then attach to the process.

@fpelliccioni
Copy link
Contributor Author

I’ve run thousands of connections and have never seen it. If I can’t trap it in a debugger it’s unlikely to be found.

You can use my script and then attach to the process.

I could do it, but not today...

@thecodefactory
Copy link
Member

I could do it, but not today...

The backtrace can help, but if run from the script, it's not guaranteed to be a deadlock in the server. Once attached, it's stopped and inspecting the state might not tell much.

I'm not saying it can't happen, but I'm trying to find a sane way to verify this. I've been mind-numbingly starting/stopping a latest v3 bn server manually over and over while looking at other things. No shutdown errors so far.

@fpelliccioni
Copy link
Contributor Author

I could do it, but not today...

The backtrace can help, but if run from the script, it's not guaranteed to be a deadlock in the server. Once attached, it's stopped and inspecting the state might not tell much.

I'm not saying it can't happen, but I'm trying to find a sane way to verify this. I've been mind-numbingly starting/stopping a latest v3 bn server manually over and over while looking at other things. No shutdown errors so far.

It is possible that running in Debug mode the error does not manifest.

@fpelliccioni
Copy link
Contributor Author

he backtrace can help, but if run from the script, it's not guaranteed to be a deadlock in the server. Once attached, it's stopped and inspecting the state might not t

Do you work only in Windows?

@thecodefactory
Copy link
Member

Do you work only in Windows?

I use GNU/Linux.

@evoskuil
Copy link
Member

It’s unlikely to differ in debug vs. release builds if a debugger is not attached. But I run both and have never seen it.

@thecodefactory
Copy link
Member

@fpelliccioni @garceri If you have access to GNU/Linux, see if you can reproduce it with this script (which is hacky, but more or less the same idea as yours, but eliminates the possibility of python pipe I/O deadlocks):

#!/bin/bash

bn_program=bin/bn
bn_config=bn.cfg.v3

while [ true ]; do
    $bn_program -c $bn_config &
    pid=$(echo $!)
    sleep 20
    kill $pid
    kill_time=$(date +"%s")
    while [ $(ps ux | grep $pid | wc -l) -eq 2 ]; do
        now_time=$(date +"%s")
        elapsed=$(($now_time - $kill_time))                           
        echo "Waiting for $pid to exit ($elapsed seconds elapsed) ..."
        sleep 5
    done
    sleep 2
done

I've had this running the past ~30 minutes or so.

@fpelliccioni
Copy link
Contributor Author

@fpelliccioni @garceri If you have access to GNU/Linux, see if you can reproduce it with this script (which is hacky, but more or less the same idea as yours, but eliminates the possibility of python pipe I/O deadlocks):

#!/bin/bash

bn_program=bin/bn
bn_config=bn.cfg.v3

while [ true ]; do
    $bn_program -c $bn_config &
    pid=$(echo $!)
    sleep 20
    kill $pid
    kill_time=$(date +"%s")
    while [ $(ps ux | grep $pid | wc -l) -eq 2 ]; do
        now_time=$(date +"%s")
        elapsed=$(($now_time - $kill_time))                           
        echo "Waiting for $pid to exit ($elapsed seconds elapsed) ..."
        sleep 5
    done
    sleep 2
done

I've had this running the past ~30 minutes or so.

I am running it.
I will let you know, it could take hours.

@fpelliccioni
Copy link
Contributor Author

Deadlock using the bash script.

...
Waiting for 28069 to exit (2987 seconds elapsed) ...                                                                                                                                                                                          
Waiting for 28069 to exit (2992 seconds elapsed) ... 
# ps -ax | grep bn
28069 pts/5    Sl+    0:00 /home/ubuntu/test_libbitcoin/build-libbitcoin-node/libbitcoin-node/console/bn -c /home/ubuntu/test_libbitcoin/data/deadlock-btc.cfg

@thecodefactory
Copy link
Member

thecodefactory commented Sep 21, 2018

Deadlock using the bash script.

Interesting, leave it running to see if it resolves, but that's definitely a longer time for exit than I've seen by far. I've been running mine since I said earlier (~6+ hours ago), still without any issue.

If it runs longer and you're able to attach to it, the backtraces could help.

@fpelliccioni
Copy link
Contributor Author

Waiting for 28069 to exit (46416 seconds elapsed) ...
Waiting for 28069 to exit (46421 seconds elapsed) ...
Waiting for 28069 to exit (46426 seconds elapsed) ...

Can we say that we are a deadlock?

@evoskuil
Copy link
Member

It appears we are reproducing what we have seen anecdotally. It happens on bitprim builds/machines but not others.

@fpelliccioni
Copy link
Contributor Author

It appears we are reproducing what we have seen anecdotally. It happens on bitprim builds/machines but not others.

Ok, but is a normal machine with Ubuntu Linux, and the build is done using https://raw.githubusercontent.com/libbitcoin/libbitcoin-node/version3/install.sh

I will talk with @jujumax and @garceri to bring you access to one of those machines.

@thecodefactory
Copy link
Member

Can we say that we are a deadlock?

Send a kill signal to it and wait a bit more if it's still running, to be sure the script isn't causing some error.

@thecodefactory
Copy link
Member

thecodefactory commented Sep 21, 2018

The fact that it's ubuntu does concern me. There was a reported bug (on slack) that causes hangs if you don't build boost statically, specifically on ubuntu. I was able to reproduce it on a VM, but haven't followed up on that yet. I also haven't re-tested since we've upgraded boost versions.

Can you confirm that you've built your version statically?

EDIT: And yes, the hangs are on shutdown (reported as when ctrl-c was pressed).

@fpelliccioni
Copy link
Contributor Author

The fact that it's ubuntu does concern me. There was a reported bug (on slack) that causes hangs if you don't build boost statically, specifically on ubuntu. I was able to reproduce it on a VM, but haven't followed up on that yet. I also haven't re-tested since we've upgraded boost versions.

Can you confirm that you've built your version statically?

EDIT: And yes, the hangs are on shutdown (reported as when ctrl-c was pressed).

I ran the install.sh script in the way described above, so according to it, I am building Boost statically.

--------------------------------------------------------------------
OS                    : Linux
PARALLEL              : 48
CC                    : 
CXX                   : 
CPPFLAGS              : 
CFLAGS                : 
CXXFLAGS              : 
LDFLAGS               : 
LDLIBS                : 
BUILD_BOOST           : yes
BUILD_DIR             : build-libbitcoin-node
PREFIX                : /opt/libbitcoin
DISABLE_SHARED        : yes
DISABLE_STATIC        : 
with_boost            : --with-boost=/opt/libbitcoin
with_pkgconfigdir     : --with-pkgconfigdir=/opt/libbitcoin/lib/pkgconfig
--------------------------------------------------------------------

I will try using DISABLE_STATIC and I will let you know.

@evoskuil
Copy link
Member

@thecodefactory do you have a repro?

@thecodefactory
Copy link
Member

@thecodefactory do you have a repro?

No, I don't (at least not yet, using the script). I did stop running it after a couple days, but could try again a little bit more. It might be something like environment related, so it might be better to try it on my ubuntu VM to see if it's more likely there.

@thecodefactory
Copy link
Member

No luck reproducing so far, running all day on my Ubuntu 16.04.4 LTS VM.

$ uname -a
Linux ubuntu 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It's possible something is manifesting from gcc version differences?

@thecodefactory
Copy link
Member

Several days into running the script on the Ubuntu VM, I got a hang on shutdown. @fpelliccioni Were you ever able to attach so that we can compare notes here?

(gdb) where
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000046de13 in boost::condition_variable::wait (this=0x1256890, m=...)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  0x00000000004e5d69 in boost::shared_mutex::lock (this=0x1256808)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:294
#3  0x000000000063f21d in libbitcoin::prioritized_mutex::lock_high_priority (this=0x12568bc, 
    this@entry=0x1256800) at src/utility/prioritized_mutex.cpp:55
#4  0x00000000004b3180 in libbitcoin::blockchain::block_chain::stop (this=this@entry=0x1255fa0)
    at src/interface/block_chain.cpp:353
#5  0x0000000000450d49 in libbitcoin::node::full_node::stop (this=this@entry=0x1255280)
    at src/full_node.cpp:254
#6  0x0000000000451266 in libbitcoin::node::full_node::close (this=0x1255280) at src/full_node.cpp:271
#7  0x000000000044123b in libbitcoin::node::executor::run (this=this@entry=0x7fffe3eb5c50)
    at console/executor.cpp:213
#8  0x0000000000441588 in libbitcoin::node::executor::menu (this=this@entry=0x7fffe3eb5c50)
    at console/executor.cpp:179
#9  0x000000000040f602 in libbitcoin::main (argc=3, argv=0x7fffe3eb5fd8) at console/main.cpp:45
#10 0x00007fb82feec830 in __libc_start_main (main=0x40f690 <main(int, char**)>, argc=3, 
    argv=0x7fffe3eb5fd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffe3eb5fc8) at ../csu/libc-start.c:291
#11 0x000000000043d969 in _start ()

THREAD 2

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000046de13 in boost::condition_variable::wait (this=0x1259b48, m=...)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  0x000000000048179d in boost::shared_mutex::unlock_upgrade_and_lock (this=this@entry=0x1259a68)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:503
#3  0x00000000004ff275 in libbitcoin::database::memory_map::reserve (this=0x1259a08, size=4796908, 
    expansion=<optimized out>) at src/memory/memory_map.cpp:360
#4  0x00000000004ff521 in libbitcoin::database::memory_map::reserve (this=<optimized out>, 
    size=<optimized out>) at src/memory/memory_map.cpp:332
#5  0x0000000000500a8b in libbitcoin::database::record_manager::new_records (
    this=this@entry=0x1259ba0, count=count@entry=1) at src/primitives/record_manager.cpp:133
#6  0x00000000004eaf1e in libbitcoin::database::block_database::write_position (
    this=this@entry=0x1259350, position=position@entry=290581446, height=599612)
    at src/databases/block_database.cpp:270
#7  0x00000000004ebbe2 in libbitcoin::database::block_database::store (this=0x1259350, block=..., 
    height=height@entry=599612) at src/databases/block_database.cpp:215
#8  0x00000000004e04e4 in libbitcoin::database::data_base::handle_push_transactions(std::error_code const&, std::shared_ptr<libbitcoin::message::block const>, unsigned long, std::function<void (std::error_code const&)>) (this=0x12563b8, ec=..., block=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0, 
    height=599612, handler=...) at src/data_base.cpp:738
... more omitted ...

THREAD 3

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000006480a9 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (ec=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  boost::asio::io_service::run (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  libbitcoin::threadpool::<lambda()>::operator() (__closure=<optimized out>, 
    __closure=<optimized out>) at src/utility/threadpool.cpp:89
#6  boost::detail::thread_data<libbitcoin::threadpool::spawn_once(libbitcoin::thread_priority)::<lambda()> >::run(void) (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/detail/thread.hpp:116
#7  0x0000000000700095 in thread_proxy ()
#8  0x00007fb830d466ba in start_thread (arg=0x7fb82f6ca700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


THREAD 4

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000006480a9 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (ec=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  boost::asio::io_service::run (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  libbitcoin::threadpool::<lambda()>::operator() (__closure=<optimized out>, 
    __closure=<optimized out>) at src/utility/threadpool.cpp:89
#6  boost::detail::thread_data<libbitcoin::threadpool::spawn_once(libbitcoin::thread_priority)::<lambda()> >::run(void) (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/detail/thread.hpp:116
#7  0x0000000000700095 in thread_proxy ()
#8  0x00007fb830d466ba in start_thread (arg=0x7fb82f6ca700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

THREAD 5

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000006480a9 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (ec=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  boost::asio::io_service::run (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  libbitcoin::threadpool::<lambda()>::operator() (__closure=<optimized out>, 
    __closure=<optimized out>) at src/utility/threadpool.cpp:89
#6  boost::detail::thread_data<libbitcoin::threadpool::spawn_once(libbitcoin::thread_priority)::<lambda()> >::run(void) (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/detail/thread.hpp:116
#7  0x0000000000700095 in thread_proxy ()
#8  0x00007fb830d466ba in start_thread (arg=0x7fb82e6c8700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

THREAD 6

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000047a533 in boost::condition_variable::wait (m=..., this=0x1259d28)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  boost::shared_mutex::lock_shared (this=0x1259cf8)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:191
#3  boost::shared_lock<boost::shared_mutex>::lock (this=0x7fb72e0695e0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:645
#4  0x00000000004ea4a5 in boost::shared_lock<boost::shared_mutex>::shared_lock (m_=..., 
    this=0x7fb72e0695e0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:520
#5  libbitcoin::database::block_database::read_position (this=this@entry=0x1259350, 
    height=height@entry=372306) at src/databases/block_database.cpp:289
#6  0x00000000004ea632 in libbitcoin::database::block_database::get (this=0x1259350, 
    height=height@entry=372306) at src/databases/block_database.cpp:150
#7  0x00000000004b80a3 in libbitcoin::blockchain::block_chain::fetch_locator_block_headers(std::shared_ptr<libbitcoin::message::get_headers const>, std::array<unsigned char, 32ul> const&, unsigned long, std::function<void (std::error_code const&, std::shared_ptr<libbitcoin::message::headers>)>) const (
    this=0x1255fa0, locator=..., threshold=..., limit=<optimized out>, handler=...)
    at src/interface/block_chain.cpp:835
#8  0x000000000049599d in libbitcoin::node::protocol_block_out::handle_receive_get_headers (
    this=0x7fb7280085b0, ec=..., message=std::shared_ptr (count 4, weak 0) 0x7fb7101c9ae0)
    at src/protocols/protocol_block_out.cpp:159
... more omitted ...


THREAD 7

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fb83086ba16 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00000000004bba00 in std::__atomic_futex_unsigned<2147483648u>::_M_load_and_test_until (__ns=..., 
    __s=..., __has_timeout=<optimized out>, __mo=<optimized out>, __equal=<optimized out>, 
    __operand=<optimized out>, __assumed=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/atomic_futex.h:104
#3  std::__atomic_futex_unsigned<2147483648u>::_M_load_and_test (__mo=<optimized out>, 
    __equal=<optimized out>, __operand=<optimized out>, __assumed=<optimized out>, 
    this=<optimized out>) at /usr/include/c++/5/bits/atomic_futex.h:122
#4  std::__atomic_futex_unsigned<2147483648u>::_M_load_when_equal (__mo=std::memory_order_acquire, 
    __val=1, this=0x7fb7100009a0) at /usr/include/c++/5/bits/atomic_futex.h:162
#5  std::__future_base::_State_baseV2::wait (this=0x7fb710000990) at /usr/include/c++/5/future:322
#6  std::__basic_future<std::error_code>::_M_get_result (this=<optimized out>)
    at /usr/include/c++/5/future:681
#7  std::future<std::error_code>::get (this=<optimized out>) at /usr/include/c++/5/future:760
#8  libbitcoin::blockchain::block_organizer::organize(std::shared_ptr<libbitcoin::message::block const>, std::function<void (std::error_code const&)>) (this=this@entry=0x1256f00, 
    block=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0, handler=...)
    at src/pools/block_organizer.cpp:123
#9  0x00000000004b45f2 in libbitcoin::blockchain::block_chain::organize(std::shared_ptr<libbitcoin::message::block const>, std::function<void (std::error_code const&)>) (this=0x1255fa0, 
    block=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0, handler=...)
    at src/interface/block_chain.cpp:1031
#10 0x0000000000488b4f in libbitcoin::node::protocol_block_in::handle_receive_block (
    this=0x7fb728007600, ec=..., message=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0)
    at src/protocols/protocol_block_in.cpp:343
... more omitted ...

THREAD 8

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000047aba3 in boost::condition_variable::wait (m=..., this=0x7fb7101b9628)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  boost::shared_mutex::lock (this=0x7fb7101b95a0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:294
#3  boost::unique_lock<boost::shared_mutex>::lock (this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:346
#4  0x000000000059fa05 in boost::unique_lock<boost::shared_mutex>::unique_lock (m_=..., 
    this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:124
#5  libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::do_invoke (this=0x7fb7101b9550, args#0=..., args#1=std::shared_ptr (empty) 0x0)
    at /scrubbed/path/libbitcoin/version3/install/include/bitcoin/bitcoin/impl/utility/resubscriber.ipp:133
#6  0x00000000005956ed in std::_Mem_fn_base<void (libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::*)(std::error_code, std::shared_ptr<libbitcoin::message::block const>), true>::_M_call<std::shared_ptr<libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> > >&, std::error_code&, std::shared_ptr<libbitcoin::message::block const>&>
    (__ptr=std::shared_ptr (count 2, weak 1) 0x7fb7101b9550, this=0x7fb72d067ce0)
    at /usr/include/c++/5/functional:634
... more omitted ...

THREAD 9

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000047aba3 in boost::condition_variable::wait (m=..., this=0x7fb7101b9628)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  boost::shared_mutex::lock (this=0x7fb7101b95a0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:294
#3  boost::unique_lock<boost::shared_mutex>::lock (this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:346
#4  0x000000000059fa05 in boost::unique_lock<boost::shared_mutex>::unique_lock (m_=..., 
    this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:124
#5  libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::do_invoke (this=0x7fb7101b9550, args#0=..., args#1=std::shared_ptr (empty) 0x0)
    at /scrubbed/path/libbitcoin/version3/install/include/bitcoin/bitcoin/impl/utility/resubscriber.ipp:133
#6  0x00000000005956ed in std::_Mem_fn_base<void (libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::*)(std::error_code, std::shared_ptr<libbitcoin::message::block const>), true>::_M_call<std::shared_ptr<libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> > >&, std::error_code&, std::shared_ptr<libbitcoin::message::block const>&>
    (__ptr=std::shared_ptr (count 2, weak 1) 0x7fb7101b9550, this=0x7fb72d067ce0)
    at /usr/include/c++/5/functional:634


THREAD 10

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000586338 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=0x7fb7280012c8)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=0x7fb728001270)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (this=0x7fb728001270, ec=...)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  0x0000000000587519 in boost::asio::io_service::run (this=0x7fb728002070)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  boost::asio::detail::resolver_service_base::work_io_service_runner::operator() (
    this=0x7fb7280027c8)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/resolver_service_base.ipp:32
#6  boost::asio::detail::posix_thread::func<boost::asio::detail::resolver_service_base::work_io_service_runner>::run (this=0x7fb7280027c0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_thread.hpp:82
#7  boost::asio::detail::boost_asio_detail_posix_thread_function (arg=0x7fb7280027c0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/posix_thread.ipp:64
#8  0x00007fb830d466ba in start_thread (arg=0x7fb727fff700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@evoskuil
Copy link
Member

evoskuil commented Jan 1, 2019

I still have not ever seen this. Note also: libbitcoin/libbitcoin-server#360

@evoskuil evoskuil added info required and removed bug labels Jan 1, 2019
@evoskuil
Copy link
Member

evoskuil commented Jan 2, 2019

@thecodefactory How many peers were configured for the above trace?

@thecodefactory
Copy link
Member

@thecodefactory How many peers were configured for the above trace?

Good question, it's been a while so I don't recall. I had archived the VM it was running on, but will try to restore it to find out.

@thecodefactory
Copy link
Member

@thecodefactory How many peers were configured for the above trace?

Ok, it was a v3 run using the above shell script with a testnet configuration. 0 inbound, 2 outbound, 1 manual peer. I'll update and rebuild the v3 code and see if I can get this test re-started with latest.

@evoskuil
Copy link
Member

evoskuil commented Jan 2, 2019

Great, 3 was the right answer.

@evoskuil evoskuil added this to the 4.0 milestone Jan 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants