pdsh spins after all children are done #85

Open
brianjmurrell opened this issue Dec 13, 2016 · 13 comments

Comments

@brianjmurrell

I'm frequently seeing this with pdsh 2.31:

29248 pts/9    Sl+    0:00  |   |   \_ pdsh -R ssh -l root -S -w <list of hosts including globs> script -c command /tmp/command.out121301
29276 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>
29280 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>
29285 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>
29286 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>
29302 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>
29312 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>
...
29249 pts/9    S+     0:00  |   |   \_ /usr/bin/perl -w /usr/bin/dshbak -c
29250 pts/9    S+     0:00  |   |   \_ less -S

with pdsh's threads in the following state at that time:

Thread 35 (Thread 0x7fa86bc52700 (LWP 29870)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bc51f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa810000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bc51f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6438118) at dsh.c:689
        a = 0x556ec6438118
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 13, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bc52700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bc52700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339315968, 845442474694923737, 140736415242767, 4096, 140361339315968, 140361339316672, -798462072651786791, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 34 (Thread 0x7fa86a16d700 (LWP 29738)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a16cf00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa80c000b08
#3  xpoll (xfds=xfds@entry=0x7fa86a16cf00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6438020) at dsh.c:689
        a = 0x556ec6438020
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 6, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a16d700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a16d700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361311115008, 845442474694923737, 140736415242767, 4096, 140361311115008, 140361311115712, -798463840567700007, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 33 (Thread 0x7fa86bc31700 (LWP 29603)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bc30f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa860000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bc30f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437f28) at dsh.c:689
        a = 0x556ec6437f28
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 12, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bc31700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bc31700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339180800, 845442474694923737, 140736415242767, 4096, 140361339180800, 140361339181504, -798462090368526887, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 32 (Thread 0x7fa86bbad700 (LWP 29593)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bbacf00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa840000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bbacf00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437e30) at dsh.c:689
        a = 0x556ec6437e30
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 11, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bbad700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bbad700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361338640128, 845442474694923737, 140736415242767, 4096, 140361338640128, 140361338640832, -798463119013194279, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 31 (Thread 0x7fa86bd18700 (LWP 29546)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bd17f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa8640012c8
#3  xpoll (xfds=xfds@entry=0x7fa86bd17f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437d38) at dsh.c:689
        a = 0x556ec6437d38
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 15, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bd18700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bd18700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361340126976, 845442474694923737, 140736415242767, 4096, 140361340126976, 140361340127680, -798462245524220455, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 30 (Thread 0x7fa86a14c700 (LWP 29521)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a14bf00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa83001ea38
#3  xpoll (xfds=xfds@entry=0x7fa86a14bf00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437c40) at dsh.c:689
        a = 0x556ec6437c40
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 31, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a14c700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a14c700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310979840, 845442474694923737, 140736415242767, 4096, 140361310979840, 140361310980544, -798463853989472807, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 29 (Thread 0x7fa86bbce700 (LWP 29516)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bbcdf00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa83c000938
#3  xpoll (xfds=xfds@entry=0x7fa86bbcdf00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437b48) at dsh.c:689
        a = 0x556ec6437b48
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 30, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bbce700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bbce700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361338775296, 845442474694923737, 140736415242767, 4096, 140361338775296, 140361338776000, -798463101296454183, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 28 (Thread 0x7fa86a0a7700 (LWP 29500)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a0a6f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa7e4002618
#3  xpoll (xfds=xfds@entry=0x7fa86a0a6f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437a50) at dsh.c:689
        a = 0x556ec6437a50
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 29, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a0a7700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a0a7700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310304000, 845442474694923737, 140736415242767, 4096, 140361310304000, 140361310304704, -798463800839252519, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 27 (Thread 0x7fa86a1d0700 (LWP 29490)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a1cff00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa81c0014e8
#3  xpoll (xfds=xfds@entry=0x7fa86a1cff00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437958) at dsh.c:689
        a = 0x556ec6437958
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 26, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a1d0700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a1d0700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361311520512, 845442474694923737, 140736415242767, 4096, 140361311520512, 140361311521216, -798463924856433191, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 26 (Thread 0x7fa86bc10700 (LWP 29479)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bc0ff00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa848000d78
#3  xpoll (xfds=xfds@entry=0x7fa86bc0ff00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437860) at dsh.c:689
        a = 0x556ec6437860
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 7, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bc10700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bc10700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339045632, 845442474694923737, 140736415242767, 4096, 140361339045632, 140361339046336, -798462103790299687, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 25 (Thread 0x7fa86bc94700 (LWP 29457)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bc93f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa84c002638
#3  xpoll (xfds=xfds@entry=0x7fa86bc93f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437768) at dsh.c:689
        a = 0x556ec6437768
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 5, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bc94700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bc94700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339586304, 845442474694923737, 140736415242767, 4096, 140361339586304, 140361339587008, -798462174657260071, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 24 (Thread 0x7fa86a1af700 (LWP 29438)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a1aef00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa814002878
#3  xpoll (xfds=xfds@entry=0x7fa86a1aef00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437670) at dsh.c:689
        a = 0x556ec6437670
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 64, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a1af700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a1af700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361311385344, 845442474694923737, 140736415242767, 4096, 140361311385344, 140361311386048, -798463942573173287, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 23 (Thread 0x7fa86a086700 (LWP 29436)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a085f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa7f0001a58
#3  xpoll (xfds=xfds@entry=0x7fa86a085f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437578) at dsh.c:689
        a = 0x556ec6437578
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 65, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a086700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a086700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310168832, 845442474694923737, 140736415242767, 4096, 140361310168832, 140361310169536, -798463818555992615, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 22 (Thread 0x7fa86bbef700 (LWP 29433)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bbeef00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa838000d78
#3  xpoll (xfds=xfds@entry=0x7fa86bbeef00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437480) at dsh.c:689
        a = 0x556ec6437480
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 58, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bbef700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bbef700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361338910464, 845442474694923737, 140736415242767, 4096, 140361338910464, 140361338911168, -798463083579714087, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 21 (Thread 0x7fa86a0e9700 (LWP 29431)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a0e8f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa7fc001418
#3  xpoll (xfds=xfds@entry=0x7fa86a0e8f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437388) at dsh.c:689
        a = 0x556ec6437388
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 59, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a0e9700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a0e9700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310574336, 845442474694923737, 140736415242767, 4096, 140361310574336, 140361310575040, -798463769700739623, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 20 (Thread 0x7fa86a1f1700 (LWP 29412)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a1f0f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa818002668
#3  xpoll (xfds=xfds@entry=0x7fa86a1f0f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437290) at dsh.c:689
        a = 0x556ec6437290
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 62, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a1f1700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a1f1700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361311655680, 845442474694923737, 140736415242767, 4096, 140361311655680, 140361311656384, -798463911434660391, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 19 (Thread 0x7fa86a044700 (LWP 29410)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a043f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa7f8002878
#3  xpoll (xfds=xfds@entry=0x7fa86a043f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6437198) at dsh.c:689
        a = 0x556ec6437198
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 60, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a044700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a044700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361309898496, 845442474694923737, 140736415242767, 4096, 140361309898496, 140361309899200, -798463712255552039, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 18 (Thread 0x7fa86a10a700 (LWP 29397)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a109f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa800002878
#3  xpoll (xfds=xfds@entry=0x7fa86a109f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64370a0) at dsh.c:689
        a = 0x556ec64370a0
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 55, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a10a700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a10a700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310709504, 845442474694923737, 140736415242767, 4096, 140361310709504, 140361310710208, -798463889422952999, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 17 (Thread 0x7fa86a065700 (LWP 29395)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a064f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa808002878
#3  xpoll (xfds=xfds@entry=0x7fa86a064f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6436fa8) at dsh.c:689
        a = 0x556ec6436fa8
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 53, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a065700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a065700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310033664, 845442474694923737, 140736415242767, 4096, 140361310033664, 140361310034368, -798463698833779239, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 16 (Thread 0x7fa86a023700 (LWP 29392)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a022f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa7ec000ab8
#3  xpoll (xfds=xfds@entry=0x7fa86a022f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6436eb0) at dsh.c:689
        a = 0x556ec6436eb0
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 54, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a023700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a023700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361309763328, 845442474694923737, 140736415242767, 4096, 140361309763328, 140361309764032, -798463729972292135, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 15 (Thread 0x7fa86a12b700 (LWP 29387)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a12af00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa804001878
#3  xpoll (xfds=xfds@entry=0x7fa86a12af00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6436db8) at dsh.c:689
        a = 0x556ec6436db8
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 50, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a12b700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a12b700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310844672, 845442474694923737, 140736415242767, 4096, 140361310844672, 140361310845376, -798463871706212903, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 14 (Thread 0x7fa86a0c8700 (LWP 29383)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a0c7f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa844002878
#3  xpoll (xfds=xfds@entry=0x7fa86a0c7f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6436cc0) at dsh.c:689
        a = 0x556ec6436cc0
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 42, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a0c8700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a0c8700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361310439168, 845442474694923737, 140736415242767, 4096, 140361310439168, 140361310439872, -798463783122512423, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 13 (Thread 0x7fa86bcb5700 (LWP 29347)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bcb4f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa858000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bcb4f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64369d8) at dsh.c:689
        a = 0x556ec64369d8
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 16, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bcb5700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bcb5700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339721472, 845442474694923737, 140736415242767, 4096, 140361339721472, 140361339722176, -798462161235487271, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 12 (Thread 0x7fa86bcf7700 (LWP 29345)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bcf6f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa854000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bcf6f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64368e0) at dsh.c:689
        a = 0x556ec64368e0
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 8, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bcf7700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bcf7700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339991808, 845442474694923737, 140736415242767, 4096, 140361339991808, 140361339992512, -798462125802007079, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 11 (Thread 0x7fa86bcd6700 (LWP 29343)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bcd5f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa85c000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bcd5f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64367e8) at dsh.c:689
        a = 0x556ec64367e8
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 19, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bcd6700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bcd6700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339856640, 845442474694923737, 140736415242767, 4096, 140361339856640, 140361339857344, -798462143518747175, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 10 (Thread 0x7fa86bc73700 (LWP 29341)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86bc72f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa850000b08
#3  xpoll (xfds=xfds@entry=0x7fa86bc72f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64366f0) at dsh.c:689
        a = 0x556ec64366f0
        rv = <optimized out>
        result = 3
        xpfds = {{fd = -1, events = 1, revents = 0}, {fd = 23, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bc73700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bc73700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361339451136, 845442474694923737, 140736415242767, 4096, 140361339451136, 140361339451840, -798462054935046695, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 9 (Thread 0x7fa86a212700 (LWP 29287)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a211f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa824000aa8
#3  xpoll (xfds=xfds@entry=0x7fa86a211f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6435580) at dsh.c:689
        a = 0x556ec6435580
        rv = <optimized out>
        result = 3
        xpfds = {{fd = 18, events = 1, revents = 0}, {fd = 38, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a212700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a212700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361311790848, 845442474694923737, 140736415242767, 4096, 140361311790848, 140361311791552, -798464031156873767, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 8 (Thread 0x7fa86a233700 (LWP 29281)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a232f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa7f4000aa8
#3  xpoll (xfds=xfds@entry=0x7fa86a232f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6435488) at dsh.c:689
        a = 0x556ec6435488
        rv = <optimized out>
        result = 3
        xpfds = {{fd = 61, events = 1, revents = 0}, {fd = 63, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a233700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a233700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361311926016, 845442474694923737, 140736415242767, 4096, 140361311926016, 140361311926720, -798464013440133671, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 7 (Thread 0x7fa86a254700 (LWP 29275)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a253f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa820000aa8
#3  xpoll (xfds=xfds@entry=0x7fa86a253f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6435390) at dsh.c:689
        a = 0x556ec6435390
        rv = <optimized out>
        result = 3
        xpfds = {{fd = 10, events = 1, revents = 0}, {fd = 20, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a254700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a254700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361312061184, 845442474694923737, 140736415242767, 4096, 140361312061184, 140361312061888, -798463995723393575, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 6 (Thread 0x7fa86a275700 (LWP 29273)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a274f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa82c000aa8
#3  xpoll (xfds=xfds@entry=0x7fa86a274f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec6435298) at dsh.c:689
        a = 0x556ec6435298
        rv = <optimized out>
        result = 3
        xpfds = {{fd = 9, events = 1, revents = 0}, {fd = 39, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a275700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a275700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361312196352, 845442474694923737, 140736415242767, 4096, 140361312196352, 140361312197056, -798463982301620775, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 5 (Thread 0x7fa86a296700 (LWP 29272)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a295f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa828000aa8
#3  xpoll (xfds=xfds@entry=0x7fa86a295f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64351a0) at dsh.c:689
        a = 0x556ec64351a0
        rv = <optimized out>
        result = 3
        xpfds = {{fd = 14, events = 1, revents = 0}, {fd = 35, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a296700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a296700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361312331520, 845442474694923737, 140736415242767, 4096, 140361312331520, 140361312332224, -798464102023834151, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 4 (Thread 0x7fa86a2b7700 (LWP 29271)):
#0  0x00007fa86abca56d in poll () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x0000556ec573da45 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
No locals.
#2  _poll (timeout=-1, nfds=<optimized out>, xfds=0x7fa86a2b6f00) at xpoll.c:69
        i = <optimized out>
        rv = <optimized out>
        pfds = 0x7fa834000aa8
#3  xpoll (xfds=xfds@entry=0x7fa86a2b6f00, nfds=nfds@entry=2, timeout=timeout@entry=-1) at xpoll.c:199
        i = <optimized out>
#4  0x0000556ec5725bff in _rsh_thread (args=0x556ec64350a8) at dsh.c:689
        a = 0x556ec64350a8
        rv = <optimized out>
        result = 3
        xpfds = {{fd = 36, events = 1, revents = 0}, {fd = 17, events = 1, revents = 0}}
        nfds = 2
#5  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86a2b7700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86a2b7700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361312466688, 845442474694923737, 140736415242767, 4096, 140361312466688, 140361312467392, -798464084307094055, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#6  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 3 (Thread 0x7fa86bd39700 (LWP 29252)):
#0  do_sigwait (sig=0x7fa86bd38e8c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
        ret = <optimized out>
        tmpset = {__val = {0 <repeats 16 times>}}
#1  __sigwait (set=set@entry=0x7fa86bd38e90, sig=sig@entry=0x7fa86bd38e8c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
        oldtype = 0
        sig = 0x7fa86bd38e8c
        set = 0x7fa86bd38e90
#2  0x0000556ec57255a8 in _signals_thread (arg=<optimized out>) at dsh.c:963
        set = {__val = {524290, 0 <repeats 15 times>}}
        last_intr = 0
        signo = 0
        e = <optimized out>
#3  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bd39700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bd39700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361340262144, 845442474694923737, 140736415242767, 4096, 140361340262144, 140361340262848, -798462232102447655, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#4  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 2 (Thread 0x7fa86bd5a700 (LWP 29251)):
#0  0x00007fa86ab9b9fd in nanosleep () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x00007fa86ab9b94a in __sleep (seconds=0, seconds@entry=2) at ../sysdeps/posix/sleep.c:55
        save_errno = 0
        ts = {tv_sec = 0, tv_nsec = 982699711}
#2  0x0000556ec5724f22 in _wdog (args=<optimized out>) at dsh.c:323
        i = <optimized out>
#3  0x00007fa86ae9c5ca in start_thread (arg=0x7fa86bd5a700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fa86bd5a700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140361340397312, 845442474694923737, 140736415242767, 4096, 140361340397312, 140361340398016, -798462214385707559, -798464656809273895}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#4  0x00007fa86abd60ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 1 (Thread 0x7fa86bd5c700 (LWP 29248)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000556ec5726809 in dsh (opt=0x7fffc0099b90) at dsh.c:1143
        i = 63
        rc = 0
        rv = <optimized out>
        rshcount = 69
        thread_wdog = 140361340397312
        thread_sig = 140361340262144
        attr_wdog = {__size = "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\020", '\000' <repeats 16 times>, "\002", '\000' <repeats 20 times>, __align = 0}
        attr_sig = {__size = "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\020", '\000' <repeats 16 times>, "\002", '\000' <repeats 20 times>, __align = 0}
        pcp_infiles = 0x0
        itr = 0x556ec6433fe0
        domain = 0x0
        domain_in_label = <optimized out>
        __PRETTY_FUNCTION__ = "dsh"
#2  0x0000556ec57243d2 in main (argc=9, argv=<optimized out>) at main.c:131
        opt = {progname = 0x7fffc009b2a0 "pdsh", debug = false, info_only = false, test_range_expansion = false, sdr_verify = false, sdr_global = (unknown: 1809324616), altnames = false, sigint_terminates = false, wcoll = 0x556ec6434410, luser = 0x556ec6431058 "brian", luid = 1001, ruser = 0x556ec6431178 "root", fanout = 32, connect_timeout = 10, command_timeout = 0, rcmd_name = 0x556ec6431eb8 "ssh", misc_modules = 0x0, resolve_hosts = true, kill_on_fail = false, separate_stderr = true, stdin_unavailable = false, cmd = 0x556ec6433e28 "script -c command /tmp/command.out121301;echo XXRETCODE:$?", dshpath = 0x0, getstat = 0x556ec573e8f8 ";echo XXRETCODE:$?", ret_remote_rc = true, labels = true, preserve = false, recursive = false, infile_names = 0x0, outfile_name = 0x0, pcp_server = false, target_is_directory = false, pcp_client = false, pcp_client_host = 0x0, local_program_path = 0x556ec6431f38 "/usr/bin/pdsh", remote_program_path = 0x556ec6431908 "/usr/bin/pdsh", reverse_copy = false}
        retval = 0
        m = 0x556ec5741955 "/usr/lib64/pdsh"

Clearly, once the ssh children are done, pdsh fails to notice, so it never reaps them or exits.

Known issue, or something more I can do to debug?

@grondo
Member

grondo commented Dec 13, 2016

I don't think this is a known issue.

Is the pdsh process spinning on one or more CPUs at this time, or is it sleeping? From the backtrace it appears that all "dsh" threads (managing ssh processes) are sleeping in poll(2), the signal thread is waiting in sigwait(2), etc, so unfortunately I don't see anything obvious.

I haven't looked at this code in a while, so I'll have to peek and see how and when the ssh module calls waitpid on its children. What surprises me is that the dsh threads are still active, as I would think the fds open to the ssh processes would be closed when the processes exit.
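
For reference, the `[ssh] <defunct>` entries in the ps output are children that have exited but have not yet been collected with waitpid(2). Below is a minimal sketch of that reaping step. It is not pdsh's actual code, and `spawn_and_reap` is a hypothetical helper made up for illustration:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork up to n children that exit immediately,
 * then collect every exited child with waitpid(2) so that none is
 * left behind as a zombie. Returns the number of children reaped.
 */
static int spawn_and_reap(int n)
{
    int reaped = 0;
    int i;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;          /* stop forking, still reap what we have */
        if (pid == 0)
            _exit(0);       /* child: exit right away */
    }

    for (;;) {
        pid_t done = waitpid(-1, NULL, 0);
        if (done > 0)
            reaped++;       /* one zombie collected */
        else if (errno != EINTR)
            break;          /* ECHILD: no children left */
    }
    return reaped;
}
```

Until the waitpid() loop runs, each exited child would show up in ps as `<defunct>`, much like the ssh entries above.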

@grondo
Member

grondo commented Dec 13, 2016

Something to look at would be ls -l /proc/$(pidof pdsh)/fd, to ensure the fds we're polling on in the backtrace do exist in the process (if pdsh is spinning, it's possible one or more are invalid fds).

Also, does this reproduce when you don't run script -c on the remote?
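
To illustrate the invalid-fd case: poll(2) returns immediately with POLLNVAL for a descriptor that is not open, which in a loop would look like spinning, while a negative fd (like the fd = -1 entries in the xpfds arrays in the backtrace) is simply ignored. A small sketch follows. `revents_for` and `closed_fd` are hypothetical helpers, not part of pdsh:

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: poll a single fd without blocking and return
 * the revents the kernel reported, or -1 if poll() itself failed. */
static int revents_for(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}

/* Hypothetical helper: return a descriptor number that was just
 * closed, so it is known to be invalid in a single-threaded program.
 * Returns -1 if pipe() fails. */
static int closed_fd(void)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;
    close(p[0]);
    close(p[1]);
    return p[0];
}
```

Here `revents_for(closed_fd())` has POLLNVAL set, whereas `revents_for(-1)` is 0 because poll() skips negative descriptors.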

@brianjmurrell
Author

lrwx------. 1 brian brian 64 Dec 14 08:08 0 -> /dev/pts/9
l-wx------. 1 brian brian 64 Dec 14 08:08 1 -> 'pipe:[65964207]'
lrwx------. 1 brian brian 64 Dec 14 08:08 101 -> 'socket:[65959830]'
lrwx------. 1 brian brian 64 Dec 14 08:08 12 -> 'socket:[65963768]'
lrwx------. 1 brian brian 64 Dec 14 08:08 15 -> 'socket:[65965067]'
lrwx------. 1 brian brian 64 Dec 14 08:08 17 -> 'socket:[65962865]'
lrwx------. 1 brian brian 64 Dec 14 08:08 19 -> 'socket:[65958501]'
lrwx------. 1 brian brian 64 Dec 14 07:22 2 -> /dev/pts/9
lrwx------. 1 brian brian 64 Dec 14 08:08 20 -> 'socket:[65955703]'
lrwx------. 1 brian brian 64 Dec 14 08:08 21 -> 'socket:[65958505]'
lrwx------. 1 brian brian 64 Dec 14 08:08 23 -> 'socket:[65955715]'
lrwx------. 1 brian brian 64 Dec 14 08:08 29 -> 'socket:[65955737]'
lrwx------. 1 brian brian 64 Dec 14 08:08 31 -> 'socket:[65955719]'
lrwx------. 1 brian brian 64 Dec 14 08:08 32 -> 'socket:[65967229]'
lrwx------. 1 brian brian 64 Dec 14 08:08 33 -> 'socket:[65965079]'
lrwx------. 1 brian brian 64 Dec 14 08:08 34 -> 'socket:[65963780]'
lrwx------. 1 brian brian 64 Dec 14 08:08 37 -> 'socket:[65965098]'
lrwx------. 1 brian brian 64 Dec 14 08:08 41 -> 'socket:[65963784]'
lrwx------. 1 brian brian 64 Dec 14 08:08 47 -> 'socket:[65959826]'
lrwx------. 1 brian brian 64 Dec 14 08:08 49 -> 'socket:[65964238]'
lrwx------. 1 brian brian 64 Dec 14 08:08 53 -> 'socket:[65964242]'
lrwx------. 1 brian brian 64 Dec 14 08:08 59 -> 'socket:[65964246]'
lrwx------. 1 brian brian 64 Dec 14 08:08 61 -> 'socket:[65958517]'
lrwx------. 1 brian brian 64 Dec 14 08:08 67 -> 'socket:[65958521]'
lrwx------. 1 brian brian 64 Dec 14 08:08 68 -> 'socket:[65964250]'
lrwx------. 1 brian brian 64 Dec 14 08:08 7 -> 'socket:[65964212]'
lrwx------. 1 brian brian 64 Dec 14 08:08 73 -> 'socket:[65958525]'
lrwx------. 1 brian brian 64 Dec 14 08:08 77 -> 'socket:[65958529]'
lrwx------. 1 brian brian 64 Dec 14 08:08 8 -> 'socket:[65958497]'
lrwx------. 1 brian brian 64 Dec 14 08:08 83 -> 'socket:[65958533]'
lrwx------. 1 brian brian 64 Dec 14 08:08 85 -> 'socket:[65964254]'
lrwx------. 1 brian brian 64 Dec 14 08:08 89 -> 'socket:[65964258]'
lrwx------. 1 brian brian 64 Dec 14 08:08 9 -> 'socket:[65963764]'
lrwx------. 1 brian brian 64 Dec 14 08:08 97 -> 'socket:[65964262]'
lrwx------. 1 brian brian 64 Dec 14 08:08 99 -> 'socket:[65958537]'

Also, does this reproduce when you don't run script -c on the remote?

Yes.

@brianjmurrell
Author

Was the previous information useful in any way? Or is there anything else I can do to gather more information?

@grondo
Member

grondo commented Dec 15, 2016

Unfortunately I can't figure out why the socket pair for stdout/err would still be open when the ssh processes have exited, unless there is a bug where the ssh side of socketpairs is not closed in pdsh after fork... It might help to try to strace a pdsh process that is in this state, to see if it is blocked or continuously waking up from poll(2) with some error.
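
As a sketch of the suspected failure mode (an illustration only, not pdsh's code, and not a confirmed diagnosis): if the parent keeps its own copy of the child's end of a socketpair open after fork, poll() on the parent's end never sees EOF or hangup, even after the child has exited and been reaped. `hangup_seen` is a hypothetical helper:

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a socketpair, fork a child that exits
 * immediately, and poll the parent's end for up to 200 ms.
 * Returns 1 if poll() reported EOF/hangup, 0 if it timed out,
 * or -1 on setup failure.
 */
static int hangup_seen(int parent_closes_child_end)
{
    int sv[2];
    struct pollfd pfd;
    pid_t pid;
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        _exit(0);               /* exit closes the child's copy of sv[1] */
    }

    if (parent_closes_child_end)
        close(sv[1]);           /* the step that must not be skipped */

    waitpid(pid, NULL, 0);      /* child has exited and is reaped */

    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, 200);

    close(sv[0]);
    if (!parent_closes_child_end)
        close(sv[1]);
    return rc;
}
```

`hangup_seen(1)` returns 1, while `hangup_seen(0)` times out and returns 0. With timeout=-1, as in the backtrace, the second case would block indefinitely. The same effect would occur if any other process inherited a copy of that descriptor.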

@brianjmurrell
Author

$ strace -f -p 15231
strace: Process 15231 attached with 35 threads
[pid 15433] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15425] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15273] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15271] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15269] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15268] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15266] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15267] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15264] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15263] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15262] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15259] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15255] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15254] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15253] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15249] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15247] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15240] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15239] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15238] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15237] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15236] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15235] rt_sigtimedwait([INT TSTP], NULL, NULL, 8 <unfinished ...>
[pid 15234] restart_syscall(<... resuming interrupted nanosleep ...> <unfinished ...>
[pid 15272] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15436] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15265] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15270] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15251] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15252] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15244] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15246] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15243] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15242] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 15231] futex(0x563b0a6d3244, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
[pid 15234] <... restart_syscall resumed> ) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0
[pid 15234] nanosleep({2, 0}, 0x7ff2dc01cf10) = 0

According to the backtrace pasted earlier, dsh() is doing:

            pthread_cond_wait(&threadcount_cond, &threadcount_mutex);
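
For context, here is a rough sketch of the usual fanout pattern behind a wait like this. Only `threadcount_cond` and `threadcount_mutex` come from the line above; `threadcount`, `worker` and `run_with_fanout` are names assumed for illustration, and the real dsh.c may be structured differently:

```c
#include <pthread.h>

static pthread_mutex_t threadcount_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t threadcount_cond = PTHREAD_COND_INITIALIZER;
static int threadcount = 0;     /* assumed counter of live workers */

/* Worker: the real work (connect, poll, read) would go here. On the
 * way out it drops the count and wakes the main thread. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&threadcount_mutex);
    threadcount--;
    pthread_cond_signal(&threadcount_cond);
    pthread_mutex_unlock(&threadcount_mutex);
    return NULL;
}

/* Start `total` workers with at most `fanout` alive at once, then
 * wait for all of them. Returns the number of workers started. */
static int run_with_fanout(int total, int fanout)
{
    pthread_attr_t attr;
    pthread_t tid;
    int started = 0;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_mutex_lock(&threadcount_mutex);
    while (started < total) {
        /* Block until a slot frees up. */
        while (threadcount >= fanout)
            pthread_cond_wait(&threadcount_cond, &threadcount_mutex);
        if (pthread_create(&tid, &attr, worker, NULL) != 0)
            break;
        threadcount++;
        started++;
    }
    /* Wait for the remaining workers to finish. */
    while (threadcount > 0)
        pthread_cond_wait(&threadcount_cond, &threadcount_mutex);
    pthread_mutex_unlock(&threadcount_mutex);

    pthread_attr_destroy(&attr);
    return started;
}
```

In this pattern the main thread only wakes when a worker returns. If workers stay blocked in poll() forever, no signal arrives and the main thread waits forever too, which would be consistent with the futex wait seen for pid 15231 in the strace output.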

For good measure:

$ lsof -p 15231
COMMAND   PID  USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
pdsh    15231 brian  cwd    DIR              253,7     4096  1052429 /home/brian/lab
pdsh    15231 brian  rtd    DIR              253,4     4096        2 /
pdsh    15231 brian  txt    REG              253,4   181224  1213908 /usr/bin/pdsh
pdsh    15231 brian  mem    REG              253,4    11272  1299297 /usr/lib64/pdsh/sshcmd.so
pdsh    15231 brian  mem    REG              253,4    11144  1299073 /usr/lib64/pdsh/xrcmd.so
pdsh    15231 brian  mem    REG              253,4    11216  1297680 /usr/lib64/pdsh/execcmd.so
pdsh    15231 brian  mem    REG              253,4    57184  1215168 /usr/lib64/libnss_files-2.23.so
pdsh    15231 brian  mem    REG              253,4  2089496  1215148 /usr/lib64/libc-2.23.so
pdsh    15231 brian  mem    REG              253,4   142312  1215176 /usr/lib64/libpthread-2.23.so
pdsh    15231 brian  mem    REG              253,4    19736  1215154 /usr/lib64/libdl-2.23.so
pdsh    15231 brian  mem    REG              253,4   180200  1220435 /usr/lib64/libtinfo.so.6.0
pdsh    15231 brian  mem    REG              253,4   167424  1220221 /usr/lib64/libncurses.so.6.0
pdsh    15231 brian  mem    REG              253,4    35752  1217322 /usr/lib64/libhistory.so.6.3
pdsh    15231 brian  mem    REG              253,4   296072  1217685 /usr/lib64/libreadline.so.6.3
pdsh    15231 brian  mem    REG              253,4   172080  1215141 /usr/lib64/ld-2.23.so
pdsh    15231 brian    0u   CHR              136,9      0t0       12 /dev/pts/9
pdsh    15231 brian    1w  FIFO               0,10      0t0 68387580 pipe
pdsh    15231 brian    2u   CHR              136,9      0t0       12 /dev/pts/9
pdsh    15231 brian    7u  unix 0xffff880186defc00      0t0 68382445 type=STREAM
pdsh    15231 brian    9u  unix 0xffff8801d4321400      0t0 68389067 type=STREAM
pdsh    15231 brian   13u  unix 0xffff88048c762400      0t0 68388460 type=STREAM
pdsh    15231 brian   15u  unix 0xffff8804700b9400      0t0 68385654 type=STREAM
pdsh    15231 brian   16u  unix 0xffff880186def000      0t0 68382449 type=STREAM
pdsh    15231 brian   19u  unix 0xffff880470594000      0t0 68390102 type=STREAM
pdsh    15231 brian   21u  unix 0xffff8804700bd400      0t0 68385658 type=STREAM
pdsh    15231 brian   26u  unix 0xffff8801d4320000      0t0 68389071 type=STREAM
pdsh    15231 brian   27u  unix 0xffff880043d3ac00      0t0 68382694 type=STREAM
pdsh    15231 brian   28u  unix 0xffff880186dec800      0t0 68382461 type=STREAM
pdsh    15231 brian   29u  unix 0xffff8801d4326c00      0t0 68389075 type=STREAM
pdsh    15231 brian   30u  unix 0xffff880043d3dc00      0t0 68382700 type=STREAM
pdsh    15231 brian   31u  unix 0xffff88047c28e000      0t0 68384562 type=STREAM
pdsh    15231 brian   32u  unix 0xffff88048cbf7800      0t0 68386533 type=STREAM
pdsh    15231 brian   35u  unix 0xffff88047c289000      0t0 68384575 type=STREAM
pdsh    15231 brian   41u  unix 0xffff8801d4320800      0t0 68389085 type=STREAM
pdsh    15231 brian   49u  unix 0xffff88046508cc00      0t0 68389089 type=STREAM
pdsh    15231 brian   51u  unix 0xffff88047c288000      0t0 68384579 type=STREAM
pdsh    15231 brian   53u  unix 0xffff880186deb800      0t0 68382465 type=STREAM
pdsh    15231 brian   57u  unix 0xffff880465088000      0t0 68389093 type=STREAM
pdsh    15231 brian   65u  unix 0xffff880186def800      0t0 68382469 type=STREAM
pdsh    15231 brian   69u  unix 0xffff88046d84a400      0t0 68389101 type=STREAM
pdsh    15231 brian   73u  unix 0xffff880186dee400      0t0 68382473 type=STREAM
pdsh    15231 brian   77u  unix 0xffff88048cb00400      0t0 68382477 type=STREAM
pdsh    15231 brian   83u  unix 0xffff88048cb01c00      0t0 68382481 type=STREAM
pdsh    15231 brian   85u  unix 0xffff88046d84c400      0t0 68389105 type=STREAM
pdsh    15231 brian   91u  unix 0xffff880470450400      0t0 68389109 type=STREAM
pdsh    15231 brian   93u  unix 0xffff88047c288c00      0t0 68384583 type=STREAM
pdsh    15231 brian   97u  unix 0xffff88048c767400      0t0 68388478 type=STREAM
pdsh    15231 brian  101u  unix 0xffff88046bc87400      0t0 68382486 type=STREAM
pdsh    15231 brian  105u  unix 0xffff88048c760c00      0t0 68388483 type=STREAM
pdsh    15231 brian  107u  unix 0xffff88048c761000      0t0 68388487 type=STREAM

@grondo
Member

grondo commented Dec 15, 2016

Yes, dsh() is expected to be waiting on the pthread exit condition, so it knows when all threads are complete (or, in the normal case, when it can start new dsh threads). The 2-second wakeups are the watchdog thread, which signals threads that have been in the connecting state longer than the connect timeout, or in the run state longer than the command timeout.
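
To make that structure concrete, here is a minimal C sketch of the same pattern: a launcher that sleeps on a condition variable until a worker count drops to zero, plus a watchdog pass that flags workers stuck past a timeout. It is not pdsh's code. `worker_t`, `run_all`, `watchdog_scan`, `MAX_WORKERS`, and the variable names are all made up for illustration, and the real watchdog sleeps 2 seconds between passes and signals the thread rather than setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MAX_WORKERS 64

/* Illustrative stand-ins for the launcher's shared state (names assumed). */
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  count_cond  = PTHREAD_COND_INITIALIZER;
static int active_threads = 0;

typedef struct {
    time_t start;      /* when the worker entered its current state */
    int    timeout;    /* seconds allowed in that state; 0 means no limit */
    bool   finished;   /* set by the worker just before it exits */
    bool   timed_out;  /* set by the watchdog pass */
} worker_t;

/* Worker: stands in for one remote command; announces its own exit. */
static void *worker_main(void *arg)
{
    worker_t *w = arg;
    pthread_mutex_lock(&count_mutex);
    w->finished = true;
    active_threads--;
    assert(active_threads >= 0);
    pthread_cond_signal(&count_cond);
    pthread_mutex_unlock(&count_mutex);
    return NULL;
}

/* Launcher: start n workers, then sleep until every one has signalled. */
static int run_all(worker_t *workers, int n)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&count_mutex);
        active_threads++;
        pthread_mutex_unlock(&count_mutex);
        if (pthread_create(&tids[i], NULL, worker_main, &workers[i]) != 0) {
            pthread_mutex_lock(&count_mutex);
            active_threads--;
            pthread_mutex_unlock(&count_mutex);
            break;
        }
        started++;
    }

    pthread_mutex_lock(&count_mutex);
    while (active_threads > 0)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&count_cond, &count_mutex);
    pthread_mutex_unlock(&count_mutex);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == n ? 0 : -1;
}

/* One watchdog pass: flag unfinished workers that are over their timeout. */
static int watchdog_scan(worker_t *workers, int n, time_t now)
{
    int flagged = 0;
    pthread_mutex_lock(&count_mutex);
    for (int i = 0; i < n; i++) {
        worker_t *w = &workers[i];
        if (!w->finished && w->timeout > 0 && now - w->start > w->timeout) {
            w->timed_out = true;    /* real code would signal the thread */
            flagged++;
        }
    }
    pthread_mutex_unlock(&count_mutex);
    return flagged;
}
```

In this sketch, a worker that never reaches the decrement leaves `run_all` blocked in `pthread_cond_wait` indefinitely, which matches the state the backtrace shows for dsh().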

I'm not sure I have any clues as to what is wrong here... Does the problem reproduce with the simplest test case using the "exec" rcmd type to run something instead of "ssh"? (You might have to make some fake workload.)

@brianjmurrell
Author

So it's interesting. I reduced the remote command down to a simple echo hello to try to simplify the problem. It happens sometimes and not others. This one particular time, out of a range of hosts including [1-23,29-48,52-59], only one thread failed to complete:

 5220 pts/3    Sl+    0:00  |   |   \_ pdsh -R ssh -l root -S -w foo-[1-23,29-48,52-59] echo hello
 5327 ?        Zs     0:00  |   |   |   \_ [ssh] <defunct>

and pdsh confirms it:

$ pdsh -R ssh -l root -S -w foo-[1-23,29-48,52-59] echo hello | dshbak -c
^Cpdsh@mobl: interrupt (one more within 1 sec to abort)
pdsh@mobl:  (^Z within 1 sec to cancel pending threads)
pdsh@mobl: foo-55: command in progress

I've still got this running in case there is something we can dig out of it.

@grondo
Member

grondo commented Dec 15, 2016

Ah, good reproducer, thanks. Since it happens only occasionally, it must be a race of some sort, which means it might be a little difficult to get a solution quickly. In the output, did pdsh receive and print the "hello" from foo-55?

@grondo
Member

grondo commented Dec 15, 2016

BTW, does something similar happen if you replace -R ssh with -R exec? Both implementations use similar code, so this might help narrow it down.

@brianjmurrell
Author

I think the sshd on the remote is not exiting. It's just sitting on a select:

# strace -f -p 73000
Process 73000 attached
select(15, [3 6], [], NULL, NULL

Yeah. Killing that sshd made the pdsh complete, although it didn't produce any output. Perhaps the ^C fouled that up. Yeah. Repeating the experiment, if I go kill the non-exiting sshd, the pdsh completes and the output displays.

@brianjmurrell
Author

I think ultimately this problem has to do with ssh multiplexing (i.e. using Control{Master,Persist,Path}, etc.). When I disable that the problem seems to go away.

Which is quite a pity since multiplexing makes the per-connection handshake quite a bit quicker.

@grondo
Member

grondo commented Dec 15, 2016

Interesting, and thanks for running that down! I can't think of how the ssh process's stdout/stderr fds could stay open even after the process exits. Even with connection sharing this scheme should work; otherwise something like ssh remote command | grep foo would always hang, I'd assume.
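
For reference, one generic way a pipe can stay open after the direct child has exited is that some other process inherited the write end. The C sketch below shows only that POSIX behaviour. Whether an ssh multiplexing master is what holds the fds here is an assumption the thread has not confirmed, and `eof_seen_after_child_exit` and its "helper" process are hypothetical names invented for the demo.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes "hello" and exits. Before exiting, the child
 * forks a helper that outlives it. If helper_keeps_fd is true the helper
 * keeps the inherited write end of the output pipe open.
 * Returns 1 if the reader sees EOF within wait_ms of the child being
 * reaped, 0 if it does not, -1 on error.
 */
static int eof_seen_after_child_exit(bool helper_keeps_fd, int wait_ms)
{
    int out[2], ctl[2];
    assert(wait_ms >= 0);

    if (pipe(out) < 0)
        return -1;
    if (pipe(ctl) < 0) {
        close(out[0]); close(out[1]);
        return -1;
    }

    pid_t child = fork();
    if (child == 0) {
        close(out[0]);
        close(ctl[1]);
        pid_t helper = fork();
        if (helper < 0)
            _exit(1);
        if (helper == 0) {
            char c;
            if (!helper_keeps_fd)
                close(out[1]);
            /* Block until the top-level parent closes ctl[1]. */
            while (read(ctl[0], &c, 1) > 0) { }
            _exit(0);
        }
        if (write(out[1], "hello\n", 6) != 6)
            _exit(1);
        _exit(0);
    }

    close(out[1]);
    close(ctl[0]);
    if (child < 0) {
        close(out[0]); close(ctl[1]);
        return -1;
    }

    int status, result;
    waitpid(child, &status, 0);     /* the direct child is gone and reaped */

    struct pollfd p = { .fd = out[0], .events = POLLIN };
    char buf[64];
    for (;;) {
        int rc = poll(&p, 1, wait_ms);
        if (rc < 0 && errno == EINTR)
            continue;
        if (rc < 0) { result = -1; break; }
        if (rc == 0) { result = 0; break; }     /* timed out, still no EOF */
        ssize_t n = read(out[0], buf, sizeof buf);
        if (n < 0) { result = -1; break; }
        if (n == 0) { result = 1; break; }      /* EOF: all writers closed */
    }

    close(ctl[1]);                  /* lets the helper exit */
    close(out[0]);
    return result;
}
```

With `helper_keeps_fd` set, the reader gets the data but no EOF even though the child has been reaped; with it cleared, EOF arrives as soon as the helper closes its copy.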

Does the remote sshd go away if you kill pdsh and let init reap the defunct ssh processes?

I wonder if a kludge could be written to work around this case, which would allow sshcmd dsh threads to immediately reap commands when they exit (if calling wait even fixes the remote sshd hang).
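
One possible shape for such a kludge, as a standalone C sketch rather than a patch against sshcmd: check `waitpid()` with `WNOHANG` on every pass of the poll loop, so the thread can finish once the child has exited even if EOF never arrives on its output fd. `drain_and_reap` and `spawn_with_lingering_helper` are hypothetical names, the second one is only a test fixture, and whether reaping actually unsticks the remote sshd is still the open question.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Drain fd while waiting for pid. Returns the child's exit code, or -1 on
 * error, abnormal exit, or after max_ticks passes of tick_ms each.
 */
static int drain_and_reap(pid_t pid, int fd, int tick_ms, int max_ticks)
{
    char buf[256];
    int status;
    struct pollfd p = { .fd = fd, .events = POLLIN };
    assert(tick_ms >= 0);

    for (int tick = 0; tick < max_ticks; tick++) {
        int rc = poll(&p, 1, tick_ms);
        if (rc < 0 && errno != EINTR)
            return -1;
        if (rc > 0) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n < 0 && errno != EINTR)
                return -1;
            if (n == 0) {           /* normal path: EOF, so block in wait */
                if (waitpid(pid, &status, 0) != pid)
                    return -1;
                return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
            }
            /* real code would forward buf[0..n) to the user here */
        }
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r < 0)
            return -1;
        if (r == pid) {
            /* Child is gone: pick up anything already buffered, then stop
             * without waiting for an EOF that may never come. */
            while (poll(&p, 1, 0) > 0 && read(fd, buf, sizeof buf) > 0) { }
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        }
    }
    return -1;
}

/* Fixture: the child exits with `code`, but a helper it forked keeps the
 * pipe's write end open until the caller closes *release_fd. */
static pid_t spawn_with_lingering_helper(int code, int *read_fd, int *release_fd)
{
    int out[2], ctl[2];
    if (pipe(out) < 0)
        return -1;
    if (pipe(ctl) < 0) {
        close(out[0]); close(out[1]);
        return -1;
    }
    pid_t child = fork();
    if (child == 0) {
        close(out[0]);
        close(ctl[1]);
        if (fork() == 0) {          /* helper inherits out[1] */
            char c;
            while (read(ctl[0], &c, 1) > 0) { }
            _exit(0);
        }
        _exit(code);
    }
    close(out[1]);
    close(ctl[0]);
    if (child < 0) {
        close(out[0]); close(ctl[1]);
        return -1;
    }
    *read_fd = out[0];
    *release_fd = ctl[1];
    return child;
}
```

The trade-off in this approach is that output written by a lingering fd holder after the child exits is dropped, so it only suits cases where the command's own output is complete by the time it exits.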
