stat-core-merger stuck communicating with gdb #35
Comments
I added additional logging to see what GDB is telling us. It is stuck at this line, which may or may not be needed on ppc64le (the platform I happen to be on): https://github.com/LLNL/STAT/blob/develop/scripts/core_file_merger.py#L409 I deleted that extra read but still had hangs with python3. In the end I fell back to python-2.7 and now it's working (with that extra ppc64 readline deleted).
For the gdb hang, you may need to comment out these 3 lines:
I don't recall the exact history. At some point we found these lines were necessary, but that no longer appears to be the case.
I just committed changes to the develop branch to comment out those lines.
A note for me to look one day at doing the gdb communication the other way around: instead of python reading gdb, have gdb execute a python script (https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html).
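A rough sketch of what that "other way around" could look like, in case it helps. This is not STAT code: the file name collect_bt.py and the function collect_backtraces are made up for illustration, and the only gdb calls used are ones documented in the Python API (selected_inferior, threads, switch, newest_frame, older, name). The gdb module is passed in as a parameter so the function can also be exercised outside gdb with a stand-in object.

```python
# Sketch of a script meant to be run *by* gdb, for example:
#   gdb -batch -x collect_bt.py <executable> <core file>
# collect_bt.py is a hypothetical file name, not part of STAT.


def collect_backtraces(gdb_module):
    """Return {thread number: [function names, innermost frame first]}."""
    traces = {}
    for thread in gdb_module.selected_inferior().threads():
        thread.switch()                    # make this thread the current one
        frame = gdb_module.newest_frame()  # innermost frame of that thread
        names = []
        while frame is not None:
            # frame.name() returns None when no symbol is available
            names.append(frame.name() or "??")
            frame = frame.older()
        traces[thread.num] = names
    return traces


try:
    import gdb  # only importable inside a gdb process
except ImportError:
    gdb = None

if gdb is not None:
    for num, names in sorted(collect_backtraces(gdb).items()):
        print(num, " <- ".join(names))
```

With this approach there is no prompt to wait for, so the read loop that currently hangs would not be needed; whether the output format is convenient for the merger is a separate question.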
@roblatham00 Good news, I think I figured out the source of the hang. I was able to reproduce stat-core-merger hangs on one of our CORAL systems and managed to fix it by flushing the input buffer to the gdb process during communicate(). The change is in develop in this commit 19858dc. I was also able to install the develop branch with this commit on our CORAL system using the gcc 8.3.1 compiler. Can you try this out and let me know if it resolves your issue? Note that if you still have your previous STAT installation, you could just modify your installed core_file_merger.py file and add the flush after the stdin.write().
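For anyone patching an installed copy by hand, here is a minimal, self-contained sketch of the write-then-flush pattern described above. It is not the code from the commit: send_command is a hypothetical helper, and a small Python child process stands in for gdb so the example runs anywhere. One plausible reason the hang shows up only under python3 is that subprocess pipes are buffered by default there, whereas python-2.7 defaulted to unbuffered pipes.

```python
import subprocess
import sys

# Stand-in for gdb (an assumption for illustration): a child process that
# answers every input line with a "(gdb) " prefix.
CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write('(gdb) ' + line)\n"
    "    sys.stdout.flush()\n"
)


def send_command(proc, command):
    """Write one command to the child and return the first line of its reply."""
    proc.stdin.write(command + "\n")
    # Without this flush the command can sit in Python's write buffer, and
    # the readline() below then waits forever for an answer.
    proc.stdin.flush()
    return proc.stdout.readline()


proc = subprocess.Popen(
    [sys.executable, "-u", "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
reply = send_command(proc, "info threads")
proc.stdin.close()  # end of input lets the child exit
proc.wait()
print(reply, end="")
```

Removing the flush() call from send_command is an easy way to see the same kind of hang with the stand-in child.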
Platform: OLCF Summit
Versions: STAT from spack:
spack install stat%[email protected] cxxflags=--std=c++14
I was trying to collect/compare backtraces for ten core files with a command like this:
stat-core-merger -x =bedrock -F stdout -c /gpfs/alpine/csc332/scratch/${USER}/quintain-cores/
After fixing up python's string/bytes challenges (maybe I goofed that!), the command hangs. Running with -L debug shows me
When I check with ps, I can see the gdb command STAT is trying to run, and when I run that command myself, gdb suggests it did not process the command line arguments as expected. In particular, it reports "pagination: No such file or directory" and "Excess command line arguments ignored".

If I re-run that command with all the -ex arguments quoted, gdb will give me the (gdb) prompt that the python script expects.

Hacking up scripts/core_file_merger.py to add those quotes gave me the command line I expected; however, it still hangs at "Find a value for the current rank".

When I ctrl-c the process, the python backtrace tells me it's stuck in info threads.

Any suggestions for next steps?
Thanks
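To illustrate the quoting problem described above: each -ex option takes exactly one argument, so a multi-word gdb command has to reach gdb as a single argv entry. The sketch below uses shlex only to show how a shell-style split treats the two forms. The specific gdb commands are placeholders chosen to match the error messages above, not necessarily what core_file_merger.py passes.

```python
import shlex

# Placeholder command lines; the real ones built by STAT may differ.
unquoted = "gdb -ex set pagination off -ex info threads"
quoted = 'gdb -ex "set pagination off" -ex "info threads"'

# Unquoted: -ex only receives "set"; "pagination" and the rest become
# stray positional arguments, which gdb tries to treat as files.
print(shlex.split(unquoted))

# Quoted: each -ex receives its full command.
print(shlex.split(quoted))

# Passing an argument list to subprocess sidesteps quoting entirely,
# because no shell-style splitting happens.
argv = ["gdb", "-ex", "set pagination off", "-ex", "info threads"]
assert shlex.split(quoted) == argv
```

Building the command as a list like argv, rather than as one string, is one way to avoid having to add quotes by hand.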