Darshan parser not generating any output #1000
Hi @wadudmiah, can you confirm that there are no error messages being produced at runtime? Can you also tell us what sort of file system the log is being written to? This is an unusual error. It indicates that the gzip-compressed data contained in a portion of the log file cannot be parsed by libz. Usually even if something goes wrong in Darshan itself, the data generated can at least be uncompressed before there is a problem interpreting it. I'm not sure what's wrong, but there may be some workarounds we can try that will help narrow down the problem. I'd like to confirm the questions above first, though, to get a better idea of the situation.
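To illustrate the failure mode described above: libz raises an error whenever a compressed stream is corrupted or truncated, and darshan-parser surfaces that error when it cannot inflate a log region. The snippet below is a standalone sketch (plain zlib, not Darshan code) that reproduces the same class of error on deliberately corrupted data:

```python
import zlib

# Compress a block of data, much as Darshan compresses each log
# region before appending it to the log file.
payload = b"POSIX module record data " * 100
compressed = bytearray(zlib.compress(payload))

# A clean stream decompresses without error.
assert zlib.decompress(bytes(compressed)) == payload

# Flip a byte in the middle of the stream, mimicking on-disk
# corruption of a compressed log region.
compressed[len(compressed) // 2] ^= 0xFF

try:
    zlib.decompress(bytes(compressed))
except zlib.error as e:
    # Corruption either breaks the DEFLATE decoding itself or trips
    # the stream's adler32 checksum; either way zlib.error is raised,
    # which is the kind of libz failure darshan-parser reports.
    print(f"zlib.error: {e}")
```

The point is that libz reliably detects corruption (via decode failures or the stream checksum), so an error like this almost always means the bytes on disk differ from what was compressed, rather than a parser bug.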
I am facing the same issue, with Darshan version 3.4.4, OpenMPI version 4.1.2, and GCC/mpicc version 11.4.0. I was not able to get a reliable log, and I got two types of errors when I performed
The first one:
The second kind of error was:
The results would vary based on my mpirun command.
It gives me proper output 7 out of 10 times without any issues. But when I run
it gives me errors 19 out of 20 times. Even though I follow the exact same steps from compiling to running, one of these two errors pops up, and there is no pattern to which error occurs. CPU information:
Oh, actually @wadudmiah's original bug report is probably addressed by #1002, which is available in origin/main and will be included in the next release; I didn't make the connection. @Arun8765 I'm not sure if your error is related or not. Are you able to share one of the log files that you are unable to parse?
Hi @carns, here are the two Darshan files that throw the above errors when I use darshan-parser: arunk_mpiParaio_id118913-118913_9-6-4541-16124893853151831010_1.darshan.txt
I just had a look at both example logs shared by @wadudmiah and @Arun8765, and I can confirm the reported parsing errors. Using Darshan's main branch doesn't appear to help, so these both look like new issues (likely related, given that decompression failures like this have been rare in Darshan). We are looking into the problem to see if we have any ideas/suggestions. In the meantime, it would be helpful if you could provide the following info:
This is something we've not been able to reproduce directly ourselves, but we did confirm from looking at each of your log files that there appears to be corrupted data in them. The log data is first compressed (without error) and then immediately appended to the Darshan log, so there is very little time for it to be corrupted. One possibility, beyond a potential corner-case bug in Darshan itself that we don't understand, is a problem with the MPI-IO implementation and how Darshan uses it. We have a few suggestions for things you could try to see if they work around the issue:
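One knob along these lines (an assumption on my part that it is relevant to this particular failure, though the variable itself is documented in darshan-runtime) is the `DARSHAN_LOGHINTS` environment variable, which overrides the MPI-IO hints Darshan uses when collectively writing its log. Clearing it falls back to the MPI-IO implementation's own defaults, which can help isolate a hint-related write problem:

```shell
# Sketch, assuming a darshan-runtime install: override the MPI-IO hints
# Darshan passes when writing its log file. The documented default is
# "romio_no_indep_rw=true;cb_nodes=4". Setting the variable empty
# disables the custom hints so the MPI-IO library uses its defaults.
export DARSHAN_LOGHINTS=""

# Then re-run the job exactly as before and try darshan-parser on the
# new log (job command below is a placeholder, not from this thread):
# mpirun -np 64 ./your_app
```

If the log parses cleanly with the hints disabled, that would point at the collective-buffering path in the MPI-IO implementation rather than at Darshan's compression step.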
Hi,
I have generated the attached Darshan trace file, but when I try to generate the text output, I get the following error and no text output:
This is a run of the benchio benchmark (https://github.com/EPCCed/benchio/issues) on 4 nodes with 16 MPI processes per node (ppn=16), so 64 MPI processes in total. However, Darshan profiles it correctly when I run the same benchmark on 2 nodes (ppn=16), i.e., 32 MPI processes. Strangely, it also works on 16 nodes (ppn=16), i.e., 256 MPI processes.
Any help will be greatly appreciated.
ku1324@k_benchio.x_id1172710-18348_7-23-42668-17459906300090053027_1.darshan.txt