fix_ze_soname #296

Merged
merged 1 commit into from
Oct 11, 2024

TApplencourt
Collaborator

Fixes @abagusetty's bug, where iprof hung with the new MPICH.

The hang was caused by yaksa checking the soname of our tracing library and expecting it to match that of ze_loader.
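
For readers unfamiliar with sonames, here is a minimal sketch of how a library's soname can be read and compared against an expected value. This is not yaksa's actual check, which this PR does not show. The helper names (`parse_soname`, `read_soname`, `soname_matches`) are hypothetical, the sketch assumes `readelf` from binutils is on the `PATH`, and the `libze_loader.so.1` value in the usage note is an assumed soname for ze_loader.

```python
import re
import subprocess
from typing import Optional

# Matches the SONAME entry printed by `readelf -d`, for example:
#  0x000000000000000e (SONAME)   Library soname: [libfoo.so.1]
_SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname: \[([^\]]+)\]")


def parse_soname(readelf_dynamic_output: str) -> Optional[str]:
    """Return the soname found in `readelf -d` output, or None if absent."""
    match = _SONAME_RE.search(readelf_dynamic_output)
    return match.group(1) if match else None


def read_soname(library_path: str) -> Optional[str]:
    """Run `readelf -d` on a shared library and extract its soname."""
    result = subprocess.run(
        ["readelf", "-d", library_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_soname(result.stdout)


def soname_matches(library_path: str, expected: str) -> bool:
    """Hypothetical check: does the library advertise the expected soname?"""
    return read_soname(library_path) == expected
```

Calling `soname_matches(path_to_tracing_library, "libze_loader.so.1")` would then report whether a tracing library advertises the same soname a consumer expects from ze_loader, under the assumptions above.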

It works now:

applenco@x4516c1s7b0n0:~/mpi_hang> mpirun -n 1 -- ~/THAPI/build/ici/bin/iprof -- ./a.out
THAPI_SYNC_DAEMON_MPI Warning: Did not get MPI_THREAD_SINGLE, got MPI_THREAD_MULTIPLE
Hello world from processor x4516c1s7b0n0, rank 0 out of 1 processors
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-10-11T16:43:44+00:00
BACKEND_MPI | 1 Hostnames | 1 Processes | 1 Threads |

                  Name |     Time | Time(%) | Calls |  Average |      Min |      Max |
              MPI_Init | 505.26ms |  98.31% |     1 | 505.26ms | 505.26ms | 505.26ms |
          MPI_Finalize |   8.66ms |   1.68% |     1 |   8.66ms |   8.66ms |   8.66ms |
MPI_Get_processor_name |   4.10us |   0.00% |     1 |   4.10us |   4.10us |   4.10us |
         MPI_Comm_size |   3.62us |   0.00% |     1 |   3.62us |   3.62us |   3.62us |
         MPI_Comm_rank |    645ns |   0.00% |     1 | 645.00ns |    645ns |    645ns |
                 Total | 513.93ms | 100.00% |     5 |

BACKEND_ZE | 1 Hostnames | 1 Processes | 1 Threads |

                               Name |     Time | Time(%) | Calls |  Average |     Min |      Max |
                     zeModuleCreate |  22.75ms |  67.75% |    60 | 379.18us | 91.28us |   1.06ms |
                      zeEventCreate |   2.33ms |   6.93% |  4096 | 567.78ns |   245ns |  17.64us |
                    zeModuleDestroy |   2.10ms |   6.27% |    60 |  35.08us |  2.71us | 383.52us |
                        zeDeviceGet |   1.88ms |   5.58% |     6 | 312.52us |   848ns |   1.87ms |
              zeDeviceCanAccessPeer |   1.58ms |   4.72% |    66 |  24.01us |   150ns |  61.77us |
                     zeKernelCreate |   1.30ms |   3.87% |   864 |   1.51us |   646ns | 341.80us |
                     zeEventDestroy | 682.80us |   2.03% |  4096 | 166.70ns |   137ns |   3.58us |
                    zeKernelDestroy | 338.01us |   1.01% |   864 | 391.21ns |   193ns |   2.69us |
                  zeEventPoolCreate | 234.87us |   0.70% |     7 |  33.55us | 10.20us | 136.24us |
zeDriverGetExtensionFunctionAddress | 224.28us |   0.67% |     7 |  32.04us |   571ns | 215.35us |
                 zeEventPoolDestroy | 118.06us |   0.35% |     7 |  16.87us |  7.25us |  59.40us |
                    zeContextCreate |  14.91us |   0.04% |     3 |   4.97us |  4.72us |   5.40us |
              zeDeviceGetSubDevices |  12.53us |   0.04% |    24 | 522.17ns |   116ns |   2.76us |
                             zeInit |   4.13us |   0.01% |     3 |   1.38us |   912ns |   1.77us |
                        zeDriverGet |   3.93us |   0.01% |     5 | 785.60ns |   174ns |   1.85us |
                   zeContextDestroy |   3.71us |   0.01% |     1 |   3.71us |  3.71us |   3.71us |
                              Total |  33.58ms | 100.00% | 10169 |

applenco@x4516c1s7b0n0:~/mpi_hang> module list

Currently Loaded Modules:
  1) gcc-runtime/12.2.0-267awrk            16) elfutils/0.186-yuor73r         31) ruby-ffi/1.15.4-5mo5s2q
  2) gmp/6.2.1-yctcuid                     17) pcre2/10.43-vzzidje            32) ruby-babeltrace2/0.1.4-3k74k53
  3) mpfr/4.2.1-fhgnwe7                    18) berkeley-db/18.1.40-2frw2z6    33) ruby-narray-old/0.6.1.2-iriybfo
  4) mpc/1.3.1-ygprpb4                     19) gdbm/1.23                      34) ruby-narray-ffi/1.4.4-x4lt3r2
  5) gcc/12.2.0                            20) perl/5.38.0                    35) ruby-opencl/1.3.12-pbmvgrc
  6) intel_compute_runtime/release/950.13  21) libmd/1.0.4-nvn3prd            36) thapi/git.ceaabfc-serial
  7) oneapi/eng-compiler/2024.07.30.002    22) libbsd/0.12.1-dsshygz          37) ruby-cast/0.3.1-3kwxnzj
  8) libfabric/1.20.1                      23) expat/2.6.2-s3fkrly            38) ruby-cast-to-yaml/0.1.1-5dhftgq
  9) cray-pals/1.4.0                       24) python/3.10.13                 39) ruby-mini-portile2/2.6.1-zbqteay
 10) cray-libpals/1.4.0                    25) glib/2.78.3-lpcguoz            40) ruby-nokogiri/1.12.5-3x7wfrs
 11) lz4/1.9.4                             26) babeltrace2/2.0.6-w37vov2      41) ruby-metababel/1.1.2-6o367to
 12) libarchive/3.7.1-fvef5p2              27) lttng-tools/2.12.11            42) gmake/4.4.1
 13) libiconv/1.17-kg7cda7                 28) abseil-cpp/20240116.2-cihlltz  43) hwloc/2.9.2-level-zero
 14) libmicrohttpd/0.9.50-jjjslhm          29) protobuf/3.27.1                44) yaksa/0.3-fxpciid
 15) sqlite/3.43.2-2onu5lp                 30) ruby/2.7.2-w7it2ky             45) mpich/opt/git.063ef64

TApplencourt requested a review from Kerilk on October 11, 2024 at 16:46
Collaborator

Kerilk left a comment

LGTM

Kerilk merged commit a8dfacd into master on Oct 11, 2024
16 checks passed
Kerilk deleted the fix_ze_soname branch on October 11, 2024 at 17:09