Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: CHPL_TASKS=fifo + CHPL_COMM=gasnet segfaults in runtime startup #26405

Closed
jabraham17 opened this issue Dec 13, 2024 · 1 comment · Fixed by #26415
Closed

[Bug]: CHPL_TASKS=fifo + CHPL_COMM=gasnet segfaults in runtime startup #26405

jabraham17 opened this issue Dec 13, 2024 · 1 comment · Fixed by #26415

Comments

@jabraham17
Copy link
Member

jabraham17 commented Dec 13, 2024

Running hello world with CHPL_TASKS=fifo + CHPL_COMM=gasnet segfaults during runtime startup. Using GASNET_BACKTRACE=1, I traced this to a failure in partitionResources, because topology is NULL.

I believe this is because partitionResources is called when it should not be. Making the following to chpl_topo_post_comm_init (to match other chpl_topo_ functions) partially fixes the issue

diff --git a/runtime/src/topo/hwloc/topo-hwloc.c b/runtime/src/topo/hwloc/topo-hwloc.c
index f6188ea24ed..0ae99532b0a 100644
--- a/runtime/src/topo/hwloc/topo-hwloc.c
+++ b/runtime/src/topo/hwloc/topo-hwloc.c
@@ -242,6 +242,9 @@ void chpl_topo_pre_comm_init(char *accessiblePUsMask) {
 // initialized.
 //
 void chpl_topo_post_comm_init(void) {
+  if (!haveTopology) {
+    return;
+  }
   partitionResources();
 }
 

After making this change, hello world exists with "error: number of cpus is uninitialized". So something needs to be done to properly initialize the number of cpus

One way this can be done is by changing chpl_topo_pre_comm_init to make sure it initializes the topology when using tasks=fifo, but this feels heavy handed (based on code comments).

I think this is only an issue when CHPL_HWLOC is not explicitly set, as it defaults to CHPL_HWLOC=none when CHPL_TASKS!=qthreads. And with CHPL_HWLOC=none, this code is not used.

chplenv

CHPL_HOST_PLATFORM: darwin
CHPL_HOST_COMPILER: clang
  CHPL_HOST_CC: clang
  CHPL_HOST_CXX: clang++
CHPL_HOST_ARCH: arm64
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang *
  CHPL_TARGET_CC: clang
  CHPL_TARGET_CXX: clang++
  CHPL_TARGET_LD: /usr/bin/clang++
CHPL_TARGET_ARCH: arm64
CHPL_TARGET_CPU: native *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet *
  CHPL_COMM_SUBSTRATE: udp
  CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: fifo *
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_HOST_MEM: cstdlib
CHPL_MEM: cstdlib *
CHPL_ATOMICS: cstdlib
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: system *
CHPL_HWLOC: system *
CHPL_RE2: bundled *
CHPL_LLVM: system
  CHPL_LLVM_SUPPORT: system
  CHPL_LLVM_CONFIG: /opt/homebrew/opt/llvm@19/bin/llvm-config
  CHPL_LLVM_VERSION: 19
CHPL_AUX_FILESYS: none
CHPL_LIB_PIC: none
CHPL_SANITIZE: none
CHPL_SANITIZE_EXE: none
@jabraham17
Copy link
Member Author

I discussed this offline with @jhh67, we determined that the correct thing to do is the "heavy handed" approach and just make sure to initialize topology when CHPL_TASKS=fifo. This led to removing haveTopology entirely, as now the check for CHPL_TASKS is whether it is fifo or qthreads, which will always be true.

#26415 will do that and resolve this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant