testing mpiexec=mpiexec --oversubscribe -np 4 CSP_NG=0 win_allocate ...
CASPER Configuration:
RMA_ERR_CHECK (enabled)
CSP_VERBOSE = err|conf_g|warn|conf_win|conf_comm|info
CSP_NG = 0
CSP_ASYNC_CONFIG = on
CSP_TOPO = machine
CSP_ASYNC_MODE = rma|pt2pt
PT2PT Offloading Options:
CSP_OFFLOAD_MIN_MSGSZ = 8192 bytes
CSP_OFFLOAD_SHMQ_NCELLS = 64 (total 13 Kbytes)
cell size = 208 bytes, cell size(aligned) = 256 bytes
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74
System call: open(2)
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** An error occurred in MPI_Win_allocate
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** reported by process [893255681,2]
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** MPI_ERR_WIN: invalid window
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** and potentially your MPI job)
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2147
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
test failed ! mpiexec --oversubscribe -np 4 /home/travis/build/pmodels/casper/test/win_allocate
@jeffhammond Do you know whether we still hit this issue with a newer version of Open MPI? I plan to make a production release of Casper, since it has been stable for the past few years.
We should root-cause this. My money is on the Travis CI environment or an Open MPI bug, rather than Casper.