Open-MPI win_allocate issue #35

Open
jeffhammond opened this issue Nov 17, 2019 · 3 comments

Comments

@jeffhammond
Member

We should root-cause this. My money is on the Travis CI environment or an Open-MPI bug, rather than Casper.

testing mpiexec=mpiexec --oversubscribe -np 4 CSP_NG=0 win_allocate ...
CASPER Configuration:
    RMA_ERR_CHECK    (enabled) 
    CSP_VERBOSE      = err|conf_g|warn|conf_win|conf_comm|info
    CSP_NG           = 0
    CSP_ASYNC_CONFIG = on
    CSP_TOPO         = machine
    CSP_ASYNC_MODE   = rma|pt2pt
PT2PT Offloading Options:
    CSP_OFFLOAD_MIN_MSGSZ   = 8192 bytes
    CSP_OFFLOAD_SHMQ_NCELLS = 64 (total 13 Kbytes)
                              cell size = 208 bytes, cell size(aligned) = 256 bytes
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.
  Local host:  travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74
  System call: open(2) 
  Error:       No such file or directory (errno 2)
--------------------------------------------------------------------------
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** An error occurred in MPI_Win_allocate
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** reported by process [893255681,2]
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** MPI_ERR_WIN: invalid window
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] ***    and potentially your MPI job)
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2147
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
test failed ! mpiexec --oversubscribe -np 4 /home/travis/build/pmodels/casper/test/win_allocate
@minsii
Collaborator

minsii commented May 13, 2021

@jeffhammond Do you know whether we still hit this issue with a newer version of OpenMPI? I plan to make a production release of Casper, since it has been stable for the past few years.

@jeffhammond
Member Author

I don't know. I stopped paying attention to Travis CI failures in many of my projects.

@minsii
Collaborator

minsii commented May 13, 2021

Me too. We moved to GitHub Actions in our other projects. Let me migrate Casper too.
