ncmpi_create Stalls When Using High MPI Rank Counts #142
Comments
I'd also like to know which MPI implementation/version and which file system you are using. It is definitely strange that things work OK with 256 processes but not with 300+.
It's mpich-4.1.2, @roblatham00.
I tried, and it still hangs. Maybe it is because of MPI-IO, @wkliao?
What file system are you writing to? Can you try adding "ufs:" as a prefix to your output file name, i.e. ufs:./output.nc?
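For anyone who wants to apply the prefix in code rather than on the command line, here is a minimal sketch. It relies on ROMIO (the MPI-IO layer in MPICH) accepting a file-system prefix such as "ufs:" at the front of the file name. The helper name `add_fs_prefix` is hypothetical and is not part of PnetCDF or MPICH; it only builds the string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of PnetCDF or MPICH): writes
 * "<driver>:<path>" into out, e.g. "ufs:./output.nc".
 * If path already starts with "<driver>:", it is copied unchanged.
 * Returns 0 on success, -1 on a NULL argument or if out is too small. */
int add_fs_prefix(const char *driver, const char *path,
                  char *out, size_t outlen)
{
    if (driver == NULL || path == NULL || out == NULL || outlen == 0)
        return -1;

    size_t dlen = strlen(driver);
    int n;

    /* Already prefixed: path begins with driver followed by ':' */
    if (strncmp(path, driver, dlen) == 0 && path[dlen] == ':')
        n = snprintf(out, outlen, "%s", path);
    else
        n = snprintf(out, outlen, "%s:%s", driver, path);

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string would then be passed as the path argument of ncmpi_create in place of the plain ./output.nc. Passing ufs:./output.nc directly on the mpirun command line has the same effect without any code change.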
@wkliao it works, thank you so much!
@wkliao Unfortunately, the issue with the stuck creation of output.nc has reappeared. Strangely enough, when I use The hostfile is like
The problem may be the file system you are using.
I'm encountering an issue where the ncmpi_create function appears to stall when running my application with a high number of MPI processes. Specifically, the program hangs at the ncmpi_create call when attempting to create a new NetCDF file.
My NetCDF version is 1.12.1, FLAGS as below
I executed the command below, and it stalls at ncmpi_create. There are 4 nodes, and each node has 96 cores.
mpirun -n 384 -hosts controller1,compute1,compute2storage,compute3storage ./test ./output.nc
If I reduce the number of ranks, e.g. mpirun -n 256, it works.
I want to know what might be causing this: a network bottleneck, a disk bottleneck, or OS settings.
My code