Crash of OMIP2 simulation using TL319_tn14 grid and noresm2_5_alpha04_v3 #387

Closed
matsbn opened this issue Aug 29, 2024 · 2 comments

@matsbn
Contributor

matsbn commented Aug 29, 2024

Using branch feature/noresm2_5_alpha04_v3 of https://github.com/mvertens/NorESM.git, compset 2000_DATM%JRA_SLND_CICE_BLOM_DROF%JRA_SGLC_SWAV_SESP, and grid combination TL319_tn14, the simulation crashed at what appeared to be the first attempt to write CICE diagnostics.

The error message in cesm.log.* was:
[b4167:12257] *** An error occurred in MPI_Gather
[b4167:12257] *** reported by process [47501952548864,123]
[b4167:12257] *** on communicator MPI COMMUNICATOR 49 SPLIT FROM 44
[b4167:12257] *** MPI_ERR_TRUNCATE: message truncated
[b4167:12257] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[b4167:12257] *** and potentially your MPI job)

I had a feeling it could have something to do with LFS stripes, and I saw that in env_run.xml, PIO_STRIDE was set to $MAX_MPITASKS_PER_NODE. This is 128 on Betzy, but I was running CICE with fewer processors than this (96 processors). When I manually set PIO_STRIDE to 8 for all components (a somewhat arbitrary choice), the simulation ran fine. I am not sure this is the reason for the crash, but if it is, maybe PIO_STRIDE should be set to the minimum of $MAX_MPITASKS_PER_NODE and the number of processors per component?
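
To make the suggestion concrete, here is a minimal Python sketch of the proposed rule. The function names `proposed_pio_stride` and `io_tasks_simplified` are hypothetical and not part of CIME or PIO, and the I/O task count uses a deliberately simplified assumption (one I/O task per full stride of component tasks) that may not match how PIO actually selects I/O tasks. It only illustrates why a stride larger than the component's task count looks suspicious.

```python
def proposed_pio_stride(max_mpitasks_per_node: int, ntasks: int) -> int:
    """Proposed rule: cap the stride at the component's own task count."""
    if max_mpitasks_per_node < 1 or ntasks < 1:
        raise ValueError("task counts must be positive")
    return min(max_mpitasks_per_node, ntasks)


def io_tasks_simplified(ntasks: int, stride: int) -> int:
    """Simplified assumption (not PIO's real logic): one I/O task per full stride."""
    if stride < 1:
        raise ValueError("stride must be positive")
    return ntasks // stride


if __name__ == "__main__":
    max_per_node = 128  # $MAX_MPITASKS_PER_NODE on Betzy, as reported above
    cice_tasks = 96     # tasks assigned to CICE in the failing run

    # Stride as originally configured, the manual workaround, and the proposal.
    for label, stride in [
        ("default", max_per_node),
        ("workaround", 8),
        ("proposed", proposed_pio_stride(max_per_node, cice_tasks)),
    ]:
        n_io = io_tasks_simplified(cice_tasks, stride)
        print(f"{label}: stride={stride}, I/O tasks (simplified)={n_io}")
```

Under this simplified model, the default stride of 128 leaves CICE with no complete stride of tasks, whereas both the workaround (8) and the capped value (96) leave at least one.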

@TomasTorsvik
Contributor

@matsbn - Is this the same issue that is being discussed in NorESM issue 539? If so, I suggest closing the issue here.

@matsbn
Contributor Author

matsbn commented Nov 27, 2024

Yes @TomasTorsvik, it is the same issue. I'll close this one.

@matsbn matsbn closed this as completed Nov 27, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in NorESM Development Nov 27, 2024