[BUG]: Parallel mesh creation fails when called with too many processes/too little data for process count #3588

Open
schnellerhase opened this issue Jan 6, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@schnellerhase
Contributor

schnellerhase commented Jan 6, 2025

Summarize the issue

Mesh generation with create_interval, create_rectangle and create_box fails with an MPI error in all dimensions when too many processes are used for too few mesh entities.

This should at least result in a user-readable error or, better yet, in a parallelized mesh that does not use all processes and possibly warns the caller about the questionable number of entities per process.
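
To make the requested behaviour concrete, here is a minimal sketch of the kind of check that could run before the mesh is distributed. The names RankPlan and plan_ranks are hypothetical and not part of DOLFINx, and the policy (warn only when some process would own no cells) is an assumption rather than the maintainers' decision.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper, not part of DOLFINx: decide how many ranks should
// take part in mesh creation and whether the caller should be warned.
struct RankPlan
{
  int active_ranks;                   // ranks that would own at least one cell
  std::optional<std::string> warning; // set when some ranks would stay empty
};

RankPlan plan_ranks(std::int64_t num_cells, int comm_size)
{
  // A communicator always has at least one rank.
  assert(comm_size >= 1);

  // Never use more ranks than there are cells, and always at least one.
  const int active = static_cast<int>(
      std::clamp<std::int64_t>(num_cells, 1, comm_size));

  if (active < comm_size)
  {
    return {active, "mesh has " + std::to_string(num_cells) + " cells but "
                        + std::to_string(comm_size)
                        + " processes; some processes will own no cells"};
  }
  return {active, std::nullopt};
}
```

In an MPI program, comm_size would come from MPI_Comm_size, and the active ranks could be grouped into a smaller communicator with MPI_Comm_split. Whether DOLFINx should do this internally or only emit the warning is the open question of this issue.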

How to reproduce the bug

The following test case succeeds in all dimensions when run sequentially, but fails once the process count is large enough.

TEMPLATE_TEST_CASE("BUG parallel mesh creation", "", double, float)
{
  auto interval
      = dolfinx::mesh::create_interval<TestType>(MPI_COMM_WORLD, 2, {0.0, 1.0});
  // auto square = dolfinx::mesh::create_rectangle<TestType>(
  //     MPI_COMM_WORLD, {{{0.0, 0.0}, {1.0, 1.0}}}, {1, 1},
  //     mesh::CellType::triangle);

  // auto cube = dolfinx::mesh::create_box<TestType>(
  //     MPI_COMM_WORLD, {{{0.0, 0.0, 0.0}, {1.0, 1.0, 1.0}}}, {1, 1, 1},
  //     mesh::CellType::tetrahedron);
}

Minimal Example (Python)

No response

Output (Python)

An error occurred in MPI_Neighbor_alltoallv

Version

main branch

DOLFINx git commit

No response

Installation

No response

Additional information

Observed in #3584

@jorgensd
Member

jorgensd commented Jan 6, 2025

I think it should be possible to make, say, an interval with N cells over M processes with N<=M.
We should probably add a warning that doing so is not recommended, but nothing in the internals of DOLFINx should have an issue with it (except, currently, the constructor?).

@schnellerhase
Contributor Author

> I think it should be possible to make, say, an interval with N cells over M processes with N<=M.

Agree.

Is there a good guide/rule on how one should pick a useful level of parallelization for a given problem?

@jorgensd
Member

jorgensd commented Jan 7, 2025

> > I think it should be possible to make, say, an interval with N cells over M processes with N<=M.
>
> Agree.
>
> Is there a good guide/rule on how one should pick a useful level of parallelization for a given problem?

It depends on the geometry (how all cells are connected) as well as on the kind of problem you are solving (type of spaces, number of degrees of freedom, etc.). What I've seen is that people recommend having between 10 000 and 100 000 dofs per process (sometimes up to 500 000 per process).
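
As a rough illustration of that rule of thumb, the sketch below turns a total dof count into a range of process counts. The function name suggested_process_range is hypothetical, and the default bounds are simply the 10 000 and 100 000 figures quoted above, so treat them as tunable assumptions rather than DOLFINx guidance.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns {fewest, most} processes that keep the
// dofs-per-process count within the given bounds. The defaults are the
// rule-of-thumb figures from this thread, not values defined by DOLFINx.
std::pair<std::int64_t, std::int64_t>
suggested_process_range(std::int64_t total_dofs,
                        std::int64_t min_dofs_per_process = 10'000,
                        std::int64_t max_dofs_per_process = 100'000)
{
  // Fewest processes: ceil(total / max), so no process exceeds the upper bound.
  const std::int64_t fewest = std::max<std::int64_t>(
      1, (total_dofs + max_dofs_per_process - 1) / max_dofs_per_process);

  // Most processes: floor(total / min), so each process keeps at least the
  // lower bound; never report fewer than the lower end of the range.
  const std::int64_t most
      = std::max<std::int64_t>(fewest, total_dofs / min_dofs_per_process);

  return {fewest, most};
}
```

For a problem with one million dofs this gives a range of 10 to 100 processes. Small problems collapse to a single process, which matches the point that tiny meshes gain little from being distributed.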

@chrisrichardson
Contributor

In my experience, DOLFINx seems to scale pretty well in memory, so you can just push up the number of dofs per process until you run out of memory. The numbers @jorgensd suggests are good.
