Diocotron error with Ginkgo #86

Open
PaulineVidal opened this issue Feb 7, 2025 · 16 comments

Comments

@PaulineVidal
Collaborator

PaulineVidal commented Feb 7, 2025

A Ginkgo error appears when running the diocotron simulation:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Ginkgo did not converge in MatrixBatchCsr
Aborted (core dumped)

It does not appear for

  • Circular mapping with a grid of 256x512.
  • Czarny mapping with a grid of 128x256.

It does appear for

  • Czarny mapping with a grid of 256x256.
  • Czarny mapping with a grid of 256x512.
  • Czarny mapping with parameters (1e-5, 1), which make it look like a circular mapping, with a grid of 256x512.

The simulation can be a bit non-smooth. Here is the density at t = 50 for the Czarny mapping with a grid of 128x256:
Image

@tpadioleau
Member

How many iterations does the linear solver perform?

@PaulineVidal
Collaborator Author

I don't know Ginkgo very well. I guess it is a parameter in Ginkgo; I just see m_gko_matrix->solve(m_x_init, b); in the code.
@AbdelhadiKara, maybe you know the answer better.

@AbdelhadiKara
Collaborator

AbdelhadiKara commented Feb 10, 2025

You can check the number of iterations by activating the logger. There is a boolean you can set to true when instantiating m_gko_matrix. You can have a look here.

PS: You can try to solve this issue by increasing the maximum number of iterations.

@PaulineVidal
Collaborator Author

PaulineVidal commented Feb 10, 2025

Yes, it no longer seems to fail when we set m_max_iter(max_iter.value_or(10000)) instead of m_max_iter(max_iter.value_or(1000)).
Ok, thank you 👍
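
Editorial note: the change above only swaps the fallback value passed to std::optional::value_or. A minimal sketch of how an optional max_iter argument resolves to a default, assuming a hypothetical SolverConfig class that stands in for the real wrapper (only the value_or pattern and the two values 1000 and 10000 come from the thread):

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the solver wrapper discussed in the thread.
// Only the value_or pattern is taken from the issue; the class itself is
// an assumption made for illustration.
class SolverConfig
{
public:
    // If the caller passes no value, fall back to the default cap of 1000.
    explicit SolverConfig(std::optional<int> max_iter = std::nullopt)
        : m_max_iter(max_iter.value_or(1000))
    {
        assert(m_max_iter > 0);
    }

    int max_iter() const { return m_max_iter; }

private:
    int m_max_iter;
};
```

With such a constructor, SolverConfig() keeps the default cap, while SolverConfig(10000) raises it for one simulation without editing the default itself.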

@tpadioleau
Member

Doesn't it mean the condition number is high? Could we find a better preconditioner?

@AbdelhadiKara
Collaborator

@tpadioleau the preconditioner used is Jacobi. It was the only one available. IMO it would be better to rely on the Kokkos Kernels solvers. They released a batched version of CG.
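
Editorial note: to illustrate what the iteration cap and the Jacobi preconditioner do, here is a minimal sketch of a Jacobi-preconditioned conjugate gradient on a small dense matrix. It is not Ginkgo's or Kokkos Kernels' batched implementation. The names jacobi_pcg and SolveResult, the dense storage, and the stopping test on the recurrence residual relative to the norm of b are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense, row-major; for illustration only

// Hypothetical result type: iteration count, convergence flag, final residual.
struct SolveResult
{
    int iterations;
    bool converged;
    double residual_norm;
};

static double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
    return y;
}

// Conjugate gradient with a Jacobi (diagonal) preconditioner.
// A must be symmetric positive definite; x holds the initial guess on entry
// and the approximate solution on exit.
SolveResult jacobi_pcg(const Mat& A, const Vec& b, Vec& x, double rel_tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n && x.size() == n);

    // Initial residual r = b - A x.
    Vec r = b;
    const Vec Ax = matvec(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] -= Ax[i];

    // Jacobi preconditioning: z = D^{-1} r, with D the diagonal of A.
    Vec z(n);
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
    Vec p = z;

    double rz = dot(r, z);
    const double threshold = rel_tol * std::sqrt(dot(b, b));
    double r_norm = std::sqrt(dot(r, r));

    int it = 0;
    while (r_norm > threshold && it < max_iter) {
        const Vec Ap = matvec(A, p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i]; // residual updated by recurrence
        }
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
        r_norm = std::sqrt(dot(r, r));
        ++it;
    }
    // Hitting max_iter before the threshold is reported as non-convergence.
    return {it, r_norm <= threshold, r_norm};
}
```

Calling it with a small max_iter on a system that needs more steps returns converged == false, which is the situation the error message in this issue reports. Raising the cap or improving the preconditioner are the two levers discussed in the thread.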

@tpadioleau
Member

@tpadioleau the preconditioner used is Jacobi. It was the only one available. IMO it would be better to rely on the Kokkos Kernels solvers. They released a batched version of CG.

How do you think using Kokkos Kernels would help?

Regarding Ginkgo, do you know if version 1.9 has other preconditioners?

@AbdelhadiKara
Collaborator

AbdelhadiKara commented Feb 11, 2025

For several reasons.

  • To reduce dependencies.
  • A totally personal point of view, but the Kokkos Kernels implementation seems more stable (robust).
  • If we need something in the future, it will be easier to ask the CExA team.

@tpadioleau
Member

For several reasons.

  • To reduce dependencies.
  • A totally personal point of view, but the Kokkos Kernels implementation seems more stable (robust).
  • If we need something in the future, it will be easier to ask the CExA team.

Overall I think I can agree, but I want to point out that it will not spare us the work of finding suitable preconditioners and stopping criteria.

@AbdelhadiKara
Collaborator

Thank you for these clarifications.
I agree with you. I've just checked, and neither release 1.9 nor the develop branch contains any preconditioner other than Jacobi for batch solvers.
As for the stopping criterion, I investigated this several months ago, and the Ginkgo documentation distinguishes between the "true" stopping criterion and the "implicit" one, which is internal to the solvers.

@tpadioleau
Member

Ok thanks!

@PaulineVidal
Collaborator Author

PaulineVidal commented Feb 17, 2025

For reference, here are the numbers of iterations needed for the simulation I mentioned:
Diocotron simulation on a Czarny mapping on a 256x512 grid.

csr_log.txt

(For the 700th time step, around 1900 iterations of the Poisson solver were needed.)

The modifications I made are on the branch pvidal_diocotron_on_czarny_map
(https://github.com/gyselax/gyselalibxx/blob/pvidal_diocotron_on_czarny_map/simulations/geometryRTheta/diocotron/diocotron.cpp)

@tpadioleau
Member

 System no. 0:
 Number of iterations = 1909
 Implicit residual norm = 2.07155e-18
 True (Ax-b) residual norm = 3.4037e-14
 Right-hand side (b) norm = 0.00214497
 --- System 0 did not converge! ---

Looking at the residual, it looks converged to me. It is possible that the last 1000 iterations are not improving the residual much. It could be that the requested tolerance is too strict. A colleague once told me that a rule of thumb for choosing the tolerance is roughly the machine precision divided by the condition number of the matrix.
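
Editorial note: the log above gives enough information to compute the relative residual, the true residual norm divided by the norm of b, which is one common way to judge convergence. A minimal sketch follows. The helper names relative_residual and meets_tolerance are hypothetical, and the thread does not say which tolerance or which norm the solver actually tests, so any tolerance passed in is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Relative residual ||Ax - b|| / ||b||, computed from the two norms
// printed by the logger.
double relative_residual(double true_residual_norm, double rhs_norm)
{
    assert(rhs_norm > 0.0);
    return true_residual_norm / rhs_norm;
}

// Checks the relative residual against a caller-supplied tolerance.
// The tolerance is an input here; the value used by the solver in the
// issue is not stated in the thread.
bool meets_tolerance(double true_residual_norm, double rhs_norm, double tol)
{
    return relative_residual(true_residual_norm, rhs_norm) <= tol;
}
```

For the logged values, relative_residual(3.4037e-14, 0.00214497) is about 1.59e-11, so whether the system counts as converged depends on how the requested tolerance compares with that figure.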

@AbdelhadiKara
Collaborator

I have found an interesting option in Ginkgo's CMakeLists:
option(GINKGO_JACOBI_FULL_OPTIMIZATIONS "Use all the optimizations for the CUDA Jacobi algorithm" OFF)
Perhaps switching this to ON will bring some improvements?

@tpadioleau
Member

@AbdelhadiKara What kind of optimization can we expect from this option?

@AbdelhadiKara
Collaborator

I have found at least two effects when this option is activated:

  • An increased Jacobi block size
  • Unrolling of some loops
