Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement KrylovPoissonSolver #3803

Open
glwagner opened this issue Sep 30, 2024 · 11 comments · May be fixed by #3812
Open

Implement KrylovPoissonSolver #3803

glwagner opened this issue Sep 30, 2024 · 11 comments · May be fixed by #3812
Labels
performance 🏍️ So we can get the wrong answer even faster

Comments

@glwagner
Copy link
Member

glwagner commented Sep 30, 2024

With @amontoison

We'd like to implement a Poisson solver that uses Krylov under the hood instead of our custom (preconditioned) conjugate gradient solver. This will open the door to more solvers (that may be more appropriate for our pressure Poisson equation than conjugate gradient) like conjugate residual, etc.

To make this work we need to overload some of Krylov's operators for Oceananigans Field:

  • kaxpby!(n, s, x, dx, y, dy) which performs y = y + s * x where n is the total length (eg Nx * Ny * Nz), s is the output, dx and dy are strides (irrelevant for us)
  • kaxpby!(n, s, x, dx, t, y, dy) which performs y = t * y + s * x where n is the total length (eg Nx * Ny * Nz), s is the output, dx and dy are strides (irrelevant for us)
  • kdot
  • knrm2
  • Either kcopy! or copyto!
@glwagner
Copy link
Member Author

We'll also write a simple script that solves a problem:

using Oceananigans
using Krylov

# make a grid
# choose a preconditioner
solver = Oceananigans.Solvers.KrylovPoissonSolver(grid; preconditioner=cool_preconditioner)

x = CenterField(grid)
b = CenterField(grid)
set!(b, rand(size(grid)...))
solve!(x, solver, b)

@glwagner
Copy link
Member Author

cc @xkykai

@glwagner glwagner added the performance 🏍️ So we can get the wrong answer even faster label Oct 1, 2024
@xkykai
Copy link
Collaborator

xkykai commented Oct 1, 2024

This is super exciting! Out of curiosity is there any/which Krylov solver is compatible on multiple GPUs? Seems to be an important bottleneck for our current PreconditionedConjugateGradientSolver approach.

@amontoison
Copy link

amontoison commented Oct 1, 2024

All Krylov solvers in Krylov.jl can take an operator as input, so if your operator-vector products can use multiple GPUs, it will already be faster.
We can develop a specific array type to limit communication and also use multiple GPUs on less expensive operations (norm, dot, axpy, ...), but simply using multiple GPUs for the most expensive part (operator-vector/operator-matrix products) will already lead to a significant speed-up.

ref: JuliaSmoothOptimizers/Krylov.jl#441

@glwagner
Copy link
Member Author

glwagner commented Oct 1, 2024

The only parts of the solver algorithm that require communication between nodes are norm and dot, right?

The "matrix product" --- a Poisson operator for us --- has unavoidable communication as well. But, we should be able to keep this limited in scope and we only need 1 halo.

The trickier part where I think there is room for significant optimization is in the development of an effective multi-GPU preconditioner that is also efficient in parallel.

@amontoison
Copy link

Yes, you are right.
For the multi-GPU preconditioner, we could start with a block-Jacobi preconditioner and then implement a multi-GPU version of AMG.

@glwagner
Copy link
Member Author

glwagner commented Oct 1, 2024

This is super exciting! Out of curiosity is there any/which Krylov solver is compatible on multiple GPUs? Seems to be an important bottleneck for our current PreconditionedConjugateGradientSolver approach.

Also to be clear about what this can do --- with Krylov, we can still use the FFT preconditioner. When we do that the parallelism issues are identical to the issues with our current CG solver, it's just that tweaking the solver method might allow us to do fewer iterations.

So there are two things going on in this discussion which are independent. First is whether conjugate gradient is optimal or whether we should use a different method. The second issue is the preconditioner, which is the more uncertain part but where we might have more gains.

@glwagner glwagner linked a pull request Oct 2, 2024 that will close this issue
@glwagner
Copy link
Member Author

glwagner commented Oct 2, 2024

Here's a cg implementation:

https://jso.dev/Krylov.jl/stable/examples/cg/

@amontoison is there an in-place version that accepts a preallocated solution x? We'd prefer that since we carefully manage memory and want to create the possibility of maximizing available memory on GPUs.

@amontoison
Copy link

amontoison commented Oct 2, 2024

@glwagner
All Krylov solvers have an in-place version where the first argument is a workspace that contains all storage needed by the solver.
You can recover the solution with solver.x or just solution(solver).

I wrote a section about in-place methods in the documentation:
https://jso.dev/Krylov.jl/dev/inplace/

The in-place version of CG is detailed here.

The cost in terms of storage of each solver is also documented:
https://jso.dev/Krylov.jl/dev/storage/

@amontoison
Copy link

amontoison commented Oct 2, 2024

We maybe need to adapt the constructor of the KrylovSolvers (workspaces) for your specific type.
Example: https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/main/src/krylov_solvers.jl#L162

It works fine for CPU / GPU arrays as well as various partitioned arrays but Field seems quite different.

@glwagner
Copy link
Member Author

glwagner commented Oct 2, 2024

Hmm yes. We can implement a custom constructor. Or we can capture the additional info needed for the "constructor" S in a closure.

We can also add constructors to Krylov with a more generic interface based on similar or deepcopy with a "template array". We do this for our solvers to achieve some generality even within the concept of Field (Field requires not only the size, but also "locations" on the staggered grid)

linear_operator_product = similar(template_field) # A*xᵢ = qᵢ
search_direction = similar(template_field) # pᵢ
residual = similar(template_field) # rᵢ

I think using a closure is simpler though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance 🏍️ So we can get the wrong answer even faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants