Implement KrylovPoissonSolver #3803

glwagner · 2024-09-30T15:58:49Z

We'd like to implement a Poisson solver that uses Krylov under the hood instead of our custom (preconditioned) conjugate gradient solver. This will open the door to more solvers (that may be more appropriate for our pressure Poisson equation than conjugate gradient) like conjugate residual, etc.

To make this work we need to overload some of Krylov's operators for Oceananigans Field:

kaxpby!(n, s, x, dx, y, dy) which performs y = y + s * x where n is the total length (eg Nx * Ny * Nz), s is the output, dx and dy are strides (irrelevant for us)
kaxpby!(n, s, x, dx, t, y, dy) which performs y = t * y + s * x where n is the total length (eg Nx * Ny * Nz), s is the output, dx and dy are strides (irrelevant for us)
kdot
knrm2
Either kcopy! or copyto!

The text was updated successfully, but these errors were encountered:

glwagner · 2024-09-30T16:00:28Z

We'll also write a simple script that solves a problem:

using Oceananigans
using Krylov

# make a grid
# choose a preconditioner
solver = Oceananigans.Solvers.KrylovPoissonSolver(grid; preconditioner=cool_preconditioner)

x = CenterField(grid)
b = CenterField(grid)
set!(b, rand(size(grid)...))
solve!(x, solver, b)

glwagner · 2024-09-30T16:02:27Z

cc @xkykai

xkykai · 2024-10-01T18:07:36Z

This is super exciting! Out of curiosity is there any/which Krylov solver is compatible on multiple GPUs? Seems to be an important bottleneck for our current PreconditionedConjugateGradientSolver approach.

amontoison · 2024-10-01T18:41:22Z

All Krylov solvers in Krylov.jl can take an operator as input, so if your operator-vector products can use multiple GPUs, it will already be faster.
We can develop a specific array type to limit communication and also use multiple GPUs on less expensive operations (norm, dot, axpy, ...), but simply using multiple GPUs for the most expensive part (operator-vector/operator-matrix products) will already lead to a significant speed-up.

ref: JuliaSmoothOptimizers/Krylov.jl#441

glwagner · 2024-10-01T19:28:21Z

The only parts of the solver algorithm that require communication between nodes are norm and dot, right?

The "matrix product" --- a Poisson operator for us --- has unavoidable communication as well. But, we should be able to keep this limited in scope and we only need 1 halo.

The trickier part where I think there is room for significant optimization is in the development of an effective multi-GPU preconditioner that is also efficient in parallel.

amontoison · 2024-10-01T20:02:06Z

Yes, you are right.
For the multi-GPU preconditioner, we could start with a block-Jacobi preconditioner and then implement a multi-GPU version of AMG.

glwagner · 2024-10-01T23:34:24Z

This is super exciting! Out of curiosity is there any/which Krylov solver is compatible on multiple GPUs? Seems to be an important bottleneck for our current PreconditionedConjugateGradientSolver approach.

Also to be clear about what this can do --- with Krylov, we can still use the FFT preconditioner. When we do that the parallelism issues are identical to the issues with our current CG solver, it's just that tweaking the solver method might allow us to do fewer iterations.

So there are two things going on in this discussion which are independent. First is whether conjugate gradient is optimal or whether we should use a different method. The second issue is the preconditioner, which is the more uncertain part but where we might have more gains.

glwagner · 2024-10-02T01:32:58Z

Here's a cg implementation:

https://jso.dev/Krylov.jl/stable/examples/cg/

@amontoison is there an in-place version that accepts a preallocated solution x? We'd prefer that since we carefully manage memory and want to create the possibility of maximizing available memory on GPUs.

amontoison · 2024-10-02T03:40:49Z

@glwagner
All Krylov solvers have an in-place version where the first argument is a workspace that contains all storage needed by the solver.
You can recover the solution with solver.x or just solution(solver).

I wrote a section about in-place methods in the documentation:
https://jso.dev/Krylov.jl/dev/inplace/

The in-place version of CG is detailed here.

The cost in terms of storage of each solver is also documented:
https://jso.dev/Krylov.jl/dev/storage/

amontoison · 2024-10-02T03:50:50Z

We maybe need to adapt the constructor of the KrylovSolvers (workspaces) for your specific type.
Example: https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/main/src/krylov_solvers.jl#L162

It works fine for CPU / GPU arrays as well as various partitioned arrays but Field seems quite different.

glwagner · 2024-10-02T14:36:47Z

Hmm yes. We can implement a custom constructor. Or we can capture the additional info needed for the "constructor" S in a closure.

We can also add constructors to Krylov with a more generic interface based on similar or deepcopy with a "template array". We do this for our solvers to achieve some generality even within the concept of Field (Field requires not only the size, but also "locations" on the staggered grid)

Oceananigans.jl/src/Solvers/conjugate_gradient_solver.jl

Lines 87 to 89 in 45838a5

    
           linear_operator_product = similar(template_field) # A*xᵢ = qᵢ 
        
           search_direction = similar(template_field) # pᵢ 
        
                   residual = similar(template_field) # rᵢ

I think using a closure is simpler though.

glwagner added the performance 🏍️ So we can get the wrong answer even faster label Oct 1, 2024

glwagner linked a pull request Oct 2, 2024 that will close this issue

Try using Krylov to solve the Poisson equation #3812

Draft

glwagner mentioned this issue Oct 18, 2024

Immediate NaNs running a non-hydrostatic model with an immersed boundary and open boundary conditions and immersed pressure solver #3831

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement KrylovPoissonSolver #3803

Implement KrylovPoissonSolver #3803

glwagner commented Sep 30, 2024 •

edited

Loading

glwagner commented Sep 30, 2024

glwagner commented Sep 30, 2024

xkykai commented Oct 1, 2024

amontoison commented Oct 1, 2024 •

edited

Loading

glwagner commented Oct 1, 2024 •

edited

Loading

amontoison commented Oct 1, 2024

glwagner commented Oct 1, 2024

glwagner commented Oct 2, 2024 •

edited

Loading

amontoison commented Oct 2, 2024 •

edited

Loading

amontoison commented Oct 2, 2024 •

edited

Loading

glwagner commented Oct 2, 2024

Implement KrylovPoissonSolver #3803

Implement KrylovPoissonSolver #3803

Comments

glwagner commented Sep 30, 2024 • edited Loading

glwagner commented Sep 30, 2024

glwagner commented Sep 30, 2024

xkykai commented Oct 1, 2024

amontoison commented Oct 1, 2024 • edited Loading

glwagner commented Oct 1, 2024 • edited Loading

amontoison commented Oct 1, 2024

glwagner commented Oct 1, 2024

glwagner commented Oct 2, 2024 • edited Loading

amontoison commented Oct 2, 2024 • edited Loading

amontoison commented Oct 2, 2024 • edited Loading

glwagner commented Oct 2, 2024

glwagner commented Sep 30, 2024 •

edited

Loading

amontoison commented Oct 1, 2024 •

edited

Loading

glwagner commented Oct 1, 2024 •

edited

Loading

glwagner commented Oct 2, 2024 •

edited

Loading

amontoison commented Oct 2, 2024 •

edited

Loading

amontoison commented Oct 2, 2024 •

edited

Loading