
merge Biot-Savart and tested Fast Multipole Method #81

Closed
wants to merge 52 commits

Conversation

@marinlauber (Member) commented Oct 23, 2023

Merge my own Biot-Savart branch with the original Biot-Savart boundary condition branch.

The files that are interesting now (23/11) are:

  1. test_BS.jl: Tests the velocity reconstruction on the Rankine vortex using both the full kernel integral and the fast (multigrid) one. The fast kernel matches the full kernel to an L2 error of around 1e-6 and is roughly 20x faster on this example.
  2. Biot-Savart.ipynb: Tests using Biot-Savart to impose the external-flow BC during simulations. Here we take a 2D cylinder flow, do a single time step, and try to recover the potential-flow solution.
  3. BiotSavart.jl: Contains the full kernel integration of Biot-Savart and other functions.

The other files are not particularly useful; I just kept them in case.

Current issues:

  • The iterative solution for the velocity boundary condition in the cylinder potential-flow case contains an error with the fast kernel but works with the full one (see Biot-Savart.ipynb). The error seems confined to the lower-left corner (?!)
  • The Biot-Savart kernel is specialized to 2D flows: `sign(i-j)*ω[I]*r[j]/(2π*r'*r + ϵ^2)` (see the sketch after this list). There are other difficulties in migrating to 3D flows as well.
  • MultiLevelPoisson allocates `L` and `x` arrays that are not used. (Specialise the Multigrid type?)
  • Include the correction during the Poisson-projection V-cycle.
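
For reference, here is a minimal, self-contained sketch of that 2D kernel sum (illustrative only, not the code in BiotSavart.jl; the function name and the shape of `centers` are assumptions):

```julia
# Full-kernel 2D Biot-Savart velocity component i at target point x, summing
# sign(i-j)*ω[I]*r[j]/(2π*r'*r + ϵ^2) over all vortex cells, with r = x - x_I.
function u_fullkernel(i, x, ω, centers; ϵ=1e-6)
    j = i % 2 + 1                              # the other in-plane component
    u = 0.0
    for (I, xI) in enumerate(centers)
        r = x .- xI                            # target minus source position
        u += sign(i - j) * ω[I] * r[j] / (2π * sum(abs2, r) + ϵ^2)
    end
    return u
end
```

For a single point vortex of strength Γ at the origin this reduces to the familiar swirl u = Γ/(2π|r|²)·(-r_y, r_x), up to the ϵ regularization.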

marinlauber and others added 11 commits September 7, 2023 10:16
The main problem with the previous version was that the `a.u .= 0` and `a.u ./= 2` lines in `mom_step!` kept modifying the exit-plane values. I changed these lines to only update values inside the domain.

I added a `flow.exit` field to the struct and the initialization functions and use it to trigger the convective exit behavior.

The `apply!` function had the opposite problem - it WASN'T filling in the boundary value. Now it does.

It also seemed weird to me that the convective exit value in Lotus was based on the old value at the boundary but on the u* value (before projection) upstream. I'm not sure it matters much, but it seems more consistent to just use the old values in both places, so that's what it does now.
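
For reference, a minimal sketch of that convective exit update, assuming a uniform convection speed U and unit grid spacing (the function name and defaults are illustrative, not the PR's code):

```julia
# Convective exit BC, ∂u/∂t + U ∂u/∂x = 0, discretized with the OLD boundary
# value and the OLD upstream value, as described above.
convective_exit(u_exit_old, u_up_old; U=1.0, Δt=1.0, Δx=1.0) =
    u_exit_old - U * Δt / Δx * (u_exit_old - u_up_old)
```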

Finally, I added a Lamb vortex dipole test case, which is nice since it should leave the exit without changing size or speed.
codecov bot commented Oct 23, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (adb4784) 93.89% compared to head (bb27850) 93.76%.
Report is 1 commit behind head on biot-savart.

❗ Current head bb27850 differs from pull request most recent head 7f8b5c3. Consider uploading reports for the commit 7f8b5c3 to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| src/util.jl | 84.61% | 2 Missing ⚠️ |
Additional details and impacted files
@@               Coverage Diff               @@
##           biot-savart      #81      +/-   ##
===============================================
- Coverage        93.89%   93.76%   -0.14%     
===============================================
  Files                8        8              
  Lines              377      385       +8     
===============================================
+ Hits               354      361       +7     
- Misses              23       24       +1     


weymouth and others added 17 commits October 24, 2023 17:18
This should reset the src files to their original state and move the multilevel stuff to a new top-level file for debugging.
Switched to the cell-centered version of the vorticity so the reduction works like it does for pressure. This looks correct in the plots but hasn't been quantified yet.

The boundary reconstruction error (compared to the analytic solution) for the Lamb vortex using the full vorticity field is 3e-6 when N=64 and reduces rapidly with more resolution.
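
A sketch of why the cell-centered choice helps (assumed 2D stencil, not the actual code): restriction to the coarse level is then just a sum over the 2×2 children, exactly like the pressure restriction in the multigrid solver.

```julia
# Coarse cell Ic collects its four fine children (2Ic-1, 2Ic) in each direction;
# divide by 4 for an average instead of a sum if that matches the pressure convention.
restrict_cell(ω, Ic::CartesianIndex{2}) =
    sum(ω[CartesianIndex(2 * Ic[1] - 1 + i, 2 * Ic[2] - 1 + j)] for i in 0:1, j in 0:1)
```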
The distance wasn't correct for the coarse levels, and I've fixed this. It's sort of working now, but hopefully there is still a bug, since the error is pretty high using the multilevel method.
Implemented a much more flexible multilevel indexing approach so that an arbitrary number of cells at any level can be labelled as "close enough" to x to define the next level of the search. Close enough can be adjusted using a new `dis` parameter, which balances speed with accuracy in the expansion. The "safe" distance is currently defined as proportional to dx at each level, although I think the increased uncertainty with depth may imply a larger (dx^2?) scaling may be more effective.
Testing with the current (dis=10) default gives errors around 1e-4 regardless of N, and requires around 40*log(N) kernel operations for every boundary point.
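
A hedged sketch of that selection rule (assumed names and scaling, not the actual implementation):

```julia
# At multilevel depth l the cell size is dx = 2^(l-1) (finest level l = 1).
# A coarse cell centered at xI is "close enough" to the target x to be refined
# at the next-finer level when it lies within dis*dx; otherwise its coarse
# contribution is used directly.
close_enough(x, xI, l; dis=10) = sum(abs2, x .- xI) ≤ (dis * 2.0^(l - 1))^2
```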
Got rid of the allocations, and we're now blazing fast.
Application to first convective step of circle flow is working.

Currently calling biotBC! outside the pressure solve and iterating. This works but seems a little unstable and currently takes 9 loops to get L₂(σ)<1e-5.

A function to update σ within the pressure solve might accelerate convergence.
Added a keyword-argument option to `project` so that I can set `itmx=1` to do only a single V-cycle before updating with `biotsavartBC!`. This did improve convergence a bit.

Next we need a `biotsavartResid!` function integrated inside `solver` so the pressure isn't messed up by iteration. Then I'll generalize it so it can be applied at every level.
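
To make the call pattern concrete, a pseudocode-level sketch of the outer loop described above; `project!`, `biotBC!`, `L₂`, and `σ` stand in for the PR's actual functions and fields:

```julia
# One V-cycle of the pressure solve per pass, then refresh the domain-face
# velocities from the vorticity, until the source term σ is small enough.
function iterate_biotBC!(flow, pois, σ; tol=1e-5, maxiter=20)
    for k in 1:maxiter
        project!(flow, pois; itmx=1)   # single V-cycle per outer pass
        biotBC!(flow)                  # reset domain-face velocities via Biot-Savart
        L₂(σ) < tol && return k        # converged after k outer loops
    end
    return maxiter
end
```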
I redefined `loc` to give the location relative to the domain-boundary corner (because it's a pain to deal with the 1.5 grid point shift consistently).

Also added the ability to have more than one buffer cell for `inside`.
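
For illustration, a sketch of the two conventions just described (assumed definitions, not the PR's exact code):

```julia
using StaticArrays

# Face-i location measured from the domain-boundary corner: a 1.5-cell shift
# for the ghost layer plus a half-cell shift along direction i.
loc_sketch(i, I::CartesianIndex{N}) where {N} =
    SVector(ntuple(j -> I.I[j] - 1.5 - 0.5 * (j == i), N))

# `inside` with an adjustable buffer width: drop `buff` ghost layers on each side.
inside_sketch(a::AbstractArray; buff=1) =
    CartesianIndices(map(ax -> first(ax)+buff:last(ax)-buff, axes(a)))
```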
The projection loop is working but we are using p to sequentially update u, ω, u_BC, and σ just to update r each iteration. This has some issues:
1. This causes pressure to be driven to zero as we iterate, so we lose the pressure for the time step.
2. It's a waste of compute to update all those intermediate fields.
3. It can't be integrated within `solve` directly.

It should be possible to go straight from p to ω to r, which would seem to solve everything. I've tried to code this up but it doesn't match the (working) indirect method yet.
I've got the residual update matching the brute force approach, but only if I use ω from u. The `ω_from_p` isn't working yet.

To get this working, I needed to zero the vorticity on the boundary and only compute the normal component of the velocity in `biotBC!`. It's also much more stable at increased resolution, so I increased to D=256.

The piece that I'm still fundamentally missing is why the boundary vorticity is so important. It goes to zero over the iterations, but convergence is much worse if it's excluded and faster if `biotBC!` is run repeatedly, ensuring the boundary vorticity can influence the residual.
I was just forgetting a factor of dt with the pressure-driven vorticity.

The current version keeps the pressure, doesn't go through intermediate variables, and converges to the correct solution (u_max=2) in 8 steps. I tried adding another residual update between the Vcycle and smooth steps, but it didn't speed anything up.

Also, the test case is very extreme. If I _don't_ use biotsavart, the solver takes 14 iterations to converge!
weymouth and others added 24 commits October 29, 2023 22:41
This is working, but the solver doesn't converge when the vorticity is too close to the boundary. Still, nice progress.
sum(resid) should be 0 for a conservative BC. Biot-Savart is conservative, but the multi-level approximation isn't. Increasing the window `dis` improves the sum only very slowly, so I've added an explicit correction instead. This turns out to be very important for the long-time behaviour of the pressure.
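
A sketch of the kind of explicit correction this refers to (illustrative only; the real code may distribute the fix differently): subtract the mean so the boundary residual sums to zero.

```julia
# Remove the net imbalance left by the multilevel approximation so that
# sum(resid) == 0, i.e. the boundary condition is globally conservative again.
conserve!(resid::AbstractArray) = (resid .-= sum(resid) / length(resid); resid)
```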
I've updated the code to run on the GPU. Unfortunately, running on the GPU with a 2^12 square domain results in a huge slow-down. Both `biotBC!` and `update_resid!` slow down by almost 10x!
3D!
I've refactored the velocity function and the BC function to work in both 2D and 3D. Unlike most of WaterLily, this does require a little code duplication because in 2D you only need one `ω` component and the weight is 1/2πr^2, whereas in 3D you need all three components and the weight is 1/4πr^3. I used dispatch on the dimension of the vector and got the repeated code down to only 2 lines each (41&42, 72&73).

I also added a 3D vortex test case, a Hill vortex. It's actually nicer than the Lamb dipole since it doesn't need special functions or autodiff, meaning it's probably the better case to roll into the final tests.

The new code works, but the accuracy sensitivity in 3D is quite different from 2D. Perhaps because the weights drop off cubically, using `dis=1` reconstructs the Hill vortex without issues. Instead, the accuracy is very sensitive to the size of the lowest MLArray level, which needs to be around 8x8x8 for the Hill vortex. This drives the computational cost, with the rest of the refinement levels adding constant time. Perhaps this is due to placing the center of vorticity at the cell centers, which is least accurate on the coarsest levels. Is the center computed using `|ω|` in 3D?

Between the larger number of boundary cells and the larger size of the lowest level, the 3D BC is pretty slow, ~0.5s on the Hill vortex with N=128. Hopefully, GPU to the rescue...
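
A minimal sketch of the dimension dispatch mentioned above (assumed helper name, not the PR's lines 41&42/72&73):

```julia
using StaticArrays

# Biot-Savart weight selected by dispatching on the dimension of the
# separation vector r: 1/(2π r²) in 2D versus 1/(4π r³) in 3D.
weight(r::SVector{2}) = 1 / (2π * sum(abs2, r))
weight(r::SVector{3}) = 1 / (4π * sum(abs2, r)^1.5)
```

In 2D only the out-of-plane ω component multiplies this weight, while in 3D the full ω × r cross product is needed, which is where the small amount of duplicated code comes from.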
Running and passing the 2D and 3D tests. But unfortunately, I'm seeing a huge slowdown!
I made a minimal example to help diagnose why I'm seeing a slow-down on the GPU. This example is stand-alone and doesn't require WaterLily. It does a local sum over a multi-level tuple, very similar to `u_ω`, but I'm clearly missing something, because I'm seeing a speed-up of 14x instead of a slow-down of 70x...
Very very close to the true example now, but still haven't found the issue. Using `loc(0,I)` does slow things down a lot on the GPU, but not 20x!
Closer still. VERY important to get the biotsavart inlined. Maybe that's it...?
The lambda function `biotsavart` was the issue. Marking the function `@inline` avoids a huge slow-down in the serial code. The triple indexing of ω caused the GPU to fall over because `permute` wasn't propagating `@inbounds`! I've made that change and now it's very fast.
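
To illustrate the two fixes (hypothetical kernel, not the PR's code):

```julia
# Small helpers like this must be marked @inline so they fuse into the calling
# kernel instead of becoming a dynamic call inside the hot loop.
@inline biotsavart(r, ωI; ϵ=1e-6) = ωI / (2π * sum(abs2, r) + ϵ^2)

# The repeated indexing of ω also needs @inbounds to propagate so bounds checks
# don't wreck the generated code.
function u_sum(x, centers, ω)
    u = 0.0
    @inbounds for I in eachindex(centers, ω)
        u += biotsavart(x .- centers[I], ω[I])
    end
    return u
end
```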
That worked.
Turns out that setting the workgroup size to 64 is the same as (64,1) in 2D and (64,1,1) in 3D. So slicing on i=1 and then using that workgroup means trying to split 1 thing 64 ways and not splitting the other dimensions at all!!! This was the REAL problem with our GPU BC.

Fixed this by creating a tuple in @loop that assigns 64 to size(R)'s largest dimension. I'm sure this can be further optimized, but it keeps performance the same when using @inside, and makes all the i=1-slices much faster.

Tested and working in 2D and 3D vortices and the 2D circle case.
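
A sketch of the workgroup-shape fix (illustrative, not the actual `@loop` macro change): assign the 64 work-items to the largest extent of the index range `R`.

```julia
# For an i=1 boundary slice in 3D, size(R) is something like (1, N, N); the old
# scheme effectively asked for (64, 1, 1) and split a length-1 dimension 64 ways.
function workgroup(R::CartesianIndices{N}; items=64) where {N}
    imax = argmax(size(R))                   # index of the largest extent
    ntuple(i -> i == imax ? items : 1, N)    # e.g. (1, 64, 1) for that slice
end
```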
Refactored the projection step so that the (still somewhat expensive) u_ω function is called only 1+nᵖ times per projection (down from 3+nᵖ). If nᵖ=1 after prediction and 0 after correction, this means 3 calls vs 7.
On the downside, if nᵖ>1, then p is overwritten and the velocity is (wastefully) updated more than once.
1. Compute ω from both u* AND old grad(p). Then we only call u_ω once on each domain face to find the initial residual.
2. If r₂<tol, update u with old grad(p). We already accounted for this, so no u_ω calls needed
3. If not, do a Vcycle, update u with new grad(p), and fix the domain faces with u_ω.
4. If still r₂>tol, reset p=0 and loop 3 to convergence.
5. Once outside the loop, update the ghost values, but not the domain faces.
This avoids ever overwriting the pressure but still only updates the domain faces 1+nᵖ times!
Results are really good!
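
A pseudocode-level sketch of the sequence above, just to make the 1+nᵖ call count concrete; every name here (`u_ω_faces!`, `residual`, `Vcycle!`, `smooth!`, `update_u!`, `ghostBC!`) is a placeholder rather than the PR's actual API:

```julia
function biot_project_sketch!(flow, pois; tol=1e-5, itmx=32)
    u_ω_faces!(flow, pois)            # call 1: face residual from ω(u*, old ∇p)
    nᵖ = 0
    while residual(pois) > tol && nᵖ < itmx
        Vcycle!(pois); smooth!(pois)  # one multigrid pass on the pressure
        update_u!(flow, pois)         # update u inside the domain with the new ∇p
        u_ω_faces!(flow, pois)        # +1 call per pass, only on the domain faces
        nᵖ += 1
    end
    ghostBC!(flow)                    # finally update the ghosts, not the faces
    return nᵖ                         # total u_ω cost: 1 + nᵖ face sweeps
end
```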
Attempt to implement the 3D version of `biot_project!`, which is not working yet. I tried increasing the 3D value of dis to 5, and while this makes it diverge a tiny bit more slowly, it's still completely broken.
The residual "fix" was the piece hard-coded for 2D operations. Found it, fixed it. Tested in 2D and 3D, for both Array and CuArray. Working nicely!
@weymouth (Collaborator) commented Dec 5, 2023

This is too experimental to merge into the main repository; we're going to start a new repo.

@weymouth closed this Dec 5, 2023