
merge Biot-Savart and tested Fast Multipole Method #81

Closed
wants to merge 52 commits

Conversation

@marinlauber (Member) commented Oct 23, 2023

Merge my own Biot-Savart branch with the original Biot-Savart boundary condition branch.

The files that are interesting now (23/11) are:

  1. test_BS.jl: Tests the velocity reconstruction on the Rankine vortex using both the full kernel integral and the fast (multigrid) one. The fast kernel matches the full kernel to an L2 error of around 1e-6 and is roughly 20x faster on this example.
  2. Biot-Savart.ipynb: Tests using Biot-Savart to impose the external-flow BC during simulations. Here we take a 2D cylinder flow, do a single time step, and try to recover the potential-flow solution.
  3. BiotSavart.jl: Contains the full kernel integration of Biot-Savart and other functions.

The other files are not particularly useful; I just kept them in case.

Current issues:

  • The iterative solution for the velocity boundary condition in the cylinder potential-flow case contains an error with the fast kernel but works with the full one (see Biot-Savart.ipynb). The error seems confined to the lower-left corner (?!)
  • The Biot-Savart kernel is specialized to 2D flows: `sign(i-j)*ω[I]*r[j]/(2π*r'*r + ϵ^2)` (see the sketch after this list). There are other difficulties in migrating to 3D flows as well.
  • MultiLevelPoisson allocates `L` and `x` arrays that are not used. (Specialise the Multigrid type?)
  • Include the correction during the Poisson-projection V-cycle.
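
For reference, here is a minimal, self-contained sketch of that 2D kernel sum (illustrative only, not the code in BiotSavart.jl; the function name and the shape of `centers` are assumptions):

```julia
# Full-kernel 2D Biot-Savart velocity component i at target point x, summing
# sign(i-j)*ω[I]*r[j]/(2π*r'*r + ϵ^2) over all vortex cells, with r = x - x_I.
function u_fullkernel(i, x, ω, centers; ϵ=1e-6)
    j = i % 2 + 1                              # the other in-plane component
    u = 0.0
    for (I, xI) in enumerate(centers)
        r = x .- xI                            # target minus source position
        u += sign(i - j) * ω[I] * r[j] / (2π * sum(abs2, r) + ϵ^2)
    end
    return u
end
```

For a single point vortex of strength Γ at the origin this reduces to the familiar swirl u = Γ/(2π|r|²)·(-r_y, r_x), up to the ϵ regularization.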

marinlauber and others added 11 commits September 7, 2023 10:16
The main problem with the previous version was that the `a.u .= 0` and `a.u ./= 2` lines in `mom_step!` kept modifying the exit-plane values. I changed these lines to only update values inside the domain.

I added a `flow.exit` field to the struct and the initialization functions and use it to trigger the convective exit behavior.

The `apply!` function had the opposite problem - it WASN'T filling in the boundary value. Now it does.

It also seemed weird to me that the convective exit value in Lotus was based on the old value at the boundary but on the u* value (before projection) upstream. I'm not sure it matters much, but it seems more consistent to just use the old values in both places, so that's what it does now.
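
For reference, a minimal sketch of that convective exit update, assuming a uniform convection speed U and unit grid spacing (the function name and defaults are illustrative, not the PR's code):

```julia
# Convective exit BC, ∂u/∂t + U ∂u/∂x = 0, discretized with the OLD boundary
# value and the OLD upstream value, as described above.
convective_exit(u_exit_old, u_up_old; U=1.0, Δt=1.0, Δx=1.0) =
    u_exit_old - U * Δt / Δx * (u_exit_old - u_up_old)
```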

Finally, I added a Lamb vortex dipole test case, which is nice since it should leave the exit without changing size or speed.
codecov bot commented Oct 23, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (adb4784) 93.89% compared to head (bb27850) 93.76%.
Report is 1 commit behind head on biot-savart.

❗ Current head bb27850 differs from pull request most recent head 7f8b5c3. Consider uploading reports for the commit 7f8b5c3 to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| src/util.jl | 84.61% | 2 Missing ⚠️ |
Additional details and impacted files
@@               Coverage Diff               @@
##           biot-savart      #81      +/-   ##
===============================================
- Coverage        93.89%   93.76%   -0.14%     
===============================================
  Files                8        8              
  Lines              377      385       +8     
===============================================
+ Hits               354      361       +7     
- Misses              23       24       +1     


weymouth and others added 17 commits October 24, 2023 17:18
This should reset the src files to their original state and move the multilevel stuff to a new top-level file for debugging.
Switched to the cell-centered version of the vorticity so the reduction works like it does for pressure. This looks correct in the plots but hasn't been quantified yet.

The boundary reconstruction error (compared to the analytic solution) for the Lamb vortex using the full vorticity field is 3e-6 when N=64 and reduces rapidly with more resolution.
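
A sketch of why the cell-centered choice helps (assumed 2D stencil, not the actual code): restriction to the coarse level is then just a sum over the 2×2 children, exactly like the pressure restriction in the multigrid solver.

```julia
# Coarse cell Ic collects its four fine children (2Ic-1, 2Ic) in each direction;
# divide by 4 for an average instead of a sum if that matches the pressure convention.
restrict_cell(ω, Ic::CartesianIndex{2}) =
    sum(ω[CartesianIndex(2 * Ic[1] - 1 + i, 2 * Ic[2] - 1 + j)] for i in 0:1, j in 0:1)
```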
The distance wasn't correct for the coarse levels, and I've fixed this. It's sort of working now, but hopefully there is still a bug, since the error is pretty high using the multilevel method.
Implemented a much more flexible multilevel indexing approach so that an arbitrary number of cells at any level can be labelled as "close enough" to x to define the next level of the search. Close enough can be adjusted using a new `dis` parameter, which balances speed with accuracy in the expansion. The "safe" distance is currently defined as proportional to dx at each level, although I think the increased uncertainty with depth may imply a larger (dx^2?) scaling may be more effective.
Testing with the current (dis=10) default gives errors around 1e-4 regardless of N, and requires around 40*log(N) kernel operations for every boundary point.
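
A hedged sketch of that selection rule (assumed names and scaling, not the actual implementation):

```julia
# At multilevel depth l the cell size is dx = 2^(l-1) (finest level l = 1).
# A coarse cell centered at xI is "close enough" to the target x to be refined
# at the next-finer level when it lies within dis*dx; otherwise its coarse
# contribution is used directly.
close_enough(x, xI, l; dis=10) = sum(abs2, x .- xI) ≤ (dis * 2.0^(l - 1))^2
```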
Got rid of the allocations, and we're now blazing fast.
Application to first convective step of circle flow is working.

Currently calling biotBC! outside the pressure solve and iterating. This works but seems a little unstable and currently takes 9 loops to get L₂(σ)<1e-5.

A function to update σ within the pressure solve might accelerate convergence.
Added a keyword-argument option to `project` so that I can set `itmx=1` to do only a single V-cycle before updating with `biotsavartBC!`. This did improve convergence a bit.

Next we need a `biotsavartResid!` function integrated inside `solver` so the pressure isn't messed up by iteration. Then I'll generalize it so it can be applied at every level.
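
To make the call pattern concrete, a pseudocode-level sketch of the outer loop described above; `project!`, `biotBC!`, `L₂`, and `σ` stand in for the PR's actual functions and fields:

```julia
# One V-cycle of the pressure solve per pass, then refresh the domain-face
# velocities from the vorticity, until the source term σ is small enough.
function iterate_biotBC!(flow, pois, σ; tol=1e-5, maxiter=20)
    for k in 1:maxiter
        project!(flow, pois; itmx=1)   # single V-cycle per outer pass
        biotBC!(flow)                  # reset domain-face velocities via Biot-Savart
        L₂(σ) < tol && return k        # converged after k outer loops
    end
    return maxiter
end
```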
I redefined `loc` to give the location relative to the domain-boundary corner (because it's a pain to deal with the 1.5 grid point shift consistently).

Also added the ability to have more than one buffer cell for `inside`.
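
For illustration, a sketch of the two conventions just described (assumed definitions, not the PR's exact code):

```julia
using StaticArrays

# Face-i location measured from the domain-boundary corner: a 1.5-cell shift
# for the ghost layer plus a half-cell shift along direction i.
loc_sketch(i, I::CartesianIndex{N}) where {N} =
    SVector(ntuple(j -> I.I[j] - 1.5 - 0.5 * (j == i), N))

# `inside` with an adjustable buffer width: drop `buff` ghost layers on each side.
inside_sketch(a::AbstractArray; buff=1) =
    CartesianIndices(map(ax -> first(ax)+buff:last(ax)-buff, axes(a)))
```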
The projection loop is working but we are using p to sequentially update u, ω, u_BC, and σ just to update r each iteration. This has some issues:
1. This causes pressure to be driven to zero as we iterate, so we lose the pressure for the time step.
2. It's a waste of compute to update all those intermediate fields.
3. It can't be integrated within `solve` directly.

It should be possible to go straight from p to ω to r, which would seem to solve everything. I've tried to code this up but it doesn't match the (working) indirect method yet.
I've got the residual update matching the brute force approach, but only if I use ω from u. The `ω_from_p` isn't working yet.

To get this working, I needed to zero the vorticity on the boundary and only compute the normal component of the velocity in `biotBC!`. It's also much more stable at increased resolution, so I increased to D=256.

The piece that I'm still fundamentally missing is why the boundary vorticity is so important. It goes to zero over the iterations, but convergence is much worse if it's excluded and faster if `biotBC!` is run repeatedly, ensuring the boundary vorticity can influence the residual.
I was just forgetting a factor of dt with the pressure-driven vorticity.

The current version keeps the pressure, doesn't go through intermediate variables, and converges to the correct solution (u_max=2) in 8 steps. I tried adding another residual update between the Vcycle and smooth steps, but it didn't speed anything up.

Also, the test case is very extreme. If I _don't_ use biotsavart, the solver takes 14 iterations to converge!
weymouth and others added 24 commits October 29, 2023 22:41
This is working, but the solver doesn't converge when the vorticity is too close to the boundary. Still, nice progress.
sum(resid) should be 0 for a conservative BC. Biot-Savart is conservative, but the multi-level approximation isn't. Increasing the window `dis` improves the sum only very slowly, so I've added an explicit correction instead. This turns out to be very important for the long-time behaviour of the pressure.
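
A sketch of the kind of explicit correction this refers to (illustrative only; the real code may distribute the fix differently): subtract the mean so the boundary residual sums to zero.

```julia
# Remove the net imbalance left by the multilevel approximation so that
# sum(resid) == 0, i.e. the boundary condition is globally conservative again.
conserve!(resid::AbstractArray) = (resid .-= sum(resid) / length(resid); resid)
```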
I've updated the code to run on the GPU. Unfortunately, running on the GPU with a 2^12 square domain results in a huge slow-down. Both `biotBC!` and `update_resid!` slow down by almost 10x!
3D!
I've refactored the velocity function and the BC function to work in both 2D and 3D. Unlike most of WaterLily, this does require a little code duplication because in 2D you only need one `ω` component and the weight is 1/2πr^2, whereas in 3D you need all three components and the weight is 1/4πr^3. I used dispatch on the dimension of the vector and got the repeated code down to only 2 lines each (41&42, 72&73).

I also added a 3D vortex test case, a Hill vortex. It's actually nicer than the Lamb dipole since it doesn't need special functions or autodiff, meaning it's probably the better case to roll into the final tests.

The new code works, but the accuracy sensitivity in 3D is quite different from 2D. Perhaps because the weights drop off cubically, using `dis=1` reconstructs the Hill vortex without issues. Instead, the accuracy is very sensitive to the size of the lowest MLArray level, which needs to be around 8x8x8 for the Hill vortex. This drives the computational cost, with the rest of the refinement levels adding constant time. Perhaps this is due to placing the center of vorticity at the cell centers, which is least accurate on the coarsest levels. Is the center computed using `|ω|` in 3D?

Between the larger number of boundary cells and the larger size of the lowest level, the 3D BC is pretty slow, ~0.5s on the Hill vortex with N=128. Hopefully, GPU to the rescue...
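
A minimal sketch of the dimension dispatch mentioned above (assumed helper name, not the PR's lines 41&42/72&73):

```julia
using StaticArrays

# Biot-Savart weight selected by dispatching on the dimension of the
# separation vector r: 1/(2π r²) in 2D versus 1/(4π r³) in 3D.
weight(r::SVector{2}) = 1 / (2π * sum(abs2, r))
weight(r::SVector{3}) = 1 / (4π * sum(abs2, r)^1.5)
```

In 2D only the out-of-plane ω component multiplies this weight, while in 3D the full ω × r cross product is needed, which is where the small amount of duplicated code comes from.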
Running and passing the 2D and 3D tests. But unfortunately, I'm seeing a huge slowdown!
I made a minimal example to help diagnose why I'm seeing a slow-down on the GPU. This example is stand-alone and doesn't require WaterLily. It does a local sum over a multi-level tuple, very similar to `u_ω`, but I'm clearly missing something, because I'm seeing a speed-up of 14x instead of a slow-down of 70x...
Very very close to the true example now, but still haven't found the issue. Using `loc(0,I)` does slow things down a lot on the GPU, but not 20x!
Closer still. VERY important to get the biotsavart inlined. Maybe that's it...?
The lambda function `biotsavart` was the issue. Marking the function `@inline` avoids a huge slow-down in the serial code. The triple indexing of ω caused the GPU to fall over because `permute` wasn't propagating `@inbounds`! I've made that change and now it's very fast.
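
To illustrate the two fixes (hypothetical kernel, not the PR's code):

```julia
# Small helpers like this must be marked @inline so they fuse into the calling
# kernel instead of becoming a dynamic call inside the hot loop.
@inline biotsavart(r, ωI; ϵ=1e-6) = ωI / (2π * sum(abs2, r) + ϵ^2)

# The repeated indexing of ω also needs @inbounds to propagate so bounds checks
# don't wreck the generated code.
function u_sum(x, centers, ω)
    u = 0.0
    @inbounds for I in eachindex(centers, ω)
        u += biotsavart(x .- centers[I], ω[I])
    end
    return u
end
```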
That worked.
Turns out that setting the workgroup size to 64 is the same as (64,1) in 2D and (64,1,1) in 3D. So slicing on i=1 and then using that workgroup means trying to split 1 thing 64 ways and not splitting the other dimensions at all!!! This was the REAL problem with our GPU BC.

Fixed this by creating a tuple in @loop that assigns 64 to size(R)'s largest dimension. I'm sure this can be further optimized, but it keeps performance the same when using @inside, and makes all the i=1-slices much faster.

Tested and working in 2D and 3D vortices and the 2D circle case.
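
A sketch of the workgroup-shape fix (illustrative, not the actual `@loop` macro change): assign the 64 work-items to the largest extent of the index range `R`.

```julia
# For an i=1 boundary slice in 3D, size(R) is something like (1, N, N); the old
# scheme effectively asked for (64, 1, 1) and split a length-1 dimension 64 ways.
function workgroup(R::CartesianIndices{N}; items=64) where {N}
    imax = argmax(size(R))                   # index of the largest extent
    ntuple(i -> i == imax ? items : 1, N)    # e.g. (1, 64, 1) for that slice
end
```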
Refactored the projection step so that the (still somewhat expensive) u_ω function is called only 1+nᵖ times per projection (down from 3+nᵖ). If nᵖ=1 after prediction and 0 after correction, this means 3 calls vs 7.
On the downside, if nᵖ>1, then p is overwritten and the velocity is (wastefully) updated more than once.
1. Compute ω from both u* AND old grad(p). Then we only call u_ω once on each domain face to find the initial residual.
2. If r₂<tol, update u with old grad(p). We already accounted for this, so no u_ω calls needed
3. If not, do a Vcycle, update u with new grad(p), and fix the domain faces with u_ω.
4. If still r₂>tol, reset p=0 and loop 3 to convergence.
5. Once outside the loop, update the ghost values, but not the domain faces.
This avoids ever overwriting the pressure but still only updates the domain faces 1+nᵖ times!
Results are really good!
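
A pseudocode-level sketch of the sequence above, just to make the 1+nᵖ call count concrete; every name here (`u_ω_faces!`, `residual`, `Vcycle!`, `smooth!`, `update_u!`, `ghostBC!`) is a placeholder rather than the PR's actual API:

```julia
function biot_project_sketch!(flow, pois; tol=1e-5, itmx=32)
    u_ω_faces!(flow, pois)            # call 1: face residual from ω(u*, old ∇p)
    nᵖ = 0
    while residual(pois) > tol && nᵖ < itmx
        Vcycle!(pois); smooth!(pois)  # one multigrid pass on the pressure
        update_u!(flow, pois)         # update u inside the domain with the new ∇p
        u_ω_faces!(flow, pois)        # +1 call per pass, only on the domain faces
        nᵖ += 1
    end
    ghostBC!(flow)                    # finally update the ghosts, not the faces
    return nᵖ                         # total u_ω cost: 1 + nᵖ face sweeps
end
```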
Attempt to implement the 3D version of `biot_project!`, which is not working yet. I tried increasing the 3D value of dis to 5, and while this makes it diverge a tiny bit more slowly, it's still completely broken.
The residual "fix" was the piece hard-coded for 2D operations. Found it, fixed it. Tested in 2D and 3D, for both Array and CuArray. Working nicely!
@weymouth (Collaborator) commented Dec 5, 2023

This is too experimental to merge into the main repository; we're going to start a new repo.

@weymouth closed this Dec 5, 2023