Performance numbers #15 (Closed)

loliverhennigh opened this issue Aug 23, 2023 · 4 comments

@loliverhennigh (Contributor)

Are there any numbers we can see for performance? I've been interested to see whether a JAX LBM implementation can get close to optimal performance. Lettuce LBM, for example, is around 20x slower: https://github.com/lettucecfd/lettuce
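(As an aside: LBM throughput is conventionally reported in MLUPs, million lattice updates per second. Below is a rough way one might measure it for a jitted JAX update, assuming a `step` function like the one sketched in the reply below; the harness is illustrative, not from XLB or Lettuce.)

```python
import time

# MLUPs = (grid cells x iterations) / wall time / 1e6, assuming a jitted
# JAX update step(f) -> f over a 2D lattice of populations f[x, y, q].
def measure_mlups(step, f, n_iter=100):
    f = step(f)                # first call triggers XLA compilation
    f.block_until_ready()      # JAX dispatch is async; sync before timing
    t0 = time.perf_counter()
    for _ in range(n_iter):
        f = step(f)
    f.block_until_ready()
    elapsed = time.perf_counter() - t0
    return f.shape[0] * f.shape[1] * n_iter / elapsed / 1e6
```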

@mehdiataei (Contributor) commented Aug 23, 2023

Hello Oliver,

We will soon release the performance metrics in an accompanying paper. Roughly speaking, compared to a fully fused LBM kernel in a state-of-the-art C++ benchmark code, our version is approximately 6-7 times slower for lid-driven cavity flow. However, if the boundary-condition kernel in the C++ code isn't fused (which is often the case if you want to leverage the kind of complex BCs that are available in XLB), the gap narrows to roughly 3-5 times; this is also the case if you compare performance with periodic BCs, as in Lettuce's performance test case. While I haven't run tests on a V100, preliminary tests suggest XLB is significantly faster than Lettuce.
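(To make "fully fused" concrete: in JAX, collision and streaming are traced into one XLA program under `jit`, and XLA may fuse them into a handful of kernels, whereas a general boundary-condition pass like XLB's tends to break that fusion. Below is a minimal D2Q9/BGK sketch of such a step with periodic boundaries; it is illustrative only, not XLB's actual kernels.)

```python
import jax
import jax.numpy as jnp
import numpy as np

# D2Q9 lattice: discrete velocities and quadrature weights.
C = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
W = jnp.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    cu = jnp.einsum('qd,xyd->xyq', jnp.asarray(C, dtype=u.dtype), u)
    usq = jnp.sum(u * u, axis=-1, keepdims=True)
    return rho[..., None] * W * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

@jax.jit
def step(f, omega=1.0):
    rho = jnp.sum(f, axis=-1)                                    # density
    u = jnp.einsum('xyq,qd->xyd', f, jnp.asarray(C, dtype=f.dtype)) / rho[..., None]
    f = f + omega * (equilibrium(rho, u) - f)                    # BGK collision
    # Periodic streaming: shift each population along its lattice velocity.
    return jnp.stack(
        [jnp.roll(f[..., q], shift=tuple(C[q]), axis=(0, 1)) for q in range(9)],
        axis=-1)
```

Starting from a uniform state at rest, e.g. `f = jnp.broadcast_to(W, (512, 512, 9))`, repeated calls to `step` stay entirely on the device.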

A major advantage is that our code has ~96% scaling efficiency on a single DGX node and maintains respectable scaling on up to 512 GPUs. As far as I remember, Lettuce wasn't multi-GPU (or multi-node) capable.
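(Schematically, the multi-GPU pattern in JAX looks like the toy below: the lattice is split into slabs, one per device under `pmap`, and boundary rows are exchanged with `lax.ppermute` after each local update. This reuses the `step` sketch above and only illustrates the communication pattern; a real scheme exchanges just the populations that cross a slab boundary.)

```python
import jax
import jax.numpy as jnp
from jax import lax

n_dev = jax.device_count()

def halo_exchange(f):
    # f: one slab of shape (nx_local, ny, 9); the domain is split along x.
    send_right = [(i, (i + 1) % n_dev) for i in range(n_dev)]
    send_left  = [(i, (i - 1) % n_dev) for i in range(n_dev)]
    from_left  = lax.ppermute(f[-1], 'x', perm=send_right)  # left neighbour's last row
    from_right = lax.ppermute(f[0], 'x', perm=send_left)    # right neighbour's first row
    return f.at[0].set(from_left).at[-1].set(from_right)

# One program per device; 'x' names the device axis for the collectives.
# Input shape: (n_dev, nx // n_dev, ny, 9).
parallel_step = jax.pmap(lambda f: halo_exchange(step(f)), axis_name='x')
```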

It's worth noting that there is ongoing work to close this performance gap further by integrating Triton kernels into portions of the code.

@loliverhennigh (Contributor, Author)

Fantastic! This is very exciting to hear. Have you considered using either Taichi Lang or Warp (https://github.com/NVIDIA/warp) for writing the kernels? I have experience with both and found them to be particularly good for things like this. I have an LBM solver implemented in Warp and am getting the same performance as FluidX3D (https://github.com/ProjectPhysX/FluidX3D). Warp also has pretty good JAX integration, I think. I haven't tried implementing LBM in Taichi, but I have an explicit finite-volume solver there, and it appears to be getting state-of-the-art performance, although I am less confident of that. I have also tried Triton a bit but found it a little difficult to get working for this kind of work. If you do implement in Triton, I will be very interested to see how it goes though :).
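(For readers who haven't used it: Warp is NVIDIA's Python-embedded kernel DSL, where you write explicit per-thread kernels, the imperative model that makes a fully fused LBM update natural to express. A trivial kernel just to show the flavor; nothing below is from Oliver's solver.)

```python
import numpy as np
import warp as wp

wp.init()

# One thread per element, explicit indexing via wp.tid(), compiled to CUDA/CPU.
@wp.kernel
def axpy(a: float, x: wp.array(dtype=float), y: wp.array(dtype=float)):
    i = wp.tid()
    y[i] = a * x[i] + y[i]

n = 1 << 20
x = wp.array(np.ones(n, dtype=np.float32))
y = wp.array(np.zeros(n, dtype=np.float32))
wp.launch(axpy, dim=n, inputs=[2.0, x, y])
wp.synchronize()
print(y.numpy()[:4])  # [2. 2. 2. 2.]
```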

@loliverhennigh (Contributor, Author)

Sorry, one more comment: if you are interested in getting rendering working like in FluidX3D, I would also suggest looking at either Warp or Taichi. Implementing ray marching/tracing is kinda complicated in a tensor-based framework like JAX; I can't imagine implementing it in Triton. Here is a very simple ray march over the density contours of an FV solver in Taichi: https://www.youtube.com/watch?v=xcZcHbvMe-g.
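(For what it's worth, ray marching is expressible in JAX, just awkwardly: every ray takes the same fixed number of steps, with no per-ray early exit. A bare-bones absorption-only sketch over a 3D density field, with illustrative names throughout:)

```python
import jax
import jax.numpy as jnp

# Fixed-step ray march, tensor-style: no early exit, nearest-neighbour sampling.
def march(density, origin, direction, n_steps=128, dt=1.0):
    def body(carry, i):
        transmittance, brightness = carry
        pos = origin + i * dt * direction
        idx = jnp.clip(pos.astype(jnp.int32), 0, jnp.array(density.shape) - 1)
        sigma = density[idx[0], idx[1], idx[2]]
        alpha = 1.0 - jnp.exp(-sigma * dt)            # opacity of this segment
        brightness = brightness + transmittance * alpha
        return (transmittance * (1.0 - alpha), brightness), None

    init = (jnp.ones((), density.dtype), jnp.zeros((), density.dtype))
    (_, brightness), _ = jax.lax.scan(body, init, jnp.arange(n_steps))
    return brightness

# One ray per pixel: vmap over ray directions, jit the whole render.
render = jax.jit(jax.vmap(march, in_axes=(None, None, 0)))
```

A framework like Taichi or Warp lets each ray terminate independently inside the kernel, which is exactly the part that is hard to express in a tensor program.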

@mehdiataei (Contributor) commented Aug 23, 2023

Hey Oliver.

Thanks for your comments. In fact, we have discussed using Warp for visualization tasks extensively with NVIDIA! (We are collaborating with the NVIDIA JAX team on this project, FYI.) Warp and USD would be quite useful for this purpose, especially when dealing with multi-billion-voxel simulations.

The issue with Warp is its license agreement, which is incompatible with Apache 2.0. I have raised this issue with directors at NVIDIA but haven't heard back about any progress.

I am happy to chat more about this if you're interested. I have added you on LinkedIn, or please shoot me an email at [email protected].
