
Tuning compute_tau_minor_absorption_kernel #8

Open · Chiil opened this issue May 20, 2021 · 3 comments
Chiil (Member) commented May 20, 2021

First tuning was done by @julietbravo; the optimal block size was (1,3,1). Another look by @isazi and @benvanwerkhoven would be highly appreciated. Why are the block sizes so small?
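For context, a minimal sketch of how a tuned block size like (1,3,1) would be applied at launch; the empty kernel signature and the (ncol, nlay, ngpt) axis mapping are illustrative assumptions, not taken from this repository.

```cuda
#include <cuda_runtime.h>

// Placeholder for the real kernel; the empty signature is an assumption.
__global__ void compute_tau_minor_absorption_kernel() {}

// Apply the tuned block size (1, 3, 1) reported above.
void launch(const int ncol, const int nlay, const int ngpt)
{
    const dim3 block(1, 3, 1);
    // Axis mapping x=ncol, y=nlay, z=ngpt is an illustrative assumption.
    const dim3 grid(
            (ncol + block.x - 1) / block.x,
            (nlay + block.y - 1) / block.y,
            (ngpt + block.z - 1) / block.z);
    compute_tau_minor_absorption_kernel<<<grid, block>>>();
}
```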

bartvstratum (Member) commented

[Image: kernel_tuning results plot]

isazi (Collaborator) commented Jun 17, 2021

I have also been working on this kernel. What I have done so far (see the sketch after this comment):

  • swap the x and y dimensions for the threads
  • simplify the kernel by removing the big tropo split
  • add #pragma unroll

This version seems to be twice as fast as the original code (best tuned against best tuned) on the A100 I am running everything on.
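A hedged sketch of what those three changes amount to; all identifiers, the indexing, and the loop body are illustrative assumptions, not the actual kernel source.

```cuda
// Sketch only: names, indexing, and the loop body are assumptions.
__global__ void compute_tau_minor_absorption_kernel(
        const int ncol, const int nlay, const int nscale,
        const double* __restrict__ kminor,
        double* __restrict__ tau)
{
    // x and y swapped: threadIdx.x now runs over the contiguous
    // column index, so neighbouring threads touch neighbouring memory.
    const int icol = blockIdx.x*blockDim.x + threadIdx.x;
    const int ilay = blockIdx.y*blockDim.y + threadIdx.y;

    if (icol < ncol && ilay < nlay)
    {
        // The big tropo split is gone from the kernel body; see the
        // next comment for how the split moved to the host side.
        double ltau = 0.;

        #pragma unroll
        for (int imnr = 0; imnr < nscale; ++imnr)
            ltau += kminor[imnr*nlay*ncol + ilay*ncol + icol];

        tau[ilay*ncol + icol] += ltau;
    }
}
```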

benvanwerkhoven (Collaborator) commented
To record the progress on this kernel here as well: Bart changed Alessio's kernel so that it is now called twice, once for each value of idx_tropo (0 or 1). That has dramatically simplified the code, seemingly without much performance loss, since most thread blocks had idx_tropo equal to 0 or to 1 for all their threads anyway.
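A minimal host-side sketch of that change, assuming a hypothetical kernel variant that takes idx_tropo as an argument; the actual argument list is not shown in this thread.

```cuda
// Hypothetical kernel taking the tropo side as an argument (assumption).
__global__ void compute_tau_minor_absorption_kernel_side(const int idx_tropo /*, ... */);

void launch_both_sides(const dim3 grid, const dim3 block)
{
    // Instead of branching on idx_tropo inside the kernel, launch once
    // per side of the split. Since most blocks were already homogeneous
    // in idx_tropo, little work is duplicated.
    for (int idx_tropo = 0; idx_tropo <= 1; ++idx_tropo)
        compute_tau_minor_absorption_kernel_side<<<grid, block>>>(idx_tropo);
}
```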

Today I inlined the 2D interpolate-by-flavour function so that I could fuse the loop inside it with the loop that updates tau with tau_minor. This allowed me to keep tau_minor in a register (ltau_minor) instead of global memory, saving many global-memory loads and stores.
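A hedged sketch of that fusion; apart from tau, tau_minor, and ltau_minor, every identifier here is an illustrative assumption.

```cuda
// Before: the interpolation loop wrote tau_minor to global memory and
// a second loop read it back to update tau. After fusing the two
// loops, the intermediate lives in a register instead.
__device__ void add_minor_contribution(
        const double* __restrict__ fminor,  // interpolation weights (assumed)
        const double* __restrict__ k,       // absorption coefficients (assumed)
        const int nweights,
        double* __restrict__ tau, const int idx)
{
    double ltau_minor = 0.;  // register replacing the global tau_minor array

    // Fused interpolation + accumulation: the intermediate value never
    // touches global memory.
    for (int j = 0; j < nweights; ++j)
        ltau_minor += fminor[j] * k[j];

    tau[idx] += ltau_minor;  // one global read-modify-write instead of many
}
```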

I've made a second version of the kernel that caches tau in shared memory for each iteration of the imnr loop over nscale. Threads in the x-dimension cooperatively load and store the tau values so that the global-memory accesses are coalesced; the values themselves remain private to each thread, and the x-threads cooperate only on the loads and stores. This change was invasive enough that I've kept it as a separate version. It does require block_size_x, block_size_y, and max_gpt to be known at compile time.
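A hedged sketch of what the shared-memory version might look like; the template parameters reflect the compile-time requirement mentioned above, while the tau layout and indexing are illustrative assumptions.

```cuda
// Sketch only: BLOCK_X, BLOCK_Y, and MAX_GPT must be compile-time
// constants; the tau layout (gpt fastest-varying) is an assumption.
template<int BLOCK_X, int BLOCK_Y, int MAX_GPT>
__global__ void compute_tau_minor_shared_sketch(
        const int ngpt, double* __restrict__ tau)
{
    // One slice of tau per y-thread. The values are logically private
    // to individual threads; the shared tile exists only so x-threads
    // can cooperate on coalesced loads and stores.
    __shared__ double stau[BLOCK_Y][MAX_GPT];

    // Assumed offset of this thread's (column, layer) slice in tau.
    const int base = (blockIdx.y*BLOCK_Y + threadIdx.y) * ngpt;

    // Cooperative load: consecutive x-threads read consecutive global
    // addresses, so the transactions are coalesced.
    for (int igpt = threadIdx.x; igpt < ngpt; igpt += BLOCK_X)
        stau[threadIdx.y][igpt] = tau[base + igpt];
    __syncthreads();

    // ... per-thread updates of stau inside the imnr loop over
    //     nscale would go here ...

    __syncthreads();
    // Cooperative, coalesced store back to global memory.
    for (int igpt = threadIdx.x; igpt < ngpt; igpt += BLOCK_X)
        tau[base + igpt] = stau[threadIdx.y][igpt];
}
```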
