Tuning interpolation_kernel() #13

Open · bartvstratum opened this issue Jun 15, 2021 · 4 comments
@bartvstratum (Member)

I'm also starting the tuning of interpolation_kernel().

bartvstratum self-assigned this on Jun 15, 2021
@bartvstratum (Member, Author)

I added the kernel tuner script (interpolation_kernel.py). This one again requires binary files as input, generated from the cuda_dump_bins branch.

Results from tuning:
[figure: timings_interpolation — kernel timings per tuned block size]
So the optimal block size is again very small...

With the tuned block size, the kernel is about 3x faster. Old and new profile:

Time(%)  Total Time (ns)  Instances   Average  Minimum  Maximum  Name
-------  ---------------  ---------  --------  -------  -------  ----------------------------------------------------------
    9.3         30895235         61  506479.3   137631   620120  void (anonymous namespace)::interpolation_kernel<double>(int, int, int, int, int, int, int, double, …
    3.6         11185705         61  183372.2    79967   212253  void (anonymous namespace)::interpolation_kernel<double>(int, int, int, int, int, int, int, double, …
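
For reference, a minimal sketch of the mechanics behind such a tuning sweep: Kernel Tuner typically injects the tunable block_size_x / block_size_y parameters as preprocessor defines and measures one compile-and-launch per configuration. Everything below (kernel name, signature, body, axis mapping, problem size) is a hypothetical stand-in, not the real interpolation_kernel.

// Hypothetical stand-in kernel: only the launch geometry matters here.
#include <cstdio>
#include <cuda_runtime.h>

#ifndef block_size_x
#define block_size_x 16   // overridden per tuning configuration, e.g. -Dblock_size_x=4
#endif
#ifndef block_size_y
#define block_size_y 16
#endif

__global__ void interpolation_kernel_stub(const int ncol, const int nlay, double* __restrict__ out)
{
    // 2D index space; the axis mapping is arbitrary in this sketch.
    const int ilay = blockIdx.x * blockDim.x + threadIdx.x;
    const int icol = blockIdx.y * blockDim.y + threadIdx.y;
    if (ilay < nlay && icol < ncol)
        out[ilay * ncol + icol] = 1.0;  // placeholder for the actual interpolation work
}

int main()
{
    const int ncol = 144, nlay = 200;  // made-up problem size, only used for the launch math
    double* out;
    cudaMalloc(&out, ncol * nlay * sizeof(double));

    dim3 block(block_size_x, block_size_y);
    dim3 grid((nlay + block.x - 1) / block.x, (ncol + block.y - 1) / block.y);
    interpolation_kernel_stub<<<grid, block>>>(ncol, nlay, out);
    cudaDeviceSynchronize();
    printf("block (%d, %d) done\n", block_size_x, block_size_y);

    cudaFree(out);
    return 0;
}

Compiling this with, e.g., nvcc -Dblock_size_x=4 -Dblock_size_y=4 mimics a single point in the sweep; the tuner script then simply iterates over the parameter grid and records the timings.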

@bartvstratum (Member, Author) commented Jun 16, 2021

Changing the order of the dimensions (x=col, y=lay) is a bit faster: adffcd1

Time(%)  Total Time (ns)  Instances   Average  Minimum  Maximum  Name
-------  ---------------  ---------  --------  -------  -------  ----------------------------------------------------------
    2.8          8696371         61  142563.5    74783   171934  void (anonymous namespace)::interpolation_kernel<double>(
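
As an editorial sketch of what the (x=col, y=lay) mapping amounts to (stand-in code, not the actual change in adffcd1; it assumes the column index is the fastest-varying one in memory):

// Stand-in kernel, illustration only.
// Hypothetical before: x -> lay, y -> col.  After the reorder: x -> col, y -> lay,
// so consecutive threads in a warp touch neighbouring columns.
__global__ void interpolation_kernel_stub(const int ncol, const int nlay, double* __restrict__ out)
{
    const int icol = blockIdx.x * blockDim.x + threadIdx.x;  // x = col
    const int ilay = blockIdx.y * blockDim.y + threadIdx.y;  // y = lay
    if (icol < ncol && ilay < nlay)
        out[ilay * ncol + icol] = 1.0;  // neighbouring threads -> neighbouring addresses
}

If the arrays are indeed laid out with the column index fastest, this gives coalesced accesses, which would explain the gain.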

@isazi (Collaborator) commented Jun 16, 2021

I have been testing changing the order of the dimensions in both Tau kernels; there are no major changes. I assume this is because most data accesses do not depend directly on the thread ID, but on the contents of some other array.
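
A toy illustration of that kind of access pattern (names and layout invented here): when the effective address comes from an index array, the gather pattern is fixed by the data, so swapping which thread dimension maps to which loop index changes little about which memory lines are touched.

// Toy gather, illustration only.
__global__ void gather_stub(const int n, const int* __restrict__ idx,
                            const double* __restrict__ src, double* __restrict__ dst)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[idx[i]];  // the load address depends on idx[i], not directly on the thread ID
}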

@bartvstratum (Member, Author) commented Jun 16, 2021

It's again a bit faster if I remove the ncol loop from the kernel: 1e57819

Time(%)  Total Time (ns)  Instances   Average  Minimum  Maximum  Name
-------  ---------------  ---------  --------  -------  -------  ----------------------------------------------------------
    1.6          5043399         61   82678.7    43136   100063  void (anonymous namespace)::interpolation_kernel<double>(int, int, int, int, int, in

This also results in larger optimal block sizes:

[figure: timings_interpolation_noloop — kernel timings per block size, without the ncol loop]
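
A stand-in sketch of the pattern behind 1e57819 (the real kernel's signature and dimensions differ; nflav is just a placeholder for whatever the remaining kernel dimension is): rather than each thread looping over the columns internally, the column index can be taken from the launch grid, which exposes more parallelism per launch and leaves more room for larger blocks.

// Stand-in kernel, illustration only.
__global__ void interpolation_kernel_stub(const int ncol, const int nlay, const int nflav,
                                          double* __restrict__ out)
{
    const int icol  = blockIdx.x * blockDim.x + threadIdx.x;  // was: for (int icol = 0; icol < ncol; ++icol)
    const int ilay  = blockIdx.y * blockDim.y + threadIdx.y;
    const int iflav = blockIdx.z * blockDim.z + threadIdx.z;

    if (icol < ncol && ilay < nlay && iflav < nflav)
        out[(iflav * nlay + ilay) * ncol + icol] = 1.0;  // placeholder for the actual interpolation work
}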
