
Performance on JUWELS-Booster #6

Open

goord opened this issue Feb 17, 2021 · 13 comments

goord (Collaborator) commented Feb 17, 2021

I have completed my tests on JUWELS-Booster, which has 2 × 24-core AMD EPYC Rome CPUs and 4 × NVIDIA A100 GPUs per node. I see virtually no speedup yet between the CPU and GPU versions:

| case | config | longwave time (ms) | shortwave time (ms) |
| --- | --- | --- | --- |
| allsky | gcc | 251 | 257 |
| allsky | gcc+cuda | 255 | 258 |
| rfmip | gcc | 316 | 255 |
| rfmip | gcc+cuda | 323 | 260 |

For the CUDA runs I used the cuda branch.

goord added the documentation label Feb 17, 2021
Chiil (Member) commented Feb 17, 2021

Can you also test the rcemip test case and increase the workload by setting n_col to 64**2? The allsky and rfmip cases are relatively cheap and are mainly used for correctness checking.
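
For intuition on why the cheap cases show no GPU speedup: the grid of thread blocks launched per kernel scales with the column count, so a small case cannot fill a large GPU. A toy CUDA sketch of this effect, with a hypothetical kernel and an assumed g-point count of 256, neither taken from the actual code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel with one thread per (column, g-point) pair; hypothetical,
// for illustration only -- not an actual rte-rrtmgp-cpp kernel.
__global__ void touch(const int ncol, const int ngpt, double* field)
{
    const int icol = blockIdx.x * blockDim.x + threadIdx.x;
    const int igpt = blockIdx.y * blockDim.y + threadIdx.y;
    if (icol < ncol && igpt < ngpt)
        field[igpt * ncol + icol] += 1.0;
}

int main()
{
    const int ngpt = 256;     // assumed g-point count, for illustration
    const dim3 block(32, 4);

    // A small column count vs the suggested n_col = 64**2: the small grid
    // launches far fewer threads than an A100 can run concurrently
    // (108 SMs x 2048 threads), leaving the GPU mostly latency-bound.
    for (const int ncol : {128, 64 * 64})
    {
        double* field;
        cudaMalloc(&field, sizeof(double) * ncol * ngpt);
        const dim3 grid((ncol + block.x - 1) / block.x,
                        (ngpt + block.y - 1) / block.y);
        touch<<<grid, block>>>(ncol, ngpt, field);
        cudaDeviceSynchronize();
        printf("ncol = %4d -> grid %u x %u = %u blocks of %u threads\n",
               ncol, grid.x, grid.y, grid.x * grid.y, block.x * block.y);
        cudaFree(field);
    }
    return 0;
}
```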

goord (Collaborator, Author) commented Feb 18, 2021

Yes, for some reason I couldn't get the input file for the rcemip case... will try once more.

Chiil (Member) commented Feb 18, 2021

Send me a message on Slack if I need to clarify something; maybe make_links.sh misses a step.

isazi (Collaborator) commented Feb 18, 2021

I managed to run all three tests on the DAS.

goord (Collaborator, Author) commented Feb 18, 2021

Ah, I first needed to run test_rcemip_input.py. OK, here is the updated table:

| case | config | longwave time (ms) | shortwave time (ms) |
| --- | --- | --- | --- |
| allsky | gcc | 251 | 255 |
| allsky | gcc+cuda | 105 | 94 |
| rfmip | gcc | 316 | 255 |
| rfmip | gcc+cuda | 98 | 78 |
| rcemip | gcc | 38418 | 37622 |
| rcemip | gcc+cuda | 3708 | 3337 |

This is running without the 'cloud optics' flag, by the way.

isazi (Collaborator) commented Feb 18, 2021

Would it make sense if I also provide the same table for DAS-5? We have a node with an A100 (but older and slower Xeon CPUs).

goord (Collaborator, Author) commented Feb 18, 2021

You can tell me if those numbers are in line with mine, Alessio.

goord (Collaborator, Author) commented Feb 18, 2021

Fixed my latest table, which was transposed.

Chiil (Member) commented Feb 18, 2021

I did a quick benchmark on an AWS P3 V100 instance:

| case | config | longwave (ms) | shortwave (ms) |
| --- | --- | --- | --- |
| rcemip | gcc+cuda | 3117 | 2331 |
| rcemip | gcc | 14365 | 11362 |

goord (Collaborator, Author) commented Feb 18, 2021

After using the Fortran compiler flags (!), I get the following figures:

| case | config | longwave (ms) | shortwave (ms) |
| --- | --- | --- | --- |
| rcemip | gcc+cuda | 3688 | 3319 |
| rcemip | gcc | 8084 | 6982 |

So the A100s (at least the ones in JUWELS-Booster) appear somewhat slower than the V100 card. This may indicate the code needs some re-tuning...

Chiil (Member) commented Feb 18, 2021

We have not done much tuning yet. We rushed a little to get a reference implementation ready that gives results identical to the CPU version, but there is probably still a lot to gain in kernel tuning.

goord (Collaborator, Author) commented Feb 18, 2021

Some ad-hoc tuning of the kernel block sizes reduces the timings to 3051 ms (lw) and 2669 ms (sw), so yes, there is definitely headroom for speedup by tuning on A100s.
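
In practice, this kind of block-size tuning amounts to scanning candidate launch configurations and timing each one with CUDA events. A minimal sketch of that loop, using a stand-in kernel and assumed problem sizes rather than the project's actual kernels:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in kernel (hypothetical): scales an optical-depth-like field.
__global__ void scale_tau(const int ncol, const int ngpt,
                          const double* __restrict__ tau_in,
                          double* __restrict__ tau_out)
{
    const int icol = blockIdx.x * blockDim.x + threadIdx.x;
    const int igpt = blockIdx.y * blockDim.y + threadIdx.y;
    if (icol < ncol && igpt < ngpt)
        tau_out[igpt * ncol + icol] = 0.5 * tau_in[igpt * ncol + icol];
}

int main()
{
    const int ncol = 64 * 64, ngpt = 256;   // assumed problem sizes
    double *tau_in, *tau_out;
    cudaMalloc(&tau_in,  sizeof(double) * ncol * ngpt);
    cudaMalloc(&tau_out, sizeof(double) * ncol * ngpt);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm up once so context initialization does not pollute the first timing.
    scale_tau<<<dim3((ncol + 31) / 32, ngpt), dim3(32, 1)>>>(ncol, ngpt, tau_in, tau_out);
    cudaDeviceSynchronize();

    // Scan a few candidate block shapes; the fastest one generally differs
    // between GPU generations, e.g. V100 vs A100.
    const dim3 candidates[] = {{32, 1}, {32, 4}, {64, 2}, {128, 1}};
    for (const dim3& block : candidates)
    {
        const dim3 grid((ncol + block.x - 1) / block.x,
                        (ngpt + block.y - 1) / block.y);
        cudaEventRecord(start);
        scale_tau<<<grid, block>>>(ncol, ngpt, tau_in, tau_out);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block (%u, %u): %.3f ms\n", block.x, block.y, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(tau_in);
    cudaFree(tau_out);
    return 0;
}
```

Compiling the same scan with `nvcc -O3 -arch=sm_70` (V100) and `-arch=sm_80` (A100) can show how the best block shape shifts between the two architectures.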

goord (Collaborator, Author) commented Feb 18, 2021

Further manual tuning yielded:

| case | config | longwave (ms) | shortwave (ms) |
| --- | --- | --- | --- |
| rcemip | gcc+cuda | 1439 | 1174 |
