MYNN SFC OpenACC Acceleration #1891
Conversation
I'm rerunning RTs on Hera since the first set apparently ran into memory issues.
Sounds good @grantfirl. Keep us posted. Once those Hera logs are ready, I think we can start working this PR.
@zach1221 OK, RTs on Hera finished successfully, and I attached the log in the description.
ORTs passed. Jenkins-ci logs attached.
Automated RT Failure Notification
Jet system issue. The system is very crowded. Skip Jet!
Sure. I should have the Hera logs ready soon.
The Jet queue is not moving even today, so we confirm skipping Jet. @grantfirl, we can start the merging process. @zach1221 FYI
I've followed up with Grant on ccpp-physics PR #97.
@zach1221 Thanks!
PR Author Checklist:
I have linked PRs from all sub-components involved in the section below.
I am confirming reviews are completed in ALL sub-component PRs.
I have run the full RT suite on either Hera or Cheyenne AND have attached the log to this PR below this line:
RegressionTests_hera.log
I have added the list of all failed regression tests to the "Anticipated changes" section.
I have filled out all sections of the template.
Description
See ufs-community/ccpp-physics#97 for a complete description.
Overview from the developer:
With minimal changes to the original code, the MYNN surface scheme has been enhanced with OpenACC directives that enable offloading of its computation to OpenACC-supported accelerator devices (e.g., NVIDIA GPUs). Because the scheme loops repeatedly over independent vertical columns, its computational pattern maps well to GPU hardware, where many iterations of each loop can run in parallel. Data movement has been optimized so that transfers between host and device memory are kept to a minimum, since such transfers are a significant source of latency when offloading to accelerator devices. Performance on a GPU ranged from a 3.3x slowdown to a 41.9x speedup relative to CPU execution (see the Performance section for more information).
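As a minimal sketch of the pattern described above (not code from this PR; the array names, sizes, and the toy column computation are illustrative), the example below shows how independent horizontal columns are offloaded with an OpenACC parallel loop, and how a single enclosing data region keeps arrays resident on the device so host-to-device transfers happen once rather than inside every loop:

```fortran
! Illustrative sketch of the OpenACC strategy: offload independent
! columns and hoist data movement out of the compute loops.
program columns_offload
  implicit none
  integer, parameter :: ni = 1024   ! number of horizontal columns (illustrative)
  integer, parameter :: nk = 64     ! number of vertical levels (illustrative)
  real :: t(ni, nk), flux(ni)
  integer :: i, k

  t = 300.0

  ! One enclosing data region: inputs copied to the device once,
  ! results copied back once, instead of per-kernel transfers.
  !$acc data copyin(t) copyout(flux)

  ! Each column i is independent, so the i loop parallelizes cleanly
  ! across GPU threads; the k loop is a serial sum within each column.
  !$acc parallel loop
  do i = 1, ni
     flux(i) = 0.0
     do k = 1, nk
        flux(i) = flux(i) + 0.001 * t(i, k)
     end do
  end do
  !$acc end parallel loop

  !$acc end data

  print *, 'flux(1) =', flux(1)
end program columns_offload
```

Compiled with an OpenACC-capable compiler (e.g., nvfortran -acc), the same source still builds and runs serially on a CPU when the directives are ignored, which is why this approach requires only minimal changes to the original scheme.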
Linked Issues and Pull Requests
Associated UFSWM Issue to close
Subcomponent Pull Requests
NOAA-EMC/fv3atm#693
ufs-community/ccpp-physics#97
Blocking Dependencies
None
Subcomponents involved:
Anticipated Changes
Input data
Regression Tests:
Tests affected by changes in this PR:
Libraries
Code Managers Log
Testing Log: