Use MPI_Bcast instead of multiple p2p messages to update nest from parent (#2059)

Use a single MPI_Bcast (via mpp_broadcast()) instead of multiple point-to-point messages (via mpp_send/mpp_recv).
This addresses an issue in which the SlingShot-10 link on the first node was maxed out, costing 0.15 s every fifth time step.
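
To illustrate why a broadcast can relieve the sender's link, here is a toy model that counts how many messages each rank sends under the two schemes. It is a sketch only: MPI libraries choose their broadcast algorithm internally, and the binomial tree used here is just one common choice, assumed for illustration. The function names are hypothetical and are not part of FMS or MPI, and the model ignores message size and how ranks map to nodes.

```python
def point_to_point_sends(num_ranks):
    """Per-rank send counts when rank 0 sends to every other rank directly."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    sends = [0] * num_ranks
    sends[0] = num_ranks - 1  # one message per receiver, all from rank 0
    return sends


def binomial_broadcast_sends(num_ranks):
    """Per-rank send counts for a binomial-tree broadcast rooted at rank 0.

    Assumed algorithm (illustrative): in each round, every rank that already
    holds the data forwards it to one rank that does not, so the number of
    ranks holding the data doubles per round.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    sends = [0] * num_ranks
    have_data = 1  # ranks 0 .. have_data-1 currently hold the data
    while have_data < num_ranks:
        for rank in range(have_data):
            dest = rank + have_data
            if dest < num_ranks:
                sends[rank] += 1
        have_data *= 2
    return sends


if __name__ == "__main__":
    for n in (8, 64, 1024):
        p2p = point_to_point_sends(n)
        tree = binomial_broadcast_sends(n)
        print(f"ranks={n}: root sends {p2p[0]} (p2p) vs {tree[0]} (tree)")
```

In this model the total number of messages is the same in both schemes (one per receiver), but the tree spreads them across ranks: with 8 ranks the root sends 7 messages point-to-point and 3 in the tree.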
dkokron authored Jan 22, 2024
1 parent b471de6 commit adfcede
Showing 10 changed files with 3,895 additions and 3,841 deletions.
2 changes: 1 addition & 1 deletion FV3
96 changes: 75 additions & 21 deletions tests/logs/OpnReqTests_control_p8_hera.log
@@ -1,9 +1,9 @@
-Thu Jan 18 20:56:52 UTC 2024
+Sun Jan 21 16:22:33 UTC 2024
Start Operation Requirement Test


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_bit_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_289728/control_p8_gnu_bit_base
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_bit_base
Checking test bit_base control_p8_gnu results ....
Moving baseline bit_base control_p8_gnu files ....
Moving sfcf000.nc .........OK
@@ -51,14 +51,14 @@ Moving baseline bit_base control_p8_gnu files ....
Moving RESTART/20210323.060000.sfc_data.tile5.nc .........OK
Moving RESTART/20210323.060000.sfc_data.tile6.nc .........OK

-0: The total amount of wall time = 281.497456
-0: The maximum resident set size (KB) = 1302428
+0: The total amount of wall time = 278.810775
+0: The maximum resident set size (KB) = 1306868

Test bit_base control_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_dbg_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_289728/control_p8_gnu_dbg_base
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_dbg_base
Checking test dbg_base control_p8_gnu results ....
Moving baseline dbg_base control_p8_gnu files ....
Moving sfcf000.nc .........OK
@@ -106,14 +106,14 @@ Moving baseline dbg_base control_p8_gnu files ....
Moving RESTART/20210323.060000.sfc_data.tile5.nc .........OK
Moving RESTART/20210323.060000.sfc_data.tile6.nc .........OK

-0: The total amount of wall time = 916.105846
-0: The maximum resident set size (KB) = 1288088
+0: The total amount of wall time = 923.177982
+0: The maximum resident set size (KB) = 1288004

Test dbg_base control_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_289728/control_p8_gnu_dcp
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_dcp
Checking test dcp control_p8_gnu results ....
Comparing sfcf000.nc .........OK
Comparing sfcf021.nc .........OK
@@ -160,14 +160,14 @@ Checking test dcp control_p8_gnu results ....
Comparing RESTART/20210323.060000.sfc_data.tile5.nc .........OK
Comparing RESTART/20210323.060000.sfc_data.tile6.nc .........OK

-0: The total amount of wall time = 249.392791
-0: The maximum resident set size (KB) = 1285764
+0: The total amount of wall time = 247.584509
+0: The maximum resident set size (KB) = 1276892

Test dcp control_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_289728/control_p8_gnu_mpi
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_mpi
Checking test mpi control_p8_gnu results ....
Comparing sfcf000.nc .........OK
Comparing sfcf021.nc .........OK
@@ -214,14 +214,14 @@ Checking test mpi control_p8_gnu results ....
Comparing RESTART/20210323.060000.sfc_data.tile5.nc .........OK
Comparing RESTART/20210323.060000.sfc_data.tile6.nc .........OK

-0: The total amount of wall time = 251.492784
-0: The maximum resident set size (KB) = 1280936
+0: The total amount of wall time = 251.539685
+0: The maximum resident set size (KB) = 1278948

Test mpi control_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_289728/control_p8_gnu_rst
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_rst
Checking test rst control_p8_gnu results ....
Comparing sfcf000.nc .........OK
Comparing sfcf021.nc .........OK
@@ -268,14 +268,14 @@ Checking test rst control_p8_gnu results ....
Comparing RESTART/20210323.060000.sfc_data.tile5.nc .........OK
Comparing RESTART/20210323.060000.sfc_data.tile6.nc .........OK

-0: The total amount of wall time = 250.611367
-0: The maximum resident set size (KB) = 1281936
+0: The total amount of wall time = 248.860720
+0: The maximum resident set size (KB) = 1278980

Test rst control_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_289728/control_p8_gnu_std_base
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_std_base
Checking test std_base control_p8_gnu results ....
Moving baseline std_base control_p8_gnu files ....
Moving sfcf000.nc .........OK
@@ -323,11 +323,65 @@ Moving baseline std_base control_p8_gnu files ....
Moving RESTART/20210323.060000.sfc_data.tile5.nc .........OK
Moving RESTART/20210323.060000.sfc_data.tile6.nc .........OK

-0: The total amount of wall time = 254.679817
-0: The maximum resident set size (KB) = 1277036
+0: The total amount of wall time = 254.890624
+0: The maximum resident set size (KB) = 1282588

Test std_base control_p8_gnu PASS

+
+baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/control_p8_std_base_gnu
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_294052/control_p8_gnu_thr
+Checking test thr control_p8_gnu results ....
+Comparing sfcf000.nc .........OK
+Comparing sfcf021.nc .........OK
+Comparing sfcf024.nc .........OK
+Comparing atmf000.nc .........OK
+Comparing atmf021.nc .........OK
+Comparing atmf024.nc .........OK
+Comparing GFSFLX.GrbF00 .........OK
+Comparing GFSFLX.GrbF21 .........OK
+Comparing GFSFLX.GrbF24 .........OK
+Comparing GFSPRS.GrbF00 .........OK
+Comparing GFSPRS.GrbF21 .........OK
+Comparing GFSPRS.GrbF24 .........OK
+Comparing RESTART/20210323.060000.coupler.res .........OK
+Comparing RESTART/20210323.060000.fv_core.res.nc .........OK
+Comparing RESTART/20210323.060000.fv_core.res.tile1.nc .........OK
+Comparing RESTART/20210323.060000.fv_core.res.tile2.nc .........OK
+Comparing RESTART/20210323.060000.fv_core.res.tile3.nc .........OK
+Comparing RESTART/20210323.060000.fv_core.res.tile4.nc .........OK
+Comparing RESTART/20210323.060000.fv_core.res.tile5.nc .........OK
+Comparing RESTART/20210323.060000.fv_core.res.tile6.nc .........OK
+Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile1.nc .........OK
+Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile2.nc .........OK
+Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile3.nc .........OK
+Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile4.nc .........OK
+Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile5.nc .........OK
+Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile6.nc .........OK
+Comparing RESTART/20210323.060000.fv_tracer.res.tile1.nc .........OK
+Comparing RESTART/20210323.060000.fv_tracer.res.tile2.nc .........OK
+Comparing RESTART/20210323.060000.fv_tracer.res.tile3.nc .........OK
+Comparing RESTART/20210323.060000.fv_tracer.res.tile4.nc .........OK
+Comparing RESTART/20210323.060000.fv_tracer.res.tile5.nc .........OK
+Comparing RESTART/20210323.060000.fv_tracer.res.tile6.nc .........OK
+Comparing RESTART/20210323.060000.phy_data.tile1.nc .........OK
+Comparing RESTART/20210323.060000.phy_data.tile2.nc .........OK
+Comparing RESTART/20210323.060000.phy_data.tile3.nc .........OK
+Comparing RESTART/20210323.060000.phy_data.tile4.nc .........OK
+Comparing RESTART/20210323.060000.phy_data.tile5.nc .........OK
+Comparing RESTART/20210323.060000.phy_data.tile6.nc .........OK
+Comparing RESTART/20210323.060000.sfc_data.tile1.nc .........OK
+Comparing RESTART/20210323.060000.sfc_data.tile2.nc .........OK
+Comparing RESTART/20210323.060000.sfc_data.tile3.nc .........OK
+Comparing RESTART/20210323.060000.sfc_data.tile4.nc .........OK
+Comparing RESTART/20210323.060000.sfc_data.tile5.nc .........OK
+Comparing RESTART/20210323.060000.sfc_data.tile6.nc .........OK
+
+0: The total amount of wall time = 251.852498
+0: The maximum resident set size (KB) = 1279220
+
+Test thr control_p8_gnu PASS
+
OPERATION REQUIREMENT TEST WAS SUCCESSFUL
-Thu Jan 18 22:06:08 UTC 2024
-Elapsed time: 01h:09m:16s. Have a nice day!
+Sun Jan 21 17:30:30 UTC 2024
+Elapsed time: 01h:07m:57s. Have a nice day!
24 changes: 12 additions & 12 deletions tests/logs/OpnReqTests_cpld_control_nowave_noaero_p8_hera.log
@@ -1,9 +1,9 @@
-Thu Jan 18 22:33:35 UTC 2024
+Mon Jan 22 14:14:01 UTC 2024
Start Operation Requirement Test


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/cpld_control_c96_noaero_p8_dbg_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_201234/cpld_control_nowave_noaero_p8_gnu_dbg_base
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_239905/cpld_control_nowave_noaero_p8_gnu_dbg_base
Checking test dbg_base cpld_control_nowave_noaero_p8_gnu results ....
Moving baseline dbg_base cpld_control_nowave_noaero_p8_gnu files ....
Moving sfcf021.tile1.nc .........OK
@@ -66,14 +66,14 @@ Moving baseline dbg_base cpld_control_nowave_noaero_p8_gnu files ....
Moving RESTART/iced.2021-03-23-21600.nc .........OK
Moving RESTART/ufs.cpld.cpl.r.2021-03-23-21600.nc .........OK

-0: The total amount of wall time = 1248.222948
-0: The maximum resident set size (KB) = 1379648
+0: The total amount of wall time = 1268.844986
+0: The maximum resident set size (KB) = 1409776

Test dbg_base cpld_control_nowave_noaero_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/cpld_control_c96_noaero_p8_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_201234/cpld_control_nowave_noaero_p8_gnu_rst
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_239905/cpld_control_nowave_noaero_p8_gnu_rst
Checking test rst cpld_control_nowave_noaero_p8_gnu results ....
Comparing sfcf021.tile1.nc .........OK
Comparing sfcf021.tile2.nc .........OK
@@ -135,14 +135,14 @@ Checking test rst cpld_control_nowave_noaero_p8_gnu results ....
Comparing RESTART/iced.2021-03-23-21600.nc .........OK
Comparing RESTART/ufs.cpld.cpl.r.2021-03-23-21600.nc .........OK

-0: The total amount of wall time = 388.716059
-0: The maximum resident set size (KB) = 1381872
+0: The total amount of wall time = 379.623302
+0: The maximum resident set size (KB) = 1398708

Test rst cpld_control_nowave_noaero_p8_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/cpld_control_c96_noaero_p8_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_201234/cpld_control_nowave_noaero_p8_gnu_std_base
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_239905/cpld_control_nowave_noaero_p8_gnu_std_base
Checking test std_base cpld_control_nowave_noaero_p8_gnu results ....
Moving baseline std_base cpld_control_nowave_noaero_p8_gnu files ....
Moving sfcf021.tile1.nc .........OK
@@ -205,11 +205,11 @@ Moving baseline std_base cpld_control_nowave_noaero_p8_gnu files ....
Moving RESTART/iced.2021-03-23-21600.nc .........OK
Moving RESTART/ufs.cpld.cpl.r.2021-03-23-21600.nc .........OK

-0: The total amount of wall time = 389.788849
-0: The maximum resident set size (KB) = 1386320
+0: The total amount of wall time = 387.075066
+0: The maximum resident set size (KB) = 1403652

Test std_base cpld_control_nowave_noaero_p8_gnu PASS

OPERATION REQUIREMENT TEST WAS SUCCESSFUL
-Thu Jan 18 23:35:28 UTC 2024
-Elapsed time: 01h:01m:54s. Have a nice day!
+Mon Jan 22 15:12:57 UTC 2024
+Elapsed time: 00h:58m:57s. Have a nice day!
24 changes: 12 additions & 12 deletions tests/logs/OpnReqTests_regional_control_hera.log
@@ -1,9 +1,9 @@
-Thu Jan 18 23:40:06 UTC 2024
+Sun Jan 21 23:30:59 UTC 2024
Start Operation Requirement Test


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/regional_control_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_153057/regional_control_gnu_dcp
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_219650/regional_control_gnu_dcp
Checking test dcp regional_control_gnu results ....
Comparing dynf000.nc .........OK
Comparing dynf006.nc .........OK
@@ -14,14 +14,14 @@ Checking test dcp regional_control_gnu results ....
Comparing NATLEV.GrbF00 .........OK
Comparing NATLEV.GrbF06 .........OK

-0: The total amount of wall time = 522.233732
-0: The maximum resident set size (KB) = 577632
+0: The total amount of wall time = 518.322638
+0: The maximum resident set size (KB) = 589984

Test dcp regional_control_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/regional_control_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_153057/regional_control_gnu_std_base
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_219650/regional_control_gnu_std_base
Checking test std_base regional_control_gnu results ....
Moving baseline std_base regional_control_gnu files ....
Moving dynf000.nc .........OK
@@ -33,14 +33,14 @@ Moving baseline std_base regional_control_gnu files ....
Moving NATLEV.GrbF00 .........OK
Moving NATLEV.GrbF06 .........OK

-0: The total amount of wall time = 535.159778
-0: The maximum resident set size (KB) = 574824
+0: The total amount of wall time = 517.001768
+0: The maximum resident set size (KB) = 587296

Test std_base regional_control_gnu PASS


baseline dir = /scratch1/NCEPDEV/stmp4/Zachary.Shrader/FV3_OPNREQ_TEST/OPNREQ_TEST/regional_control_std_base_gnu
-working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_153057/regional_control_gnu_thr
+working dir = /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_OPNREQ_TEST/opnReqTest_219650/regional_control_gnu_thr
Checking test thr regional_control_gnu results ....
Comparing dynf000.nc .........OK
Comparing dynf006.nc .........OK
@@ -51,11 +51,11 @@ Checking test thr regional_control_gnu results ....
Comparing NATLEV.GrbF00 .........OK
Comparing NATLEV.GrbF06 .........OK

-0: The total amount of wall time = 517.618309
-0: The maximum resident set size (KB) = 577192
+0: The total amount of wall time = 524.960283
+0: The maximum resident set size (KB) = 587216

Test thr regional_control_gnu PASS

OPERATION REQUIREMENT TEST WAS SUCCESSFUL
-Fri Jan 19 00:22:28 UTC 2024
-Elapsed time: 00h:42m:22s. Have a nice day!
+Mon Jan 22 00:08:06 UTC 2024
+Elapsed time: 00h:37m:08s. Have a nice day!