
Ww3 wise merge develop #888

Merged
merged 178 commits on Mar 22, 2023

Conversation

aronroland
Collaborator

@aronroland aronroland commented Feb 12, 2023

Pull Request Summary

Efficiency and accuracy improvements in unstructured WW3. Within the down-scaling efforts with SHOM & USACE, we have developed a series of enhancements and bug fixes for the unstructured domain decomposition part of WW3 (PDLIB). The improvements follow the goal of strengthening the down-scaling abilities of WW3. We have improved and further debugged the unstructured part for coupled applications and nearshore modelling. Besides that, we enhanced the coupling abilities within the domain decomposition framework and worked on efficiency, robustness and accuracy for nearshore applications. Moreover, we extended the implicit framework to coastal reflection as well as seamless wave setup computation using an elliptic solver. Further enhancements are described in the resolved issues listed below.

Description

  1. Consolidation and performance improvements of wave setup computation on structured and unstructured grids.
  2. Performance and memory improvement of the unstructured framework.
  3. Introduction of a new limiter (a mixture of Hersbach & Janssen and Komen et al.) as an option next to the existing one, in order to reduce the time step dependency of the implicit solver (Limiter implementation to reduce time step dependency for the implicit scheme #704). A new namelist variable JGS_LIMITER_FUNC is introduced: 1 selects the old limiter, 2 the new one (see the namelist sketch after this list). Details will follow in the manual update.
  4. Alternative, parametric computation of the group velocity for the unstructured part, to avoid running out of the lookup table range, reduce memory access, and put more of the workload on computation (parametric group velocity calculation for implicit scheme/domain decomposition #576); a dispersion-relation sketch follows this list.
  5. Introduce under-relaxation for triads and wave breaking (blending the newly computed source term with the previous iterate) and integrate those terms without the limiter.
  6. Improve shallow-water source term integration for explicit source terms within the unstructured downscaling approach.
  7. Bug fixes for wetting & drying for both implicit and explicit schemes.
  8. Bug fix for the wave triad interaction (the explicit part was off by a factor of 2) and introduction of the diagonal source term part for more robust integration within the implicit and explicit unstructured solvers. Moreover, unnecessary allocation and computation were removed and the code was rewritten, unifying the diagonal part for the explicit and implicit solvers (under-relaxation for Triads and breaking #703). Parameters are still hard-coded.
  9. Introduce coastline reflection for the implicit scheme, and homogenization within the new wetting & drying scheme.
  10. Introduction of the Block Explicit Solver as an option for unstructured meshes in global applications. The exchange is called only once per frequency, which reduces latency and improves performance on runs with a large core count; a loop-structure sketch follows this list. Moreover, this prepares the explicit schemes for hybrid parallelization (Block Explicit implementation for unstructured meshes #887).
  11. Consolidation of the implicit solver in terms of memory usage, coherence and debug output, and further work on the Jacobi solver towards CPU coherency.
  12. Fix bugs and clean & test 2nd order time-space LAX-FCT-CRD scheme.
  13. Add namelist documentation for the new schemes.
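
As a quick illustration of item 3, here is a minimal namelist sketch. It assumes the JGS_* options live in the unstructured (UNST) namelist of the ww3_grid input, as suggested by the ww3_grid.out excerpt further down; only JGS_LIMITER_FUNC is new in this PR, the other values simply mirror the printed defaults.

    &UNST
      JGS_USE_JACOBI         = T,    ! Jacobi iteration for the implicit solver
      JGS_BLOCK_GAUSS_SEIDEL = T,
      JGS_MAXITER            = 100,
      JGS_LIMITER            = F,
      JGS_LIMITER_FUNC       = 2     ! 1 = original limiter, 2 = new Hersbach & Janssen / Komen et al. mixture
    /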
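
For item 4, the idea of a parametric group velocity is sketched below; this is not the PR's actual routine (names and structure are hypothetical), just the linear dispersion relation sigma**2 = g*k*tanh(k*d) solved directly by Newton iteration instead of being interpolated from a pre-computed table.

    ! Hedged sketch: wavenumber and group velocity from the linear dispersion
    ! relation, computed on the fly rather than read from a lookup table.
    pure subroutine wave_kcg(sigma, d, k, cg)
      implicit none
      real, intent(in)  :: sigma      ! radian frequency [rad/s]
      real, intent(in)  :: d          ! water depth [m]
      real, intent(out) :: k, cg      ! wavenumber [rad/m], group velocity [m/s]
      real, parameter   :: g = 9.806
      real    :: f, df, kd, n
      integer :: it
      k = sigma*sigma/g               ! deep-water first guess
      do it = 1, 10                   ! Newton iteration on g*k*tanh(k*d) - sigma**2
         kd = min(k*d, 20.0)          ! guard against tanh/sinh overflow
         f  = g*k*tanh(kd) - sigma*sigma
         df = g*tanh(kd) + g*kd/cosh(kd)**2
         k  = k - f/df
      end do
      kd = min(k*d, 20.0)
      n  = 0.5*(1.0 + 2.0*kd/sinh(2.0*kd))
      cg = n*sigma/k                  ! cg = n * c, with phase speed c = sigma/k
    end subroutine wave_kcg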
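
For item 10, the structural idea behind the Block Explicit Solver is sketched below with hypothetical names (not the actual WW3 routines): all directional bins of one frequency are updated locally before the ghost-node exchange, so communication happens once per frequency block rather than once per spectral bin, which is what cuts latency at large core counts.

    program block_explicit_sketch
      implicit none
      integer, parameter :: nk = 25, nth = 24   ! frequencies and directions (illustrative sizes)
      integer :: ik, ith

      do ik = 1, nk                 ! loop over frequency blocks
         do ith = 1, nth            ! all directions of this frequency: purely local work
            call advect_local(ik, ith)
         end do
         call exchange_halo(ik)     ! a single ghost-node exchange per frequency block
      end do

    contains

      subroutine advect_local(ik, ith)
        integer, intent(in) :: ik, ith
        ! stand-in for the local explicit advection update of spectral bin (ik, ith)
      end subroutine advect_local

      subroutine exchange_halo(ik)
        integer, intent(in) :: ik
        ! stand-in for the MPI exchange of ghost nodes for frequency block ik
      end subroutine exchange_halo

    end program block_explicit_sketch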

This work was done in cooperation with Heloise Michaud (SHOM), Jane Mckee Smith (USACE), Tyler Hesser (USACE), Mary Brayant (USACE), @aliabdolali (NOAA/NCEP), Mathieu Dutour-Sikiric (IRB), and Aron Roland (Roland & Partner, IT&E GmbH).

Issue(s) addressed

#704, #576, #703, #887, #905, #903, #902, #901, #900, #899, #912, #849

Commit Message

Work related to the down-scaling capabilities of the unstructured grid approach.

  • Consolidation and performance improvements of wave setup computation on structured and unstructured grids.
    
  • Performance and memory improvement of the unstructured framework.
    
  • Introduction of a new limiter (mixture of Hersbach & Janssen + Komen et al.)
    
  • Alternative computations of the group velocity for unstructured grids
    
  • Introduce under-relaxation for triads and wave breaking
    
  • Improve shallow-water source term integration for explicit source terms within the unstructured down-scaling approach.
    
  • Bug fixes for wetting & drying for both implicit and explicit schemes.
    
  • Bug fix for the wave triad interaction.
    
  • Introduce coastline reflection for the implicit scheme, and homogenization within the new wetting & drying scheme.
    
  • Introduction of Block Explicit Solver as an option for unstructured meshes.
    
  • Consolidation of the implicit solver in terms of memory usage, coherence and debug output, and further work on the Jacobi solver towards CPU coherency.
    
  • Fix bugs and clean & test 2nd order time-space LAX-FCT-CRD scheme.
    

Check list

Testing

  • How were these changes tested? AR: We ran regression tests at NCEP.
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) AR: Yes, they are covered.
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)? AR: Yes, at NCEP.
  • Please indicate the expected changes in the regression test output (note the list of known non-identical tests). AR: Changes for nearly all unstructured grid cases and for cases where triad interactions are utilized.
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt): We did a comprehensive comparison after running the entire regression test suite. The non-unstructured cases have an expected change in the standard ww3_grid.out:

!< ,    EXPFSN =    T,EXPFSPSI =    F,    EXPFSFCT =    F,IMPFSN =    F,EXPTOTAL=    F,    IMPTOTAL=    F,IMPREFRACTION=    F,    IMPFREQSHIFT=    F, IMPSOURCE=    F,    SETUP_APPLY_WLV=    T,    JGS_TERMINATE_MAXITER=    T,    JGS_TERMINATE_DIFFERENCE=    T,    JGS_TERMINATE_NORM=    F,    JGS_LIMITER=    F,    JGS_LIMITER_FUNC=    1,    JGS_USE_JACOBI=    T,    JGS_BLOCK_GAUSS_SEIDEL=    T,    JGS_MAXITER=    100,    JGS_PMIN=      1.000,    JGS_DIFF_THR=      0.000,    JGS_NORM_THR=      0.000,    JGS_NLEVEL=    0,    JGS_SOURCE_NONLINEAR=    F
!---
!> ,    EXPFSN =    T,EXPFSPSI =    F,    EXPFSFCT =    F,IMPFSN =    F,EXPTOTAL=    F,    IMPTOTAL=    F,IMPREFRACTION=    F,    IMPFREQSHIFT=    F, IMPSOURCE=    F,    SETUP_APPLY_WLV=    F,    JGS_TERMINATE_MAXITER=    T,    JGS_TERMINATE_DIFFERENCE=    T,    JGS_TERMINATE_NORM=    F,    JGS_LIMITER=    F,    JGS_USE_JACOBI=    T,    JGS_BLOCK_GAUSS_SEIDEL=    T,    JGS_MAXITER=    100,    JGS_PMIN=      1.000,    JGS_DIFF_THR=      0.000,    JGS_NORM_THR=      0.000,    JGS_NLEVEL=    0,    JGS_SOURCE_NONLINEAR=    F
!

Pre-known non-b4b cases:

mww3_test_03/./work_PR1_MPI_e                                         
mww3_test_03/./work_PR2_UNO_MPI_e                                         
mww3_test_03/./work_PR2_UNO_MPI_d2                                         
mww3_test_03/./work_PR1_MPI_d2                                         
mww3_test_03/./work_PR3_UNO_MPI_d2_c                                         
mww3_test_03/./work_PR3_UQ_MPI_d2_c                                       
mww3_test_03/./work_PR3_UNO_MPI_d2                                         
mww3_test_03/./work_PR2_UQ_MPI_d2                                         
mww3_test_03/./work_PR3_UQ_MPI_e                                         
mww3_test_03/./work_PR3_UQ_MPI_d2                                         
ww3_tp2.10/./work_MPI_OMPH                                     
ww3_tp2.16/./work_MPI_OMPH                                   
ww3_ufs1.3/./work_a                                       

Expected changes in the test for triad:

ww3_tp1.9/./work_PR3_UQ_MPI                                     
ww3_tp1.9/./work_PR3_UQ                             

The remaining cases are unstructured, and we carefully compared the results. There are changes in the outputs due to the changes in the code.
                                 
ww3_ts4/./work_ug_MPI   
ww3_tp2.17/./work_b                                     
ww3_tp2.17/./work_mb                                         
ww3_tp2.17/./work_mc                                       
ww3_tp2.17/./work_a                                         
ww3_tp2.17/./work_ma1                                         
ww3_tp2.17/./work_c                                     
ww3_tp2.17/./work_ma                                       
ww3_tp2.17/./work_mc1                                       
ww3_tp2.21/./work_b                                       
ww3_tp2.21/./work_mb                                         
ww3_tp2.21/./work_a                                     
ww3_tp2.21/./work_b_metis                                     
ww3_tp2.21/./work_ma                                     
ww3_tp2.6/./work_ST4                                         
ww3_tp2.6/./work_pdlib                                     
ww3_tp2.6/./work_ST0                                         
ww3_tp2.7/./work_ST0                                                                  

@JessicaMeixner-NOAA
Collaborator

Thank you for the updates related to the reviews @aronroland!!!!

@aronroland
Collaborator Author

I am done with all open tasks ...

@aliabdolali
Contributor

@JessicaMeixner-NOAA @MatthewMasarik-NOAA
Aron answered the reviews and we also updated the PR template, indicating the full list of identical and different cases, and cleaned OASIS, etc. For regtests, we did a comprehensive analysis; it is ready to go.
FYI: @AvichalMehra-NOAA

@JessicaMeixner-NOAA
Collaborator

Thank you @aronroland and @aliabdolali -- @MatthewMasarik-NOAA and I will start reviewing again now.

Collaborator

@MatthewMasarik-NOAA MatthewMasarik-NOAA left a comment


@aliabdolali and @aronroland, you are correct that the print_memcheck calls are still in the code. My mistake on those. There are no other flagged areas in the code, so nothing else to look into as far as I'm concerned. I'll submit regtests for the branch.

@JessicaMeixner-NOAA
Collaborator

All regression tests passed last night. The comparison with develop is now in the queue. The queue times are longer than normal so the turn around time will not be as fast as usual, but hopefully should have that later today.

@aronroland
Collaborator Author

All regression tests passed last night. The comparison with develop is now in the queue. The queue times are longer than normal so the turn around time will not be as fast as usual, but hopefully should have that later today.

nice!

@JessicaMeixner-NOAA
Collaborator

FYI, two PRs are about to be merged. There will be a minor update to merge that code into this branch. As promised to @thesser1, I will subsequently make a PR to this branch with that fix. It is still anticipated that this PR can be merged by the end of today or at the latest tomorrow, despite the slow queues.

@JessicaMeixner-NOAA
Collaborator

As promised, the PR with the merge of develop and the resolved conflict is here: erdc#11

@JessicaMeixner-NOAA
Collaborator

Just a status update as it's near the end of the day. The compare script submitted this morning is still in the queue. I'm working on getting tests through multiple machines to see which one goes through faster to attempt to accelerate this, but we will not be able to merge this today. My hope is that jobs will go through overnight and we'll be in a position to merge this tomorrow. Apologies for the delay, but the queue times will not get better until April.

@aliabdolali
Contributor

@JessicaMeixner-NOAA thanks for the update

Collaborator

@JessicaMeixner-NOAA JessicaMeixner-NOAA left a comment


Thank you for all of this work. The regression test logs for hera are here:

matrixCompFull.txt
matrixCompSummary.txt
The diff is not included due to size.

And a summary from orion here:
orion.matrixCompSummary.txt

The more concise list of diffs is in the PR description, skipping the ww3_grid.out files.

@aliabdolali
Contributor

Yay, this is big progress, counting down to see it merged 👍🏻
