Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manufactured solution tendency terms to Omega #170

Conversation

hyungyukang
Copy link

This PR adds custom tendency terms required for the manufactured solution test case to Omega. This feature can be tested in conjunction with the Polaris package, which has included a convergence study leveraging this capability. The details of this test case can be found in Section 2.10 and Figures 13 and 19 in Bishnu et al. 2024 and the manufactured solution test in Polaris.

The error norms for this test case are expected to demonstrate second-order convergence. The test results will be posted in this PR following integration with Polaris for this test case.

This PR does not modify the unit tests but passed them on Frontier using both CPU and GPU.

Checklist

  • Testing
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.
    • Polaris test suite has passed

Manufactured solution tendency terms required for a manufactured
solution test case from Bishnu et al. (2024) are added to Omega.
This feature can be tested in conjunction with the Polaris package,
which includes a convergence study leveraging this capability.
@hyungyukang
Copy link
Author

Although personal testing indicates second-order convergence for 'Default,' 'Del2,' and 'Del4,' this must be validated with Polaris.

image

The above test was conducted on Frontier GPU using crayclanggpu.

@hyungyukang hyungyukang added Omega Polaris Issues or pull requests related to Polaris support labels Nov 25, 2024
@mwarusz
Copy link
Member

mwarusz commented Nov 26, 2024

@hyungyukang
I have one small suggestion. Instead of clearing and re-creating default tendencies you could do this:

// these are empty by default
CustomTendencyType CustomThickTend;
CustomTendencyType CustomVelTend;

if (UseCustomTendency) {
   // get manufactured tendencies etc
   CustomThickTend = ManufacturedSol.ManufacturedThickTend;
   CustomVelTend = ManufacturedSol.ManufacturedVelTend;
}
// one unconditional call to create
Tendencies::DefaultTendencies = create("Default", DefHorzMesh, NVertLevels, NTracers, &TendConfig, CustomThickTend, CustomVelTend);

@hyungyukang
Copy link
Author

@hyungyukang I have one small suggestion. Instead of clearing and re-creating default tendencies you could do this:

@mwarusz , thank you very much for the suggestion. I agree with you. I just tried it and it's working perfectly! I'm going to commit the code soon.
May I include your name in the reviewer list if you are comfortable with it?

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions about error checking. These are not just for @hyungyukang but for Omega developers more broadly. Maybe @philipwjones has some thoughts?

Modify the tendency initialization process to avoid clearing and
recreating the default Tendencies with custom tendencies
@mwarusz
Copy link
Member

mwarusz commented Nov 26, 2024

@hyungyukang

May I include your name in the reviewer list if you are comfortable with it?

Sure.

@hyungyukang
Copy link
Author

hyungyukang commented Nov 26, 2024

@xylar and @cbegeman ,

To enable this feature, Polaris needs to make some changes from config/Default.yml.
Here is the difference between omega.yml I used for the default manufactured solution test and config/Default.yml.

Diff. between `omega.yml` and `config/Default.yml`
@@ -1,13 +1,13 @@
 Omega:
   TimeIntegration:
     CalendarType: No Leap
-    TimeStepper: Forward-Backward
+    TimeStepper: RungeKutta4
     TimeStep: 0000_00:10:00
     StartTime: 0001-01-01_00:00:00
-    StopTime: 0001-01-01_02:00:00
+    StopTime: 0001-01-01_10:00:00
     RunDuration: none
   Dimension:
-    NVertLevels: 60
+    NVertLevels: 1
   Decomp:
     HaloWidth: 3
     DecompMethod: MetisKWay
@@ -21,17 +21,17 @@ Omega:
     PVTendencyEnable: true
     KETendencyEnable: true
     SSHTendencyEnable: true
-    VelDiffTendencyEnable: true
+    VelDiffTendencyEnable: false
     ViscDel2: 1.0e3
-    VelHyperDiffTendencyEnable: true
+    VelHyperDiffTendencyEnable: false
     ViscDel4: 1.2e11
     TracerHorzAdvTendencyEnable: true
     TracerDiffTendencyEnable: true
     EddyDiff2: 10.0
     TracerHyperDiffTendencyEnable: true
     EddyDiff4: 0.0
-    UseCustomTendency: false
-    ManufacturedSolutionTendency: false
+    UseCustomTendency: true
+    ManufacturedSolutionTendency: true
   Tracers:
     Base: [Temperature, Salinity]
     Debug: [Debug1, Debug2, Debug3]
@@ -49,7 +49,6 @@ Omega:
       UseStartEnd: false
       Contents:
         - State
-        - Base
     # Restarts are used to initialize for all job submissions after the very
     # first startup job. We use UseStartEnd with a start time just after the
     # simulation start time so that omega does not attempt to use a restart
@@ -84,8 +83,8 @@ Omega:
       Mode: write
       IfExists: replace
       Precision: double
-      Freq: 1
-      FreqUnits: months
+      Freq: 10
+      FreqUnits: hours
       UseStartEnd: false
       Contents:
         - Tracers

@hyungyukang hyungyukang requested a review from mwarusz November 26, 2024 15:40
@philipwjones
Copy link

@xylar and all. As I mentioned in the Error design, most of these errors should indeed be critical and abort, but we don't have a way of doing that easily until I get the new Error capability implemented (LOG_CRITICAL currently does not abort automatically). As part of that implementation, I'll be doing a pass through the code and replacing many of these error calls and error returns with critical errors and aborts when needed. But if we want to at least upgrade these to critical error messages, that will be a good hint about where we want this to happen.

@philipwjones
Copy link

And while I'm commenting on this PR...do we need to start thinking about multiple YAML configs now that we're starting to get conflicts. In particular, the streams unit test with this change will be writing output every 10 hours for a full year so probably not what we want.

@xylar
Copy link

xylar commented Nov 26, 2024

@philipwjones, thanks for the update. I didn't realize that we weren't there yet with LOG_CRITICAL, sorry about that.

@xylar
Copy link

xylar commented Nov 26, 2024

In particular, the streams unit test with this change will be writing output every 10 hours for a full year so probably not what we want.

I don't think @hyungyukang is proposing any changes to the YAML file (other than adding a few config options). These are the changes that Polaris will be making to the YAML file it uses (made manually for now).

@philipwjones
Copy link

Ah yes, I misread the comment on the Default.yml changes.

@xylar
Copy link

xylar commented Dec 4, 2024

We need a Polaris PR to make the changes to the manufactured solution tests such that the correct yaml config options are set. This shouldn't be hard but there are a lot of moving pieces in Polaris and Omega right now so I haven't got to it yet.

Copy link
Member

@mwarusz mwarusz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few C++ coding suggestions and one general question about constants.

Comment on lines +94 to +95
R8 Grav = 9.80665_Real; // Gravity acceleration
R8 Pii = 3.141592653589793_Real; // Pi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the scope of this PR, but should we start thinking about creating a shared constants file ? @mark-petersen @xylar Thoughts ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Common constants need to come from a unified file, and preferably the centralized one that @xylar pointed to above. We struggled with multiple constants files in MPAS within E3SM.

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hyungyukang, this is working great for me in Polaris. On Chrysalis, I did a test merge of this branch with develop. Then, ran the CTest utility in Polaris, which is also a convenient way to automatically build the Omega code. Then, I ran the manufactured solution test using this branch:
E3SM-Project/polaris#249

The results look great!

convergence_ssh

comparison

For everyone's reference, a typical omega.yml that Polaris produces for this is:

Omega:
  TimeIntegration:
    CalendarType: No Leap
    TimeStepper: RungeKutta4
    TimeStep: 0000_00:10:00.000
    StartTime: 0001-01-01_00:00:00
    StopTime: none
    RunDuration: 0000_10:00:00.000
  Dimension:
    NVertLevels: 1
  Decomp:
    HaloWidth: 3
    DecompMethod: MetisKWay
  State:
    NTimeLevels: 2
  Advection:
    FluxThicknessType: Center
    FluxTracerType: Center
  Tendencies:
    ThicknessFluxTendencyEnable: true
    PVTendencyEnable: true
    KETendencyEnable: true
    SSHTendencyEnable: true
    VelDiffTendencyEnable: false
    ViscDel2: 1.0e3
    VelHyperDiffTendencyEnable: false
    ViscDel4: 1.2e11
    TracerHorzAdvTendencyEnable: true
    TracerDiffTendencyEnable: true
    EddyDiff2: 10.0
    TracerHyperDiffTendencyEnable: true
    EddyDiff4: 0.0
    UseCustomTendency: true
    ManufacturedSolutionTendency: true
  Tracers:
    Base: [Temperature, Salinity]
    Debug: [Debug1, Debug2, Debug3]
  ManufacturedSolution:
    WavelengthX: 5.0e6
    WavelengthY: 4.33013e6
    Amplitude: 1.0
  IOStreams:
    # InitialState should only be used when starting from scratch
    # After the simulations initial start, the frequency should be
    # changed to never so that the initial state file is not read.
    InitialState:
      UsePointerFile: false
      Filename: init.nc
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: OnStartup
      UseStartEnd: false
      Contents:
      - NormalVelocity
      - LayerThickness
    RestartRead:
      UsePointerFile: true
      PointerFilename: ocn.pointer
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: OnStartup
      UseStartEnd: true
      StartTime: 0001-01-01_00:00:01
      EndTime: 99999-12-31_00:00:00
      Contents:
      - Restart
    History:
      UsePointerFile: false
      Filename: output.nc
      Mode: write
      IfExists: replace
      Precision: double
      Freq: 36000
      FreqUnits: Seconds
      UseStartEnd: false
      Contents:
      - NormalVelocity
      - LayerThickness
      - SshCellDefault

@hyungyukang
Copy link
Author

@xylar , this is great! Thank you so much for testing this PR with Polaris! Results look beautiful. Were you able to run Del2/Del4 test as well?

@xylar
Copy link

xylar commented Dec 4, 2024

@hyungyukang, I think some work will be needed by @cbegeman to rebase E3SM-Project/polaris#234 before we can run del2/del4 in Polaris. She's busy with AGU right now and I haven't had time to familiarize myself with that branch yet. So I didn't try that so far. Maybe we should treat that as a separate review?

@hyungyukang
Copy link
Author

@xylar , Aha, got it. Probably we can wait for it? I'm currently busy with AGU as well and will need some time to address the code suggestions from @mwarusz . While I don't want to hold this PR for longer time, it will not affect subsequent Omega developments. So it would be good to merge it once all tests (default, del2, del4) have been confirmed to work properly.

@xylar
Copy link

xylar commented Dec 5, 2024

So it would be good to merge it once all tests (default, del2, del4) have been confirmed to work properly.

We have a bit of a logjam at the moment. Manufactured solution default is the only test case that supports Omega at the moment so it would be really nice to have it working with both Omega develop and Polaris main. This would require merging this PR, updating the Omega submodule in E3SM-Project/polaris#231 and merging it, and then merging E3SM-Project/polaris#249.

If E3SM-Project/polaris#234 can be rebased and updated to work with the convergence in space and time only that we recently added, we should be able to test with it. But I don't feel comfortable taking over that work so that will need to wait for @cbegeman to have some time. My preference would be to merge this branch assuming that the del2 and del4 terms are okay based on your testing, and to follow up with another PR if we later discover that bug fixes are needed.

In other words, while leaving this PR unmerged is not holding up Omega development, it is holding up Polaris development.

@mark-petersen
Copy link

mark-petersen commented Dec 5, 2024

I am testing this PR on Perlmutter. CPU compiles. GPU with nvidiagpu has the error within ManufacturedThicknessTendency, so it is caused by this PR:

/global/homes/m/mpeterse/repos/E3SM/opr/components/omega/src/ocn/CustomTendencyTerms.h(65): error: An extended __host__ __device__ lambda cannot be defined inside a class ("ManufacturedThicknessTendency") with private or protected access within another class
               [=] __attribute__((host)) __attribute__((device))(int ICell, int KLevel) {
                                                                ^

/global/homes/m/mpeterse/repos/E3SM/opr/components/omega/src/ocn/CustomTendencyTerms.h(133): error: An extended __host__ __device__ lambda cannot be defined inside a class ("ManufacturedVelocityTendency") with private or protected access within another class
               [=] __attribute__((host)) __attribute__((device))(int IEdge, int KLevel) {
                                                                ^
This is my compile sequence:

######### perlmutter GPU
salloc --nodes 4 --qos interactive --time 01:00:00 --constraint gpu --tasks-per-node=2 --gpus-per-task 1 --account=m4572_g # or e3sm_g
CODE_DIR=opr
RUNDIR=251205_omega_gpu
mkdir ${PSCRATCH}/runs/$RUNDIR
cd !$

rm -rf build
mkdir build
cd build
module load cmake

export PARMETIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-gpu/spack/dev_polaris_0_4_0_nvidiagpu_mpich/var/spack/environments/dev_polaris_0_4_0_nvidiagpu_mpich/.spack-env/view
cmake \
   -DOMEGA_CIME_COMPILER=nvidiagpu \ 
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=pm-gpu \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   -S /global/homes/m/mpeterse/repos/E3SM/${CODE_DIR}/components/omega -B .
./omega_build.sh

# linking:
cd test
ln -isf /global/homes/m/mpeterse/meshes/omega/O*nc .
cp /global/homes/m/mpeterse/repos/E3SM/${CODE_DIR}/components/omega/configs/Default.yml omega.yml

cd ..
./omega_ctest.sh

@mark-petersen
Copy link

Also with -DOMEGA_CIME_COMPILER=gnugpu

/global/homes/m/mpeterse/repos/E3SM/opr/components/omega/src/ocn/CustomTendencyTerms.h(65): error: An extended __host__ __device__ lambda cannot be defined inside a class ("ManufacturedThicknessTendency") with private or protected access within another class
                           (int ICell, int KLevel) {
                           ^

/global/homes/m/mpeterse/repos/E3SM/opr/components/omega/src/ocn/CustomTendencyTerms.h(133): error: An extended __host__ __device__ lambda cannot be defined inside a class ("ManufacturedVelocityTendency") with private or protected access within another class
                           (int IEdge, int KLevel) {
                           ^

/global/homes/m/mpeterse/repos/E3SM/opr/components/omega/src/ocn/CustomTendencyTerms.h(65): error: An extended __host__ __device__ lambda cannot be defined inside a class ("ManufacturedThicknessTendency") with private or protected access within another class
                           (int ICell, int KLevel) {
                           ^

/global/homes/m/mpeterse/repos/E3SM/opr/components/omega/src/ocn/CustomTendencyTerms.h(133): error: An extended __host__ __device__ lambda cannot be defined inside a class ("ManufacturedVelocityTendency") with private or protected access within another class
                           (int IEdge, int KLevel) {
                           ^

@mark-petersen
Copy link

Passes perlmutter cpu tests using -DOMEGA_CIME_COMPILER=nvidia, except the REDUCTIONS_TEST which is a known problem on the head of develop.

@hyungyukang
Copy link
Author

hyungyukang commented Dec 5, 2024

In other words, while leaving this PR unmerged is not holding up Omega development, it is holding up Polaris development.

@xylar , all make sense to me. Thanks a lot! Let's merge this branch first, after which we can proceed with working on the del2/del4 source terms. If any issues arise with the code subsequently, I can address them by opening an additional pull request for the bug fix. I will address suggestions and issues soon.

@hyungyukang
Copy link
Author

I am testing this PR on Perlmutter. CPU compiles. GPU with nvidiagpu has the error within ManufacturedThicknessTendency, so it is caused by this PR:

@mark-petersen , thanks for testing. I actually haven't ran it on pm-gpu because sometimes compilation on it is very slow... I will address the issue soon.

- Address code suggestions
- Resolve an issue with pm-gpu
@hyungyukang
Copy link
Author

hyungyukang commented Dec 7, 2024

I am testing this PR on Perlmutter. CPU compiles. GPU with nvidiagpu has the error within ManufacturedThicknessTendency, so it is caused by this PR:

@mark-petersen , I have resolved the issue with pm-gpu. Could you please try again?

Copy link

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this compiles on pm gpu with nvidia now. Approving based on the reported tests above. Thanks!

@mark-petersen
Copy link

Checked with @cbegeman, who is busy at AGU this week.

@mark-petersen mark-petersen removed the request for review from cbegeman December 13, 2024 03:12
@mark-petersen mark-petersen merged commit 9054c43 into E3SM-Project:develop Dec 13, 2024
2 checks passed
@hyungyukang hyungyukang deleted the hkang/omega/add-manufactured-tendencies branch December 30, 2024 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Omega Polaris Issues or pull requests related to Polaris support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants