noahmp with coupled model configurations produces segfault in debug mode #609

Closed
DeniseWorthen opened this issue May 29, 2021 · 103 comments
Labels
bug Something isn't working

Comments

@DeniseWorthen
Collaborator

DeniseWorthen commented May 29, 2021

Description

Compiling S2S in debug mode and running a new cpld_bmarkfrac_v16_noahmp test produces a segfault.

To Reproduce:

Check out this branch

This branch includes a new (non-wave) benchmark_v16_7b test and a matching restart test. To produce the segfault:

./rt.sh -cek -l rt.test > output 2>&1 &

The low-resolution cpld_control_p7 test is located in this branch

Additional context

Output

err log:

forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source
fv3.exe            000000000D4D9EBE  Unknown               Unknown  Unknown
fv3.exe            000000000D616580  Unknown               Unknown  Unknown
fv3.exe            000000000794392E  module_sf_noahmpl        1910  module_sf_noahmplsm.f90
fv3.exe            000000000793A907  module_sf_noahmpl         756  module_sf_noahmplsm.f90
fv3.exe            00000000074EC894  noahmpdrv_mp_noah         840  sfc_noahmp_drv.F90
fv3.exe            000000000635546B  ccpp_fv3_gfs_v16_         660  ccpp_FV3_GFS_v16_coupled_noahmp_physics_cap.F90
fv3.exe            000000000601A4E3  ccpp_static_api_m         424  ccpp_static_api.F90
fv3.exe            000000000601248C  ccpp_driver_mp_cc         188  CCPP_driver.F90

The offending line is:

fsno = tanh( snowh /(parameters%scffac * fmelt))

The following modification eliminates the segfault:

if (fmelt .gt. 0.) fsno = tanh( snowh /(parameters%scffac * fmelt))

With the above fix, when compiled in non-debug mode the v16 noahmp restart test passes.

@DeniseWorthen DeniseWorthen added the bug Something isn't working label May 29, 2021
@junwang-noaa
Collaborator

@HelinWei-NOAA May I ask if you can take a look at the issue with the NoahMP debug test? The issue is in both the standalone gfsv16 NoahMP and the coupled NoahMP tests.

@junwang-noaa
Collaborator

@DeniseWorthen Please try the following fix Helin provided. If it is working, we may ask Helin to create a new PR.

The modification is shown below in FV3/ccpp/physics/physics/module_sf_noahmplsm.f90

! ground snow cover fraction [niu and yang, 2007, jgr]

 fsno = 0.
 if(snowh <= 1.e-6 .or. sneqv <= 1.e-3) then
  snowh = 0.0
  sneqv = 0.0
 end if

 if(snowh.gt.0.)  then
     bdsno    = sneqv / snowh
     fmelt    = (bdsno/100.)**parameters%mfsno
     fsno     = tanh( snowh /(parameters%scffac * fmelt))
 endif
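
For readers following along, the effect of this guard can be reproduced outside the model. The sketch below is a Python transcription, not the Noah-MP source: the values of MFSNO and SCFFAC are illustrative placeholders rather than entries from the parameter table, and the "unguarded" form assumes the original code only tested snowh > 0 before dividing. It shows that a positive snowh with zero sneqv gives fmelt = 0 and hence a zero denominator, while zeroing trace snow first skips the division.

```python
import math

MFSNO = 2.5     # illustrative melt-factor exponent (assumed value)
SCFFAC = 0.005  # illustrative snow-cover factor (assumed value)


def fsno_unguarded(snowh, sneqv):
    """Assumed original form: only snowh > 0 is checked before dividing."""
    fsno = 0.0
    if snowh > 0.0:
        bdsno = sneqv / snowh
        fmelt = (bdsno / 100.0) ** MFSNO
        fsno = math.tanh(snowh / (SCFFAC * fmelt))  # divides by zero if fmelt == 0
    return fsno


def fsno_guarded(snowh, sneqv):
    """Form with the proposed fix: trace snow is zeroed before the division."""
    fsno = 0.0
    if snowh <= 1.0e-6 or sneqv <= 1.0e-3:
        snowh = 0.0
        sneqv = 0.0
    if snowh > 0.0:
        bdsno = sneqv / snowh
        fmelt = (bdsno / 100.0) ** MFSNO
        fsno = math.tanh(snowh / (SCFFAC * fmelt))
    return fsno


if __name__ == "__main__":
    try:
        fsno_unguarded(0.01, 0.0)
    except ZeroDivisionError as err:
        print("unguarded:", err)
    print("guarded:", fsno_guarded(0.01, 0.0))
```

Python raises an exception where the debug build traps the floating-point error, so the failure mode is analogous rather than identical.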

@DeniseWorthen
Collaborator Author

@junwang-noaa I tested this fix. In debug mode, there is no seg fault. In non-debug mode, both the control and restart test pass.

@DeniseWorthen
Collaborator Author

This will be fixed by ccpp PR #673

@DeniseWorthen DeniseWorthen changed the title from "benchmark v16 noahmp test produces segfault in debug mode" to "benchmark v16 7b test produces segfault in debug mode" Jul 23, 2021
@DeniseWorthen
Collaborator Author

@junwang-noaa @climbfuji I expected that PR #702 would fix this issue since that PR contained the fix for a divide by zero for fsno. I just repeated the test and it still fails, now in a new location.

  0: GFS_phys_time_vary_init: compute sncovr from weasd and soil vegetation parameters
  0: GFS_phys_time_vary_init: initialize albedo for land and ice
  0:  be create fcst grid
  0:  dateS=hours since 2013-04-01 00:00:00date_init=        2013           4
  0:            1           0           0           0
  0:  af create fcst fieldbundle, name=atmrc=           0
  0:  af create fcst fieldbundle, name=sfc_nearest_stodrc=           0
  0:  af create fcst fieldbundle, name=sfc_bilinearrc=           0
  0:  in fcst,init total time:    390.653084993362
 78: forrtl: error (73): floating divide by zero
 78: Image              PC                Routine            Line        Source
 78: fv3.exe            000000000D7A674E  Unknown               Unknown  Unknown
 78: libpthread-2.17.s  00002AAF587EF630  Unknown               Unknown  Unknown
 78: fv3.exe            0000000007C46B8C  module_sf_noahmpl        2028  module_sf_noahmplsm.f90
 78: fv3.exe            0000000007C3A75B  module_sf_noahmpl         756  module_sf_noahmplsm.f90
 78: fv3.exe            000000000760F52C  noahmpdrv_mp_noah         840  sfc_noahmp_drv.F90
 78: fv3.exe            00000000062FD619  oupled_nsstnoahmp         774  ccpp_FV3_GFS_v16_coupled_nsstNoahmpUGWPv1_physics_cap.F90
 78: fv3.exe            0000000005ECF7A4  ccpp_static_api_m         564  ccpp_static_api.F90

@climbfuji
Collaborator

@HelinWei-NOAA FYI

@HelinWei-NOAA
Collaborator

@junwang-noaa @barlage

line 2028 is
d_rsurf = 2.2e-5 * parameters%smcmax(1) * parameters%smcmax(1) * ( 1.0 - parameters%smcwlt(1) / parameters%smcmax(1) ) ** (2.0+3.0/parameters%bexp(1))

Can you print out some parameters to see which one is zero? Or please tell me your running directory and I can do a test. Thanks.
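
As a rough aid for that check, the expression on line 2028 has only two parameters that act as divisors: smcmax (in smcwlt/smcmax) and bexp (in 3.0/bexp). The Python sketch below transcribes the expression and reports which divisor is zero. The function names and the numeric values in the usage are hypothetical and chosen only for illustration.

```python
def d_rsurf(smcmax, smcwlt, bexp):
    """Python transcription of the d_rsurf expression quoted above."""
    return (2.2e-5 * smcmax * smcmax
            * (1.0 - smcwlt / smcmax) ** (2.0 + 3.0 / bexp))


def zero_denominators(smcmax, bexp):
    """Return the names of the divisor parameters that are exactly zero."""
    candidates = {"smcmax": smcmax, "bexp": bexp}
    return [name for name, value in candidates.items() if value == 0.0]


if __name__ == "__main__":
    # Illustrative values only; bexp = 0 mimics an unset table entry.
    print(zero_denominators(smcmax=0.439, bexp=0.0))
    try:
        d_rsurf(0.439, 0.066, 0.0)
    except ZeroDivisionError as err:
        print("d_rsurf failed:", err)
```

Printing the same two parameters just before line 2028 in the Fortran source would give the equivalent information in the model run.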

@DeniseWorthen
Collaborator Author

There is a test branch you can use to re-create the error. Use ./rt.sh -cek -l rt.test > output 2>&1 &. The compile is set up to compile in debug mode.

@barlage
Collaborator

barlage commented Jul 27, 2021

@DeniseWorthen @junwang-noaa I'm a little confused about when/if this was ever fixed. Denise's comment on June 1 seems to indicate that it was working, but then it stopped working and the title was changed. Is there a known point in the history where the code was passing debug mode and then a known point where it wasn't?

@DeniseWorthen
Collaborator Author

I tested the fsno fix noted above in debug mode at the time. I am assuming that somewhere in later commits a different fault crept in. I don't know at which commit that occurred.

I had created the test because I wanted to have a restart test ready for the p7 configuration (we can't test restarts in the wave configurations). But we're not currently running this non-wave, p7b test. I can try to do some sleuthing and pinpoint when I first see the test fail.

@barlage
Collaborator

barlage commented Jul 27, 2021

OK, thanks Denise. Helin and I are trying to figure out what is happening, but have no solution now. The Noah-MP model, driver and parameter source code hasn't changed since June 7 so it is a bit confusing.

@DeniseWorthen
Collaborator Author

The test itself changed for p7b:

# P7b
export IAER=1011
export DO_UGWP_V1=".true."
export KTHERM=2
export TFREEZE_OPTION='mushy'

I can try to recreate the original test, which I believe was nsst and noahmp only.

@DeniseWorthen
Collaborator Author

@barlage As part of updating all the coupled regression tests to use P7 settings, I have a low resolution (C96/1deg ocean/ice) case which gives me the same error message. That run directory is on hera:

/scratch1/NCEPDEV/stmp2/Denise.Worthen/FV3_RT/rt_7926/cpld_control_p7

@HelinWei-NOAA
Collaborator

@DeniseWorthen From my testing, I found it was caused by the fractional grid. Because you still used the old fixed fields (on the Gaussian grid), vegetation and soil type mismatches will sometimes happen with the fractional grid: the vegetation type indicates land but the soil type points to water. So for my own runs, I use the tiled fixed fields. George ran QC when he created the tiled fixed fields to make sure that situation will not happen with the fractional grid. For the time being, you can turn off the fractional grid scheme. P7 runs will also use the tiled fixed fields and there should be no such issue.

@DeniseWorthen
Collaborator Author

@HelinWei-NOAA Thanks, but this test case (in the rt_7926 directory) is using tiled fixed files. It is our low resolution match for the P7 prototype configuration. The input.nml is below. There are some grb files---are there tiled versions of these I should be using?

  FNGLAC   = 'global_glacier.2x2.grb'
  FNMXIC   = 'global_maxice.2x2.grb'
  FNTSFC   = 'RTGSST.1982.2012.monthly.clim.grb'
  FNSNOC   = 'global_snoclim.1.875.grb'
  FNZORC   = 'igbp'
  FNALBC   = 'C96.snowfree_albedo.tileX.nc',
  FNALBC2  = 'C96.facsf.tileX.nc',
  FNAISC   = 'CFSR.SEAICE.1982.2012.monthly.clim.grb'
  FNTG3C   = 'C96.substrate_temperature.tileX.nc',
  FNVEGC   = 'C96.vegetation_greenness.tileX.nc',
  FNVETC   = 'C96.vegetation_type.tileX.nc',
  FNSOTC   = 'C96.soil_type.tileX.nc',
  FNSMCC   = 'global_soilmgldas.t126.384.190.grb',
  FNMSKH   = 'global_slmask.t1534.3072.1536.grb'
  FNTSFA   = ''
  FNACNA   = ''
  FNSNOA   = ''
  FNVMNC   = 'C96.vegetation_greenness.tileX.nc',
  FNVMXC   = 'C96.vegetation_greenness.tileX.nc',
  FNSLPC   = 'C96.slope_type.tileX.nc',
  FNABSC   = 'C96.maximum_snow_albedo.tileX.nc',
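
To see at a glance which entries still point at .grb files rather than tiled files, a small parser is enough. The Python sketch below is only a convenience for reading the listing: the helper name grb_entries is hypothetical, and the embedded text is a handful of entries copied from the listing above, not the full namelist.

```python
import re

# A few entries copied from the listing above.
NAMELIST = """
  FNGLAC   = 'global_glacier.2x2.grb'
  FNZORC   = 'igbp'
  FNALBC   = 'C96.snowfree_albedo.tileX.nc',
  FNSOTC   = 'C96.soil_type.tileX.nc',
  FNSMCC   = 'global_soilmgldas.t126.384.190.grb',
  FNTSFA   = ''
"""


def grb_entries(text):
    """Map each namelist key to its value when the value names a .grb file."""
    pattern = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'", re.MULTILINE)
    return {key: value for key, value in pattern.findall(text)
            if value.endswith(".grb")}


if __name__ == "__main__":
    for key, value in sorted(grb_entries(NAMELIST).items()):
        print(f"{key}: {value}")
```

Whether tiled replacements exist for any of the listed .grb entries is the open question in this comment; the sketch only enumerates them.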

@SMoorthi-emc
Contributor

SMoorthi-emc commented Jul 29, 2021 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Jul 29, 2021 via email

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Jul 29, 2021

My testing indicates that it is parameters%bexp(1) which can be zero. I am testing on gaea right now to avoid hera congestion so I can't point you to a run directory unless you also have access to gaea.

@HelinWei-NOAA
Collaborator

Quoting @SMoorthi-emc (via email):

Helin, I decided to try NoahMP with debug on. I am using frac_grid=.true. and tiled fix fields. Yet my job crashed with the same error:

libpthread-2.17.s  00002AFE3877D630  Unknown               Unknown  Unknown
fv3_cmeps_CICE6_n  0000000009539438  module_sf_noahmpl        2028  module_sf_noahmplsm.f90
fv3_cmeps_CICE6_n  000000000952D007  module_sf_noahmpl         756  module_sf_noahmplsm.f90
fv3_cmeps_CICE6_n  0000000008BF305C  noahmpdrv_mp_noah         840  sfc_noahmp_drv.F90
fv3_cmeps_CICE6_n  0000000006C5FE0E  smgshocnsstnoahmp         794  ccpp_FV3_GFS_cpld_rasmgshocnsstnoahmp_ugwp_physics_cap.F90
fv3_cmeps_CICE6_n  00000000062B4C22  ccpp_static_api_m        2172  ccpp_static_api.F90

Moorthi

@SMoorthi-emc Can you try to turn off the fractional grid? That means we still have some mismatches with those tiled fixed fields, i.e. the land-sea mask points to the land while both veg/soil type are water with fractional grid.

@HelinWei-NOAA
Collaborator

I want to make a correction. It is not a veg and soil type mismatch. It is a mismatch between the land-sea mask and the veg/soil type with the fractional grid. If you print out parameters%bexp(1), the model crashes with bexp=0, soil type=14 and vtype=17.


@SMoorthi-emc
Contributor

SMoorthi-emc commented Jul 29, 2021 via email

@HelinWei-NOAA
Collaborator


That's what I found too. Only when soil type=14 (water), bexp=0. When the point is water, the LSM should not be called.

@HelinWei-NOAA
Collaborator

Noah LSM doesn't set bexp to zero for soil type 14 (water). Otherwise the run with Noah LSM and the fractional grid would crash too.

@SMoorthi-emc
Contributor

SMoorthi-emc commented Jul 29, 2021 via email

@HelinWei-NOAA
Collaborator

HelinWei-NOAA commented Jul 29, 2021 via email

@DeniseWorthen
Collaborator Author

I'm confused about your statement that "P7 runs will also use the tiled fixed fields and there should be no such issue." This test is for the P7 configuration. The resolution is C96/1deg because that is our control case for regression testing. Since this test uses tiled fixed files then it indicates to me that either a) the QC was not adequate on the C96 files or b) even with QC, the code needs modification to ensure that LSM is not called for water points.

@HelinWei-NOAA
Collaborator

@GeorgeGayno-NOAA Can you confirm your code did assign both valid soil and veg types for fractional grid (1>landfrc>0)? We came across the situation with landfrc > 0 but both soil/veg indicate water.

@SMoorthi-emc
Contributor

SMoorthi-emc commented Jul 30, 2021 via email

@GeorgeGayno-NOAA
Contributor

@bingfu-NOAA could you please pick a few more ICs to check if there are points where 1) veg and soil type are 0 while land frac > 0, or 2) veg=17 and soil=14 while land frac > 0? If these points do exist, can you trace back to find out at which step of the IC generation they occurred?

Where are the ICs from chgres? I can help check.

@bingfu-NOAA

bingfu-NOAA commented Aug 2, 2021 via email

@barlage
Collaborator

barlage commented Aug 2, 2021

@bingfu-NOAA @GeorgeGayno-NOAA If you need any help, I can assist too. I have some scripts that I have used to check some of the tiles in the past: /home/Michael.Barlage/data/check. They could be easily modified for whatever purpose.

@GeorgeGayno-NOAA
Contributor

Hi George, The ICs are here: /scratch2/NCEPDEV/stmp3/Bing.Fu/o/p7ic/com/gens/dev/merge/C384_025 Thanks, Bing

@bingfu-NOAA Thanks. I checked the surface files for 2018030100. The files do not contain the land fraction record, only the 'slmsk'. Looking at tile 1, 'slmsk' is either 0 or 1. I see there are numerous points that have a valid soil/veg type with an 'slmsk' of 0. I was not able to tell if there were any undefined soil/veg points where 'slmsk' was '1'. If the model uses land fraction from the orog files, please let me know where they are so I can do a proper check of the surface files.

@bingfu-NOAA

bingfu-NOAA commented Aug 2, 2021 via email

@yangfanglin
Collaborator

The oro data the model actually reads in for integration is /scratch1/NCEPDEV/global/glopara/fix/fix_fv3_fracoro/C384.mx025_frac/oro_C384.mx025.tile1.nc. Is there any concern about the consistency between this file and the ceil/floor files? @shansun6 Shan, can you remind us again of the differences between the three files?

@GeorgeGayno-NOAA
Contributor

I looked at the oro data here: /scratch1/NCEPDEV/global/glopara/fix/fix_fv3_fracoro/C384.mx025_frac. And the surface data here: /scratch2/NCEPDEV/stmp3/Bing.Fu/o/p7ic/com/gens/dev/merge/C384_025/2018030100/gfs/C384/INPUT

Using 'ncview' I did not see any obvious mismatches. To confirm, I will need to write a utility to read each file and check, unless someone already has a utility to do that.

@barlage
Collaborator

barlage commented Aug 3, 2021

@GeorgeGayno-NOAA I expanded the check ncl script here:
/home/Michael.Barlage/data/check/check_land_soil.ncl

(0) number of cumulative fix_mask=1: 260881
(0) number of cumulative fix_land_frac (0,1]: 273435
(0) number of cumulative fix_lake_frac (0,1]: 6907
(0) number of cumulative valid fix land type: 273435
(0) number of cumulative valid fix soil type: 273435
(0) number of cumulative missing fix veg type with land: 0
(0) number of cumulative missing fix soil type with land: 0
(0) number of cumulative fix land type = 17: 0
(0) number of cumulative sfc_mask=1 : 255365
(0) number of cumulative valid sfc land type: 273435
(0) number of cumulative valid sfc soil type: 273435
(0) number of cumulative sfc land type >0 : 273435
(0) number of cumulative sfc soil type /=14: 273435
(0) number of cumulative sfc land type =0 : 611301
(0) number of cumulative sfc soil type =0 : 611301
(0) number of cumulative sfc land type =17 : 0
(0) number of cumulative sfc soil type =14 : 0
(0) number of cumulative mismatch land : 0
(0) number of cumulative mismatch soil : 0
(0) number of cumulative mismatch cross1 : 0
(0) number of cumulative mismatch cross2 : 0

The final four are reporting that everywhere there are valid land (veg type > 0) and soil (soil type >0 and /=14) in the surface tiles, there are coincident values in the fix files. Also, cross checking sfc land with fix soil and fix land with sfc soil shows the same, no inconsistencies.
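
The kind of check summarized above can be sketched in a few lines. This is not the NCL script referenced in this comment; it is a hypothetical Python stand-in operating on plain lists, with made-up sample values. It counts land points (land fraction > 0) whose vegetation or soil type is missing, and land points whose type is the water category (vegetation 17, soil 14, as discussed in this thread).

```python
WATER_VEG = 17   # vegetation type "water", per this thread
WATER_SOIL = 14  # soil type "water", per this thread


def count_mismatches(land_frac, vtype, stype):
    """Count land points with missing types and land points typed as water."""
    missing = 0
    water = 0
    for frac, veg, soil in zip(land_frac, vtype, stype):
        if frac <= 0.0:
            continue  # not a land point, nothing to check
        if veg <= 0 or soil <= 0:
            missing += 1
        elif veg == WATER_VEG or soil == WATER_SOIL:
            water += 1
    return {"missing_type_with_land": missing, "water_type_with_land": water}


if __name__ == "__main__":
    # Made-up sample: the last two points are deliberately inconsistent.
    land_frac = [0.0, 0.3, 1.0, 0.5, 0.2]
    vtype = [0, 7, 10, 0, 17]
    stype = [0, 3, 4, 0, 14]
    print(count_mismatches(land_frac, vtype, stype))
```

Both counts being zero for every tile corresponds to the "no inconsistencies" result reported here.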

@bingfu-NOAA

@barlage @GeorgeGayno-NOAA Thank you all for your help to check the sfc IC files.

@yangfanglin
Collaborator

@barlage @GeorgeGayno-NOAA This is interesting and puzzling. @shansun6 noticed there are mismatched points between the new oro data @mdtoyNOAA Mike created for the uGWD and the original fractional oro data.


Shan's email
I noticed some inconsistency in oro_data_ls.tile5.nc compared to oro_data.tile5.nc
in /scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210717/FV3_input_data384/INPUT_L127/
See below where oa1 is plotted. The right one is from oro_data.tile5.nc, which has zero over water. The left one is from oro_data_ls.tile5.nc and there are non-zero values over water. I don't know if oro_data_ls.tile5.nc is used in the latest FV3.

Although, my understanding is that oro_data_ls.tile5.nc is only used by uGWD.v1. This needs Mike's confirmation.

Has anyone run the model with NOAH-MP but without using uGWD.v1? Is the model crashing?

@ShanSunNOAA
Collaborator

ShanSunNOAA commented Aug 3, 2021 via email

@yangfanglin
Collaborator

(1) at each point with land_frac >0, both vtype & stype are > 0.
(2) at each point with land_frac=0, both vtype & stype are zero.

Is there a precision issue? For instance, land_frac=0 in the ICs might become a very small nonzero value after being read into the model. Can we run a test by setting in the model, for instance, if (land_frac < 1.0E-6) then land_frac=0?
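
The proposed test amounts to a clamp on the land fraction. A minimal Python sketch of the idea follows; the function name and the sample values are hypothetical, and the threshold is the 1.0E-6 suggested above. It also returns how many points were changed, since a nonzero count would be the evidence for the precision hypothesis.

```python
def clamp_land_frac(land_frac, eps=1.0e-6):
    """Set land fractions below eps to zero; return the new list and the number changed."""
    clamped = [0.0 if frac < eps else frac for frac in land_frac]
    changed = sum(1 for old, new in zip(land_frac, clamped) if old != new)
    return clamped, changed


if __name__ == "__main__":
    # Made-up sample: the second point mimics a tiny nonzero residue.
    print(clamp_land_frac([0.0, 3.0e-9, 0.25, 1.0]))
```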

@barlage
Collaborator

barlage commented Aug 3, 2021

@shansun6 you may want to also add a check for vtype>0 and stype=14

14 is soil type "water" and does not have valid values in the noahmp parameter table and I believe was the source of the original problems in this issue.

@ShanSunNOAA
Collaborator

ShanSunNOAA commented Aug 3, 2021 via email

@HelinWei-NOAA
Collaborator

HelinWei-NOAA commented Aug 3, 2021

@barlage's check results are consistent with what I found. There is no soil type 14 in the soil type data from the fixed fields, but the model crashed at a point with soil type = 14. Below is the email I sent to Moorthi when we discussed this issue:

Even after I added this section to FV3GFS_io.F90, the model still crashed at the same point.

    if (Model%frac_grid) then
      ! landfrac > 0 with veg/soil pointing to water: set landfrac = 0
      if (nint(Sfcprop(nb)%stype(ix)) == 14) then
        Sfcprop(nb)%landfrac(ix) = zero
        Sfcprop(nb)%slmsk(ix) = 0
        if (Sfcprop(nb)%lakefrac(ix) > zero) then
          Sfcprop(nb)%lakefrac(ix) = one
        else
          Sfcprop(nb)%oceanfrac(ix) = one
        endif
        Model%frac_grid = .false.
      endif
    endif

It turns out that in FV3GFS_io.F90 we don't have any case with soil type = 14. The soil type there is from Sfcprop(nb)%stype(ix). However, in other places such as sfc_noahmp_drv.F90, the soil type is given by GFS_Interstitial(cdata%thrd_no)%soiltype.

Do you know where GFS_Interstitial(cdata%thrd_no)%soiltype is defined in the model? So basically there is a conflict: we don't have any soil type 14 from Sfcprop but we have some from GFS_Interstitial(cdata%thrd_no).

@ShanSunNOAA
Collaborator

ShanSunNOAA commented Aug 3, 2021 via email

@HelinWei-NOAA
Collaborator

Or soil type 14 comes from GFS_surface_generic.F90:

if (soiltyp(i) < 1) soiltyp(i) = 14
if (vegtype(i) < 1) vegtype(i) = 17
if (slopetyp(i) < 1) slopetyp(i) = 1

This means we have soiltype < 1 over some land points.
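
The chain from that fallback to the crash can be traced in a short sketch. The three if statements are transcribed from the lines quoted above; everything else is an assumption for illustration: the Python function name is hypothetical, and BEXP_BY_SOIL_TYPE is a two-entry stand-in for the Noah-MP table, with bexp = 0 for soil type 14 as reported earlier in this thread and an arbitrary nonzero value for a land soil type.

```python
# Stand-in for the parameter table (illustrative values, not the real table).
BEXP_BY_SOIL_TYPE = {3: 4.74, 14: 0.0}


def apply_fallback(soiltyp, vegtype, slopetyp):
    """Python transcription of the fallback quoted from GFS_surface_generic.F90."""
    if soiltyp < 1:
        soiltyp = 14
    if vegtype < 1:
        vegtype = 17
    if slopetyp < 1:
        slopetyp = 1
    return soiltyp, vegtype, slopetyp


if __name__ == "__main__":
    # A point with unset types (all zero) that is still treated as land.
    soiltyp, vegtype, slopetyp = apply_fallback(0, 0, 0)
    bexp = BEXP_BY_SOIL_TYPE[soiltyp]
    print(soiltyp, vegtype, slopetyp, bexp)
    # With bexp == 0, the term 3.0/bexp in d_rsurf divides by zero.
```

This matches the observation that the input files contain no soil type 14 while the land model still sees it: the value would be introduced after the files are read.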

@HelinWei-NOAA
Collaborator

Hi Mike, Thanks for your reply. When I added a check for vtype>0 and stype=14, there was no point showing up. Actually there was no point with stype=14 regardless of vtype. Is it possible? Shan

The raw data has valid types over land only. So actually we don't have stype=14. Over water it is either 0 or the flag value.

@HelinWei-NOAA
Collaborator

Now both the fixed fields and the ICs look good. There must be some issue in the code, and Moorthi already has the solution. @SMoorthi-emc can you explain what you found and how you fixed the problem? Thanks.

@SMoorthi-emc
Contributor

SMoorthi-emc commented Aug 3, 2021 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Aug 3, 2021 via email

@yangfanglin
Collaborator

NOAH-MP was turned on in p7a and p7b. Soil temp was fine in Lydia's plot. Can we make a run with NOAH-MP and uGWD.v0 (gwd_opt=1)? A new SDF needs to be created; it does not exist.

@DeniseWorthen
Collaborator Author

Code was fixed in PR #723

pjpegion pushed a commit to NOAA-PSL/ufs-weather-model that referenced this issue Apr 4, 2023