[Bug]: e3sm_to_cmip raises an obscure concurrency error when the grid is invalid #257

Open
forsyth2 opened this issue Apr 19, 2024 · 0 comments
Labels
bug Something isn't working

What happened?

I ran into this error:

2024-04-19 19:12:28,039_039:INFO:cmorize:pr: creating CMOR variable with CMOR axis objects.
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0rc1_chrysalis/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 931, in _run_parallel
    out = res.result()
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0rc1_chrysalis/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0rc1_chrysalis/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
^M 50%|█████     | 1/2 [00:00<00:00,  5.38it/s]  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0rc1_chrysalis/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 931, in _run_parallel
    out = res.result()
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0rc1_chrysalis/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.10.0rc1_chrysalis/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
^M100%|██████████| 2/2 [00:00<00:00, 10.74it/s]

...

mv: cannot stat '/lcrc/group/e3sm/ac.forsyth2/zppy_p1_output/v3.LR.piControl/post/atm/native/cmip_ts/monthly/tmp_ts_atm_monthly_0001-0005-0005/CMIP6/CMIP/*/*/*/*/*/*/*/*/*.nc': No such file or directory

This happened because I had not specified a mapping_file in zppy. See E3SM-Project/zppy#549 (comment) and E3SM-Project/zppy#549 (comment) for the original comments.

What did you expect to happen? Are there any possible answers you came across?

It would be good to catch the grid error earlier and raise it directly, rather than letting e3sm_to_cmip fail with a concurrency error that isn't particularly informative. For example, would it be easy to tell from a substring of zppy's mapping_file whether it will or won't be compatible?

@TonyB9000 has a table to keep track of mapping file information (/p/user_pub/e3sm/staging/resource/derivatives.conf on acme1, /lcrc/group/e3sm2/DSM/Staging/Resource/derivatives.conf on chrysalis), which has "selections based upon realm, resolution, and model_version. It can be USED programmatically, but cannot be MAINTAINED programmatically".
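
To make the request above concrete, here is a minimal sketch of what "catch the grid error earlier" could look like. It is not e3sm_to_cmip's actual code: `GridValidationError`, `check_grid`, `cmorize_one` and `run_parallel` are hypothetical names, and the rule that usable input must have `lat` and `lon` dimensions is an assumption made for illustration. It uses a thread pool only to stay self-contained; the `Future` handling is the same for a process pool.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger("cmorize_sketch")


class GridValidationError(ValueError):
    """Hypothetical error type for input that is not on a usable grid."""


def check_grid(dims):
    """Hypothetical pre-flight check (assumption: lat/lon dims are required)."""
    missing = {"lat", "lon"} - set(dims)
    if missing:
        raise GridValidationError(
            f"input is missing dimensions {sorted(missing)}; "
            "was it regridded with a mapping_file?"
        )


def cmorize_one(var_name, dims):
    """Stand-in for a per-variable worker; validates before doing any work."""
    check_grid(dims)
    return var_name


def run_parallel(jobs, num_proc=2):
    """Run all jobs and report each failure next to the variable that caused it."""
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=num_proc) as pool:
        # Remember which variable each future belongs to.
        futures = {pool.submit(cmorize_one, v, d): v for v, d in jobs.items()}
        for fut in as_completed(futures):
            var_name = futures[fut]
            try:
                done.append(fut.result())
            except Exception as exc:
                failed[var_name] = exc
                logger.error("%s failed: %s: %s", var_name, type(exc).__name__, exc)
    return sorted(done), failed
```

With something like this, the log would name the variable and the underlying problem instead of only showing frames from `concurrent/futures/_base.py`.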

Minimal Complete Verifiable Example (MCVE)

# Relevant section of ts_atm_monthly_0001-0005-0005.bash
  srun -N 1 e3sm_to_cmip \
  --output-path \
  ${dest_cmip}/${tmp_dir} \
  --var-list \
  'pr, tas, rsds, rlds, rsus' \
  --realm \
  atm \
  --input-path \
  ${input_dir} \
  --user-metadata \
  /lcrc/group/e3sm/ac.forsyth2/zppy_p1_output/v3.LR.piControl/post/scripts/${workdir}/default_metadata.json \
  --num-proc \
  12 \
  --tables-path \
  ${cmortables_dir}

Relevant log output

No response

Anything else we need to know?

The zppy cfg:

[default]
input = /lcrc/group/e3sm2/ac.golaz/E3SMv3/v3.LR.piControl
output = /lcrc/group/e3sm/ac.forsyth2/zppy_p1_output/v3.LR.piControl
case = v3.LR.piControl
www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_p1_www
partition = compute
environment_commands = "source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.10.0rc1_chrysalis.sh"

[ts]
active = True
walltime = "00:50:00"

  [[ atm_monthly ]]
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  ts_fmt = "cmip"
  years = "0001:0020:5",

  [[ land_monthly ]]
  extra_vars = "landfrac"
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  ts_fmt = "cmip"
  vars = "FSH,RH2M,LAISHA,LAISUN"
  years = "0001:0020:5",

  [[ atm_monthly_glb ]]
  input_subdir = "archive/atm/hist"
  input_files = "eam.h0"
  frequency = "monthly"
  mapping_file = "glb"
  years = "0001:0020:10",

  [[ lnd_monthly_glb ]]
  input_subdir = "archive/lnd/hist"
  input_files = "elm.h0"
  frequency = "monthly"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN"
  years = "0001:0020:10",

Note that no mapping_file is specified for the ts_atm_monthly subtask.
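
A check along these lines could flag the omission before any job is submitted. This is only a sketch: `find_missing_mapping_files` is a hypothetical helper, the `[ts]` section is written out by hand as a nested dict rather than parsed from the cfg, and the rule "a subtask with ts_fmt = "cmip" needs a mapping_file" is an assumption drawn from this report. It also ignores any values a subtask might inherit from a parent section.

```python
def find_missing_mapping_files(ts_section):
    """Return ts subtasks that request CMIP output but set no mapping_file.

    Assumption (not confirmed by zppy): ts_fmt == "cmip" requires a mapping_file.
    """
    missing = []
    for name, task in ts_section.items():
        if not isinstance(task, dict):
            continue  # skip scalar options such as "active" or "walltime"
        if task.get("ts_fmt") == "cmip" and not task.get("mapping_file"):
            missing.append(name)
    return missing


# Hand-written mirror of the [ts] section above (only the relevant keys).
ts = {
    "active": True,
    "walltime": "00:50:00",
    "atm_monthly": {"input_files": "eam.h0", "ts_fmt": "cmip"},
    "land_monthly": {"input_files": "elm.h0", "ts_fmt": "cmip"},
    "atm_monthly_glb": {"input_files": "eam.h0", "mapping_file": "glb"},
    "lnd_monthly_glb": {"input_files": "elm.h0", "mapping_file": "glb"},
}

print(find_missing_mapping_files(ts))  # ['atm_monthly', 'land_monthly']
```

Under the assumed rule, both atm_monthly and land_monthly are flagged; whether the land subtask really needs a mapping_file is part of that assumption.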

Environment

source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.10.0rc1_chrysalis.sh -> e3sm_unified_1.10.0rc1_login
