Add changes to support R&D users #34

Closed
KateFriedman-NOAA opened this issue Apr 8, 2022 · 9 comments · Fixed by #52
Comments

@KateFriedman-NOAA
Member

The obsproc package does not work outside of the operational environment (e.g. developers running on R&D HPCs like Hera and Orion). This issue will document needed updates to support developers running obsproc outside of operations.

@KateFriedman-NOAA
Member Author

Note: this issue requires a new prod_util (v2+) to be available on Hera/Orion in hpc-stack-gfsv16 because of changes to the utility path (bin -> ush), which is hardcoded in jobs/JOBSPROC_GLOBAL_PREP:

#########################################################################
# Add some prod utilities to working directory
#########################################################################
echo "step ############# break ##############################" > ./break
cp $UTILROOT/ush/err_chk   .; chmod +x err_chk
cp $UTILROOT/ush/err_exit  .; chmod +x err_exit
cp $UTILROOT/ush/prep_step .; chmod +x prep_step
cp $UTILROOT/ush/postmsg   .; chmod +x postmsg
cp $UTILROOT/ush/setpdy.sh .; chmod +x setpdy.sh

...or this section of the script changes to accept input from outside. Thoughts?
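
One way the second option could look is sketched below, purely as an illustration. The helper name `copy_util` and the optional `UTILDIR` override are hypothetical and not part of obsproc or prod_util; the sketch only assumes that the utilities live under either `$UTILROOT/ush` (newer prod_util) or `$UTILROOT/bin` (older prod_util). The demo builds a throwaway tree in the older layout so it can run anywhere.

```shell
#!/bin/sh
# Hypothetical helper (not in obsproc): copy one prod_util script into the
# working directory, trying an optional caller-supplied UTILDIR first, then
# the newer ush/ layout, then the older bin/ layout.
copy_util() {
    util=$1
    for dir in "${UTILDIR:-}" "$UTILROOT/ush" "$UTILROOT/bin"; do
        [ -n "$dir" ] || continue          # skip the override when it is unset
        if [ -f "$dir/$util" ]; then
            cp "$dir/$util" . && chmod +x "$util"
            return 0
        fi
    done
    echo "copy_util: $util not found under $UTILROOT" >&2
    return 1
}

# Demo: a scratch UTILROOT that mimics the older bin/ layout.
UTILROOT=$(mktemp -d)
mkdir "$UTILROOT/bin"
for u in err_chk err_exit prep_step postmsg setpdy.sh; do
    printf '#!/bin/sh\n' > "$UTILROOT/bin/$u"
done

# Copy the same five utilities the job script copies, into a scratch workdir.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for u in err_chk err_exit prep_step postmsg setpdy.sh; do
    copy_util "$u" || exit 1
done
ls
```

With a lookup like this the job would work with either prod_util layout, at the cost of hiding which version is actually in use; requiring prod_util v2+ everywhere avoids that ambiguity.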

Here is the request to get a newer prod_util in hpc-stack-gfsv16 on Hera/Orion: NOAA-EMC/hpc-stack#379 (comment)

Gonna need newer prod_util on Hera/Orion/Jet for the non-gfsv16 installs too eventually.

@KateFriedman-NOAA
Member Author

Beyond needing a newer prod_util version on Hera/Orion...there is only one small change I needed to make to support obsproc on Hera/Orion...

This line in scripts/exglobal_makeprepbufr.sh does not work outside of operations, even with prod_util setpdy.sh available:

cdate10=`cut -c7-16 ncepdate`

If I change that line to this:

cdate10=${cdate10:-`cut -c7-16 ncepdate`}

...and set export cdate10=${PDY}${cyc} in the g-w config.prep it works fine.

diff --git a/scripts/exglobal_makeprepbufr.sh b/scripts/exglobal_makeprepbufr.sh
index 7cb82f0..a4224b7 100755
--- a/scripts/exglobal_makeprepbufr.sh
+++ b/scripts/exglobal_makeprepbufr.sh
@@ -74,7 +74,7 @@ if [ "$DO_QC" = 'YES' -a "$CQCBUFR" = 'YES' -a -n "$COM1" -a -n "$CQCC" ]; then
    fi
 fi

-cdate10=`cut -c7-16 ncepdate`
+cdate10=${cdate10:-`cut -c7-16 ncepdate`}

 msg="CENTER TIME FOR PREPBUFR PROCESSING IS $cdate10"
 $DATA/postmsg "$jlogfile" "$msg"
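
The behavior of the override can be checked in isolation with a small sketch like the one below. The wrapper function `resolve_cdate10` and the sample `ncepdate` contents are made up for illustration; the only assumption about the file is that columns 7-16 hold a 10-digit YYYYMMDDHH value, which is what the `cut -c7-16` call implies.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-line change from the diff.
resolve_cdate10() {
    # ${var:-word} expands to $var when it is set and non-empty; only
    # otherwise is the command substitution run and ./ncepdate read.
    cdate10=${cdate10:-$(cut -c7-16 ncepdate)}
    printf '%s\n' "$cdate10"
}

# Scratch directory with a made-up ncepdate file (assumed layout).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'DATE  2022040812\n' > ncepdate

# Case 1: cdate10 is not set beforehand, so the value comes from ncepdate.
unset cdate10
from_file=$(resolve_cdate10)

# Case 2: the workflow config sets cdate10 from PDY and cyc, so the file
# is never consulted.
PDY=20220501 cyc=06
from_env=$(cdate10=${PDY}${cyc}; resolve_cdate10)

echo "from file: $from_file"
echo "from env:  $from_env"
```

Because the expansion uses `:-` rather than `-`, an exported but empty `cdate10` also falls back to `ncepdate`.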

I briefly explored leaving that line in scripts/exglobal_makeprepbufr.sh unchanged and creating ncepdate from g-w instead, but that doesn't seem to be a simple matter, and how ncepdate gets created in ops isn't clear. Adding an override for cdate10 was the simplest way to create an entry point for me. From what I observe in the operational obsproc_prep jobs on WCOSS2, the cdate10 variable doesn't exist prior to that line in scripts/exglobal_makeprepbufr.sh...so I don't think this would break WCOSS2 ops functionality.

Thoughts on this small override change? I had to make zero other changes to support obsproc/v1.0.0 on Hera/Orion via the g-w prep job. @aerorahul

@KateFriedman-NOAA
Member Author

@ShelleyMelchior-NOAA @ilianagenkova @aerorahul Thoughts on this simple override for supporting obsproc outside of operations?

@KateFriedman-NOAA
Member Author

Committed cdate10 override to fork copy of develop-rd @ 531c007. See compare between fork of develop-rd and auth develop here: develop...KateFriedman-NOAA:develop-rd

@KateFriedman-NOAA
Member Author

Have installed a copy of my fork develop-rd obsproc branch on Orion and moved my two current cycled tests (C192C96L127 and C384C192L127) to use it.

Orion-login-1[340] /work/noaa/global/kfriedma/expdir/devv16_192$ grep HOMEobsproc= config.base
#export HOMEobsproc="$BASE_GIT/obsproc/v${obsproc_run_ver}"
export HOMEobsproc=/work/noaa/global/kfriedma/git/obsproc_fork_develop-rd

Orion-login-1[318] /work/noaa/global/kfriedma/expdir/devv16_384$ grep HOMEobsproc= config.base
#export HOMEobsproc="$BASE_GIT/obsproc/v${obsproc_run_ver}"
export HOMEobsproc=/work/noaa/global/kfriedma/git/obsproc_fork_develop-rd

@ilianagenkova
Contributor

@KateFriedman-NOAA, glad you found a solution for cdate10 on Hera/Orion. Changes like this can't be merged to develop without being tested on wcoss. When the time comes who would be doing it? You, @ShelleyMelchior-NOAA and I discussed and agreed to making obsproc work on one development machine (Jet). That is no longer the case for some reason. How many people and how often would they be running obsproc outside of WCOSS? Don't these people have accounts on WCOSS already? What is the bigger picture for this effort that I am not told about?

@KateFriedman-NOAA
Member Author

@ilianagenkova Answers to your questions:

Changes like this can't be merged to develop without being tested on wcoss. When the time comes who would be doing it?

This change will of course be tested on WCOSS2 and I will test it since it is my change. I am already preparing such tests to do on WCOSS2 this week to check my changes do not negatively impact functionality there. This kind of testing (test on production machines and all R&D machines) is part of the testing paradigm for the GFS/global-workflow and should always occur with all changes.

You, @ShelleyMelchior-NOAA and I discussed and agreed to making obsproc work on one development machine (Jet).

I do not recall deciding on Jet. The global-workflow doesn't even fully support Jet (yet) but will do so in the coming months when we complete that port. Jet is a different discussion. For right now we need to support GFS developers on WCOSS2, Hera, and Orion (the tier-1 platforms). The obsproc/prepobs packages already build on those three platforms (in develop), so my current changes are for runtime support on those three platforms and for running outside of the operational environment, which would also occur on WCOSS2.

How many people and how often would they be running obsproc outside of WCOSS?

Many...likely dozens or more...but from the GFS perspective we are only concerned with the obsproc and prepobs packages, none of the rest are needed because of the global dump archive. I already currently support many global users running obsproc on Hera/Orion with copies of the operational OBSPROC packages that I have installed and updated locally to build/run there. These GitHub issues hope to formalize support for those HPCs and bring in the changes that I will have to make locally to these packages on Hera/Orion regardless...we require these packages to cycle the GFS for testing, it can't function without them.

Don't these people have accounts on WCOSS already?

Some do but most don't or won't. WCOSS2 is not meant to be a true development machine or the only one. To prepare GFS upgrades for operations we need multiple HPCs for development because of the size of the GFS (e.g. resources and output) and the number of users performing GFS experiments. The GFS community is large and spread across many HPCs. We define the primary HPCs that we are able to fully support as "tier-1"...which are WCOSS2, Hera, Orion (WCOSS1 is currently included but going away so I do not count it anymore).

What is the bigger picture for this effort that I am not told about?

The big picture is supporting GFS development outside of the operational environment and on multiple tier-1 HPCs. We have already been doing this in terms of OBSPROC for many years, but not formally/officially, due to a few factors of which the build system was the main one. The move to use hpc-stack and cmake made the obsproc/prepobs build significantly more portable. With a few build changes that I sent back and which were accepted into the develop branches already, you guys are already supporting Hera/Orion in your repos. The last bit is the runtime changes I am now proposing, which will be minimal and should not impact WCOSS2/ops functionality.

If the concern is support moving forward:

  1. The changes I bring back now should create support outside of the operational environment for a long time and likely not require additional changes until we potentially obtain a new R&D HPC that isn't PBSpro (WCOSS2) or SLURM (Hera/Orion). This is a big "if" though since thus far the GFS can run obsproc/prepobs with BACK=YES in a manner that is agnostic to the scheduler type (see the changes I propose in the prepobs repo). The small cdate10 change in the obsproc repo is the only change required and, unless obsproc is completely rewritten down the road, likely to be the only change for a long time.
  2. EIB and I can manage needed changes outside of the operational environment; this should not create additional work for the OBSPROC team beyond reviewing incoming PRs.

At the end of the day, WCOSS2 (production) functionality and support are the main goal of both the OBSPROC and GFS/global-workflow teams...so we are on the same page there. The GFS/global-workflow team, however, also has an equal responsibility to support development outside of operations. The small changes I have proposed thus far allow that support. GFS upgrades would not happen if we only supported cycled development and testing on one platform.

I'm happy to continue discussing concerns and questions in a group meeting. Thanks! :)

@ShelleyMelchior-NOAA
Contributor

To summarize, how I am understanding things ... in the context of @KateFriedman-NOAA 's immediate needs, she is working within her fork and she (EIB) will do testing on all platforms, R&D + ops, to validate that the proposed changes work as intended in all environments, with no detriment to ops. Once EIB does the vetting, Kate will issue a PR from her fork into develop-rd. obsproc and prepobs code managers will need to review the PRs. There should be minimal burden on the obsproc team for R&D support beyond PR reviews/approvals.

Emphasis from Kate that EIB does the R&D heavy lifting and ops testing is making me feel better about this.

@ShelleyMelchior-NOAA
Contributor

One last point, develop-rd serves as a way point back into develop. Anything that ever finds its way into develop-rd is considered gold and will not impact ops and can safely reside in develop, too.

ilianagenkova pushed a commit that referenced this issue May 5, 2022
- allows entry for global-workflow to pass in cycle value when
outside of operational environment
- global-workflow config.prep will set cdate10 using $PDY$cyc
- tested and works on Orion

Refs: #34
ilianagenkova added a commit that referenced this issue Jul 5, 2022
* Add override into cdate10 setting exglobal_makeprepbufr.sh (#37)

- allows entry for global-workflow to pass in cycle value when
outside of operational environment
- global-workflow config.prep will set cdate10 using $PDY$cyc
- tested and works on Orion

Refs: #34

* Develop sync and tcvitals bug fix updates for obsproc (#43)

* Set obsproc_ver=v1.0 and HOMEgfs=$COMROOT/gfs/v16.2 (#38)
* Set obsproc+ver=v1.0 and HOMEgfs=$COMROOT/gfs/v16.2
* Introduce 3-digit obsproc_ver_pckg and use with $PACKAGEROOT
* Added section to cd to logfile output dir prior to process submission.
* Squashed commit of the following:

commit 21d7d9a
Author: Shelley Melchior <[email protected]>
Date:   Thu Jun 2 13:30:52 2022 +0000

    Squashed commit of the following:

    commit daff5af
    Author: Shelley Melchior <[email protected]>
    Date:   Fri May 27 18:05:05 2022 +0000

        Added section to cd to logfile output dir prior to process submission.

    commit 30c01f9
    Author: iliana Genkova <[email protected]>
    Date:   Tue May 24 11:37:06 2022 -0500

        Set obsproc_ver=v1.0 and HOMEgfs=$COMROOT/gfs/v16.2 (#38)

        * Set obsproc+ver=v1.0 and HOMEgfs=$COMROOT/gfs/v16.2

        * Introduce 3-digit obsproc_ver_pckg and use with $PACKAGEROOT

    commit ebb6c68
    Author: Cory Martin <[email protected]>
    Date:   Tue Apr 12 15:15:47 2022 -0400

        Changes needed to build on Hera (#32)

    commit 0075564
    Author: Shelley Melchior <[email protected]>
    Date:   Sat Apr 9 17:07:10 2022 -0400

        Incorporating changes made by NCO SPA following: (#35)

        restart of cactus and
        changing output data directory for mods

    commit 59596a9
    Author: Shelley Melchior <[email protected]>
    Date:   Thu Apr 7 21:27:40 2022 +0000

        Removing top level README.md. This file now resides in docs/.

    commit 735f009
    Merge: bd4211a 8f28c00
    Author: Iliana Genkova <[email protected]>
    Date:   Thu Apr 7 20:51:34 2022 +0000

        Merge branch 'release/obsproc.v1.0.0' into develop

    commit 8f28c00
    Merge: f01f691 bd4211a
    Author: Iliana Genkova <[email protected]>
    Date:   Thu Apr 7 20:49:01 2022 +0000

        Removed build.sh (now in /build-obsproc) and
        ush/prepobs_makeprepbufr.sh(now in module prepobs)
        Merge branch 'develop' into release/obsproc.v1.0.0

    commit bd4211a
    Author: iliana Genkova <[email protected]>
    Date:   Mon Jan 3 19:23:04 2022 -0600

        Adopt Jack Woolen's (CFS) faster prepobs_makeprepbufr.sh: (#19)

        * Adopt Jack Woolen's (CFS) faster prepobs_makeprepbufr.sh:
        -use cfp with mpiexe -introduce FORT5 -move ksh interpreter

        * Clean up interpreter

    commit 309371d
    Author: Shelley Melchior <[email protected]>
    Date:   Mon Jan 3 18:25:21 2022 -0500

        created README.md

        Created README.md to better explain installation instructions, modified to adopt NCO's working practices.

    commit c5342eb
    Author: Rahul Mahajan <[email protected]>
    Date:   Mon Jan 3 17:57:29 2022 -0500

        adopt for current NCO working practices per discussion w/ StevenEarle et al (#18)

commit e63a78c
Author: Shelley Melchior <[email protected]>
Date:   Wed Jun 1 16:21:51 2022 +0000

    Updated to correctly locate tcvitals file.

commit 0bbb542
Author: Shelley Melchior <[email protected]>
Date:   Fri Apr 8 20:31:15 2022 +0000

    Incorporating changes made by NCO SPA following:
    restart of cactus and
    changing output data directory for mods

Co-authored-by: iliana Genkova <[email protected]>
Co-authored-by: Shelley Melchior <[email protected]>

* Update prepobs_ver to 1.0.1

Co-authored-by: iliana Genkova <[email protected]>
Co-authored-by: Shelley Melchior <[email protected]>