Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add/fix build capability for Gaea-C5, Gaea-C6, and container #87

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

DavidBurrows-NCO
Copy link
Contributor

Description

After the recent Gaea-C5 OS upgrade, gfs_utils fails to build.
This corrects Gaea-C5 build, adds Gaea-C6 build capability (following ufs-wx-model 2448), and adds containerized build capability.

Refs NOAA-EMC/global-workflow 3011
Refs NOAA-EMC/global-workflow 3025
Resolve #86

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

Cloned and built on Gaea-C5, Gaea-C6, and container

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

ush/module-setup.sh Show resolved Hide resolved
@@ -25,6 +25,7 @@ endif()

# User options.
option(OPENMP "use OpenMP threading" ON)
option(LINK_W3EMC "Link w3emc library" ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the purpose of this new option. Shouldn't this build in the same manner on all platforms? Have the other platforms been tested with this change?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is required for building with spack-stack-1.8.0. In spack-stack-1.8.0, the skgb function has been added to the g2 library and will throw an error for multiple definitions (it is also in w3emc). Once gfs_utils gets updated to spack-stack-1.8.0 everywhere, the option can be removed.

@@ -0,0 +1,22 @@
help([[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this module file?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a version-less replacement for gfsutils_common.lua that is used when building with a container. We can add in the version numbers, if you want, but leaving them out allows more flexibility to update the container without having to update module files in all repos. I understand that having specific version numbers is desirable for testing and reproducibility, but allowing flexibility in them for the container case improves portability and extensibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it improves portability and extensibility, but they come at the cost of reproducibility, reliability, and maintainability. Is there a way to have minimum versions in lua modulefiles? In cmake one can provide a minimum version.
The gfsutils_common.lua provides a mechanism for the user to provide a specific version. See here. If the container has a different version, it can set those version numbers and still be able to leverage gfsutils_common.lua. I think if default is provided, it would be similar to this file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch it around so that the container just uses the gfs_common.lua module file along with environment variables to specify the versions available to the container.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. Thanks @mark-a-potts

load(pathJoin("stack-intel-oneapi-mpi", stack_impi_ver))
load(pathJoin("cmake", cmake_ver))

load("gfsutils_generic")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the container not load the gfsutils_common.lua modulefile?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version numbers are not the same in the container, since it is based on spack-stack 1.8.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Thanks for explaining.
@DavidBurrows-NCO Please create an issue to replace this and consolidate w/ gfsutils_common.lua and remove this file when spack-stack is upgraded to 1.8.0 so that we can revisit it then.
We can accept this for now, but please we cannot afford to maintain multiple copies of modulefiles for different spack-stack versions. That would be unmanageable for maintenance teams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aerorahul @mark-a-potts Issue #88 created

Copy link
Contributor

@aerorahul aerorahul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidBurrows-NCO can you please address some questions from the review?

@@ -3,15 +3,15 @@ help([[
on the NOAA RDHPC machine Gaea C5 using Intel-2023.1.0.
]])

whatis([===[Loads libraries needed for building the UFS Weather Model on Gaea ]===])
whatis([===[Loads libraries needed for building the UFS Weather Model on Gaea C5]===])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
whatis([===[Loads libraries needed for building the UFS Weather Model on Gaea C5]===])
whatis([===[Loads libraries needed for building the GFS utilities Gaea C5]===])

on the NOAA RDHPC machine Gaea C6 using Intel-2023.1.0.
]])

whatis([===[Loads libraries needed for building the UFS Weather Model on Gaea C6 ]===])
Copy link
Contributor

@aerorahul aerorahul Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
whatis([===[Loads libraries needed for building the UFS Weather Model on Gaea C6 ]===])
whatis([===[Loads libraries needed for building the GFS utilities on Gaea C6 ]===])

Comment on lines 17 to 18
stack_python_ver=os.getenv("stack_python_ver") or "3.11.6"
load(pathJoin("stack-python", stack_python_ver))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is using python in this repo?

Comment on lines 8 to 9
prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/gsi-addon/install/modulefiles/Core")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the duplicate line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice its not duplicate. Why are both unified-env and gsi-addon envs needed to be loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aerorahul unified-env has bufr/12.0.1, and gsi-addon has bufr/11.7.0 needed by gfs-utils. Initially when I ported to C5, I had to have both MODULEPATHs. I just tested C5 with only gsi-addon, and the build was successful. I haven't had access to C6 today so will test that build when I can.

load(pathJoin("stack-intel", stack_intel_ver))

stack_cray_mpich_ver=os.getenv("stack_cray_mpich_ver") or "8.1.25"
stack_cray_mpich_ver=os.getenv("stack_cray_mpich_ver") or "8.1.28"
load(pathJoin("stack-cray-mpich", stack_cray_mpich_ver))

stack_python_ver=os.getenv("stack_python_ver") or "3.11.6"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is using python in this repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aerorahul I have removed the python load and retested the C5 build successfully. I haven't had access to C6 today so will test that build when I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gfs_utils fails to build on Gaea-C5 after OS upgrade
4 participants