Conference call notes 20210804

Kenneth Hoste edited this page Aug 4, 2021 · 8 revisions

Notes on the 178th EasyBuild conference call, Wednesday Aug 4th 2021 (08:00 UTC)

Attendees

Alphabetical list of attendees (9):

  • Sebastian Achilles (Jülich Supercomputing Centre, Germany)
  • Simon Branford (Univ. of Birmingham, UK)
  • Alexander Grund (TU Dresden, Germany)
  • Victor Holanda Rusu (CSCS, Switzerland)
  • Kenneth Hoste (HPC-UGent, Belgium)
  • Kurt Lust (Univ. of Antwerp, Belgium + LUMI User Support Team)
  • Alan O'Cais (Jülich Supercomputing Centre, Germany)
  • Mikael Öhman (Chalmers University of Technology, Sweden)
  • Jörg Saßmannshausen (NIHR Biomedical Research Centre, UK)

Agenda

  • overview of recent developments
  • AlphaFold
  • 2021b common toolchains
  • Q&A

Recent developments

  • release timeline
    • last release: EasyBuild v4.4.1 (July 6th)
    • ETA next release: by end of August (?)
  • recent changes
    • framework
      • bug fixes
        • fix verify_imports by deleting all imported modules before re-importing them one by one (PR #3780)
          • related to including multiple easyblocks from a PR via --include-easyblocks-from-pr and inheritance
      • enhancements
        • (none)
      • changes
        • (none)
    • easyblocks
      • bug fixes
        • handle failure of running nvidia-smi in TensorFlow tests (PR #2506)
        • honor --ignore-test-failure in PythonPackage (PR #2516)
        • fix sanity-check and debug builds of TensorFlow (PR #2522)
        • correct cleanup of single-letter local variable in __init__ of easybuild.easyblocks (PR #2524)
      • enhancements
        • ...
      • new easyblocks
        • ...
      • changes
        • ...
    • easyconfigs
      • ~80 easyconfig PRs merged since last conf call!
      • bug fixes
        • (nothing major)
      • enhancements
        • (nothing major)
      • new software
        • (nothing major)
      • noteworthy software updates
  • to merge/fix/tackle soon
    • framework
      • reported bugs / bug fixes
        • make logdir writable also when --stop/--fetch is given (PR #3771)
        • don't parse patch files as EasyConfigs when searching for patch usage (PR #3786)
        • include easyblocks from multiple PRs at the same time (PR #3792)
        • no static libraries available for FlexiBLAS (PR #3783)
          • problem with GROMACS, see issue #2521
          • see issue in FlexiBLAS repo w.r.t. static linking
          • we should let framework define a list of shared libs, and then update easyblocks to use them
          • should we check whether libraries exist before including them in $BLAS_STATIC_LIBS?
        • find_software_name_for_patch can fail when non-UTF-8 files exist (PR #3781)
        • add optimal optimization flags for Intel compilers on AMD CPUs (PR #3793)
          • -xHost results in only using SSE2 on AMD systems...
        • broken download for PyTorch 1.9.0 because of v1.9.0 branch that was shadowing v1.9.0 tag (see upstream issue)
          • does this point out a bug in the way we download sources from a Git repo?
      • enhancements
        • avoid using a priority in prepend_module_path (Lmod) to avoid costly module calls (PR #3636)
        • add support for installing extensions in parallel (WIP) (PR #3667)
          • needs more testing + a dedicated unit test
          • should be marked experimental at first?
        • finding modules with multiple modulepaths and HMNS (issue #3703)
        • check for recursive symlinks by default before copying a folder (PR #3784)
        • add easybuild.tools.LooseVersion (PR #3794)
          • required because distutils, which provides LooseVersion, is deprecated as of Python 3.10 and scheduled for removal in Python 3.12 (PEP 632)
      • changes
        • fully drop support code for Python < 2.7 (PR #3788)
        • test suite improvements (PR #3790)
    • easyblocks
      • reported bugs / bug fixes
        • explicitly use only OpenBLAS for PyTorch if MKL is not used (PR #2448)
        • clean up installation of Tkinter (PR #2509)
      • enhancements
      • changes
        • (nothing major)
      • new software
    • easyconfigs
      • bug fixes
        • improve check for multi-variant dependencies per generation of easyconfigs (PR #12687)
      • enhancements
        • various PRs for easyconfigs with GCCcore 11.2.0 as toolchain (prep for 2021b common toolchains)
      • new software
        • (nothing major?)
      • software updates
        • SciPy-bundle with intel/2021a (PR #12964)
          • need to look into handful of failing tests...
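
For the easybuild.tools.LooseVersion item above (PR #3794), the idea of a distutils-independent "loose" version comparison can be sketched as follows. This is only an illustration of the technique, not the code from that PR: the class name SimpleLooseVersion is hypothetical, and the splitting rule (runs of digits, runs of lowercase letters, dots as separators) is an assumption modelled on how distutils behaves.

```python
import re
from functools import total_ordering


@total_ordering
class SimpleLooseVersion:
    """Hypothetical, minimal stand-in for a loose version comparison."""

    # split into runs of digits, runs of lowercase letters, or dots
    component_re = re.compile(r'(\d+|[a-z]+|\.)')

    def __init__(self, vstring):
        self.vstring = vstring
        # drop empty strings and the '.' separators produced by the split
        parts = [p for p in self.component_re.split(vstring) if p and p != '.']
        # numeric components are compared as integers, so 10 > 4
        self.version = [int(p) if p.isdigit() else p for p in parts]

    def __eq__(self, other):
        return self.version == other.version

    def __lt__(self, other):
        # note: comparing a numeric component with a string component
        # (e.g. '1.0' vs '1.a') raises TypeError in Python 3
        return self.version < other.version

    def __repr__(self):
        return "SimpleLooseVersion(%r)" % self.vstring
```

With this sketch, SimpleLooseVersion('4.4.1') < SimpleLooseVersion('4.10.0') holds because the components are compared as integers rather than as strings.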

Common toolchains

2021a foss+CUDA

  • some libraries on top of foss+CUDA: PR #13282

2021b

  • base will probably be GCC 11.2
  • OpenBLAS 0.3.17 tests fail on Intel systems when built with GCC 11.2
    • worth trying without -ftree-vectorize?
  • there is also a PR for Perl with GCC 11.2, which splits it up into Perl-bare and Perl
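
One way to try the OpenBLAS build without auto-vectorization would be through the toolchain options in the easyconfig. The fragment below is a hypothetical sketch, not an existing easyconfig: all other parameters are omitted, and it assumes that setting the 'vectorize' toolchain option to False makes EasyBuild pass -fno-tree-vectorize to GCC.

```python
# fragment of a hypothetical OpenBLAS easyconfig (other parameters omitted)
name = 'OpenBLAS'
version = '0.3.17'

toolchain = {'name': 'GCC', 'version': '11.2.0'}

# assumption: with 'vectorize' disabled, -fno-tree-vectorize is used
# instead of -ftree-vectorize when compiling with GCC
toolchainopts = {'vectorize': False}
```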

AlphaFold

  • Status of the AlphaFold easyconfig (EC) files:
  • The following ECs are already available:
    • cuDNN-8.0.5.39-CUDA-11.1.1.eb
    • Python-3.8.6-GCCcore-10.2.0.eb
    • TensorFlow-2.5.0-foss-2020b.eb/TensorFlow-2.5.0-fosscuda-2020b.eb
    • SciPy-bundle-2020.11-foss-2020b.eb/SciPy-bundle-2020.11-fosscuda-2020b.eb
  • The following ECs were created and uploaded:
    • OpenMM-7.5.1-fosscuda-2020b.eb/OpenMM-7.5.1-foss-2020b.eb (PR #13452)
    • pdbfixer-1.7-GCCcore-10.2.0.eb (PR #13464)
    • hhsuite-3.3.0-foss-2020b.eb (PR #13459)
    • kalign-2.04-GCCcore-10.2.0.eb/kalign-3.3.1-GCCcore-10.2.0.eb (PR #13463)
    • Biopython-1.78-foss-2020b-Python-3.8.6.eb (PR #13574)
  • These ECs for Python packages are working (PR #13571):
    • opt-einsum-3.3.0-foss-2020b.eb
    • immutabledict-2.1.0-1.1.0-foss-2020b.eb
    • dm-tree-0.1.5-foss-2020b.eb
    • fastcache-1.1.0-foss-2020b.eb
    • absl-py-0.13.0-foss-2020b.eb
  • These ECs for Python packages are currently not working (PR #13572):
    • ml-collection--foss-2020b.eb
    • jax-0.1.55-foss-2020b.eb
    • jaxlib-0.1.69-foss-2020b.eb
    • chex-0.0.8-foss-2020b.eb
    • dm-haiku-0.0.3-foss-2020b.eb
  • Basically, this procedure is being followed:
  • What about the docker required dependency?
    • Only needed for a particular script, may be fine to ignore
    • Looks like just Python bindings to Docker, could be an easy install?

Q&A

  • Victor: generoso needs to be updated

    • install new cluster and migrate, or update in place?
    • Kenneth: this deserves a separate meeting...
  • Again over 500 open easyconfig PRs... Ideas for changes we can/should make to keep up a bit better?

    • Coordinated pre-release merge sprints?
    • More aggressively closing old/stale PRs?
      • Automatically close PRs that have no activity for > 6 months or 1 year (unless a do-not-close label is present)?
      • Could use stale GitHub Action for this: https://github.com/actions/stale
  • Alan: PR merged for MPICH easyconfig

    • toolchain that includes MPICH?
      • fossm? fossmpich?
    • do we want to start adding easyconfigs for a toolchain like this too?
      • good way to communicate that others are using this toolchain, and easyconfigs are working
    • MPICH can only be built on top of either UCX or libfabric, not both (unclear why...)
    • Kurt: Clang-based toolchain would also be interesting, since most vendor compilers are based on Clang (AMD, Cray, Intel)
    • Kenneth: would a Clang+MPICH based toolchain be the best way forward then? seems like uncharted territory (and unclear how to deal with Fortran)
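
The stale-PR idea raised above (close PRs without activity for more than 6 months, unless a do-not-close label is present) can be sketched as a small selection rule. This only illustrates the policy and does not use the stale GitHub Action or the GitHub API: the function name is_stale, the 182-day threshold, and the example PR records are all hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(last_activity, labels, now,
             max_idle=timedelta(days=182), exempt_label='do-not-close'):
    """Return True if a PR would be closed under the proposed policy."""
    # PRs carrying the exemption label are never closed automatically
    if exempt_label in labels:
        return False
    # otherwise, close when there was no activity for longer than max_idle
    return now - last_activity > max_idle


# made-up example data, evaluated on the date of this call
now = datetime(2021, 8, 4)
prs = [
    {'number': 1, 'last_activity': datetime(2020, 12, 1), 'labels': []},
    {'number': 2, 'last_activity': datetime(2020, 12, 1), 'labels': ['do-not-close']},
    {'number': 3, 'last_activity': datetime(2021, 7, 1), 'labels': []},
]
stale = [pr['number'] for pr in prs
         if is_stale(pr['last_activity'], pr['labels'], now)]
```

In this example only the first PR is selected: the second one is exempt because of its label, and the third one had recent activity.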