
Conference call notes 20210217

Kenneth Hoste edited this page Mar 3, 2021 · 5 revisions

Notes on the 167th EasyBuild conference call, Wednesday February 17th 2021 (09:00 UTC)

Attendees

Alphabetical list of attendees (12):

  • Simon Branford (University of Birmingham, UK)
  • Miguel Dias Costa (National University of Singapore)
  • Alexander Grund (TU Dresden, Germany)
  • Kenneth Hoste (HPC-UGent, Belgium)
  • Adam Huffman (Big Data Institute, Oxford, UK)
  • Terje Kvernes (University of Oslo, Norway)
  • Kurt Lust (Univ. of Antwerp, Belgium + LUMI User Support Team)
  • Robert Mijakovic (LuxProvide)
  • Mikael Öhman (Chalmers University of Technology, Sweden)
  • Alan O'Cais (JSC, Germany)
  • Jurij Pečar (EMBL, Germany)
  • Jörg Saßmannshausen (NIHR Biomedical Research Centre, UK)

Agenda

  • PyTorch benchmarking results (conda vs EasyBuild) at EMBL
  • update on recent developments + outlook to next release
  • Q&A

PyTorch benchmarking results (conda vs EasyBuild) at EMBL

  • https://github.com/constantinpape/3d-unet-benchmarks#embl-cluster-results
  • impressive speedups for PyTorch 1.7.1 installed with EasyBuild compared to conda installations
    • 4x-6x faster!
    • across a range of GPUs (1080 Ti, V100, A100, ...)
    • smaller speedup for 1080 Ti (no Tensor cores)
  • is input data available so others can try to reproduce these results?
  • Alexander: has startup overhead been taken into account?
  • Does conda have binaries for the targeted GPUs?
    • if only PTX (intermediate format) is included, that would imply a large overhead for JIT compilation
  • Mikael: workload could be relying heavily on CPU, so conda installing generic binaries could have a big impact
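
The question of whether an installation ships binaries for the targeted GPUs can be checked mechanically. The sketch below is a minimal illustration, not part of the benchmark: `classify_gpu_support` is a hypothetical helper, and it assumes architecture entries of the form `sm_XY` (native binary code) and `compute_XY` (PTX) with plain numeric suffixes. It relies on two CUDA rules: binary code built for compute capability X.y also runs on devices X.z with z >= y, and PTX can be JIT-compiled for any device of equal or higher compute capability.

```python
def classify_gpu_support(arch_list, capability):
    """Classify how a build covers a GPU with the given (major, minor) capability.

    arch_list:  entries such as 'sm_70' (binary code) or 'compute_70' (PTX)
    capability: tuple (major, minor), e.g. (8, 0) for an A100
    """
    major, minor = capability
    target = major * 10 + minor

    binary_archs = set()
    ptx_archs = set()
    for entry in arch_list:
        kind, _, number = entry.partition("_")
        if kind == "sm":
            binary_archs.add(int(number))
        elif kind == "compute":
            ptx_archs.add(int(number))

    if target in binary_archs:
        return "native"
    # binary code for X.y also runs on X.z when z >= y (same major version)
    if any(arch // 10 == major and arch <= target for arch in binary_archs):
        return "binary-compatible"
    # PTX can be JIT-compiled at startup for an equal or newer compute capability
    if any(arch <= target for arch in ptx_archs):
        return "ptx-jit"
    return "unsupported"


# A build with binaries for 7.0 plus PTX for 7.0, used on an A100 (8.0)
print(classify_gpu_support(["sm_70", "compute_70"], (8, 0)))
```

With PyTorch, the two inputs can be obtained from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`. A `ptx-jit` result is the case where the JIT overhead raised on the call would apply.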

Recent developments

  • next release (v4.3.3): before EUM'21? during EUM week? (still) hopefully next week...
  • recent changes
    • framework
      • bug fixes
        • avoid initializing Toolchain instance when taking into account toolchain dependencies for templates (PR #3560)
        • Create the lib64 symlink as a relative symlink (PR #3566)
        • Don't reuse variable name in the loop to fix adding extra compiler flags via toolchainopts (PR #3571)
      • enhancements
      • changes
        • (none)
    • easyblocks
      • bug fixes
        • pass down compilation flags from build environment for ESMF (PR #2325)
      • enhancements
        • add support for skipping steps in Python packages installed as extension + print progress on individual steps for installing Python packages as extensions (PR #2290)
        • updates for TensorFlow easyblock w.r.t. optional feature support & running CPU/GPU tests (PR #2314, PR #2312)
        • update OpenFOAM easyblock for v2012 (PR #2321)
      • new software
        • (none)
      • changes
        • (none)
    • easyconfigs
      • ~50 merged easyconfig PRs since last conf call!
      • bug fixes
        • Remove duplicate extensions in R 3.5.x easyconfigs, and add test to detect such issues (PR #12059)
        • add gnuplot dependency for OpenFOAM from v2.4.0 to v6 (PR #11801)
        • fix source URL in UDUNITS easyconfigs + add sources.easybuild.io fallback source URL (PR #12156)
        • add additional patches for PyTorch 1.7.1 to fix failing tests (PR #12147)
      • enhancements
        • Add Tensorboard profile plugin to TensorFlow 2.2.0 + 2.3.1 easyconfigs (PR #12136, PR #12137)
      • new software
      • software updates
      • changes
        • Update Bison (build) dependency for flex built with system compiler to v3.5.3 (PR #12111)
        • move make 4.3 easyconfigs to GCCcore toolchain (PR #12166)
        • move most recent BLIS and libFLAME easyconfigs from GCC to GCCcore (PR #12168)
  • to merge/fix/tackle soon
    • framework (v4.3.3 milestone)
      • bug fixes
        • (BLOCKER for v4.3.3) Fix UTF-8 encoding errors when running EasyBuild with Python 3.0.x-3.6.x (PR #3565)
        • Performance improvements for easyconfig parsing (PR #3555)
        • Avoid module use in Lmod if possible to allow faster execution (PR #3557)
        • Avoid metadata greedy behaviour (mostly relevant for Cray systems) (PR #3559)
      • enhancements
        • (NICE TO HAVE for v4.3.3) add support for using customized HTTP headers in download_file (PR #2472)
          • important for ITER.org (see EUM'21 talk)
        • support additional features in easystack files
        • Allow use of alternate envvar(s) to HOME for user modules (PR #3558)
    • easyblocks (v4.3.3 milestone)
      • bug fixes
        • correctly determine path to active binutils in TensorFlow easyblock (PR #2218) [EESSI, RPATH]
        • fix taking into account --sysroot when installing/using CMake [EESSI] (PR #2247 or PR #2248, latter is best option?)
        • treat files/directories of unpacked sources equally in PackedBinary generic easyblock (PR #2306)
        • make sure the installation of libiberty.a in the binutils easyblock goes into a populated directory (PR #2308)
        • Fix for building GCC with --sysroot on ppc64le (PR #2315)
        • RPackages: Change to using R_LIBS_SITE when installing RPackages (PR #2326)
        • added regex to replace /lib/cpp with cpp in OpenFOAM's wmake rules file (PR #2331)
      • enhancements
        • enhance OpenBLAS easyblock to make it aware of optarch (PR #1946)
        • (NICE TO HAVE for v4.3.3) update impi easyblock for impi 2021.x (oneAPI) (PR #2313)
        • enhance test and install step of CMakePythonPackage easyblock (PR #2318)
        • pass $CXXFLAGS to PDT's configure script via '-useropt' (PR #2324)
        • Add sanity check commands to GCC (including LTO support) (PR #2322)
        • Add support for including PTX code in PyTorch (PR #2328)
      • changes
        • (nothing)
      • new software
        • add custom easyblock for:
          • ADIOS (PR #2070)
          • JAX (PR #2262)
          • RELION (PR #2274)
            • does adding an EasyBuild configuration option to specify the system job scheduler used make sense?
            • yes, if it can help easyblocks to make proper decisions
            • can also be useful for OpenMPI to opt-in to Slurm integration
        • (NICE TO HAVE for v4.3.3) new generic easyblock for Intel oneAPI compilers (PR #2305)
          • using intel-compilers as software name, including icc/icpc/ifort (classic compilers) + icx/icpx/ifx/dpcpp (oneAPI compilers)
    • easyconfigs (v4.3.3 milestone)
      • bug fixes
        • fix libxml2 dep for Clang (part of PR #12013)
        • p4est-2.2: Add patches to fix test problems and use a source_url that contains a complete tar-file (PR #12028)
      • enhancements
        • (nothing major?)
      • new software
        • (NICE TO HAVE FOR v4.3.3) Intel oneAPI
          • there will be no further updates to Intel Parallel Studio (apparently)
          • intel-compilers v2020.1 (PR #11982)
      • software updates
        • (NICE TO HAVE FOR v4.3.3) impi v2021.1 (PR #12026)
        • PyTorch 1.7.1 with fosscuda/2020b (PR #12003)
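
As an illustration of the relative `lib64` symlink fix listed under the framework bug fixes (PR #3566), the sketch below shows why a relative link is preferable: it keeps resolving after the installation prefix is moved. This is not the EasyBuild framework code; `symlink_relative` and `demo` are hypothetical names, and the example assumes a POSIX system where unprivileged symlinks are allowed.

```python
import os
import shutil
import tempfile


def symlink_relative(target, link_path):
    """Create link_path as a symlink to target, using a relative path."""
    rel_target = os.path.relpath(target, start=os.path.dirname(link_path))
    os.symlink(rel_target, link_path)
    return rel_target


def demo():
    """Create <prefix>/lib64 -> lib, move the prefix, check the link still resolves."""
    base = tempfile.mkdtemp()
    try:
        prefix = os.path.join(base, "prefix")
        os.makedirs(os.path.join(prefix, "lib"))
        link_value = symlink_relative(os.path.join(prefix, "lib"),
                                      os.path.join(prefix, "lib64"))

        # relocate the whole installation directory
        moved = os.path.join(base, "moved")
        os.rename(prefix, moved)

        # os.path.isdir follows symlinks, so this checks that lib64 still points at lib
        still_valid = os.path.isdir(os.path.join(moved, "lib64"))
        return link_value, still_valid
    finally:
        shutil.rmtree(base)


print(demo())
```

An absolute link would still point at the old prefix after the move, and would therefore be broken.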

Q&A

  • Mikael: has anybody played with MLPerf?
  • Robert: PRs coming in, will start slowly
  • Jörg: looking into installing old software (EPACT), have to use foss/2016b
    • need to get latest R installed as well
    • is it worth the effort for contributing this back?
    • toolchainopts = {'cstd': 'c++98'}, maybe also disable -Werror?
  • BLIS evaluation follow-up call tomorrow (Thu Feb 18th 2021 at 2.15 CET)
  • Mikael: does BLIS at GCCcore imply we can move more stuff to GCCcore level?
  • Mikael: NCCL is actually open-source?
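
The suggestion made for Jörg's EPACT installation could look roughly like the easyconfig fragment below. It is a sketch only: it shows just the toolchain-related parameters mentioned in the notes, all other easyconfig parameters (version, sources, and so on) are deliberately left out, and how best to disable -Werror for EPACT was not settled on the call.

```python
# Fragment of a hypothetical EPACT easyconfig (easyconfigs use Python syntax).
# Only the parameters discussed on the call are shown; everything else is omitted.
name = 'EPACT'

# old toolchain that the software reportedly has to be built with
toolchain = {'name': 'foss', 'version': '2016b'}

# request the C++98 standard via the 'cstd' toolchain option
toolchainopts = {'cstd': 'c++98'}
```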