Fujitsu toolchain TODO list #704

Open
28 of 36 tasks
migueldiascosta opened this issue Jun 1, 2021 · 0 comments
migueldiascosta commented Jun 1, 2021

Following up from #701

update June 2021: hoping for access to Deucalion's A64FX partition when ready, to see if the current draft implementation is sufficiently generic or not

update June 2024: access to Deucalion's A64FX partition confirms that the current draft implementation is not sufficiently generic

Framework

  • since we can't rely on the permanence and uniformity of fujitsu language environment modules, shall we use an environment variable to pass the location to the compiler? use a metadata file? use which to find it? (draft at migueldiascosta/easybuild-framework@b21987e)
  • HierarchicalMNS support: there isn't an mpi module, it's included in FCC (but we could create a fake module to keep the same levels...)
  • add etc/fujitsu_external_modules_metadata.cfg?
    • for now, it would be simply [lang/tcsds-1.2.31]\nname = lang\nversion = tcsds-1.2.31\nprefix = FJSVXTCLANGA
  • -SSL2* and -SCALAPACK should be used only when linking, but EasyBuild prepends -L to LDFLAGS variables, so they are currently in the compile flags (not a problem, but this generates warnings that pollute the logs) (it may be useful to have way of injecting direct flags into $LDFLAGS easybuild-framework#3700)
    • also, right now -SSL2* flags are being duplicated, probably being set by both _set_blas_variables and _set_lapack_variables, should be easy to fix
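
The metadata entry proposed above can be sketched as a small INI file. The snippet below is only an illustration: it uses Python's standard `configparser` as a stand-in (EasyBuild parses this file with its own machinery), and the helper name `read_external_module_metadata` is hypothetical.

```python
import configparser

# Contents proposed above for etc/fujitsu_external_modules_metadata.cfg.
METADATA = """\
[lang/tcsds-1.2.31]
name = lang
version = tcsds-1.2.31
prefix = FJSVXTCLANGA
"""


def read_external_module_metadata(text):
    """Return {module name: {key: value}} for an INI-style metadata string.

    Hypothetical helper for illustration, not part of EasyBuild.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    for module, fields in read_external_module_metadata(METADATA).items():
        print(module, fields)
```

Here `prefix` holds the name of the environment variable set by the Fugaku lang module rather than a literal path, matching the draft entry above.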

Easyblocks

Easyconfigs

  • revert to 21.05 in FCC and ffmpi easyconfigs instead of 4.5.0, since it seems we won't be able to pin the compiler version?
  • binutils, Perl: use osdep for zlib to avoid warnings from Fugaku's large page allocation feature, PR upcoming
  • HDF5: PR upcoming
    • hidden symbol `__fixunstfsi' in /usr/lib/gcc/aarch64-redhat-linux/8/libgcc.a(fixunstfsi.o) is referenced by DSO
      • we had added --rtlib=compiler-rt to $LDFLAGS in M4 because of https://bugs.llvm.org/show_bug.cgi?id=16404, we need to do the same here (this will likely pop up again...)
      • actually --rtlib=compiler-rt -lgcc_s, because we still need other symbols from libgcc (e.g. unwind)
  • CMake: PR upcoming
    • linker flags and FindMPI patches
    • installing with RPATH fails, apparently related to the static library libstdc++fs.a
  • LLVM: PR upcoming
  • Python: PR upcoming
    • Unzip Makefile has hardcoded CC=cc, needs CC="$CC" buildopt (added to all other UnZip easyconfigs in override Makefile with hardcoded CC=cc in UnZip easyconfigs easybuild-easyconfigs#12887)
    • Rust: fails with thread 'main' panicked at 'couldn't find required command: "far"', src/bootstrap/sanity.rs:60:13; problem seems to be when "finding compilers"
      • cc_detect tries to infer ar command name from fcc and comes up with far, but only if AR environment variable is not set, so setting it in prebuildopts
      • fails much later with clang-7: error: unable to execute command: Killed; clang-7: error: clang frontend command failed due to signal...
        • this happens when Rust is building its own LLVM
        • (which is not honouring EB's parallel; needs prebuildopts += "export LLVM_PARALLEL_COMPILE_JOBS=%(parallel)s && " and Ninja, which can be added as a builddep if it is modified to use Python -bare as its builddep)
          [x] - make Rust use EB LLVM 12, after moving LLVM's python builddep to -bare
    • Python sometimes (?) fails building cryptography with error: cargo failed with code: -11
    • in this particular version one can use CRYPTOGRAPHY_DONT_BUILD_RUST=1 if necessary...
  • SciPy-bundle
  • h5py: PR upcoming
  • ELPA
    • "The 'OPTIONAL' attribute must not be specified for the dummy argument 'success' of a procedure that has the procedure language binding specifier", unless --disable-Fortran2008-features, but the Fujitsu Compiler is supposed to support it...
    • new configure opt --enable-FUGAKU in 2021.005.001, also --enable-sve-512
    • but it still fails
  • BerkeleyGW (without ELPA, for now): {lib,phys}[Fujitsu/2105,ffmpi/2105] fftlib v20170628, BerkeleyGW v2.1.0 (WIP) easybuild-easyconfigs#12868 (needs to be updated)
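
The Rust items above (the `far` lookup and the unbounded LLVM build) can be sketched in Python. The `infer_ar` function is a simplified imitation of suffix-based `ar` inference, written only to show how `fcc` can turn into `far`; it is an assumption about the behaviour described above, not the actual Rust bootstrap code. The `prebuildopts` lines use easyconfig syntax (easyconfigs are Python), and `export AR=ar` is an assumed value for the AR setting mentioned above.

```python
def infer_ar(cc_name):
    """Guess an `ar` name by replacing a compiler-like suffix with "ar".

    Simplified imitation for illustration only (hypothetical helper).
    """
    for suffix in ("gcc", "cc", "clang"):
        idx = cc_name.rfind(suffix)
        if idx != -1:
            return cc_name[:idx] + "ar"
    return cc_name


def pick_ar(cc_name, environ):
    """Prefer an explicit AR setting; only guess when it is absent."""
    return environ.get("AR") or infer_ar(cc_name)


# Easyconfig-style fragment: set AR explicitly and cap LLVM's parallelism.
# %(parallel)s is the EasyBuild template mentioned above; EasyBuild fills it in.
prebuildopts = "export AR=ar && "
prebuildopts += "export LLVM_PARALLEL_COMPILE_JOBS=%(parallel)s && "

if __name__ == "__main__":
    print(infer_ar("fcc"))                 # the guess that breaks the build
    print(pick_ar("fcc", {"AR": "ar"}))    # explicit AR wins
    print(prebuildopts % {"parallel": 8})  # what the template expands to
```

Setting AR in the build environment sidesteps the guess entirely, which is why the workaround lives in `prebuildopts` rather than in a patch.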

Questions about Fujitsu ecosystem

i.e. how the environment will change in the future and if/how it differs across systems

Fugaku
  • universality of the lang/tcsds modules: are these specific to Fugaku or generic to other Fujitsu a64fx systems?
    • at Fugaku, we are using the lang module name (and one of the environment variables it sets, FJSVXTCLANGA, although this could be moved to the external module metadata file, using it to set prefix and then using get_software_root instead), in the toolchain definitions in framework, and as an external module dependency in the FCC easyconfig
    • response: "language environment is Fugaku specific, it cannot be used in other Fujitsu machines". So it does seem this is a "Fugaku" toolchain, not a "Fujitsu" toolchain
  • permanence of the lang/tcsds modules: will they always be available?
    • at Fugaku, old modules that were only present in compute nodes have been removed; not sure if the ones that were also published in login nodes (as is the case of the tcsds-1.2.31 version that we are using for 4.5.0/21.05) will ever be removed
    • response: "The language environment (...) is retained for three versions including the latest version". Suggested that older versions are archived instead of deleted, i.e. not immediately visible but still available after some extra step, e.g. module use .... Otherwise, we'll need to remove the version pinning and revert to FCC-21.05 instead of FCC-4.5.0, as a more recent module will change the compiler version...
Isambard
  • environment module is fujitsu-compiler/4.3.1 (after a module use), this needs to be changed in the FCC easyconfig and in the toolchain definition...
    • the module doesn't set FJSVXTCLANGA, so it needs to be set manually
    • this path is actually all that's needed, so since the environments differ, maybe we should simply rely on a single environment variable?
  • large page allocation doesn't seem to be enabled/supported ("libmpg BUG!! ... Assertion '0' failed."); currently setting -Knolargepage, but again, we need a way of always injecting this without breaking scripts that expect CC to be only the executable...
  • the fujitsu-compiler module adds the top level include folder to C_INCLUDE_PATH and CPLUS_INCLUDE_PATH, but that breaks -Nclang mode, the wrong headers are included (in particular arm-sve.h)
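
Since only the compiler root path is needed, the "single environment variable, else `which`" idea can be sketched as follows. This is a minimal sketch under assumptions: the function name `find_fujitsu_root` is hypothetical, the variable name defaults to the `FJSVXTCLANGA` used at Fugaku, and the fallback assumes the compiler sits at `<root>/bin/fcc`.

```python
import os
import shutil


def find_fujitsu_root(environ=None, env_var="FJSVXTCLANGA", compiler="fcc"):
    """Return the compiler installation root, or None if it cannot be found.

    Hypothetical helper: checks an environment variable first, then falls
    back to locating the compiler on PATH and stripping the bin/ component.
    """
    environ = os.environ if environ is None else environ

    root = environ.get(env_var)
    if root:
        return root

    exe = shutil.which(compiler, path=environ.get("PATH"))
    if exe:
        # <root>/bin/fcc -> <root>
        return os.path.dirname(os.path.dirname(os.path.realpath(exe)))

    return None


if __name__ == "__main__":
    print(find_fujitsu_root())
```

The explicit variable takes precedence so that a site can point at a specific installation even when a different `fcc` comes first on `$PATH`.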

Update June 2024:

Deucalion
  • environment module is FJSVstclanga/1.0.21.02a (which simply adds /opt/FJSVstclanga/cp-1.0.21.02a/{bin,lib64,man} to $PATH, $LD_LIBRARY_PATH and $MANPATH, plus UCX_RNDV_THRESH=64k)
    • the module also doesn't set FJSVXTCLANGA, same as Isambard, so only the Fugaku module sets it
    • the root path is actually all that's needed, so since the environments differ, maybe we should simply rely on a single environment variable?
  • large page allocation (libmpg) is enabled, same as Fugaku, different from Isambard, so -Klargepage, the default, can be used
  • numpy (<1.26) + ssl2 works very well, including multi-threaded, as long as python itself is built with the fujitsu compiler and linked with fjomplib
  • using gcccore as a subtoolchain instead of building everything from fcc from scratch also works
  • currently exploring trade-offs between the "bottom-up" approach (build everything with fcc: better performance everywhere, but in most cases not by a lot, and a lot more work supporting new versions) vs the "top-down" approach (re-use gcccore (eventually from EESSI?) and only rebuild what really benefits from the Fujitsu compiler and libraries)
  • possibility of adapting FlexiBLAS to support SSL2, so that even gofbf doesn't need to be rebuilt? (FFTW can easily be overridden with Fujitsu's fork)
  • OpenMPI vs Fujitsu MPI may not be very relevant at Deucalion, since it has regular Infiniband, not TofuD like Fugaku
  • for things that benefit from multithreaded SSL2 called from Python (e.g. numpy/scipy/etc.), one might as well use the "bottom-up" approach, since Python itself is pretty far "down"
  • but for everything else, the "top-down" approach is currently looking more promising
@migueldiascosta migueldiascosta added this to the release after 4.4.0 milestone Jun 1, 2021
@migueldiascosta migueldiascosta self-assigned this Jun 1, 2021
@migueldiascosta migueldiascosta changed the title Fujitsu toolchain TODO list for EB v4.4.1 Fujitsu toolchain TODO list for EB v4.4.2 Jul 12, 2021
@boegel boegel modified the milestones: 4.4.2, release after 4.4.2 Sep 1, 2021
@boegel boegel changed the title Fujitsu toolchain TODO list for EB v4.4.2 Fujitsu toolchain TODO list Sep 1, 2021
@boegel boegel modified the milestones: 4.5.2, 4.x Jan 26, 2022