Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request to add toolchain for NEC #53

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

PouyanLaleh
Copy link

Hello,

The toolchain for cross-compiling on the NEC system is added. The READM file is modified to instruct compiling for NEC as well. We applied minor changes in 2 cmake files to avoid compilation errors on NEC. With this pull request, nothing is modified in the code.

Thanks,
Laleh

@FussyDuck
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Fatemeh Pouyan seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@mlange05
Copy link
Collaborator

Hi, first of all, many thanks for this contribution; this looks like a very valuable addition to our collection of architectures, and the numbers seem very impressive. We have no facilities right now to automatically test this, but we would very happily accept the environment as a reference implementation.

For the contribution itself, I've flagged a few inconsistencies, that simply seem like debug/dev leftovers; but it looks like a small clean-up effort. I've also noted that the base and target branch are main, while we tend to merge new contributions into develop first. I'll change the target to develop, but you might also want to consider a rebase or merge to make the contribution more up-to-date?

@PouyanLaleh You'll also need to sing the CLA (fussyduck). I'll put this into "changes requested" for now, but once the above is addressed and the license is signed, I'll enable the CI runners to get this merged quickly. And thanks again for a fantastic first contribution. 😄

@mlange05 mlange05 changed the base branch from main to develop May 30, 2023 13:38
Copy link
Collaborator

@mlange05 mlange05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, many thanks. I've flagged a few things that need cleaning up before merging, and we will unfortunately be able to auto-test this, but once the clean-ups are addressed we should be able to merge this.

# OpenAcc FLAGS
####################################################################

set( OpenACC_Fortran_FLAGS "-acc -ta=tesla:lineinfo,deepcopy,maxregcount:100,fastmath" )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look suspiciously like Nvidia GPU flags; does NEC really support OpenACC, or are these development/debug leftovers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, they are leftovers that I forgot to remove them.

# OpenAcc FLAGS
####################################################################

set( OpenACC_Fortran_FLAGS "-acc -ta=tesla:lineinfo,deepcopy,maxregcount:100,fastmath" )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look suspiciously like Nvidia GPU flags; does NEC really support OpenACC, or are these development/debug leftovers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't. Leftovers as well!

set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-unroll-complete=200 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -ftrace")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fmove-loop-invariants-if ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -freplace-loop-equation ")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind having commented out options like this lying around, but could we maybe group them, and or annotate them with comments that provide a bit of context as to why they are enabled/disabled?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can comment them because -ftrace enables profiling feature and -floop-unroll-complete=200 are used for optimisation purpose here.

@@ -37,14 +37,6 @@ set( OpenACC_Fortran_FLAGS "-acc=gpu -mp=gpu -gpu=cc80,lineinfo,fastmath" CACHE
# Enable this to get more detailed compiler output
# set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Minfo" )

####################################################################
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, not sure the NEC changes require us to change Nvidia configurations as such?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, defining the OpenACC flags does not cause error on NEC but they are unnecessary for this toolchain.

# COMPILER
####################################################################

include( /opt/nec/ve/share/cmake/toolchainVE.cmake )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traditionally, the low-level toolchains (per compiler version) are symlink up to the arch/toolchains/<arch-toolchain>-.cmake files higher up, for re-use?

@@ -60,29 +60,6 @@ Balthasar Reuter ([email protected])
move parameter structures to constant memory. To enable this variant,
a suitable CUDA installation is required and the `--with-cuda` flag
needs to be passed at the build stage.
- **dwarf-cloudsc-gpu-scc-cuf-k-caching**: GPU-enabled and further
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this here was deleted by accident? Or was this intended?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd also be great, if you could add a small section (or bullet point) to the README to tell users about the availability of the NEC variant. Probably a small paragraph?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the deleted part was not intentional.

I thought I already added how to compile on NEC in the README file. I'll check that again.

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks, it's great to have the NEC toolchain with the promising performance numbers in the repo. Can only parrot what Michael has said already, but left a few extra questions.

Just a remark on the CLA: You may have to add the email address that you used for the commits to your Github account to be able to sign the CLA.

@@ -19,7 +19,7 @@ project( dwarf-p-cloudsc LANGUAGES C Fortran )

include( cmake/compat.cmake )
if( CMAKE_Fortran_COMPILER_ID MATCHES "GNU")
ecbuild_add_fortran_flags("-ffree-line-length-none")
# ecbuild_add_fortran_flags("-ffree-line-length-none")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only affect the GNU compiler, does this really have to be removed for NEC? If it makes problems with the NEC compiler, could this be suitably guarded so it still takes effect for GNU?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes compiling error on NEC.

@@ -231,6 +208,17 @@ cd build
./bin/dwarf-cloudsc-fortran 4 16384 32 # The cleaned-up Fortran
./bin/dwarf-cloudsc-c 4 16384 32 # The standalone C version
```
### Building on NEC SX-AURORA TSUBAS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the VE not called TSUBAS A ? Applies also to the line below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, VE is vector engine and NEC SX-AURORA TSUBAS is typically used as the architecture.

HDF5_ROOT=HDF5-installation-PATH ./cloudsc-bundle build --arch arch/ecmwf/aurora/nec/4.0.0/ [--single-precision] [--with-mpi] --hdf5 ON --cloudsc-fortran ON --cloudsc-prototype1 OFF --verbose --log DEBUG
```

Currently available `NEC ompiler/version` selections are:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"compiler" is missing a "c"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -272,27 +260,6 @@ srun bash -c "CUDA_VISIBLE_DEVICES=\$SLURM_LOCALID bin/dwarf-cloudsc-gpu-scc-hoi

In principle, the same should work for multi-node execution (`-N 2`, `-N 4` etc.) once interconnect issues are resolved.

### GPU runs: Timing device kernels and data transfers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please re-add this paragraph


Currently available `NEC ompiler/version` selections are:

* `nec/4.0.0 (nfort, ncc, nc++)`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add also an example invocation and approximate performance that can be expected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll add it

export VE_FPE_ENABLE=DIV,INV,FOF,FUF,INE


export PATH="/local/hdd/nabr/openmpi/nvhpc-nompi/20.9/bin:$PATH"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this absolute path setting should be necessary - also, it looks like it is carried over from volta?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I forgot to remove it

####################################################################
# Enviroment Variables
####################################################################
set(NMPI_ROOT /opt/nec/ve/mpi/2.23.0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same on all NEC VE machines, or should this maybe be exported as an environment variable in env.sh and then picked up in the toolchain file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not. I think environment variable is the better idea for it.

DEFINITIONS ${CLOUDSC_DEFINITIONS}
)

target_link_libraries( dwarf-cloudsc-fortran PRIVATE cloudsc-common-lib )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for adding target_link_libraries, target_include_directories separately instead of pointing here to the cloudsc-common-lib in LIBS, as it was done before? That way, include and link settings propagate automatically.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check this part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants