Request to add toolchain for NEC #53

Open
wants to merge 2 commits into
base: develop
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -19,7 +19,7 @@ project( dwarf-p-cloudsc LANGUAGES C Fortran )

include( cmake/compat.cmake )
if( CMAKE_Fortran_COMPILER_ID MATCHES "GNU")
ecbuild_add_fortran_flags("-ffree-line-length-none")
# ecbuild_add_fortran_flags("-ffree-line-length-none")
Collaborator

This should only affect the GNU compiler. Does it really have to be removed for NEC? If it causes problems with the NEC compiler, could it be suitably guarded so that it still takes effect for GNU?

Author

Yes, it causes a compilation error on NEC.
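
A minimal sketch of one way to keep the flag for GNU while protecting other compilers. It assumes CMake's standard `CheckFortranCompilerFlag` module is usable in this project and that the NEC failure comes from the flag reaching `nfort`; neither is confirmed in this thread. The result variable `HAVE_FREE_LINE_LENGTH_NONE` is a hypothetical name.

```cmake
include( CheckFortranCompilerFlag )

if( CMAKE_Fortran_COMPILER_ID MATCHES "GNU" )
  # Probe the active Fortran compiler before adding the flag.
  # HAVE_FREE_LINE_LENGTH_NONE is a hypothetical cache variable name.
  check_fortran_compiler_flag( "-ffree-line-length-none" HAVE_FREE_LINE_LENGTH_NONE )
  if( HAVE_FREE_LINE_LENGTH_NONE )
    ecbuild_add_fortran_flags( "-ffree-line-length-none" )
  endif()
endif()
```

With a guard like this, the original line would not need to be commented out: the flag is only added when the compiler identifies as GNU and actually accepts it.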

if( CMAKE_Fortran_COMPILER_VERSION VERSION_GREATER_EQUAL "10.0")
ecbuild_add_fortran_flags("-fallow-argument-mismatch")
endif()
55 changes: 11 additions & 44 deletions README.md
@@ -60,29 +60,6 @@ Balthasar Reuter ([email protected])
move parameter structures to constant memory. To enable this variant,
a suitable CUDA installation is required and the `--with-cuda` flag
needs to be passed at the build stage.
- **dwarf-cloudsc-gpu-scc-cuf-k-caching**: GPU-enabled and further
Collaborator

I think this was deleted by accident? Or was it intended?

Collaborator

It'd also be great if you could add a small section (or bullet point) to the README to tell users about the availability of the NEC variant. Probably a small paragraph?

Author

Sorry, the deleted part was not intentional.

I thought I had already added instructions for compiling on NEC to the README file. I'll check that again.

optimized version of CLOUDSC that uses the SCC loop layout in
combination with loop fusion and temporary local array demotion, implemented
using CUDA-Fortran (CUF). To enable this variant,
a suitable CUDA installation is required and the `--with-cuda` flag
needs to be passed at the build stage.
- **CUDA C prototypes**: To enable these variants, a suitable
CUDA installation is required and the `--with-cuda` flag needs
to be pased at the build stage.
- **dwarf-cloudsc-cuda**: GPU-enabled, CUDA C version of CLOUDSC.
- **dwarf-cloudsc-cuda-hoist**: GPU-enabled, optimized CUDA C version
of CLOUDSC including host side hoisted temporary local variables.
- **dwarf-cloudsc-cuda-k-caching**: GPU-enabled, further optimized CUDA
C version of CLOUDSC including loop fusion and temporary local
array demotion.
- **dwarf-cloudsc-gpu-scc-field**: GPU-enabled and optimized version of
CLOUDSC that uses the SCC loop layout, and a dedicated Fortran FIELD
API to manage device offload and copyback. The intent is to demonstrate
the explicit use of pinned host memory to speed-up data transfers, as
provided by the shipped prototype implmentation, and investigate the
effect of different data storage allocation layouts. To enable this
variant, a suitable CUDA installation is required and the
`--with-cuda` flag needs to be passed at the build stage.

## Download and Installation

@@ -231,6 +208,17 @@ cd build
./bin/dwarf-cloudsc-fortran 4 16384 32 # The cleaned-up Fortran
./bin/dwarf-cloudsc-c 4 16384 32 # The standalone C version
```
### Building on NEC SX-AURORA TSUBAS
Collaborator

Is the VE not called TSUBASA? This also applies to the line below.

Author

No, VE stands for Vector Engine, and NEC SX-AURORA TSUBAS is typically used as the name of the architecture.

To build on NEC SX-AURORA TSUBAS system, run the following commands

```sh
./cloudsc-bundle create
HDF5_ROOT=HDF5-installation-PATH ./cloudsc-bundle build --arch arch/ecmwf/aurora/nec/4.0.0/ [--single-precision] [--with-mpi] --hdf5 ON --cloudsc-fortran ON --cloudsc-prototype1 OFF --verbose --log DEBUG
```

Currently available `NEC ompiler/version` selections are:
Collaborator

"compiler" is missing a "c"

Author

Thanks


* `nec/4.0.0 (nfort, ncc, nc++)`
Collaborator

Could you also add an example invocation and the approximate performance that can be expected?

Author

Yes, I'll add it


### Running on ECMWF's Atos BullSequana XH2000

@@ -272,27 +260,6 @@ srun bash -c "CUDA_VISIBLE_DEVICES=\$SLURM_LOCALID bin/dwarf-cloudsc-gpu-scc-hoi

In principle, the same should work for multi-node execution (`-N 2`, `-N 4` etc.) once interconnect issues are resolved.

### GPU runs: Timing device kernels and data transfers
Collaborator

Please re-add this paragraph


For GPU-enabled runs two internal timer results are reported:

* The isolated compute time of the main compute kernel on device (where `#BLKS == 1`)
* The overall time of the execution loop including data offload and copyback

It is important to note that due to the nature of the kernel, data
transfer overheads will dominate timings, and that most supported GPU
variants aim to optimise compute kernel timings only. However, a
dedicated variant `dwarf-cloudsc-gpu-scc-field` has been added to
explore host-side memory pinning, which improves data transfer times
and alternative data layout strategies. By default, this will allocate
each array variable individually in pinned memory. A runtime flag
`CLOUDSC_PACKED_STORAGE=ON` can be used to enable "packed" storage,
where multiple arrays are stored in a single base allocation, eg.

```sh
NV_ACC_CUDA_HEAPSIZE=8G CLOUDSC_PACKED_STORAGE=ON ./bin/dwarf-cloudsc-gpu-scc-field 1 80000 128
```

## Loki transformations for CLOUDSC

[Loki](https://github.com/ecmwf-ifs/loki) is an in-house developed
33 changes: 33 additions & 0 deletions arch/ecmwf/aurora/nec/4.0.0/env.sh
@@ -0,0 +1,33 @@
# Source me to get the correct configure/build/run environment

# Store tracing and disable (module is *way* too verbose)
{ tracing_=${-//[^x]/}; set +x; } 2>/dev/null

module_load() {
echo "+ module load $1"
module load $1
}
module_unload() {
echo "+ module unload $1"
module unload $1
}

export FC=nfort
export CC=ncc
export CXX=nc++

set -x

# Increase stack size to maximum
ulimit -S -s unlimited

# Enable floating point error trapping at run time
export VE_FPE_ENABLE=DIV,INV,FOF,FUF,INE


export PATH="/local/hdd/nabr/openmpi/nvhpc-nompi/20.9/bin:$PATH"
Collaborator

I don't think this absolute path setting should be necessary - also, it looks like it is carried over from volta?

Author

No, I forgot to remove it


# Restore tracing to stored setting
if [[ -n "$tracing_" ]]; then set -x; else set +x; fi

export ECBUILD_TOOLCHAIN="./toolchain.cmake"
89 changes: 89 additions & 0 deletions arch/ecmwf/aurora/nec/4.0.0/toolchain.cmake
@@ -0,0 +1,89 @@

####################################################################
# COMPILER
####################################################################

include( /opt/nec/ve/share/cmake/toolchainVE.cmake )
Collaborator

Traditionally, the low-level toolchains (per compiler version) are symlinks to the `arch/toolchains/<arch-toolchain>-.cmake` files higher up, for re-use?



set( ECBUILD_FIND_MPI ON )

####################################################################
# Enviroment Variables
####################################################################
set(NMPI_ROOT /opt/nec/ve/mpi/2.23.0)
Collaborator

Is this the same on all NEC VE machines, or should this maybe be exported as an environment variable in env.sh and then picked up in the toolchain file?

Author

No, it's not. I think an environment variable is the better approach.
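
A minimal sketch of how the toolchain file could pick up the MPI prefix from the environment, as suggested above. It assumes `env.sh` exports a variable named `NMPI_ROOT` (for example with the path currently hard-coded in this PR); the error message text is illustrative only.

```cmake
# Pick up the NEC MPI installation prefix from the environment,
# e.g. after `export NMPI_ROOT=/opt/nec/ve/mpi/2.23.0` in env.sh (assumed).
if( DEFINED ENV{NMPI_ROOT} )
  set( NMPI_ROOT "$ENV{NMPI_ROOT}" )
else()
  # Fail early instead of silently falling back to a site-specific path
  message( FATAL_ERROR "NMPI_ROOT is not set; source env.sh before configuring" )
endif()
```

The `MPI_*_COMPILER` and `MPI_*_LIBRARIES` settings further down already use `${NMPI_ROOT}`, so they would not need to change.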


####################################################################
# OpenMP FLAGS
####################################################################

set( OpenMP_C_FLAGS "-fopenmp " )
set( OpenMP_CXX_FLAGS "-fopenmp " )
set( OpenMP_Fortran_FLAGS "-fopenmp " )

####################################################################
# OpenAcc FLAGS
####################################################################

set( OpenACC_Fortran_FLAGS "-acc -ta=tesla:lineinfo,deepcopy,maxregcount:100,fastmath" )
Collaborator

These look suspiciously like Nvidia GPU flags; does NEC really support OpenACC, or are these development/debug leftovers?

Author

Sorry, they are leftovers that I forgot to remove.

Collaborator

These look suspiciously like Nvidia GPU flags; does NEC really support OpenACC, or are these development/debug leftovers?

Author

No, it doesn't. Leftovers as well!

set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Mvect=levels:6" )
set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Mconcur=levels:6" )
set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Minfo" )

####################################################################
# NEC MPI Compiler
####################################################################

set(MPI_C_COMPILER ${NMPI_ROOT}/bin/mpincc CACHE FILEPATH "")
set(MPI_C_INCLUDE_PATH ${NMPI_ROOT}/include CACHE FILEPATH "")
set(MPI_C_LIBRARIES ${NMPI_ROOT}/lib64/ve/libmpi.a CACHE FILEPATH "")
set(MPI_C_COMPILE_FLAGS "-D_MPIPP_INCLUDE" CACHE STRING "")

set(MPI_CXX_COMPILER ${NMPI_ROOT}/bin/mpinc++ CACHE FILEPATH "")
set(MPI_CXX_INCLUDE_PATH ${NMPI_ROOT}/include CACHE FILEPATH "")
set(MPI_CXX_LIBRARIES ${NMPI_ROOT}/lib64/ve/libmpi++.a CACHE FILEPATH "")

set(MPI_Fortran_COMPILER ${NMPI_ROOT}/bin/mpifort CACHE FILEPATH "")
set(MPI_Fortran_INCLUDE_PATH ${NMPI_ROOT}/include CACHE FILEPATH "")
set(MPI_Fortran_ADDITIONAL_INCLUDE_DIR ${NMPI_ROOT}/lib/ve/module CACHE FILEPATH "")
set(MPI_Fortran_LIBRARIES ${NMPI_ROOT}/lib64/ve/libmpi.a CACHE FILEPATH "")
set(MPI_Fortran_COMPILE_FLAGS "-D_MPIPP_INCLUDE" CACHE STRING "")
####################################################################
# COMMON FLAGS
####################################################################

set(ECBUILD_Fortran_FLAGS "-fpic")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mstack-arrays")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fdiag-vector=3")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fcse-after-vectorization")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-collapse ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-fusion ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-interchange ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-unroll-complete=200 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -ftrace")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fmove-loop-invariants-if ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -freplace-loop-equation ")
Collaborator

I don't mind having commented-out options like this lying around, but could we maybe group them and/or annotate them with comments that provide a bit of context as to why they are enabled or disabled?

Author

Yes, we can annotate them: `-ftrace` enables the profiling feature, and `-floop-unroll-complete=200` is used for optimisation here.
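
A rough sketch of what grouped and annotated flags could look like. Only the two rationales given in the reply above are taken from this thread; the reasons for the disabled options are not stated here, so they are left as placeholders for the author to fill in.

```cmake
# Profiling (per the author: -ftrace enables the profiling feature)
set( ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -ftrace" )

# Optimisation (per the author: complete loop unrolling is used for optimisation)
set( ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-unroll-complete=200" )

# Currently disabled; the reason is not given in this thread and should be documented
# set( ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fmove-loop-invariants-if" )
# set( ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -freplace-loop-equation" )
```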

set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -msched-interblock ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-floating-divide-instruction ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-power-to-explog ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-sqrt-instruction ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-threshold=3 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -finline-functions ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -finline-max-depth=5 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -finline-max-function-size=200 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-merge-conditional ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fivdep ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-strip-mine ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -muse-mmap ")
##set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-packed")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -report-all")

set( ECBUILD_Fortran_FLAGS_BIT "-O4 -mvector-fma" )

set( ECBUILD_C_FLAGS "-O2 " )

set( ECBUILD_CXX_FLAGS "-O2" )

# Fix for C++ template headers needed for Serialbox
set( GNU_HEADER_INCLUDE "-I/usr/local/apps/gcc/7.3.0/lib/gcc/x86_64-linux-gnu/7.3.0/include-fixed" )
set( ECBUILD_CXX_FLAGS "${ECBUILD_CXX_FLAGS} ${GNU_HEADER_INCLUDE}" )
8 changes: 0 additions & 8 deletions arch/toolchains/ecmwf-hpc2020-nvhpc.cmake
@@ -37,14 +37,6 @@ set( OpenACC_Fortran_FLAGS "-acc=gpu -mp=gpu -gpu=cc80,lineinfo,fastmath" CACHE
# Enable this to get more detailed compiler output
# set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Minfo" )

####################################################################
Collaborator

Again, not sure the NEC changes require us to change Nvidia configurations as such?

Author

No, defining the OpenACC flags does not cause an error on NEC, but they are unnecessary for this toolchain.

# CUDA FLAGS
####################################################################

if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES 80)
endif()

####################################################################
# COMMON FLAGS
####################################################################
89 changes: 89 additions & 0 deletions arch/toolchains/ecmwf-nec-aurora.cmake
@@ -0,0 +1,89 @@

####################################################################
# COMPILER
####################################################################

include( /opt/nec/ve/share/cmake/toolchainVE.cmake )


set( ECBUILD_FIND_MPI ON )

####################################################################
# Enviroment Variables
####################################################################
set(NMPI_ROOT /opt/nec/ve/mpi/2.23.0)

####################################################################
# OpenMP FLAGS
####################################################################

set( OpenMP_C_FLAGS "-fopenmp " )
set( OpenMP_CXX_FLAGS "-fopenmp " )
set( OpenMP_Fortran_FLAGS "-fopenmp " )

####################################################################
# OpenAcc FLAGS
####################################################################

set( OpenACC_Fortran_FLAGS "-acc -ta=tesla:lineinfo,deepcopy,maxregcount:100,fastmath" )
set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Mvect=levels:6" )
set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Mconcur=levels:6" )
set( OpenACC_Fortran_FLAGS "${OpenACC_Fortran_FLAGS} -Minfo" )

####################################################################
# NEC MPI Compiler
####################################################################

set(MPI_C_COMPILER ${NMPI_ROOT}/bin/mpincc CACHE FILEPATH "")
set(MPI_C_INCLUDE_PATH ${NMPI_ROOT}/include CACHE FILEPATH "")
set(MPI_C_LIBRARIES ${NMPI_ROOT}/lib64/ve/libmpi.a CACHE FILEPATH "")
set(MPI_C_COMPILE_FLAGS "-D_MPIPP_INCLUDE" CACHE STRING "")

set(MPI_CXX_COMPILER ${NMPI_ROOT}/bin/mpinc++ CACHE FILEPATH "")
set(MPI_CXX_INCLUDE_PATH ${NMPI_ROOT}/include CACHE FILEPATH "")
set(MPI_CXX_LIBRARIES ${NMPI_ROOT}/lib64/ve/libmpi++.a CACHE FILEPATH "")

set(MPI_Fortran_COMPILER ${NMPI_ROOT}/bin/mpifort CACHE FILEPATH "")
set(MPI_Fortran_INCLUDE_PATH ${NMPI_ROOT}/include CACHE FILEPATH "")
set(MPI_Fortran_ADDITIONAL_INCLUDE_DIR ${NMPI_ROOT}/lib/ve/module CACHE FILEPATH "")
set(MPI_Fortran_LIBRARIES ${NMPI_ROOT}/lib64/ve/libmpi.a CACHE FILEPATH "")
set(MPI_Fortran_COMPILE_FLAGS "-D_MPIPP_INCLUDE" CACHE STRING "")
####################################################################
# COMMON FLAGS
####################################################################

set(ECBUILD_Fortran_FLAGS "-fpic")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mstack-arrays")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fdiag-vector=3")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fcse-after-vectorization")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-collapse ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-fusion ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-interchange ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-unroll-complete=200 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -ftrace")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fmove-loop-invariants-if ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -freplace-loop-equation ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -msched-interblock ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-floating-divide-instruction ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-power-to-explog ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-sqrt-instruction ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-threshold=3 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -finline-functions ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -finline-max-depth=5 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -finline-max-function-size=200 ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-merge-conditional ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -fivdep ")
###set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -floop-strip-mine ")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -muse-mmap ")
##set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -mvector-packed")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -report-all")

set( ECBUILD_Fortran_FLAGS_BIT "-O4 -mvector-fma" )

set( ECBUILD_C_FLAGS "-O2 " )

set( ECBUILD_CXX_FLAGS "-O2" )

# Fix for C++ template headers needed for Serialbox
set( GNU_HEADER_INCLUDE "-I/usr/local/apps/gcc/7.3.0/lib/gcc/x86_64-linux-gnu/7.3.0/include-fixed" )
set( ECBUILD_CXX_FLAGS "${ECBUILD_CXX_FLAGS} ${GNU_HEADER_INCLUDE}" )
2 changes: 0 additions & 2 deletions arch/toolchains/ecmwf-volta-pgi-gpu.cmake
@@ -50,8 +50,6 @@ set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -Ktrap=fp")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -Kieee")
set(ECBUILD_Fortran_FLAGS "${ECBUILD_Fortran_FLAGS} -Mdaz")

set(ECBUILD_Fortran_LINK_FLAGS "-gpu=pinned")

set( ECBUILD_Fortran_FLAGS_BIT "-O2 -gopt" )

set( ECBUILD_C_FLAGS "-O2 -gopt -traceback" )
6 changes: 4 additions & 2 deletions src/cloudsc_fortran/CMakeLists.txt
@@ -21,18 +21,20 @@ if( HAVE_CLOUDSC_FORTRAN )
dwarf_cloudsc.F90
cloudsc_driver_mod.F90
cloudsc.F90
LIBS
cloudsc-common-lib
DEFINITIONS ${CLOUDSC_DEFINITIONS}
)

target_link_libraries( dwarf-cloudsc-fortran PRIVATE cloudsc-common-lib )
Collaborator

Is there any reason for adding `target_link_libraries` and `target_include_directories` separately instead of pointing to `cloudsc-common-lib` in `LIBS` here, as was done before? That way, include and link settings propagate automatically.

Author

I need to check this part.
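
For reference, a sketch of the previous form that the question above refers to, reconstructed from the lines visible in this hunk. The call name `ecbuild_add_executable` and the `TARGET`/`SOURCES` keywords lie outside the visible context and are assumed here from ecbuild's usual interface.

```cmake
# Assumed reconstruction: only the source list, LIBS and DEFINITIONS
# lines are visible in the diff hunk above.
ecbuild_add_executable( TARGET dwarf-cloudsc-fortran
    SOURCES
        dwarf_cloudsc.F90
        cloudsc_driver_mod.F90
        cloudsc.F90
    LIBS
        cloudsc-common-lib
    DEFINITIONS ${CLOUDSC_DEFINITIONS}
)
```

If `cloudsc-common-lib` exports its HDF5 include and link requirements, the separate `target_include_directories` and `target_link_libraries` calls under `HAVE_HDF5` may also be unnecessary; that would need checking against the library's own CMake definition.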


# Create symlink for the input data
if( HAVE_SERIALBOX )
execute_process(COMMAND ${CMAKE_COMMAND} -E create_symlink
${CMAKE_CURRENT_SOURCE_DIR}/../../data ${CMAKE_CURRENT_BINARY_DIR}/../../../data )
endif()

if( HAVE_HDF5 )
target_include_directories( dwarf-cloudsc-fortran PRIVATE ${HDF5_Fortran_INCLUDE_DIRS} )
target_link_libraries( dwarf-cloudsc-fortran PRIVATE ${HDF5_LIBRARIES} )
execute_process(COMMAND ${CMAKE_COMMAND} -E create_symlink
${CMAKE_CURRENT_SOURCE_DIR}/../../config-files/input.h5 ${CMAKE_CURRENT_BINARY_DIR}/../../../input.h5 )
execute_process(COMMAND ${CMAKE_COMMAND} -E create_symlink