C++ prototypes of the muphys project using heterogeneous libraries and C extensions.
Online documentation is generated automatically using doxygen.
- NetCDF for CXX
- for Levante:
spack load [email protected]
Other dependencies, such as googletest, are built in-tree from GitHub archives.
- Implementation - The sequential implementation is selected by default. The user can choose one of the following options:
- MU_IMPL=seq - C++ serial implementation
- Precision (default is double) - MU_ENABLE_SINGLE - to switch to float
- Unit-test - compile tests together with the main executable (default is true) - MU_ENABLE_TESTS
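A precision switch like the one above is often implemented as a single type alias that the rest of the code is written against. The following is a minimal sketch of that idea, assuming the CMake option is passed to the compiler as a preprocessor definition of the same name; the alias `real_t` and the function `saturation_ratio` are hypothetical and not taken from the muphys sources.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: MU_ENABLE_SINGLE reaches the compiler as a preprocessor
// definition. The project may wire this option differently.
#ifdef MU_ENABLE_SINGLE
using real_t = float;   // single precision when the option is enabled
#else
using real_t = double;  // default: double precision
#endif

// Hypothetical kernel written once against real_t, so the same source
// builds in either precision.
inline real_t saturation_ratio(real_t vapour, real_t saturation) {
    assert(saturation > real_t(0));  // guard against division by zero
    return vapour / saturation;
}
```

With this pattern, switching precision only requires reconfiguring the build, for example with `-DMU_ENABLE_SINGLE=ON`, assuming the option is a regular CMake boolean.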
cmake -DMU_IMPL=seq -B build -S .
cmake --build build
./build/bin/graupel input_file.nc
- Run tests manually:
cd build && ctest
muphys-cpp is available under a BSD 3-clause license. See LICENSES/ for license information and AUTHORS.TXT for a list of authors.
- Task: Create an optimized parallel implementation of muphys
- Requirements:
- 4 Versions: Correctness/Performance x CPU/GPU
- Directive Programming: Use OpenMP (offload to GPU); no CUDA kernels
- Parallelisation Task:
- Parallelize the sequential implementation
- Can rewrite the algorithm entirely
- Optimization Task:
- Loop reordering/collapse/rewrite
- Minimize critical regions
- Data structure to optimize cache utilization
- Optimal CPU-GPU memory transfer
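The optimization points listed for the task can be illustrated with a few OpenMP directives. This is a minimal sketch, not the muphys code: the `Field` type, its `[level * ncells + cell]` layout, and the three functions are hypothetical. It shows loop collapse on the CPU, a reduction clause in place of a critical region, and a `target data` region that keeps an array on the GPU across two kernels so it is transferred only once in each direction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field: one contiguous array per variable, indexed as
// [level * ncells + cell] so the inner loop walks memory contiguously.
struct Field {
    std::size_t nlev;
    std::size_t ncells;
    std::vector<double> data;
    Field(std::size_t nlev_, std::size_t ncells_, double init)
        : nlev(nlev_), ncells(ncells_), data(nlev_ * ncells_, init) {
        assert(nlev_ > 0 && ncells_ > 0);
    }
};

// CPU: collapse the two independent loops into one iteration space.
void scale_cpu(Field& f, double factor) {
    double* d = f.data.data();
    const std::size_t nlev = f.nlev;
    const std::size_t ncells = f.ncells;
    #pragma omp parallel for collapse(2)
    for (std::size_t k = 0; k < nlev; ++k)
        for (std::size_t i = 0; i < ncells; ++i)
            d[k * ncells + i] *= factor;
}

// CPU: a reduction clause replaces a critical region around the sum.
double total(const Field& f) {
    const double* d = f.data.data();
    const std::size_t n = f.data.size();
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += d[i];
    return sum;
}

// GPU: map the array once and run two kernels on the device copy.
void scale_twice_gpu(Field& f, double factor) {
    double* d = f.data.data();
    const std::size_t n = f.data.size();
    #pragma omp target data map(tofrom : d[0:n])
    {
        #pragma omp target teams distribute parallel for
        for (std::size_t i = 0; i < n; ++i)
            d[i] *= factor;
        #pragma omp target teams distribute parallel for
        for (std::size_t i = 0; i < n; ++i)
            d[i] *= factor;
    }
}
```

Without OpenMP enabled at compile time the pragmas are ignored and the code runs serially, which helps when comparing against the sequential version. Note that a parallel reduction changes the order of floating-point additions, so its result is not guaranteed to be bit-identical to a serial sum.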
- Layout:
- code:
- /core: physics interactions
- /io: read/write I/O files
- /implementations/sequential
- external dependencies:
- /extern
- validation:
- /test: automated unit tests
- /tasks: input files
- /reference_results: output files
- /scripts: Levante batch files
- Validation:
- CPU-CPU: cdo diffv (bit-identical)
- CPU-GPU: cdo sub (within floating point tolerance)
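The two validation criteria differ in kind: one demands identical bits, the other a bounded difference. The sketch below mirrors that distinction in C++ for in-memory arrays. It does not reproduce what cdo does internally; the function names and the tolerance parameters are hypothetical, since the text above does not state a tolerance value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU-CPU criterion: every value must match bit for bit.
bool bit_identical(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) return false;
    if (a.empty()) return true;
    return std::memcmp(a.data(), b.data(), a.size() * sizeof(double)) == 0;
}

// CPU-GPU criterion: every difference must stay below a combined
// absolute and relative limit. The tolerances are caller-supplied
// assumptions, not project values.
bool within_tolerance(const std::vector<double>& ref,
                      const std::vector<double>& out,
                      double rel_tol, double abs_tol) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    if (ref.size() != out.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = std::fabs(ref[i] - out[i]);
        const double limit = abs_tol + rel_tol * std::fabs(ref[i]);
        if (!(diff <= limit)) return false;  // negated form also rejects NaN
    }
    return true;
}
```

A check like this can be useful inside unit tests, while the cdo commands remain the reference for comparing the NetCDF output files.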
- Submission:
- Path to the implementation
- Script to build the 4 versions
- Slurm logs to confirm the results
- Summary list of optimizations performed
- Plots to confirm the performance results
- (Optional) Profiler analysis output & interpretation
- (Optional) Experience report for using OpenMP
- Grading:
- Correctness(50%): results are correct on both CPU and GPU
- Performance(50%): the code is faster than the sequential version
- How to determine?
- Will the teams be ranked?
- No, but the fastest may get a bonus
- Bonus(20%): readability, portability, extreme performance, optional documentation ...