Yambo 4.5
With Yambo 4.5, support for CUDA Fortran has been implemented:
- Yambo structure modified to deal with GPU accelerator devices; porting done using CUDA Fortran (available with the PGI compiler);
- DIPOLES, RESPONSE FUNCTION, HF, GW, BSE have been ported;
- fully compatible with MPI and OpenMP; typically, one MPI task per GPU card, with OpenMP threads used to exploit the remaining computational capabilities of the host.
- inclusion of dedicated headers (dev_defs.h) to handle CPU and GPU compilation simultaneously.
- GPU allocations integrated in YAMBO_ALLOC/YAMBO_FREE and memory module.
- DevXlib (developed jointly with the QE team and hosted as a separate repo on GitLab) imported and extensively used to provide wrappers for memcpy, sync, init, and simple data operations.
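The "one MPI task per card" layout mentioned in the list above can be illustrated with a small sketch. This is not Yambo code: the function names node_layout and device_for_rank are hypothetical, and the round-robin assignment of tasks to cards is an assumption about a typical setup rather than a description of what Yambo does internally.

```python
def node_layout(n_cores, n_gpus):
    """Return (mpi_tasks, omp_threads_per_task) for a single node.

    Assumption: one MPI task per GPU card, with the host cores
    shared evenly among the tasks as OpenMP threads.
    """
    if n_gpus < 1:
        raise ValueError("at least one GPU card is required")
    mpi_tasks = n_gpus
    omp_threads = max(1, n_cores // mpi_tasks)
    return mpi_tasks, omp_threads


def device_for_rank(local_rank, n_gpus):
    """Map a node-local MPI rank to a GPU index (round-robin, assumed)."""
    return local_rank % n_gpus


if __name__ == "__main__":
    tasks, threads = node_layout(n_cores=32, n_gpus=4)
    print(f"{tasks} MPI tasks per node, {threads} OpenMP threads each")
    for rank in range(tasks):
        print(f"local rank {rank} -> GPU {device_for_rank(rank, 4)}")
```

On a node with 32 cores and 4 cards this gives 4 MPI tasks with 8 OpenMP threads each, so the host cores not driving a GPU are still used.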
New DIPOLES_driver
As part of the modularization of the code within the MAX project, all the subroutines dealing with the calculation and the I/O of the dipoles have been moved under the folder src/dipoles, and the "dipoles" runlevel has been created. The DIPOLES_driver is now called directly by the yambo_driver. This made possible the creation of a dedicated parallel scheme for the dipoles and thus a more efficient distribution of the calculation (both time-to-solution and memory footprint). Other runlevels then only need to load the pre-computed dipoles from disk. This also avoids strong load imbalance, for example in the calculation of the response function, where the dipoles are needed only at q=0.
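The "compute once, then load from disk" pattern described above can be sketched as follows. This is only an illustration of the idea: the helper load_or_compute, the file name dipoles.pkl and the use of pickle are all hypothetical, and Yambo uses its own database format and I/O routines.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return data stored at `path`, computing and saving it if missing."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    data = compute()
    with open(path, "wb") as fh:
        pickle.dump(data, fh)
    return data


def demo():
    """Call the helper twice and count how often the computation runs."""
    calls = []

    def compute_dipoles():
        # Stand-in for the expensive dipole calculation (hypothetical data).
        calls.append(1)
        return {"q0": [0.1, 0.2, 0.3]}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dipoles.pkl")
        first = load_or_compute(path, compute_dipoles)   # computes and writes
        second = load_or_compute(path, compute_dipoles)  # only reads from disk
    return first, second, len(calls)


if __name__ == "__main__":
    print(demo())
```

The second call stands for a later runlevel: it never triggers the calculation, which is what lets the expensive step be parallelized on its own.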
Modularization of the BSE subroutines.
The files
K.o K_correlation_collisions.o K_exchange_collisions.o
have been split into
K.o K_correlation_collisions.o K_exchange_collisions.o K_correlation_kernel.o K_exchange_kernel.o K_screened_interaction.o
This reduces code replication and makes the handling of CUDA and OpenMP directives easier.
Moreover, the code is ready for the finite-q BSE implementation, which will likely be made available with the next release.
More:
- Reorganization of the main yambo_driver. The main subroutine of the code has been cleaned and reorganized to allow an easier implementation of new features;
- p2y can now also read the output of the projwfc.x post processing (QE suite);
- improved configuration of external libraries;
- new mapping of the k-points introduced. It can be useful for gamma-centered grids in hexagonal cells, where the standard mapping may fail;
- subroutine G_index_energy_factor introduced;
- IO of some tables moved from integer/real to character to reduce disk use in real-time calculations;
- general improvements in coulomb_cutoff for reduced dimensionality systems;
- Modularization of the subroutines dealing with the input file. Subroutine
src/interface/INT.F
split into
INIT.o INIT_read_command_line.o INIT_check_databases.o INIT_activate.o;
- Several bug-fixes.