Align GpuComplex to its size (AMReX-Codes#3691)

## Summary

As discussed in AMReX-Codes#3677, this PR makes the alignment of `amrex::GpuComplex` stricter to allow for coalesced memory accesses of arrays of `GpuComplex` by NVIDIA GPUs such as the A100. Note that this may break a `reinterpret_cast` from an array allocated as `std::complex` to `amrex::GpuComplex`, but not the other way around, since `std::complex<T>` is typically only aligned to `alignof(T)`.

## Additional background

Typical allocators (malloc, the AMReX CArena) return memory aligned to 16 bytes, and CUDA allocators return memory aligned to 256 bytes. Both are sufficient for the 16-byte alignment of `amrex::GpuComplex<double>`.

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate
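To illustrate why the `reinterpret_cast` can break in only one direction, here is a minimal sketch. It uses a hypothetical `AlignedComplex<T>` that mirrors only the data members and the new alignment of `amrex::GpuComplex<T>` (the real class has many more members), plus a hypothetical helper `is_aligned_for_gpu_complex`. It assumes a typical 64-bit platform where `std::complex<double>` is 8-byte aligned; the `static_assert` on `Holder` makes that assumption explicit.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for amrex::GpuComplex<T> after this change:
// real and imaginary parts, aligned to the size of the whole object.
template <typename T>
struct alignas(2*sizeof(T)) AlignedComplex
{
    T m_real;
    T m_imag;
};

// Same size as std::complex<double>, so array strides match...
static_assert(sizeof(AlignedComplex<double>) == sizeof(std::complex<double>), "");
// ...but the alignment requirement is at least as strict (16 bytes here).
static_assert(alignof(AlignedComplex<double>) == 16, "");
static_assert(alignof(AlignedComplex<double>) >= alignof(std::complex<double>), "");

// Hypothetical helper: true if p satisfies the alignment of AlignedComplex<T>,
// i.e. if reinterpreting p as AlignedComplex<T>* would not be misaligned.
template <typename T>
bool is_aligned_for_gpu_complex (const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(AlignedComplex<T>) == 0;
}

// A 16-byte-aligned object whose std::complex<double> member sits 8 bytes
// past a 16-byte boundary.
struct alignas(16) Holder
{
    double pad;
    std::complex<double> z;
};
// Assumption: std::complex<double> only needs 8-byte alignment on this platform.
static_assert(offsetof(Holder, z) == sizeof(double),
              "assumes std::complex<double> is 8-byte aligned");
```

Every element of an `AlignedComplex<double>` array also satisfies the weaker alignment of `std::complex<double>`, so casting in that direction is safe. A `std::complex<double>` that sits at an odd multiple of 8 bytes, such as `h.z` in a zero-initialized `Holder h{}`, fails the check and must not be reinterpreted as the stricter type.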
ajnonaka · Jan 9, 2024 · 656bb64
1 parent e0b77e1
Showing 1 changed file with 4 additions and 1 deletion.
diff --git a/Src/Base/AMReX_GpuComplex.H b/Src/Base/AMReX_GpuComplex.H
@@ -20,9 +20,12 @@ T norm (const GpuComplex<T>& a_z) noexcept;
  *  work in device code with Cuda yet.
  *
  *  Should be bit-wise compatible with std::complex.
+ *
+ *  GpuComplex is aligned to its size (stricter than std::complex) to allow for
+ *  coalesced memory accesses with nvidia GPUs.
  */
 template <typename T>
-struct GpuComplex
+struct alignas(2*sizeof(T)) GpuComplex
 {
     using value_type = T;
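The effect of the one-line `alignas` change can be checked at compile time. The sketch below uses hypothetical reduced structs `ComplexBefore` and `ComplexAfter` rather than the real AMReX class. The `static_assert`s show that the change raises the alignment to 8 bytes for `float` and 16 bytes for `double` without changing the size, so array strides and the member layout (real part, then imaginary part, as in `std::complex`) stay the same. Our reading of the coalescing motivation, which the commit itself does not spell out, is that a 16-byte-aligned element can be moved with a single 128-bit load or store, like CUDA's built-in `double2`, which is also 16-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reduced copies of the struct before and after this commit.
template <typename T>
struct ComplexBefore { T m_real; T m_imag; };

template <typename T>
struct alignas(2*sizeof(T)) ComplexAfter { T m_real; T m_imag; };

// Size (and therefore array stride) is unchanged: 2*sizeof(T) already
// equals the object size, so alignas adds no padding.
static_assert(sizeof(ComplexAfter<float>)  == sizeof(ComplexBefore<float>), "");
static_assert(sizeof(ComplexAfter<double>) == sizeof(ComplexBefore<double>), "");

// Alignment goes from alignof(T) to 2*sizeof(T).
static_assert(alignof(ComplexBefore<double>) == alignof(double), "");
static_assert(alignof(ComplexAfter<float>)  == 8, "");
static_assert(alignof(ComplexAfter<double>) == 16, "");

// Member offsets are unchanged, so the bit-wise layout still matches std::complex.
static_assert(offsetof(ComplexAfter<double>, m_imag) == sizeof(double), "");

// True if p is a multiple of `alignment` bytes.
inline bool is_aligned (const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Because every element of a `ComplexAfter<double>` array starts on a 16-byte boundary, neighbouring GPU threads reading consecutive elements touch consecutive, naturally aligned 16-byte chunks.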