This issue documents a potential enhancement to the codebase that was briefly discussed in PR #106.
Overview
The idea is to replace nested for-loops in (C/C++) code with a macro of the form FULL_LOOP(index, index_helper, omp_pragma), where:
index is the name of the variable that is being iterated over
index_helper is a previously initialized instance of grackle_index_helper
omp_pragma is a string literal holding the OpenMP-related pragma arguments. It is applied conditionally (only when _OPENMP is defined) to the outer loop using the C99 _Pragma operator.
A major motivation for doing something like this is to facilitate the testing of different loop structures for GPU parallelism in the future.
For concreteness, I'll show how one of the loops in local_calculate_pressure would change.
Effective current implementation (with changes from PR #106)
    double tiny_number = 1.e-20;
    const grackle_index_helper ind_helper = _build_index_helper(my_fields);

    #ifdef _OPENMP
    #pragma omp parallel for schedule( runtime )
    #endif
    for (int outer_ind = 0; outer_ind < ind_helper.outer_ind_size; outer_ind++){

      const grackle_index_range range = _inner_range(outer_ind, &ind_helper);

      for (int index = range.start; index <= range.end; index++) {
        pressure[index] = ((my_chemistry->Gamma - 1.0) *
                           my_fields->density[index] *
                           my_fields->internal_energy[index]);

        if (pressure[index] < tiny_number)
          pressure[index] = tiny_number;
      } // end: loop over index
    } // end: loop over outer_ind
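As a rough illustration of how a macro with this interface could be structured and applied to the loop above, here is a minimal, self-contained sketch. It is not the macro from this issue: index_helper_sketch, index_range_sketch, inner_range_sketch, FULL_LOOP_SKETCH, and pressure_sketch are hypothetical stand-ins, the flat-index layout is an assumption made only for the example, and passing the full pragma text (including the leading omp) as the string literal is a guess about the intended convention.

```c
#include <assert.h>

/* Hypothetical stand-ins for grackle_index_helper / grackle_index_range.
 * The real Grackle definitions may differ. */
typedef struct {
  int i_start, i_end;   /* inclusive bounds along the fastest axis */
  int i_dim;            /* full extent of the fastest axis */
  int outer_ind_size;   /* number of rows visited by the outer loop */
} index_helper_sketch;

typedef struct { int start, end; } index_range_sketch;

/* Assumed layout: row `outer_ind` begins at flat offset outer_ind * i_dim. */
static index_range_sketch inner_range_sketch(int outer_ind,
                                             const index_helper_sketch* h) {
  index_range_sketch r;
  r.start = outer_ind * h->i_dim + h->i_start;
  r.end   = outer_ind * h->i_dim + h->i_end;
  return r;
}

/* Emit the pragma only when compiling with OpenMP. */
#ifdef _OPENMP
#define SKETCH_OMP_PRAGMA(args) _Pragma(args)
#else
#define SKETCH_OMP_PRAGMA(args) /* expands to nothing */
#endif

/* The inner loop declares two ints (the index and its inclusive end bound),
 * so the loop body can be attached directly after the macro invocation.
 * inner_range_sketch is evaluated twice per outer iteration. */
#define FULL_LOOP_SKETCH(index, helper, omp_pragma)                        \
  SKETCH_OMP_PRAGMA(omp_pragma)                                            \
  for (int outer_ind_ = 0; outer_ind_ < (helper).outer_ind_size;           \
       outer_ind_++)                                                       \
    for (int index = inner_range_sketch(outer_ind_, &(helper)).start,      \
             index##_end_ = inner_range_sketch(outer_ind_, &(helper)).end; \
         index <= index##_end_; index++)

/* The pressure loop from above, rewritten with the sketch macro. */
void pressure_sketch(double gamma, const double* density,
                     const double* internal_energy, double* pressure,
                     index_helper_sketch helper) {
  const double tiny_number = 1.e-20;
  assert(helper.outer_ind_size >= 0);

  FULL_LOOP_SKETCH(index, helper, "omp parallel for schedule(runtime)") {
    pressure[index] = (gamma - 1.0) * density[index] * internal_energy[index];
    if (pressure[index] < tiny_number)
      pressure[index] = tiny_number;
  }
}
```

Because both loop variables are declared inside the for statements, they are private to each thread when the pragma is active, and the caller only has to supply the loop body.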
I'm not a huge fan of declaring/incrementing two variables inside the inner loop, but doing it this way makes the application of the loop a lot cleaner. I have confirmed that the above macro works (both with and without OpenMP) by converting both loops in local_calculate_pressure.
At this point, it would not take much effort to finish implementing this change, so let me know if you want me to do that (in PR #106 or a separate PR). However, it may make more sense to wait until we really need this for GPU experimentation.