Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving thread crash logic #406

Open
wants to merge 10 commits into
base: dev
Choose a base branch
from
18 changes: 11 additions & 7 deletions src/grid/grid3D.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,9 @@
#endif
#include "../global/global.h"
#include "../grid/grid3D.h"
#include "../grid/grid_enum.h" // provides grid_enum
#include "../hydro/hydro_cuda.h" // provides Calc_dt_GPU
#include "../grid/grid_enum.h" // provides grid_enum
#include "../hydro/average_cells.h" // provides Average_Slow_Cells and SlowCellConditionChecker
#include "../hydro/hydro_cuda.h" // provides Calc_dt_GPU
#include "../integrators/VL_1D_cuda.h"
#include "../integrators/VL_2D_cuda.h"
#include "../integrators/VL_3D_cuda.h"
Expand Down Expand Up @@ -152,7 +153,9 @@ void Grid3D::Initialize(struct Parameters *P)

#ifdef AVERAGE_SLOW_CELLS
H.min_dt_slow = 1e-100; // Initialize the minumum dt to a tiny number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely don't think this needs to happen with this PR, but Evan and I have talked about how it would be good to set min_dt_slow from the input parameter file instead of at compile time. Additionally, as I mentioned below, I think it would be good to require that it's a positive number. This is definitely beyond the scope of this PR, so feel free to ignore-just wanted to make a note of it while we're overhauling cell averaging!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! Especially since most of the logic that actually sets min_dt_slow is currently buried in the gravity functions.

#endif // AVERAGE_SLOW_CELLS
#else
H.min_dt_slow = -1.0;
#endif // AVERAGE_SLOW_CELLS

#ifndef MPI_CHOLLA

Expand Down Expand Up @@ -446,12 +449,12 @@ void Grid3D::Execute_Hydro_Integrator(void)
#ifdef VL
VL_Algorithm_3D_CUDA(C.device, C.d_Grav_potential, H.nx, H.ny, H.nz, x_off, y_off, z_off, H.n_ghost, H.dx, H.dy,
H.dz, H.xbound, H.ybound, H.zbound, H.dt, H.n_fields, H.custom_grav, H.density_floor,
C.Grav_potential);
C.Grav_potential, SlowCellConditionChecker(1.0 / H.min_dt_slow, H.dx, H.dy, H.dz));
#endif // VL
#ifdef SIMPLE
Simple_Algorithm_3D_CUDA(C.device, C.d_Grav_potential, H.nx, H.ny, H.nz, x_off, y_off, z_off, H.n_ghost, H.dx, H.dy,
H.dz, H.xbound, H.ybound, H.zbound, H.dt, H.n_fields, H.custom_grav, H.density_floor,
C.Grav_potential);
C.Grav_potential, SlowCellConditionChecker(1.0 / H.min_dt_slow, H.dx, H.dy, H.dz));
#endif // SIMPLE
} else {
chprintf("Error: Grid dimensions nx: %d ny: %d nz: %d not supported.\n", H.nx, H.ny, H.nz);
Expand Down Expand Up @@ -578,8 +581,9 @@ Real Grid3D::Update_Hydro_Grid()
ny_off = ny_local_start;
nz_off = nz_local_start;
#endif
Average_Slow_Cells(C.device, H.nx, H.ny, H.nz, H.n_ghost, H.n_fields, H.dx, H.dy, H.dz, gama, max_dti_slow, H.xbound,
H.ybound, H.zbound, nx_off, ny_off, nz_off);
Average_Slow_Cells(C.device, H.nx, H.ny, H.nz, H.n_ghost, H.n_fields, gama,
SlowCellConditionChecker(max_dti_slow, H.dx, H.dy, H.dz), H.xbound, H.ybound, H.zbound, nx_off,
ny_off, nz_off);
#endif // AVERAGE_SLOW_CELLS

// ==Calculate the next time step using Calc_dt_GPU from hydro/hydro_cuda.h==
Expand Down
5 changes: 3 additions & 2 deletions src/grid/grid3D.h
Original file line number Diff line number Diff line change
Expand Up @@ -210,9 +210,10 @@ struct Header {
* \brief Length of the current timestep */
Real dt;

#ifdef AVERAGE_SLOW_CELLS
/*! \brief Cells that introduce timesteps shorter than will be averaged with
* neighboring cells. Should be a negative value when the
* AVERAGE_SLOW_CELLS macro isn't defined. */
Real min_dt_slow;
#endif

/*! \var t_wall
* \brief Wall time */
Expand Down
54 changes: 54 additions & 0 deletions src/hydro/average_cells.h
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style-related comment since this is a solitary bit of new code-it would be good for the naming of the physics variables in here to follow the style guide.

Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
/*! \file average_cells.h
* \brief Definitions of functions and classes that implement logic related to averaging cells with
* neighbors. */

#ifndef AVERAGE_CELLS_H
#define AVERAGE_CELLS_H

#include <math.h>

#include "../global/global.h"

/*! \brief Object that checks whether a given cell meets the conditions for slow-cell averaging.
* The main motivation for creating this class is reducing ifdef statements (and allow to modify the
* actual slow-cell-condition. */
struct SlowCellConditionChecker {
// omit data-members if they aren't used for anything
#ifdef AVERAGE_SLOW_CELLS
Real max_dti_slow, dx, dy, dz;
#endif

/*! \brief Construct a new object. */
__host__ __device__ SlowCellConditionChecker(Real max_dti_slow, Real dx, Real dy, Real dz)
#ifdef AVERAGE_SLOW_CELLS
: max_dti_slow{max_dti_slow}, dx{dx}, dy{dy}, dz{dz}
#endif
{
}

/*! \brief Returns whether the cell meets the condition for being considered a slow cell that must
* be averaged. */
template <bool verbose = false>
__device__ bool is_slow(Real E, Real d, Real d_inv, Real vx, Real vy, Real vz, Real gamma) const
{
return this->max_dti_if_slow(E, d, d_inv, vx, vy, vz, gamma) >= 0.0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another minor question/comment, but is it possible for max_dti_if_slow to ever be zero here? If not, just for the sake of clarity, it might be good to change the condition to max_dti_if_slow > 0 instead of max_dti_if_slow >= 0.

I know that hydroInverseCrossingTime can return zero, but I still don't think it's possible for (max_dti > max_dti_slow) ? max_dti : -1.0; to evaluate to zero unless someone sets max_dti_slow to a negative number. Again, very minor, but I found it a little confusing.

In line with my comment about setting max_dti_if_slow, requiring that it's positive would make it so that (max_dti > max_dti_slow) ? max_dti : -1.0; never evaluated to zero and clarify this ambiguity.

}

/*! \brief Returns the max inverse timestep of the specified cell, if it meets the criteria for being
* a slow cell. If it doesn't, return a negative value instead.
*/
__device__ Real max_dti_if_slow(Real E, Real d, Real d_inv, Real vx, Real vy, Real vz, Real gamma) const;
};

#ifdef AVERAGE_SLOW_CELLS

void Average_Slow_Cells(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real gamma,
SlowCellConditionChecker slow_check, Real xbound, Real ybound, Real zbound, int nx_offset,
int ny_offset, int nz_offset);

__global__ void Average_Slow_Cells_3D(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields,
Real gamma, SlowCellConditionChecker slow_check, Real xbound, Real ybound,
Real zbound, int nx_offset, int ny_offset, int nz_offset);
#endif

#endif /* AVERAGE_CELLS_H */
78 changes: 57 additions & 21 deletions src/hydro/hydro_cuda.cu
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "../global/global.h"
#include "../global/global_cuda.h"
#include "../gravity/static_grav.h"
#include "../hydro/average_cells.h"
#include "../hydro/hydro_cuda.h"
#include "../utils/DeviceVector.h"
#include "../utils/cuda_utilities.h"
Expand Down Expand Up @@ -377,14 +378,30 @@ __global__ void Update_Conserved_Variables_3D(Real *dev_conserved, Real *Q_Lx, R
0.5 * dt * (gx * (d * vx + d_n * vx_n) + gy * (d * vy + d_n * vy_n) + gz * (d * vz + d_n * vz_n));

#endif // GRAVITY
}
}

__global__ void PostUpdate_Conserved_Correct_Crashed_3D(Real *dev_conserved, int nx, int ny, int nz, int x_off,
int y_off, int z_off, int n_ghost, Real gamma, int n_fields,
SlowCellConditionChecker slow_check)
{
int n_cells = nx * ny * nz;

// get a global thread ID
int id = threadIdx.x + blockIdx.x * blockDim.x;
int zid = id / (nx * ny);
int yid = (id - zid * nx * ny) / nx;
int xid = id - zid * nx * ny - yid * nx;

if (xid > n_ghost - 1 && xid < nx - n_ghost && yid > n_ghost - 1 && yid < ny - n_ghost && zid > n_ghost - 1 &&
zid < nz - n_ghost) {
#if !(defined(DENSITY_FLOOR) && defined(TEMPERATURE_FLOOR))
// threads corresponding to real cells do the calculation
if (dev_conserved[id] < 0.0 || dev_conserved[id] != dev_conserved[id] || dev_conserved[4 * n_cells + id] < 0.0 ||
dev_conserved[4 * n_cells + id] != dev_conserved[4 * n_cells + id]) {
printf("%3d %3d %3d Thread crashed in final update. %e %e %e %e %e\n", xid + x_off, yid + y_off, zid + z_off,
dev_conserved[id], dtodx * (dev_F_x[imo] - dev_F_x[id]), dtody * (dev_F_y[jmo] - dev_F_y[id]),
dtodz * (dev_F_z[kmo] - dev_F_z[id]), dev_conserved[4 * n_cells + id]);
Average_Cell_All_Fields(xid, yid, zid, nx, ny, nz, n_cells, n_fields, gamma, dev_conserved);
printf("%3d %3d %3d Thread crashed in final update. %e - - - %e\n", xid + x_off, yid + y_off, zid + z_off,
dev_conserved[id], dev_conserved[4 * n_cells + id]);
Average_Cell_All_Fields(xid, yid, zid, nx, ny, nz, n_cells, n_fields, gamma, dev_conserved, n_ghost, slow_check);
}
#endif // DENSITY_FLOOR
/*
Expand All @@ -400,7 +417,6 @@ __global__ void Update_Conserved_Variables_3D(Real *dev_conserved, Real *Q_Lx, R
*/
}
}

__device__ __host__ Real hydroInverseCrossingTime(Real const &E, Real const &d, Real const &d_inv, Real const &vx,
Real const &vy, Real const &vz, Real const &dx, Real const &dy,
Real const &dz, Real const &gamma)
Expand Down Expand Up @@ -667,10 +683,21 @@ void Temperature_Ceiling(Real *dev_conserved, int nx, int ny, int nz, int n_ghos
}
}

__device__ Real SlowCellConditionChecker::max_dti_if_slow(Real E, Real d, Real d_inv, Real vx, Real vy, Real vz,
Real gamma) const
{
#ifndef AVERAGE_SLOW_CELLS
return -1.0;
#else
Real max_dti = hydroInverseCrossingTime(E, d, d_inv, vx, vy, vz, dx, dy, dz, gamma);
return (max_dti > max_dti_slow) ? max_dti : -1.0;
#endif
}

#ifdef AVERAGE_SLOW_CELLS

void Average_Slow_Cells(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real dx, Real dy,
Real dz, Real gamma, Real max_dti_slow, Real xbound, Real ybound, Real zbound, int nx_offset,
void Average_Slow_Cells(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real gamma,
SlowCellConditionChecker slow_check, Real xbound, Real ybound, Real zbound, int nx_offset,
int ny_offset, int nz_offset)
{
// set values for GPU kernels
Expand All @@ -683,12 +710,12 @@ void Average_Slow_Cells(Real *dev_conserved, int nx, int ny, int nz, int n_ghost

if (nx > 1 && ny > 1 && nz > 1) { // 3D
hipLaunchKernelGGL(Average_Slow_Cells_3D, dim1dGrid, dim1dBlock, 0, 0, dev_conserved, nx, ny, nz, n_ghost, n_fields,
dx, dy, dz, gamma, max_dti_slow, xbound, ybound, zbound, nx_offset, ny_offset, nz_offset);
gamma, slow_check, xbound, ybound, zbound, nx_offset, ny_offset, nz_offset);
}
}

__global__ void Average_Slow_Cells_3D(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real dx,
Real dy, Real dz, Real gamma, Real max_dti_slow, Real xbound, Real ybound,
__global__ void Average_Slow_Cells_3D(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields,
Real gamma, SlowCellConditionChecker slow_check, Real xbound, Real ybound,
Real zbound, int nx_offset, int ny_offset, int nz_offset)
{
int id, xid, yid, zid, n_cells;
Expand All @@ -711,25 +738,26 @@ __global__ void Average_Slow_Cells_3D(Real *dev_conserved, int nx, int ny, int n
vz = dev_conserved[3 * n_cells + id] * d_inv;
E = dev_conserved[4 * n_cells + id];

// Compute the maximum inverse crossing time in the cell
max_dti = hydroInverseCrossingTime(E, d, d_inv, vx, vy, vz, dx, dy, dz, gamma);
// retrieve the max inverse crossing time in the cell if the cell meets the threshold for being a slow-cell.
// (if the cell doesn't meet the threshold, a negative value is returned instead)
max_dti = slow_check.max_dti_if_slow(E, d, d_inv, vx, vy, vz, gamma);

if (max_dti > max_dti_slow) {
if (max_dti >= 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think max_dti > 0 would be clearer here.

speed = sqrt(vx * vx + vy * vy + vz * vz);
temp = (gamma - 1) * (E - 0.5 * (speed * speed) * d) * ENERGY_UNIT / (d * DENSITY_UNIT / 0.6 / MP) / KB;
P = (E - 0.5 * d * (vx * vx + vy * vy + vz * vz)) * (gamma - 1.0);
cs = sqrt(d_inv * gamma * P) * VELOCITY_UNIT * 1e-5;
Real x = xbound + (nx_offset + xid - n_ghost + 0.5) * dx;
Real y = ybound + (ny_offset + yid - n_ghost + 0.5) * dy;
Real z = zbound + (nz_offset + zid - n_ghost + 0.5) * dz;
Real x = xbound + (nx_offset + xid - n_ghost + 0.5) * slow_check.dx;
Real y = ybound + (ny_offset + yid - n_ghost + 0.5) * slow_check.dy;
Real z = zbound + (nz_offset + zid - n_ghost + 0.5) * slow_check.dz;
// Average this cell
kernel_printf(
" Average Slow Cell [ %.5e %.5e %.5e ] -> dt_cell=%f dt_min=%f, n=%.3e, "
"T=%.3e, v=%.3e (%.3e, %.3e, %.3e), cs=%.3e\n",
x, y, z, 1. / max_dti, 1. / max_dti_slow, dev_conserved[id] * DENSITY_UNIT / 0.6 / MP, temp,
x, y, z, 1. / max_dti, 1. / slow_check.max_dti_slow, dev_conserved[id] * DENSITY_UNIT / 0.6 / MP, temp,
speed * VELOCITY_UNIT * 1e-5, vx * VELOCITY_UNIT * 1e-5, vy * VELOCITY_UNIT * 1e-5, vz * VELOCITY_UNIT * 1e-5,
cs);
Average_Cell_All_Fields(xid, yid, zid, nx, ny, nz, n_cells, n_fields, gamma, dev_conserved);
Average_Cell_All_Fields(xid, yid, zid, nx, ny, nz, n_cells, n_fields, gamma, dev_conserved, n_ghost, slow_check);
}
}
}
Expand Down Expand Up @@ -1253,7 +1281,8 @@ __device__ Real Average_Cell_Single_Field(int field_indx, int i, int j, int k, i
}

__device__ void Average_Cell_All_Fields(int i, int j, int k, int nx, int ny, int nz, int ncells, int n_fields,
Real gamma, Real *conserved)
Real gamma, Real *conserved, int stale_depth,
SlowCellConditionChecker slow_check)
{
int id = i + (j)*nx + (k)*nx * ny;

Expand Down Expand Up @@ -1281,18 +1310,25 @@ __device__ void Average_Cell_All_Fields(int i, int j, int k, int nx, int ny, int
for (int kk = k - 1; kk <= k + 1; kk++) {
for (int jj = j - 1; jj <= j + 1; jj++) {
for (int ii = i - 1; ii <= i + 1; ii++) {
if (ii <= stale_depth - 1 || ii >= nx - stale_depth || jj <= stale_depth - 1 || jj >= ny - stale_depth ||
kk <= stale_depth - 1 || kk >= nz - stale_depth) {
continue;
}

idn = ii + jj * nx + kk * nx * ny;
d = conserved[grid_enum::density * ncells + idn];
mx = conserved[grid_enum::momentum_x * ncells + idn];
my = conserved[grid_enum::momentum_y * ncells + idn];
mz = conserved[grid_enum::momentum_z * ncells + idn];
P = (conserved[grid_enum::Energy * ncells + idn] - (0.5 / d) * (mx * mx + my * my + mz * mz)) * (gamma - 1.0);
E = conserved[grid_enum::Energy * ncells + idn];
P = (E - (0.5 / d) * (mx * mx + my * my + mz * mz)) * (gamma - 1.0);
#ifdef SCALAR
for (int n = 0; n < NSCALARS; n++) { // NOLINT
scalar[n] = conserved[grid_enum::scalar * ncells + idn];
}
#endif
if (d > 0.0 && P > 0.0) {
Real d_inv = 1.0 / d;
if (d > 0.0 && P > 0.0 && not slow_check.is_slow(E, d, d_inv, mx * d_inv, my * d_inv, mz * d_inv, gamma)) {
d_av += d;
vx_av += mx;
vy_av += my;
Expand Down
48 changes: 36 additions & 12 deletions src/hydro/hydro_cuda.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#define HYDRO_CUDA_H

#include "../global/global.h"
#include "../hydro/average_cells.h"
#include "../utils/mhd_utilities.h"

__global__ void Update_Conserved_Variables_1D(Real *dev_conserved, Real *dev_F, int n_cells, int x_off, int n_ghost,
Expand All @@ -21,6 +22,10 @@ __global__ void Update_Conserved_Variables_3D(Real *dev_conserved, Real *Q_Lx, R
Real gamma, int n_fields, int custom_grav, Real density_floor,
Real *dev_potential);

__global__ void PostUpdate_Conserved_Correct_Crashed_3D(Real *dev_conserved, int nx, int ny, int nz, int x_off,
int y_off, int z_off, int n_ghost, Real gamma, int n_fields,
SlowCellConditionChecker slow_check);

/*!
* \brief Determine the maximum inverse crossing time in a specific cell
*
Expand Down Expand Up @@ -80,17 +85,6 @@ void Temperature_Ceiling(Real *dev_conserved, int nx, int ny, int nz, int n_ghos
Real T_ceiling);
#endif // TEMPERATURE CEILING

#ifdef AVERAGE_SLOW_CELLS

void Average_Slow_Cells(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real dx, Real dy,
Real dz, Real gamma, Real max_dti_slow, Real xbound, Real ybound, Real zbound, int nx_offset,
int ny_offset, int nz_offset);

__global__ void Average_Slow_Cells_3D(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real dx,
Real dy, Real dz, Real gamma, Real max_dti_slow, Real xbound, Real ybound,
Real zbound, int nx_offset, int ny_offset, int nz_offset);
#endif

void Apply_Temperature_Floor(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields, Real U_floor);

__global__ void Temperature_Floor_Kernel(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields,
Expand Down Expand Up @@ -119,8 +113,38 @@ __global__ void Select_Internal_Energy_2D(Real *dev_conserved, int nx, int ny, i

__global__ void Select_Internal_Energy_3D(Real *dev_conserved, int nx, int ny, int nz, int n_ghost, int n_fields);

/*! \brief Overwrites the values in the specified cell with the average of all the values from the (up to) 26
* neighboring cells.
*
* Care is taken when applying this logic to a cell near the edge of a block (where the entire simulation domain
* is decomposed into 1 or more blocks).
* * Recall that the entire reason we have ghost zones is that the stencil for computing flux-divergence can't
* be applied uniformly to all cells -- the cells in the ghost zone can't be properly updated with the rest
* the local block when applying the flux-divergence. We might refer to these cells that aren't properly
* updated as being "stale". We refer to the width of the outer ring of stale values as the ``stale-depth``
* * For concreteness, consider a pure hydro/mhd simulation using the VL integrator:
* - Right after refreshing the ghost-zones, the stale_depth is 0
* - After the partial time-step, the stale_depth is 1.
* - After the full timestep, the stale depth depends on the choice of reconstruction. (e.g. it is 2 for
* for nearest neighbor and 3 for plmp).
* - The ghost-depth should always be equal to the max stale-depth at the end of a simulation cycle (if
* ghost-depth is bigger, extra work is done. If it's smaller, then your simulation is wrong)
* * To respect the simulations boundaries, values in "stale" cells are excluded from the averages. If
* stale-depth is 0, then values from beyond the edge of the simulation are excluded from averages
*
* \note
* From a perfectionist's perspective, one could argue that we really should increment the stale-depth whenever
* we call this function (in other words, we should have an extra layer of ghost zones for each time we call
* this function).
* * rationale: if we don't, then the the number of neighbors considered results of the simulation can vary
* based on how close a cell is to a block-edge (the number of cells varies from 7 to 26).
* * more pragmatically: this probably doesn't matter a whole lot given that this piece of machinery is a
* band-aid solution to begin with.
* * Aside: a similar argument could be made for the energy-synchronization step of the dual-energy formalism.
*/
__device__ void Average_Cell_All_Fields(int i, int j, int k, int nx, int ny, int nz, int ncells, int n_fields,
Real gamma, Real *conserved);
Real gamma, Real *conserved, int stale_depth,
SlowCellConditionChecker slow_check);

__device__ Real Average_Cell_Single_Field(int field_indx, int i, int j, int k, int nx, int ny, int nz, int ncells,
Real *conserved);
Expand Down
Loading
Loading