-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving thread crash logic #406
base: dev
Are you sure you want to change the base?
Conversation
In slightly more detail, we invoke `Average_Cell_All_Fields` to get the average values from the neighbors to overwrite the value in a crashed cell or (when the `AVERAGE_SLOW_CELLS` macro is defined). - We have already been excluding any neighboring cell from this average that had crashed. - The key inovation here is that we now exclude a neighboring cell that we would consider a slow cell (when the `AVERAGE_SLOW_CELLS` macro is defined). To implement this, we pass around the SlowCellConditionCheck object, which implements methods to check whether a cell is considered slow or not. When the `AVERAGE_SLOW_CELLS` macro is defined, that method does the full check. When the macro isn't defined, the logic short-circuits and always returns that the cell isn't slow.
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great! I have a few minor comments, but other than those I think it's ready to go.
// Step 3b: Address any crashed threads | ||
hipLaunchKernelGGL(PostUpdate_Conserved_Correct_Crashed_3D, dim1dGrid, dim1dBlock, 0, 0, dev_conserved, nx, ny, nz, | ||
x_off, y_off, z_off, n_ghost, gama, n_fields, slow_check); | ||
GPU_Error_Check(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very minor, but there's a GPU_Error_Check
following this kernel launch in the Simple integrator, but not in Van Leer, so it might be good to either remove this one or add one to VL so that the behavior is consistent between the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I explicitly thought I fixed that (but I must have seen it and not acted on it)
template <bool verbose = false> | ||
__device__ bool is_slow(Real E, Real d, Real d_inv, Real vx, Real vy, Real vz, Real gamma) const | ||
{ | ||
return this->max_dti_if_slow(E, d, d_inv, vx, vy, vz, gamma) >= 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another minor question/comment, but is it possible for max_dti_if_slow
to ever be zero here? If not, just for the sake of clarity, it might be good to change the condition to max_dti_if_slow > 0
instead of max_dti_if_slow >= 0
.
I know that hydroInverseCrossingTime
can return zero, but I still don't think it's possible for (max_dti > max_dti_slow) ? max_dti : -1.0;
to evaluate to zero unless someone sets max_dti_slow
to a negative number. Again, very minor, but I found it a little confusing.
In line with my comment about setting max_dti_if_slow
, requiring that it's positive would make it so that (max_dti > max_dti_slow) ? max_dti : -1.0;
never evaluated to zero and clarify this ambiguity.
@@ -152,7 +153,9 @@ void Grid3D::Initialize(struct Parameters *P) | |||
|
|||
#ifdef AVERAGE_SLOW_CELLS | |||
H.min_dt_slow = 1e-100; // Initialize the minumum dt to a tiny number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely don't think this needs to happen with this PR, but Evan and I have talked about how it would be good to set min_dt_slow
from the input parameter file instead of at compile time. Additionally, as I mentioned below, I think it would be good to require that it's a positive number. This is definitely beyond the scope of this PR, so feel free to ignore-just wanted to make a note of it while we're overhauling cell averaging!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely! Especially since most of the logic that actually sets min_dt_slow
is currently buried in the gravity functions.
|
||
if (max_dti > max_dti_slow) { | ||
if (max_dti >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think max_dti > 0
would be clearer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style-related comment since this is a solitary bit of new code-it would be good for the naming of the physics variables in here to follow the style guide.
This introduces 3 changes to the behavior revolving around fixing thread-crashes (i.e. density or energy drops to a negative value). For the uninitiated, the default behavior in a thread-crash, when
DENSITY_FLOOR
isn't defined, is to callAverage_Cell_All_Fields
, which overrides the values in the crashed cell with the values from the neighboring cells.The 3 changes include:
Average_Cell_All_Fields
from theUpdate_Conserved_Variables_3D
kernel to the newly createdPostUpdate_Conserved_Correct_Crashed_3D
kernel. We always launchPostUpdate_Conserved_Correct_Crashed_3D
afterUpdate_Conserved_Variables_3D
Average_Cell_All_Fields
always looked for neighbors, regardless of where the central cell was located on a grid. So if you were at the edge of the active zone, you might include the values of cells in the ghost-zone. This was problematic when dealing with a crashed-cell at the hydro update when the ghost zones had not been updated yet (it's unclear if this commonly happened).Average_Slow_Cells
, which could applyAverage_Cell_All_Fields
to a slow-cell at identified at any location on the grid (in the ghost zone or active zone).1 My change also happensAverage_Cell_All_Fields
would ignore any neighboring cells with a non-positive density or pressure.AVERAGE_SLOW_CELLS
macro isn't defined, no cell is considered a "slow" cell)The implementation of the third change got a little hairy. In principle I think it's a good design choice (especially for reducing ifdef statements). But, with the benefit of hindsight, maybe I should have done something simpler. Definitely open to feedback on this point!
Footnotes
In this case we continue for a cell on the edge of the active zone, we will include cells in the ghost zone in the average. As I write this out, I'm not totally sure the ghost zones are refreshed between the invocation of the Hydro-Solver and the call to
Average_Slow_Cells
. That is something to followup on! ↩