Describe the bug
#11646 was filed for an issue where GpuRand did not work correctly in all cases. #11647 fixed that issue, but it exposed a problem (see the #11647 review): we are not doing a checkpoint/restore retry in every location where GpuRand can run. That means that if a GpuRand runs outside of a regular project exec, it might produce incorrect numbers on a retry.
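To make the failure mode concrete, here is a minimal Python model. It is a sketch under stated assumptions, not spark-rapids code. `RandExpr`, `checkpoint`, `restore`, `SimulatedOOM` and `run_with_retry` are hypothetical stand-ins for GpuRand and the plugin's retry framework. The key point is that a failed attempt has already advanced the random generator. A retry that does not restore the checkpointed state therefore continues from the advanced state and emits different values than an uninterrupted run would.

```python
import random


class SimulatedOOM(Exception):
    """Hypothetical stand-in for a GPU out-of-memory error that triggers a retry."""


class RandExpr:
    """Hypothetical model of a seeded non-deterministic expression such as GpuRand."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._saved_state = None

    def checkpoint(self):
        # Save the generator state before the batch is processed.
        self._saved_state = self._rng.getstate()

    def restore(self):
        # Rewind the generator to the saved state so a retry replays the same values.
        self._rng.setstate(self._saved_state)

    def eval_batch(self, num_rows):
        # Each call advances the generator, just like evaluating a real batch.
        return [self._rng.random() for _ in range(num_rows)]


def run_with_retry(expr, num_rows, fail_first_attempt, use_checkpoint):
    """Evaluate one batch, retrying after a simulated OOM on the first attempt."""
    if use_checkpoint:
        expr.checkpoint()
    attempt = 0
    while True:
        attempt += 1
        try:
            values = expr.eval_batch(num_rows)
            if fail_first_attempt and attempt == 1:
                # The failure happens after the generator has already advanced.
                raise SimulatedOOM()
            return values
        except SimulatedOOM:
            if use_checkpoint:
                expr.restore()


expected = RandExpr(seed=42).eval_batch(3)
no_restore = run_with_retry(RandExpr(seed=42), 3, fail_first_attempt=True, use_checkpoint=False)
restored = run_with_retry(RandExpr(seed=42), 3, fail_first_attempt=True, use_checkpoint=True)
print(no_restore == expected)  # False: the retry continued from the advanced state
print(restored == expected)    # True: the retry replayed the checkpointed state
```

Only the checkpointed run reproduces the values of a run that never failed, which is the property a retry must preserve for a non-deterministic expression.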
So we need to do a few things for a complete solution:

1. We need to go through the Spark code and find every place where a non-deterministic expression could be run. We can do this by looking at all of the places where `initialize` is called on non-deterministic expressions.
2. We need code changes so that if a retry happens on a non-deterministic expression that is outside of a checkpoint/restore block, we fail instead of retrying.
3. We also want a way to detect a non-deterministic expression being run outside of a checkpoint/restore retry block and to throw an error from the plan, so that tests can validate that we have this covered.
4. We need many more tests to verify that GpuRand behaves correctly.
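The fail-instead-of-retry change and the plan-level detection could be modeled roughly as follows. This is again a hypothetical Python sketch, not the plugin's API. `Expr`, `checkpoints_children`, `find_unprotected`, `check_plan`, `guarded_retry` and `RetryOutsideCheckpointError` are illustrative names. The plan check walks an expression tree and reports any non-deterministic node that no checkpoint/restore scope covers. The retry wrapper raises instead of retrying when a non-deterministic expression has no state to restore.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical expression/plan node used only for this sketch."""
    name: str
    nondeterministic: bool = False
    # True if this node evaluates its children inside a checkpoint/restore retry block.
    checkpoints_children: bool = False
    children: list = field(default_factory=list)


def find_unprotected(node, protected=False):
    """Return names of non-deterministic nodes that no checkpoint/restore scope covers."""
    found = []
    if node.nondeterministic and not protected:
        found.append(node.name)
    child_protected = protected or node.checkpoints_children
    for child in node.children:
        found.extend(find_unprotected(child, child_protected))
    return found


def check_plan(plan):
    """Raise if any non-deterministic expression would run outside checkpoint/restore."""
    unprotected = find_unprotected(plan)
    if unprotected:
        raise RuntimeError(f"unprotected non-deterministic expressions: {unprotected}")


class SimulatedOOM(Exception):
    """Hypothetical stand-in for a retryable GPU out-of-memory error."""


class RetryOutsideCheckpointError(RuntimeError):
    """Raised instead of retrying a non-deterministic expression with nothing to restore."""


def guarded_retry(attempt_fn, has_nondeterministic, restore_fn=None, max_attempts=3):
    """Retry attempt_fn on SimulatedOOM, but fail fast when a retry would be unsafe."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn()
        except SimulatedOOM:
            if has_nondeterministic and restore_fn is None:
                # Re-running without a restore would produce different random values.
                raise RetryOutsideCheckpointError(
                    "non-deterministic expression retried outside checkpoint/restore")
            if attempt == max_attempts:
                raise
            if restore_fn is not None:
                restore_fn()


project = Expr("Project", checkpoints_children=True,
               children=[Expr("rand", nondeterministic=True)])
other = Expr("OtherExec", children=[Expr("rand", nondeterministic=True)])
print(find_unprotected(project))  # []
print(find_unprotected(other))    # ['rand']
```

Passing the `protected` flag down the tree means a single checkpointing ancestor covers every expression beneath it. A test suite could call `check_plan` on each generated plan so that any uncovered location fails loudly instead of silently producing different numbers on a retry.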