-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide mechanism to discard data #695
base: master
Are you sure you want to change the base?
Conversation
08ecd2e
to
bbd1448
Compare
Add a function `parsec_data_discard` that releases the data such that the host copy remains intact but does not prevent destruction of the data once all device copies have been released. This keeps the host copy available for device copies to inspect and avoids potential race conditions in the release process. During an eviction, copies of data with a discarded host copy are not transfered but put directly into the lru. Signed-off-by: Joseph Schuchart <[email protected]>
bbd1448
to
1dd2d54
Compare
Otherwise we cannot destroy empty or discarded data. Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
parsec/data.c
Outdated
/* second, mark the host copy as discarded */ | ||
parsec_data_copy_t *cpu_copy = data->device_copies[0]; | ||
if (NULL != cpu_copy) { | ||
cpu_copy->flags = PARSEC_DATA_FLAG_DISCARDED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three issues here:
- the flag should be set with
|=
- you should only release the
data_t
is thecpu_copy
is valid. You do a test against NULL, you should use it. - this function is not thread-safe. This could not be an issue here, but I see it used by the GPU code without protection either, opening the code to race-conditions while manipulating these discarded copies/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I fixed setting the flag, added a lock to protect the check for the change to the device copy, and only now the data_t is only released twice if there is a device copy.
@@ -2342,7 +2333,17 @@ parsec_device_kernel_epilog( parsec_device_gpu_module_t *gpu_device, | |||
|
|||
assert( 0 <= gpu_copy->readers ); | |||
|
|||
if( gpu_task->pushout & (1 << i) ) { | |||
if (cpu_copy->flags & PARSEC_DATA_FLAG_DISCARDED) { | |||
PARSEC_DEBUG_VERBOSE(20, parsec_gpu_output_stream, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where the race-condition could happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not a big issue here since we're not modifying the flags and neither the data_t nor the host copy will be freed, since the device copy maintains its reference. The worst thing that can happen is that we miss the flag being set and the release is delayed until we try to evict.
parsec/mca/device/transfer_gpu.c
Outdated
parsec_atomic_lock( &gpu_copy->original->lock ); | ||
/* get the next item before altering the next pointer */ | ||
item = (parsec_list_item_t*)item->list_next; /* conversion needed for volatile */ | ||
if( 0 == gpu_copy->readers ) { | ||
if (cpu_copy->flags & PARSEC_DATA_FLAG_DISCARDED) { | ||
parsec_list_item_ring_chop((parsec_list_item_t*)gpu_copy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. The flag isn't changed and the device copy maintains a reference on the data_t so if we miss the update of the flags here we will evict into the host copy and then release everything.
Also OR the flag instead of assigning it. Signed-off-by: Joseph Schuchart <[email protected]>
ba7ea19
to
fc1c247
Compare
Discarded data may never be pushed back so don't warn about it still being owned by the device. Signed-off-by: Joseph Schuchart <[email protected]>
dde7129
to
14934a5
Compare
Discarded data sit toward the end of the lru while the data to be evicted is at the front. We walk both forward and backward to collect the discarded data from the back, until we either meet the pivot or we found enough data to evict. If we discarded data we don't evict. Signed-off-by: Joseph Schuchart <[email protected]>
Add a function
parsec_data_discard
that releases the data such that the host copy remains intact but does not prevent destruction of the data once all device copies have been released. This keeps the host copy available for device copies to inspect and avoids potential race conditions in the release process. During an eviction, copies of data with a discarded host copy are not transfered but put directly into the lru.Replaces some duplicated code with a call to
parsec_device_release_gpu_copy
.