Can GPU memory be restored from the same address? #53
When restoring resources such as memory addresses, the resources in the checkpoint file are mapped onto the newly created ones; we essentially replace the old memory addresses with the new ones. In my experiments, calling cudaMalloc with the same parameters in the same order as during the original run also leads to CUDA returning the same memory addresses. However, Cricket does not rely on this, because we generally cannot know whether anything else is running on the same GPU.
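For reference, this determinism is easy to observe with a standalone probe like the one below (a minimal sketch, not part of Cricket; the file name is made up). Run it twice on an otherwise idle GPU and compare the printed addresses; in practice they often match across runs, even though the CUDA runtime gives no such guarantee.

```c
/* alloc_determinism.cu - print device addresses returned by cudaMalloc.
 * Compile: nvcc alloc_determinism.cu -o alloc_determinism
 * Run twice on an idle GPU; the printed addresses often match across
 * runs, although CUDA does not guarantee this behavior. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    void *a, *b;
    cudaMalloc(&a, 1 << 20);   /* 1 MiB  */
    cudaMalloc(&b, 1 << 24);   /* 16 MiB */
    printf("a = %p\nb = %p\n", a, b);
    cudaFree(b);
    cudaFree(a);
    return 0;
}
```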
Thanks for the reply.
If so, I wonder how Cricket intercepts a memory access request and maps it to the newly created address, because there is no CUDA API that explicitly exposes the accessed memory address (e.g., cuLaunchKernel).
So there are different kinds of memory addresses for which this is relevant.
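Conceptually, the restore-time remapping described above boils down to a table of old-to-new pointer pairs. The sketch below is a simplified illustration only, not the actual data structures in cpu/cr.c; mem_map, restore_region, and translate are hypothetical names.

```c
/* Hypothetical old->new device pointer translation table; a simplified
 * illustration of the remapping described above, not Cricket's code. */
#include <stddef.h>
#include <cuda_runtime.h>

struct mem_map { void *old_ptr; void *new_ptr; size_t size; };

static struct mem_map map[256];
static int map_len = 0;

/* Re-allocate a checkpointed region and record the old->new pair. */
static void *restore_region(void *old_ptr, size_t size)
{
    void *new_ptr = NULL;
    cudaMalloc(&new_ptr, size);
    map[map_len++] = (struct mem_map){ old_ptr, new_ptr, size };
    return new_ptr;
}

/* Translate a checkpointed pointer into the restored allocation;
 * also handles pointers into the middle of a region. */
static void *translate(void *old_ptr)
{
    for (int i = 0; i < map_len; i++) {
        char *base = (char *)map[i].old_ptr;
        if ((char *)old_ptr >= base && (char *)old_ptr < base + map[i].size)
            return (char *)map[i].new_ptr + ((char *)old_ptr - base);
    }
    return old_ptr; /* not a checkpointed device pointer */
}
```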
I checked the GPU memory restore code: https://github.com/RWTH-ACS/cricket/blob/master/cpu/cr.c#L303
It restores the GPU memory by calling cudaMalloc to re-allocate the device memory. I think the newly created device memory address is not the same as the original device memory address (the memory checkpointed on the original machine).
But on the CPU/host side, the process still holds the original device address pointer after the restore. How does the restored process use this newly created device memory when it accesses it through the original device pointer?
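To make my question concrete: I imagine the translation would have to happen at the API boundary that Cricket already intercepts, e.g., a wrapper around cuLaunchKernel that rewrites pointer-typed kernel parameters before forwarding the launch. Below is a hypothetical sketch of what I mean (launch_translated and is_ptr are made-up names, and it reuses the translate() helper from the sketch above; it is not a claim about what Cricket actually does). Is this roughly the idea?

```c
/* Hypothetical interception of cuLaunchKernel: rewrite device-pointer
 * parameters through translate() before forwarding the launch.  Which
 * parameters are pointers must be known from kernel metadata; here the
 * caller supplies that information explicitly.  A sketch only. */
#include <cuda.h>

extern void *translate(void *old_ptr); /* from the sketch above */

CUresult launch_translated(CUfunction f,
                           unsigned gx, unsigned gy, unsigned gz,
                           unsigned bx, unsigned by, unsigned bz,
                           unsigned shared, CUstream stream,
                           void **params, const int *is_ptr, int nparams)
{
    for (int i = 0; i < nparams; i++) {
        if (is_ptr[i]) {
            /* params[i] points at the argument slot holding a device ptr */
            void **slot = (void **)params[i];
            *slot = translate(*slot);
        }
    }
    return cuLaunchKernel(f, gx, gy, gz, bx, by, bz,
                          shared, stream, params, NULL);
}
```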