
Can GPU memory be restored from same address? #53

Open
lianghao208 opened this issue Nov 28, 2024 · 3 comments


lianghao208 commented Nov 28, 2024

I checked the GPU memory restore code: https://github.com/RWTH-ACS/cricket/blob/master/cpu/cr.c#L303

It restores the GPU memory by calling cudaMalloc to re-allocate the device memory.

I think the newly created device memory address is not the same as the original device memory address (the memory checkpointed on the original machine).

But on the CPU/host side, the process still holds the original device address pointer after the restore. How does the restored process use the newly created device memory when it accesses it through the original device address pointer?

n-eiling (Member) commented Dec 2, 2024

When restoring resources such as memory addresses, the resources in the checkpoint file are mapped onto the newly created ones, so we essentially replace the memory addresses with the new ones. In my experiments, calling cudaMalloc with the same parameters in the same order as during the original run also leads to CUDA returning the same memory addresses. However, Cricket does not rely on this, because we generally cannot know whether anything else is running on the same GPU.
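
For illustration, here is a minimal sketch of what such an old → new address mapping could look like at restore time. The names (`mem_map_entry`, `restore_allocations`, `cr_translate_ptr`) are hypothetical and not Cricket's actual code in cpu/cr.c:

```c
#include <stddef.h>
#include <cuda_runtime.h>

/* Hypothetical mapping entry: one per allocation recorded in the checkpoint. */
struct mem_map_entry {
    void  *old_ptr;   /* device pointer recorded at checkpoint time */
    void  *new_ptr;   /* device pointer returned by cudaMalloc at restore time */
    size_t size;
};

/* Re-allocate every checkpointed region; a real restore path would also
 * copy the saved contents back into the new allocations. */
static int restore_allocations(struct mem_map_entry *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cudaMalloc(&map[i].new_ptr, map[i].size) != cudaSuccess)
            return -1;
        /* e.g. cudaMemcpy(map[i].new_ptr, saved_contents[i], map[i].size,
         *                 cudaMemcpyHostToDevice); */
    }
    return 0;
}

/* Translate a checkpointed device pointer into the corresponding pointer in
 * the newly allocated memory (also handles pointers into the middle of a region). */
static void *cr_translate_ptr(const struct mem_map_entry *map, size_t n, void *old)
{
    for (size_t i = 0; i < n; i++) {
        char *base = (char *)map[i].old_ptr;
        if ((char *)old >= base && (char *)old < base + map[i].size)
            return (char *)map[i].new_ptr + ((char *)old - base);
    }
    return old; /* not a checkpointed device pointer; leave unchanged */
}
```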

lianghao208 (Author) commented Dec 4, 2024

> So we essentially replace the memory addresses with the new ones

Thanks for the reply.
Does the whole procedure look like this?

  1. cudaMalloc creates new memory and returns new memory addresses.
  2. When the restored process tries to use the memory, Cricket intercepts the memory access and maps it to the newly created address.

If so, I wonder how Cricket intercepts the memory access and maps it to the newly created address, since there is no CUDA API that explicitly exposes which memory addresses will be accessed (e.g. cuLaunchKernel).
@n-eiling

n-eiling (Member) commented Dec 5, 2024

There are different kinds of memory addresses this is relevant for.
For CUDA resources such as cudaStream, cublasHandle, etc., we map the addresses to the new ones inside the API wrappers. These handles cannot sensibly be used outside of the CUDA APIs, so it is not a problem that the pointer values do not point to actual memory.
During kernel execution, the kernel gets the memory address of data either via a kernel parameter or via a global variable.
We can directly influence both and replace the memory addresses in the parameter or the global variable before launching the kernel.
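
As a rough sketch of the kernel-parameter case: a wrapper around cuLaunchKernel could rewrite every argument known to be a device pointer before forwarding the launch. Here `wrapped_cuLaunchKernel`, `param_count`, `param_is_device_ptr`, `mem_map`, `mem_map_len`, and `cr_translate_ptr` are hypothetical helpers (the mapping table is the same idea as in the earlier sketch), not Cricket's real API:

```c
#include <stddef.h>
#include <cuda.h>

struct mem_map_entry;                        /* old -> new mapping, see earlier sketch */
extern struct mem_map_entry *mem_map;        /* hypothetical global mapping table */
extern size_t mem_map_len;
void  *cr_translate_ptr(const struct mem_map_entry *map, size_t n, void *old);
size_t param_count(CUfunction f);                   /* hypothetical: number of kernel parameters */
int    param_is_device_ptr(CUfunction f, size_t i); /* hypothetical: parameter type info */

/* Wrapper around cuLaunchKernel: rewrite every kernel parameter that is known
 * to be a device pointer so it points into the newly allocated memory. */
CUresult wrapped_cuLaunchKernel(CUfunction f,
                                unsigned gx, unsigned gy, unsigned gz,
                                unsigned bx, unsigned by, unsigned bz,
                                unsigned sharedMemBytes, CUstream stream,
                                void **kernelParams, void **extra)
{
    size_t n = param_count(f);
    for (size_t i = 0; kernelParams != NULL && i < n; i++) {
        if (param_is_device_ptr(f, i)) {
            /* kernelParams[i] points at the argument value in host memory,
             * so the device pointer stored there can be rewritten in place. */
            void **arg = (void **)kernelParams[i];
            *arg = cr_translate_ptr(mem_map, mem_map_len, *arg);
        }
    }
    return cuLaunchKernel(f, gx, gy, gz, bx, by, bz,
                          sharedMemBytes, stream, kernelParams, extra);
}
```

Device pointers stored in global (`__device__`) variables could be patched in a similar way, e.g. by locating the symbol with cuModuleGetGlobal and writing the translated pointer back before the launch.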
