[Issue]: can't hipMemPoolExportPointer signal memory? #104
Comments
Hi @Epliz, From my understanding, it seems that you are trying to use Please give that a read and let me know if you have any questions, thanks! |
Hi @darren-amd , Thank you for your answer!
It would be great to support both points:
|
Hi @Epliz, Good questions, firstly, I don't see a particular way to pass in a signal flag to memory allocated in a pool. For the second question, However, I believe |
Hi @darren-amd , |
Hi @Epliz, Of course, let me know if you have any questions! |
Hi @darren-amd , Here is a small program I tried:

#include <iostream>
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

int main(int argc, char** argv) {
    int device_count;
    if (hipGetDeviceCount(&device_count) != hipSuccess) {
        std::cout << "Failed to get the number of GPUs" << std::endl;
        return -1;
    }
    std::cout << "Devices " << device_count << std::endl;

    hipError_t error;
    int supported;
    // Query whether device 0 supports the stream wait value APIs.
    if (hipDeviceGetAttribute(&supported, hipDeviceAttributeCanUseStreamWaitValue, 0) != hipSuccess) {
        std::cout << "Error getting the property" << std::endl;
        return -2;
    }
    std::cout << "wait stream value: " << supported << std::endl;

    int* signal = nullptr;
    // apparently needs to allocate 8 bytes
    if ((error = hipExtMallocWithFlags((void**) &signal, 8, hipMallocSignalMemory)) != hipSuccess) {
        std::cout << "Failed allocating signal memory" << std::endl;
        return -3;
    }
    std::cout << "signal ptr: " << signal << std::endl;

    // Check where the signal memory was actually allocated.
    hipPointerAttribute_t attributes;
    if (hipPointerGetAttributes(&attributes, signal) != hipSuccess) {
        std::cout << "Failed getting attributes" << std::endl;
        return -4;
    }
    std::cout << "Memory type " << attributes.type << std::endl;

    // Try to export the allocation for use in another process.
    hipIpcMemHandle_t mem_handle;
    if (hipIpcGetMemHandle(&mem_handle, signal) != hipSuccess) {
        std::cout << "Failed getting mem handle" << std::endl;
        return -5;
    }
    std::cout << "got mem handle" << std::endl;
    return 0;
}

Let me know if you see anything wrong, but for me, it fails when getting the memory handle:
And as I said before, the memory type seems to be CPU. Best, |
Hi @Epliz, It does seem that using hipExtMallocWithFlags with hipMallocSignalMemory is allocating on the host rather than device. Is there a particular reason you need hipMallocSignalMemory? Allocating with this flag is only necessary in specific use cases (such as Stream Memory Operations), so if it isn't needed you could use hipMalloc instead. |
Hi @darren-amd , I precisely want to use stream memory operations with that signal memory to implement multi-GPU operations like all-reduce. I have determined in single-process micro-benchmarks that this is the best approach, compared to using events. (The stream memory operations APIs could probably be enhanced further though, to control cache flushing during write ops. I might open a feature request for that.) I have also micro-benchmarked the single-machine, multi-process case for events, and saw that interprocess events perform particularly badly because they use stream callbacks internally, which prevents the CPU from queueing commands ahead on the GPU. I will probably open a ticket about that suboptimal implementation of interprocess events at some point, as it can surely be made much better. Best, |
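For readers unfamiliar with the synchronization pattern under discussion, here is a host-only analogy of a write-value / wait-value handshake. It is a minimal sketch in plain C++ and does not call HIP at all: a `std::atomic` stands in for the signal word, two CPU threads stand in for the two sides that would synchronize, and the names `Signal`, `write_value`, `wait_value_gte` and `run_demo` are hypothetical, chosen only for illustration. The greater-or-equal comparison is likewise an assumption about which wait condition would be used.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical host-side stand-in for one word of signal memory.
struct Signal {
    std::atomic<uint32_t> word{0};
};

// Analogy of a "write value" operation: publish v, making all
// earlier writes by this thread visible to a waiter that observes it.
void write_value(Signal& s, uint32_t v) {
    s.word.store(v, std::memory_order_release);
}

// Analogy of a "wait value" operation with a >= comparison:
// block until the signal word reaches at least v.
void wait_value_gte(const Signal& s, uint32_t v) {
    while (s.word.load(std::memory_order_acquire) < v) {
        std::this_thread::yield();
    }
}

// One producer writes a payload and then signals; the consumer waits
// on the signal before reading the payload.
int run_demo() {
    Signal ready;
    int payload = 0;

    std::thread producer([&] {
        payload = 42;           // the data the consumer is waiting for
        write_value(ready, 1);  // signal that the data is ready
    });

    wait_value_gte(ready, 1);   // returns only after the signal is written
    int seen = payload;         // safe: ordered after the producer's write

    producer.join();
    return seen;
}
```

Calling `run_demo()` returns the payload written by the producer, because the release store and acquire load order the payload write before the read. Whether the GPU-side APIs give comparable ordering and visibility guarantees across processes is exactly the open question in this thread, so treat this only as an illustration of the intended semantics.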
Hi @Epliz, I had a chat with our internal team and the current recommendation is to not use stream memory operations in favor of using HIP events instead. The HIP stream memory operations are still in beta and can be prone to errors/changes that may make code unstable. Additionally, stream waits may negatively affect kernel performance if they are executing at the same time. |
Problem Description
Hi,
I have allocated 8 bytes of signal memory with hipExtMallocWithFlags and am trying to share it with another process (as it seems like the stream wait value APIs would be perfect for cheaper sync between processes than events). But hipMemPoolExportPointer returns an error.
Is there any allocation size at which it would work?
Would it be possible for you to help me to get it working or if there is a limitation at the moment, to make it work?
Best regards,
Epliz
Operating System
Ubuntu 24.04
CPU
Intel xeon XE9680
GPU
MI300x x8
ROCm Version
ROCm 6.2.2
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response