Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime error when copying to constant device memory with HIP #912

Open
gzagaris opened this issue Sep 24, 2024 · 9 comments
Open

Runtime error when copying to constant device memory with HIP #912

gzagaris opened this issue Sep 24, 2024 · 9 comments

Comments

@gzagaris
Copy link
Member

Describe the bug

Copying data from the host to constant device memory using HIP throws a runtime error. The code works fine with CUDA , but, Umpire throws the following runtime error when compiled with HIP:

C++ exception with description "! Umpire runtime_error [.../Umpire/src/umpire/op/HipCopyOperation.cpp:31]: hipMemcpy( dest_ptr = 0xe3e480, src_ptr = 0x52fb90, length = 4096) failed with error: invalid argument

To Reproduce

Here is a code snippet that reproduces this behavior:

auto& rm = umpire::ResourceManager::getInstance();
auto const_allocator = rm.getAllocator("DEVICE_CONST");

static constexpr int N = 4;
static constexpr int BYTESIZE = N * sizeof(int);
static constexpr int TEST_VAL = 42;

auto host_allocator = rm.getAllocator("HOST");
int* HOST_DATA = static_cast< int* >(host_allocator.allocate(BYTESIZE));
for ( int i = 0; i < N; ++i ) {
    HOST_DATA[ i ] = TEST_VAL;
}

int* A_d = static_cast< int* >(const_allocator.allocate(BYTESIZE));
EXPECT_TRUE(A_d != nullptr);
rm.copy(A_d, HOST_DATA, BYTESIZE); // <------------- RUNTIME ERROR THROWN HERE!

I am compiling Umpire with -DENABLE_HIP=On -DUMPIRE_ENABLE_DEVICE_CONST=On.

Am I missing anything?

Expected behavior

I would have expected this to work and not throw a runtime error.

Compilers & Libraries (please complete the following information):

  • Compiler & version: amdclang-16.0.0
  • ROCM version: v5.6.0
@gzagaris
Copy link
Member Author

One more thing to add to this. I did look into the implementation a bit and I did not see any calls to hipGetSymbolAddress() in the HipConstantMemoryResource.cpp, which I would have expected and it may be related to the issue that I am seeing.

@gzagaris
Copy link
Member Author

gzagaris commented Sep 24, 2024

Following a similar approach as in CudaConstantMemoryResource.cu, I made the following changes to HipConstantMemoryResource.cpp:

diff --git a/src/umpire/resource/HipConstantMemoryResource.cpp b/src/umpire/resource/HipConstantMemoryResource.cpp
index 65c5f72c..d00103a2 100644
--- a/src/umpire/resource/HipConstantMemoryResource.cpp
+++ b/src/umpire/resource/HipConstantMemoryResource.cpp
@@ -25,7 +25,8 @@ HipConstantMemoryResource::HipConstantMemoryResource(const std::string& name, in
       m_highwatermark{0},
       m_platform{Platform::hip},
       m_offset{0},
-      m_ptr{s_umpire_internal_device_constant_memory}
+      m_ptr{nullptr},
+      m_initialized{false}
 {
 }

@@ -33,6 +34,16 @@ void* HipConstantMemoryResource::allocate(std::size_t bytes)
 {
   std::lock_guard<std::mutex> lock{m_mutex};

+  if (!m_initialized) {
+    hipError_t error = ::hipGetSymbolAddress((void**)&m_ptr, s_umpire_internal_device_constant_memory);
+
+    if (error != hipSuccess) {
+      UMPIRE_ERROR(runtime_error, umpire::fmt::format("hipGetSymbolAddress failed with error: {}", ::hipGetErrorString(error)));
+    }
+
+    m_initialized = true;
+  }
+
   char* ptr{static_cast<char*>(m_ptr) + m_offset};
   m_offset += bytes;

diff --git a/src/umpire/resource/HipConstantMemoryResource.hpp b/src/umpire/resource/HipConstantMemoryResource.hpp
index d7afac23..5e32f45f 100644
--- a/src/umpire/resource/HipConstantMemoryResource.hpp
+++ b/src/umpire/resource/HipConstantMemoryResource.hpp
@@ -39,7 +39,8 @@ class HipConstantMemoryResource : public MemoryResource {

   std::size_t m_offset;
   void* m_ptr;
-
+  bool m_initialized;
+
   std::mutex m_mutex;
 };

I didn't have any luck with that though, hipGetSymbolAddress, now throws the following runtime error:

terminate called after throwing an instance of 'umpire::runtime_error'
45:   what():  ! Umpire runtime_error [.../Umpire/src/umpire/resource/HipConstantMemoryResource.cpp:41]: hipGetSymbolAddress failed with error: invalid device symbol

@gzagaris
Copy link
Member Author

gzagaris commented Oct 3, 2024

Hello, just wanted to follow up on this. Does anyone have any thoughts/suggestions for this?

ping: @davidbeckingsale @mcfadden8

@davidbeckingsale
Copy link
Member

Hey @gzagaris I will try and reproduce and debug locally. No ideas I'm afraid

@gzagaris
Copy link
Member Author

gzagaris commented Oct 8, 2024

Thank you @davidbeckingsale!

I pushed some of my changes in my effort to debug this a bit further in #920 in case you find them helpful. I also include a unit test natively in Umpire, which might help with reproducing this on your end.

Note, my changes to HipConstantMemoryResource.cpp (included in #920) followed what was done in CudaConstantMemoryResource.cu. Do these look reasonable to you?

Unfortunately, what I found was that hipGetSymbolAddress() fails with the error "invalid device symbol".

I was able to reproduce the hipGetSymbolAdress() error in a standalone HIP program:

#include <cstdio>
#include "hip/hip_runtime_api.h"


__constant__ static char s_constant_device_memory[64*1024];

int main()
{
  hipError_t rc = hipSuccess; 
  int nDevices{0};

  rc = hipGetDeviceCount(&nDevices);
  if (rc != hipSuccess)
  {
    fprintf(stderr, "[HIP ERROR]: %s", hipGetErrorString(rc));
    return -1;
  }

  printf("number of devices %d\n", nDevices);

  void* ptr{nullptr}; 
  rc = hipGetSymbolAddress((void**)&ptr, s_constant_device_memory);
  if (rc != hipSuccess)
  {
    fprintf(stderr, "[HIP ERROR]: %s\n", hipGetErrorString(rc));
    return -1;
  }


  return 0;
}

So, this could be an error in rocm/hip? or perhaps, I am somehow calling hipGetSymbolAddress() wrong?

I submitted the standalone HIP reproducer to some folks from AMD and I am awaiting to hear back....

@davidbeckingsale
Copy link
Member

Yup, it could well be. My other thought in the Umpire case is that we actually don't compile that resource file as a HIP file, just C++ and link against hip runtime. I tried hacking it to compile as HIP but then ran into some linking errors. Let me know what you hear back

@gzagaris
Copy link
Member Author

gzagaris commented Oct 11, 2024

Yup, it could well be. My other thought in the Umpire case is that we actually don't compile that resource file as a HIP file, just C++ and link against hip runtime. I tried hacking it to compile as HIP but then ran into some linking errors. Let me know what you hear back

Hi @davidbeckingsale, just wanted to give you a quick update on this.

AMD was not able to reproduce the issue with the standalone reproducer. However, based on your comment, I tried compiling the standalone reproducer with the HIP compiler (hipcc) directly and hipGetSymbolAddress() works as expected in that case. Previously, I was building the reproducer as a test within my project with BLT (just like Umpire) and I was using the host C++ compiler (e.g. amdclang) and linking the hip runtime.

So, in addition to the changes in #920, there is also a build-system issue, as you have guessed it.

We should be able to instruct CMake to build a source file as a HIP program using source file properties, e.g.,

set_source_file_properties(${PROJECT_SOURCE_DIR}/src/umpire/resource/HipConstantMemoryResource.cpp, PROPERTIES LANGUAGE HIP)

Or, perhaps, there is a better way to do this within BLT(?)

Thoughts?

NOTE: Looks like CMake added support for HIP as a language in 3.21 I believe.

@davidbeckingsale
Copy link
Member

That's what I tried, but then it wouldn't link. We have a BLT PR that will use enable_language(HIP), and once that's merged I will switch Umpire over and give it another shot. Hopefully that + your PR will resolve the issue.

@gzagaris
Copy link
Member Author

Excellent! Thank you for your help with this @davidbeckingsale!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants