Required prerequisites
Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
Make sure you've read the documentation. Your issue may be addressed there.
Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
If possible, make a PR with a failing test to give us a starting point to work on!
Describe the bug
The NVIDIA GH200 Grace Hopper Superchip is promoted as being capable of utilizing the entire system memory for GPU tasks (NVIDIA blog). However, CUDA-Q does not use the full system memory when specifying the nvidia target.
Steps to reproduce the bug
Create the following source file ghz.cpp:
#include <cudaq.h>

// Define a quantum kernel with a runtime parameter
struct ghz {
  auto operator()(const int N) __qpu__ {
    // Dynamically sized vector of qubits
    cudaq::qvector q(N);
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main(int argc, char *argv[]) {
  int qubits_count = 2;
  if (argc > 1) {
    qubits_count = atoi(argv[1]);
  }
  auto counts = cudaq::sample(/*shots=*/1000, ghz{}, qubits_count);
  if (!cudaq::mpi::is_initialized() || cudaq::mpi::rank() == 0) {
    counts.dump();
  }
  return 0;
}
Compile it as follows: nvq++ ghz.cpp -o ghz.out --target nvidia
And then run it:
33 qubits: ./ghz.out 33 ✅ nvidia-smi reports a VRAM usage of about 66400 MiB
34 qubits: ./ghz.out 34 ❌:
terminate called after throwing an instance of 'ubackend::RuntimeError'
what(): requested size is too big
Aborted (core dumped)
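For a rough sense of scale, here is a back-of-the-envelope estimate of the state-vector footprint, assuming the nvidia target keeps one single-precision complex amplitude (8 bytes) per basis state; the precision and the absence of overhead are assumptions, not measured values:

#include <cstdint>
#include <cstdio>

// Estimate the raw state-vector size for n qubits: 2^n amplitudes,
// 8 bytes per amplitude if the simulator stores complex<float>.
int main() {
  for (int n = 32; n <= 36; ++n) {
    const double gib =
        static_cast<double>(UINT64_C(1) << n) * 8.0 / (1024.0 * 1024.0 * 1024.0);
    std::printf("%d qubits -> %.0f GiB\n", n, gib);
  }
  return 0;
}

Under that assumption, 33 qubits need about 64 GiB, which is consistent with the ~66400 MiB reported by nvidia-smi, while 34 qubits need about 128 GiB and no longer fit in the GPU's HBM; 35/36 qubits (256/512 GiB) would only be reachable by spilling into the Grace CPU's LPDDR5X memory.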
Expected behavior
I expect the GPU to be able to use system memory when necessary and simulate up to 35/36 qubits. Memory quickly becomes the limiting factor in state-vector simulation, so any way to increase the number of qubits that can be simulated would be appreciated.
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
Suggestions
I was looking at a Grace Hopper presentation from John Linford and noticed two details: slide 66 reports that plain cudaMalloc is not enough on this system and suggests using cudaMallocManaged or malloc/mmap instead, and a look through the cuQuantum repository shows several occurrences of cudaMalloc in the code, but none of cudaMallocManaged.
Do you think GH200 systems will ever be able to fully utilize their memory for quantum simulation with CUDA-Q/cuQuantum? Would this hypothetical approach cost too much in simulation performance?
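As a minimal standalone sketch of what the slide seems to suggest (this is plain CUDA, not CUDA-Q or cuQuantum code, and the 96 GB HBM figure is an assumption about the GH200 variant in use), cudaMallocManaged allows a single allocation larger than the GPU's HBM, with pages migrating between HBM and the Grace CPU's memory on demand:

#include <cuda_runtime.h>
#include <cstdio>

// Touch every element so the managed pages are actually faulted onto the GPU.
__global__ void touch(float *v, size_t n) {
  size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) v[i] = 1.0f;
}

int main() {
  // 2^35 floats = 128 GiB, deliberately larger than 96 GB of HBM.
  const size_t n = static_cast<size_t>(1) << 35;
  float *v = nullptr;
  cudaError_t err = cudaMallocManaged(&v, n * sizeof(float));
  if (err != cudaSuccess) {
    std::printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  const unsigned threads = 256;
  const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
  touch<<<blocks, threads>>>(v, n);
  err = cudaDeviceSynchronize();
  std::printf("kernel result: %s\n", cudaGetErrorString(err));
  cudaFree(v);
  return 0;
}

Whether this would be fast enough for a state-vector simulator is a separate question: every gate sweeps a large slice of the state vector, so oversubscribed pages would be migrated back and forth on every layer of the circuit, which may be part of why cuQuantum sticks to plain cudaMalloc.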
Hi @1tnguyen, thank you for the good pointer. I have some new questions:
Am I correct in stating that the simulation happens either fully on system memory or fully on GPU memory?
I tried setting CUDAQ_MAX_CPU_MEMORY_GB and CUDAQ_MAX_GPU_MEMORY_GB together, but the executable crashes:
$ CUDAQ_MAX_CPU_MEMORY_GB=200 CUDAQ_MAX_GPU_MEMORY_GB=20 ./ghz.out 32
=> 'ubackend::RuntimeError' ... cudaErrorInvalidValue
The only value of CUDAQ_MAX_GPU_MEMORY_GB I have found that does not crash the executable in this situation is 1.
Is this expected?
I tried experimenting with CUDAQ_MAX_CPU_MEMORY_GB using the reported GHZ example, but the results baffle me:
From a generalized GHZ state I expect to measure only the all-zeros and all-ones bitstrings, yet the first qubits appear to be simulated correctly while the last ones come out random. Why is this the case?