I'm running the same data side-by-side on a 32-CPU node and on a 12-CPU / 1 A100 GPU node. The GPU node seems to be ~1 s/it slower than the CPU node. Could you advise me on what I'm doing wrong?
Hello, I tested VeloVAE on cpu (Intel Xeon Gold 6154, 4 nodes, 32 cores per node), spgpu (Nvidia A40) and gpu (Nvidia V100). Using a GPU should give you a 3-5x speedup. For example, for the pancreas dataset shown in the example notebook, training took about 23 minutes on cpu and about 5-6 minutes on both spgpu and gpu. The difference is quite clear even without using a time profiler.
It seems you might have a CUDA issue. Could you provide more details?
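For instance, a short environment report along these lines would help narrow down a CUDA problem. This is only a sketch, assuming the training runs through PyTorch; the helper name `cuda_report` is hypothetical and not part of VeloVAE.

```python
import platform


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup as a dict.

    Hypothetical helper for illustration; assumes PyTorch is the backend.
    """
    report = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
        "devices": [],
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; return defaults.
        return report

    report["torch"] = torch.__version__
    # CUDA version the wheel was built against; None for CPU-only builds.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` prints `None`, the installed PyTorch wheel is CPU-only. If `cuda_available` is `False` on the GPU node, the driver or CUDA runtime is not visible to PyTorch. Posting this output together with the `nvidia-smi` header would be a reasonable starting point.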
@g-yichen I'd be happy to provide more details, I'm just not sure what to provide :-)
You have my full pip freeze above and my notebook snippets. I don't get the "GPU not found" warning that appears on a non-GPU node. Using nvidia-smi, I can see GPU usage bounce up and down, but nothing overwhelming.
For the GPU node:
I'm running: