Hi @hmcezar, thanks for your interest in our code. This is a bit odd. The relevant difference would seem to be the use of a larger default batch size in … (Also CCing some of those who might have insight or find this relevant: @anjohan @svandenhaute @johkl, and linking mir-group/pair_allegro#23.)
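If the batch size is the relevant difference, one way to test that is to rerun the failing config with a smaller batch size and see whether the GPU error disappears. The following is a minimal sketch under stated assumptions. It assumes the YAML config stores the value under a top-level `batch_size:` key, so check your own file. The helper name `override_batch_size` is hypothetical and not part of any library. It rewrites only that line, so `validation_batch_size` and any trailing comments are left untouched.

```python
import re

# Assumption: the training config has a top-level "batch_size:" line.
# Anchoring at column 0 (^ with MULTILINE) avoids matching keys such as
# "validation_batch_size", and [ \t]* keeps the match on a single line.
_BATCH_SIZE_LINE = re.compile(r"^(batch_size:[ \t]*)\S+", flags=re.MULTILINE)


def override_batch_size(config_text: str, new_size: int) -> str:
    """Return config_text with the top-level batch_size value replaced (hypothetical helper)."""
    new_text, count = _BATCH_SIZE_LINE.subn(
        lambda m: f"{m.group(1)}{new_size}", config_text
    )
    if count == 0:
        raise KeyError("no top-level 'batch_size:' key found in config")
    return new_text


if __name__ == "__main__":
    demo = "batch_size: 5  # default\nvalidation_batch_size: 10\n"
    print(override_batch_size(demo, 1))
```

In practice you would read the failing YAML file, pass its text through the helper, write the result to a new file, and train on that copy exactly as before.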
I'm trying to train a model on LUMI which has AMD GPUs.
Starting from rocm/pytorch Docker images, I successfully created a container containing everything I need with:
Since I used the rocm/pytorch image as the starting point, I'm pretty sure PyTorch is correctly installed (version 1.13.1 in this case, but I tried 2.0.1 as well).
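A quick check that the PyTorch inside the container is a ROCm build and can see the AMD GPU might help confirm this. The sketch below assumes a ROCm build of PyTorch, where `torch.version.hip` is set and the GPU is exposed through the `torch.cuda` API. The function name `rocm_report` is hypothetical. If `torch` cannot be imported, the function reports that instead of failing.

```python
def rocm_report() -> dict:
    """Summarize whether PyTorch is installed, ROCm-enabled, and sees a GPU (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is a version string on ROCm builds and None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
        # On ROCm builds, AMD GPUs are reached through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in rocm_report().items():
        print(f"{key}: {value}")
```

A `hip_version` of `None`, or `gpu_available: False`, would point to an installation or device-visibility problem rather than a model or config issue.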
I also tried the `develop` branch, but I get the same error.

Using this container, I can run the `minimal.yaml` example on GPU without a problem. However, if I try to run the `example.yaml` example, I get:

On CPU, the exact same example runs.
Any idea why I'm getting this?
Thanks!