Add AMD GPU support #1546
Conversation
The documentation is not available anymore as the PR was closed or merged.
great work!
@@ -0,0 +1,117 @@
# Accelerated inference on AMD GPUs

By default, ONNX Runtime runs inference on CPU devices. However, it is possible to place supported operations on an AMD Instinct GPU, while leaving any unsupported ones on CPU. In most cases, this allows costly operations to be placed on GPU, significantly accelerating inference.
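As context for this doc, here is a minimal sketch of how a user would opt into the provider through Optimum's API (the checkpoint name and the `export=True` re-export are illustrative choices, not taken from this PR):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Export a PyTorch checkpoint to ONNX and run it with the ROCm execution provider;
# operations the provider does not support fall back to CPUExecutionProvider.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    provider="ROCMExecutionProvider",
)
print(model.providers)  # expected: ["ROCMExecutionProvider", "CPUExecutionProvider"]
```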
We could clarify that we have tested on Instinct GPUs, but the support matrix is https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html (unless ROCMExecutionProvider explicitly requires Instinct? In which case we can give a ref)
Done
## Installation

#### 1. ROCM Installation (V 5.7.X)
Suggested change:
- #### 1. ROCM Installation (V 5.7.X)
+ #### 1. ROCm Installation (V 5.7.X)
Does `(V 5.7.X)` mean that ROCm 5.7 is a requirement? Can we put that in a sentence instead?
The branch the AMD team shared for onnxruntime was on ROCm 5.7, so this has been tested on ROCm 5.7. I could mention that the following instructions are to run the ROCm EP on AMD GPUs with ROCm 5.7 installed.
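A quick way to verify such an install (a small sketch, not from the PR) is to ask ONNX Runtime which execution providers the build exposes:

```python
import onnxruntime as ort

# On a correctly built ROCm install, this list should include "ROCMExecutionProvider".
print(ort.get_available_providers())
```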
tests/onnxruntime/test_modeling.py
Outdated
@require_torch_gpu
@pytest.mark.amdgpu_test
def test_load_model_rocm_provider(self):
    model = ORTModel.from_pretrained(self.ONNX_MODEL_ID, provider="ROCMExecutionProvider")
    self.assertListEqual(model.providers, ["ROCMExecutionProvider", "CPUExecutionProvider"])
    self.assertListEqual(model.model.get_providers(), model.providers)
    self.assertEqual(model.device, torch.device("cuda:0"))
Will this test fail if the install of ORT + ROCm EP is not done correctly? If so, can we add a `@require_ort_rocm` or something similar? If not, can we make these new tests `@slow`? The CI is already huge.
I don't think these tests should run by default, as the marker `@pytest.mark.amdgpu_test` is present?
No, I believe this marker is useful only to select a subset of tests to run (`pytest -m "amdgpu_test"`). If you try `pytest tests/onnxruntime -k "test_load_model_rocm_provider" -s -vvvvv`, this test is IMO likely to run, while it should probably not unless the ROCm EP is installed.
Yes, the above line would run. What I meant was: in our CI, the CPU tests won't run it as it would require a GPU, and for the GPU tests only the `gpu_test` marker is selected: https://github.com/huggingface/optimum/blob/main/tests/onnxruntime/docker/Dockerfile_onnxruntime_gpu#L26. So these tests are not running in the CI? Am I missing something, or is there any other place the test might run?
But if a user/developer is running locally, something like `@require_ort_rocm` would be nice?
Then the marker `gpu_test` is misleading and should be changed to `cuda_ep_test` or `trt_ep_test`.

> But if a user/developer is running locally, something like `@require_ort_rocm` would be nice?

Yes, that's what I meant. If I run `pytest tests/onnxruntime -s -vvvvv` locally, I would like it to be (somewhat) green.
Ok, will make the change with `@require_ort_rocm`. Could change `gpu_test` to `cuda_and_trt_ep_tests`, as the CUDA and TRT EPs are tested in the same method.
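For reference, a minimal sketch of what such a `@require_ort_rocm` decorator could look like, assuming it keys off ONNX Runtime's available providers (the actual implementation that lands in the PR may differ):

```python
import unittest

from onnxruntime import get_available_providers


def require_ort_rocm(test_case):
    """Skip the decorated test unless ONNX Runtime exposes the ROCm execution provider."""
    return unittest.skipUnless(
        "ROCMExecutionProvider" in get_available_providers(),
        "test requires ONNX Runtime built with the ROCm execution provider",
    )(test_case)
```

With a guard like this, running `pytest tests/onnxruntime -s -vvvvv` on a machine without the ROCm EP would skip these tests instead of failing them.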
On second thought, creating two markers for the existing GPU tests, `cuda_ep_test` and `trt_ep_test`, as only some of them use the TRT EP.
LGTM, great work!
One test is failing:
Added a check for this to determine the provider.
Hmm, it is still failing.
What does this PR do?
Adds AMD GPU support for `optimum-onnxruntime` using `ROCMExecutionProvider`.
Before submitting