
Add AMD GPU support #1546

Merged · 11 commits merged into huggingface:main on Nov 24, 2023
Conversation

mht-sharma (Contributor)

What does this PR do?

Adds AMD GPU support for Optimum ONNX Runtime using the ROCMExecutionProvider.
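For orientation, usage would look roughly like this (a minimal sketch based on the tests added in this PR; the model ID is a placeholder):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Export a Transformers model to ONNX and run it with the ROCm execution
# provider; operations the provider does not support fall back to CPU.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder model ID
    export=True,
    provider="ROCMExecutionProvider",
)
```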

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mht-sharma marked this pull request as ready for review on November 20, 2023, 12:17
HuggingFaceDocBuilderDev commented Nov 20, 2023

The documentation is not available anymore as the PR was closed or merged.

@fxmarty (Contributor) left a comment

great work!

@@ -0,0 +1,117 @@
# Accelerated inference on AMD GPUs

By default, ONNX Runtime runs inference on CPU devices. However, it is possible to place supported operations on an AMD Instinct GPU, while leaving any unsupported ones on CPU. In most cases, this allows costly operations to be placed on GPU and significantly accelerate inference.
fxmarty (Contributor):

We could clarify that we have tested on Instinct GPUs, but that the support matrix is https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html (unless ROCMExecutionProvider explicitly requires Instinct? In which case we can give a ref).

mht-sharma (Contributor, Author):

Done
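To illustrate the CPU-fallback behavior described in the quoted docs paragraph above, here is a minimal sketch using the plain ONNX Runtime API (`model.onnx` is a placeholder path):

```python
import onnxruntime as ort

# Providers are tried in order: operations supported by the ROCm EP run on
# the AMD GPU; anything unsupported falls back to the CPU EP.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported ONNX model
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())
```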

docs/source/onnxruntime/usage_guides/amdgpu.mdx (outdated; resolved)

## Installation

#### 1. ROCM Installation (V 5.7.X)
fxmarty (Contributor):
Suggested change
#### 1. ROCM Installation (V 5.7.X)
#### 1. ROCm Installation (V 5.7.X)

(V 5.7.X) means that ROCm 5.7 is a requirement? Can we put that in a sentence instead?

mht-sharma (Contributor, Author):

The branch the AMD team shared for onnxruntime was on ROCm 5.7, so this has been tested on ROCm 5.7. I could mention that the following instructions are to run the ROCm EP on AMD GPUs with ROCm 5.7 installed.
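(As an aside, a quick way to verify that a given onnxruntime build exposes the ROCm EP — a minimal sketch, not part of this PR:)

```python
import onnxruntime as ort

# Lists the execution providers compiled into this onnxruntime build;
# a ROCm-enabled build should include "ROCMExecutionProvider".
print(ort.get_available_providers())
```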

docs/source/onnxruntime/usage_guides/amdgpu.mdx (outdated; resolved)
Comment on lines 243 to 249
@require_torch_gpu
@pytest.mark.amdgpu_test
def test_load_model_rocm_provider(self):
model = ORTModel.from_pretrained(self.ONNX_MODEL_ID, provider="ROCMExecutionProvider")
self.assertListEqual(model.providers, ["ROCMExecutionProvider", "CPUExecutionProvider"])
self.assertListEqual(model.model.get_providers(), model.providers)
self.assertEqual(model.device, torch.device("cuda:0"))
fxmarty (Contributor):

Will this test fail if the install of ORT + the ROCm EP is not done correctly? If so, can we add a @require_ort_rocm or something similar? If not, can we make these new tests @slow? The CI is already huge.

mht-sharma (Contributor, Author):

I don't think these tests run by default, since the @pytest.mark.amdgpu_test marker is present?

fxmarty (Contributor) — Nov 23, 2023:

No, I believe this marker is useful only to select a subset of tests to run (pytest -m "amdgpu_test").

If you try pytest tests/onnxruntime -k "test_load_model_rocm_provider" -s -vvvvv, this test is IMO likely to run, while it should probably not run unless the ROCm EP is installed.
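(For context, a pytest marker only lets you select tests; it does not skip them on its own. A minimal sketch of how such a marker is typically declared and used — names are illustrative:)

```python
# conftest.py (illustrative)
def pytest_configure(config):
    # Register the custom marker so pytest does not warn about it.
    config.addinivalue_line("markers", "amdgpu_test: tests that require an AMD GPU")

# Selecting marked tests from the command line:
#   pytest -m "amdgpu_test" tests/onnxruntime
# Without -m, marked tests still run; a marker alone never skips anything.
```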

mht-sharma (Contributor, Author):

Yes, the above line would run. What I meant was that in our CI, the CPU tests won't run it, as it would require a GPU, and for the GPU tests only the gpu_test marker is selected: https://github.com/huggingface/optimum/blob/main/tests/onnxruntime/docker/Dockerfile_onnxruntime_gpu#L26

So these tests are not running in the CI? Am I missing something, or is there any other place the test might run?

But if we want to cover a user/developer running locally, something like @require_ort_rocm is nice?

fxmarty (Contributor):

Then the marker gpu_test is misleading and should be changed to cuda_ep_test or trt_ep_test.

> But if we want to cover a user/developer running locally, something like @require_ort_rocm is nice?

Yes, that's what I meant. If I run pytest tests/onnxruntime -s -vvvvv locally, I would like it to be (somewhat) green.

mht-sharma (Contributor, Author):

OK, will make the change with @require_ort_rocm. Could change gpu_test to cuda_and_trt_ep_tests, as the CUDA and TRT EPs are tested in the same method.

mht-sharma (Contributor, Author):

On second thought, creating two markers for the existing GPU tests, cuda_ep_test and trt_ep_test, as only some of them cover TRT.
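(A minimal sketch of what such a @require_ort_rocm decorator could look like — an assumption for illustration, not necessarily the implementation that landed:)

```python
import unittest

import onnxruntime as ort


def require_ort_rocm(test_case):
    """Skip a test unless this onnxruntime build exposes the ROCm EP (illustrative)."""
    return unittest.skipUnless(
        "ROCMExecutionProvider" in ort.get_available_providers(),
        "test requires ROCMExecutionProvider",
    )(test_case)
```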

tests/onnxruntime/test_modeling.py (resolved)
tests/onnxruntime/test_modeling.py (resolved)
tests/onnxruntime/test_modeling.py (outdated; resolved)
tests/onnxruntime/test_modeling.py (outdated; resolved)
@fxmarty (Contributor) left a comment

LGTM, great work!

docs/source/onnxruntime/usage_guides/amdgpu.mdx (outdated; resolved)
fxmarty (Contributor) commented Nov 23, 2023

One test is failing:

2023-11-23T13:43:36.0587352Z FAILED onnxruntime/test_utils.py::ProviderAndDeviceGettersTest::test_get_provider_for_device - AssertionError: 'ROCMExecutionProvider' != 'CUDAExecutionProvider'
2023-11-23T13:43:36.0587540Z - ROCMExecutionProvider
2023-11-23T13:43:36.0587650Z ? -- ^
2023-11-23T13:43:36.0587755Z + CUDAExecutionProvider
2023-11-23T13:43:36.0587831Z ?  ^^^
2023-11-23T13:43:36.0588086Z ==== 1 failed, 756 passed, 788 skipped, 2765 warnings in 1773.38s (0:29:33) ====

mht-sharma (Contributor, Author):

> One test is failing: FAILED onnxruntime/test_utils.py::ProviderAndDeviceGettersTest::test_get_provider_for_device (full log quoted above)

Added a check for this to determine the provider.
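(Presumably along the lines of the following sketch — an assumption about the fix, not a quote of the actual diff:)

```python
import onnxruntime as ort
import torch


def get_provider_for_device(device: torch.device) -> str:
    """Map a torch device to an ONNX Runtime execution provider (illustrative)."""
    if device.type == "cuda":
        # On ROCm builds of PyTorch, GPU devices still report type "cuda",
        # but the matching ONNX Runtime provider is the ROCm EP.
        if "ROCMExecutionProvider" in ort.get_available_providers():
            return "ROCMExecutionProvider"
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"
```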

fxmarty (Contributor) commented Nov 24, 2023

Hmm, it is still failing.

@mht-sharma merged commit e2d3c8b into huggingface:main on Nov 24, 2023
48 of 52 checks passed
3 participants