Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct Driver HAL #816

Merged
merged 35 commits into from
Oct 19, 2024
Merged

Direct Driver HAL #816

merged 35 commits into from
Oct 19, 2024

Conversation

makslevental
Copy link
Collaborator

@makslevental makslevental commented Oct 2, 2024

quote-you-know-crazy-straws-they-go-all-over-the-place-these-straws-are-sane-they-never-lost-mitch-hedberg-129-5-0527

Let's try this again: this is a HAL that skips/avoids/elides XRT completely. It does so by (essentially) going straight to the XDNA driver's (kmq) shim which talks directly to the kernel module. Note that XDNA's shim actually couples tightly to XRT itself so the bulk of the actual "work" here is/was decoupling. Thus, we vendor the decoupled shim. Just to be explicit/clear: after this PR, on Linux, the only thing needed in the environment is the kernel module itself - not XRT, nor XDNA's shim.

Call outs:

  1. Only Linux for now (Windows soon);
  2. On Linux, we no longer even need to clone XRT (completely really does mean completely);
    • We no longer use .xclbin as the artifact and now just load the .pdi directly;
  3. This uses the so-called kernel-managed queue (kmq) path; user-managed queues (umq) aren't supported until aie4;
  4. The current implementation here is still sync and still uses direct (rather than indirect) command buffers; this PR is already large/complex enough without changing the paradigms. The next PR will add/implement those;
  5. Nothing about how drivers themselves are bumped needs to change - you/anyone can continue to install the XDNA drivers using their full instructions there and everything will work the same, i.e., the XRT pieces will still be skipped even if you have them installed. But I will probably put together a script that builds/installs only their kernel module for those that want to opt-out of XRT in their environment;
  6. For now, the XRT HAL will still work on Linux but it should be considered deprecated (hence no longer tested in CI).

TODOs before landing:

  • remove instructional comments leftover from the null HAL;
  • undo "OO"-ness of the HAL
  • add missing trace zones
  • remove smart pointers where possible
  • move hwq into kernel
  • remove exceptions from the shim.
  • parameterize number of cols/rows (currently hardcoded 4x4 for Phoenix);

shout-out to @benvanik for putting together the null HAL lickety-split for me to study while working on this PR.

@makslevental makslevental force-pushed the makslevental/xrt-lite branch 11 times, most recently from 9644459 to 8f82b0a Compare October 3, 2024 02:35
@makslevental makslevental force-pushed the makslevental/xrt-lite branch 19 times, most recently from 8e0d61a to dae35a9 Compare October 12, 2024 00:11
@makslevental makslevental force-pushed the makslevental/xrt-lite branch from 29941e2 to 022cda3 Compare October 18, 2024 19:50
@makslevental makslevental force-pushed the makslevental/xrt-lite branch from 022cda3 to 8974d20 Compare October 18, 2024 20:02
@makslevental makslevental enabled auto-merge (squash) October 18, 2024 23:08
@makslevental makslevental disabled auto-merge October 18, 2024 23:18
@makslevental makslevental enabled auto-merge (squash) October 18, 2024 23:21
@makslevental makslevental disabled auto-merge October 18, 2024 23:22
@makslevental makslevental enabled auto-merge (squash) October 18, 2024 23:22
@makslevental makslevental force-pushed the makslevental/xrt-lite branch from dd6cafb to 3df44d3 Compare October 18, 2024 23:31
@makslevental makslevental force-pushed the makslevental/xrt-lite branch from 3df44d3 to 93a7ba5 Compare October 18, 2024 23:45
@makslevental makslevental merged commit fad9629 into main Oct 19, 2024
5 checks passed
@makslevental makslevental deleted the makslevental/xrt-lite branch October 19, 2024 00:06
@@ -147,6 +148,8 @@ cd ${OUTPUT_DIR}
export MATMUL_TESTS_RUN=0
export MATMUL_TESTS_FAILS=0

DEVICE_HAL="${DEVICE_HAL:-xrt-lite}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to run this test on windows, do we need to change this line so DEVICE_HAL can be xrt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the CI computer, this link of course is a good solution. But can we make it easier for developers to run this script on a local machine?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Developers should know how to set environment variables

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we summarize all environment variables needed for developers in this file?
So not all developers need to read the CI script to duplicate the effort?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the final time: XRT is deprecated so I am not spending absolutely any time documenting/fixing/describing/explaining anything about it.

So not all developers need to read the CI script to duplicate the effort?

Developers are responsible for reading code relevant to their jobs.

@@ -0,0 +1,114 @@
# Copyright 2024 The IREE Authors

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to add a CTS test for XRT HAL driver in this PR?

This not mentioned in PR description. What does the CTS testdo?

Refer to the PR description below:

Let's try this again: this is a HAL that skips/avoids/elides XRT completely. It does so by (essentially) going straight to the [XDNA driver's (kmq) shim](https://github.com/amd/xdna-driver/tree/main/src/shim) which talks directly to the kernel module. Note that XDNA's shim actually couples tightly to XRT itself so the bulk of the actual "work" here is/was decoupling. Thus, we vendor the decoupled shim. Just to be explicit/clear: after this PR, on Linux, the only thing needed in the environment is the [kernel module itself](https://github.com/amd/xdna-driver/tree/main/src/driver/amdxdna) - not XRT, nor XDNA's shim.

Call outs:

Only Linux for now (Windows soon);
On Linux, we no longer even need to clone XRT (completely really does mean completely);
We no longer use .xclbin as the artifact and now just load the .pdi directly;
This uses the so-called kernel-managed queue (kmq) path; user-managed queues (umq) aren't supported until aie4;
The current implementation here is still sync and still uses direct (rather than indirect) command buffers; this PR is already large/complex enough without changing the paradigms. The next PR will add/implement those;
Nothing about how drivers themselves are bumped needs to change - you/anyone can continue to install the XDNA drivers using their full instructions there and everything will work the same, i.e., the XRT pieces will still be skipped even if you have them installed. But I will probably put together a script that builds/installs only their kernel module for those that want to opt-out of XRT in their environment;
For now, the XRT HAL will still work on Linux but it should be considered deprecated (hence no longer tested in CI).
TODOs before landing:

 remove instructional comments leftover from the null HAL;
 undo "OO"-ness of the HAL
 add missing trace zones
 remove smart pointers where possible
 move hwq into kernel
 remove exceptions from the shim.
 parameterize number of cols/rows (currently hardcoded 4x4 for Phoenix);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants