Direct Driver HAL #816

Merged
merged 35 commits into from
Oct 19, 2024
Changes from 33 commits
22ada20
[WIP] XRT-LITE HAL
makslevental Oct 2, 2024
5a17ba0
add linux kmq shim (doesn't build)
makslevental Oct 2, 2024
be04318
compiles
makslevental Oct 3, 2024
488b3d2
start from scratch
makslevental Oct 9, 2024
b972a98
bring shim back in-tree
makslevental Oct 10, 2024
12a9844
impl allocator
makslevental Oct 10, 2024
b953f91
buffer impl
makslevental Oct 10, 2024
d1072ee
executable acche impl
makslevental Oct 10, 2024
89a4a01
command buffer
makslevental Oct 11, 2024
f66ed50
e2e
makslevental Oct 11, 2024
1494af7
hack tests for xrt-lite
makslevental Oct 12, 2024
f34f5b2
remove xclbin (and XRT) dep
makslevental Oct 12, 2024
d9c83cc
removed non-load-bearing vtable functions
makslevental Oct 12, 2024
ec92dfb
put shim_debug behind define
makslevental Oct 12, 2024
63e917d
parameterize tests with device-hal
makslevental Oct 13, 2024
9310e77
make xrt-lite default
makslevental Oct 15, 2024
01787d8
remove null comments
makslevental Oct 15, 2024
a5e6785
refactor device.cc
makslevental Oct 15, 2024
8aaa1d2
fix iree-benchmark
makslevental Oct 15, 2024
976c418
undo OO
makslevental Oct 16, 2024
df0e66b
address comments
makslevental Oct 16, 2024
0045d4f
add missing trace zones
makslevental Oct 16, 2024
7356f7a
remove smart pointers
makslevental Oct 16, 2024
dc3d7b9
remove unnecessary sync to device
makslevental Oct 16, 2024
57e0f2c
undo reinterpret_cast
makslevental Oct 16, 2024
e27d2c0
remove exceptions
makslevental Oct 17, 2024
8bed954
parameterize n_rows, n_cols
makslevental Oct 17, 2024
1a3efc4
incorporate comments
makslevental Oct 18, 2024
514db1d
make xrt_lite_n_core_rows, xrt_lite_n_core_cols required
makslevental Oct 18, 2024
a224bcc
really make xrt-lite default and test building in "cleanroom"
makslevental Oct 18, 2024
db3cc6b
add iree-benchmark-module test
makslevental Oct 18, 2024
8974d20
remove WERROR hack and add `run_all_runtime_tests.sh`
makslevental Oct 18, 2024
05ddaec
Delete .github/workflows/ci-linux-cleanroom.yml
makslevental Oct 18, 2024
66db9ce
Update compiler/plugins/target/AMD-AIE/iree-amd-aie/Target/AIETarget.h
makslevental Oct 18, 2024
93a7ba5
Apply suggestions from code review
makslevental Oct 18, 2024
74 changes: 35 additions & 39 deletions .github/workflows/ci-linux.yml
@@ -49,13 +49,10 @@ jobs:
git remote add origin $REPO_ADDRESS
git -c protocol.version=2 fetch --depth 1 origin $BRANCH_NAME
git reset --hard FETCH_HEAD
git -c submodule."third_party/torch-mlir".update=none -c submodule."third_party/stablehlo".update=none -c submodule."src/runtime_src/core/common/aiebu".update=none submodule update --init --recursive --depth 1 --single-branch -j 10

- name: Install deps
run: |
dnf install -y almalinux-release-devel epel-release
yum remove -y openssl-devel zlib-devel || true
yum install -y protobuf-devel protobuf-compiler tmate
git -c submodule."third_party/torch-mlir".update=none \
-c submodule."third_party/stablehlo".update=none \
-c submodule."third_party/XRT".update=none \
Review comment (Collaborator, Author): look ma no XRT

submodule update --init --recursive --depth 1 --single-branch -j 10

- name: Python deps
run: |
@@ -69,6 +66,11 @@ jobs:
key: ${{ env.CACHE_KEY }}
restore-keys: linux-build-test-cpp-

- name: Peano dep
run: |
bash build_tools/download_peano.sh
echo "PEANO_INSTALL_DIR=$PWD/llvm-aie" >> $GITHUB_ENV

- name: Build packages
run: |
export cache_dir="${{ env.CACHE_DIR }}"
@@ -147,60 +149,54 @@ jobs:
source .venv/bin/activate
pip install -r tests/requirements.txt

- name: Query device info
run: |
source .venv/bin/activate
echo "aie-metadata"
python build_tools/ci/amdxdna_driver_utils/amdxdna_ioctl.py --aie-metadata
echo "aie-version"
python build_tools/ci/amdxdna_driver_utils/amdxdna_ioctl.py --aie-version
echo "XRT_LITE_N_CORE_ROWS=$(python build_tools/ci/amdxdna_driver_utils/amdxdna_ioctl.py --num-rows)" >> $GITHUB_ENV
echo "XRT_LITE_N_CORE_COLS=$(python build_tools/ci/amdxdna_driver_utils/amdxdna_ioctl.py --num-cols)" >> $GITHUB_ENV

- name : E2E comparison of AIE to llvm-cpu
run: |
source .venv/bin/activate
source /opt/xilinx/xrt/setup.sh
python build_tools/ci/cpu_comparison/run.py \
test_aie_vs_cpu \
$PWD/iree-install \
$PWD/llvm-aie \
--xrt-dir /opt/xilinx/xrt \
--vitis-dir /opt/Xilinx/Vitis/2024.2 \
--reset-npu-between-runs -v
--reset-npu-between-runs -v \
--xrt_lite_n_core_rows=$XRT_LITE_N_CORE_ROWS \
--xrt_lite_n_core_cols=$XRT_LITE_N_CORE_COLS

- name: E2E correctness matmul test
run: |
# Without this additional line an error like
#
# [XRT] ERROR: Failed to allocate host memory buffer (mmap(len=10616832, prot=3, flags=8193, offset=4294967296)
# failed (err=11): Resource temporarily unavailable), make sure host bank is enabled (see xbutil configure --host-mem)
# iree-amd-aie/runtime/src/iree-amd-aie/driver/xrt/direct_allocator.cc:179: RESOURCE_EXHAUSTED; could not allocate
# memory for buffer; while invoking C++ function matmul_test.generate_random_matrix; while calling import;
#
# might be observed when too much memory is allocated. This
# error was seen when running a bf16->f32 matmul with m=n=k=2304.
#
# This line was suggested at https://github.com/Xilinx/mlir-air/issues/566
#
# Note that this is only half of the fix. It is also necessary that
# the machine that CI is running on has permission to run this line.
#
# This permission can be added by adding the line
# ```
# %github ALL=(ALL) NOPASSWD: /usr/bin/prlimit *
# ```
#
# to the file /etc/sudoers.d/github, which can be done by running
# ```
# sudo visudo -f /etc/sudoers.d/github
# ```
# on the github CI machine.
# https://stackoverflow.com/a/17567422
# shim_xdna::bo::map_drm_bo does an mmap with MAP_LOCKED
# which can fail if the locked-memory limit is too low
sudo prlimit -lunlimited --pid $$
source .venv/bin/activate
source /opt/xilinx/xrt/setup.sh
bash build_tools/ci/run_matmul_test.sh \
test_matmuls \
iree-install \
$PWD/llvm-aie \
/opt/xilinx/xrt \
/opt/Xilinx/Vitis/2024.2

- name: Python tests
run: |
source .venv/bin/activate
source /opt/xilinx/xrt/setup.sh
pytest -v tests \
--capture=tee-sys \
--iree-install-dir=$PWD/iree-install \
--peano-install-dir=$PWD/llvm-aie
--peano-install-dir=$PWD/llvm-aie \
--xrt_lite_n_core_rows=$XRT_LITE_N_CORE_ROWS \
--xrt_lite_n_core_cols=$XRT_LITE_N_CORE_COLS

- name: XRT-LITE tests
run: |
DEVICE_TEST_DIR="$PWD/iree-install/device_tests"
for t in $(ls $DEVICE_TEST_DIR); do
$DEVICE_TEST_DIR/$t --xrt_lite_n_core_rows=$XRT_LITE_N_CORE_ROWS --xrt_lite_n_core_cols=$XRT_LITE_N_CORE_COLS
done
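
The `XRT-LITE tests` step above loops over every binary in `device_tests` and passes the core-grid flags to each one. A minimal sketch of the same idea as a reusable function follows, assuming bash; the function name `run_device_tests` is hypothetical and not part of this PR. Unlike the workflow loop, it globs instead of parsing `ls` output and keeps going after a failure so that all failing binaries are reported.

```shell
#!/usr/bin/env bash
# Sketch only: run_device_tests is a hypothetical helper, not part of the PR.
# The flag names come from the workflow step above.
run_device_tests() {
  local test_dir="$1" n_rows="$2" n_cols="$3"
  local failures=0 t
  # Glob instead of parsing `ls` so file names with spaces survive.
  for t in "$test_dir"/*; do
    # Skip anything that is not an executable regular file.
    [ -f "$t" ] && [ -x "$t" ] || continue
    echo "running $(basename "$t")"
    if ! "$t" --xrt_lite_n_core_rows="$n_rows" --xrt_lite_n_core_cols="$n_cols"; then
      echo "FAILED: $(basename "$t")" >&2
      failures=$((failures + 1))
    fi
  done
  # Succeed only if no test binary failed.
  [ "$failures" -eq 0 ]
}
```

In the workflow it could be called as `run_device_tests "$PWD/iree-install/device_tests" "$XRT_LITE_N_CORE_ROWS" "$XRT_LITE_N_CORE_COLS"`.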
5 changes: 4 additions & 1 deletion .github/workflows/ci-macos.yml
@@ -62,7 +62,10 @@ jobs:
git remote add origin $REPO_ADDRESS
git -c protocol.version=2 fetch --depth 1 origin $BRANCH_NAME
git reset --hard FETCH_HEAD
git -c submodule."third_party/torch-mlir".update=none -c submodule."third_party/stablehlo".update=none -c submodule."src/runtime_src/core/common/aiebu".update=none submodule update --init --recursive --depth 1 --single-branch -j 10
git -c submodule."third_party/torch-mlir".update=none \
-c submodule."third_party/stablehlo".update=none \
-c submodule."third_party/XRT".update=none \
submodule update --init --recursive --depth 1 --single-branch -j 10

- uses: actions/setup-python@v4
with:
20 changes: 15 additions & 5 deletions .github/workflows/ci-windows.yml
@@ -60,7 +60,10 @@ jobs:
git remote add origin $REPO_ADDRESS
git -c protocol.version=2 fetch --depth 1 origin $BRANCH_NAME
git reset --hard FETCH_HEAD
git -c submodule."third_party/torch-mlir".update=none -c submodule."third_party/stablehlo".update=none -c submodule."src/runtime_src/core/common/aiebu".update=none submodule update --init --recursive --depth 1 --single-branch -j 10
git -c submodule."third_party/torch-mlir".update=none \
-c submodule."third_party/stablehlo".update=none \
-c submodule."src/runtime_src/core/common/aiebu".update=none \
submodule update --init --recursive --depth 1 --single-branch -j 10

- name: Setup Cpp
uses: aminya/setup-cpp@v1
@@ -87,14 +90,18 @@ jobs:
key: ${{ env.CACHE_KEY }}
restore-keys: windows-build-test-cpp-

- name: Peano dep
run: |
.\build_tools\download_peano.ps1
Add-Content -Path $env:GITHUB_ENV -Value "PEANO_INSTALL_DIR=$PWD\llvm-aie"

- name: Build packages
run: |
$env:cache_dir = "${{ env.CACHE_DIR }}"
$env:CCACHE_COMPILERCHECK = "string:$(clang-cl.exe --version)"
.\build_tools\build_llvm.ps1
# Remove-Item -Path "$pwd\llvm-build" -Force
$env:llvm_install_dir = "$pwd\llvm-install"
echo $env:llvm_install_dir
.\build_tools\build_test_cpp.ps1

- name: Create artifacts
@@ -170,6 +177,7 @@ jobs:
shell: bash
run: |
source .venv/Scripts/activate
export DEVICE_HAL=xrt
bash build_tools/ci/run_matmul_test.sh \
/c/test_matmuls \
$PWD/iree-install \
@@ -182,7 +190,8 @@ jobs:
python build_tools/ci/cpu_comparison/run.py \
/c/test_aie_vs_cpu \
$PWD/iree-install \
$PWD/llvm-aie -v
$PWD/llvm-aie -v \
--device-hal=xrt

- name: Python tests
run: |
@@ -191,5 +200,6 @@ jobs:
mkdir temp
pytest tests -sv `
--basetemp=$PWD\temp `
--iree-install-dir="$PWD/iree-install" `
--peano-install-dir="$PWD/llvm-aie"
--iree-install-dir="$PWD\iree-install" `
--peano-install-dir="$PWD\llvm-aie" `
--device-hal=xrt
125 changes: 54 additions & 71 deletions README.md
@@ -4,12 +4,7 @@

# AMD AIE Plugin for IREE

This repository contains an early-phase IREE compiler and runtime plugin for interfacing the AMD AIE accelerator to IREE.

## Architectural Overview

![image](https://github.com/nod-ai/iree-amd-aie/assets/74956/3fa73139-5fdf-4658-86c3-0705352c4ea0)

This repository contains an early-phase IREE compiler and runtime plugin for targeting AMD NPUs with IREE.

## Developer Setup

@@ -26,32 +21,30 @@ git clone --recursive git@github.com:nod-ai/iree-amd-aie.git
git clone --recursive https://github.com/nod-ai/iree-amd-aie.git
```

or if you want a faster checkout

or, if you want a faster checkout,

```
git \
-c submodule."third_party/torch-mlir".update=none \
-c submodule."third_party/stablehlo".update=none \
-c submodule."src/runtime_src/core/common/aiebu".update=none \
-c submodule."third_party/XRT".update=none \
clone \
--recursive \
--shallow-submodules \
https://github.com/nod-ai/iree-amd-aie.git
git@github.com:nod-ai/iree-amd-aie.git # https://github.com/nod-ai/iree-amd-aie.git
```

The above avoids cloning entire repo histories, and skips unused nested submodules.
The above avoids cloning entire repo histories for submodules, and skips a few currently unused
submodules that are nested in IREE.

## Building (along with IREE)

### Just show me the CMake

To configure and build with XRT runtime enabled

```
cd iree-amd-aie
cmake \
-B $WHERE_YOU_WOULD_LIKE_TO_BUILD \
-B <WHERE_YOU_WOULD_LIKE_TO_BUILD> \
-S third_party/iree \
-DIREE_CMAKE_PLUGIN_PATHS=$PWD \
-DIREE_BUILD_PYTHON_BINDINGS=ON \
@@ -62,20 +55,20 @@ cmake \
-DIREE_TARGET_BACKEND_DEFAULTS=OFF \
-DIREE_TARGET_BACKEND_LLVM_CPU=ON \
-DIREE_BUILD_TESTS=ON \
-DIREE_EXTERNAL_HAL_DRIVERS=xrt \
-DCMAKE_INSTALL_PREFIX=$WHERE_YOU_WOULD_LIKE_TO_INSTALL
cmake --build $WHERE_YOU_WOULD_LIKE_TO_BUILD
-DIREE_EXTERNAL_HAL_DRIVERS=xrt-lite \
-DCMAKE_INSTALL_PREFIX=<WHERE_YOU_WOULD_LIKE_TO_INSTALL>
cmake --build <WHERE_YOU_WOULD_LIKE_TO_BUILD>
```

### Instructions

The bare minimum configure command for IREE with the amd-aie plugin
The bare minimum configure command for IREE with the amd-aie plugin

```
cmake \
-B $WHERE_YOU_WOULD_LIKE_TO_BUILD \
-S $IREE_REPO_SRC_DIR \
-DIREE_CMAKE_PLUGIN_PATHS=$IREE_AMD_AIE_REPO_SRC_DIR \
-B <WHERE_YOU_WOULD_LIKE_TO_BUILD> \
-S <IREE_REPO_SRC_DIR> \
-DIREE_CMAKE_PLUGIN_PATHS=<IREE_AMD_AIE_REPO_SRC_DIR> \
-DIREE_BUILD_PYTHON_BINDINGS=ON
```

@@ -88,7 +81,8 @@ Very likely, you will want to use `ccache` and `lld` (or some other modern linke
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=lld"
```

If you don't plan on using any of IREE's frontends or backends/targets (e.g., you're doing work on this code base itself), you can opt-out of everything (except the `llvm-cpu` backend) with
If you don't plan on using any of IREE's frontends or backends/targets (e.g., you're doing work on this code base itself),
you can opt out of everything (except the `llvm-cpu` backend) with

```
-DIREE_INPUT_STABLEHLO=OFF \
@@ -111,75 +105,64 @@ If you're "bringing your own LLVM", i.e., you have a prebuilt/compiled distribut
-DIREE_BUILD_BUNDLED_LLVM=OFF
```

In this case you will need to supply `-DLLVM_EXTERNAL_LIT=$SOMEWHERE` (e.g., `pip install lit; SOMEWHERE=$(which lit)`).
In this case you will need `lit` somewhere in your environment, and you will need to pass `-DLLVM_EXTERNAL_LIT=<SOMEWHERE>` to CMake
(e.g., `pip install lit; SOMEWHERE=$(which lit)`).
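
As a minimal sketch of the `lit` lookup described above, assuming bash and that `lit` has already been installed (for example with `pip install lit`), the helper below resolves `lit` on `PATH` and prints the matching CMake option. The function name `llvm_external_lit_flag` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: llvm_external_lit_flag is a hypothetical helper name.
llvm_external_lit_flag() {
  local lit_path
  # `command -v` prints the resolved path and fails if lit is not on PATH.
  if ! lit_path="$(command -v lit)"; then
    echo "lit not found on PATH; try: pip install lit" >&2
    return 1
  fi
  # `--` stops printf from treating the leading -D as an option.
  printf -- '-DLLVM_EXTERNAL_LIT=%s\n' "$lit_path"
}
```

The output can then be appended to a configure command that also sets `-DIREE_BUILD_BUNDLED_LLVM=OFF`, e.g. `"$(llvm_external_lit_flag)"`.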

Note, getting the right/matching build of LLVM, that works with IREE is tough (besides the commit hash, there are various flags to set).
To enable adventurous users to avail themselves of `-DIREE_BUILD_BUNDLED_LLVM=OFF` we cache/store/save the LLVM distribution for every successful CI run.
These can then be downloaded by checking the artifacts section of any recent CI run's [Summary page](https://github.com/nod-ai/iree-amd-aie/actions/runs/10713474448):
See [Bringing your own LLVM](#bringing-your-own-llvm) below for more information on using prebuilt/compiled distributions of LLVM.

<p align="center">
<img src="https://github.com/user-attachments/assets/97fdeff2-41af-4a6d-a072-6ef0a1ec5695" width="500">
</p>
## Testing

Lit tests specific to AIE can be run with something like
Lit tests (i.e., compiler tests) specific to AIE can be run with something like

```
cd $WHERE_YOU_WOULD_LIKE_TO_BUILD
ctest -R amd-aie
cd <WHERE_YOU_WOULD_LIKE_TO_BUILD>
ctest -R amd-aie --output-on-failure -j 10
```

Other tests which run on hardware and requiring XRT are in the `build_tools` subdirectory.

## Runtime driver setup
(the `-j 10` runs `10` tests in parallel)

To enable the runtime driver, you need to also enable the XRT HAL
Other tests, which run on device, are in the `build_tools` subdirectory.
See [build_tools/ci/run_all_runtime_tests.sh](build_tools/ci/run_all_runtime_tests.sh) for an example script that shows how to run all the runtime tests.

```
-DIREE_EXTERNAL_HAL_DRIVERS=xrt
```
## Pro-tips

Additional IREE-specific flags are explained at [IREE's build instructions](https://iree.dev/building-from-source/getting-started/#quickstart-clone-and-build). To use Ninja instead of Make, and clang++ instead of g++, you can add
### Bringing your own LLVM

When using a pre-built distribution of LLVM, getting the right/matching build that works with IREE is tough (besides the commit hash, there are various flags to set).
To enable adventurous users to avail themselves of `-DIREE_BUILD_BUNDLED_LLVM=OFF` we cache/store/save the LLVM distribution for every successful CI run.
These can then be downloaded by checking the artifacts section of any recent CI run's [Summary page](https://github.com/nod-ai/iree-amd-aie/actions/runs/10713474448):

```
-G Ninja \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_C_COMPILER=clang
```
<p align="center">
<img src="https://github.com/user-attachments/assets/97fdeff2-41af-4a6d-a072-6ef0a1ec5695" width="500">
</p>


### Ubuntu Dependencies
### Debugging HAL

XRT requires a number of packages. Here are the requirements for various operating systems
You can turn on HAL API tracing by adding to CMake:

```
apt install \
libcurl4-openssl-dev \
libdrm-dev \
libelf-dev \
libprotobuf-dev \
libudev-dev \
pkg-config \
protobuf-compiler \
python3-pybind11 \
systemtap-sdt-dev \
uuid-dev
-DIREE_ENABLE_RUNTIME_TRACING=ON
-DIREE_TRACING_PROVIDER=console
// optional but recommended
-DIREE_TRACING_CONSOLE_FLUSH=1
```

### RH Based Deps
This will show you all the HAL APIs that have `IREE_TRACE_ZONE_BEGIN ... IREE_TRACE_ZONE_END` and that are hit during a run/execution (of, e.g., `iree-run-module`).
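
The HAL tracing options above can be collected into a bash array and appended to the configure command from "Just show me the CMake". The following is a sketch under stated assumptions: `BUILD_DIR` is a hypothetical placeholder for your build directory, and the command is only echoed (a dry run) so it can be inspected before use.

```shell
#!/usr/bin/env bash
# Sketch only: BUILD_DIR is a hypothetical placeholder, defaulting to ./build.
BUILD_DIR="${BUILD_DIR:-build}"

# HAL tracing options as listed in the README text above.
HAL_TRACING_FLAGS=(
  -DIREE_ENABLE_RUNTIME_TRACING=ON
  -DIREE_TRACING_PROVIDER=console
  # optional but recommended
  -DIREE_TRACING_CONSOLE_FLUSH=1
)

# Dry run: print the configure command rather than executing it.
echo cmake -B "$BUILD_DIR" -S third_party/iree \
  -DIREE_CMAKE_PLUGIN_PATHS="$PWD" \
  "${HAL_TRACING_FLAGS[@]}"
```

Dropping the leading `echo` runs the configure step for real; the remaining options from the full configure command still need to be added.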

This is an incomplete list derived by adding what is needed to our development base manylinux (AlmaLinux 8) image.
You can turn on VM tracing by adding to CMake:

```
yum install \
libcurl-devel \
libdrm-devel \
libudev-devel \
libuuid-devel \
ncurses-devel \
pkgconfig \
protobuf-compiler \
protobuf-devel \
systemtap-sdt-devel \
uuid-devel
-DIREE_VM_EXECUTION_TRACING_ENABLE=1
-DIREE_VM_EXECUTION_TRACING_FORCE_ENABLE=1
// optional
-DIREE_VM_EXECUTION_TRACING_SRC_LOC_ENABLE=1
```

This will show you all of the [VM dispatches](https://github.com/iree-org/iree/blob/0e8a5737dfe49a48a4e9c15ba7a7d24dd2fd7623/runtime/src/iree/vm/bytecode/dispatch.c#L661) that actually occur during a run/execution.
Note, this is roughly equivalent to [passing](https://github.com/nod-ai/iree-amd-aie/blob/737092791dc2428ad71bc172f69804c583b0f60e/build_tools/ci/run_matmul_test.sh#L420) `--compile-to=vm` to `iree-compile`.

## Architectural overview (out of date)

![image](https://github.com/nod-ai/iree-amd-aie/assets/74956/3fa73139-5fdf-4658-86c3-0705352c4ea0)
