IRON is an open-source, close-to-metal Python API that enables fast and efficient execution on AMD Ryzen™ AI NPUs. It relies on language bindings around the MLIR-AIE dialect.
The IRON Python API for Ryzen™ AI NPUs is described in the following paper:
E. Hunhoff, J. Melber, K. Denolf, A. Bisca, S. Bayliss, S. Neuendorffer, J. Fifield, J. Lo, P. Vasireddy, P. James-Roxby, E. Keller. "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". In 33rd IEEE International Symposium On Field-Programmable Custom Computing Machines, May 2025.
| Kernel | Description | Datatype | Status | Design Example |
|---|---|---|---|---|
| Element-wise Add | Element-wise addition kernel | bfloat16 | 🟢 | example/elementwise_add/ |
| Element-wise Mul | Element-wise multiplication kernel | bfloat16 | 🟢 | example/elementwise_mul/ |
| GEMM | General Matrix Multiplication kernel | bfloat16 | 🟢 | example/gemm/ |
| GEMV | General Matrix-Vector Multiplication kernel | bfloat16 | 🟢 | example/gemv/ |
| GQA | Grouped Query Attention kernel (single pipeline) | bfloat16 | 🟢 | example/mha/ |
| MHA | Multi-Head Attention and Grouped Query Attention kernel | bfloat16 | 🟢 | example/mha/ |
| RMSNorm | RMSNorm kernel | bfloat16 | 🟢 | example/rms_norm/ |
| RoPE | Rotary Positional Embedding kernel | bfloat16 | 🟢 | example/rope/ |
| SiLU | Sigmoid Linear Unit activation kernel | bfloat16 | 🟢 | example/silu/ |
| Softmax | Softmax kernel | bfloat16 | 🟢 | example/softmax/ |
| Weighted RMSNorm | Weighted RMSNorm kernel | bfloat16 | 🟢 | example/rms_norm/ |
| Copy | Copy kernel | bfloat16 | 🟢 | example/copy/ |
| Transpose | Transpose kernel | bfloat16 | 🟢 | example/transpose/ |
| AXPY | Scaled vector addition (a·x + y) kernel | bfloat16 | 🟢 | example/axpy/ |
| Reduction | Reduction kernel | bfloat16 | 🟡 | |
| Dequant | Dequantization from AWQ Q4NX to bfloat16 | bfloat16 | 🟢 | example/dequant/ |
| ReLU | Rectified Linear Unit activation kernel | bfloat16 | 🟢 | example/relu/ |
| Leaky ReLU | Leaky ReLU activation kernel | bfloat16 | ⚪ | |
| GELU | Gaussian Error Linear Unit activation kernel | bfloat16 | 🟢 | example/gelu/ |
| LayerNorm | Layer normalization kernel | bfloat16 | 🟢 | example/layer_norm/ |
| Convolution | Convolution kernel | bfloat16 | 🟡 | |
| MaxPool | Max pooling kernel | bfloat16 | ⚪ | |
| AveragePool | Average pooling kernel | bfloat16 | ⚪ | |
Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.
| Status | Meaning |
|---|---|
| 🟢 | Done |
| 🟡 | In Development |
| ⚪ | Not Assigned |
These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones Ubuntu 24.04 or Ubuntu 24.10 install.
Be sure your laptop or mini-PC is running the latest BIOS, which enables the NPU.
If you are starting from Ubuntu 24.04, you may need to update the Linux kernel to 6.11 or newer by installing the Hardware Enablement (HWE) stack:
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot
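After rebooting, you can confirm that the running kernel meets the 6.11+ requirement. This is a minimal sketch, not part of IRON: `version_ge` is a hypothetical helper, and it assumes GNU coreutils `sort` (for `-V` version ordering and `-C` order checking), which Ubuntu ships by default.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
    # sort -V -C succeeds only if its input is already in ascending version
    # order, i.e. if $2 <= $1.
    printf '%s\n' "$2" "$1" | sort -V -C
}

# Strip the distribution suffix, e.g. "6.11.0-29-generic" -> "6.11.0".
kernel="$(uname -r | cut -d- -f1)"

if version_ge "$kernel" 6.11; then
    echo "Kernel $kernel is 6.11 or newer."
else
    echo "Kernel $kernel is older than 6.11; install the HWE stack and reboot."
fi
```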
Install XDNA™ Driver and XRT:
Install the packages needed for IRON and MLIR-AIE (Python versions 3.10, 3.12, and 3.13 are currently supported by our wheels):
sudo apt install build-essential clang clang-14 lld lld-14 cmake ninja-build python3-venv python3-pip
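Before creating the virtual environment, you can check that `python3` is one of the versions the wheels support. This is a minimal sketch: `python_supported` is a hypothetical helper that simply hard-codes the list above, so it needs updating if the supported versions change.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a "major.minor" Python version is in the
# supported list given in this README (3.10, 3.12, 3.13).
python_supported() {
    case "$1" in
        3.10|3.12|3.13) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the interpreter that the virtual environment will be created from.
py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"

if python_supported "$py_version"; then
    echo "Python $py_version is supported."
else
    echo "Python $py_version is not in the supported list (3.10, 3.12, 3.13)."
fi
```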
Set up a virtual environment and activate it:
python3 -m venv ironenv
source ironenv/bin/activate
python3 -m pip install --upgrade pip
Source XRT (installed together with the XDNA™ driver above):
source /opt/xilinx/xrt/setup.sh
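To confirm that the XRT environment is active in the current shell, a check along these lines may help. It assumes, as in current XRT releases, that `setup.sh` exports `XILINX_XRT`; the `xrt_sourced` helper is hypothetical, and the optional `xrt-smi examine` call (XRT's device-query tool, used only if it is on the `PATH`) should list the NPU.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the XRT environment appears to be set up.
# Assumption: /opt/xilinx/xrt/setup.sh exports XILINX_XRT.
xrt_sourced() {
    [ -n "${XILINX_XRT:-}" ]
}

if xrt_sourced; then
    echo "XRT environment found at $XILINX_XRT"
    # Optionally list the devices XRT can see; the NPU should appear here.
    if command -v xrt-smi >/dev/null 2>&1; then
        xrt-smi examine
    fi
else
    echo "XRT is not sourced; run: source /opt/xilinx/xrt/setup.sh"
fi
```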
Install required Python packages (from requirements.txt):
MLIR_PYTHON_EXTRAS_SET_VERSION="0.0.8.3" HOST_MLIR_PYTHON_PACKAGE_PREFIX="aie" pip install -r requirements.txt
To test your installation, build and run the example below:
cmake -B build
cmake --build build --target silu_1_cols_1_channels_2048_tile_2048_run
Note: On a fresh install, if you get the error "CMake Error: Could not find CMAKE_ROOT !!!", deactivate and reactivate your Python environment (run deactivate, then source ironenv/bin/activate).
Note: Be sure the XRT setup script has been sourced:
source /opt/xilinx/xrt/setup.sh
IRON is a CMake-based project. To configure the project, run:
cmake -B build
To build all designs, use:
cmake --build build
To test all the designs, use the following Python script:
./scripts/run_tests.py --iter 1
You can select a single test to run using the --select flag. Targets are listed when running cmake -B build, with the following syntax:
Registering Executable: <TARGET_NAME>
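Because each target is announced with a "Registering Executable:" line, the configure output can be filtered to obtain target names and turned into `--select` arguments. This is a sketch under assumptions: `list_targets` and `to_select_args` are hypothetical helpers, target names are assumed to contain no whitespace, and the usage below merges stderr (`2>&1`) in case the messages are not printed to stdout.

```shell
#!/bin/sh
# Hypothetical helper: read CMake configure output on stdin and print one
# target name per line, taken from "Registering Executable: <TARGET_NAME>".
list_targets() {
    sed -n 's/^.*Registering Executable: *//p'
}

# Hypothetical helper: turn target names on stdin (one per line) into
# "--select <name>" arguments for ./scripts/run_tests.py.
to_select_args() {
    while IFS= read -r target; do
        printf '%s %s ' --select "$target"
    done
}
```

For example, to run only the SiLU tests: `./scripts/run_tests.py $(cmake -B build 2>&1 | list_targets | grep '^silu_' | to_select_args)`. The command substitution is left unquoted on purpose so that each `--select` and name become separate arguments.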
If you want to build only a specific design, run:
cmake --build build --target <TARGET_NAME>
For example:
cmake --build build --target silu_4_cols_1_channels_2048_tile_512
You can also run an individual test (or a selection of several tests) using the same script:
./scripts/run_tests.py --select <TARGET_ONE> --select <TARGET_TWO>
Additionally, a target that builds and runs a design is available under the name <TARGET_NAME>_run:
cmake --build build --target silu_4_cols_1_channels_2048_tile_512_run
To ensure your code passes the CI linting checks before pushing, install the pre-push hook:
cp scripts/hooks/pre-push .git/hooks/pre-push
chmod +x .git/hooks/pre-push
The hook runs the same linting checks as CI:
- License checks (reuse)
- Python formatting (black)
- C++ formatting (clang-format)
To bypass the hook if needed: git push --no-verify
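If you prefer to run the checks by hand, for example before committing, a rough equivalent is sketched below. This is an approximation, not the hook itself: the exact flags, paths, and tool versions used by scripts/hooks/pre-push are assumptions here, so treat that script as the source of truth. The sketch assumes `reuse`, `black`, and `clang-format` are installed and that it is run from the repository root.

```shell
#!/bin/sh
# Sketch of the CI lint checks (hypothetical helper; the real hook may differ).
lint_checks() {
    status=0
    # License and copyright headers, checked against the REUSE specification.
    reuse lint || status=1
    # Python formatting: report files black would change, without rewriting them.
    black --check . || status=1
    # C/C++ formatting of tracked sources: fail on any formatting difference.
    git ls-files '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' \
        | xargs -r clang-format --dry-run --Werror || status=1
    return "$status"
}
```

Running `lint_checks && git push` pushes only when every check passes; unlike `git push --no-verify`, it keeps the checks in the loop.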
Copyright © 2025 Advanced Micro Devices, Inc.
