
amd/IRON

🦾 - IRON: Unlocking the Full Potential of NPUs - 🦾

License: Apache. Code style: black.


IRON is an open-source, close-to-metal Python API for fast and efficient execution on AMD Ryzen™ AI NPUs. It is built on language bindings around the MLIR-AIE dialect.

The IRON Python API for Ryzen™ AI NPUs is described in the following paper:

E. Hunhoff, J. Melber, K. Denolf, A. Bisca, S. Bayliss, S. Neuendorffer, J. Fifield, J. Lo, P. Vasireddy, P. James-Roxby, E. Keller. "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". In 33rd IEEE International Symposium On Field-Programmable Custom Computing Machines, May 2025.

🎯 Operator Dashboard

Section | Description | Datatype | Status | Design Example
Element-wise Add | Element-wise addition kernel | bfloat16 | 🟢 | example/elementwise_add/
Element-wise Mul | Element-wise multiplication kernel | bfloat16 | 🟢 | example/elementwise_mul/
GEMM | General Matrix Multiplication kernel | bfloat16 | 🟢 | example/gemm/
GEMV | General Matrix-Vector Multiplication kernel | bfloat16 | 🟢 | example/gemv/
GQA | Grouped Query Attention kernel (single pipeline) | bfloat16 | 🟢 | example/mha/
MHA | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | 🟢 | example/mha/
RMSNorm | RMSNorm kernel | bfloat16 | 🟢 | example/rms_norm/
RoPE | Rotary Positional Embedding kernel | bfloat16 | 🟢 | example/rope/
SiLU | Sigmoid Linear Unit activation kernel | bfloat16 | 🟢 | example/silu/
Softmax | Softmax kernel | bfloat16 | 🟢 | example/softmax/
Weighted RMSNorm | Weighted RMSNorm kernel | bfloat16 | 🟢 | example/rms_norm/
Copy | Copy | bfloat16 | 🟢 | example/copy/
Transpose | Transpose | bfloat16 | 🟢 | example/transpose/
AXPY | AXPY | bfloat16 | 🟢 | example/axpy/
Reduction | Reduction | bfloat16 | 🟡 |
Dequant | Dequant Q4NX from AWQ to bfloat16 | bfloat16 | 🟢 | example/dequant/
RELU | RELU | bfloat16 | 🟢 | example/relu/
Leaky RELU | Leaky RELU | bfloat16 | |
GELU | GELU | bfloat16 | 🟢 | example/gelu/
LayerNorm | LayerNorm | bfloat16 | 🟢 | example/layer_norm/
Convolution | Convolution | bfloat16 | 🟡 |
MaxPool | MaxPool | bfloat16 | |
AveragePool | AveragePool | bfloat16 | |

Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.
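The dashboard only names each operator. When checking a kernel's output on the host, a plain-Python reference can help. The sketch below is hypothetical and not part of IRON: it implements SiLU, (Weighted) RMSNorm and AXPY in their textbook form and rounds every result to bfloat16 (round-to-nearest, ties to even). The RMSNorm epsilon of 1e-6 is an assumed value, not one taken from the IRON kernels.

```python
import math
import struct

# Hypothetical reference helpers, not part of IRON: plain-Python versions of a
# few dashboard operators, with results rounded to bfloat16.


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    bfloat16 is the upper 16 bits of an IEEE-754 float32 (1 sign bit,
    8 exponent bits, 7 mantissa bits). NaN handling is omitted for brevity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # A bias of 0x7FFF, plus 1 when the lowest kept bit is set, gives
    # round-to-nearest with ties going to the even neighbour.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bf16_bits = ((bits + bias) >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]


def silu(xs):
    """SiLU(x) = x * sigmoid(x), arranged so exp() never overflows."""
    out = []
    for x in xs:
        if x >= 0:
            y = x / (1.0 + math.exp(-x))
        else:
            e = math.exp(x)
            y = x * e / (1.0 + e)
        out.append(to_bfloat16(y))
    return out


def rms_norm(xs, weight=None, eps=1e-6):
    """RMSNorm: x / sqrt(mean(x^2) + eps), optionally scaled per element.

    Passing `weight` gives the "Weighted RMSNorm" variant. eps=1e-6 is an
    assumed default, not a value taken from IRON.
    """
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    weight = weight if weight is not None else [1.0] * len(xs)
    return [to_bfloat16(x / rms * w) for x, w in zip(xs, weight)]


def axpy(a, xs, ys):
    """AXPY: y <- a * x + y, element by element."""
    return [to_bfloat16(a * x + y) for x, y in zip(xs, ys)]
```

For example, axpy(2.0, [1.0, 2.0], [0.5, 0.5]) returns [2.5, 4.5], both of which are exactly representable in bfloat16. Comparisons against NPU output should still use a tolerance, since the kernels may accumulate in a different order or precision.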

📌 Legend

Status | Meaning
🟢 | Done
🟡 | In Development
 | Not Assigned

Installation (Linux)

These instructions walk you through everything needed to build and run a program on a Ryzen™ AI NPU, starting from a fresh, minimal Ubuntu 24.04 or Ubuntu 24.10 installation.

Initial Setup

Make sure your laptop or mini-PC runs the latest BIOS, which enables the NPU.

If you are starting from Ubuntu 24.04, you may need to update the Linux kernel to 6.11 or later by installing the Hardware Enablement (HWE) stack:

sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot
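To confirm the reboot picked up a new enough kernel, the version can be checked from Python. This is a minimal sketch, not part of IRON; the release string in the docstring is only an illustration of the usual `uname -r` format.

```python
import platform
import re

# Minimal sketch, not part of IRON: checks the running kernel against the
# 6.11+ requirement stated above.

MIN_KERNEL = (6, 11)


def kernel_version(release: str) -> tuple:
    """Parse (major, minor) from a release string such as '6.11.0-19-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    # Tuple comparison orders 6.8 below 6.11 correctly,
    # unlike a float or string comparison.
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    if platform.system() != "Linux":
        print("Not running on Linux; skipping the kernel check.")
    else:
        release = platform.release()  # same string that `uname -r` prints
        ok = kernel_is_supported(release)
        print(f"Kernel {release}: {'OK' if ok else 'older than 6.11, install the HWE stack'}")
```

Comparing (major, minor) tuples is the design choice that matters here: a naive float comparison would treat 6.8 as newer than 6.11.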
  1. Install the XDNA™ driver and XRT:

    Follow the instructions in the mlir-aie repository.

  2. Install the packages needed for IRON and MLIR-AIE:

    # Python versions 3.10, 3.12 and 3.13 are currently supported by our wheels
    sudo apt install \
    build-essential clang clang-14 lld lld-14 cmake ninja-build python3-venv python3-pip
  3. Set up a virtual environment and activate it:

    python3 -m venv ironenv
    source ironenv/bin/activate
    python3 -m pip install --upgrade pip
  4. Source XRT (installed in step 1):

    source /opt/xilinx/xrt/setup.sh
  5. Install required Python packages (from requirements.txt):

    MLIR_PYTHON_EXTRAS_SET_VERSION="0.0.8.3" HOST_MLIR_PYTHON_PACKAGE_PREFIX="aie" pip install -r requirements.txt
  6. To test your installation, build and run the following example:

    cmake -B build
    cmake --build build --target silu_1_cols_1_channels_2048_tile_2048_run

Note: On a fresh install, if CMake reports "CMake Error: Could not find CMAKE_ROOT !!!", deactivate and then reactivate your Python virtual environment.
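Before building, a quick preflight check can catch the most common setup slips from the steps above. The sketch below is hypothetical and not shipped with IRON. It assumes that sourcing /opt/xilinx/xrt/setup.sh exports the XILINX_XRT environment variable, and it only looks for a subset of the tools installed in step 2.

```python
import os
import shutil
import sys

# Hypothetical preflight check, not shipped with IRON. It ASSUMES that
# /opt/xilinx/xrt/setup.sh exports XILINX_XRT when sourced.

SUPPORTED_PYTHON = {(3, 10), (3, 12), (3, 13)}  # from the comment in step 2
REQUIRED_TOOLS = ["cmake", "ninja", "clang"]  # a subset of the step 2 packages


def preflight(version=None, in_venv=None, environ=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix does not.
    in_venv = (sys.prefix != sys.base_prefix) if in_venv is None else in_venv
    environ = os.environ if environ is None else environ

    problems = []
    if version not in SUPPORTED_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is not a supported wheel version")
    if not in_venv:
        problems.append("no virtual environment is active (step 3)")
    if "XILINX_XRT" not in environ:
        problems.append("XRT does not appear to be sourced (step 4)")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool!r} was not found on PATH (step 2)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARNING:", issue)
    print("Preflight OK" if not issues else f"{len(issues)} issue(s) found")
```

The keyword arguments exist only so the checks can be exercised without touching the real environment; run with no arguments, the script inspects the current interpreter, venv and PATH.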

Building & Testing

NOTE: Be sure the XRT setup script has been sourced: source /opt/xilinx/xrt/setup.sh

IRON is a CMake-based project. To configure the project, run:

cmake -B build

To build all designs, use:

cmake --build build

To test all designs, use the following Python script:

./scripts/run_tests.py --iter 1

You can select a single test to run using the --select flag.

When you run cmake -B build, each available target is listed in the following form:

Registering Executable: <TARGET_NAME>
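Because every target is announced with the same prefix, the list can be collected programmatically. This is a hypothetical helper, not part of IRON; it scans both output streams because the project's exact CMake message mode is not stated here.

```python
import re
import subprocess

# Hypothetical helper, not part of IRON: collects the target names that the
# configure step announces as "Registering Executable: <TARGET_NAME>".

TARGET_LINE = re.compile(r"Registering Executable:\s*(\S+)")


def parse_targets(cmake_output: str) -> list:
    """Return every target name found in the configure log, in order."""
    return TARGET_LINE.findall(cmake_output)


def list_targets(build_dir: str = "build") -> list:
    """Run `cmake -B <build_dir>` and return the registered targets."""
    result = subprocess.run(
        ["cmake", "-B", build_dir], capture_output=True, text=True, check=True
    )
    # CMake prints message(STATUS ...) to stdout but plain message(...) to
    # stderr, so search both streams.
    return parse_targets(result.stdout + result.stderr)
```

Calling list_targets() from the repository root returns the names accepted by cmake --build build --target.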

If you want to build only a specific design, run:

# Example: cmake --build build --target silu_4_cols_1_channels_2048_tile_512
cmake --build build --target <TARGET_NAME>

The same script can also run one or more individual tests:

./scripts/run_tests.py --select <TARGET_ONE> --select <TARGET_TWO>
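When tests are driven from another Python tool, the command line can be assembled as a list. The wrapper below is hypothetical and not part of IRON. It only uses the --iter and --select flags documented in this section, and assumes that both may appear in a single call.

```python
import subprocess
import sys

# Hypothetical wrapper, not part of IRON, around scripts/run_tests.py. It only
# uses the documented --iter and --select flags; combining them in one call
# is an ASSUMPTION.


def run_tests_command(targets=(), iterations=None, script="scripts/run_tests.py"):
    """Build the argument list for run_tests.py without executing it."""
    cmd = [sys.executable, script]  # the active (venv) interpreter
    if iterations is not None:
        cmd += ["--iter", str(iterations)]
    for target in targets:
        cmd += ["--select", target]  # one --select per test
    return cmd


def run_tests(targets=(), iterations=None):
    """Run the selected tests from the repository root; return the exit code."""
    return subprocess.run(run_tests_command(targets, iterations)).returncode
```

For example, run_tests(["silu_4_cols_1_channels_2048_tile_512"]) runs just that design's test. Invoking the script through sys.executable keeps it on the virtual environment's interpreter regardless of the file's shebang line.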

Each design also has a combined build-and-run target named <TARGET_NAME>_run:

cmake --build build --target silu_4_cols_1_channels_2048_tile_512_run
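The example target names appear to encode the design's parameters. The helper below is hypothetical and not part of IRON. Its pattern is inferred only from the example targets in this README, and the field names (in particular "size" for the value before "_tile") are our own guesses, so treat it as a sketch that may not match every design.

```python
import re

# Hypothetical helper, not part of IRON. The naming pattern is INFERRED from
# the example targets in this README (e.g. silu_4_cols_1_channels_2048_tile_512)
# and may not hold for every design; the field names are our own labels.

TARGET_RE = re.compile(
    r"^(?P<op>[a-z0-9_]+?)_(?P<cols>\d+)_cols_(?P<channels>\d+)_channels_"
    r"(?P<size>\d+)_tile_(?P<tile>\d+)$"
)


def describe_target(name: str) -> dict:
    """Split a target name into its operator and numeric parameters."""
    match = TARGET_RE.match(name)
    if match is None:
        raise ValueError(f"target does not follow the assumed pattern: {name!r}")
    fields = match.groupdict()
    op = fields.pop("op")
    return {"op": op, **{key: int(value) for key, value in fields.items()}}


def run_target(name: str) -> str:
    """Name of the combined build-and-run target for a design."""
    return f"{name}_run"
```

The lazy `+?` on the operator group lets multi-word operator names that contain underscores still split at the first "<digits>_cols" segment.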

Git Hooks (Optional but Recommended)

To ensure your code passes CI linting checks before pushing, install the pre-push hook:

cp scripts/hooks/pre-push .git/hooks/pre-push
chmod +x .git/hooks/pre-push

The hook runs the same linting checks as CI:

  • License checks (reuse)
  • Python formatting (black)
  • C++ formatting (clang-format)

To bypass the hook if needed: git push --no-verify
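For readers who prefer to see what such a hook does, here is a hypothetical Python version. It is NOT the contents of scripts/hooks/pre-push; it only runs the three tools listed above with their standard check-only invocations (reuse lint, black --check, clang-format --dry-run --Werror). The skipped directories and C/C++ file suffixes are assumptions.

```python
import subprocess
from pathlib import Path

# Hypothetical Python pre-push hook. It is NOT the contents of
# scripts/hooks/pre-push; it only runs the three tools listed above.

CPP_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".hpp"}  # assumed C/C++ extensions
SKIP_DIRS = {".git", "build", "ironenv"}  # build dir and venv from this README


def lint_commands(repo_root="."):
    """Build the lint commands (paths relative to repo_root) without running them."""
    root = Path(repo_root)
    cpp_files = sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix in CPP_SUFFIXES
        and not SKIP_DIRS.intersection(p.relative_to(root).parts)
    )
    commands = [
        ["reuse", "lint"],  # license and copyright headers
        ["black", "--check", "."],  # report Python formatting issues, no rewrite
    ]
    if cpp_files:
        # --dry-run reports diffs; --Werror turns them into a failing exit code.
        commands.append(["clang-format", "--dry-run", "--Werror", *cpp_files])
    return commands


def main(repo_root="."):
    """Run every check from repo_root; return 0 if all pass, 1 otherwise."""
    failed = []
    for cmd in lint_commands(repo_root):
        try:
            ok = subprocess.run(cmd, cwd=repo_root).returncode == 0
        except FileNotFoundError:
            ok = False  # the tool is not installed
        if not ok:
            failed.append(cmd[0])
    if failed:
        print("Lint checks failed:", ", ".join(failed))
        return 1
    return 0
```

Used as a hook, the file would end with sys.exit(main()) (after import sys), so that a non-zero return value makes Git abort the push.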


Copyright© 2025 Advanced Micro Devices, Inc
