# Understanding Targets

TileLang is built on top of TVM, which relies on **targets** to describe the device you want to compile for.
The target determines which code generator is used (CUDA, HIP, Metal, LLVM, …) and allows you to pass
device-specific options such as GPU architecture flags. This page summarises how to pick and customise a target
when compiling TileLang programs.

## Common target strings

TileLang ships with a small set of common targets; each accepts the full range of TVM options so you can fine-tune
the generated code. The most frequent choices are listed below:

| Base name | Description |
| --------- | ----------- |
| `auto` | Detects CUDA → HIP → Metal in that order. Useful when running the same script across machines. |
| `cuda` | NVIDIA GPUs. Supports options such as `-arch=sm_80`, `-max_num_threads=1024`, etc. |
| `hip` | AMD GPUs via ROCm. Options like `-mcpu=gfx90a` can be appended. |
| `metal` | Apple Silicon GPUs (arm64 Macs). |
| `llvm` | CPU execution; accepts the standard TVM LLVM switches. |
| `webgpu` | Browser / WebGPU runtimes. |
| `c` | Emit plain C source for inspection or custom toolchains. |

To add options, append them after the base name, separated by spaces. For example:

```python
import tilelang

target = "cuda -arch=sm_90"
kernel = tilelang.compile(func, target=target, execution_backend="cython")
# or
@tilelang.jit(target=target)
def compiled_kernel(*args):
    return func(*args)
```

The same convention works for HIP or LLVM (e.g. `hip -mcpu=gfx940`, `llvm -mtriple=x86_64-linux-gnu`).
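
Because targets are plain strings, composing them is a matter of joining a base name with option flags. The small helper below is hypothetical (not part of TileLang's API) and only illustrates the convention:

```python
def make_target(base: str, *options: str) -> str:
    """Join a base target name with zero or more option flags into a
    TVM-style target string. Illustrative helper only -- TileLang
    simply accepts the final string."""
    return " ".join([base, *options])

print(make_target("cuda", "-arch=sm_90"))                # cuda -arch=sm_90
print(make_target("llvm", "-mtriple=x86_64-linux-gnu"))  # llvm -mtriple=x86_64-linux-gnu
```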

### Advanced: Specify Exact Hardware

When you already know the precise GPU model, you can encode it in the target string, either via `-arch=sm_XX` or by
using one of TVM’s pre-defined target tags such as `nvidia/nvidia-h100`. Supplying this detail is optional for
everyday TileLang use, but it becomes valuable when the TVM cost model is enabled (e.g. during autotuning), because
the cost model uses the extra hardware attributes to make better scheduling predictions. If you skip this step (or
do not use the cost model), generic targets like `cuda` or `auto` are perfectly fine.

All CUDA compute capabilities recognised by TVM’s target registry are listed below. Pick the one that matches your
GPU and append it to the target string, or use the corresponding target tag, for example `nvidia/nvidia-a100`.

| Architecture | GPUs (examples) |
| ------------ | --------------- |
| `sm_20` | `nvidia/tesla-c2050`, `nvidia/tesla-c2070` |
| `sm_21` | `nvidia/nvs-5400m`, `nvidia/geforce-gt-520` |
| `sm_30` | `nvidia/quadro-k5000`, `nvidia/geforce-gtx-780m` |
| `sm_35` | `nvidia/tesla-k40`, `nvidia/quadro-k6000` |
| `sm_37` | `nvidia/tesla-k80` |
| `sm_50` | `nvidia/quadro-k2200`, `nvidia/geforce-gtx-950m` |
| `sm_52` | `nvidia/tesla-m40`, `nvidia/geforce-gtx-980` |
| `sm_53` | `nvidia/jetson-tx1`, `nvidia/jetson-nano` |
| `sm_60` | `nvidia/tesla-p100`, `nvidia/quadro-gp100` |
| `sm_61` | `nvidia/tesla-p4`, `nvidia/quadro-p6000`, `nvidia/geforce-gtx-1080` |
| `sm_62` | `nvidia/jetson-tx2` |
| `sm_70` | `nvidia/nvidia-v100`, `nvidia/quadro-gv100` |
| `sm_72` | `nvidia/jetson-agx-xavier` |
| `sm_75` | `nvidia/nvidia-t4`, `nvidia/quadro-rtx-8000`, `nvidia/geforce-rtx-2080` |
| `sm_80` | `nvidia/nvidia-a100`, `nvidia/nvidia-a30` |
| `sm_86` | `nvidia/nvidia-a40`, `nvidia/nvidia-a10`, `nvidia/geforce-rtx-3090` |
| `sm_87` | `nvidia/jetson-agx-orin-32gb`, `nvidia/jetson-agx-orin-64gb` |
| `sm_89` | `nvidia/geforce-rtx-4090` |
| `sm_90a` | `nvidia/nvidia-h100` (architecture-specific variant of `sm_90`) |
| `sm_100a` | `nvidia/nvidia-b100` |

Refer to NVIDIA’s [CUDA GPUs](https://developer.nvidia.com/cuda-gpus) page or the TVM source
(`3rdparty/tvm/src/target/tag.cc`) for the latest mapping between devices and compute capabilities.

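Architecture names follow a simple `sm_<major><minor>` pattern, so if you query the compute capability at runtime (e.g. via `torch.cuda.get_device_capability()` or `nvidia-smi`), converting it to an `-arch` flag is one line. A minimal sketch (the helper name is made up for illustration):

```python
def arch_flag(major: int, minor: int, suffix: str = "") -> str:
    """Turn a CUDA compute capability like (8, 0) into the -arch option
    used in a target string. `suffix` covers variants such as the "a"
    in sm_90a. Illustrative helper, not a TileLang API."""
    return f"-arch=sm_{major}{minor}{suffix}"

print("cuda " + arch_flag(8, 0))       # cuda -arch=sm_80
print("cuda " + arch_flag(9, 0, "a"))  # cuda -arch=sm_90a
```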

## Creating targets programmatically

If you prefer working with TVM’s `Target` objects, TileLang exposes the helper
`tilelang.utils.target.determine_target` (returns a canonical target string by default, or the `Target`
object when `return_object=True`):

```python
import tilelang
from tilelang.utils.target import determine_target

tvm_target = determine_target("cuda -arch=sm_80", return_object=True)
kernel = tilelang.compile(func, target=tvm_target)
```

You can also build targets directly through TVM:

```python
from tvm.target import Target

target = Target("cuda", host="llvm")
target = target.with_host(Target("llvm -mcpu=skylake"))
```

TileLang accepts either `str` or `Target` inputs; internally they are normalised and cached using the canonical
string representation. **In user code we strongly recommend passing target strings rather than
`tvm.target.Target` instances—strings keep cache keys compact and deterministic across runs, whereas constructing
fresh `Target` objects may lead to slightly higher hashing overhead or inconsistent identity semantics.**

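The caching recommendation can be illustrated with a toy compile cache (hypothetical code, not TileLang internals): keying on the canonical target string makes lookups deterministic across runs, because equal strings always hash equally.

```python
_cache: dict = {}

def cached_compile(func_name: str, target: str, compile_fn):
    """Toy compile cache keyed on (function name, target string).
    Hypothetical sketch -- TileLang's real cache is internal."""
    key = (func_name, target)  # plain strings give stable, hashable keys
    if key not in _cache:
        _cache[key] = compile_fn(func_name, target)
    return _cache[key]

calls = []
fake_compile = lambda f, t: calls.append((f, t)) or f"<kernel {f}@{t}>"
k1 = cached_compile("matmul", "cuda -arch=sm_80", fake_compile)
k2 = cached_compile("matmul", "cuda -arch=sm_80", fake_compile)
print(k1 == k2, len(calls))  # True 1  (second call hits the cache)
```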

## Discovering supported targets in code

Looking for a quick reminder of the built-in base names and their descriptions? Use:

```python
from tilelang.utils.target import describe_supported_targets

for name, doc in describe_supported_targets().items():
    print(f"{name:>6}: {doc}")
```

This helper mirrors the table above and is safe to call at runtime (for example when validating CLI arguments).

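CLI validation against such a mapping might look like the sketch below. A self-contained stand-in dictionary is used here instead of calling `describe_supported_targets`, so the contents are illustrative:

```python
# Stand-in for describe_supported_targets(), which returns a
# {base_name: description} mapping from TileLang.
SUPPORTED = {"auto": "...", "cuda": "...", "hip": "...", "metal": "...",
             "llvm": "...", "webgpu": "...", "c": "..."}

def validate_target_arg(arg: str) -> str:
    """Accept a target string like 'cuda -arch=sm_80' if its base name
    is known; raise ValueError otherwise. Illustrative sketch."""
    base = arg.split()[0]
    if base not in SUPPORTED:
        raise ValueError(f"unknown target '{base}', expected one of {sorted(SUPPORTED)}")
    return arg

print(validate_target_arg("cuda -arch=sm_80"))  # cuda -arch=sm_80
```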

## Troubleshooting tips

- If you see `Target cuda -arch=sm_80 is not supported`, double-check the spelling and make sure the option is
  valid for TVM. Any invalid switch will surface as a target-construction error.
- Runtime errors such as “no kernel image is available” usually mean the `-arch` flag does not match the GPU you
  are running on. Try dropping the flag or switching to the correct compute capability.
- When targeting multiple environments, use `auto` for convenience and override with an explicit string only when
  you need architecture-specific tuning.
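
A defensive pattern for the second tip is to retry without the architecture flag when compilation fails. The sketch below is generic; the `compile_fn` callback stands in for `tilelang.compile` and the pattern itself is not a TileLang API:

```python
def compile_with_fallback(compile_fn, func, target: str):
    """Try the fully specified target first; on failure, retry with
    just the base name (e.g. 'cuda -arch=sm_90' -> 'cuda').
    Illustrative pattern, not a TileLang API."""
    try:
        return compile_fn(func, target=target)
    except RuntimeError:
        base = target.split()[0]
        if base == target:
            raise  # nothing left to strip, re-raise the original error
        return compile_fn(func, target=base)

# Toy compile_fn that rejects an unknown arch to show the fallback path.
def fake_compile(func, target):
    if "sm_999" in target:
        raise RuntimeError("no kernel image is available")
    return f"<kernel for {target}>"

print(compile_with_fallback(fake_compile, None, "cuda -arch=sm_999"))  # <kernel for cuda>
```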