Skip to content

Commit b4e7379

Browse files
Update docs
1 parent 68de01b commit b4e7379

File tree

172 files changed

+1653
-158
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

172 files changed

+1653
-158
lines changed

_sources/autoapi/tilelang/carver/arch/cdna/index.rst.txt

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -30,9 +30,6 @@ Module Contents
3030
Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
3131

3232

33-
Represents the architecture of a computing device, capturing various hardware specifications.
34-
35-
3633
.. py:attribute:: target
3734
3835

_sources/autoapi/tilelang/carver/arch/cuda/index.rst.txt

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -48,9 +48,6 @@ Module Contents
4848
Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
4949

5050

51-
Represents the architecture of a computing device, capturing various hardware specifications.
52-
53-
5451
.. py:attribute:: target
5552
5653

_sources/autoapi/tilelang/carver/arch/metal/index.rst.txt

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -30,9 +30,6 @@ Module Contents
3030
Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
3131

3232

33-
Represents the architecture of a computing device, capturing various hardware specifications.
34-
35-
3633
.. py:attribute:: target
3734
3835

_sources/autoapi/tilelang/carver/roller/hint/index.rst.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -307,12 +307,12 @@ Module Contents
307307
308308
309309
.. py:property:: raxis_order
310-
:type: tilelang.carver.roller.rasterization.List[int]
310+
:type: List[int]
311311

312312

313313

314314
.. py:property:: step
315-
:type: tilelang.carver.roller.rasterization.List[int]
315+
:type: List[int]
316316

317317

318318

_sources/autoapi/tilelang/carver/roller/policy/tensorcore/index.rst.txt

Lines changed: 1 addition & 56 deletions
Original file line numberDiff line numberDiff line change
@@ -30,15 +30,11 @@ Module Contents
3030

3131
.. py:data:: logger
3232
33-
.. py:class:: TensorCorePolicy(arch, tags = None)
33+
.. py:class:: TensorCorePolicy
3434
3535
Bases: :py:obj:`tilelang.carver.roller.policy.default.DefaultPolicy`
3636

3737

38-
Default Policy for fastdlight, a heuristic plan that tries to
39-
minimize memory traffic and maximize parallelism.for BitBLAS Schedule.
40-
41-
4238
.. py:attribute:: wmma_k
4339
:type: int
4440
:value: 16
@@ -65,67 +61,16 @@ Module Contents
6561

6662
.. py:method:: infer_node_smem_usage(td, node)
6763
68-
Infers the shared memory usage of a node given a TileDict configuration.
69-
70-
:param td: The TileDict object containing the tile configuration.
71-
:type td: TileDict
72-
:param node: The node for which to infer the shared memory usage.
73-
:type node: PrimFuncNode
74-
75-
:returns: The estimated amount of shared memory used by the node.
76-
:rtype: int
77-
78-
7964
8065
.. py:method:: get_node_reduce_step_candidates(node)
8166
82-
Calculates reduction step candidates for each reduction axis in a PrimFuncNode. General idea : use factor first, since it does not require extra boundary check. for large prime number, which is rare case, use power of 2.
83-
84-
:param node: The node for which to calculate reduction step candidates. It contains reduction axes (raxis)
85-
with their domains (dom.extent).
86-
:type node: PrimFuncNode
87-
88-
:returns: A dictionary mapping axis variable names to lists of step candidates. For each axis in the node,
89-
this function calculates possible step sizes. For axes with a large prime domain, it uses powers of 2
90-
as step candidates; for others, it uses all factors of the domain.
91-
:rtype: Dict[str, List[int]]
92-
93-
9467
9568
.. py:method:: check_tile_shape_isvalid(td)
9669
97-
Checks if the tile shapes in the TileDict are valid for the nodes in this context.
98-
99-
Parameters:
100-
- td (TileDict): The TileDict object containing tile shapes and other configurations.
101-
102-
Returns:
103-
- bool: True if all tile shapes are valid, False otherwise.
104-
105-
10670
10771
.. py:method:: compute_node_stride_map(node, td)
10872
109-
Computes the stride map for a given node based on the TileDict configuration.
110-
111-
:param node: The node for which to compute the stride map.
112-
:type node: PrimFuncNode
113-
:param td: The TileDict object containing the tile configuration.
114-
:type td: TileDict
115-
116-
:returns: A tuple of dictionaries containing the output strides and tensor strides.
117-
:rtype: Tuple[Dict, Dict]
118-
119-
12073
12174
.. py:method:: plan_rasterization(td)
12275
123-
Plans the rasterization for the given TileDict. This function is not implemented yet.
124-
125-
:param td: The TileDict object to plan rasterization for.
126-
:type td: TileDict
127-
128-
:raises RasterRationPlan: This function is not implemented yet.
129-
130-
13176

_sources/autoapi/tilelang/carver/template/general_reduce/index.rst.txt

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,11 @@ Module Contents
2020
Bases: :py:obj:`tilelang.carver.template.base.BaseTemplate`
2121

2222

23+
Base class template for hardware-aware configurations.
24+
This serves as an abstract base class (ABC) that defines the structure
25+
for subclasses implementing hardware-specific optimizations.
26+
27+
2328
.. py:attribute:: structure
2429
:type: Union[str, List[str]]
2530
:value: None
@@ -40,6 +45,19 @@ Module Contents
4045

4146
.. py:method:: get_hardware_aware_configs(arch = None, topk = 10)
4247
48+
Abstract method that must be implemented by subclasses.
49+
It should return a list of hardware-aware configurations (hints)
50+
based on the specified architecture.
51+
52+
:param arch: The target architecture. Defaults to None.
53+
:type arch: TileDevice, optional
54+
:param topk: Number of top configurations to return. Defaults to 10.
55+
:type topk: int, optional
56+
57+
:returns: A list of recommended hardware-aware configurations.
58+
:rtype: List[Hint]
59+
60+
4361

4462
.. py:method:: initialize_function()
4563

_sources/autoapi/tilelang/engine/phase/index.rst.txt

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ Functions
1515
tilelang.engine.phase.allow_vectorize
1616
tilelang.engine.phase.allow_global_thread_synchronization
1717
tilelang.engine.phase.should_enable_aggressive_merge
18+
tilelang.engine.phase.should_force_let_inline
1819
tilelang.engine.phase.LowerAndLegalize
1920
tilelang.engine.phase.OptimizeForTarget
2021

@@ -34,6 +35,8 @@ Module Contents
3435
3536
.. py:function:: should_enable_aggressive_merge(pass_ctx = None, target = None)
3637
38+
.. py:function:: should_force_let_inline(pass_ctx = None)
39+
3740
.. py:function:: LowerAndLegalize(mod, target)
3841
3942
Bind target information and progressively legalize and lower frontend Tile IR into a form suitable for downstream optimization and codegen.

_sources/autoapi/tilelang/transform/pass_config/index.rst.txt

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -179,6 +179,15 @@ Module Contents
179179
synchronization is not needed. Default: False
180180

181181

182+
.. py:attribute:: TL_FORCE_LET_INLINE
183+
:value: 'tl.force_let_inline'
184+
185+
186+
False
187+
188+
:type: Force TileLang to inline let bindings during simplification. Default
189+
190+
182191
.. py:attribute:: TIR_ENABLE_EQUIV_TERMS_IN_CSE
183192
:value: 'tir.enable_equiv_terms_in_cse_tir'
184193

Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
# LetStmt Inlining in TileLang
2+
3+
This document explains how `LetStmt` inlining works in TileLang's simplification pipeline, which is an important optimization that affects code generation and performance.
4+
5+
## Overview
6+
7+
A `LetStmt` (Let Statement) is a temporary variable binding in the IR (Intermediate Representation). During compilation, TileLang's simplifier may choose to inline these temporary variables to simplify the code. TileLang also provides a standalone `LetInline` pass that performs eager substitution before the main legalization pipeline. However, not all `LetStmt` nodes can be safely inlined.
8+
9+
## When Does LetStmt Get Inlined?
10+
11+
The inlining logic is implemented in `src/transform/simplify.cc`. A `LetStmt` will be inlined if **both** of the following conditions are met:
12+
13+
### 1. The value satisfies `CanInlineLetStmt`
14+
15+
The `CanInlineLetStmt` helper returns `true` when:
16+
17+
- **The value is a constant** (`is_const_number(op->value)` returns true)
18+
- **The value is a variable** (`op->value.as<VarNode>()` returns a node)
19+
- **The value is an integer expression without side effects**:
20+
- The value has `int` dtype
21+
- The side effect level is `kPure` or lower (no observable side effects)
22+
23+
```cpp
24+
bool CanInlineLetStmt(const LetStmtNode *op) {
25+
if (is_const_number(op->value))
26+
return true;
27+
if (op->value.as<VarNode>())
28+
return true;
29+
// Won't face the deep expression explosion problem as in Let expression.
30+
// attempt to inline as much as possible if the value integer type(can be
31+
// index).
32+
if (!op->value.dtype().is_int())
33+
return false;
34+
return SideEffect(op->value) <= CallEffectKind::kPure;
35+
}
36+
```
37+
38+
### 2. The variable is NOT used in buffer definitions
39+
40+
Even if `CanInlineLetStmt` returns true, the variable will **not** be inlined if it's used in a buffer's definition (shape, strides, elem_offset, or data fields).
41+
42+
This protection exists because:
43+
- Buffer definitions are not updated during the simplification pass
44+
- If a variable used in a buffer definition is inlined, later references to that buffer would fail to find the variable definition
45+
- This would cause compilation errors or incorrect behavior
46+
47+
The mutator checks this before dropping the binding:
48+
49+
```cpp
50+
bool used_in_buffer_def = used_in_buffer_def_.count(op->var.get());
51+
52+
if (can_inline && !used_in_buffer_def) {
53+
return body; // Inline: remove LetStmt and return body directly
54+
}
55+
```
56+
57+
## Example: Why Buffer Definition Variables Are Protected
58+
59+
Consider this code:
60+
61+
```python
62+
let stride = M * 16
63+
let buffer_a = Buffer(data, shape=[M, N], strides=[stride, 1])
64+
buffer_a[i, j] = ...
65+
```
66+
67+
- `stride` satisfies `CanInlineLetStmt` (it's an int expression with no side effects)
68+
- However, `stride` is used in `buffer_a`'s `strides` field
69+
- If we inline it, the buffer definition becomes `strides=[M*16, 1]`
70+
- But the Buffer object's fields are not updated during simplification
71+
- Later code accessing `buffer_a` would fail to find the `stride` variable
72+
73+
Therefore, `stride` is added to `used_in_buffer_def_` and will **not** be inlined.
74+
75+
## How Variables Are Collected
76+
77+
The `CollectVarsUsedInBufferDefinition` helper traverses all `BufferLoad` and `BufferStore` nodes and collects variables used in their buffer definitions:
78+
79+
```cpp
80+
void VisitBuffer(const Buffer &buf) {
81+
// Collect variables that should remain defined
82+
VarUseDefAnalyzer usage(Array<Var>{});
83+
usage(buf->data);
84+
for (const auto &dim : buf->shape) {
85+
usage(dim);
86+
}
87+
for (const auto &dim : buf->strides) {
88+
usage(dim);
89+
}
90+
usage(buf->elem_offset);
91+
92+
// Track for use in LetStmtNode mutator
93+
for (const auto &var : usage.undefined_) {
94+
used_in_buffer_def_.insert(var.get());
95+
}
96+
}
97+
```
98+
99+
## Practical Example: Temporary Variable Issue
100+
101+
Consider this TileLang code:
102+
103+
```python
104+
for i in T.Parallel(block_N):
105+
idx = bx * block_N + i
106+
tmp = T.max(A[idx], 1)
107+
B[idx] = tmp / 2
108+
A[idx] = tmp * 2
109+
```
110+
111+
In this case:
112+
- `tmp` is an integer-like temporary variable
113+
- It satisfies `CanInlineLetStmt` (pure int expression)
114+
- It's **not** used in any buffer definition
115+
- Therefore, `tmp` **will be inlined**
116+
117+
This means the IR becomes:
118+
119+
```python
120+
for i in T.Parallel(block_N):
121+
idx = bx * block_N + i
122+
B[idx] = T.max(A[idx], 1) / 2
123+
A[idx] = T.max(A[idx], 1) * 2
124+
```
125+
126+
If this causes issues (e.g., `A[idx]` being read twice with different values due to the first write), it indicates a potential problem with the inlining heuristic or the code pattern.
127+
128+
## Controlling Let Inlining via Pass Config
129+
130+
TileLang exposes an explicit pass configuration key, `tilelang.PassConfigKey.TL_FORCE_LET_INLINE` (`"tl.force_let_inline"`), that allows users to force the eager `LetInline` pass to run before the legalization pipeline begins. When enabled, the pipeline invokes `tilelang.transform.LetInline()` at the start of `LowerAndLegalize` (see `tilelang/engine/phase.py`). This knob is useful when debugging LetStmt-related issues or when deterministic inlining behavior is desired across different environments.
131+
132+
```python
133+
from tilelang import transform
134+
from tilelang.engine.phase import LowerAndLegalize
135+
136+
with transform.PassContext(
137+
config={transform.PassConfigKey.TL_FORCE_LET_INLINE: True}
138+
):
139+
lowered_mod = LowerAndLegalize(input_mod, target)
140+
```
141+
142+
If the flag is left unset (the default), the eager pass is only applied when downstream transforms opt in (for example, by calling `_Simplify(..., inline_let=True)` inside Tile operators). The guard in `tilelang/engine/phase.py` ensures the eager pass is only triggered when the user explicitly requests it.
143+
144+
## Summary
145+
146+
The LetStmt inlining mechanism is a **conservative optimization** that:
147+
1. Aggressively inlines simple, pure integer expressions to simplify the IR
148+
2. Protects variables used in buffer definitions to avoid breaking buffer access
149+
3. Helps reduce IR complexity and improve code generation
150+
4. Can be forced through `TL_FORCE_LET_INLINE` when deterministic eager inlining is required
151+
152+
Understanding when inlining happens is crucial for:
153+
- Debugging compilation issues
154+
- Understanding generated code
155+
- Writing efficient TileLang programs
156+
- Identifying potential optimization opportunities or bugs
157+
158+
## Related Files
159+
160+
- `src/transform/simplify.cc`: Main Simplify implementation
161+
- `src/transform/frontend_legalize.cc`: Standalone LetInline pass
162+
- `tilelang/engine/phase.py`: Pipeline integration for eager LetInlining
163+
- `testing/python/transform/test_tilelang_transform_let_inline.py`: Regression coverage for the pass

_sources/index.md.txt

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,13 @@ deeplearning_operators/matmul
3535
deeplearning_operators/deepseek_mla
3636
:::
3737

38+
:::{toctree}
39+
:maxdepth: 1
40+
:caption: COMPILER INTERNALS
41+
42+
compiler_internals/letstmt_inline
43+
:::
44+
3845
:::{toctree}
3946
:maxdepth: 1
4047
:caption: API Reference

0 commit comments

Comments
 (0)