Skip to content

Commit ac0dbd2

Browse files
Update docs
1 parent 3066eb5 commit ac0dbd2

File tree

7 files changed

+91
-19
lines changed

7 files changed

+91
-19
lines changed

_sources/autoapi/tilelang/language/kernel/index.rst.txt

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -216,6 +216,31 @@ Module Contents
216216
:returns: **res** -- The result LaunchThreadFrame.
217217
:rtype: Tuple[frame.LaunchThreadFrame]
218218

219+
.. rubric:: Examples
220+
221+
Create a 1-D CUDA kernel launch and unpack the single block index:
222+
223+
.. code-block:: python
224+
225+
with T.Kernel(T.ceildiv(N, 128), threads=128) as bx:
226+
# bx is the blockIdx.x binding (also iterable as (bx,))
227+
...
228+
229+
Launch a 2-D grid while requesting two thread dimensions:
230+
231+
.. code-block:: python
232+
233+
with T.Kernel(grid_x, grid_y, threads=(64, 2)) as (bx, by):
234+
tx, ty = T.get_thread_bindings()
235+
...
236+
237+
Emit a CPU kernel where thread bindings are skipped:
238+
239+
.. code-block:: python
240+
241+
with T.Kernel(loop_extent, is_cpu=True) as (i,):
242+
...
243+
219244
220245
.. py:function:: get_thread_binding(dim = 0)
221246

_sources/autoapi/tilelang/transform/index.rst.txt

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ Functions
4949
tilelang.transform.LegalizeSafeMemoryAccess
5050
tilelang.transform.MakePackedAPI
5151
tilelang.transform.AnnotateDeviceRegions
52+
tilelang.transform.SplitHostDevice
5253
tilelang.transform.VectorizeLoop
5354
tilelang.transform.InjectPTXAsyncCopy
5455
tilelang.transform.LowerDeviceStorageAccessInfo
@@ -273,6 +274,14 @@ Package Contents
273274
:rtype: tvm.transform.Pass
274275

275276

277+
.. py:function:: SplitHostDevice()
278+
279+
Split host/device functions even for empty kernels.
280+
281+
:returns: **fpass** -- The result pass
282+
:rtype: tvm.transform.Pass
283+
284+
276285
.. py:function:: VectorizeLoop(enable_vectorize = True)
277286
278287
VectorizeLoop

autoapi/tilelang/language/kernel/index.html

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -790,6 +790,24 @@ <h2>Module Contents<a class="headerlink" href="#module-contents" title="Link to
790790
<dd class="field-odd"><p>Tuple[frame.LaunchThreadFrame]</p>
791791
</dd>
792792
</dl>
793+
<p class="rubric">Examples</p>
794+
<p>Create a 1-D CUDA kernel launch and unpack the single block index:</p>
795+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">T</span><span class="o">.</span><span class="n">Kernel</span><span class="p">(</span><span class="n">T</span><span class="o">.</span><span class="n">ceildiv</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="mi">128</span><span class="p">),</span> <span class="n">threads</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span> <span class="k">as</span> <span class="n">bx</span><span class="p">:</span>
796+
<span class="c1"># bx is the blockIdx.x binding (also iterable as (bx,))</span>
797+
<span class="o">...</span>
798+
</pre></div>
799+
</div>
800+
<p>Launch a 2-D grid while requesting two thread dimensions:</p>
801+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">T</span><span class="o">.</span><span class="n">Kernel</span><span class="p">(</span><span class="n">grid_x</span><span class="p">,</span> <span class="n">grid_y</span><span class="p">,</span> <span class="n">threads</span><span class="o">=</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="k">as</span> <span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">by</span><span class="p">):</span>
802+
<span class="n">tx</span><span class="p">,</span> <span class="n">ty</span> <span class="o">=</span> <span class="n">T</span><span class="o">.</span><span class="n">get_thread_bindings</span><span class="p">()</span>
803+
<span class="o">...</span>
804+
</pre></div>
805+
</div>
806+
<p>Emit a CPU kernel where thread bindings are skipped:</p>
807+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">T</span><span class="o">.</span><span class="n">Kernel</span><span class="p">(</span><span class="n">loop_extent</span><span class="p">,</span> <span class="n">is_cpu</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="k">as</span> <span class="p">(</span><span class="n">i</span><span class="p">,):</span>
808+
<span class="o">...</span>
809+
</pre></div>
810+
</div>
793811
</dd></dl>
794812

795813
<dl class="py function">

autoapi/tilelang/transform/index.html

Lines changed: 36 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -564,58 +564,61 @@ <h2>Functions<a class="headerlink" href="#functions" title="Link to this heading
564564
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.AnnotateDeviceRegions" title="tilelang.transform.AnnotateDeviceRegions"><code class="xref py py-obj docutils literal notranslate"><span class="pre">AnnotateDeviceRegions</span></code></a>()</p></td>
565565
<td><p>AnnotateDeviceRegions</p></td>
566566
</tr>
567-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.VectorizeLoop" title="tilelang.transform.VectorizeLoop"><code class="xref py py-obj docutils literal notranslate"><span class="pre">VectorizeLoop</span></code></a>([enable_vectorize])</p></td>
567+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.SplitHostDevice" title="tilelang.transform.SplitHostDevice"><code class="xref py py-obj docutils literal notranslate"><span class="pre">SplitHostDevice</span></code></a>()</p></td>
568+
<td><p>Split host/device functions even for empty kernels.</p></td>
569+
</tr>
570+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.VectorizeLoop" title="tilelang.transform.VectorizeLoop"><code class="xref py py-obj docutils literal notranslate"><span class="pre">VectorizeLoop</span></code></a>([enable_vectorize])</p></td>
568571
<td><p>VectorizeLoop</p></td>
569572
</tr>
570-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.InjectPTXAsyncCopy" title="tilelang.transform.InjectPTXAsyncCopy"><code class="xref py py-obj docutils literal notranslate"><span class="pre">InjectPTXAsyncCopy</span></code></a>()</p></td>
573+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.InjectPTXAsyncCopy" title="tilelang.transform.InjectPTXAsyncCopy"><code class="xref py py-obj docutils literal notranslate"><span class="pre">InjectPTXAsyncCopy</span></code></a>()</p></td>
571574
<td><p>Rewrite global to shared memory copy on CUDA with asynchronous copy.</p></td>
572575
</tr>
573-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerDeviceStorageAccessInfo" title="tilelang.transform.LowerDeviceStorageAccessInfo"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerDeviceStorageAccessInfo</span></code></a>()</p></td>
576+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerDeviceStorageAccessInfo" title="tilelang.transform.LowerDeviceStorageAccessInfo"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerDeviceStorageAccessInfo</span></code></a>()</p></td>
574577
<td><p>Lower attached storage access information on device.</p></td>
575578
</tr>
576-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LoopVectorizeDynamic" title="tilelang.transform.LoopVectorizeDynamic"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LoopVectorizeDynamic</span></code></a>()</p></td>
579+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LoopVectorizeDynamic" title="tilelang.transform.LoopVectorizeDynamic"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LoopVectorizeDynamic</span></code></a>()</p></td>
577580
<td><p>Try to vectorize loop with dynamic shape.</p></td>
578581
</tr>
579-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.ConfigIndexBitwidth" title="tilelang.transform.ConfigIndexBitwidth"><code class="xref py py-obj docutils literal notranslate"><span class="pre">ConfigIndexBitwidth</span></code></a>()</p></td>
582+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.ConfigIndexBitwidth" title="tilelang.transform.ConfigIndexBitwidth"><code class="xref py py-obj docutils literal notranslate"><span class="pre">ConfigIndexBitwidth</span></code></a>()</p></td>
580583
<td><p>Config index bitwidth.</p></td>
581584
</tr>
582-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.FlattenBuffer" title="tilelang.transform.FlattenBuffer"><code class="xref py py-obj docutils literal notranslate"><span class="pre">FlattenBuffer</span></code></a>()</p></td>
585+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.FlattenBuffer" title="tilelang.transform.FlattenBuffer"><code class="xref py py-obj docutils literal notranslate"><span class="pre">FlattenBuffer</span></code></a>()</p></td>
583586
<td><p>FlattenBuffer</p></td>
584587
</tr>
585-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.EliminateStorageSyncForMBarrier" title="tilelang.transform.EliminateStorageSyncForMBarrier"><code class="xref py py-obj docutils literal notranslate"><span class="pre">EliminateStorageSyncForMBarrier</span></code></a>()</p></td>
588+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.EliminateStorageSyncForMBarrier" title="tilelang.transform.EliminateStorageSyncForMBarrier"><code class="xref py py-obj docutils literal notranslate"><span class="pre">EliminateStorageSyncForMBarrier</span></code></a>()</p></td>
586589
<td><p>EliminateStorageSyncForMBarrier</p></td>
587590
</tr>
588-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.MergeSharedMemoryAllocations" title="tilelang.transform.MergeSharedMemoryAllocations"><code class="xref py py-obj docutils literal notranslate"><span class="pre">MergeSharedMemoryAllocations</span></code></a>([...])</p></td>
591+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.MergeSharedMemoryAllocations" title="tilelang.transform.MergeSharedMemoryAllocations"><code class="xref py py-obj docutils literal notranslate"><span class="pre">MergeSharedMemoryAllocations</span></code></a>([...])</p></td>
589592
<td><p>MergeSharedMemoryAllocations</p></td>
590593
</tr>
591-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerL2Persistent" title="tilelang.transform.LowerL2Persistent"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerL2Persistent</span></code></a>()</p></td>
594+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerL2Persistent" title="tilelang.transform.LowerL2Persistent"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerL2Persistent</span></code></a>()</p></td>
592595
<td><p>LowerL2Persistent</p></td>
593596
</tr>
594-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.PersistThreadblock" title="tilelang.transform.PersistThreadblock"><code class="xref py py-obj docutils literal notranslate"><span class="pre">PersistThreadblock</span></code></a>()</p></td>
597+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.PersistThreadblock" title="tilelang.transform.PersistThreadblock"><code class="xref py py-obj docutils literal notranslate"><span class="pre">PersistThreadblock</span></code></a>()</p></td>
595598
<td><p>PersistThreadblock</p></td>
596599
</tr>
597-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.AlignDynamicSharedMemoryAllocations" title="tilelang.transform.AlignDynamicSharedMemoryAllocations"><code class="xref py py-obj docutils literal notranslate"><span class="pre">AlignDynamicSharedMemoryAllocations</span></code></a>([align_bytes])</p></td>
600+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.AlignDynamicSharedMemoryAllocations" title="tilelang.transform.AlignDynamicSharedMemoryAllocations"><code class="xref py py-obj docutils literal notranslate"><span class="pre">AlignDynamicSharedMemoryAllocations</span></code></a>([align_bytes])</p></td>
598601
<td><p>AlignDynamicSharedMemoryAllocations</p></td>
599602
</tr>
600-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerSharedBarrier" title="tilelang.transform.LowerSharedBarrier"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerSharedBarrier</span></code></a>()</p></td>
603+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerSharedBarrier" title="tilelang.transform.LowerSharedBarrier"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerSharedBarrier</span></code></a>()</p></td>
601604
<td><p>LowerSharedBarrier</p></td>
602605
</tr>
603-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.StorageRewrite" title="tilelang.transform.StorageRewrite"><code class="xref py py-obj docutils literal notranslate"><span class="pre">StorageRewrite</span></code></a>()</p></td>
606+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.StorageRewrite" title="tilelang.transform.StorageRewrite"><code class="xref py py-obj docutils literal notranslate"><span class="pre">StorageRewrite</span></code></a>()</p></td>
604607
<td><p>StorageRewrite</p></td>
605608
</tr>
606-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerOpaqueBlock" title="tilelang.transform.LowerOpaqueBlock"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerOpaqueBlock</span></code></a>()</p></td>
609+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerOpaqueBlock" title="tilelang.transform.LowerOpaqueBlock"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerOpaqueBlock</span></code></a>()</p></td>
607610
<td><p>LowerOpaqueBlock</p></td>
608611
</tr>
609-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerThreadAllreduce" title="tilelang.transform.LowerThreadAllreduce"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerThreadAllreduce</span></code></a>()</p></td>
612+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerThreadAllreduce" title="tilelang.transform.LowerThreadAllreduce"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerThreadAllreduce</span></code></a>()</p></td>
610613
<td><p>LowerThreadAllreduce</p></td>
611614
</tr>
612-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerDeviceKernelLaunch" title="tilelang.transform.LowerDeviceKernelLaunch"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerDeviceKernelLaunch</span></code></a>()</p></td>
615+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerDeviceKernelLaunch" title="tilelang.transform.LowerDeviceKernelLaunch"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerDeviceKernelLaunch</span></code></a>()</p></td>
613616
<td><p>Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.</p></td>
614617
</tr>
615-
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LowerSharedTmem" title="tilelang.transform.LowerSharedTmem"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerSharedTmem</span></code></a>()</p></td>
618+
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LowerSharedTmem" title="tilelang.transform.LowerSharedTmem"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LowerSharedTmem</span></code></a>()</p></td>
616619
<td><p>LowerSharedTmem</p></td>
617620
</tr>
618-
<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.transform.LayoutReducer" title="tilelang.transform.LayoutReducer"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LayoutReducer</span></code></a>()</p></td>
621+
<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.transform.LayoutReducer" title="tilelang.transform.LayoutReducer"><code class="xref py py-obj docutils literal notranslate"><span class="pre">LayoutReducer</span></code></a>()</p></td>
619622
<td><p>Return a TVM transform pass that performs layout reduction/normalization.</p></td>
620623
</tr>
621624
</tbody>
@@ -960,6 +963,20 @@ <h3>Returns:<a class="headerlink" href="#returns" title="Link to this heading">
960963
</dl>
961964
</dd></dl>
962965

966+
<dl class="py function">
967+
<dt class="sig sig-object py" id="tilelang.transform.SplitHostDevice">
968+
<span class="sig-prename descclassname"><span class="pre">tilelang.transform.</span></span><span class="sig-name descname"><span class="pre">SplitHostDevice</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.transform.SplitHostDevice" title="Link to this definition"></a></dt>
969+
<dd><p>Split host/device functions even for empty kernels.</p>
970+
<dl class="field-list simple">
971+
<dt class="field-odd">Returns<span class="colon">:</span></dt>
972+
<dd class="field-odd"><p><strong>fpass</strong> – The result pass</p>
973+
</dd>
974+
<dt class="field-even">Return type<span class="colon">:</span></dt>
975+
<dd class="field-even"><p>tvm.transform.Pass</p>
976+
</dd>
977+
</dl>
978+
</dd></dl>
979+
963980
<dl class="py function">
964981
<dt class="sig sig-object py" id="tilelang.transform.VectorizeLoop">
965982
<span class="sig-prename descclassname"><span class="pre">tilelang.transform.</span></span><span class="sig-name descname"><span class="pre">VectorizeLoop</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">enable_vectorize</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.transform.VectorizeLoop" title="Link to this definition"></a></dt>
@@ -1254,6 +1271,7 @@ <h3>Returns:<a class="headerlink" href="#returns" title="Link to this heading">
12541271
<li><a class="reference internal" href="#tilelang.transform.LegalizeSafeMemoryAccess"><code class="docutils literal notranslate"><span class="pre">LegalizeSafeMemoryAccess()</span></code></a></li>
12551272
<li><a class="reference internal" href="#tilelang.transform.MakePackedAPI"><code class="docutils literal notranslate"><span class="pre">MakePackedAPI()</span></code></a></li>
12561273
<li><a class="reference internal" href="#tilelang.transform.AnnotateDeviceRegions"><code class="docutils literal notranslate"><span class="pre">AnnotateDeviceRegions()</span></code></a></li>
1274+
<li><a class="reference internal" href="#tilelang.transform.SplitHostDevice"><code class="docutils literal notranslate"><span class="pre">SplitHostDevice()</span></code></a></li>
12571275
<li><a class="reference internal" href="#tilelang.transform.VectorizeLoop"><code class="docutils literal notranslate"><span class="pre">VectorizeLoop()</span></code></a></li>
12581276
<li><a class="reference internal" href="#tilelang.transform.InjectPTXAsyncCopy"><code class="docutils literal notranslate"><span class="pre">InjectPTXAsyncCopy()</span></code></a></li>
12591277
<li><a class="reference internal" href="#tilelang.transform.LowerDeviceStorageAccessInfo"><code class="docutils literal notranslate"><span class="pre">LowerDeviceStorageAccessInfo()</span></code></a></li>

genindex.html

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4587,6 +4587,8 @@ <h2>S</h2>
45874587
<li><a href="autoapi/tilelang/jit/param/index.html#tilelang.jit.param.Kernel.source_code">source_code (tilelang.jit.param.Kernel attribute)</a>
45884588
</li>
45894589
<li><a href="autoapi/tilelang/carver/roller/hint/index.html#tilelang.carver.roller.hint.Hint.split_k_factor">split_k_factor (tilelang.carver.roller.hint.Hint attribute)</a>
4590+
</li>
4591+
<li><a href="autoapi/tilelang/transform/index.html#tilelang.transform.SplitHostDevice">SplitHostDevice() (in module tilelang.transform)</a>
45904592
</li>
45914593
<li><a href="autoapi/tilelang/language/tir/ir/index.html#tilelang.language.tir.ir.sqrt">sqrt (in module tilelang.language.tir.ir)</a>
45924594
</li>

objects.inv

14 Bytes
Binary file not shown.

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)