v0.2.1

Released by @yzh119 on 13 Feb 08:17 · commit dbb1e4e

What's Changed

  • misc: addressing the package renaming issues by @yzh119 in #770
  • feat: support deepseek prefill attention shape by @yzh119 in #765
  • refactor: change the structure of attention updater by @yzh119 in #772
  • hotfix: follow up of #772 by @yzh119 in #773
  • bugfix: Ensure Loop Termination by Enforcing IEEE-754 Compliance in Sampling Kernels by @yzh119 in #774
  • bugfix: fix the JIT warmup arguments in unittests by @yzh119 in #775
  • ci: change whl folder to flashinfer-python by @abcdabcd987 in #779
  • perf: refactor fa2 prefill template by @yzh119 in #776
  • feat: Separate QK/VO head dim dispatch for sm90 AOT by @abcdabcd987 in #778
  • bugfix: fix batch prefill attention kernel unittests by @yzh119 in #781
  • misc: remove head dimension 64 from AOT by @yzh119 in #782
  • misc: allow head_dim=64 for sm90 AOT by @abcdabcd987 in #783
  • bugfix: drop CTA_TILE_Q=32 by @abcdabcd987 in #785
  • refactor: make group_size a part of params by @yzh119 in #786
  • bugfix: MLA decode should multiply sm_scale by math::log2e by @tsu-bin in #787 (see the note after this list)
  • fix rope logic in mla decoding by @zhyncs in #793
  • Fix arguments of plan for split QK/VO head dims by @abmfy in #795
  • test: add unittest comparing deepseek prefill fa2 & 3 implementation by @yzh119 in #797
  • bugfix: fix aot build not compatible with cmake command by @tsu-bin in #796
  • Fix the type annotation of q_dtype and kv_dtype on ragged prefill by @nandor in #798
  • feat: support f32 attention output in FA2 template by @yzh119 in #799
  • feat: apply sm_scale at logits instead of q in FA2 template by @yzh119 in #801 (see the note after this list)
  • bugfix: mla decode failed under cuda graph mode, and update test case by @tsu-bin in #803
  • perf: memory efficient deepseek mla fused page-attention kernel by @yzh119 in #804
  • bugfix: mla page-attention kernel for different page sizes by @yzh119 in #810
  • doc: add documentation to new MLA interface by @yzh119 in #811
  • feat: unlocking MLA for A100 by @yzh119 in #812
  • feat: cudagraph-compatible MLA API by @yzh119 in #813
  • feat: unlock MLA attention for sm89 (L40/L40s/4090) by @yzh119 in #814
  • misc: fix sphinx by @abcdabcd987 in #815
  • bugfix: fix the behavior of mla plan function when provided with host tensors by @yzh119 in #816
  • doc: improve mla related documentation by @yzh119 in #818
  • release: bump version to v0.2.1 by @yzh119 in #819
  • refactor: change to TORCH_LIBRARY by @youkaichao in #764
  • Revert "refactor: change to TORCH_LIBRARY" by @yzh119 in #820
  • bugfix: bugfix on sm89 MLA by @yzh119 in #821
  • hotfix: bugfix on #812 by @yzh119 in #822
  • refactor: change to TORCH_LIBRARY by @abmfy in #823
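
Note on the two sm_scale changes above: both follow from standard softmax algebra. A minimal sketch, assuming (as in typical FlashAttention-style kernels) that the exponential is evaluated in base 2, which is why the scale must pick up a factor of $\log_2 e$ (#787), and that the softmax scale is a scalar, which is why it can be applied to the logits rather than to $q$ (#801):

$$
\operatorname{softmax}\!\big((s\,q)\,K^{\top}\big) \;=\; \operatorname{softmax}\!\big(s\,(q\,K^{\top})\big),
\qquad
e^{\,s\,x} \;=\; 2^{\,(s\,\log_2 e)\,x}.
$$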

New Contributors

Full Changelog: v0.2.0.post2...v0.2.1