Pull requests: intel/cutlass-sycl (forked from NVIDIA/cutlass)
[WIP] Added support for Rotary Embedding in flash_attention #523 • opened Sep 19, 2025 by pralay-das • Draft
Add comprehensive model-specific tests for flash attention decode and… #519 • opened Sep 18, 2025 by rishi-yadav
Add a new tile scheduler for varlen prefill to avoid launching empty work groups #516 • opened Sep 18, 2025 by carsonwang
Also use column-major B matrix in the example 00_bmg_gemm.cpp #510 • opened Sep 13, 2025 by sanchitintel
Remove redundant code from GroupGEMM implementation #508 • opened Sep 12, 2025 by sanchitintel
Example of FP32 -> BF16 conversion in epilogue of GEMM #506 • opened Sep 12, 2025 by sanchitintel • Draft • 1 task
Support FP32 -> BF16 conversion in epilogue of GroupedGEMM #505 • opened Sep 12, 2025 by sanchitintel • Draft
Support fp32 accumulation for bf16 gemm and grouped gemm #482 • opened Aug 27, 2025 by wuxun-zhang
[WIP] FP8 scaledMM with DeepSeek-style dequantization #453 • opened Jul 2, 2025 by sanchitintel • Draft • 4 tasks
Refactor tests for Flash Attention Prefill Cached #449 • opened Jun 26, 2025 by muhammad-tanvir-1211
Refactor benchmarks for Flash Attention Prefill #447 • opened Jun 26, 2025 by muhammad-tanvir-1211
Simplify Flash Attention Decode benchmarks generation #437 • opened Jun 19, 2025 by muhammad-tanvir-1211
Unify interface for Flash Attention Decode #423 • opened Jun 11, 2025 by muhammad-tanvir-1211