Pull requests: intel/cutlass-sycl (forked from NVIDIA/cutlass)
[WIP] Added support for Rotary Embedding in flash_attention #523 • opened Sep 19, 2025 by pralay-das • Draft
Add comprehensive model-specific tests for flash attention decode and… #519 • opened Sep 18, 2025 by rishi-yadav
Add a new tile scheduler for varlen prefill to avoid launching empty work groups #516 • opened Sep 18, 2025 by carsonwang
Also use column-major B matrix in the example 00_bmg_gemm.cpp #510 • opened Sep 13, 2025 by sanchitintel
Remove redundant code from GroupGEMM implementation #508 • opened Sep 12, 2025 by sanchitintel
Example of FP32 -> BF16 conversion in epilogue of GEMM #506 • opened Sep 12, 2025 by sanchitintel • Draft • 1 task
Support FP32 -> BF16 conversion in epilogue of GroupedGEMM #505 • opened Sep 12, 2025 by sanchitintel • Draft
Support fp32 accumulation for bf16 gemm and grouped gemm #482 • opened Aug 27, 2025 by wuxun-zhang
[WIP] FP8 scaledMM with DeepSeek-style dequantization #453 • opened Jul 2, 2025 by sanchitintel • Draft • 4 tasks
Refactor tests for Flash Attention Prefill Cached #449 • opened Jun 26, 2025 by muhammad-tanvir-1211
Refactor benchmarks for Flash Attention Prefill #447 • opened Jun 26, 2025 by muhammad-tanvir-1211
Simplify Flash Attention Decode benchmarks generation #437 • opened Jun 19, 2025 by muhammad-tanvir-1211
Unify interface for Flash Attention Decode #423 • opened Jun 11, 2025 by muhammad-tanvir-1211