Persistent SDPA kernel #608


Open

wuxun-zhang wants to merge 2 commits into intel:main from wuxun-zhang:wuxun/persistent-sdpa

wuxun-zhang commented Nov 4, 2025

The new kernel implements the method described below. Key points:

The number of work groups is fixed to the total number of XeCores.
The KV sequence length of all sequences is split dynamically across all work groups.
Each XeCore receives a balanced share of work units.
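For reviewers, here is a minimal host-side sketch of what such a balanced split could look like. It is not the tile scheduler from this PR. `WorkSlice`, `split_kv_blocks`, and the idea of counting work in fixed-size "KV blocks" are assumptions made for illustration only. The sketch flattens the KV blocks of all sequences into one range and hands each work group a contiguous, near-equal share.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical work descriptor: KV blocks [kv_begin, kv_end) of sequence `seq`.
struct WorkSlice {
    int seq;
    int kv_begin;
    int kv_end;
};

// Assumption: kv_blocks[i] is the number of KV blocks of sequence i
// (one entry per batch * head). Returns one list of slices per work group.
std::vector<std::vector<WorkSlice>> split_kv_blocks(
        const std::vector<int>& kv_blocks, int num_work_groups) {
    assert(num_work_groups > 0);
    long long total = 0;
    for (int n : kv_blocks) {
        assert(n >= 0);
        total += n;
    }

    // Every work group gets `base` blocks; the first `extra` groups get one more.
    const long long base = total / num_work_groups;
    const long long extra = total % num_work_groups;

    std::vector<std::vector<WorkSlice>> plan(num_work_groups);
    std::size_t seq = 0;  // current sequence
    int offset = 0;       // first unassigned KV block in that sequence
    for (int wg = 0; wg < num_work_groups; ++wg) {
        long long budget = base + (wg < extra ? 1 : 0);
        while (budget > 0) {
            // Skip sequences that are fully assigned (or empty).
            while (seq < kv_blocks.size() && offset == kv_blocks[seq]) {
                ++seq;
                offset = 0;
            }
            const int take = static_cast<int>(
                std::min<long long>(budget, kv_blocks[seq] - offset));
            plan[wg].push_back({static_cast<int>(seq), offset, offset + take});
            offset += take;
            budget -= take;
        }
    }
    return plan;
}
```

With `kv_blocks = {10, 3, 7}` and four work groups, each group receives five blocks, and the third group's share spans the end of sequence 1 and the start of sequence 2. A sequence whose blocks land in several work groups needs its partial results combined afterwards; how this PR does that is not shown here.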

There are currently two limitations:

Only decode is supported (seq_len_qo == 1).
batch_size * num_heads_q <= total number of XeCores.
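The two limitations can be expressed as a host-side guard. The sketch below is illustrative and is not the runtime check added in this PR. `DecodeProblem` and `can_use_persistent_sdpa` are hypothetical names, and the XeCore count is assumed to be queried elsewhere and passed in.

```cpp
#include <cassert>

// Hypothetical problem description; field names mirror the PR text.
struct DecodeProblem {
    int batch_size;
    int num_heads_q;
    int seq_len_qo;
};

// Returns true only if both stated limitations are satisfied.
bool can_use_persistent_sdpa(const DecodeProblem& p, int num_xe_cores) {
    assert(num_xe_cores > 0);
    // Limitation 1: decode only, i.e. a single query token per sequence.
    if (p.seq_len_qo != 1) {
        return false;
    }
    // Limitation 2: at most one (batch, head) pair per XeCore.
    // Widen before multiplying to avoid int overflow.
    const long long pairs = static_cast<long long>(p.batch_size) * p.num_heads_q;
    return pairs <= num_xe_cores;
}
```

A caller would fall back to the non-persistent kernel whenever this returns false.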

wuxun-zhang added 2 commits

November 3, 2025 19:02


          Persistent SDPA kernel

252b0d1


          update tile scheduler & add runtime check

93ec5c8

pengzhao-intel commented Nov 4, 2025

Maybe add the limitations of this algorithm to the code as well, especially for the one with atomics.

