@CongMa13
Demo code. Will not merge.

In this PR, I want multiple threads to load the same data from the input tensor AQ.

AQ

image

**In VGPR**

image

How to replicate data?

  • Don't create a copy in global memory.
  • Replicate the view so that multiple addresses map to the same data in global memory.

Transform:

AQ[16, 2]
==replicate by 16==> rep_AQ[16, 16, 2]
==pass_through dim 1 to dim 0; merge dims 2, 0 to dim 1==> merge_rep_AQ[16, 32]

Applying the replicate and merge transforms to AQ gives merge_rep_AQ.

image
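To make the address mapping concrete, here is a minimal sketch in plain Python. It is a model of the idea, not the ck_tile API. It assumes AQ is stored row-major, that the replica dimension is dim 0 of rep_AQ and has stride 0, and that the merge treats dim 2 as the slow index and dim 0 as the fast index. The names `merged_to_offset`, `ROWS`, `COLS` and `REP` are hypothetical.

```python
ROWS, COLS, REP = 16, 2, 16  # AQ is [16, 2]; the view is replicated 16 times


def merged_to_offset(i, j):
    """Map coordinate merge_rep_AQ[i, j] to a linear offset into AQ's storage.

    Assumption: AQ is row-major, so AQ[i, k] lives at offset i * COLS + k.
    """
    # Unmerge dim 1 (length COLS * REP) into (k, r):
    #   k is the original AQ column (rep_AQ dim 2, slow index),
    #   r is the replica index (rep_AQ dim 0, fast index).
    k, r = divmod(j, REP)
    # The replica index has stride 0: it never changes the address.
    return i * COLS + k + 0 * r


# Backing storage for AQ; no second copy is ever allocated.
aq = [float(x) for x in range(ROWS * COLS)]

# Materialize the view only to inspect its shape and contents.
merge_rep_aq = [
    [aq[merged_to_offset(i, j)] for j in range(COLS * REP)]
    for i in range(ROWS)
]
```

Under these assumptions, the 16 consecutive coordinates `merge_rep_AQ[i, 0..15]` all resolve to the address of `AQ[i, 0]`, and `merge_rep_AQ[i, 16..31]` resolve to `AQ[i, 1]`.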

merge_rep_AQ now contains duplicated elements: several coordinates map to the same address. Multiple threads can therefore load the same data using a regular tile distribution.
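As a rough illustration of the effect on threads, the sketch below assigns lanes to coordinates of the [16, 32] view and groups them by the address they would read. The 64-lane wavefront and the row-major lane layout over a [2, 32] window are hypothetical choices for the example, not the distribution used in this PR. The coordinate-to-offset rule assumes a row-major AQ with a stride-0 replica index as the fast part of the merged dimension.

```python
from collections import defaultdict

COLS, REP = 2, 16      # AQ has 2 columns; each element appears 16 times
WIDTH = COLS * REP     # row length of merge_rep_AQ (32)
LANES = 64             # hypothetical wavefront size


def lane_to_coord(lane):
    # Hypothetical distribution: lanes laid out row-major over a [2, 32] window.
    return divmod(lane, WIDTH)


def coord_to_offset(i, j):
    # Assumed view mapping: j // REP picks the AQ column, and the replica
    # index j % REP has stride 0, so it is dropped from the address.
    return i * COLS + j // REP


# Group lanes by the global-memory offset they would load from.
lanes_by_offset = defaultdict(list)
for lane in range(LANES):
    i, j = lane_to_coord(lane)
    lanes_by_offset[coord_to_offset(i, j)].append(lane)
```

With this layout each address is read by a group of 16 consecutive lanes, and no lane needs special-case indexing to obtain the shared value.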

@CongMa13 CongMa13 changed the base branch from develop to replicate_in_k_prev July 31, 2025 19:56