[quidditch_snitch] Add padding capabilities to start_tensor_copy
#116
We occasionally encounter shapes that are challenging to tile due to the prime factors involved. Attempting to distribute these (e.g. to compute cores or vector lanes) when the number of required tiles is not a factor of the dimension leads to dynamic dimensions that the microkernel compilation is unable to deal with. Similarly, once we are on `f32`, we are required to vectorize the kernel and are restricted to tile sizes that are a multiple of e.g. 4 or 8 for a matvec.

This PR therefore introduces optional padding to the `start_dma_transfer` op that can be added at the end of each tensor dimension. When tiling, the padding can be chosen to guarantee that a tensor is always of a given static shape, solving the issue noted above. For now, the value used for padding is always zero, which works for any matmul, elementwise operation, and convolution.

Note that the padding option is not yet used in the pipeline; `tensor.pad` operations will be lowered to it in a future PR.
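As a rough illustration of the intent, a sketch of what a padded copy might look like (the assembly syntax, operand names, and pad notation below are assumptions for illustration, not the op's actual format):

```mlir
// Hypothetical syntax: copy a 100x35xf32 tensor to L1 while padding
// the end of each dimension with zeros. Padding 100 -> 104 makes the
// first dimension divisible by 8 tiles; padding 35 -> 36 makes the
// second divisible by a vector width of 4, so every tile stays static.
%padded, %token = quidditch_snitch.start_dma_transfer %input pad by [4, 1]
    : tensor<100x35xf32> -> tensor<104x36xf32>
```

Since the padded elements are zero, consumers such as matmuls, elementwise operations, and convolutions compute the same results on the valid region.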