[quidditch_snitch] Reintroduce `tensor.microkernel` op #106
1b20d75 previously removed the `tensor.microkernel` operation as, at the time, it did not seem worth the extra code. Since then, we noted that microkernels execute in a more asynchronous manner due to Snitch's asynchronous FPU, requiring the use of an explicit `microkernel_fence` operation. Optimizing the placement of these is easier done in tensor land, making the operation more worthwhile. Additionally, more experience with bufferization led to a simpler implementation by restricting `microkernel_yield` to only tensor operations.

A tensor counterpart of `microkernel_fence` called `sync_tensor` has also been added, which makes a result tensor of a `tensor.microkernel` operation available. It bufferizes to `microkernel_fence`, and its placement could be further optimized in the future. The conservative placement of `microkernel_fence` operations was also removed from `specialize-dma-code`, leading to fewer barriers and `microkernel_fence`s.
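
For illustration, a minimal sketch of what the tensor-level form might look like is shown below. The assembly format, operand structure, and types here are assumptions made for readability and are not taken from this patch:

```mlir
// Hypothetical sketch (op syntax and types assumed): a microkernel region
// computes on tensors and returns its results via microkernel_yield.
%result = quidditch_snitch.tensor.microkernel -> tensor<32x32xf64> {
  %0 = linalg.matmul ins(%lhs, %rhs : tensor<32x32xf64>, tensor<32x32xf64>)
                     outs(%init : tensor<32x32xf64>) -> tensor<32x32xf64>
  quidditch_snitch.microkernel_yield %0 : tensor<32x32xf64>
}

// sync_tensor makes the microkernel result available; it bufferizes to
// microkernel_fence, and its placement can be optimized at the tensor level
// rather than being inserted conservatively.
%synced = quidditch_snitch.sync_tensor %result : tensor<32x32xf64>
```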