@kurtis-b-1 kurtis-b-1 commented Nov 4, 2025

The current GEMM design is placed, while the rest of the examples are unplaced. Moreover, it is hard to tell how the prio-accuracy feature affects GEMM execution. Lastly, no throughput metric is saved for GEMM. This PR tries to address the first and third points and to improve on the second.

Let me know if I should split this up into multiple PRs to make it easier to review.

Added

- Flag in CMakeLists.txt to enable or disable prio-accuracy in GEMM; the matching kernel is selected based on the flag's value
- Throughput calculation for GEMM, reported to CI as a metric in GFLOP/s
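
For reference, a minimal sketch of how a GEMM throughput figure in GFLOP/s is commonly derived. It assumes the conventional count of `2 * M * N * K` floating-point operations for an `(M x K) @ (K x N)` product; the function name `gemm_gflops` and its parameters are hypothetical and are not taken from this PR's code.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return throughput in GFLOP/s for an (m x k) @ (k x n) product.

    Hypothetical helper: assumes the conventional GEMM operation count of
    2 * m * n * k (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2 * m * n * k
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Example: a 1024 x 1024 x 1024 GEMM measured at 0.5 s of kernel time.
    print(f"{gemm_gflops(1024, 1024, 1024, 0.5):.3f} GFLOP/s")
```

Whether the measured time covers only kernel execution or also data transfers changes the reported number, so the metric is only comparable across runs that time the same region.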

Changed

- The placed reconfigurable GEMM design is now an unplaced reconfigurable GEMM design
- Clearer blocking that shows how the prio-accuracy flag changes the design

Removed

- Placed reconfigurable GEMM design

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.

@kurtis-b-1 kurtis-b-1 marked this pull request as draft November 4, 2025 22:17
@kurtis-b-1 kurtis-b-1 changed the title from "Use unplaced GEMM, python code refactor, and add throughput metric" to "DRAFT: Unplaced Reconfigurable GEMM, Python Code Refactor, and Add Throughput Metric" Nov 4, 2025