-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial NoC Placement #2421
Initial NoC Placement #2421
Conversation
Added a simple annealer for routers that starts with a random placement for routers and tries random swaps to improve NoC cost function. During initial placement, routers are placed before other blocks. The starting temperature needed to be reduce so that SA does not destroy the initial router placement.
The description was somewhat misleading because the proposed move does not always involve swapping two router clusters. Sometimes it just moves a router cluster to an empty location.
find_compatible_compressed_loc_in_range() searches for a compatible location for a specific block type in a range around a given location. The returned location is not necessarily empty. This does not cause any problems for SA because two blocks can be easily swapped. However, in initial placement, we want to place blocks in empty locations. If this function returns a location which has already been occupied, initial placement resorts to random placement. The modified function get a new argument (check_empty) to indicate that the returned location must be empty.
propose_router_swap_flow_centroid() randomly selects a router cluster and calculates a weighted average over the locations of the routers connected to it with flows. The weight of each router is determined by the bandwidth and latency of the flow that connects two routers.
Removed update_noc_normalization_factors() call in the loop. Aggregated bandwidth and latency usually decrease over SA, and as a result, the normalization factor extracted from the increases. This means that the NoC related cost may increase solely due to higher normalizing factors.
…nce to pass some arguments. Some functions in initial_placement.cpp were modified to receive const reference arguments. For instance, t_pl_macro is simply a std:vector that contains its members. Passing it by reference avoids copying the underlying std::vector. Converted some index based loops to range based loops for more readability.
Some NoC-related functions like print_noc_grid() and reinitialize_noc_routing() were called without checking whether NoC is enabled, causing illegal memory access. These function calls were moved into if statements to check that NoC option is enabled.
With the previous condition, if a checkpoint has a better CPD and its WL is more than 5% better than the current solution, it would not be restored. Passed some arguments as const reference in initial_placement.cpp
…ets to infer high fanout connectivity between molecules.
# Conflicts: # vpr/src/place/initial_placement.cpp # vpr/src/place/move_utils.cpp # vpr/src/place/move_utils.h # vpr/src/place/noc_place_utils.cpp # vpr/src/place/place_checkpoint.cpp # vpr/src/place/place_checkpoint.h
This reverts commit ea9f5be.
When averaging over the sinks connected to block, I use the inverse of the number of sinks driven by a net as weight
This reverts commit 7295447.
# Conflicts: # vpr/src/place/initial_placement.cpp # vpr/src/place/initial_placement.h # vpr/src/place/place.cpp
Need to make CI pass ... @soheilshahrouz tells me that there is a significant clustering change in this code which he'll back out of ... it is changing QoR and causing failures. |
Also attach a QoR run on titan, and the NoC benchmarks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally good. I have some suggestions for commenting mostly.
t_pack_high_fanout_thresholds& operator=(t_pack_high_fanout_thresholds&& other) noexcept; | ||
|
||
///@brief Returns the high fanout threshold of the specified block | ||
int get_threshold(const std::string& block_type_name) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think about whether you should have these two packing classes in vpr_types.h, or if you should move them into a packing header (could be a new one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like these should go in an existing or a new packing_utility.h or some such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both high fanout threshold and target pin utilization are commang line arguements that impact the packing stage. I think if we move these two classes two a separate file, we also need to move other packing data types like e_balance_block_type_util and e_unrelated_clustering to the same file. What is your view on this?
@@ -681,6 +692,11 @@ struct t_pl_loc { | |||
, y(yloc) | |||
, sub_tile(sub_tile_loc) | |||
, layer(layer_num) {} | |||
t_pl_loc(const t_physical_tile_loc& phy_loc, int sub_tile_loc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a brief comment for the constructors (can just say various constructors)
vpr/src/place/noc_place_utils.cpp
Outdated
|
||
// If the calculated centroid does not have a compatible type, find a compatible location nearby | ||
if (!is_tile_compatible(physical_type, cluster_from_type)) { | ||
//Determine centroid location in the compressed space of the current block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a long routine ... it seems the contents of this if statement could be a helper routine to shorten it. That helper routine might be generally useful in other centroid moves so please take a look and see if it would result in some other refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use router centroid moves right now? If you haven't experimented with them it would be good to measure their impact now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I wrote this function, I tried using both random and flow cetroid swaps with static probabilities. I did not observe any gains compared to only using random swaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, good to put that in a comment if you haven't already (I assume the centroid moves are off?).
# Conflicts: # vpr/src/place/move_utils.cpp
With NoC initial placement, we no longer need to move NoC routers more than what the RL agent suggests.
MLP benchmarks
|
Complex synthetic benchmarks
|
All synthetic benchmarks
|
Thanks. |
Placement time has only increased for simple synthetic benchmarks. In these benchmarks, the initial placement of NoC routers takes longer than simulated annealing, leading to a longer overall placement time. However, when complex synthetic benchmarks are considered in isolation from simple ones, the placement time is reduced by 0.645x. In simple benchmarks, a substantial number of packed blocks consist of NoC routers. For instance, in simple_64_noc_star.blif, there are 140 packed blocks, with 64 being NoC routers. Since relocating NoC routers after initial placement is unlikely to enhance placement cost, the estimated initial temperature would be lower, resulting in a significant reduction in the number of swaps. The increased WL and CPD can be attributed to shorter annealing. Conversely, in complex benchmarks, the number of NoC routers is minimal compared to the total number of post-packing blocks. For instance, in the netlist complex_64_noc_nearest_neighbor.blif, there are 64 NoC routers and 7798 conventional blocks. Therefore, the reduction in initial temperature is not as pronounced as in simple benchmarks. |
Google sheet with the detailed results: |
After fixing the PR labeler issure, all checks passed. Can I go ahead and merge this into master? |
Description
This PR adds a new initial placement algorithm for NoC routers. It uses a simulated annealing algorithm to move only routers, giving the main placer a better starting point. It runs before the initial placement algorithm for other blocks.
Motivation and Context
It results in better QoR in a given runtime for designs with hard NoCs.
How Has This Been Tested?
Run on the synthetic and MLP NoC designs, which get better QoR. Run on various conventional benchmark suites (which do not change).
Types of changes
Checklist: