Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial NoC Placement #2421

Merged
merged 41 commits into from
Jan 22, 2024
Merged

Initial NoC Placement #2421

merged 41 commits into from
Jan 22, 2024

Conversation

soheilshahrouz
Copy link
Contributor

@soheilshahrouz soheilshahrouz commented Oct 19, 2023

Description

This PR adds a new initial placement algorithm for NoC routers. It uses a simulated annealing algorithm to move only routers, giving the main placer a better starting point. It runs before the initial placement algorithm for other blocks.

Motivation and Context

It results in better QoR in a given runtime for designs with hard NoCs.

How Has This Been Tested?

Run on the synthetic and MLP NoC designs, which get better QoR. Run on various conventional benchmark suites (which do not change).

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

Added a simple annealer for routers that starts with a random placement for routers and tries random swaps to improve NoC cost function. During initial placement, routers are placed before other blocks. The starting temperature needed to be reduce so that SA does not destroy the initial router placement.
The description was somewhat misleading because the proposed move does not always involve swapping two router clusters. Sometimes it just moves a router cluster to an empty location.
find_compatible_compressed_loc_in_range() searches for a compatible location for a specific block type in a range around a given location. The returned location is not necessarily empty. This does not cause any problems for SA because two blocks can be easily swapped. However, in initial placement, we want to place blocks in empty locations. If this function returns a location which has already been occupied, initial placement resorts to random placement. The modified function get a new argument (check_empty) to indicate that the returned location must be empty.
propose_router_swap_flow_centroid() randomly selects a router cluster and calculates a weighted average over the locations of the routers connected to it with flows. The weight of each router is determined by the bandwidth and latency of the flow that connects two routers.
Removed update_noc_normalization_factors() call in the loop. Aggregated bandwidth and latency usually decrease over SA, and as a result, the normalization factor extracted from the increases. This means that the NoC related cost may increase solely due to higher normalizing factors.
…nce to pass some arguments.

Some functions in initial_placement.cpp were modified to receive const reference arguments. For instance, t_pl_macro is simply a std:vector that contains its members. Passing it by reference avoids copying the underlying std::vector.

Converted some index based loops to range based loops for more readability.
Some NoC-related functions like print_noc_grid() and reinitialize_noc_routing() were called without checking whether NoC is enabled, causing illegal memory access. These function calls were moved into if statements to check that NoC option is enabled.
With the previous condition, if a checkpoint has a better CPD and its WL is more than 5% better than the current solution, it would not be restored.

Passed some arguments as const reference in initial_placement.cpp
…ets to infer high fanout connectivity between molecules.
# Conflicts:
#	vpr/src/place/initial_placement.cpp
#	vpr/src/place/move_utils.cpp
#	vpr/src/place/move_utils.h
#	vpr/src/place/noc_place_utils.cpp
#	vpr/src/place/place_checkpoint.cpp
#	vpr/src/place/place_checkpoint.h
When averaging over the sinks connected to block, I use the inverse of the number of sinks driven by a net as weight
@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Oct 19, 2023
# Conflicts:
#	vpr/src/place/initial_placement.cpp
#	vpr/src/place/initial_placement.h
#	vpr/src/place/place.cpp
@vaughnbetz
Copy link
Contributor

Need to make CI pass ... @soheilshahrouz tells me that there is a significant clustering change in this code which he'll back out of ... it is changing QoR and causing failures.

@vaughnbetz
Copy link
Contributor

Also attach a QoR run on titan, and the NoC benchmarks.

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally good. I have some suggestions for commenting mostly.

t_pack_high_fanout_thresholds& operator=(t_pack_high_fanout_thresholds&& other) noexcept;

///@brief Returns the high fanout threshold of the specified block
int get_threshold(const std::string& block_type_name) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think about whether you should have these two packing classes in vpr_types.h, or if you should move them into a packing header (could be a new one).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like these should go in an existing or a new packing_utility.h or some such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both high fanout threshold and target pin utilization are commang line arguements that impact the packing stage. I think if we move these two classes two a separate file, we also need to move other packing data types like e_balance_block_type_util and e_unrelated_clustering to the same file. What is your view on this?

@@ -681,6 +692,11 @@ struct t_pl_loc {
, y(yloc)
, sub_tile(sub_tile_loc)
, layer(layer_num) {}
t_pl_loc(const t_physical_tile_loc& phy_loc, int sub_tile_loc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a brief comment for the constructors (can just say various constructors)

vpr/src/noc/noc_traffic_flows.cpp Outdated Show resolved Hide resolved
vpr/src/noc/noc_traffic_flows.h Outdated Show resolved Hide resolved
vpr/src/noc/noc_traffic_flows.h Outdated Show resolved Hide resolved

// If the calculated centroid does not have a compatible type, find a compatible location nearby
if (!is_tile_compatible(physical_type, cluster_from_type)) {
//Determine centroid location in the compressed space of the current block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a long routine ... it seems the contents of this if statement could be a helper routine to shorten it. That helper routine might be generally useful in other centroid moves so please take a look and see if it would result in some other refactoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use router centroid moves right now? If you haven't experimented with them it would be good to measure their impact now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote this function, I tried using both random and flow cetroid swaps with static probabilities. I did not observe any gains compared to only using random swaps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, good to put that in a comment if you haven't already (I assume the centroid moves are off?).

vpr/src/place/noc_place_checkpoint.cpp Show resolved Hide resolved
vpr/src/place/noc_place_utils.h Outdated Show resolved Hide resolved
vpr/src/place/place.cpp Show resolved Hide resolved
vpr/src/place/place.cpp Show resolved Hide resolved
@soheilshahrouz
Copy link
Contributor Author

MLP benchmarks

branch placed WL place time place CPD agg NoC bw NoC latency latency constraints met
master 9074780.081 2929.384093 9.518756595 1.387775298e10 2.481268995e-9 14.29302186
init_noc_sa 9123041.368 2639.320356 9.46998078 1.3517584024e10 2.398770309e-9 14.29302186
ratio 1.005 0.90 0.995 0.975 0.967 1.0

@soheilshahrouz
Copy link
Contributor Author

soheilshahrouz commented Dec 4, 2023

Complex synthetic benchmarks

branch placed WL place time place CPD agg NoC bw NoC latency latency constraints met total swaps
master 266665.0343 184.1180546 6.685467081 3.22E+07 1.94E-08 19.97472706 1873413.474
init_noc_sa 253070.732 118.1450945 6.715915068 3.18E+07 1.64E-08 21.27173319 1296432.209
ratio 0.949 0.645 1.004 0.988 0.845 1.065 0.692

@soheilshahrouz
Copy link
Contributor Author

soheilshahrouz commented Dec 4, 2023

All synthetic benchmarks

branch placed WL place time place CPD agg NoC bw NoC latency latency constraints met total swaps
master 49681.93871 15.980712 6.235260554 6.14E+07 2.59E-08 31.13121279 340757.7527
init_noc_sa 51375.77725 40.97801128 6.401616903 6.01E+07 2.33E-08 31.87534585 168648.1201
ratio 1.034 2.564 1.027 0.979 0.900 1.024 0.495

@vaughnbetz
Copy link
Contributor

Thanks.
Any more code review comments to apply?
Any idea why all synthetic benchmarks went up in place time? Is initial placement fairly slow now with quite a few NoCs? Overall the results are definitely good so this is a good change; just wondering if there's any potential further optimization for initial placement speed after we land this one.

@soheilshahrouz
Copy link
Contributor Author

Thanks. Any more code review comments to apply? Any idea why all synthetic benchmarks went up in place time? Is initial placement fairly slow now with quite a few NoCs? Overall the results are definitely good so this is a good change; just wondering if there's any potential further optimization for initial placement speed after we land this one.

Placement time has only increased for simple synthetic benchmarks. In these benchmarks, the initial placement of NoC routers takes longer than simulated annealing, leading to a longer overall placement time. However, when complex synthetic benchmarks are considered in isolation from simple ones, the placement time is reduced by 0.645x.

In simple benchmarks, a substantial number of packed blocks consist of NoC routers. For instance, in simple_64_noc_star.blif, there are 140 packed blocks, with 64 being NoC routers. Since relocating NoC routers after initial placement is unlikely to enhance placement cost, the estimated initial temperature would be lower, resulting in a significant reduction in the number of swaps. The increased WL and CPD can be attributed to shorter annealing.

Conversely, in complex benchmarks, the number of NoC routers is minimal compared to the total number of post-packing blocks. For instance, in the netlist complex_64_noc_nearest_neighbor.blif, there are 64 NoC routers and 7798 conventional blocks. Therefore, the reduction in initial temperature is not as pronounced as in simple benchmarks.

@soheilshahrouz
Copy link
Contributor Author

soheilshahrouz commented Dec 4, 2023

@soheilshahrouz soheilshahrouz changed the title [WIP] Initial NoC Placement Initial NoC Placement Dec 4, 2023
@github-actions github-actions bot added the lang-cpp C/C++ code label Dec 6, 2023
@soheilshahrouz
Copy link
Contributor Author

After fixing the PR labeler issure, all checks passed. Can I go ahead and merge this into master?

@vaughnbetz vaughnbetz merged commit c8bd8b1 into master Jan 22, 2024
98 checks passed
@vaughnbetz vaughnbetz deleted the init_noc_sa branch January 22, 2024 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants