Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Router Lookahead Profiler #2683

Draft
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

nedsels
Copy link
Contributor

@nedsels nedsels commented Aug 12, 2024

There is a new class LookaheadProfiler which records a variety of data that can be used to evaluate the quality of the router lookahead.

Description

Using the branch reconstructed in RouteTree::add_subtree_from_heap(), LookaheadProfiler records the following data into a .csv file for every node in the branch:

  • iteration no.
  • source node id (the first node in the branch; not necessarily a SOURCE)
  • sink node id
  • sink block name
  • sink atom block model
  • sink cluster block type
  • sink cluster tile height
  • sink cluster tile width
  • current node
  • node type
  • node length
  • num. nodes from sink
  • delta x, delta y
  • actual cost, delay, and congestion from current node to sink
  • predicted cost, delay, and congestion from current node to sink (predicted by the lookahead)
  • criticality

The profiler is enabled by specifying CMAKE_PARAMS="DVPR_PROFILE_LOOKAHEAD=on" when building VPR, and also using the command-line option --profile_router_lookahead [output_file_name.csv].

This data can be analyzed using the script vtr_flow/scripts/profiling_utils/parse_lookahead_data.py, which produces a data summary file and a number of scatter plots, bar graphs, pie charts, and heatmaps. (TBD: Currently, using -h gives detailed instructions, but a README will be added as well.)

Motivation and Context

This was used to verify issues that led to #2639. It can be used to identify current/future issues with the lookahead.

How Has This Been Tested?

There is a new CI test that ensures that VPR builds properly with PROFILE_LOOKAHEAD defined.

Currently, even with PROFILE_LOOKAHEAD undefined, there is a degradation in QoR. These results are averages over 3 runs each:

Routing Runtime (s) CPD (ns) Wirelength Memory Footprint
vtr (old) 0.94 10.00 27433.36 153.79
vtr (new) 0.91 10.00 27520.35 153.59
titan (old) 294.51 17.18 3200094.23 5494.64
titan (new) 294.30 17.48 3252463.91 5486.61
titan_large* (old) 618.20 17.72 6430833.54 8865.55
titan_large* (new) 614.95 18.14 6532452.49 8854.25

For some reason, there is a 1.6% wirelength increase on titan (and titan_large) circuits and a 2.4% CPD increase on titan_large circuits.

*10 largest titan circuits.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code scripts Utility & Infrastructure scripts lang-python Python code labels Aug 12, 2024
@nedsels nedsels force-pushed the feature_router_lookahead_verifier branch from d1025c9 to c86ff7c Compare August 12, 2024 19:44
@nedsels nedsels force-pushed the feature_router_lookahead_verifier branch from 25cbc7f to 0826ad7 Compare August 12, 2024 20:52
@vaughnbetz
Copy link
Contributor

@soheilshahrouz was also interested in reviewing this code. We'll eventually need a QoR run to show no slowdown (good to run both VTR and Titan suites and attach spreadsheets).

@nedsels nedsels force-pushed the feature_router_lookahead_verifier branch from e939ea2 to cbe1334 Compare August 13, 2024 01:46
utils/route_diag/src/main.cpp Outdated Show resolved Hide resolved
.help(
"For every routed sink, records the cost, delay, and congestion estimated by the router lookahead and the "
"actual cost, delay, and congestion, from every node along each route to the sink. These results, along with many "
"other attributes of the node, are recorded into lookahead_verifier_info.csv.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the user should be able to specify the file name.

vpr/src/base/vpr_context.h Outdated Show resolved Hide resolved
vpr/src/base/vpr_context.h Show resolved Hide resolved
vpr/src/base/vpr_types.h Show resolved Hide resolved
@@ -109,7 +115,7 @@ float NoOpLookahead::get_expected_cost(RRNodeId /*current_node*/, RRNodeId /*tar
return 0.;
}

std::pair<float, float> NoOpLookahead::get_expected_delay_and_cong(RRNodeId /*node*/, RRNodeId /*target_node*/, const t_conn_cost_params& /*params*/, float /*R_upstream*/) const {
std::pair<float, float> NoOpLookahead::get_expected_delay_and_cong(RRNodeId, RRNodeId, const t_conn_cost_params&, float) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you remove argument name comments?

RRNodeId to_node,
const t_conn_cost_params& params,
float R_upstream) const override;
std::pair<float, float> get_expected_delay_and_cong(RRNodeId from_node, RRNodeId to_node, const t_conn_cost_params& params, float R_upstream) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long line

@@ -181,7 +181,7 @@ float ExtendedMapLookahead::get_chan_ipin_delays(RRNodeId to_node) const {
//
// The from_node can be of one of the following types: CHANX, CHANY, SOURCE, OPIN
// The to_node is always a SINK
std::pair<float, float> ExtendedMapLookahead::get_expected_delay_and_cong(RRNodeId from_node, RRNodeId to_node, const t_conn_cost_params& params, float /*R_upstream*/) const {
std::pair<float, float> ExtendedMapLookahead::get_expected_delay_and_cong(RRNodeId from_node, RRNodeId to_node, const t_conn_cost_params& params, float) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was the argument name comment removed?

float expected_delay_cost = params.criticality * expected_delay;
float expected_cong_cost = (1.0 - params.criticality) * expected_congestion;
float expected_delay_cost = expected_delay * params.criticality;
float expected_cong_cost = expected_congestion * (1. - params.criticality);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you need to change the order?

vpr/test/test_connection_router.cpp Outdated Show resolved Hide resolved
@nedsels
Copy link
Contributor Author

nedsels commented Aug 15, 2024

@vaughnbetz @AlexandreSinger Python lint is failing because some of the libraries I used can't be imported; is there any way around this?

@soheilshahrouz
Copy link
Contributor

@vaughnbetz @AlexandreSinger Python lint is failing because some of the libraries I used can't be imported; is there any way around this?

It can be probably fixed by updating requirements.txt file

@nedsels nedsels force-pushed the feature_router_lookahead_verifier branch from d5dc37d to 3b9615d Compare August 15, 2024 17:49
@nedsels nedsels force-pushed the feature_router_lookahead_verifier branch from ebdc050 to 1168090 Compare August 15, 2024 18:06
@nedsels nedsels force-pushed the feature_router_lookahead_verifier branch from d7f1068 to 2db5325 Compare August 15, 2024 19:25
@nedsels
Copy link
Contributor Author

nedsels commented Aug 16, 2024

@soheilshahrouz @vaughnbetz Some regtests are having some QoR failures. On vtr_reg_nightly_test2 (and _odin), there are a few, but I'm not sure if the first 3 are small enough to just update the golden results:

image

However, on vtr_reg_nightly_test7, there are some failures which seem quite bad, and I'm not sure how to approach fixing these. Do you have any advice?

image

@vaughnbetz
Copy link
Contributor

I'm not too worried about dsip; it is small.

The long route times on wb_conmax are also probably OK if there are just a few of these. Adding @amin1377 since those 3D and flat routing ones are probably his tests.

The more I think about it the more I think we should #ifdef adding data members to the heap though; I am really not keen on carrying around extra data in a big and hot data structure for profiling. I think it would be better to have no impact on the default flow.

@github-actions github-actions bot added infra Project Infrastructure build Build system lang-make CMake/Make code labels Aug 20, 2024
Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; just have some commenting suggestions. We'll also need a QoR run on VTR & Titan to check cpu time & quality.

.help(
"For every routed sink, record the cost, delay, and congestion estimated by the router lookahead and the "
"actual cost, delay, and congestion, from every node along each route to the sink. These results, along with many "
"other attributes of the node, are recorded into the file name provided. File extension must be .csv.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add "to assist in debugging or validating the router lookahead."

@@ -1400,6 +1400,8 @@ struct t_router_opts {
std::string write_intra_cluster_router_lookahead;
std::string read_intra_cluster_router_lookahead;

std::string lookahead_profiling_output;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add doxygen comment for this (can use ///).
If it's an empty string does it not get printed? (If so, say that).

*/
struct t_rr_node_route_inf {
RREdgeId prev_edge;

float acc_cost;
float path_cost;
float backward_path_cost;
#ifdef PROFILE_LOOKAHEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a brief comment saying why this extra data is needed for PROFILE_LOOKAHEAD, and that we conditionally compile it because this is a hot and large data structure.

@@ -441,7 +441,7 @@ static void add_delay_to_matrix(
}

static void generic_compute_matrix_dijkstra_expansion(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a doxygen comment for this, detailing what you know about the function and its arguments (and for sure what the route_profiler does).

@@ -239,7 +239,21 @@ t_heap* ConnectionRouter<Heap>::timing_driven_route_connection_from_heap(RRNodeI
// This is then placed into the traceback so that the correct path is returned
// TODO: This can be eliminated by modifying the actual traceback function in route_timing
if (rcv_path_manager.is_enabled()) {
rcv_path_manager.insert_backwards_path_into_traceback(cheapest->path_data, cheapest->cost, cheapest->backward_path_cost, route_ctx);
#ifdef PROFILE_LOOKAHEAD
rcv_path_manager.insert_backwards_path_into_traceback(cheapest->path_data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is PROFILE_LOOKAHEAD tested with rcv? If not, we could just choose to not support PROFILE_LOOKAHEAD with RCV to simplify the code a little.

@@ -53,7 +58,8 @@ class RouterDelayProfiler {

vtr::vector<RRNodeId, float> calculate_all_path_delays_from_rr_node(RRNodeId src_rr_node,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you know enough about this routine to write a doxygen comment it would be good to write one.

// pass in an iteration value, which is the only context in which we want to profile.
if (iteration < 1)
return;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should have a comment giving an overview of how it works, unless that's already covered in the (to be added) lookahead_profiler.cpp file-scope comment that gives the high-level context.

*
* @param iteration The router iteration.
* @param target_net_pin_index Target pin of this sink in the net.
* @param cost_params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest briefly commenting each input's purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a high-level comment of how the lookahead profiler works somewhere (file level, or maybe on this routine).

@@ -123,7 +169,7 @@ const RouterLookahead* get_cached_router_lookahead(const t_det_routing_arch& det
class ClassicLookahead : public RouterLookahead {
public:
float get_expected_cost(RRNodeId node, RRNodeId target_node, const t_conn_cost_params& params, float R_upstream) const override;
std::pair<float, float> get_expected_delay_and_cong(RRNodeId node, RRNodeId target_node, const t_conn_cost_params& params, float R_upstream) const override;
std::pair<float, float> get_expected_delay_and_cong_ignore_criticality(RRNodeId node, RRNodeId target_node, const t_conn_cost_params& params, float R_upstream) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long line; maybe break it.

/**
* @brief Get expected (delay, congestion) from node to target_node.
*
* @attention Either compute or read methods must be invoked before invoking get_expected_delay_and_cong_ignore_criticality.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be invoked --> must be invoked to set up the lookahead data structures

@nedsels
Copy link
Contributor Author

nedsels commented Aug 29, 2024

Sorry for the delay with the QoR results; for some reason, they have degraded with this change. I'm trying to figure out why; if/when I can, I'll post the results in the description.

@vaughnbetz
Copy link
Contributor

Thanks. If you can post what the current QoR is that would be helpful too; even if it is not that good it helps show what the status is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Build system external_libs infra Project Infrastructure lang-cpp C/C++ code lang-make CMake/Make code lang-python Python code scripts Utility & Infrastructure scripts VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants