Performance improvements #67
base: dev
if (it == input_nodes.end()) {
  it = input_nodes.begin();
}
unsigned node_id = *it;
I'm not sure this piece of code is equivalent to the original code.
As I understand it, std::set<unsigned> destination_set is a subset of the node range [0, total_nodes), and the outer loop iterates over it, so i is equivalent to dest.
In the original code, the first value that node_id can be assigned is (0 + next_node[i]) % total_nodes.
In your PR, the first value that node_id can get is (1 + next_node[dest]) % total_nodes. This offset of 1 comes from std::upper_bound, which (per cppreference.com) returns the first iterator iter for which bool(comp(start_node, *iter)) == true. Because comp(start_node, start_node) yields false, start_node itself can only be selected after the search wraps around.
Which one (original or PR) is correct? Or does it not matter?
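To make the difference concrete, here is a minimal C++ sketch of the two selection rules as described in the comment above. Both functions are hypothetical reconstructions, not the actual Accel-Sim code: `pick_original` scans offsets 0, 1, ... starting at `next_node` (standing in for `next_node[i]`), while `pick_pr` uses `std::upper_bound` plus the wrap-around from the PR snippet.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical reconstruction of the original rule: scan offsets
// 0, 1, ..., total_nodes - 1 from next_node and return the first node
// present in input_nodes. The scan starts at next_node itself.
unsigned pick_original(const std::set<unsigned>& input_nodes,
                       unsigned next_node, unsigned total_nodes) {
  for (unsigned k = 0; k < total_nodes; ++k) {
    unsigned node_id = (k + next_node) % total_nodes;
    if (input_nodes.count(node_id) != 0) return node_id;
  }
  return total_nodes;  // sentinel: no requesting node was found
}

// Hypothetical reconstruction of the PR rule: std::upper_bound returns the
// first element strictly greater than next_node; wrap to begin() at the end.
// next_node itself is therefore only chosen after wrapping around.
unsigned pick_pr(const std::set<unsigned>& input_nodes, unsigned next_node) {
  assert(!input_nodes.empty());
  auto it = std::upper_bound(input_nodes.begin(), input_nodes.end(), next_node);
  if (it == input_nodes.end()) {
    it = input_nodes.begin();
  }
  return *it;
}
```

With input_nodes = {1, 3, 5}, total_nodes = 8 and next_node = 3, `pick_original` returns 3 while `pick_pr` returns 5; with next_node = 4 both return 5. The two versions only disagree when next_node is itself a requesting node. (The std::set member function upper_bound would return the same element in logarithmic time, whereas std::upper_bound on set iterators advances them linearly.)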
The point of this is round-robin arbitration, so starting from 0 or 1 should be fine either way; the difference should be minimal.
Is this specified in the iSLIP paper? If it is, it's better to do it exactly as specified.
If it isn't specified in the paper, I'm fine with the PR.
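For reference on the iSLIP question: as I understand McKeown's iSLIP description, each output arbiter keeps a pointer to its highest-priority input, grants the first requesting input at or after that pointer, and moves the pointer to one position beyond the granted input only when the grant is accepted. A minimal sketch of that rule follows; `RoundRobinArbiter`, `grant` and `on_accept` are hypothetical names, and nothing here is taken from the Accel-Sim source.

```cpp
#include <cassert>
#include <set>

// Minimal sketch of one iSLIP-style grant arbiter (assumption: modeled on
// the round-robin pointer rule in the iSLIP description, not on Accel-Sim).
struct RoundRobinArbiter {
  unsigned total_nodes;
  unsigned pointer = 0;  // highest-priority input for the next grant

  explicit RoundRobinArbiter(unsigned n) : total_nodes(n) {}

  // Grant the first requesting input at or after `pointer`, wrapping around.
  // lower_bound gives "at or after"; upper_bound would give "strictly after".
  unsigned grant(const std::set<unsigned>& requests) const {
    assert(!requests.empty());
    auto it = requests.lower_bound(pointer);
    if (it == requests.end()) it = requests.begin();
    return *it;
  }

  // Advance the pointer to one past the granted input (modulo N); iSLIP
  // does this only when the grant is accepted.
  void on_accept(unsigned granted) { pointer = (granted + 1) % total_nodes; }
};
```

Under this reading, which variant matches iSLIP depends on what next_node[] stores: if it already holds one position past the last accepted grant, scanning from offset 0 is the iSLIP behavior; if it holds the last granted node itself, the PR's offset of 1 is equivalent.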
Also, this piece of code at gpu-sim.cc:1900:
if (clock_mask & DRAM) {
  for (unsigned i = 0; i < m_memory_config->m_n_mem; i++) {
    if (m_memory_config->simple_dram_model)
      m_memory_partition_unit[i]->simple_dram_model_cycle();
    else
      m_memory_partition_unit[i]
          ->dram_cycle();  // Issue the dram command (scheduler + delay model)
    // Update performance counters for DRAM
    m_memory_partition_unit[i]->set_dram_power_stats(
        m_power_stats->pwr_mem_stat->n_cmd[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_activity[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_nop[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_act[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_pre[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_rd[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_wr[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_wr_WB[CURRENT_STAT_IDX][i],
        m_power_stats->pwr_mem_stat->n_req[CURRENT_STAT_IDX][i]);
  }
}
also updates m_power_stats. It would be a good idea to guard that update with if (m_config.g_power_simulation_enabled) as well, for consistency.
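A minimal sketch of where the suggested guard would go, using hypothetical stub types in place of the simulator classes (`PartitionStub`, `cycle_dram` and `power_enabled` are illustrative names; `power_enabled` stands in for `m_config.g_power_simulation_enabled`). The DRAM model is still clocked every cycle; only the power-statistics update is skipped.

```cpp
#include <vector>

// Hypothetical stand-in for a memory partition unit; only the control
// flow matters here, so each call just counts how often it ran.
struct PartitionStub {
  unsigned cycles = 0;
  unsigned power_updates = 0;
  void dram_cycle() { ++cycles; }
  void set_dram_power_stats() { ++power_updates; }
};

// Clock every partition unconditionally, but update power counters
// only when power simulation is enabled.
void cycle_dram(std::vector<PartitionStub>& partitions, bool power_enabled) {
  for (auto& p : partitions) {
    p.dram_cycle();
    if (power_enabled) {
      p.set_dram_power_stats();
    }
  }
}
```

Keeping dram_cycle() outside the guard matters: wrapping the whole loop body would stop the DRAM model itself whenever power simulation is off.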
Thanks, fix applied.
This PR makes Accel-Sim runs much faster.