Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements #67

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from
Open

Performance improvements #67

wants to merge 4 commits into from

Conversation

JRPan
Copy link

@JRPan JRPan commented Feb 21, 2024

This PR makes Accel-Sim runs much faster.

  1. Rewrite xbar advance
  2. temporary fix for querying cache stats every cycle.

@JRPan JRPan requested a review from tgrogers April 4, 2024 16:59
@JRPan JRPan marked this pull request as ready for review April 4, 2024 16:59
@JRPan JRPan requested review from a team, Anunalla and FJShen and removed request for tgrogers, a team and Anunalla July 7, 2024 22:58
if (it == input_nodes.end()) {
it = input_nodes.begin();
}
unsigned node_id = *it;
Copy link

@FJShen FJShen Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern whether this piece of code is equivalent to the original code.

I understand that std::set<unsigned> destination_set is a subset of the range from 0 to total_nodes. The outer loop iterates over it. So, i is equivalent to dest.

In the original code, the first value which node_id can be assigned is (0 + next_node[i]) % total_nodes.

In your PR, the first value that node_id can get is (1 + next_node[dest]) % total_nodes. This offset of 1 comes from std::upper_bound which yields the first iterator iter that can satisfy bool(comp(start_node, *iter))==true. (citing cppreference.com). comp(start_node, start_node) yields false.

Which one (original vs PR) is correct? Or does this matter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this is round-robin. So starting from 0 or 1 should be fine either way. The difference should be minimal.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specified in the ISLIP paper or not? If it was, better do it right.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it wasn't specified in the paper, I'm fine with the PR

Copy link

@FJShen FJShen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this piece of code at gpu-sim.cc:1900:

if (clock_mask & DRAM) {
    for (unsigned i = 0; i < m_memory_config->m_n_mem; i++) {
      if (m_memory_config->simple_dram_model)
        m_memory_partition_unit[i]->simple_dram_model_cycle();
      else
        m_memory_partition_unit[i]
            ->dram_cycle();  // Issue the dram command (scheduler + delay model)
      // Update performance counters for DRAM
      m_memory_partition_unit[i]->set_dram_power_stats(
          m_power_stats->pwr_mem_stat->n_cmd[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_activity[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_nop[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_act[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_pre[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_rd[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_wr[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_wr_WB[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_req[CURRENT_STAT_IDX][i]);
    }
  }

also updates m_power_stats. It's a good idea to add the if (m_config.g_power_simulation_enabled) too for consistency.

@JRPan
Copy link
Author

JRPan commented Jul 12, 2024

Also, this piece of code at gpu-sim.cc:1900:

if (clock_mask & DRAM) {
    for (unsigned i = 0; i < m_memory_config->m_n_mem; i++) {
      if (m_memory_config->simple_dram_model)
        m_memory_partition_unit[i]->simple_dram_model_cycle();
      else
        m_memory_partition_unit[i]
            ->dram_cycle();  // Issue the dram command (scheduler + delay model)
      // Update performance counters for DRAM
      m_memory_partition_unit[i]->set_dram_power_stats(
          m_power_stats->pwr_mem_stat->n_cmd[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_activity[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_nop[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_act[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_pre[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_rd[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_wr[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_wr_WB[CURRENT_STAT_IDX][i],
          m_power_stats->pwr_mem_stat->n_req[CURRENT_STAT_IDX][i]);
    }
  }

also updates m_power_stats. It's a good idea to add the if (m_config.g_power_simulation_enabled) too for consistency.

thx. fix applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need for Speed: Make Accel-Sim FAST again
2 participants