Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform BarrierBeforeFinalMeasurements analysis in parallel #13411

Merged
merged 13 commits into from
Feb 12, 2025

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Nov 8, 2024

Summary

With #13410 removing the non-threadsafe structure from our circuit
representation we're now able to read and iterate over a DAGCircuit from
multiple threads. This commit is the first small piece doing this, it
moves the analysis portion of the BarrierBeforeFinalMeasurements pass to
execute in parallel. The pass checks every node to ensure all it's
decedents are either a measure or a barrier before reaching the end of
the circuit. This commit iterates over all the nodes and does the check
in parallel.

Details and comments

TODO:

OnceLock is a thread-safe version of OnceCell that enables us to use
PackedInstruction from a threaded environment. There is some overhead
associated with this, primarily in memory as the OnceLock is a larger
type than a OnceCell. But the tradeoff is worth it to start leverage
multithreading for circuits.

Fixes Qiskit#13219
With Qiskit#13410 removing the non-threadsafe structure from our circuit
representation we're now able to read and iterate over a DAGCircuit from
multiple threads. This commit is the first small piece doing this, it
moves the analysis portion of the BarrierBeforeFinalMeasurements pass to
execure in parallel. The pass checks every node to ensure all it's
decendents are either a measure or a barrier before reaching the end of
the circuit. This commit iterates over all the nodes and does the check
in parallel.
@mtreinish mtreinish added performance Rust This PR or issue is related to Rust code in the repository labels Nov 8, 2024
@mtreinish mtreinish added this to the 2.0.0 milestone Nov 8, 2024
@coveralls
Copy link

coveralls commented Nov 8, 2024

Pull Request Test Coverage Report for Build 13271408905

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 67 of 83 (80.72%) changed or added relevant lines in 2 files are covered.
  • 10 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.002%) to 88.317%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/dag_circuit.rs 5 11 45.45%
crates/accelerate/src/barrier_before_final_measurement.rs 62 72 86.11%
Files with Coverage Reduction New Missed Lines %
crates/accelerate/src/unitary_synthesis.rs 1 93.29%
crates/qasm2/src/lex.rs 4 92.98%
qiskit/circuit/delay.py 5 75.71%
Totals Coverage Status
Change from base Build 13268699272: 0.002%
Covered Lines: 78853
Relevant Lines: 89284

💛 - Coveralls

This commit updates the logic in the pass to simplify the search
algorithm and improve it's overall efficiency. Previously the pass would
search the entire dag for all barrier and measurements and then did a
BFS from each found node to check that all descendants are either
barriers or measurements. Then with the set of nodes matching that
condition a full topological sort of the dag was run, then the
topologically ordered nodes were filtered for the matching set. That
sorted set is then used for filtering

This commit refactors this to do a reverse search from the output
nodes which reduces the complexity of the algorithm. This new algorithm
is also conducive for parallel execution because it does a search
starting from each qubit's output node. Doing a test with a quantum
volume circuit from 10 to 1000 qubits which scales linearly in depth
and number of qubits a crossover point between the parallel and serial
implementations was found around 150 qubits.
@mtreinish mtreinish changed the title [WIP] Perform BarrierBeforeFinalMeasurements analysis in parallel Perform BarrierBeforeFinalMeasurements analysis in parallel Feb 4, 2025
@mtreinish mtreinish added the Changelog: None Do not include in changelog label Feb 4, 2025
@mtreinish mtreinish marked this pull request as ready for review February 4, 2025 02:47
@mtreinish mtreinish requested a review from a team as a code owner February 4, 2025 02:47
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@mtreinish
Copy link
Member Author

I ran a benchmark with a quantum volume circuit and did a sweep from 10 to 1000 qubits/depthand ran the pass on it with 1.3.2, the pass running with a serial iterator and with a parallel iterator:

barrier_pass_time

Based on these results I went with a parallel threshold of 150 qubits in: b89c826 so when we use a parallel iterator where the performance is better. This might vary on other environments though, so it would be useful for someone else to test this and we can adjust that value if it's not a good value.

@raynelfss raynelfss self-assigned this Feb 11, 2025
Copy link
Contributor

@raynelfss raynelfss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks straightforward to me and easy to understand. I just left a couple of comments related to the documentation of the code.

Comment on lines 5636 to 5644
/// Returns an iterator of tuples of (DAGNode, [DAGNodes]) where the DAGNode is the current node
/// and [DAGNode] is its successors in BFS order.
pub fn bfs_predecessors(
&self,
node: NodeIndex,
) -> impl Iterator<Item = (NodeIndex, Vec<NodeIndex>)> + '_ {
core_bfs_predecessors(&self.dag, node).filter(move |(_, others)| !others.is_empty())
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it quite funny that we didn't have this one before 😄, I assume it's because we didn't use anything like it. But still, nice addition :)

crates/circuit/src/dag_circuit.rs Outdated Show resolved Hide resolved
Comment on lines +4771 to +4775
/// Returns an immutable view of the qubit io map
#[inline(always)]
pub fn qubit_io_map(&self) -> &[[NodeIndex; 2]] {
&self.qubit_io_map
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd make sense to have one of these getters for the clbit_io_map too? I assume it's not needed now, but for the sake of consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, in practice I don't think we do that very often in the transpiler. But having the method for consistency would be good.

crates/accelerate/src/barrier_before_final_measurement.rs Outdated Show resolved Hide resolved
crates/circuit/src/dag_circuit.rs Show resolved Hide resolved
@mtreinish mtreinish requested a review from raynelfss February 11, 2025 19:58
raynelfss
raynelfss previously approved these changes Feb 11, 2025
Copy link
Contributor

@raynelfss raynelfss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one minor docstring comment.

crates/circuit/src/dag_circuit.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@raynelfss raynelfss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy merging!

@raynelfss raynelfss added this pull request to the merge queue Feb 12, 2025
Merged via the queue into Qiskit:main with commit 9ca951c Feb 12, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants