Perform BarrierBeforeFinalMeasurements analysis in parallel #13411
OnceLock is a thread-safe version of OnceCell that enables us to use PackedInstruction from a threaded environment. There is some overhead associated with this, primarily in memory, as OnceLock is a larger type than OnceCell. But the tradeoff is worth it to start leveraging multithreading for circuits. Fixes Qiskit#13219
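To illustrate the OnceLock point above, here is a minimal sketch using only the standard library. `CachedLabel`, `label`, and `read_from_threads` are hypothetical names made up for this example; they are not the actual `PackedInstruction` code. The sketch only shows that a lazily initialised field held in a `OnceLock` can be read through a shared reference from several threads.

```rust
use std::sync::OnceLock;
use std::thread;

/// Hypothetical stand-in for a type that lazily caches a derived value.
/// This is NOT the real PackedInstruction; it only mirrors the idea.
pub struct CachedLabel {
    raw: u32,
    // OnceLock<T> is Sync when T is Send + Sync, so a &CachedLabel can be
    // shared between threads. A std::cell::OnceCell field would not allow it.
    cache: OnceLock<String>,
}

impl CachedLabel {
    pub fn new(raw: u32) -> Self {
        Self {
            raw,
            cache: OnceLock::new(),
        }
    }

    /// Computes the label on first access; later calls return the cached value.
    pub fn label(&self) -> &str {
        self.cache.get_or_init(|| format!("op-{}", self.raw))
    }
}

/// Reads the same lazily initialised value from `n_threads` scoped threads.
pub fn read_from_threads(item: &CachedLabel, n_threads: usize) -> Vec<String> {
    thread::scope(|s| {
        // Spawn every thread first, then join them in order.
        let handles: Vec<_> = (0..n_threads)
            .map(|_| s.spawn(|| item.label().to_owned()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Whichever thread reaches `get_or_init` first performs the initialisation; the others see the stored value.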
With Qiskit#13410 removing the non-thread-safe structure from our circuit representation, we're now able to read and iterate over a DAGCircuit from multiple threads. This commit is the first small piece of doing this: it moves the analysis portion of the BarrierBeforeFinalMeasurements pass to execute in parallel. The pass checks every node to ensure all its descendants are either a measure or a barrier before reaching the end of the circuit. This commit iterates over all the nodes and does the check in parallel.
Pull Request Test Coverage Report for Build 13271408905. Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
This commit updates the logic in the pass to simplify the search algorithm and improve its overall efficiency. Previously the pass would search the entire DAG for all barriers and measurements and then do a BFS from each found node to check that all descendants are either barriers or measurements. Then, with the set of nodes matching that condition, a full topological sort of the DAG was run, and the topologically ordered nodes were filtered for the matching set. That sorted set was then used for filtering. This commit refactors this to do a reverse search from the output nodes, which reduces the complexity of the algorithm. The new algorithm is also conducive to parallel execution because it does a search starting from each qubit's output node. In a test with quantum volume circuits from 10 to 1000 qubits, scaling linearly in depth and number of qubits, a crossover point between the parallel and serial implementations was found around 150 qubits.
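The reverse search described above can be sketched on a toy graph. Everything here (`ToyDag`, `Kind`, `final_ops_from_output`, `final_ops_parallel`) is a hypothetical illustration and not the code in this PR: it uses plain `usize` node ids instead of `NodeIndex`, and one scoped standard-library thread per output node in place of a parallel iterator. It also ignores the case where a multi-qubit barrier is followed by a non-final operation on another wire.

```rust
use std::collections::HashSet;
use std::thread;

/// Toy node kinds; a real DAG carries full instructions instead.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Kind {
    Gate,
    Measure,
    Barrier,
    Output,
}

/// Hypothetical, simplified DAG: `preds[n]` lists the predecessors of node `n`,
/// and `outputs` holds one output node per qubit.
pub struct ToyDag {
    pub kinds: Vec<Kind>,
    pub preds: Vec<Vec<usize>>,
    pub outputs: Vec<usize>,
}

fn is_final_kind(kind: Kind) -> bool {
    matches!(kind, Kind::Measure | Kind::Barrier)
}

/// Walks backwards from one output node, collecting measures and barriers
/// until some other operation is reached on each path.
pub fn final_ops_from_output(dag: &ToyDag, output: usize) -> Vec<usize> {
    let mut stack: Vec<usize> = dag.preds[output]
        .iter()
        .copied()
        .filter(|&n| is_final_kind(dag.kinds[n]))
        .collect();
    let mut seen = HashSet::new();
    let mut found = Vec::new();
    while let Some(node) = stack.pop() {
        if !seen.insert(node) {
            continue; // already handled via another path
        }
        found.push(node);
        stack.extend(
            dag.preds[node]
                .iter()
                .copied()
                .filter(|&p| is_final_kind(dag.kinds[p])),
        );
    }
    found
}

/// Runs the per-output search on one scoped thread per qubit and merges the results.
pub fn final_ops_parallel(dag: &ToyDag) -> HashSet<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = dag
            .outputs
            .iter()
            .map(|&out| s.spawn(move || final_ops_from_output(dag, out)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```

Because each search only reads the graph and starts from its own output node, the searches need no coordination beyond merging the per-qubit results into one set.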
I ran a benchmark with a quantum volume circuit, sweeping from 10 to 1000 qubits/depth, and ran the pass on it with 1.3.2, with a serial iterator, and with a parallel iterator. Based on these results I went with a parallel threshold of 150 qubits in b89c826, so we only use a parallel iterator where the performance is better. This might vary in other environments though, so it would be useful for someone else to test this, and we can adjust the value if it's not a good one.
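As a rough illustration of the threshold idea, the sketch below switches between a serial and a threaded path based on the qubit count. `PARALLEL_THRESHOLD`, `use_parallel`, and `map_per_qubit` are names assumed for this example, and the chunked scoped threads are a standard-library stand-in for a parallel iterator; only the value 150 comes from the benchmark above.

```rust
use std::thread;

/// Assumed constant name; the value 150 is the crossover reported above.
pub const PARALLEL_THRESHOLD: usize = 150;

/// Decides whether the parallel path is expected to pay off.
pub fn use_parallel(num_qubits: usize) -> bool {
    num_qubits >= PARALLEL_THRESHOLD
}

/// Applies `work` to every qubit index, serially below the threshold and
/// on chunked scoped threads at or above it. Results keep index order.
pub fn map_per_qubit<T, F>(num_qubits: usize, work: F) -> Vec<T>
where
    T: Send,
    F: Fn(usize) -> T + Sync,
{
    if !use_parallel(num_qubits) {
        return (0..num_qubits).map(&work).collect();
    }
    let n_workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Ceiling division; num_qubits >= 150 here, so chunk is at least 1.
    let chunk = (num_qubits + n_workers - 1) / n_workers;
    let work = &work;
    thread::scope(|s| {
        let handles: Vec<_> = (0..num_qubits)
            .step_by(chunk)
            .map(|start| {
                let end = (start + chunk).min(num_qubits);
                s.spawn(move || (start..end).map(work).collect::<Vec<T>>())
            })
            .collect();
        // Joining in spawn order keeps the output ordered by qubit index.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```

Keeping the threshold in a single constant makes it easy to adjust if benchmarks in other environments show a different crossover point.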
This looks straightforward to me and easy to understand. I just left a couple of comments related to the documentation of the code.
crates/circuit/src/dag_circuit.rs
Outdated
/// Returns an iterator of tuples of (DAGNode, [DAGNodes]) where the DAGNode is the current node
/// and [DAGNodes] is its predecessors in BFS order.
pub fn bfs_predecessors(
    &self,
    node: NodeIndex,
) -> impl Iterator<Item = (NodeIndex, Vec<NodeIndex>)> + '_ {
    core_bfs_predecessors(&self.dag, node).filter(move |(_, others)| !others.is_empty())
}
I find it quite funny that we didn't have this one before 😄, I assume it's because we didn't use anything like it. But still, nice addition :)
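For readers unfamiliar with this kind of traversal, here is a small standalone sketch of a BFS over predecessors on a plain adjacency list. It is not the rustworkx-core implementation behind `core_bfs_predecessors`; in particular, reporting only predecessors that have not been visited yet is an assumption of this sketch. Entries with an empty predecessor list are skipped, mirroring the `filter` in the snippet above.

```rust
use std::collections::{HashSet, VecDeque};

/// Toy BFS over predecessors: `preds[n]` lists the predecessors of node `n`.
/// Returns (node, newly discovered predecessors) pairs in BFS order and
/// omits nodes that contribute no new predecessors.
pub fn bfs_predecessors(preds: &[Vec<usize>], start: usize) -> Vec<(usize, Vec<usize>)> {
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    let mut out = Vec::new();
    while let Some(node) = queue.pop_front() {
        let mut fresh = Vec::new();
        for &p in &preds[node] {
            // `insert` returns true only the first time a node is seen.
            if visited.insert(p) {
                fresh.push(p);
                queue.push_back(p);
            }
        }
        if !fresh.is_empty() {
            out.push((node, fresh));
        }
    }
    out
}
```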
/// Returns an immutable view of the qubit io map
#[inline(always)]
pub fn qubit_io_map(&self) -> &[[NodeIndex; 2]] {
    &self.qubit_io_map
}
Do you think it'd make sense to have one of these getters for the clbit_io_map too? I assume it's not needed now, but for the sake of consistency?
Probably, in practice I don't think we do that very often in the transpiler. But having the method for consistency would be good.
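A matching pair of getters might look like the sketch below. `IoMaps` is a hypothetical stand-in struct, `NodeIndex` is aliased to `usize` to keep the example self-contained, and the `clbit_io_map` field is assumed to have the same shape as `qubit_io_map`; none of this is taken from the actual `DAGCircuit` definition.

```rust
/// Stand-in for the graph library's node index type (assumption for this sketch).
pub type NodeIndex = usize;

/// Hypothetical container holding [input, output] node pairs per bit.
pub struct IoMaps {
    qubit_io_map: Vec<[NodeIndex; 2]>,
    clbit_io_map: Vec<[NodeIndex; 2]>,
}

impl IoMaps {
    pub fn new(qubit_io_map: Vec<[NodeIndex; 2]>, clbit_io_map: Vec<[NodeIndex; 2]>) -> Self {
        Self {
            qubit_io_map,
            clbit_io_map,
        }
    }

    /// Returns an immutable view of the qubit io map
    #[inline(always)]
    pub fn qubit_io_map(&self) -> &[[NodeIndex; 2]] {
        &self.qubit_io_map
    }

    /// Returns an immutable view of the clbit io map
    #[inline(always)]
    pub fn clbit_io_map(&self) -> &[[NodeIndex; 2]] {
        &self.clbit_io_map
    }
}
```

Returning slices keeps both maps read-only for callers while leaving the backing storage free to change.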
LGTM! Just one minor docstring comment.
Co-authored-by: Raynel Sanchez <[email protected]>
Happy merging!
Summary
With #13410 removing the non-thread-safe structure from our circuit
representation, we're now able to read and iterate over a DAGCircuit from
multiple threads. This commit is the first small piece of doing this: it
moves the analysis portion of the BarrierBeforeFinalMeasurements pass to
execute in parallel. The pass checks every node to ensure all its
descendants are either a measure or a barrier before reaching the end of
the circuit. This commit iterates over all the nodes and does the check
in parallel.