Add C API for two qubit unitary peephole optimization pass#16088
Add C API for two qubit unitary peephole optimization pass#16088mtreinish wants to merge 83 commits intoQiskit:mainfrom
Conversation
This commit adds a new transpiler pass for physical optimization, TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks, ConsolidateBlocks, and UnitarySynthesis in the optimization stage for a default pass manager setup. The pass logically works the same way where it analyzes the dag to get a list of 2q runs, calculates the matrix of each run, and then synthesizes the matrix and substitutes it inplace. The distinction this pass makes though is it does this all in a single pass and also parallelizes the matrix calculation and synthesis steps because there is no data dependency there. This new pass is not meant to fully replace the Collect2qBlocks, ConsolidateBlocks, or UnitarySynthesis passes as those also run in contexts where we don't have a physical circuit. This is meant instead to replace their usage in the optimization stage only. Accordingly this new pass also changes the logic on how we select the synthesis to use and when to make a substituion. Previously this logic was primarily done via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if the number of basis gates needed based on the weyl chamber coordinates was less than the number of 2q gates in the block (see Qiskit#11659 for discussion on this). Since this new pass skips the explicit consolidation stage we go ahead and try all the available synthesizers Right now this commit has a number of limitations, the largest are: - Only supports the target - It doesn't support any synthesizers besides the TwoQubitBasisDecomposer, because it's the only one in rust currently. For plugin handling I left the logic as running the three pass series, but I'm not sure this is the behavior we want. We could say keep the synthesis plugins for `UnitarySynthesis` only and then rely on our built-in methods for physical optimiztion only. But this also seems less than ideal because the plugin mechanism is how we support synthesizing to custom basis gates, and also more advanced approximate synthesis methods. Both of those are things we need to do as part of the synthesis here. Additionally, this is currently missing tests and documentation and while running it manually "works" as in it returns a circuit that looks valid, I've not done any validation yet. This also likely will need several rounds of performance optimization and tuning. t this point this is just a rough proof of concept and will need a lof refinement along with larger changes to Qiskit's rust code before this is ready to merge. Fixes Qiskit#12007 Fixes Qiskit#11659
Since Qiskit#13139 merged we have another two qubit decomposer available to run in rust, the TwoQubitControlledUDecomposer. This commit updates the new TwoQubitPeepholeOptimization to call this decomposer if the target supports appropriate 2q gates.
Clippy is correctly warning that the size difference between the two decomposer types in the TwoQubitDecomposer enumese two types is large. TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer is only 24 bytes. This means each element of ControlledU is wasting > 1600 bytes. However, in this case that is acceptable in order to avoid a layer of pointer indirection as these are stored temporarily in a vec inside a thread to decompose a unitary. A trait would be more natural for this to define a common interface between all the two qubit decomposers but since we keep them instantiated for each edge in a Vec they need to be sized and doing something like `Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer` instead of a enum) to get around this would have additional runtime overhead. This is also considering that TwoQubitControlledUDecomposer has far less likelihood in practice as it only works with some targets that have RZZ, RXX, RYY, or RZX gates on an edge which is less common.
Also don't run scoring more than needed.
The priority for the two qubit peephole pass should be decreasing the 2q gate count. The error rate heuristic should only matter if the 2q counts are the same. This commit flips the heuristic to first check the 2q gate count so the first priority is reducing the 2q gate count.
This commit removes the unitary synthesis plugin mechanism from the pass. This was a layer violation to support this when the pass logic doesn't actually support using the plugin interface. It is easier and more clear that if the plugin interface usage is desired to handle that in the pass manager construction rather than have this pass internally build a pass manager and execute other passes to emulate behavior it doesn't have.
There were two issues identified by the testing which required fixing and adjusting the tests based on limitations in the pass. The first issue was the parameters for the target gate was not handled correctly. In the case of using the Controlled U decomposer we were not passing the computed parameter value correctly to the output circuit and instead the ParameterExpression from the target was being used. Then in the case of controlled gates (not supercontrolled) that had a fixed angle that are normally intended for the xx decomposer were incorrectly being passed to the TwoQubitBasisDecomposer which can't work with them. This was resulting in invalid circuit outputs. The use of the TwoQubitBasisDecomposer is now correctly filtering to only be run with supercontrolled gates. The tests were adjusted for this limitation because they were mostly copied from the UnitarySynthesis tests which supports xx decomposer.
This commit fixes an edge case in the scoring when there was only a single decomposition available. In UnitarySynthesis for these cases the fidelity calculation is not run as part of the synthesis because there is no reason to compute the estimated score when it's not being compared to anything. However, for the new peephole pass we need to run the scoring because the estimated fidelity is used to compare against the original circuit's block to determine whether to replace it or not. Previously, the new peephole pass treated the lack of a score as it being ideal, which was incorrect and led to the pass replacing every block unless the original circuit's block had more 2q gates. This fixes this and now runs the scoring manually if there is not one returned by the synthesis.
This commit makes a couple of small adjustments to the pass implementation to improve the runtime performance. The first is it switches the topological sort function used for the final iteration to rebuild the dag. The lexicographical toplogical sort this was using before had extra overhead which wasn't necessary for what we were doing. The output from a slightly different ordering doesn't change the structure of the dag so using this new function will be slightly faster. The other key change is moving the node mapping away from dashmap to use a `Mutex<Vec<usize>>`. This will require a slightly larger object in memory but it is much faster to access and write to because there is no hashing or other indirection once the lock is acquired by a thread.
…re locking is needed
We don't want to spend time reconstructing an exact copy of the dag if there are no substituions needed. Prior to using a vec for tracking the run indices that nodes are part of we would check if that map was empty. The vec is always populated and to determine if there are no entries we'd have to do a worse case O(n) lookup to determine if any entries are set. To avoid that this overhead but keeping the check this adds an atomic bool that is used to track whether we've substituted any blocks. If this is not set to true we can just exit early since there are no substitutions to make.
Previously there was a mismatch between the scoring of synthesis results and the peephole pass's comparison with the original block. The pass is documented as using the tuple (num_2q_gates, error, num_gates) and picking the min of all the choices. But, when we called the unitary synthesis function that selects the best synthesis outcome it was maximizing the estimated fidelity but not considering the gate counts like the pass is documented as doing. This corrects this mismatch by updating the function doing the synthesis to be generic on score type and taking a scorer callback. This lets the peephole pass control the heuristic used for selecting the best score.
This commit adds the new TwoQubitPeepholeOptimization pass to the C API. It adds two new functions: `qk_transpiler_pass_standalone_two_qubit_peephole_optimization` and `qk_transpiler_pass_two_qubit_peephole_optimization` for running the pass on a `QkCircuit` and a `QkDag` respectively.
|
One or more of the following people are relevant to this code:
|
eliarbel
left a comment
There was a problem hiding this comment.
Thanks Matthew, this seems straightforward.
I've left a few minor inline comments.
| // Create a target with cx connectivity in a line. | ||
| QkExitCode result_x = qk_target_add_instruction(target, qk_target_entry_new(QkGate_X)); | ||
| if (result_x != QkExitCode_Success) { | ||
| printf("Unexpected error occurred when adding a global X gate."); |
There was a problem hiding this comment.
FWIW, I know Jake likes the error messages to end with \n. This applies to all the printf calls here.
| /// This transpiler pass is designed to perform two qubit unitary peephole | ||
| /// optimization. This pass finds all the 2 qubit blocks in the circuit, | ||
| /// computes the unitary of that block, and then synthesizes that unitary. | ||
| /// If the synthesized two qubit unitary is "better" than the original | ||
| /// subcircuit that subcircuit is used to replace the original. The heuristic | ||
| /// used to determine if it's better first looks at the two qubit gate count | ||
| /// in the circuit, and prefers the synthesis with fewer two qubit gates, if | ||
| /// the two qubit gate counts are the same then it looks at the estimated | ||
| /// fidelity of the circuit and picks the subcircuit with higher estimated | ||
| /// fidelity, and finally if needed it picks the subcircuit with the fewest | ||
| /// total gates. | ||
| /// | ||
| /// In case the target is overcomplete the pass will try all the | ||
| /// decomposers supported for all the gates supported on a given qubit. | ||
| /// The decomposition that has the best expected performance using the above | ||
| /// heuristic will be selected and used to replace the block. | ||
| /// | ||
| /// This pass is designed to be run on a physical circuit and the details of | ||
| /// operations on a given qubit is assumed to be the hardware qubit from the | ||
| /// target. However, the output of the pass might not use hardware operations, | ||
| /// specifically single qubit gates might be emitted outside the target's supported | ||
| /// operations, typically only if a parameterized gate supported by the | ||
| /// :class:`.TwoQubitControlledUDecomposer` is used for synthesis. As such if running | ||
| /// this pass in a physical optimization stage (such as :ref:`transpiler-preset-stage-optimization`) | ||
| /// this should be paired with passes such as :class:`.BasisTranslator` and/or | ||
| /// :class:`.Optimize1qGatesDecomposition` to ensure that these errant single qubit | ||
| /// gates are replaced with hardware supported operations prior to exiting the stage. | ||
| /// | ||
| /// This pass is multithreaded, and will perform the analysis in parallel | ||
| /// and use all the cores available on your local system. You can refer to | ||
| /// the `configuration guide <https://docs.quantum.ibm.com/guides/configure-qiskit-local>`__ | ||
| /// for details on how to control the threading behavior for Qiskit more broadly | ||
| /// which will also control this pass |
There was a problem hiding this comment.
In #15614 we went with the approach of having single source of truth for the general description of a given pass. We used the dag-based passes as such and had the standalone ones point to the dag-based functions for more detail. Should we follow this here as well?
| /// | ||
| /// Behavior is undefined if ``circuit`` or ``target`` is not a valid, non-null pointer to a ``QkCircuit`` and ``QkTarget``. | ||
| #[unsafe(no_mangle)] | ||
| pub unsafe extern "C" fn qk_transpiler_pass_standalone_two_qubit_peephole_optimization( |
There was a problem hiding this comment.
A suggestion, for your discretion: replace _two_qubit_ with _2q_ to make the function names shorter.
| /// } | ||
| /// QkDag *dag = qk_circuit_to_dag(qc); | ||
| /// qk_transpiler_pass_two_qubit_peephole_optimization(dag, target, 1.0); | ||
| /// ``` |
There was a problem hiding this comment.
I think it would be nice to add the qk_*_free functions here for completeness.
| /// target. However, the output of the pass might not use hardware operations, | ||
| /// specifically single qubit gates might be emitted outside the target's supported | ||
| /// operations, typically only if a parameterized gate supported by the | ||
| /// :class:`.TwoQubitControlledUDecomposer` is used for synthesis. As such if running |
There was a problem hiding this comment.
The Python-side Sphinx crossrefs don't render nicely on the C API pages. Should we just use backticks?
| /// if (i % 2) { | ||
| /// qk_circuit_gate(qc, QkGate_CX, forward, NULL); | ||
| /// } else { | ||
| /// qk_circuit_gate(QkGate_CX, reverse, NULL); |
There was a problem hiding this comment.
| /// qk_circuit_gate(QkGate_CX, reverse, NULL); | |
| /// qk_circuit_gate(qc, QkGate_CX, reverse, NULL); |
| /// # Example | ||
| /// | ||
| /// ```c | ||
| /// QkTarget *target = qk_target_new(2); |
There was a problem hiding this comment.
Nitpicking: we don't need to indent code examples anymore
| } else { | ||
| Some(approximation_degree) | ||
| }; | ||
| let out_dag = match two_qubit_unitary_peephole_optimize(&dag, target.into(), approximation) { |
There was a problem hiding this comment.
Nitpicking: I think we don't need the .into() here, do we?
| }; | ||
| let mut gate_names = qc.count_ops().keys().copied().collect::<Vec<_>>(); | ||
| gate_names.sort(); | ||
| assert_eq!(gate_names, vec!["cx", "u"]); |
There was a problem hiding this comment.
Is there any particular reason why we test this here rather than in `test_two_qubit_peephole.c'? This doesn't seem like something we can't cover in the C API testing.
| qk_circuit_gate(qc, QkGate_CX, reverse, NULL); | ||
| } | ||
| } | ||
| qk_transpiler_pass_standalone_two_qubit_peephole_optimization(qc, target, 1.0); |
There was a problem hiding this comment.
It would be nice to cover the NAN branch of approximation_degree as well (this also applies to the dag-based function).
ShellyGarion
left a comment
There was a problem hiding this comment.
Looks good. I have some minor comments and questions on the docs and tests.
|
|
||
| /// @ingroup QkTranspilerPassesStandalone | ||
| /// Run the TwoQubitPeepholeOptimization transpiler pass. | ||
| /// |
There was a problem hiding this comment.
we must have the same documentation twice for both functions? the standalone and the other one?
it's quite lengthy...
| }; | ||
| let operation = PackedOperation::from_unitary(Box::new(gate)); | ||
| qc.push_packed_operation(operation, None, &[Qubit(0), Qubit(2)], &[]) | ||
| .unwrap(); |
There was a problem hiding this comment.
perhaps also add to the circuit some parametrized gate as well?
| .unwrap(); | ||
| target | ||
| .add_instruction( | ||
| PackedOperation::from_standard_gate(StandardGate::CX), |
There was a problem hiding this comment.
perhaps also test also a parametrized 2-qubit gate in the target?
| return RuntimeError; | ||
| } | ||
|
|
||
| QkTargetEntry *cx_entry = qk_target_entry_new(QkGate_CX); |
There was a problem hiding this comment.
perhaps it's also worth to check RZZ here? not only CX?
This commit adds the new TwoQubitPeepholeOptimization pass to the C API.
It adds two new functions:
qk_transpiler_pass_standalone_two_qubit_peephole_optimizationandqk_transpiler_pass_two_qubit_peephole_optimizationfor running thepass on a
QkCircuitand aQkDagrespectively.This is based on top of #13419 and will need to be rebased after that PR merges. In the meantime you can view the contents of just this PR by looking at the HEAD commit: d91d1da
AI/LLM disclosure