Add C API for two qubit unitary peephole optimization pass by mtreinish · Pull Request #16088 · Qiskit/qiskit

mtreinish · 2026-04-25T14:54:54Z

This commit adds the new TwoQubitPeepholeOptimization pass to the C API.
It adds two new functions:
qk_transpiler_pass_standalone_two_qubit_peephole_optimization and
qk_transpiler_pass_two_qubit_peephole_optimization for running the
pass on a QkCircuit and a QkDag respectively.

This is based on top of #13419 and will need to be rebased after that PR merges. In the meantime you can view the contents of just this PR by looking at the HEAD commit: d91d1da

AI/LLM disclosure

I didn't use LLM tooling, or only used it privately.
I used the following tool to help write this PR description:
I used the following tool to generate or modify code:

This commit adds a new transpiler pass for physical optimization, TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks, ConsolidateBlocks, and UnitarySynthesis in the optimization stage for a default pass manager setup. The pass logically works the same way where it analyzes the dag to get a list of 2q runs, calculates the matrix of each run, and then synthesizes the matrix and substitutes it inplace. The distinction this pass makes though is it does this all in a single pass and also parallelizes the matrix calculation and synthesis steps because there is no data dependency there. This new pass is not meant to fully replace the Collect2qBlocks, ConsolidateBlocks, or UnitarySynthesis passes as those also run in contexts where we don't have a physical circuit. This is meant instead to replace their usage in the optimization stage only. Accordingly this new pass also changes the logic on how we select the synthesis to use and when to make a substituion. Previously this logic was primarily done via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if the number of basis gates needed based on the weyl chamber coordinates was less than the number of 2q gates in the block (see Qiskit#11659 for discussion on this). Since this new pass skips the explicit consolidation stage we go ahead and try all the available synthesizers Right now this commit has a number of limitations, the largest are: - Only supports the target - It doesn't support any synthesizers besides the TwoQubitBasisDecomposer, because it's the only one in rust currently. For plugin handling I left the logic as running the three pass series, but I'm not sure this is the behavior we want. We could say keep the synthesis plugins for `UnitarySynthesis` only and then rely on our built-in methods for physical optimiztion only. But this also seems less than ideal because the plugin mechanism is how we support synthesizing to custom basis gates, and also more advanced approximate synthesis methods. Both of those are things we need to do as part of the synthesis here. Additionally, this is currently missing tests and documentation and while running it manually "works" as in it returns a circuit that looks valid, I've not done any validation yet. This also likely will need several rounds of performance optimization and tuning. t this point this is just a rough proof of concept and will need a lof refinement along with larger changes to Qiskit's rust code before this is ready to merge. Fixes Qiskit#12007 Fixes Qiskit#11659

…rallel-pass

Since Qiskit#13139 merged we have another two qubit decomposer available to run in rust, the TwoQubitControlledUDecomposer. This commit updates the new TwoQubitPeepholeOptimization to call this decomposer if the target supports appropriate 2q gates.

Clippy is correctly warning that the size difference between the two decomposer types in the TwoQubitDecomposer enumese two types is large. TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer is only 24 bytes. This means each element of ControlledU is wasting > 1600 bytes. However, in this case that is acceptable in order to avoid a layer of pointer indirection as these are stored temporarily in a vec inside a thread to decompose a unitary. A trait would be more natural for this to define a common interface between all the two qubit decomposers but since we keep them instantiated for each edge in a Vec they need to be sized and doing something like `Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer` instead of a enum) to get around this would have additional runtime overhead. This is also considering that TwoQubitControlledUDecomposer has far less likelihood in practice as it only works with some targets that have RZZ, RXX, RYY, or RZX gates on an edge which is less common.

…rallel-pass

Also don't run scoring more than needed.

…rallel-pass

…tinuous

…rallel-pass

The priority for the two qubit peephole pass should be decreasing the 2q gate count. The error rate heuristic should only matter if the 2q counts are the same. This commit flips the heuristic to first check the 2q gate count so the first priority is reducing the 2q gate count.

…rallel-pass

This commit removes the unitary synthesis plugin mechanism from the pass. This was a layer violation to support this when the pass logic doesn't actually support using the plugin interface. It is easier and more clear that if the plugin interface usage is desired to handle that in the pass manager construction rather than have this pass internally build a pass manager and execute other passes to emulate behavior it doesn't have.

…rallel-pass

There were two issues identified by the testing which required fixing and adjusting the tests based on limitations in the pass. The first issue was the parameters for the target gate was not handled correctly. In the case of using the Controlled U decomposer we were not passing the computed parameter value correctly to the output circuit and instead the ParameterExpression from the target was being used. Then in the case of controlled gates (not supercontrolled) that had a fixed angle that are normally intended for the xx decomposer were incorrectly being passed to the TwoQubitBasisDecomposer which can't work with them. This was resulting in invalid circuit outputs. The use of the TwoQubitBasisDecomposer is now correctly filtering to only be run with supercontrolled gates. The tests were adjusted for this limitation because they were mostly copied from the UnitarySynthesis tests which supports xx decomposer.

…rallel-pass

…instead of panic

…rallel-pass

This commit fixes an edge case in the scoring when there was only a single decomposition available. In UnitarySynthesis for these cases the fidelity calculation is not run as part of the synthesis because there is no reason to compute the estimated score when it's not being compared to anything. However, for the new peephole pass we need to run the scoring because the estimated fidelity is used to compare against the original circuit's block to determine whether to replace it or not. Previously, the new peephole pass treated the lack of a score as it being ideal, which was incorrect and led to the pass replacing every block unless the original circuit's block had more 2q gates. This fixes this and now runs the scoring manually if there is not one returned by the synthesis.

…rallel-pass

This commit makes a couple of small adjustments to the pass implementation to improve the runtime performance. The first is it switches the topological sort function used for the final iteration to rebuild the dag. The lexicographical toplogical sort this was using before had extra overhead which wasn't necessary for what we were doing. The output from a slightly different ordering doesn't change the structure of the dag so using this new function will be slightly faster. The other key change is moving the node mapping away from dashmap to use a `Mutex<Vec<usize>>`. This will require a slightly larger object in memory but it is much faster to access and write to because there is no hashing or other indirection once the lock is acquired by a thread.

…rallel-pass

…re locking is needed

We don't want to spend time reconstructing an exact copy of the dag if there are no substituions needed. Prior to using a vec for tracking the run indices that nodes are part of we would check if that map was empty. The vec is always populated and to determine if there are no entries we'd have to do a worse case O(n) lookup to determine if any entries are set. To avoid that this overhead but keeping the check this adds an atomic bool that is used to track whether we've substituted any blocks. If this is not set to true we can just exit early since there are no substitutions to make.

…rallel-pass

Previously there was a mismatch between the scoring of synthesis results and the peephole pass's comparison with the original block. The pass is documented as using the tuple (num_2q_gates, error, num_gates) and picking the min of all the choices. But, when we called the unitary synthesis function that selects the best synthesis outcome it was maximizing the estimated fidelity but not considering the gate counts like the pass is documented as doing. This corrects this mismatch by updating the function doing the synthesis to be generic on score type and taking a scorer callback. This lets the peephole pass control the heuristic used for selecting the best score.

This commit adds the new TwoQubitPeepholeOptimization pass to the C API. It adds two new functions: `qk_transpiler_pass_standalone_two_qubit_peephole_optimization` and `qk_transpiler_pass_two_qubit_peephole_optimization` for running the pass on a `QkCircuit` and a `QkDag` respectively.

qiskit-bot · 2026-04-25T14:54:59Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core
@levbishop

eliarbel

Thanks Matthew, this seems straightforward.
I've left a few minor inline comments.

eliarbel · 2026-04-27T12:42:25Z

+    // Create a target with cx connectivity in a line.
+    QkExitCode result_x = qk_target_add_instruction(target, qk_target_entry_new(QkGate_X));
+    if (result_x != QkExitCode_Success) {
+        printf("Unexpected error occurred when adding a global X gate.");


FWIW, I know Jake likes the error messages to end with \n. This applies to all the printf calls here.

eliarbel · 2026-04-27T12:49:41Z

+/// This transpiler pass is designed to perform two qubit unitary peephole
+/// optimization. This pass finds all the 2 qubit blocks in the circuit,
+/// computes the unitary of that block, and then synthesizes that unitary.
+/// If the synthesized two qubit unitary is "better" than the original
+/// subcircuit that subcircuit is used to replace the original. The heuristic
+/// used to determine if it's better first looks at the two qubit gate count
+/// in the circuit, and prefers the synthesis with fewer two qubit gates, if
+/// the two qubit gate counts are the same then it looks at the estimated
+/// fidelity of the circuit and picks the subcircuit with higher estimated
+/// fidelity, and finally if needed it picks the subcircuit with the fewest
+/// total gates.
+///
+/// In case the target is overcomplete the pass will try all the
+/// decomposers supported for all the gates supported on a given qubit.
+/// The decomposition that has the best expected performance using the above
+/// heuristic will be selected and used to replace the block.
+///
+/// This pass is designed to be run on a physical circuit and the details of
+/// operations on a given qubit is assumed to be the hardware qubit from the
+/// target. However, the output of the pass might not use hardware operations,
+/// specifically single qubit gates might be emitted outside the target's supported
+/// operations, typically only if a parameterized gate supported by the
+/// :class:`.TwoQubitControlledUDecomposer` is used for synthesis. As such if running
+/// this pass in a physical optimization stage (such as :ref:`transpiler-preset-stage-optimization`)
+/// this should be paired with passes such as :class:`.BasisTranslator` and/or
+/// :class:`.Optimize1qGatesDecomposition` to ensure that these errant single qubit
+/// gates are replaced with hardware supported operations prior to exiting the stage.
+///
+/// This pass is multithreaded, and will perform the analysis in parallel
+/// and use all the cores available on your local system. You can refer to
+/// the `configuration guide <https://docs.quantum.ibm.com/guides/configure-qiskit-local>`__
+/// for details on how to control the threading behavior for Qiskit more broadly
+/// which will also control this pass


In #15614 we went with the approach of having single source of truth for the general description of a given pass. We used the dag-based passes as such and had the standalone ones point to the dag-based functions for more detail. Should we follow this here as well?

eliarbel · 2026-04-27T12:51:08Z

+///
+/// Behavior is undefined if ``circuit`` or ``target`` is not a valid, non-null pointer to a ``QkCircuit`` and ``QkTarget``.
+#[unsafe(no_mangle)]
+pub unsafe extern "C" fn qk_transpiler_pass_standalone_two_qubit_peephole_optimization(


A suggestion, for your discretion: replace _two_qubit_ with _2q_ to make the function names shorter.

eliarbel · 2026-04-27T12:57:11Z

+///     }
+///     QkDag *dag = qk_circuit_to_dag(qc);
+///     qk_transpiler_pass_two_qubit_peephole_optimization(dag, target, 1.0);
+/// ```


I think it would be nice to add the qk_*_free functions here for completeness.

eliarbel · 2026-04-27T12:57:49Z

+/// target. However, the output of the pass might not use hardware operations,
+/// specifically single qubit gates might be emitted outside the target's supported
+/// operations, typically only if a parameterized gate supported by the
+/// :class:`.TwoQubitControlledUDecomposer` is used for synthesis. As such if running


The Python-side Sphinx crossrefs don't render nicely on the C API pages. Should we just use backticks?

eliarbel · 2026-04-27T13:00:30Z

+///         if (i % 2) {
+///             qk_circuit_gate(qc, QkGate_CX, forward, NULL);
+///         } else {
+///             qk_circuit_gate(QkGate_CX, reverse, NULL);


Suggested change

/// qk_circuit_gate(QkGate_CX, reverse, NULL);

/// qk_circuit_gate(qc, QkGate_CX, reverse, NULL);

eliarbel · 2026-04-27T13:01:44Z

+/// # Example
+///
+/// ```c
+///     QkTarget *target = qk_target_new(2);


Nitpicking: we don't need to indent code examples anymore

eliarbel · 2026-04-27T13:03:25Z

+    } else {
+        Some(approximation_degree)
+    };
+    let out_dag = match two_qubit_unitary_peephole_optimize(&dag, target.into(), approximation) {


Nitpicking: I think we don't need the .into() here, do we?

eliarbel · 2026-04-27T13:05:38Z

+        };
+        let mut gate_names = qc.count_ops().keys().copied().collect::<Vec<_>>();
+        gate_names.sort();
+        assert_eq!(gate_names, vec!["cx", "u"]);


Is there any particular reason why we test this here rather than in `test_two_qubit_peephole.c'? This doesn't seem like something we can't cover in the C API testing.

eliarbel · 2026-04-27T13:09:59Z

+            qk_circuit_gate(qc, QkGate_CX, reverse, NULL);
+        }
+    }
+    qk_transpiler_pass_standalone_two_qubit_peephole_optimization(qc, target, 1.0);


It would be nice to cover the NAN branch of approximation_degree as well (this also applies to the dag-based function).

ShellyGarion

Looks good. I have some minor comments and questions on the docs and tests.

ShellyGarion · 2026-04-30T11:56:28Z

+
+/// @ingroup QkTranspilerPassesStandalone
+/// Run the TwoQubitPeepholeOptimization transpiler pass.
+///


we must have the same documentation twice for both functions? the standalone and the other one?
it's quite lengthy...

ShellyGarion · 2026-04-30T11:58:20Z

+        };
+        let operation = PackedOperation::from_unitary(Box::new(gate));
+        qc.push_packed_operation(operation, None, &[Qubit(0), Qubit(2)], &[])
+            .unwrap();


perhaps also add to the circuit some parametrized gate as well?

ShellyGarion · 2026-04-30T11:59:07Z

+            .unwrap();
+        target
+            .add_instruction(
+                PackedOperation::from_standard_gate(StandardGate::CX),


perhaps also test also a parametrized 2-qubit gate in the target?

ShellyGarion · 2026-04-30T12:02:02Z

+        return RuntimeError;
+    }
+
+    QkTargetEntry *cx_entry = qk_target_entry_new(QkGate_CX);


perhaps it's also worth to check RZZ here? not only CX?

mtreinish added 30 commits November 14, 2024 07:34

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

decee9a

…rallel-pass

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

4d4df68

…rallel-pass

Embed 2q gate count into score as tie breaker

cb6b70f

Also don't run scoring more than needed.

Release GIL during parallel portion

f06a070

Merge branch 'main' into two-qubit-peephole-parallel-pass

90b16e8

Fix lint

a175ee8

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

af0c144

…rallel-pass

Update ControlledUDecomposer to ensure we only run if the gate is con…

79a46c5

…tinuous

Add reversed synthesis for two qubit basis decomposer

839b4c9

Fix handling of single direction gates

d9399a6

Fix import cycle

b4c4360

Merge branch 'main' into two-qubit-peephole-parallel-pass

aefdc90

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

746b953

…rallel-pass

Add docstring to new pass

55b05c0

Add release note

7756d1e

Run serially in multiprocessing context

6a01332

Fix cache build

d83562b

Add tests

5f73b93

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

5e6c4cb

…rallel-pass

Rebase updates

8c7e67c

Adjust tests

f41e855

Clean-up test lint

3cdee4f

Fix oversight in test code

89931a4

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

19368bb

…rallel-pass

mtreinish added 21 commits March 18, 2026 12:35

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

b88de4a

…rallel-pass

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

ce3dddf

…rallel-pass

Return a user error from a decomposer build with an incorrect string …

158725f

…instead of panic

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

a8a3fe7

…rallel-pass

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

8c58c3a

…rallel-pass

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

9356125

…rallel-pass

Don't use a hashset for tracking processed runs

9e3a651

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

500a964

…rallel-pass

Remove unnecessary empty check and use Mutex::into_inner() when no mo…

a8dce96

…re locking is needed

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

469c06b

…rallel-pass

Add release note details to the pass docstring

b9ce054

Update test module docstring

12b14ee

Update pass module docstring

f80db21

Fix typo in code comment

8c03af3

Add docstrings for the new UnitarySynthesis helper methods

819110d

Expand 2q basis fore new tests

56ea5aa

mtreinish added this to the 2.5.0 milestone Apr 25, 2026

mtreinish requested a review from a team as a code owner April 25, 2026 14:54

mtreinish added Changelog: Added Add an "Added" entry in the GitHub Release changelog. C API Related to the C API labels Apr 25, 2026

github-project-automation Bot added this to Qiskit 2.5 Apr 25, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.5 Apr 25, 2026

eliarbel reviewed Apr 27, 2026

View reviewed changes

ShellyGarion reviewed Apr 30, 2026

View reviewed changes

	/// qk_circuit_gate(QkGate_CX, reverse, NULL);
	/// qk_circuit_gate(qc, QkGate_CX, reverse, NULL);

Conversation

mtreinish commented Apr 25, 2026

AI/LLM disclosure

Uh oh!

qiskit-bot commented Apr 25, 2026

Uh oh!

eliarbel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eliarbel Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ShellyGarion left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

eliarbel Apr 27, 2026 •

edited

Loading