Skip to content

Add C API for two qubit unitary peephole optimization pass#16088

Open
mtreinish wants to merge 83 commits intoQiskit:mainfrom
mtreinish:peephole-c
Open

Add C API for two qubit unitary peephole optimization pass#16088
mtreinish wants to merge 83 commits intoQiskit:mainfrom
mtreinish:peephole-c

Conversation

@mtreinish
Copy link
Copy Markdown
Member

This commit adds the new TwoQubitPeepholeOptimization pass to the C API.
It adds two new functions:
qk_transpiler_pass_standalone_two_qubit_peephole_optimization and
qk_transpiler_pass_two_qubit_peephole_optimization for running the
pass on a QkCircuit and a QkDag respectively.

This is based on top of #13419 and will need to be rebased after that PR merges. In the meantime you can view the contents of just this PR by looking at the HEAD commit: d91d1da

AI/LLM disclosure

  • I didn't use LLM tooling, or only used it privately.
  • I used the following tool to help write this PR description:
  • I used the following tool to generate or modify code:

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way
where it analyzes the dag to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it inplace.
The distinction this pass makes though is it does this all in a single
pass and also parallelizes the matrix calculation and synthesis steps
because there is no data dependency there.

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substituion. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the weyl chamber coordinates
was less than the number of 2q gates in the block (see Qiskit#11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers

Right now this commit has a number of limitations, the largest are:

- Only supports the target
- It doesn't support any synthesizers besides the TwoQubitBasisDecomposer,
  because it's the only one in rust currently.

For plugin handling I left the logic as running the three pass series,
but I'm not sure this is the behavior we want. We could say keep the
synthesis plugins for `UnitarySynthesis` only and then rely on our
built-in methods for physical optimiztion only. But this also seems less
than ideal because the plugin mechanism is how we support synthesizing
to custom basis gates, and also more advanced approximate synthesis
methods. Both of those are things we need to do as part of the synthesis
here.

Additionally, this is currently missing tests and documentation and while
running it manually "works" as in it returns a circuit that looks valid,
I've not done any validation yet. This also likely will need several
rounds of performance optimization and tuning. t this point this is
just a rough proof of concept and will need a lof refinement along with
larger changes to Qiskit's rust code before this is ready to merge.

Fixes Qiskit#12007
Fixes Qiskit#11659
Since Qiskit#13139 merged we have another two qubit decomposer available to
run in rust, the TwoQubitControlledUDecomposer. This commit updates the
new TwoQubitPeepholeOptimization to call this decomposer if the target
supports appropriate 2q gates.
Clippy is correctly warning that the size difference between the two
decomposer types in the TwoQubitDecomposer enumese two types is large.
TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer
is only 24 bytes. This means each element of ControlledU is wasting
> 1600 bytes. However, in this case that is acceptable in order to
avoid a layer of pointer indirection as these are stored temporarily
in a vec inside a thread to decompose a unitary. A trait would be more
natural for this to define a common interface between all the two qubit
decomposers but since we keep them instantiated for each edge in a Vec
they need to be sized and doing something like
`Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer`
instead of a enum) to get around this would have additional runtime
overhead. This is also considering that TwoQubitControlledUDecomposer
has far less likelihood in practice as it only works with some targets
that have RZZ, RXX, RYY, or RZX gates on an edge which is less common.
Also don't run scoring more than needed.
The priority for the two qubit peephole pass should be decreasing the 2q
gate count. The error rate heuristic should only matter if the 2q counts
are the same. This commit flips the heuristic to first check the 2q gate
count so the first priority is reducing the 2q gate count.
This commit removes the unitary synthesis plugin mechanism from the
pass. This was a layer violation to support this when the pass logic
doesn't actually support using the plugin interface. It is easier and
more clear that if the plugin interface usage is desired to handle that
in the pass manager construction rather than have this pass internally
build a pass manager and execute other passes to emulate behavior it
doesn't have.
There were two issues identified by the testing which required fixing
and adjusting the tests based on limitations in the pass. The first
issue was the parameters for the target gate was not handled correctly.
In the case of using the Controlled U decomposer we were not passing the
computed parameter value correctly to the output circuit and instead the
ParameterExpression from the target was being used. Then in the case of
controlled gates (not supercontrolled) that had a fixed angle that are
normally intended for the xx decomposer were incorrectly being passed to
the TwoQubitBasisDecomposer which can't work with them. This was
resulting in invalid circuit outputs. The use of the
TwoQubitBasisDecomposer is now correctly filtering to only be run with
supercontrolled gates. The tests were adjusted for this limitation
because they were mostly copied from the UnitarySynthesis tests which
supports xx decomposer.
This commit fixes an edge case in the scoring when there was only a
single decomposition available. In UnitarySynthesis for these cases
the fidelity calculation is not run as part of the synthesis because there
is no reason to compute the estimated score when it's not being compared
to anything. However, for the new peephole pass we need to run the
scoring because the estimated fidelity is used to compare against the
original circuit's block to determine whether to replace it or not.
Previously, the new peephole pass treated the lack of a score as it
being ideal, which was incorrect and led to the pass replacing every
block unless the original circuit's block had more 2q gates. This fixes
this and now runs the scoring manually if there is not one returned by
the synthesis.
This commit makes a couple of small adjustments to the pass
implementation to improve the runtime performance. The first is it
switches the topological sort function used for the final iteration to
rebuild the dag. The lexicographical toplogical sort this was using
before had extra overhead which wasn't necessary for what we were doing.
The output from a slightly different ordering doesn't change the
structure of the dag so using this new function will be slightly faster.
The other key change is moving the node mapping away from dashmap to use
a `Mutex<Vec<usize>>`. This will require a slightly larger object in
memory but it is much faster to access and write to because there is no
hashing or other indirection once the lock is acquired by a thread.
We don't want to spend time reconstructing an exact copy of the dag if
there are no substituions needed. Prior to using a vec for tracking the
run indices that nodes are part of we would check if that map was empty.
The vec is always populated and to determine if there are no entries
we'd have to do a worse case O(n) lookup to determine if any entries are
set. To avoid that this overhead but keeping the check this adds an atomic
bool that is used to track whether we've substituted any blocks. If this
is not set to true we can just exit early since there are no
substitutions to make.
Previously there was a mismatch between the scoring of synthesis
results and the peephole pass's comparison with the original block. The
pass is documented as using the tuple (num_2q_gates, error, num_gates)
and picking the min of all the choices. But, when we called the unitary
synthesis function that selects the best synthesis outcome it was
maximizing the estimated fidelity but not considering the gate counts
like the pass is documented as doing. This corrects this mismatch by
updating the function doing the synthesis to be generic on score type
and taking a scorer callback. This lets the peephole pass control the
heuristic used for selecting the best score.
This commit adds the new TwoQubitPeepholeOptimization pass to the C API.
It adds two new functions:
`qk_transpiler_pass_standalone_two_qubit_peephole_optimization` and
`qk_transpiler_pass_two_qubit_peephole_optimization` for running the
pass on a `QkCircuit` and a `QkDag` respectively.
@mtreinish mtreinish added this to the 2.5.0 milestone Apr 25, 2026
@mtreinish mtreinish requested a review from a team as a code owner April 25, 2026 14:54
@mtreinish mtreinish added Changelog: Added Add an "Added" entry in the GitHub Release changelog. C API Related to the C API labels Apr 25, 2026
@qiskit-bot
Copy link
Copy Markdown
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @levbishop

Copy link
Copy Markdown
Member

@eliarbel eliarbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matthew, this seems straightforward.
I've left a few minor inline comments.

// Create a target with cx connectivity in a line.
QkExitCode result_x = qk_target_add_instruction(target, qk_target_entry_new(QkGate_X));
if (result_x != QkExitCode_Success) {
printf("Unexpected error occurred when adding a global X gate.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I know Jake likes the error messages to end with \n. This applies to all the printf calls here.

Comment on lines +23 to +55
/// This transpiler pass is designed to perform two qubit unitary peephole
/// optimization. This pass finds all the 2 qubit blocks in the circuit,
/// computes the unitary of that block, and then synthesizes that unitary.
/// If the synthesized two qubit unitary is "better" than the original
/// subcircuit that subcircuit is used to replace the original. The heuristic
/// used to determine if it's better first looks at the two qubit gate count
/// in the circuit, and prefers the synthesis with fewer two qubit gates, if
/// the two qubit gate counts are the same then it looks at the estimated
/// fidelity of the circuit and picks the subcircuit with higher estimated
/// fidelity, and finally if needed it picks the subcircuit with the fewest
/// total gates.
///
/// In case the target is overcomplete the pass will try all the
/// decomposers supported for all the gates supported on a given qubit.
/// The decomposition that has the best expected performance using the above
/// heuristic will be selected and used to replace the block.
///
/// This pass is designed to be run on a physical circuit and the details of
/// operations on a given qubit is assumed to be the hardware qubit from the
/// target. However, the output of the pass might not use hardware operations,
/// specifically single qubit gates might be emitted outside the target's supported
/// operations, typically only if a parameterized gate supported by the
/// :class:`.TwoQubitControlledUDecomposer` is used for synthesis. As such if running
/// this pass in a physical optimization stage (such as :ref:`transpiler-preset-stage-optimization`)
/// this should be paired with passes such as :class:`.BasisTranslator` and/or
/// :class:`.Optimize1qGatesDecomposition` to ensure that these errant single qubit
/// gates are replaced with hardware supported operations prior to exiting the stage.
///
/// This pass is multithreaded, and will perform the analysis in parallel
/// and use all the cores available on your local system. You can refer to
/// the `configuration guide <https://docs.quantum.ibm.com/guides/configure-qiskit-local>`__
/// for details on how to control the threading behavior for Qiskit more broadly
/// which will also control this pass
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #15614 we went with the approach of having single source of truth for the general description of a given pass. We used the dag-based passes as such and had the standalone ones point to the dag-based functions for more detail. Should we follow this here as well?

///
/// Behavior is undefined if ``circuit`` or ``target`` is not a valid, non-null pointer to a ``QkCircuit`` and ``QkTarget``.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn qk_transpiler_pass_standalone_two_qubit_peephole_optimization(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion, for your discretion: replace _two_qubit_ with _2q_ to make the function names shorter.

/// }
/// QkDag *dag = qk_circuit_to_dag(qc);
/// qk_transpiler_pass_two_qubit_peephole_optimization(dag, target, 1.0);
/// ```
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to add the qk_*_free functions here for completeness.

/// target. However, the output of the pass might not use hardware operations,
/// specifically single qubit gates might be emitted outside the target's supported
/// operations, typically only if a parameterized gate supported by the
/// :class:`.TwoQubitControlledUDecomposer` is used for synthesis. As such if running
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python-side Sphinx crossrefs don't render nicely on the C API pages. Should we just use backticks?

/// if (i % 2) {
/// qk_circuit_gate(qc, QkGate_CX, forward, NULL);
/// } else {
/// qk_circuit_gate(QkGate_CX, reverse, NULL);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// qk_circuit_gate(QkGate_CX, reverse, NULL);
/// qk_circuit_gate(qc, QkGate_CX, reverse, NULL);

/// # Example
///
/// ```c
/// QkTarget *target = qk_target_new(2);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking: we don't need to indent code examples anymore

} else {
Some(approximation_degree)
};
let out_dag = match two_qubit_unitary_peephole_optimize(&dag, target.into(), approximation) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking: I think we don't need the .into() here, do we?

};
let mut gate_names = qc.count_ops().keys().copied().collect::<Vec<_>>();
gate_names.sort();
assert_eq!(gate_names, vec!["cx", "u"]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason why we test this here rather than in `test_two_qubit_peephole.c'? This doesn't seem like something we can't cover in the C API testing.

qk_circuit_gate(qc, QkGate_CX, reverse, NULL);
}
}
qk_transpiler_pass_standalone_two_qubit_peephole_optimization(qc, target, 1.0);
Copy link
Copy Markdown
Member

@eliarbel eliarbel Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to cover the NAN branch of approximation_degree as well (this also applies to the dag-based function).

Copy link
Copy Markdown
Member

@ShellyGarion ShellyGarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have some minor comments and questions on the docs and tests.


/// @ingroup QkTranspilerPassesStandalone
/// Run the TwoQubitPeepholeOptimization transpiler pass.
///
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must have the same documentation twice for both functions? the standalone and the other one?
it's quite lengthy...

};
let operation = PackedOperation::from_unitary(Box::new(gate));
qc.push_packed_operation(operation, None, &[Qubit(0), Qubit(2)], &[])
.unwrap();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps also add to the circuit some parametrized gate as well?

.unwrap();
target
.add_instruction(
PackedOperation::from_standard_gate(StandardGate::CX),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps also test also a parametrized 2-qubit gate in the target?

return RuntimeError;
}

QkTargetEntry *cx_entry = qk_target_entry_new(QkGate_CX);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps it's also worth to check RZZ here? not only CX?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C API Related to the C API Changelog: Added Add an "Added" entry in the GitHub Release changelog.

Projects

Status: Ready

Development

Successfully merging this pull request may close these issues.

5 participants