Skip to content

Replace specific loops with Boolean masking and vectorized assignments#8045

Open
mhucka wants to merge 4 commits intoquantumlib:mainfrom
mhucka:use-np-eye
Open

Replace specific loops with Boolean masking and vectorized assignments#8045
mhucka wants to merge 4 commits intoquantumlib:mainfrom
mhucka:use-np-eye

Conversation

@mhucka
Copy link
Copy Markdown
Contributor

@mhucka mhucka commented Apr 17, 2026

This amounts to a simple bit of micro-optimization. With the help of Gemini, I looked for cases where a specific type of matrix assignment was implemented in a for loop that could be rewritten to use a Boolean mask and vectorized assignment. There were a few cases, and that's what this PR is about.

This amounts to a simple bit of micro-optimization. With the help of
Gemini, I looked for cases where some simple loops could be rewriten to
use NumPy's `eye()` operator if appropriate. There were a few cases, and
that's what this PR is about.
@mhucka mhucka requested review from a team and vtomole as code owners April 17, 2026 04:12
@mhucka mhucka requested a review from tanujkhattar April 17, 2026 04:12
@github-actions github-actions Bot added the size: S 10< lines changed <50 label Apr 17, 2026
@mhucka mhucka changed the title Use NumPy eye() for some simple optimizations Replace some simple loops with NumPy eye() if possible Apr 17, 2026
@mhucka mhucka marked this pull request as draft April 17, 2026 04:12
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.63%. Comparing base (89993a9) to head (47c7651).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8045      +/-   ##
==========================================
- Coverage   99.63%   99.63%   -0.01%     
==========================================
  Files        1110     1110              
  Lines       99795    99787       -8     
==========================================
- Hits        99435    99425      -10     
- Misses        360      362       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mhucka mhucka changed the title Replace some simple loops with NumPy eye() if possible Replace specific loops with Boolean masking and vectorized assignments Apr 21, 2026
@mhucka mhucka marked this pull request as ready for review April 21, 2026 03:05
@mhucka mhucka requested a review from NoureldinYosri April 21, 2026 03:05
matrix = np.copy(matrix)
for i in range(min(matrix.shape)):
matrix[i, i] = 0
matrix[np.eye(*matrix.shape, dtype=np.bool)] = 0
Copy link
Copy Markdown
Collaborator

@pavoljuhas pavoljuhas Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes assignment cost from O(N) to O(N^2) and allocates an extra N^2 sized temporary matrix. Note that matrix gets copied one more time in all_near_zero.

Here is a version that does away with one matrix copy w/r to the initial code:

    absolute = np.abs(matrix)
    np.fill_diagonal(absolute, 0)
    return tolerance.near_zero(np.max(absolute, initial=0), atol=atol)

And here are timings for original, PR, and suggestion:

In [1]: a = np.eye(2048, dtype=float)
In [2]: %timeit is_diagonal(a)
17.9 ms ± 113 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [3]: %timeit is_diagonal1(a)
18.4 ms ± 21.8 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [4]: %timeit is_diagonal2(a)
8.9 ms ± 38.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

for i in range(width):
result[(i + shift) % width, i] = 1
return result
return np.roll(np.eye(width), shift, axis=0)
Copy link
Copy Markdown
Collaborator

@pavoljuhas pavoljuhas Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM this is a test helper function so its performance does not matter much.


This has to copy N^2 values (compared to N assignments before) and allocate 2 arrays instead of 1. It is about 6 times slower than the original. Here is a better way of vectorizing Python loop:

Suggested change
return np.roll(np.eye(width), shift, axis=0)
result = np.zeros(width * width)
shift = shift % width if width else 0
# lower diagonal starting at the first column
result[shift * width :: width + 1].fill(1)
# upper diagonal starting at the first row
if shift:
result[width - shift : shift * width : width + 1].fill(1)
return result.reshape((width, width))

Again, timings for the original, PR, and suggestion:

%timeit shift_matrix(8, 3)

1.46 μs ± 66.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
8.13 μs ± 305 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.1 μs ± 26.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit shift_matrix(1024, 3)

475 μs ± 4.82 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
3.69 ms ± 8.34 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
298 μs ± 1.85 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@pavoljuhas pavoljuhas self-assigned this Apr 24, 2026
Comment on lines +105 to +106
N = 2**self._num_qubits
return self._state[None, :, None] * np.eye(N, dtype=self._state.dtype)[:, None, :]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates an extra temporary matrix of N^2 elements which gets multiplied N times. The overall complexity is increased from O(N^2) to O(N^3) with over 50% slowdown.

Here is a version which is comparable and in some cases faster than the original - depending on num_qubits:

        N = 2**self._num_qubits
        operator = np.zeros(shape=(N, N, N), dtype=self._state.dtype)
        idx = np.arange(N)
        operator[idx, :, idx] = self._state
        return operator

Here are timings for original, PR, suggestion:

# num_qubits = 2, N = 4
2.17 μs ± 67.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.51 μs ± 96.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.29 μs ± 43.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# num_qubits = 4, N = 16
6.95 μs ± 189 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
10.8 μs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.48 μs ± 206 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

new_xs = np.zeros((2 * self.n + 1, self.n), dtype=bool)
for i in range(self.n):
new_xs[i, i] = True
new_xs[: self.n, : self.n] = np.eye(self.n, dtype=bool)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increases number of assignments from N to N^2, but runs still faster for a bit array. That said, np.fill_diagonal does the job with O(N) complexity

        np.fill_diagonal(new_xs[: self.n, :], True)

Timing for 20 qubits for original, PR, suggestion:

4.01 μs ± 98.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.5 μs ± 29 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.57 μs ± 7.33 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

new_zs = np.zeros((2 * self.n + 1, self.n), dtype=bool)
for i in range(self.n):
new_zs[self.n + i, i] = True
new_zs[self.n : 2 * self.n, : self.n] = np.eye(self.n, dtype=bool)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new_zs[self.n : 2 * self.n, : self.n] = np.eye(self.n, dtype=bool)
np.fill_diagonal(new_zs[self.n : 2 * self.n, :], True)

Copy link
Copy Markdown
Collaborator

@pavoljuhas pavoljuhas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLMs seems very capable of making suggestions that look good at a first sight, but in reality make things worse. We should be careful about quantifying any supposed code optimizations and only accept them if they are truly change for better.

Before doing such timings, we should also consider if the modified code is speed critical. If not the changes are probably not worth the effort, unless they make the code clearly easier to read which is again not much of a strength of LLMs.

@pavoljuhas pavoljuhas added the ci/no-release Use this label for pull request that should not have Cirq pre-release on PyPI. label Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/performance ci/no-release Use this label for pull request that should not have Cirq pre-release on PyPI. size: S 10< lines changed <50

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants