Skip to content

[numba.md] Update np.random → Generator API#550

Open
Chihiro2000GitHub wants to merge 2 commits into
mainfrom
update-rng-numba
Open

[numba.md] Update np.random → Generator API#550
Chihiro2000GitHub wants to merge 2 commits into
mainfrom
update-rng-numba

Conversation

@Chihiro2000GitHub
Copy link
Copy Markdown
Collaborator

Summary

This PR migrates legacy NumPy random API usage in numba.md as part of QuantEcon/meta#299.

Details

The Numba/JIT-related classification and changes in this PR follow the guidance at https://manual.quantecon.org/styleguide/code.html#numpy-random-number-generation. I would be grateful if you could refer to the Numba section of this page when reviewing.

Case A (update(), main text, @jit): Left unchanged. Although update() is @jit-decorated, it is called from a prange loop in compute_long_run_median_parallel, making it unsafe to pass a shared Generator. Flagged for reviewer judgment.

Case B (speed_ex1 solution, @jit / no parallel): rng = np.random.default_rng() placed before the @jit definition; signature changed to calculate_pi(rng, n=...); np.random.uniformrng.uniform. Call sites updated. Note: passing a Generator into a @jit function may require Numba to use object mode. I checked that the updated code runs as expected on my side, but I would appreciate reviewer confirmation.

Case C (speed_ex2 solution, plain Python / later JIT-compiled via jit(compute_series)): Same pattern as Case B. Signature changed to compute_series(n, rng); np.random.uniformrng.uniform. Both the pure Python and jit-compiled call sites updated.

Case D (numba_ex3 solution, @jit(parallel=True) + prange): Draws lifted outside the prange loop. rng, u_draws, and v_draws defined before the @jit(parallel=True) definition; signature changed to calculate_pi(u_draws, v_draws); loop length derived from len(u_draws). n = 1_000_000 kept to match the original example size.

Case E (numba_ex4 solution, @jit(parallel=True) + prange): Legacy np.random.randn() calls kept intentionally as a narrow memory-constrained exception. Pre-allocating (M, n) shock arrays with M = 10_000_000 and n = 20 would require approximately 3.2 GB, which is likely beyond what most readers' machines can comfortably accommodate. A short comment has been added inside the loop explaining this.

Hi @mmcky and @HumphreyYang, I'd be grateful if you could take a look when you have time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 11, 2026

@github-actions github-actions Bot temporarily deployed to pull request May 11, 2026 22:51 Inactive
Copy link
Copy Markdown
Member

@HumphreyYang HumphreyYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks @Chihiro2000GitHub! It looks good to me. Just one small comment!

Comment thread lectures/numba.md
Comment on lines +643 to 656
n = 1_000_000
rng = np.random.default_rng()
u_draws = rng.uniform(size=n)
v_draws = rng.uniform(size=n)

@jit(parallel=True)
def calculate_pi(n=1_000_000):
def calculate_pi(u_draws, v_draws):
n = len(u_draws)
count = 0
for i in prange(n):
u, v = np.random.uniform(0, 1), np.random.uniform(0, 1)
u, v = u_draws[i], v_draws[i]
d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
if d < 0.5:
count += 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this avoids the thread-safety issue, but the timing comparison is no longer quite apples-to-apples because the random draws have been moved outside the timed function.

Personally, I think we can generate the draws in the first solution by moving

n = 1_000_000
rng = np.random.default_rng()
u_draws = rng.uniform(size=n)
v_draws = rng.uniform(size=n)

to line 457, and modifying the function in the code block starting at line 457 (solution to exercise 1) to take u_draws and v_draws, like the function here.

In this case, we would have a fair timing comparison.

Please let me know what you think!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mmcky, I think this is an example where we need to make a tough choice between keeping the legacy API for simplicity and using the new rng API with some trade-offs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @HumphreyYang I agree with you. I have added some additional notes below regarding double checking nopython mode as well.

@mmcky
Copy link
Copy Markdown
Contributor

mmcky commented May 14, 2026

Thanks @Chihiro2000GitHub for the careful classification and write-up. As @HumphreyYang says most of the PR looks good. A few things to resolve before merging:

1. Generator passed into @jit (Case B, speed_ex1)

I'd like to confirm that this stays in nopython mode before we merge. Numba's support for numpy.random.Generator objects is limited and version-dependent — "runs" isn't the same as "stays in nopython mode," and if it silently falls back, the lecture's claim immediately below the code cell breaks:

If we switch off JIT compilation by removing @jit, the code takes around 150 times as long on our machine.

Could you re-run with @jit(nopython=True) (which raises rather than falls back) and report the with-JIT vs. without-JIT timing? If the 150x figure no longer holds, we should change approach.

2. Apples-to-apples timing (Case D vs Case B) — agree with @HumphreyYang

Humphrey's point on numba_ex3 is right: lifting the draws out of the timed function in Case D but leaving them inside in Case B makes the parallel-vs-serial comparison unfair. I'd suggest applying the "lift draws outside" pattern to speed_ex1 as well.

Two benefits:

  • The timing comparison across the two exercises becomes consistent.
  • It sidesteps issue 1 entirely — the JITed function only sees plain arrays, no Generator object crossing the JIT boundary.

This also gives readers a clean, consistent rule: JIT-compiled hot loops consume pre-drawn arrays.

3. Same concern in Case C (speed_ex2)

compute_series_numba = jit(compute_series) JIT-compiles a function that calls rng.uniform(...) internally. Same nopython-mode question as issue 1. Since rng.uniform(0, 1, size=n) is called once to produce the array U, the cleanest fix is symmetric with issue 2: draw U outside and pass it in as an argument. That keeps the JITed body free of Generator objects.

4. Minor — comment placement in Case E

The comment inside the inner loop explaining why draws are kept inline would read more naturally as a one-liner above the for t in range(n): loop, or in the prose just before the code cell. Not blocking.


Summary of requested changes: confirm nopython mode for Cases B and C (or refactor them to take pre-drawn arrays, which I'd prefer), and adopt the same pre-drawn-array pattern for speed_ex1 so the timing comparison with numba_ex3 is fair. Cases A and E are good as-is.

Thanks again!

Also this is definitely the hardest one I think :-) Thanks for iterating on this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Chihiro2000GitHub
Copy link
Copy Markdown
Collaborator Author

Chihiro2000GitHub commented May 18, 2026

Thanks for the detailed review, @mmcky and @HumphreyYang! Here is a summary of the changes made and a timing verification.

Changes to numba.md

  • Case B (speed_ex1) — Issues 1 & 2: Pre-draw u_draws and v_draws outside the @jit function and pass them as arguments, matching the pattern already used in numba_ex3. The rng.Generator object no longer crosses the JIT boundary, and the timing comparison with numba_ex3 is now apples-to-apples.
  • Case C (speed_ex2) — Issue 3: Draw U outside compute_series and pass it as an argument, keeping the JIT-compiled body free of Generator objects.
  • Case E (numba_ex4) — Issue 4: Moved the inline comment about pre-allocation trade-offs to the prose just before the code cell.

Timing verification (@jit(nopython=True))

I chose the refactor approach, but also ran @jit(nopython=True) as requested in Issue 1 to confirm nopython mode and check the timing:

Click to expand
"""
Temporary test: verify refactored Cases B and C in numba.md.
- Case B: @jit(nopython=True) timing, old API vs new API
- Case C: @jit(nopython=True) timing, new API only
"""
import numpy as np
import quantecon as qe
from numba import jit

# ---------------------------------------------------------------------------
# Case B: speed_ex1  (pi approximation)
# ---------------------------------------------------------------------------
print("=== Case B: calculate_pi ===")

# --- Old API ---
@jit(nopython=True)
def calculate_pi_old_jit(n=1_000_000):
    count = 0
    for i in range(n):
        u, v = np.random.uniform(0, 1), np.random.uniform(0, 1)
        d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
        if d < 0.5:
            count += 1
    area_estimate = count / n
    return area_estimate * 4

def calculate_pi_old_pure(n=1_000_000):
    count = 0
    for i in range(n):
        u, v = np.random.uniform(0, 1), np.random.uniform(0, 1)
        d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
        if d < 0.5:
            count += 1
    area_estimate = count / n
    return area_estimate * 4

print("  [Old API] Warm-up:")
with qe.Timer():
    calculate_pi_old_jit()

print("  [Old API] Timed @jit(nopython=True):")
with qe.Timer() as t_old_jit:
    calculate_pi_old_jit()

print("  [Old API] Timed pure Python:")
with qe.Timer() as t_old_pure:
    calculate_pi_old_pure()

print(f"  Old API speedup: {t_old_pure.elapsed / t_old_jit.elapsed:.1f}x")

# --- New API ---
n = 1_000_000
rng = np.random.default_rng()
u_draws = rng.uniform(size=n)
v_draws = rng.uniform(size=n)

@jit(nopython=True)
def calculate_pi_new_jit(u_draws, v_draws):
    n = len(u_draws)
    count = 0
    for i in range(n):
        u, v = u_draws[i], v_draws[i]
        d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
        if d < 0.5:
            count += 1
    area_estimate = count / n
    return area_estimate * 4

def calculate_pi_new_pure(u_draws, v_draws):
    n = len(u_draws)
    count = 0
    for i in range(n):
        u, v = u_draws[i], v_draws[i]
        d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
        if d < 0.5:
            count += 1
    area_estimate = count / n
    return area_estimate * 4

print("  [New API] Warm-up:")
with qe.Timer():
    calculate_pi_new_jit(u_draws, v_draws)

print("  [New API] Timed @jit(nopython=True):")
with qe.Timer() as t_new_jit:
    calculate_pi_new_jit(u_draws, v_draws)

print("  [New API] Timed pure Python:")
with qe.Timer() as t_new_pure:
    calculate_pi_new_pure(u_draws, v_draws)

print(f"  New API speedup: {t_new_pure.elapsed / t_new_jit.elapsed:.1f}x")

# ---------------------------------------------------------------------------
# Case C: speed_ex2  (Markov chain simulation)
# ---------------------------------------------------------------------------
print("\n=== Case C: compute_series ===")

p, q = 0.1, 0.2
n = 1_000_000
rng = np.random.default_rng()
U = rng.uniform(0, 1, size=n)

def compute_series(n, U):
    x = np.empty(n, dtype=np.int64)
    x[0] = 1
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return x

x = compute_series(n, U)
print(np.mean(x == 0))

with qe.Timer() as t_c_pure:
    compute_series(n, U)

print(f"  Pure Python time: {t_c_pure.elapsed:.4f}s")

compute_series_numba = jit(compute_series, nopython=True)

x = compute_series_numba(n, U)
print(np.mean(x == 0))

with qe.Timer() as t_c_jit:
    compute_series_numba(n, U)

print(f"  JIT time:         {t_c_jit.elapsed:.4f}s")

Results on my machine (speed_ex1 / Case B):

Speedup
Old API (np.random.uniform inside loop), @jit(nopython=True) vs pure ~150x
New API (pre-drawn arrays), @jit(nopython=True) vs pure ~400–700x (variable across runs)

Both Cases B and C compiled successfully under @jit(nopython=True) — no object-mode fallback.

Question about "~150 times"

The old API gives a speedup consistent with the existing "around 150 times" claim. However, after the refactor (pre-drawn arrays), the speedup is considerably larger, so the current text no longer holds. Would you like me to update that line? Options I see:

  1. Replace with "significantly faster"
  2. Replace with "hundreds of times faster"
  3. Something else you prefer

You can also reproduce these results by running the code above.

One more question: numba_ex3 timing claim

The solution to numba_ex3 states: "On our workstation, we find that parallelization increases execution speed by a factor of 2 or 3." Since this PR also modified the numba_ex3 solution to use pre-drawn arrays, should we re-verify this figure under the new code as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants