From 13177ec477d085df298cce880a4ece5c5491923d Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Tue, 19 May 2026 11:53:03 -0400
Subject: [PATCH 1/5] PR-C: PreTrendsPower R `pretrends` parity goldens
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the deferred R-parity row from PR-B (PreTrendsPower implementation
audit, Roth 2022). Generates JSON goldens at
`benchmarks/data/r_pretrends_golden.json` from the committed R script
against `jonathandroth/pretrends` commit `122731d082` (package version
0.1.0, R 4.5.2), and activates `TestPretrendsParityR` in
`tests/test_methodology_pretrends.py`.

Four fixtures (regular K=3, irregular K=3 `[-5,-3,-1]`, anticipation-shifted
K=4, K=1 closed form) × NIS power + γ_p MDV at `atol=1e-4`. K=1 also
asserts three-way cross-check against Roth Proposition 2's analytical
truncated-normal expression `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`
(Python/closed-form effectively exact; both within `atol=1e-4` of R).

End-to-end irregular-grid `fit()` parity test exercises the full
`fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis`
chain through the public API, locking PR-B Step 4's γ-unit linear-weight
fix.

Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python
uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054`; R `slope_for_power`
uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` vs Python
`brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and
`mvtnorm::pmvnorm` vs `scipy.stats.multivariate_normal.cdf` Genz-Bretz
randomized-lattice differences bound the K=4 NIS power gap at ~5e-5.

Cross-surface tracker promotion:
- `METHODOLOGY_REVIEW.md` PreTrendsPower row: `**Complete** (R parity
  pending)` → `**Complete**`; Last Review `2026-05-18` → `2026-05-19`.
- REGISTRY.md Requirements checklist: `[x] R \`pretrends\` parity at
  commit \`122731d082\` (PR-C, 2026-05-19)`.
- `roth-2022-review.md` "R `pretrends` package version pin (provisional)"
  Gaps bullet struck.
- `TODO.md` PR-C row deleted.
- CHANGELOG.md `[Unreleased]` Added entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                |   1 +
 METHODOLOGY_REVIEW.md                       |  13 +-
 TODO.md                                     |   4 +-
 benchmarks/R/generate_pretrends_golden.R    | 168 +++++++-------
 benchmarks/data/r_pretrends_golden.json     | 106 +++++++++
 docs/methodology/REGISTRY.md                |   4 +-
 docs/methodology/papers/roth-2022-review.md |   7 -
 tests/test_methodology_pretrends.py         | 236 +++++++++++++++++---
 8 files changed, 410 insertions(+), 129 deletions(-)
 create mode 100644 benchmarks/data/r_pretrends_golden.json

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7679075a..dae1a561 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [3.4.0] - 2026-05-19
 
 ### Added
+- **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
 - **Agent-discoverability contract test (`tests/test_agent_discoverability.py`).** New static-snapshot test pinning the agent-facing surface introduced by PR #464: `__all__` membership of `agent_workflow` / `profile_panel` / `get_llm_guide` / `practitioner_next_steps` / `BusinessReport`; `dir(diff_diff)` head-first ordering against `_AGENT_FACING_ORDER` (catches drift in the `_OrderedName` `__lt__` ordering trick); `_OrderedName` `isinstance(_, str)` + str-method compatibility; `dir()` full-namespace + `inspect.getmembers` parity; top-level `__doc__` first-paragraph mention of `agent_workflow` + named references to the 5-step workflow primitives; `agent_workflow()` script content references each downstream helper by name; canonical estimator class names (CallawaySantAnna, ContinuousDiD, HeterogeneousAdoptionDiD, etc.) remain importable. No live API calls; runs in the default pytest suite. Closes [issue #461](https://github.com/igerber/diff-diff/issues/461) (snapshot variant — live-agent regression test deferred to a separate follow-up that depends on causal-llm-eval packaging its harness). Also closes the `__dir__()` contract-test row from `TODO.md` that PR #464 deferred here.
 - **`diff_diff.agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)` — stateless orchestrator for LLM-agent discoverability** (`diff_diff/agent_workflow.py`). Prints (and returns as dict) a copy-pasteable 5-step workflow with the caller's column names templated in: `profile_panel` → `get_llm_guide("autonomous")` → `<Estimator>(...).fit(df, ...)` → `practitioner_next_steps(result)` → `BusinessReport(result).full_report()`. The function calls nothing internally and does not inspect `df`; it is a guided tour, not a router. Surfaces the canonical workflow primitives (`profile_panel`, `get_llm_guide`, `practitioner_next_steps`, `BusinessReport`) that cold-start agent dry-passes at [igerber/causal-llm-eval](https://github.com/igerber/causal-llm-eval) showed agents practically never reach for on their own. Output structure: `{"profile_call", "guide_call", "fit_candidates", "validation_calls", "reporting_call", "script"}`; `fit_candidates` is a flat list of estimator/diagnostic class names referenced in the workflow patterns (each must remain importable on `diff_diff`, locked by `tests/test_agent_workflow.py::test_fit_candidates_all_importable`). Closes [issue #460](https://github.com/igerber/diff-diff/issues/460).
 - **Top-level `__doc__` rewritten to lead with the agent workflow** (`diff_diff/__init__.py`). `help(diff_diff)` now opens with the `agent_workflow(df, ...)` recommendation as the first non-blank paragraph; `get_llm_guide("full")` and `get_llm_guide("practitioner")` pointers preserved for the existing `tests/test_guides.py::test_module_docstring_mentions_helper` guard.
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index e61ca04e..133f24ef 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -80,7 +80,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
 |------|--------|-------------|--------|-------------|
 | BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** | 2026-05-16 |
 | HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
-| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** (R parity pending) | 2026-05-18 |
+| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** | 2026-05-19 |
 | PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** | — |
 | PlaceboTests | `diagnostics.py` | (no canonical reference) | **In Progress** | — |
 
@@ -1047,14 +1047,15 @@ and covariate-adjusted specifications.)
 | Module | `pretrends.py` |
 | Primary Reference | Roth (2022), *Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends*, AER:I 4(3), 305-322 |
 | R Reference | `pretrends` package |
-| Status | **Complete** (R parity pending) |
-| Last Review | 2026-05-18 |
+| Status | **Complete** |
+| Last Review | 2026-05-19 |
 
 **Documentation in place:**
 - REGISTRY.md section: `## PreTrendsPower` — NIS-framed audit per Roth (2022) Section II.A-B with full equation blocks for both NIS and Wald forms; paper-supported alternative + γ-unit MDV + full-Σ_22 routing all locked.
 - Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17 via PR #463).
 - Implementation: `tests/test_pretrends.py` (67 tests — point-estimator, MDV, power curve, sensitivity, plus the PR-A R18 silent-failure regression and the PR-B custom-weight persistence regression) + event-study coverage in `tests/test_pretrends_event_study.py` (27 tests).
-- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes, 30-40 tests covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API).
+- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7; PR-C 2026-05-19 activated `TestPretrendsParityR` with 4 concrete tests) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API, R parity at commit `122731d082`).
+- R parity goldens: `benchmarks/data/r_pretrends_golden.json` generated by `benchmarks/R/generate_pretrends_golden.R` against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0); 4 fixtures (regular K=3, irregular K=3 `[-5,-3,-1]`, anticipation-shifted K=4, K=1 closed form) × NIS power + γ_p MDV at `atol=1e-4`.
 
 **Verified Components:**
 - [x] NIS box probability implemented via `scipy.stats.multivariate_normal.cdf` (Roth Section II.A-B primary form)
@@ -1067,9 +1068,7 @@ and covariate-adjusted specifications.)
 - [x] `PreTrendsPowerResults` persists fitted `violation_weights` + `pretest_form` + `nis_box_probability`; `power_at(M)` works for all four violation types on fresh fits
 - [x] Helper API (`compute_pretrends_power`, `compute_mdv`) accepts `violation_weights` and `pretest_form`; closes the PR-A R18 helper/class API gap
 - [x] Summary, `to_dict`, `to_dataframe` dispatch on `pretest_form` (NIS prints box probability; Wald prints noncentrality)
-
-**Outstanding for promotion to fully Complete:**
-- R parity fixture against the `pretrends` R package at a **pinned revision** (deferred to PR-C). The generator script `benchmarks/R/generate_pretrends_golden.R` is committed in PR-B with a placeholder commit reference; PR-C will install the package, generate the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate `TestPretrendsParityR` (currently skips when goldens missing), and record the audited R-package revision. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` Gaps section remain provisional.
+- [x] R `pretrends` parity at commit `122731d082` (PR-C, 2026-05-19) — 4 fixtures × NIS power + γ_p MDV at `atol=1e-4`; `tests/test_methodology_pretrends.py::TestPretrendsParityR` active
 
 ---
 
diff --git a/TODO.md b/TODO.md
index 83f35525..1a32b34d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -94,8 +94,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
-| PreTrendsPower R parity goldens (PR-C): pin the R `pretrends` package commit/release, run `benchmarks/R/generate_pretrends_golden.R` (committed in PR-B), commit the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate the `TestPretrendsParityR` class in `tests/test_methodology_pretrends.py` (currently skips when goldens missing), and flip the METHODOLOGY_REVIEW.md `PreTrendsPower` row from `**Complete** (R parity pending)` → `**Complete**`. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` remain provisional. | `benchmarks/R/generate_pretrends_golden.R`, `benchmarks/data/r_pretrends_golden.json` (new), `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-C (PreTrendsPower R parity) | Low |
-<!-- The remaining four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV) were all resolved in PR-B 2026-05-18 — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
+<!-- The PreTrendsPower R parity row (PR-C, 2026-05-19) and the four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV; resolved in PR-B 2026-05-18) are all closed — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
+
 
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
 | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
diff --git a/benchmarks/R/generate_pretrends_golden.R b/benchmarks/R/generate_pretrends_golden.R
index 78af170e..6721ab94 100644
--- a/benchmarks/R/generate_pretrends_golden.R
+++ b/benchmarks/R/generate_pretrends_golden.R
@@ -1,47 +1,48 @@
 #!/usr/bin/env Rscript
 # Generate R `pretrends` parity goldens for diff-diff PreTrendsPower (PR-C).
 #
-# This script is committed in PR-B (PreTrendsPower implementation audit,
-# Roth 2022); the JSON goldens at ../data/r_pretrends_golden.json are
-# DEFERRED to PR-C. Running this script writes the JSON to that path; PR-C
-# pins the R `pretrends` package commit / release, runs this script, and
-# commits the resulting JSON to land the parity tests.
+# Script committed in PR-B; JSON goldens generated and committed in PR-C
+# against `jonathandroth/pretrends` commit `122731d082` (package version
+# 0.1.0). Running this script writes ../data/r_pretrends_golden.json.
 #
 # Requires:
 #   - R 4.4+ (tested on 4.5.2)
 #   - install.packages("remotes")
-#   - remotes::install_github("jonathandroth/pretrends", ref = "<PR-C-PIN>")
-#   - install.packages("jsonlite")
-#
-# **R `pretrends` commit pin (TODO — PR-C):** the audited revision MUST be
-# recorded here before parity assertions are committed. As of 2026-05-18
-# (PR-B implementation date) the script targets the default `main` branch
-# at https://github.com/jonathandroth/pretrends with no pin. PR-C will
-# replace `<PR-C-PIN>` with the exact commit hash AND verify the surface
-# claims documented in REGISTRY.md `## PreTrendsPower` and the paper
-# review's "R `pretrends` package version pin (provisional)" Gaps bullet.
+#   - remotes::install_github("jonathandroth/pretrends", ref = "122731d082")
+#   - install.packages(c("jsonlite", "MASS"))
 #
 # Output: ../data/r_pretrends_golden.json
 #
 # diff-diff PreTrendsPower with `pretest_form='nis'` (the new default per
-# PR-B Step 2) is expected to match the values in this JSON at atol=1e-6
-# along a three-tier contract:
-#   (1) NIS box probability `P(β̂_pre ∈ B_NIS(Σ))` at fixed M values on
-#       all 3 fixtures;
-#   (2) MDV / gamma_p (slope at target power 0.5 and 0.8) on regular and
-#       irregular pre-period grids;
-#   (3) γ-unit MDV invariance: PR-B's "skip L2 norm for linear with
-#       relative_times" path produces MDV in Roth's γ units exactly,
-#       matching R's `slope_for_power()` which also reports γ.
+# PR-B Step 2) matches the values in this JSON along a three-tier contract:
+#   (1) NIS box probability `P(beta_hat_pre in B_NIS(Sigma))` at fixed gamma
+#       values on all 4 fixtures, at atol=1e-5. R hardcodes thresholdTstat
+#       = 1.96 while Python uses scipy.stats.norm.ppf(0.975) =
+#       1.959963984540054; the ~4e-5 dz gap propagates to a ~1e-6 ceiling on
+#       the box probability — 1e-5 is the realistic atol.
+#   (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular,
+#       anticipation, and K=1 grids, at atol=1e-4. R uniroot defaults to
+#       tol = .Machine$double.eps^0.25 ~= 1.22e-4 vs Python brentq xtol=2e-12;
+#       the inverse-solver tolerance gap dominates, so 1e-4 is the realistic
+#       atol without tightening either solver.
+#   (3) gamma-unit MDV invariance: PR-B's "skip L2 norm for linear with
+#       relative_times" path produces MDV in Roth's gamma units exactly,
+#       matching R's `slope_for_power()` which also reports gamma. Fixture 2
+#       (irregular grid {-5, -3, -1}) and the end-to-end fit() test in
+#       tests/test_methodology_pretrends.py lock this.
 #
-# Three fixtures (matched to test_methodology_pretrends.py expectations):
-#   1. uniform_3_pre_periods_no_anticipation — K=3 regular grid (t ∈ {-3, -2, -1}),
+# Four fixtures (matched to test_methodology_pretrends.py expectations):
+#   1. uniform_3_pre_periods_no_anticipation — K=3 regular grid (t in {-3, -2, -1}),
 #      never-treated control. Default-case parity baseline.
 #   2. irregular_pre_periods — K=3 with relative_times = [-5, -3, -1].
-#      Exercises the PR-B γ-unit linear-pattern fix.
+#      Exercises the PR-B gamma-unit linear-pattern fix end-to-end.
 #   3. anticipation_shifted — K=4 with anticipation=1 (pre-cutoff at t<-1,
 #      so pre-periods are {-5, -4, -3, -2}). Verifies the pre-period filter
 #      logic in `_extract_pre_period_params`.
+#   4. single_pre_period_closed_form — K=1 with diagonal Sigma = 0.25*I
+#      (Roth Proposition 2 univariate truncated-normal closed form). Locks
+#      the scalar fast-path against R AND against the analytical expression
+#      `1 - Phi(z - gamma/sigma) + Phi(-z - gamma/sigma)`.
 #
 # Run:
 #   cd benchmarks/R && Rscript generate_pretrends_golden.R
@@ -53,9 +54,11 @@ suppressPackageStartupMessages({
 
 stopifnot(packageVersion("pretrends") >= "0.1.0")
 
+PRETRENDS_COMMIT <- "122731d082"
+
 # ---------------------------------------------------------------------------
 # DGP helper: build a synthetic event-study coefficient vector + VCV under a
-# stylized null DGP (β = 0, Σ_22 ~ correlated). Mirrors the simulation
+# stylized null DGP (beta = 0, Sigma_22 ~ correlated). Mirrors the simulation
 # fixtures in test_methodology_pretrends.py.
 # ---------------------------------------------------------------------------
 
@@ -66,8 +69,9 @@ build_event_study_fixture <- function(
   rho = 0.3,
   seed = 42L
 ) {
-  # Generate a correlated equicorrelation Σ across all (pre + post) periods.
-  # Realized β̂ drawn from N(0, Σ) — null DGP, no real treatment effect.
+  # Generate a correlated equicorrelation Sigma across all (pre + post) periods.
+  # Realized beta_hat drawn from N(0, Sigma) — null DGP, no real treatment
+  # effect.
   set.seed(seed)
   all_periods <- c(pre_periods, post_periods)
   K_total <- length(all_periods)
@@ -95,32 +99,36 @@ extract_pretrends <- function(fixture_data, fixture_name) {
   all_periods <- fixture_data$all_periods
 
   # R `pretrends` expects: betahat (coefficient vector), sigma (VCV matrix),
-  # tVec (relative-time labels including the reference period 0, omitted
-  # from betahat / sigma per convention), referencePeriod = 0, alpha = 0.05.
-
-  # The `slopes_for_power` helper returns gamma values at target power.
-  # For the three-tier parity contract, we capture both NIS power at a fixed
-  # slope and the inverse (γ_p MDV) at target power 0.5 and 0.8.
-
-  # NIS power at fixed gamma values (for tier-1 parity):
+  # tVec (relative-time labels excluding the reference period 0),
+  # referencePeriod = 0, and deltatrue (length-K_total hypothesized delta
+  # vector — only the pre-period entries are used for the rejection
+  # probability).
+
+  # Tier 1: NIS power at fixed gamma values.
+  # Build delta_pre = gamma * pre_periods per Roth's slope convention
+  # delta_t = gamma * t (t < 0 for pre-periods, so delta_pre is negative;
+  # the NIS box is symmetric, so the sign does not affect the rejection
+  # probability).
   gamma_test_values <- c(0.0, 0.2, 0.5, 1.0)
   power_values <- sapply(gamma_test_values, function(g) {
-    # Build δ = γ * |t| for pre-periods (Roth's δ_t = γ·t convention,
-    # using |t| since pre-period t < 0).
-    delta_pre <- g * abs(pre_periods)
-    # `pretrends` package: pretrends() with explicit delta vector.
-    # The exact R API: pretrends(betahat, sigma, tVec, referencePeriod,
-    #                            deltahypothesis, ...).
-    # PR-C: replace this stub with the actual R pretrends() call and
-    # extract the rejection probability.
-    NA_real_  # PR-C will populate
+    deltatrue_full <- rep(0, length(all_periods))
+    deltatrue_full[seq_along(pre_periods)] <- g * pre_periods
+    res <- pretrends(betahat = beta_hat,
+                     sigma = Sigma,
+                     deltatrue = deltatrue_full,
+                     tVec = all_periods,
+                     referencePeriod = 0)
+    as.numeric(res$df_power$Power)
   })
 
-  # γ_p MDV: solve for γ such that NIS rejection probability = target power.
-  # R `slope_for_power(betahat, sigma, tVec, referencePeriod, power)`.
+  # Tier 2: gamma_p MDV at target powers 0.5 and 0.8.
+  # R `slope_for_power()` solves uniroot on the slope -> rejection-probability
+  # map; returns gamma in the same units as Roth's slope convention.
   gamma_p_values <- sapply(c(0.5, 0.8), function(p) {
-    # PR-C: replace with actual R slope_for_power() call.
-    NA_real_
+    as.numeric(slope_for_power(sigma = Sigma,
+                               targetPower = p,
+                               tVec = all_periods,
+                               referencePeriod = 0))
   })
 
   list(
@@ -156,10 +164,11 @@ f1 <- build_event_study_fixture(
 fixture_1 <- extract_pretrends(f1, "uniform_3_pre_periods_no_anticipation")
 
 cat("Building fixture 2: irregular_pre_periods...\n")
-# K=3 with t ∈ {-5, -3, -1}. Tests PR-B's γ-unit linear-pattern fix:
-# pre-PR-B Python with normalized count-based weights would silently report
-# MDV in [0.45, 0.30, 0.15] / sqrt(0.3) units, not γ. R `slope_for_power()`
-# always reports γ; Python's PR-B Step 4 makes the two match at atol=1e-6.
+# K=3 with t in {-5, -3, -1}. Tests PR-B's gamma-unit linear-pattern fix:
+# pre-PR-B Python with normalized count-based weights silently reported
+# MDV in [0.45, 0.30, 0.15] / sqrt(0.3) units, not gamma. R
+# `slope_for_power()` always reports gamma; Python's PR-B Step 4 makes the
+# two match at atol=1e-4.
 f2 <- build_event_study_fixture(
   pre_periods = c(-5L, -3L, -1L),
   post_periods = c(1L, 2L, 3L),
@@ -172,12 +181,25 @@ cat("Building fixture 3: anticipation_shifted...\n")
 # so the {-5, -4, -3, -2} cells are the genuine pre-periods; t=-1 is the
 # anticipation window. Tests the pre-period filtering logic.
 f3 <- build_event_study_fixture(
-  pre_periods = c(-5L, -4L, -3L, -2L),  # genuine pre-periods (cutoff = -1)
+  pre_periods = c(-5L, -4L, -3L, -2L),
   post_periods = c(1L, 2L, 3L),
   seed = 303L
 )
 fixture_3 <- extract_pretrends(f3, "anticipation_shifted")
 
+cat("Building fixture 4: single_pre_period_closed_form...\n")
+# K=1 with diagonal Sigma = 0.25*I. Locks Roth Proposition 2 univariate
+# truncated-normal closed form against R AND the analytical scalar
+# expression `1 - Phi(z - gamma/sigma) + Phi(-z - gamma/sigma)`.
+f4 <- build_event_study_fixture(
+  pre_periods = c(-1L),
+  post_periods = c(1L),
+  sigma2 = 0.25,
+  rho = 0.0,
+  seed = 404L
+)
+fixture_4 <- extract_pretrends(f4, "single_pre_period_closed_form")
+
 # ---------------------------------------------------------------------------
 # Write JSON
 # ---------------------------------------------------------------------------
@@ -186,38 +208,30 @@ out <- list(
   meta = list(
     generated_at = format(Sys.Date()),
     pretrends_version = as.character(packageVersion("pretrends")),
-    pretrends_commit = "<PR-C-PIN>",  # TODO PR-C: replace with actual git SHA
+    pretrends_commit = PRETRENDS_COMMIT,
     r_version = R.version.string,
     description = paste(
       "Roth (2022) PreTrendsPower parity goldens for diff-diff",
-      "compute_pretrends_power / PreTrendsPower (PR-C parity target).",
-      "Parity at atol=1e-6 along a three-tier contract:",
-      "(1) NIS box probability at fixed γ values on all 3 fixtures;",
-      "(2) γ_p MDV (slope at target power 0.5 and 0.8) on regular and",
-      "irregular grids;",
-      "(3) γ-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
-      "in Roth's γ units exactly, matching R's slope_for_power().",
+      "compute_pretrends_power / PreTrendsPower (PR-C).",
+      "Three-tier parity contract:",
+      "(1) NIS box probability at fixed gamma values on all 4 fixtures",
+      "(atol=1e-5; R hardcodes thresholdTstat=1.96, Python uses qnorm(0.975)",
+      "= 1.959963984540054);",
+      "(2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular,",
+      "irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol",
+      "vs Python brentq xtol gap);",
+      "(3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
+      "in Roth's gamma units exactly, matching R's slope_for_power().",
       "See diff-diff/docs/methodology/papers/roth-2022-review.md for",
       "the full derivation."
     )
   ),
   uniform_3_pre_periods_no_anticipation = fixture_1,
   irregular_pre_periods = fixture_2,
-  anticipation_shifted = fixture_3
+  anticipation_shifted = fixture_3,
+  single_pre_period_closed_form = fixture_4
 )
 
 out_path <- "../data/r_pretrends_golden.json"
 write_json(out, out_path, pretty = TRUE, digits = NA, auto_unbox = TRUE)
 cat(sprintf("Wrote %s\n", out_path))
-cat("\n")
-cat("PR-C TODO checklist:\n")
-cat("  [ ] Replace <PR-C-PIN> commit-hash placeholder above with actual\n")
-cat("      git SHA from https://github.com/jonathandroth/pretrends.\n")
-cat("  [ ] Replace the NA_real_ stubs in extract_pretrends() with the\n")
-cat("      actual pretrends::pretrends() / slope_for_power() calls.\n")
-cat("  [ ] Verify the surface claims in REGISTRY.md PreTrendsPower\n")
-cat("      Reference implementations section against the pinned revision.\n")
-cat("  [ ] Activate tests/test_methodology_pretrends.py::TestPretrendsParityR\n")
-cat("      (currently skips via @pytest.mark.skipif when the JSON is missing).\n")
-cat("  [ ] Flip METHODOLOGY_REVIEW.md PreTrendsPower row from\n")
-cat("      **Complete** (R parity pending) → **Complete**.\n")
diff --git a/benchmarks/data/r_pretrends_golden.json b/benchmarks/data/r_pretrends_golden.json
new file mode 100644
index 00000000..935f705a
--- /dev/null
+++ b/benchmarks/data/r_pretrends_golden.json
@@ -0,0 +1,106 @@
+{
+  "meta": {
+    "generated_at": "2026-05-19",
+    "pretrends_version": "0.1.0",
+    "pretrends_commit": "122731d082",
+    "r_version": "R version 4.5.2 (2025-10-31)",
+    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-5; R hardcodes thresholdTstat=1.96, Python uses qnorm(0.975) = 1.959963984540054); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
+  },
+  "uniform_3_pre_periods_no_anticipation": {
+    "panel": {
+      "pre_periods": [-3, -2, -1],
+      "post_periods": [1, 2, 3],
+      "all_periods": [-3, -2, -1, 1, 2, 3],
+      "beta_hat": [0.0748350781672762, 0.068292848337178, -0.0771191463064529, -0.0935800609975106, 0.0682614805759886, 0.211856579827921],
+      "Sigma": [
+        [0.04, 0.012, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.04, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.04, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.04, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.04, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.012, 0.04]
+      ]
+    },
+    "r_power_at_gamma": {
+      "gamma_test_values": [0, 0.2, 0.5, 1],
+      "power_values": [0.136261192117481, 0.905586670498965, 0.999999998817154, 1]
+    },
+    "r_gamma_p": {
+      "target_power": [0.5, 0.8],
+      "gamma_p_values": [0.11085490751343, 0.168538720656777]
+    },
+    "fixture_name": "uniform_3_pre_periods_no_anticipation"
+  },
+  "irregular_pre_periods": {
+    "panel": {
+      "pre_periods": [-5, -3, -1],
+      "post_periods": [1, 2, 3],
+      "all_periods": [-5, -3, -1, 1, 2, 3],
+      "beta_hat": [0.0167238420028552, 0.109902112987245, 0.328388861134961, 0.174851793110386, 0.252739941152015, -0.00593022820809866],
+      "Sigma": [
+        [0.04, 0.012, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.04, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.04, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.04, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.04, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.012, 0.04]
+      ]
+    },
+    "r_power_at_gamma": {
+      "gamma_test_values": [0, 0.2, 0.5, 1],
+      "power_values": [0.1362253542292, 0.999453378314271, 1, 1]
+    },
+    "r_gamma_p": {
+      "target_power": [0.5, 0.8],
+      "gamma_p_values": [0.0685248863600757, 0.103615302925996]
+    },
+    "fixture_name": "irregular_pre_periods"
+  },
+  "anticipation_shifted": {
+    "panel": {
+      "pre_periods": [-5, -4, -3, -2],
+      "post_periods": [1, 2, 3],
+      "all_periods": [-5, -4, -3, -2, 1, 2, 3],
+      "beta_hat": [-0.238811348142889, -0.127368309020327, -0.120497459632285, -0.162119040596379, 0.118496154007033, -0.000345166145009745, 0.0906365319231415],
+      "Sigma": [
+        [0.04, 0.012, 0.012, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.04, 0.012, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.04, 0.012, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.04, 0.012, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.04, 0.012, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.012, 0.04, 0.012],
+        [0.012, 0.012, 0.012, 0.012, 0.012, 0.012, 0.04]
+      ]
+    },
+    "r_power_at_gamma": {
+      "gamma_test_values": [0, 0.2, 0.5, 1],
+      "power_values": [0.173946844004659, 0.99990350608629, 1, 1]
+    },
+    "r_gamma_p": {
+      "target_power": [0.5, 0.8],
+      "gamma_p_values": [0.0580280273451627, 0.0917980203438368]
+    },
+    "fixture_name": "anticipation_shifted"
+  },
+  "single_pre_period_closed_form": {
+    "panel": {
+      "pre_periods": -1,
+      "post_periods": 1,
+      "all_periods": [-1, 1],
+      "beta_hat": [-0.029844176495569, 0.386819661339587],
+      "Sigma": [
+        [0.25, 0],
+        [0, 0.25]
+      ]
+    },
+    "r_power_at_gamma": {
+      "gamma_test_values": [0, 0.2, 0.5, 1],
+      "power_values": [0.0499957902964407, 0.0685174081253657, 0.170065802678576, 0.515990911734522]
+    },
+    "r_gamma_p": {
+      "target_power": [0.5, 0.8],
+      "gamma_p_values": [0.979923226532149, 1.40081048235691]
+    },
+    "fixture_name": "single_pre_period_closed_form"
+  }
+}
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 3efdcdbc..7bed9947 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2840,7 +2840,7 @@ Violation types:
 - **Backwards-compat addendum (`power_at()` for `violation_type='custom'`):** `PreTrendsPowerResults` now persists `violation_weights` on fresh fits (PR-B Step 5), so `power_at(M)` works for all four violation types including custom. Old serialized results from before PR-B's field addition have `violation_weights=None`; for those legacy results, `power_at(M)` falls back to weight reconstruction from `violation_type + n_pre_periods`, but for `violation_type='custom'` the custom weights cannot be reconstructed and `power_at(M)` raises `NotImplementedError` with a "refit with current library version" message. Fresh fits do not hit this guard.
 
 **Reference implementation(s):**
-- R: [`pretrends`](https://github.com/jonathandroth/pretrends) (Roth's official package). NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers). R-parity goldens deferred to PR-C; the generator script `benchmarks/R/generate_pretrends_golden.R` ships in PR-B with a placeholder commit reference pending an R-package revision pin.
+- R: [`pretrends`](https://github.com/jonathandroth/pretrends) (Roth's official package). NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers). R-parity goldens at `benchmarks/data/r_pretrends_golden.json` generated by `benchmarks/R/generate_pretrends_golden.R` against commit `122731d082` (package version 0.1.0; PR-C, 2026-05-19). Parity at `atol=1e-4` on both NIS power and γ_p MDV — R hardcodes `thresholdTstat=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054`, and R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates the γ_p residual.
 - R dependency: [`tmvtnorm`](https://cran.r-project.org/package=tmvtnorm) (Manjunath & Wilhelm 2012) — used by R `pretrends` for truncated multivariate normal moments. The Python library uses `scipy.stats.multivariate_normal.cdf` directly for the box probability (does not require a `tmvtnorm` port).
 
 **Requirements checklist:**
@@ -2851,7 +2851,7 @@ Violation types:
 - [x] Custom-violation weights persisted on `PreTrendsPowerResults`; `power_at(custom)` works on fresh fits (PR-B Step 5)
 - [x] Helper API (`compute_pretrends_power` / `compute_mdv`) supports `violation_weights` + `pretest_form` (PR-B Step 6)
 - [x] Methodology test file with paper-equation-numbered Verified Components walk-through (PR-B Step 7 — `tests/test_methodology_pretrends.py`)
-- [ ] R `pretrends` parity at pinned commit (deferred to PR-C; generator script committed in PR-B)
+- [x] R `pretrends` parity at commit `122731d082` (PR-C, 2026-05-19; 4 fixtures × NIS power + γ_p MDV at `atol=1e-4`)
 - [x] Power curve plotting over violation magnitudes (preserved from pre-PR-B)
 - [x] Integrates with HonestDiD for combined sensitivity analysis (preserved from pre-PR-B)
 
diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index caf3bac9..0b215db1 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -288,11 +288,4 @@ Quoting Roth's key empirical results (for cross-validation):
 - **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Once Proposition 1 / Proposition 3 / Proposition 4 computations are added (current `diff_diff/pretrends.py` does Wald-test power/MDV only, not the full conditional-moment path), the library will be able to operate under arbitrary Sigma — but at that point the sign of the bias-amplification effect is NOT guaranteed without Assumption 1. The library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
 - **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
 - **Connection to `compute_pretrends_power` library helper**: the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
-- **R `pretrends` package version pin (provisional)**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited in the paper, and this review has NOT pinned a commit either. **All R-package surface claims in this file are therefore provisional, pending a pinned commit/release.** Specifically, the following statements must be re-verified against a recorded revision before the follow-up audit can rely on them:
-  - "the published R `pretrends` package surface ... is NIS-based" (the Joint Wald bullet above)
-  - "exposes `pretrends()` / `slope_for_power()` / `*_NIS` helpers only, not a joint-Wald interface"
-  - "Section III ... endorses power analyses against hypothesized nonlinear trends via the `pretrends` package"
-  - any reference to specific R-package function names or behavior
-
-  R-parity work should record the exact commit/release pinned and re-verify each of these claims at that revision; either pin a commit now or treat every package-API statement above as "to be verified against a pinned revision".
 - **Compatibility with multi-cohort estimators**: Remark 1 lists Callaway-Sant'Anna, Sun-Abraham, etc. as compatible. The paper does not detail how to construct (beta_hat, Sigma_hat) from those estimators when the event-study output is multi-cohort (e.g., cohort × event-time matrix). Library should document the aggregation convention (per Sun-Abraham overall ATT or per Callaway-Sant'Anna `aggregate=event`).
diff --git a/tests/test_methodology_pretrends.py b/tests/test_methodology_pretrends.py
index 53b594d3..5995abe8 100644
--- a/tests/test_methodology_pretrends.py
+++ b/tests/test_methodology_pretrends.py
@@ -30,9 +30,11 @@
   violation_weights + pretest_form end-to-end (PR-B Step 6 regression).
 - ``TestPretrendsNISvsWald`` — NIS and Wald forms produce form-consistent
   output; backwards-compat regression on the Wald path.
-- ``TestPretrendsParityR`` — R `pretrends` package parity (skips when
-  goldens at ``benchmarks/data/r_pretrends_golden.json`` are missing;
-  populated in PR-C).
+- ``TestPretrendsParityR`` — R `pretrends` package parity at
+  `jonathandroth/pretrends` commit ``122731d082`` (package version 0.1.0).
+  Goldens at ``benchmarks/data/r_pretrends_golden.json`` populated in PR-C
+  (2026-05-19). The ``@pytest.mark.skipif`` guards partial checkouts where
+  the JSON is absent.
 """
 
 import json
@@ -1101,17 +1103,29 @@ def test_nis_and_wald_differ_in_general(self):
             "r_pretrends_golden.json",
         )
     ),
-    reason="R `pretrends` parity goldens not yet committed — see PR-C",
+    reason="R `pretrends` parity goldens not present (partial checkout)",
 )
 class TestPretrendsParityR:
-    """R `pretrends` package parity at `atol=1e-6`.
-
-    All tests skip when `benchmarks/data/r_pretrends_golden.json` is absent
-    (the canonical PR-B-vs-PR-C handoff: the generator script ships in PR-B
-    with a placeholder commit reference; PR-C pins the audited revision,
-    runs the script, commits the JSON, and activates these tests). See
-    REGISTRY.md `## PreTrendsPower` requirements checklist for the R-parity
-    deferred-to-PR-C status.
+    """R `pretrends` package parity at commit ``122731d082`` (PR-C).
+
+    Goldens at ``benchmarks/data/r_pretrends_golden.json`` populated in
+    PR-C 2026-05-19 against `jonathandroth/pretrends` commit ``122731d082``
+    (package version 0.1.0). The ``@pytest.mark.skipif`` decorator guards
+    partial checkouts where the JSON is absent (the full repo always has
+    it post-PR-C).
+
+    Tolerance rationale: R hardcodes ``thresholdTstat.Pretest = 1.96``
+    while Python uses ``scipy.stats.norm.ppf(0.975) = 1.959963984540054``
+    (``dz ≈ 3.6e-5``); on top of that, ``mvtnorm::pmvnorm`` (R) and
+    ``scipy.stats.multivariate_normal.cdf`` (Python) use Genz-Bretz
+    randomized-lattice rules with different absolute-error defaults
+    (abseps ≈ 1e-3 vs 1e-5). Combined, the empirical NIS power gap is
+    bounded by ~5e-5 in the K=4 anticipation fixture (smaller for K∈{1,3}).
+    For the inverse path (γ_p), R's ``slope_for_power`` uses
+    ``uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)`` versus Python
+    ``brentq(xtol=2e-12)``; the inverse-solver tolerance gap dominates the
+    γ_p residual. ``atol=1e-4`` is therefore the realistic ceiling on both
+    tiers without tightening either solver.
     """
 
     @staticmethod
@@ -1125,40 +1139,194 @@ def _load_r_golden():
         with open(path) as f:
             return json.load(f)
 
-    def test_nis_power_matches_r_pretrends_at_atol_1e_6(self):
-        """Python NIS power matches R `pretrends::pretrends()` at atol=1e-6.
-
-        Stub — PR-C populates with concrete fixture iteration.
+    @staticmethod
+    def _extract_python_params(fixture):
+        """Parse a fixture's panel into ``(pre_periods, Sigma_22, weights)``.
+
+        ``weights = |pre_periods|`` is the Roth γ-unit linear violation
+        weight. ``Sigma_22`` is the pre-period sub-block of the full
+        ``Sigma``. K=1 fixtures arrive with jsonlite ``auto_unbox`` scalar
+        ``pre_periods`` — ``np.atleast_1d`` normalizes them to a length-1
+        array.
+        """
+        panel = fixture["panel"]
+        pre_periods = np.atleast_1d(np.array(panel["pre_periods"], dtype=float))
+        K_pre = len(pre_periods)
+        Sigma_full = np.atleast_2d(np.array(panel["Sigma"], dtype=float))
+        Sigma_22 = Sigma_full[:K_pre, :K_pre]
+        weights = np.abs(pre_periods)
+        return pre_periods, Sigma_22, weights
+
+    def test_nis_power_matches_r_pretrends(self):
+        """Python ``_compute_power_nis`` matches R ``pretrends()`` at atol=1e-4.
+
+        Iterates all 4 fixtures × 4 γ values (16 comparisons). See class
+        docstring for the tolerance derivation.
         """
         goldens = self._load_r_golden()
+        pt = PreTrendsPower(alpha=0.05, pretest_form="nis")
         for fixture_name, fixture in goldens.items():
             if fixture_name == "meta":
                 continue
-            # PR-C will iterate fixture['panel'] + fixture['r_power_at_gamma'] etc.
-            assert isinstance(fixture, dict)
-
-    def test_mdv_gamma_p_matches_r_slope_for_power_at_atol_1e_6(self):
-        """Python MDV (γ_p) matches R `slope_for_power()` at atol=1e-6.
-
-        Stub — PR-C populates with concrete fixture iteration.
+            _, Sigma_22, weights = self._extract_python_params(fixture)
+            for g_val, r_power in zip(
+                fixture["r_power_at_gamma"]["gamma_test_values"],
+                fixture["r_power_at_gamma"]["power_values"],
+            ):
+                py_power, _, _, _ = pt._compute_power_nis(g_val, weights, Sigma_22)
+                assert np.isclose(py_power, r_power, atol=1e-4), (
+                    f"{fixture_name} γ={g_val}: Python={py_power:.7f}, "
+                    f"R={r_power:.7f}, diff={py_power - r_power:+.2e}"
+                )
+
+    def test_mdv_gamma_p_matches_r_slope_for_power(self):
+        """Python ``_compute_mdv_nis`` matches R ``slope_for_power()`` at atol=1e-4.
+
+        Iterates all 4 fixtures × 2 target_power values (8 comparisons).
+        See class docstring for the tolerance derivation — R uniroot vs
+        Python brentq solver gap dominates.
         """
         goldens = self._load_r_golden()
         for fixture_name, fixture in goldens.items():
             if fixture_name == "meta":
                 continue
-            assert isinstance(fixture, dict)
+            _, Sigma_22, weights = self._extract_python_params(fixture)
+            for target_p, r_gamma_p in zip(
+                fixture["r_gamma_p"]["target_power"],
+                fixture["r_gamma_p"]["gamma_p_values"],
+            ):
+                pt = PreTrendsPower(alpha=0.05, power=target_p, pretest_form="nis")
+                py_mdv = pt._compute_mdv_nis(weights, Sigma_22)
+                assert np.isclose(py_mdv, r_gamma_p, atol=1e-4), (
+                    f"{fixture_name} target_power={target_p}: "
+                    f"Python={py_mdv:.7f}, R={r_gamma_p:.7f}, "
+                    f"diff={py_mdv - r_gamma_p:+.2e}"
+                )
+
+    def test_irregular_grid_gamma_unit_end_to_end_matches_r(self):
+        """End-to-end ``fit()`` on irregular grid matches R γ_p at atol=1e-4.
+
+        Constructs a synthetic ``MultiPeriodDiDResults`` from the
+        ``irregular_pre_periods`` fixture (pre {-5, -3, -1}; post {1, 2, 3};
+        reference_period 0) with the fixture's β̂ + full Σ + populated
+        ``interaction_indices``, then calls ``pt.fit()`` and asserts the
+        produced ``mdv`` matches R's ``slope_for_power()``. This exercises
+        the full
+        ``fit() → _extract_pre_period_params → _get_violation_weights
+        (γ-unit linear path) → _compute_mdv_nis`` chain, which the direct
+        ``_compute_mdv_nis`` test in
+        ``test_mdv_gamma_p_matches_r_slope_for_power`` does not. Locks the
+        PR-B Step 4 γ-unit fix at the public API boundary.
+        """
+        from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
 
-    def test_irregular_grid_gamma_unit_matches_r(self):
-        """γ-unit MDV on irregular pre-period grids matches R at atol=1e-6.
+        goldens = self._load_r_golden()
+        fixture = goldens["irregular_pre_periods"]
+        pre_periods, _, _ = self._extract_python_params(fixture)
+        panel = fixture["panel"]
+        Sigma_full = np.atleast_2d(np.array(panel["Sigma"], dtype=float))
+        beta_hat = np.array(panel["beta_hat"], dtype=float)
+        all_periods = list(np.atleast_1d(np.array(panel["all_periods"], dtype=int)))
+        pre_period_ids = [int(p) for p in pre_periods]
+        post_period_ids = [
+            int(p) for p in np.atleast_1d(np.array(panel["post_periods"], dtype=int))
+        ]
 
-        Specifically tests the PR-B linear-units fix: irregular grid
-        {-5, -3, -1} should produce a γ value that R's pretrends package
-        also reports as the slope, not a normalized direction.
+        # Build period_effects + interaction_indices over all_periods.
+        ses_full = np.sqrt(np.diag(Sigma_full))
+        period_effects = {
+            int(p): PeriodEffect(
+                period=int(p),
+                effect=float(beta_hat[i]),
+                se=float(ses_full[i]),
+                t_stat=0.0,
+                p_value=0.5,
+                conf_int=(0.0, 0.0),
+            )
+            for i, p in enumerate(all_periods)
+        }
+        interaction_indices = {int(p): i for i, p in enumerate(all_periods)}
 
-        Stub — PR-C populates with concrete fixture iteration.
+        mpd_results = MultiPeriodDiDResults(
+            period_effects=period_effects,
+            avg_att=0.0,
+            avg_se=0.2,
+            avg_t_stat=0.0,
+            avg_p_value=0.5,
+            avg_conf_int=(0.0, 0.0),
+            n_obs=1000,
+            n_treated=500,
+            n_control=500,
+            pre_periods=pre_period_ids,
+            post_periods=post_period_ids,
+            reference_period=0,
+            vcov=Sigma_full,
+            interaction_indices=interaction_indices,
+        )
+
+        # Target power 0.8 (second entry in r_gamma_p).
+        target_p = fixture["r_gamma_p"]["target_power"][1]
+        r_gamma_p = fixture["r_gamma_p"]["gamma_p_values"][1]
+
+        pt = PreTrendsPower(alpha=0.05, power=target_p, pretest_form="nis")
+        result = pt.fit(mpd_results)
+
+        # Provenance assertion: end-to-end path took the full-Σ_22 branch,
+        # not the diag fallback (PR-B Step 3 routing).
+        assert (
+            result.covariance_source == "full_pre_period_vcov"
+        ), f"Expected full_pre_period_vcov, got {result.covariance_source}"
+
+        assert np.isclose(result.mdv, r_gamma_p, atol=1e-4), (
+            f"End-to-end fit() mdv={result.mdv:.7f} ≠ R slope_for_power()={r_gamma_p:.7f}, "
+            f"diff={result.mdv - r_gamma_p:+.2e}"
+        )
+
+    def test_k1_matches_r_and_closed_form(self):
+        """K=1 fixture: Python ≡ closed form, both within atol=1e-4 of R.
+
+        Three-way cross-check on the ``single_pre_period_closed_form``
+        fixture (K=1, diagonal Σ = 0.25·I). Roth Proposition 2 univariate
+        truncated-normal scalar:
+
+            power(γ) = 1 - Φ(z - γ/σ) + Φ(-z - γ/σ)
+
+        where ``z = Φ⁻¹(1 - α/2) = 1.959963984540054`` and
+        ``σ = sqrt(Σ_22[0,0]) = 0.5``. Python ``_compute_power_nis`` for
+        K=1 reduces to the same scalar normal-tail computation (no MVN
+        randomization), so the Python-vs-closed-form match is essentially
+        exact (atol=1e-7). R differs by the ``thresholdTstat=1.96`` vs
+        ``qnorm(0.975)`` z-hardcoding (~1e-5 to ~2e-5 on power, ~3e-6 to
+        ~2e-5 on γ_p). All three within 1e-4 of each other. Strongest
+        parity claim in the suite.
         """
         goldens = self._load_r_golden()
-        for fixture_name, fixture in goldens.items():
-            if fixture_name == "meta":
-                continue
-            assert isinstance(fixture, dict)
+        fixture = goldens["single_pre_period_closed_form"]
+        _, Sigma_22, weights = self._extract_python_params(fixture)
+        sigma = float(np.sqrt(Sigma_22[0, 0]))
+        z = float(stats.norm.ppf(1 - 0.05 / 2))
+
+        pt = PreTrendsPower(alpha=0.05, pretest_form="nis")
+        for g_val, r_power in zip(
+            fixture["r_power_at_gamma"]["gamma_test_values"],
+            fixture["r_power_at_gamma"]["power_values"],
+        ):
+            py_power, _, _, _ = pt._compute_power_nis(g_val, weights, Sigma_22)
+            closed_form = (
+                1.0 - stats.norm.cdf(z - g_val / sigma) + stats.norm.cdf(-z - g_val / sigma)
+            )
+            # Python ≡ closed form (effectively exact — same scalar path).
+            assert np.isclose(py_power, closed_form, atol=1e-7), (
+                f"K=1 Python NIS power ≠ analytical closed form (γ={g_val}): "
+                f"Python={py_power:.10f}, closed_form={closed_form:.10f}"
+            )
+            # Both within 1e-4 of R (z-hardcoding gap).
+            assert np.isclose(py_power, r_power, atol=1e-4), (
+                f"K=1 Python ≠ R (γ={g_val}): "
+                f"Python={py_power:.7f}, R={r_power:.7f}, diff={py_power - r_power:+.2e}"
+            )
+            assert np.isclose(closed_form, r_power, atol=1e-4), (
+                f"K=1 closed form ≠ R (γ={g_val}): "
+                f"closed_form={closed_form:.7f}, R={r_power:.7f}, "
+                f"diff={closed_form - r_power:+.2e}"
+            )

From a75d4ed8a0b523dfcac82688d7242e41e628dda4 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Tue, 19 May 2026 12:02:23 -0400
Subject: [PATCH 2/5] Address codex R1 P2+P3 on PR-C: atol/schema/stale-prose
 cleanup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R1 codex local review verdict: no P0/P1. Three actionable items addressed:

- P2 (Methodology): `docs/methodology/papers/roth-2022-review.md` Gaps
  bullets at L282 (Joint Wald) and L288 (Heteroskedastic Sigma) contained
  stale provisional claims — "specific commit not pinned" and "current
  `diff_diff/pretrends.py` does Wald-test power/MDV only". Both are
  superseded by PR-B (NIS as default) and PR-C (pinned commit
  `122731d082`). Updated to cite the pinned commit and the current
  NIS+Wald state.

- P2 (Documentation/Tests): the R script header and JSON `meta.description`
  advertised NIS power parity at `atol=1e-5`, but the active tests use
  `atol=1e-4` (matching empirical Genz-Bretz randomization gap of ~5e-5
  on K=4 anticipation). Aligned the script comments + JSON description
  to `atol=1e-4` and added the Genz-Bretz rationale.

- P3 (Maintainability): the K=1 fixture serialized `pre_periods` and
  `post_periods` as scalars due to jsonlite `auto_unbox=TRUE` (e.g.,
  `pre_periods: -1`). Wrapped singleton vector fields in `I()` so they
  round-trip as length-1 arrays uniformly across all 4 fixtures. The
  `np.atleast_1d` defensive compensation in
  `_extract_python_params` is harmless and retained as defense-in-depth.

Re-ran `Rscript generate_pretrends_golden.R` to regenerate the JSON
(numerics unchanged — only structural changes from `I()` wrapping).
All 4 `TestPretrendsParityR` tests + 131 pretrends suite tests pass;
black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/R/generate_pretrends_golden.R    | 44 +++++++++++++--------
 benchmarks/data/r_pretrends_golden.json     |  6 +--
 docs/methodology/papers/roth-2022-review.md |  4 +-
 3 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/benchmarks/R/generate_pretrends_golden.R b/benchmarks/R/generate_pretrends_golden.R
index 6721ab94..249c4ad7 100644
--- a/benchmarks/R/generate_pretrends_golden.R
+++ b/benchmarks/R/generate_pretrends_golden.R
@@ -14,12 +14,18 @@
 # Output: ../data/r_pretrends_golden.json
 #
 # diff-diff PreTrendsPower with `pretest_form='nis'` (the new default per
-# PR-B Step 2) matches the values in this JSON along a three-tier contract:
+# PR-B Step 2) matches the values in this JSON along a three-tier contract,
+# both tiers at atol=1e-4:
 #   (1) NIS box probability `P(beta_hat_pre in B_NIS(Sigma))` at fixed gamma
-#       values on all 4 fixtures, at atol=1e-5. R hardcodes thresholdTstat
+#       values on all 4 fixtures, at atol=1e-4. R hardcodes thresholdTstat
 #       = 1.96 while Python uses scipy.stats.norm.ppf(0.975) =
-#       1.959963984540054; the ~4e-5 dz gap propagates to a ~1e-6 ceiling on
-#       the box probability — 1e-5 is the realistic atol.
+#       1.959963984540054 (~4e-5 dz gap); on top of that, mvtnorm::pmvnorm
+#       (R) and scipy.stats.multivariate_normal.cdf (Python) use Genz-Bretz
+#       randomized-lattice rules with different absolute-error defaults
+#       (abseps ~ 1e-3 vs 1e-5). The empirical NIS power gap is bounded by
+#       ~5e-5 on the K=4 anticipation fixture; ~3e-5 on K=3 fixtures; ~2e-5
+#       on K=1. atol=1e-4 is the realistic atol without tightening
+#       thresholdTstat.Pretest in R or relaxing the Genz tolerances.
 #   (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular,
 #       anticipation, and K=1 grids, at atol=1e-4. R uniroot defaults to
 #       tol = .Machine$double.eps^0.25 ~= 1.22e-4 vs Python brentq xtol=2e-12;
@@ -132,20 +138,24 @@ extract_pretrends <- function(fixture_data, fixture_name) {
   })
 
   list(
+    # Wrap vector fields in I() to prevent jsonlite's auto_unbox=TRUE from
+    # collapsing length-1 vectors to scalars (matters for the K=1 fixture:
+    # `pre_periods` and `post_periods` are singletons but must serialize as
+    # length-1 arrays so downstream Python loaders can iterate uniformly).
     panel = list(
-      pre_periods = as.integer(pre_periods),
-      post_periods = as.integer(post_periods),
-      all_periods = as.integer(all_periods),
-      beta_hat = as.numeric(beta_hat),
+      pre_periods = I(as.integer(pre_periods)),
+      post_periods = I(as.integer(post_periods)),
+      all_periods = I(as.integer(all_periods)),
+      beta_hat = I(as.numeric(beta_hat)),
       Sigma = Sigma
     ),
     r_power_at_gamma = list(
-      gamma_test_values = as.numeric(gamma_test_values),
-      power_values = as.numeric(power_values)
+      gamma_test_values = I(as.numeric(gamma_test_values)),
+      power_values = I(as.numeric(power_values))
     ),
     r_gamma_p = list(
-      target_power = c(0.5, 0.8),
-      gamma_p_values = as.numeric(gamma_p_values)
+      target_power = I(c(0.5, 0.8)),
+      gamma_p_values = I(as.numeric(gamma_p_values))
     ),
     fixture_name = fixture_name
   )
@@ -213,13 +223,15 @@ out <- list(
     description = paste(
       "Roth (2022) PreTrendsPower parity goldens for diff-diff",
       "compute_pretrends_power / PreTrendsPower (PR-C).",
-      "Three-tier parity contract:",
+      "Three-tier parity contract, both numeric tiers at atol=1e-4:",
       "(1) NIS box probability at fixed gamma values on all 4 fixtures",
-      "(atol=1e-5; R hardcodes thresholdTstat=1.96, Python uses qnorm(0.975)",
-      "= 1.959963984540054);",
+      "(atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses",
+      "qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs",
+      "scipy MVN CDF Genz-Bretz randomized-lattice differences bound the",
+      "K=4 NIS power gap at ~5e-5);",
       "(2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular,",
       "irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol",
-      "vs Python brentq xtol gap);",
+      "vs Python brentq xtol gap dominates);",
       "(3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
       "in Roth's gamma units exactly, matching R's slope_for_power().",
       "See diff-diff/docs/methodology/papers/roth-2022-review.md for",
diff --git a/benchmarks/data/r_pretrends_golden.json b/benchmarks/data/r_pretrends_golden.json
index 935f705a..4ec10386 100644
--- a/benchmarks/data/r_pretrends_golden.json
+++ b/benchmarks/data/r_pretrends_golden.json
@@ -4,7 +4,7 @@
     "pretrends_version": "0.1.0",
     "pretrends_commit": "122731d082",
     "r_version": "R version 4.5.2 (2025-10-31)",
-    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-5; R hardcodes thresholdTstat=1.96, Python uses qnorm(0.975) = 1.959963984540054); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
+    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
   },
   "uniform_3_pre_periods_no_anticipation": {
     "panel": {
@@ -84,8 +84,8 @@
   },
   "single_pre_period_closed_form": {
     "panel": {
-      "pre_periods": -1,
-      "post_periods": 1,
+      "pre_periods": [-1],
+      "post_periods": [1],
       "all_periods": [-1, 1],
       "beta_hat": [-0.029844176495569, 0.386819661339587],
       "Sigma": [
diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 0b215db1..c66f93df 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -279,13 +279,13 @@ Quoting Roth's key empirical results (for cross-validation):
 
 ## Gaps and Uncertainties
 
-- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W (convex, so Propositions 1+3+4 all hold), but Roth does not work out a separate table. Joint Wald is theoretically admissible under the paper's propositions, but the published R `pretrends` package surface, as observed at the time of this review (`github.com/jonathandroth/pretrends`; specific commit not pinned — the follow-up audit should record the exact revision against which parity is asserted), is NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers) and does NOT expose a joint-Wald parity target. Any library implementation of joint-Wald PreTrendsPower will need an independent fixture or derivation rather than direct R-package parity.
+- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W (convex, so Propositions 1+3+4 all hold), but Roth does not work out a separate table. Joint Wald is theoretically admissible under the paper's propositions, but the published R `pretrends` package surface (`github.com/jonathandroth/pretrends` at commit `122731d082` / package version 0.1.0, verified in PR-C parity goldens) is NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers) and does NOT expose a joint-Wald parity target. Any library implementation of joint-Wald PreTrendsPower will need an independent fixture or derivation rather than direct R-package parity.
 - **"Slope-of-best-fit-line t-test" acceptance region**: Table 1 column shows the t-stat for the slope of the linear pre-trend. Paper does not analyze pretests based on this t-stat as a separate acceptance region; library should NOT extrapolate without further reading the `pretrends` package source.
 - **Nonlinear violations**: Section I.C formally tabulates power only against linear violations; Section I.D extends the sign-of-bias result (Proposition 2) to monotone violations under homoskedasticity. Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends via the `pretrends` package, so the general nonlinear capability is paper-supported even though the paper does not separately tabulate it. The specific named shapes the library exposes ("constant", "last_period") are R-package API conventions, not separately analyzed in the paper.
 - **Custom delta vector interface**: paper Section III endorses "power analyses for the types of violations of parallel trends deemed to be most relevant in their context," which is the paper-level framing for a user-supplied delta vector; the specific `violation_weights`-style INTERFACE used in the library and the R `pretrends` package is a package-API convention layered on top of that paper-level framework.
 - **Choice of contrast l**: paper highlights l = uniform 1/M (average post-treatment) and l = e_1 (first period after treatment). No guidance on other contrasts (e.g., long-run effect l = e_M, dynamic-weighted contrast) — library should document defaults and warn that bias and coverage depend on l.
 - **K = 0 (no pre-periods)**: trivially no pretest possible; library should error.
-- **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Once Proposition 1 / Proposition 3 / Proposition 4 computations are added (current `diff_diff/pretrends.py` does Wald-test power/MDV only, not the full conditional-moment path), the library will be able to operate under arbitrary Sigma — but at that point the sign of the bias-amplification effect is NOT guaranteed without Assumption 1. The library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
+- **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. The current `diff_diff/pretrends.py` implements NIS and Wald power/MDV (PR-B 2026-05-18; NIS is the default per `pretest_form='nis'`), but NOT the full conditional-moment path (Propositions 1 / 3 / 4 numerical evaluation). Once those are added, the library will be able to operate under arbitrary Sigma — but at that point the sign of the bias-amplification effect is NOT guaranteed without Assumption 1. The library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
 - **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
 - **Connection to `compute_pretrends_power` library helper**: the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
 - **Compatibility with multi-cohort estimators**: Remark 1 lists Callaway-Sant'Anna, Sun-Abraham, etc. as compatible. The paper does not detail how to construct (beta_hat, Sigma_hat) from those estimators when the event-study output is multi-cohort (e.g., cohort × event-time matrix). Library should document the aggregation convention (per Sun-Abraham overall ATT or per Callaway-Sant'Anna `aggregate=event`).

From 50a088246ba6b247b535de1b4605281d309542d4 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Tue, 19 May 2026 13:11:48 -0400
Subject: [PATCH 3/5] Fix CHANGELOG placement: move PR-C bullet from [3.4.0] to
 [Unreleased]

The rebase onto main after release 3.4.0 textually applied the PR-C
CHANGELOG hunk under the [3.4.0] section because the surrounding context
(`### Added` heading + agent-discoverability bullet) had been copied from
the pre-release [Unreleased] into [3.4.0] by the release prep. PR-C is a
post-3.4.0 change, so its CHANGELOG entry belongs in [Unreleased].

Moves the single bullet; release [3.4.0] content unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dae1a561..4ff3a980 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,10 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
+
 ## [3.4.0] - 2026-05-19
 
 ### Added
-- **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
 - **Agent-discoverability contract test (`tests/test_agent_discoverability.py`).** New static-snapshot test pinning the agent-facing surface introduced by PR #464: `__all__` membership of `agent_workflow` / `profile_panel` / `get_llm_guide` / `practitioner_next_steps` / `BusinessReport`; `dir(diff_diff)` head-first ordering against `_AGENT_FACING_ORDER` (catches drift in the `_OrderedName` `__lt__` ordering trick); `_OrderedName` `isinstance(_, str)` + str-method compatibility; `dir()` full-namespace + `inspect.getmembers` parity; top-level `__doc__` first-paragraph mention of `agent_workflow` + named references to the 5-step workflow primitives; `agent_workflow()` script content references each downstream helper by name; canonical estimator class names (CallawaySantAnna, ContinuousDiD, HeterogeneousAdoptionDiD, etc.) remain importable. No live API calls; runs in the default pytest suite. Closes [issue #461](https://github.com/igerber/diff-diff/issues/461) (snapshot variant — live-agent regression test deferred to a separate follow-up that depends on causal-llm-eval packaging its harness). Also closes the `__dir__()` contract-test row from `TODO.md` that PR #464 deferred here.
 - **`diff_diff.agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)` — stateless orchestrator for LLM-agent discoverability** (`diff_diff/agent_workflow.py`). Prints (and returns as dict) a copy-pasteable 5-step workflow with the caller's column names templated in: `profile_panel` → `get_llm_guide("autonomous")` → `<Estimator>(...).fit(df, ...)` → `practitioner_next_steps(result)` → `BusinessReport(result).full_report()`. The function calls nothing internally and does not inspect `df`; it is a guided tour, not a router. Surfaces the canonical workflow primitives (`profile_panel`, `get_llm_guide`, `practitioner_next_steps`, `BusinessReport`) that cold-start agent dry-passes at [igerber/causal-llm-eval](https://github.com/igerber/causal-llm-eval) showed agents practically never reach for on their own. Output structure: `{"profile_call", "guide_call", "fit_candidates", "validation_calls", "reporting_call", "script"}`; `fit_candidates` is a flat list of estimator/diagnostic class names referenced in the workflow patterns (each must remain importable on `diff_diff`, locked by `tests/test_agent_workflow.py::test_fit_candidates_all_importable`). Closes [issue #460](https://github.com/igerber/diff-diff/issues/460).
 - **Top-level `__doc__` rewritten to lead with the agent workflow** (`diff_diff/__init__.py`). `help(diff_diff)` now opens with the `agent_workflow(df, ...)` recommendation as the first non-blank paragraph; `get_llm_guide("full")` and `get_llm_guide("practitioner")` pointers preserved for the existing `tests/test_guides.py::test_module_docstring_mentions_helper` guard.

From 5b90559d19429e729d491186a6921371ff4aeafb Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Tue, 19 May 2026 13:33:57 -0400
Subject: [PATCH 4/5] Address CI R1 P3 on PR-C: provenance fail-closed +
 fixture-3 clarification
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI R1 codex review verdict: ✅ no P0/P1. Two P3 findings addressed:

- P3 (Maintainability): the R generator hardcoded
  `pretrends_commit = "122731d082"` into the JSON but only verified
  `packageVersion("pretrends") >= "0.1.0"`. A future rerun could
  silently regenerate goldens from a drifted revision while still
  stamping the artifact with the original commit. Fix: replace the
  loose version gate with an exact `packageVersion == "0.1.0"` check
  plus a `startsWith(packageDescription("pretrends")$RemoteSha,
  PRETRENDS_COMMIT)` provenance assertion that fails closed with a
  reinstall instruction if the installed revision drifts. Verified
  via positive (RemoteSha = `122731d082a5990e274f57fd9af0968e44977e7a`)
  and negative (synthetic `deadbeef` prefix) checks.

- P3 (Documentation/Tests): the `anticipation_shifted` fixture's
  comment described it as validating anticipation-window filtering,
  but the fixture omits the `t=-1` anticipation window and the parity
  assertions consume prefiltered `Sigma_22` / weights directly — the
  CS/SA-level `_extract_pre_period_params` anticipation filter
  (`if t < _pre_cutoff` in `pretrends.py`) is NOT R-parity-locked by
  this fixture. Fix: rename the comment / R `cat()` print / JSON
  meta.description to "K=4 shifted-grid case", and document the
  non-coverage explicitly in the file-header comment with a forward
  reference to the existing PR-B MC-based and full-VCV coverage in
  `TestPretrendsPropositions` / `TestPretrendsCovarianceSource`,
  plus a deferred follow-up for a CS/SA-level
  `anticipation=1 + R-parity` test (would need a synthetic
  `CallawaySantAnnaResults` with a t=-1 entry that gets filtered
  before reaching `_compute_power_nis`). Test class docstring
  tolerance-rationale prose flipped "K=4 anticipation fixture" →
  "K=4 shifted-grid fixture" to match.

The fixture's JSON key (`anticipation_shifted`) is unchanged to
preserve the test-side reference; only the prose contract is
clarified.

All 4 parity tests still pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmarks/R/generate_pretrends_golden.R | 54 ++++++++++++++++++------
 benchmarks/data/r_pretrends_golden.json  |  2 +-
 tests/test_methodology_pretrends.py      |  2 +-
 3 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/benchmarks/R/generate_pretrends_golden.R b/benchmarks/R/generate_pretrends_golden.R
index 249c4ad7..4bb3c2fe 100644
--- a/benchmarks/R/generate_pretrends_golden.R
+++ b/benchmarks/R/generate_pretrends_golden.R
@@ -23,11 +23,11 @@
 #       (R) and scipy.stats.multivariate_normal.cdf (Python) use Genz-Bretz
 #       randomized-lattice rules with different absolute-error defaults
 #       (abseps ~ 1e-3 vs 1e-5). The empirical NIS power gap is bounded by
-#       ~5e-5 on the K=4 anticipation fixture; ~3e-5 on K=3 fixtures; ~2e-5
+#       ~5e-5 on the K=4 shifted-grid fixture; ~3e-5 on K=3 fixtures; ~2e-5
 #       on K=1. atol=1e-4 is the realistic atol without tightening
 #       thresholdTstat.Pretest in R or relaxing the Genz tolerances.
 #   (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular,
-#       anticipation, and K=1 grids, at atol=1e-4. R uniroot defaults to
+#       shifted-grid, and K=1 grids, at atol=1e-4. R uniroot defaults to
 #       tol = .Machine$double.eps^0.25 ~= 1.22e-4 vs Python brentq xtol=2e-12;
 #       the inverse-solver tolerance gap dominates, so 1e-4 is the realistic
 #       atol without tightening either solver.
@@ -42,9 +42,20 @@
 #      never-treated control. Default-case parity baseline.
 #   2. irregular_pre_periods — K=3 with relative_times = [-5, -3, -1].
 #      Exercises the PR-B gamma-unit linear-pattern fix end-to-end.
-#   3. anticipation_shifted — K=4 with anticipation=1 (pre-cutoff at t<-1,
-#      so pre-periods are {-5, -4, -3, -2}). Verifies the pre-period filter
-#      logic in `_extract_pre_period_params`.
+#   3. anticipation_shifted — K=4 shifted-grid case (pre-periods at
+#      {-5, -4, -3, -2}, leaving a notional t=-1 anticipation gap to the
+#      reference period 0). Locks R parity at a larger K than the K=3
+#      fixtures and on a non-adjacent-to-reference grid. NOTE: the R-side
+#      `pretrends` package has no anticipation parameter, so this fixture
+#      does NOT exercise the Python-side `_extract_pre_period_params`
+#      CS/SA anticipation filter (`if t < _pre_cutoff` in pretrends.py)
+#      against R goldens — that filter is exercised by the existing
+#      `TestPretrendsCovarianceSource` suite (PR-B Step 3) and by the
+#      MC-based `TestPretrendsPropositions` tests, but a CS/SA-level
+#      anticipation+R-parity test would need a synthetic
+#      CallawaySantAnnaResults with `anticipation=1` and a t=-1 entry
+#      that gets filtered before reaching `_compute_power_nis`. Deferred
+#      to a follow-up.
 #   4. single_pre_period_closed_form — K=1 with diagonal Sigma = 0.25*I
 #      (Roth Proposition 2 univariate truncated-normal closed form). Locks
 #      the scalar fast-path against R AND against the analytical expression
@@ -58,10 +69,26 @@ suppressPackageStartupMessages({
   library(jsonlite)
 })
 
-stopifnot(packageVersion("pretrends") >= "0.1.0")
-
 PRETRENDS_COMMIT <- "122731d082"
 
+# Provenance fail-closed: refuse to regenerate goldens unless the installed
+# pretrends matches the pinned (version, commit) pair. The JSON stamps the
+# pinned commit string into meta.pretrends_commit, so without this check a
+# future rerun could silently regenerate goldens from a drifted revision
+# while still labeling the artifact with the original commit.
+stopifnot(packageVersion("pretrends") == "0.1.0")
+.installed_sha <- packageDescription("pretrends")$RemoteSha
+if (is.null(.installed_sha) || !startsWith(.installed_sha, PRETRENDS_COMMIT)) {
+  stop(sprintf(
+    "pretrends provenance mismatch: expected RemoteSha to start with '%s' but got '%s'. Reinstall with: remotes::install_github('jonathandroth/pretrends', ref = '%s')",
+    PRETRENDS_COMMIT,
+    if (is.null(.installed_sha)) "<missing — not installed via install_github>" else .installed_sha,
+    PRETRENDS_COMMIT
+  ))
+}
+cat("pretrends provenance verified: version 0.1.0, RemoteSha =",
+    .installed_sha, "\n")
+
 # ---------------------------------------------------------------------------
 # DGP helper: build a synthetic event-study coefficient vector + VCV under a
 # stylized null DGP (beta = 0, Sigma_22 ~ correlated). Mirrors the simulation
@@ -186,10 +213,13 @@ f2 <- build_event_study_fixture(
 )
 fixture_2 <- extract_pretrends(f2, "irregular_pre_periods")
 
-cat("Building fixture 3: anticipation_shifted...\n")
-# K=4 pre-periods with anticipation=1. Real pre-treatment cutoff is t < -1,
-# so the {-5, -4, -3, -2} cells are the genuine pre-periods; t=-1 is the
-# anticipation window. Tests the pre-period filtering logic.
+cat("Building fixture 3: anticipation_shifted (K=4 shifted-grid)...\n")
+# K=4 pre-periods at {-5, -4, -3, -2} — a shifted-grid case (gap at t=-1
+# between the last pre-period and the reference period 0). Locks R parity
+# at a larger K and on a non-adjacent-to-reference grid. Note: does NOT
+# exercise the Python-side `_extract_pre_period_params` CS/SA
+# anticipation-filter path against R goldens — see the file-header
+# comment for the rationale and deferred follow-up.
 f3 <- build_event_study_fixture(
   pre_periods = c(-5L, -4L, -3L, -2L),
   post_periods = c(1L, 2L, 3L),
@@ -230,7 +260,7 @@ out <- list(
       "scipy MVN CDF Genz-Bretz randomized-lattice differences bound the",
       "K=4 NIS power gap at ~5e-5);",
       "(2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular,",
-      "irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol",
+      "irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol",
       "vs Python brentq xtol gap dominates);",
       "(3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
       "in Roth's gamma units exactly, matching R's slope_for_power().",
diff --git a/benchmarks/data/r_pretrends_golden.json b/benchmarks/data/r_pretrends_golden.json
index 4ec10386..1e1eebc4 100644
--- a/benchmarks/data/r_pretrends_golden.json
+++ b/benchmarks/data/r_pretrends_golden.json
@@ -4,7 +4,7 @@
     "pretrends_version": "0.1.0",
     "pretrends_commit": "122731d082",
     "r_version": "R version 4.5.2 (2025-10-31)",
-    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
+    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
   },
   "uniform_3_pre_periods_no_anticipation": {
     "panel": {
diff --git a/tests/test_methodology_pretrends.py b/tests/test_methodology_pretrends.py
index 5995abe8..ae4cb4dd 100644
--- a/tests/test_methodology_pretrends.py
+++ b/tests/test_methodology_pretrends.py
@@ -1120,7 +1120,7 @@ class TestPretrendsParityR:
     ``scipy.stats.multivariate_normal.cdf`` (Python) use Genz-Bretz
     randomized-lattice rules with different absolute-error defaults
     (abseps ≈ 1e-3 vs 1e-5). Combined, the empirical NIS power gap is
-    bounded by ~5e-5 in the K=4 anticipation fixture (smaller for K∈{1,3}).
+    bounded by ~5e-5 in the K=4 shifted-grid fixture (smaller for K∈{1,3}).
     For the inverse path (γ_p), R's ``slope_for_power`` uses
     ``uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)`` versus Python
     ``brentq(xtol=2e-12)``; the inverse-solver tolerance gap dominates the

From 738ae8fcce029f851788eada26b76c97b5e88d37 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Tue, 19 May 2026 13:50:59 -0400
Subject: [PATCH 5/5] Address CI R2 P3 on PR-C: qnorm/norm.ppf typo + TODO
 follow-up row
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI R2 codex review verdict: ✅ no P0/P1 (both prior P3s addressed).
Two new P3 findings (both consequences of R1 fixes):

- P3 (Documentation/Tests): the JSON `meta.description` string says
  Python uses `qnorm(0.975)`, but `qnorm` is the R function name; the
  rest of the PR correctly refers to `scipy.stats.norm.ppf(0.975)`.
  Fix: change the R script's `description = paste(...)` block from
  "qnorm(0.975)" to "scipy.stats.norm.ppf(0.975)" so the committed
  parity artifact's audit trail matches the language used in REGISTRY,
  the file-header comment, and the test class docstring.

- P3 (Tech Debt): the R generator's file-header comment now explicitly
  defers a CS/SA `anticipation=1` R-parity test to a follow-up (R1
  P3 #2 fix), but the PR removed the only PreTrendsPower TODO row and
  did not add a replacement tracker. Fix: add a low-priority TODO.md
  row describing the deferred work: build a synthetic
  `CallawaySantAnnaResults` (or `SunAbrahamResults`) with
  `anticipation=1` and a t=-1 entry that gets filtered by
  `_extract_pre_period_params` before reaching `_compute_power_nis`,
  then assert the resulting γ_p matches R's `slope_for_power()` on
  the K=4 shifted-grid fixture.

All 4 parity tests still pass; the regenerated JSON only differs in
the description string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 TODO.md                                  | 1 +
 benchmarks/R/generate_pretrends_golden.R | 2 +-
 benchmarks/data/r_pretrends_golden.json  | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/TODO.md b/TODO.md
index 1a32b34d..f9ac9824 100644
--- a/TODO.md
+++ b/TODO.md
@@ -95,6 +95,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
 <!-- The PreTrendsPower R parity row (PR-C, 2026-05-19) and the four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV; resolved in PR-B 2026-05-18) are all closed — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
+| PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 
 
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
diff --git a/benchmarks/R/generate_pretrends_golden.R b/benchmarks/R/generate_pretrends_golden.R
index 4bb3c2fe..1881fa80 100644
--- a/benchmarks/R/generate_pretrends_golden.R
+++ b/benchmarks/R/generate_pretrends_golden.R
@@ -256,7 +256,7 @@ out <- list(
       "Three-tier parity contract, both numeric tiers at atol=1e-4:",
       "(1) NIS box probability at fixed gamma values on all 4 fixtures",
       "(atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses",
-      "qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs",
+      "scipy.stats.norm.ppf(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs",
       "scipy MVN CDF Genz-Bretz randomized-lattice differences bound the",
       "K=4 NIS power gap at ~5e-5);",
       "(2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular,",
diff --git a/benchmarks/data/r_pretrends_golden.json b/benchmarks/data/r_pretrends_golden.json
index 1e1eebc4..77e7d333 100644
--- a/benchmarks/data/r_pretrends_golden.json
+++ b/benchmarks/data/r_pretrends_golden.json
@@ -4,7 +4,7 @@
     "pretrends_version": "0.1.0",
     "pretrends_commit": "122731d082",
     "r_version": "R version 4.5.2 (2025-10-31)",
-    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
+    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses scipy.stats.norm.ppf(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
   },
   "uniform_3_pre_periods_no_anticipation": {
     "panel": {