PR-C: PreTrendsPower R pretrends parity goldens at commit 122731d082 #471
Merged
Closes the deferred R-parity row from PR-B (PreTrendsPower implementation audit, Roth 2022). Generates JSON goldens at `benchmarks/data/r_pretrends_golden.json` from the committed R script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2), and activates `TestPretrendsParityR` in `tests/test_methodology_pretrends.py`.

Four fixtures (regular K=3, irregular K=3 `[-5,-3,-1]`, anticipation-shifted K=4, K=1 closed form) × NIS power + γ_p MDV at `atol=1e-4`. K=1 also asserts three-way cross-check against Roth Proposition 2's analytical truncated-normal expression `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7` (Python/closed-form effectively exact; both within `atol=1e-4` of R). End-to-end irregular-grid `fit()` parity test exercises the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API, locking PR-B Step 4's γ-unit linear-weight fix.

Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054`; R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` vs Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` vs `scipy.stats.multivariate_normal.cdf` Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5.

Cross-surface tracker promotion:
- `METHODOLOGY_REVIEW.md` PreTrendsPower row: `**Complete** (R parity pending)` → `**Complete**`; Last Review `2026-05-18` → `2026-05-19`.
- REGISTRY.md Requirements checklist: `[x] R \`pretrends\` parity at commit \`122731d082\` (PR-C, 2026-05-19)`.
- `roth-2022-review.md` "R `pretrends` package version pin (provisional)" Gaps bullet struck.
- `TODO.md` PR-C row deleted.
- CHANGELOG.md `[Unreleased]` Added entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
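The constants quoted in the tolerance rationale, and the K=1 expression `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)`, can be reproduced with a minimal stdlib-only sketch. It is not the repository's `_compute_power_nis`: `statistics.NormalDist` stands in for `scipy.stats.norm`, the function name `k1_expression` is hypothetical, and the inputs `gamma=0.5`, `sigma=1.0` are illustrative values rather than fixture data.

```python
import sys
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def k1_expression(gamma, sigma, z):
    """Evaluate 1 - Phi(z - gamma/sigma) + Phi(-z - gamma/sigma)."""
    shift = gamma / sigma
    return 1.0 - STD_NORMAL.cdf(z - shift) + STD_NORMAL.cdf(-z - shift)


Z_R = 1.96                        # threshold the commit message says R hardcodes
Z_PY = STD_NORMAL.inv_cdf(0.975)  # stdlib stand-in for scipy.stats.norm.ppf(0.975)
DZ = Z_R - Z_PY                   # threshold gap between the two implementations

# R's uniroot default tolerance, .Machine$double.eps^0.25, in Python terms.
UNIROOT_TOL = sys.float_info.epsilon ** 0.25

# Effect of the threshold gap alone on the K=1 expression (illustrative inputs).
THRESHOLD_GAP = abs(k1_expression(0.5, 1.0, Z_R) - k1_expression(0.5, 1.0, Z_PY))

if __name__ == "__main__":
    print(f"z (Python) = {Z_PY!r}")
    print(f"dz         = {DZ:.3e}")
    print(f"uniroot tol = {UNIROOT_TOL:.3e}")
    print(f"K=1 gap from threshold alone = {THRESHOLD_GAP:.3e}")
```

Under these assumptions the threshold gap by itself moves the K=1 expression by far less than `1e-4`, which is consistent with the commit message attributing the γ_p tolerance floor mainly to the inverse solver.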
R1 codex local review verdict: no P0/P1. Three actionable items addressed:
- P2 (Methodology): `docs/methodology/papers/roth-2022-review.md` Gaps bullets at L282 (Joint Wald) and L288 (Heteroskedastic Sigma) contained stale provisional claims: "specific commit not pinned" and "current `diff_diff/pretrends.py` does Wald-test power/MDV only". Both are superseded by PR-B (NIS as default) and PR-C (pinned commit `122731d082`). Updated to cite the pinned commit and the current NIS+Wald state.
- P2 (Documentation/Tests): the R script header and JSON `meta.description` advertised NIS power parity at `atol=1e-5`, but the active tests use `atol=1e-4` (matching empirical Genz-Bretz randomization gap of ~5e-5 on K=4 anticipation). Aligned the script comments + JSON description to `atol=1e-4` and added the Genz-Bretz rationale.
- P3 (Maintainability): the K=1 fixture serialized `pre_periods` and `post_periods` as scalars due to jsonlite `auto_unbox=TRUE` (e.g., `pre_periods: -1`). Wrapped singleton vector fields in `I()` so they round-trip as length-1 arrays uniformly across all 4 fixtures. The `np.atleast_1d` defensive compensation in `_extract_python_params` is harmless and retained as defense-in-depth.

Re-ran `Rscript generate_pretrends_golden.R` to regenerate the JSON (numerics unchanged; only structural changes from `I()` wrapping). All 4 `TestPretrendsParityR` tests + 131 pretrends suite tests pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
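The singleton-unboxing issue in the P3 item can be illustrated with a small sketch of the reader side. It is not the repository's `_extract_python_params`: the helper `as_list` is a hypothetical stdlib stand-in for `np.atleast_1d`, and the `post_periods` value `0` is an assumed placeholder (only `pre_periods: -1` is quoted in the review).

```python
import json


def as_list(value):
    """Return value as a list, wrapping a bare scalar in a length-1 list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


# What auto_unbox=TRUE would emit for a K=1 fixture: bare scalars.
unboxed = json.loads('{"pre_periods": -1, "post_periods": 0}')

# What the I()-wrapped fields round-trip as: length-1 arrays.
boxed = json.loads('{"pre_periods": [-1], "post_periods": [0]}')

# Defensive parsing makes both shapes look the same to downstream code.
normalized_unboxed = {key: as_list(val) for key, val in unboxed.items()}
normalized_boxed = {key: as_list(val) for key, val in boxed.items()}

if __name__ == "__main__":
    print(normalized_unboxed)
    print(normalized_boxed)
```

With the writer fixed via `I()`, the reader-side normalization becomes a no-op on well-formed goldens, which is why it can stay as defense-in-depth without changing results.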
The rebase onto main after release 3.4.0 textually applied the PR-C CHANGELOG hunk under the [3.4.0] section because the surrounding context (`### Added` heading + agent-discoverability bullet) had been copied from the pre-release [Unreleased] into [3.4.0] by the release prep. PR-C is a post-3.4.0 change, so its CHANGELOG entry belongs in [Unreleased]. Moves the single bullet; release [3.4.0] content unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…ation
CI R1 codex review verdict: ✅ no P0/P1. Two P3 findings addressed:
- P3 (Maintainability): the R generator hardcoded
`pretrends_commit = "122731d082"` into the JSON but only verified
`packageVersion("pretrends") >= "0.1.0"`. A future rerun could
silently regenerate goldens from a drifted revision while still
stamping the artifact with the original commit. Fix: replace the
loose version gate with an exact `packageVersion == "0.1.0"` check
plus a `startsWith(packageDescription("pretrends")$RemoteSha,
PRETRENDS_COMMIT)` provenance assertion that fails closed with a
reinstall instruction if the installed revision drifts. Verified
via positive (RemoteSha = `122731d082a5990e274f57fd9af0968e44977e7a`)
and negative (synthetic `deadbeef` prefix) checks.
- P3 (Documentation/Tests): the `anticipation_shifted` fixture's
comment described it as validating anticipation-window filtering,
but the fixture omits the `t=-1` anticipation window and the parity
assertions consume prefiltered `Sigma_22` / weights directly — the
CS/SA-level `_extract_pre_period_params` anticipation filter
(`if t < _pre_cutoff` in `pretrends.py`) is NOT R-parity-locked by
this fixture. Fix: rename the comment / R `cat()` print / JSON
meta.description to "K=4 shifted-grid case", and document the
non-coverage explicitly in the file-header comment with a forward
reference to the existing PR-B MC-based and full-VCV coverage in
`TestPretrendsPropositions` / `TestPretrendsCovarianceSource`,
plus a deferred follow-up for a CS/SA-level
`anticipation=1 + R-parity` test (would need a synthetic
`CallawaySantAnnaResults` with a t=-1 entry that gets filtered
before reaching `_compute_power_nis`). Test class docstring
tolerance-rationale prose flipped "K=4 anticipation fixture" →
"K=4 shifted-grid fixture" to match.
The fixture's JSON key (`anticipation_shifted`) is unchanged to
preserve the test-side reference; only the prose contract is
clarified.
All 4 parity tests still pass; black + ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
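The fail-closed provenance gate described in the first P3 item is implemented in the R generator; the same logic can be sketched in Python for readers who want the shape of the check. Everything here is illustrative: `ProvenanceError` and `assert_provenance` are hypothetical names, and only the pinned commit, the version string, and the full SHA used in the usage example come from the commit message.

```python
PINNED_COMMIT = "122731d082"
PINNED_VERSION = "0.1.0"


class ProvenanceError(RuntimeError):
    """Raised when the installed package does not match the pinned revision."""


def assert_provenance(installed_version, installed_sha,
                      pinned_version=PINNED_VERSION,
                      pinned_commit=PINNED_COMMIT):
    """Fail closed unless both the exact version and the SHA prefix match."""
    if installed_version != pinned_version:
        raise ProvenanceError(
            f"version {installed_version!r} != pinned {pinned_version!r}; "
            "reinstall the pinned revision before regenerating goldens"
        )
    # A missing SHA (e.g. a non-remote install) is treated as drift.
    if not installed_sha or not installed_sha.startswith(pinned_commit):
        raise ProvenanceError(
            f"installed SHA {installed_sha!r} does not start with "
            f"{pinned_commit!r}; reinstall the pinned revision"
        )
    return True


if __name__ == "__main__":
    # Positive check with the full SHA quoted in the commit message.
    print(assert_provenance("0.1.0", "122731d082a5990e274f57fd9af0968e44977e7a"))
```

The design point is that the artifact's stamped commit and the revision actually used to generate it are compared before any goldens are written, so a drifted install stops the run instead of silently producing mislabeled data.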
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
CI R2 codex review verdict: ✅ no P0/P1 (both prior P3s addressed). Two new P3 findings (both consequences of R1 fixes):
- P3 (Documentation/Tests): the JSON `meta.description` string says Python uses `qnorm(0.975)`, but `qnorm` is the R function name; the rest of the PR correctly refers to `scipy.stats.norm.ppf(0.975)`. Fix: change the R script's `description = paste(...)` block from "qnorm(0.975)" to "scipy.stats.norm.ppf(0.975)" so the committed parity artifact's audit trail matches the language used in REGISTRY, the file-header comment, and the test class docstring.
- P3 (Tech Debt): the R generator's file-header comment now explicitly defers a CS/SA `anticipation=1` R-parity test to a follow-up (R1 P3 #2 fix), but the PR removed the only PreTrendsPower TODO row and did not add a replacement tracker. Fix: add a low-priority TODO.md row describing the deferred work: build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 entry that gets filtered by `_extract_pre_period_params` before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture.

All 4 parity tests still pass; the regenerated JSON only differs in the description string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request on May 22, 2026
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Generates JSON goldens at `benchmarks/data/r_pretrends_golden.json` from the committed `benchmarks/R/generate_pretrends_golden.R` against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2), and activates `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` with 4 concrete tests covering 4 fixtures × NIS power + γ_p MDV at `atol=1e-4`.
- `[-5,-3,-1]` fixture: a synthetic `MultiPeriodDiDResults` constructed from the fixture's β̂ + full Σ + `interaction_indices` is passed through `pt.fit()`, locking the full `fit() → _extract_pre_period_params → _get_violation_weights (γ-unit linear path) → _compute_mdv_nis` chain against R's `slope_for_power()`. This is the regression test for PR-B Step 4's γ-unit linear-weight fix at the public-API boundary.
- `single_pre_period_closed_form` fixture (Roth Proposition 2): Python `_compute_power_nis` ≡ analytical truncated-normal `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7` (effectively exact; same scalar path); both within `atol=1e-4` of R. Strongest parity claim in the suite.
- `METHODOLOGY_REVIEW.md` PreTrendsPower row flipped from `**Complete** (R parity pending)` → `**Complete**`. REGISTRY.md Requirements checklist row checked. Roth (2022) paper review's `R pretrends package version pin (provisional)` Gaps bullet struck. `TODO.md` PR-C row deleted. CHANGELOG.md `[Unreleased]` entry added.

Tolerance rationale (atol=1e-4 on both tiers). R hardcodes `thresholdTstat.Pretest = 1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (dz ≈ 3.6e-5); on top of that, `mvtnorm::pmvnorm` (R) and `scipy.stats.multivariate_normal.cdf` (Python) use Genz-Bretz randomized-lattice rules with different absolute-error defaults (`abseps ≈ 1e-3` vs `1e-5`). The empirical NIS-power gap is bounded by ~5e-5 on the K=4 anticipation fixture (smaller for K∈{1,3}). For the inverse path (γ_p), R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates. `atol=1e-4` is the realistic ceiling on both tiers without tightening either solver.

Methodology references (required if estimator / math changes)
- `PreTrendsPower` (Roth 2022 NIS box probability; γ_p MDV).
- `pretrends` package: https://github.com/jonathandroth/pretrends at commit `122731d082` (package version 0.1.0).
- `docs/methodology/papers/roth-2022-review.md`.
- `docs/methodology/REGISTRY.md` `## PreTrendsPower`.
- `pretest_form='nis'`, full-Σ_22 routing on CS/SA event-study adapters, γ-unit linear weights, `power_at(custom)` persistence) are unchanged. The R-vs-Python tolerance differences documented above are not deviations but solver/algorithm-floor consequences (R hardcoded `thresholdTstat`, R `uniroot` default tol, Genz-Bretz randomization).

Validation
- `tests/test_methodology_pretrends.py`: `TestPretrendsParityR` activated with 4 concrete tests + `_extract_python_params` helper (`np.atleast_1d` defensive parsing).
  - `test_nis_power_matches_r_pretrends` (16 comparisons: 4 fixtures × 4 γ values).
  - `test_mdv_gamma_p_matches_r_slope_for_power` (8 comparisons: 4 fixtures × 2 target_power values).
  - `test_irregular_grid_gamma_unit_end_to_end_matches_r` (end-to-end `fit()` through `MultiPeriodDiDResults` with `interaction_indices`).
  - `test_k1_matches_r_and_closed_form` (3-way: Python ≡ closed form at atol=1e-7, both within 1e-4 of R).
- `122731d082`.
- `tests/test_pretrends.py` + `tests/test_pretrends_event_study.py` + `tests/test_methodology_pretrends.py`).
- `/ai-review-local --backend codex` 2 rounds. R1 surfaced 3 P2/P3 items (paper-review stale prose, atol contract mismatch in script header, K=1 schema unboxing); all addressed in commit `a75d4ed8`. R2 verdict: ✅ no findings at any severity.

Security / privacy
Generated with Claude Code