diff --git a/CHANGELOG.md b/CHANGELOG.md
index 278ce96f..8b3f2d5c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **`MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` now supported** (`diff_diff/estimators.py:1657`). Pre-PR the combination raised `NotImplementedError` because the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF for the post-period-average ATT (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`) was not implemented — only the per-coefficient case existed in `_compute_cr2_bm`. New `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop to arbitrary `(k, m)` contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra; `_compute_cr2_bm` is refactored to call it with `contrasts=eye(k)` so the existing per-coefficient parity to clubSandwich's `coef_test$df_Satt` is preserved (refactor regression at atol=1e-10). `MultiPeriodDiD.fit()` extends its existing avg_att DOF block to branch on `effective_cluster_ids`: one-way `_compute_bm_dof_from_contrasts` when None, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. Cluster IDs are per-observation length `n` and are NOT subscripted by the rank-deficient column-drop mask. R parity verified at atol=1e-10 against clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` on the new `mpd_clustered_avg_att_dof` fixture in `benchmarks/data/clubsandwich_cr2_golden.json` (Wald_test's HTZ on a 1-row constraint matrix yields the Satterthwaite t-test DOF). Per-coefficient `period_effects[t].p_value` / `conf_int` and `avg_att` `avg_p_value` / `avg_conf_int` now reflect the correct Satterthwaite DOF rather than the n-k fallback under cluster+hc2_bm. Weighted CR2-BM (`survey_design=` paths) remains a separate gate. New tests: `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF` (4 tests: refactor regression, R-parity, shape validation, cluster-count validation); existing `test_multi_period_cluster_plus_hc2_bm_rejected` flipped to behavioral `test_multi_period_cluster_plus_hc2_bm_produces_finite_inference`.
+- **PreTrendsPower: NIS box probability as the new primary test form (PR-B methodology audit, Roth 2022).** Implements Roth (2022) Section II.A-B no-individually-significant (NIS) box probability `P(β̂_pre ∈ B_NIS(Σ))` as the new default `pretest_form='nis'` on `PreTrendsPower`, `compute_pretrends_power`, and `compute_mdv`. The Wald noncentral-χ² form previously shipped as the implicit default is now opt-in via `pretest_form='wald'` and remains as a paper-supported alternative (Propositions 1+3+4 all apply — the Wald ellipsoid is convex). Computation uses `scipy.stats.multivariate_normal.cdf` with `lower_limit=` for the rectangular box probability on the centered change-of-variable `Y = β̂_pre - δ_pre ~ N(0, Σ_22)`; the MDV is solved via doubling expansion + `optimize.brentq` bisection with a 1000-cap non-convergence fallback returning `np.inf`. New private helpers `_compute_power_nis` and `_compute_mdv_nis`; the existing methods are renamed `_compute_power_wald` and `_compute_mdv_wald` with byte-identical math, and `_compute_power` / `_compute_mdv` become dispatchers on `self.pretest_form`. `power_curve()` and `PreTrendsPowerResults.power_at()` inherit the dispatch (power_at via the new persisted `pretest_form` field on the result). The `summary()` / `to_dict()` / `to_dataframe()` outputs dispatch on `pretest_form` — NIS fits print "NIS box probability: ..." instead of "Non-centrality parameter: ...".
+- **PreTrendsPower: full Σ_22 routing on CS and SA event-study adapters (PR-B methodology audit, Σ_22 fidelity).** The shipped `compute_pretrends_power` adapter previously hard-coded `np.diag(ses**2)` for both `CallawaySantAnnaResults` and `SunAbrahamResults` regardless of whether the analytical event-study VCV was available, dropping the off-diagonal correlations Roth's framework relies on. PR-B routes non-bootstrap CS fits through the full `event_study_vcov` sub-block (already persisted at `staggered_results.py:126-128`) and extends `SunAbrahamResults` to also persist `event_study_vcov` + `event_study_vcov_index` constructed via the W-matrix aggregation `event_study_vcov = W @ vcov_cohort @ W.T` where W is the cohort-aggregation matrix (`|event_times| × n_interactions` sparse matrix with `W[i, j] = cohort_weights[e_i][g]` at column `j = coef_index_map[(g, e_i)]`). The new shared helper `_extract_event_study_vcov_subblock` at module level in `pretrends.py` consumes the full VCV when available with a `.index()` lookup on `event_study_vcov_index`; defensive ValueError on label mismatch. Bootstrap fits and replicate-weight survey fits clear `event_study_vcov` (mirroring the CS bootstrap-clear pattern at `staggered.py:2032-2036`) so they fall through to `diag(ses^2)` and the analytical VCV is never mixed with bootstrap/replicate SE overrides downstream. Diagonal-entry sanity check verifies that `event_study_vcov[i, i] = se(e_i)^2` matches the existing per-event-time SE computation in `_compute_iw_effects` at `atol=1e-10`. **Backwards-compatible field additions**: new `event_study_vcov` + `event_study_vcov_index` fields on `SunAbrahamResults` default to `None`, so existing consumers that don't read them see no change.
+- **`PreTrendsPowerResults` now persists fitted `violation_weights` + `pretest_form` + `nis_box_probability` (PR-B Step 5).** New optional fields on the result dataclass enable `power_at(M)` to work for ALL four violation types (linear / constant / last_period / **custom**) on fresh fits, by reading the stored weights directly instead of reconstructing from `violation_type` alone. The PR-A R18 NotImplementedError silent-failure guard for `violation_type='custom'` is retained ONLY for legacy serialized results (`violation_weights=None`) — fresh fits no longer hit it.
+- **Helper API: `compute_pretrends_power` and `compute_mdv` now accept `violation_weights` and `pretest_form` (PR-B Step 6).** Closes the PR-A R18 helper/class API gap that previously made `violation_type='custom'` unusable from the helper functions. Helpers now forward both new parameters to the underlying `PreTrendsPower` class. Default `pretest_form='nis'` matches the class default. Existing helper call sites in `test_pretrends.py` and `test_pretrends_event_study.py` continue to pass: most assertions are form-invariant, so the default flip required targeted updates to only 3 tests.
+- **NEW `tests/test_methodology_pretrends.py` (PR-B Step 7).** Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through. 8 classes, 30+ tests covering K=1 closed-form (Proposition 2 proof), NIS box probability via MC simulation cross-check, Propositions 1-4 simulation parity, linear-units γ-scale verification on regular / irregular / pandas.Period grids, custom-weight persistence regression, JSON-serializability of `to_dict`, CS/SA full-VCV adapter regression, helper API end-to-end, NIS-vs-Wald differentiation, and skip-gated `TestPretrendsParityR` stubs for PR-C R-package goldens.
+- **`benchmarks/R/generate_pretrends_golden.R` (PR-B Step 12).** R generator script for the goldens deferred to PR-C. The script is committed with a `<PR-C-PIN>` placeholder commit reference; PR-C will pin the audited `pretrends` revision, run the script, commit the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, and activate the parity tests.
 - **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:1476`). Mirrors the DiD-absorb auto-route shipped earlier in this release: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, `MultiPeriodDiD.fit()` promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture (new `tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `mpd_absorbed_fe_did`). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` rejection remains as a follow-up (different fit-path structure — no `fixed_effects=` equivalent inside TWFE). **Behavioral note (full `MultiPeriodDiDResults` surface change under auto-route):** under the auto-route, the entire returned `MultiPeriodDiDResults` reflects the full-dummy fit rather than the within-transformed fit — `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, `result.r_squared` all include the FE-dummy entries / un-demeaned values. `result.period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` and `result.avg_att` / `.avg_se` are invariant to this routing (FWL guarantee). MPD requires a time-invariant ever-treated indicator that lies in the span of the intercept and the post-auto-route unit FE dummies (the exact alias depends on the omitted FE reference category under `pd.get_dummies(drop_first=True)`, not just on "the sum of treated-cohort unit dummies"), so `solve_ols` drops one column from that collinear set under R-style rank-deficiency handling. 
Which specific column is dropped is pivot-order and dummy-coding dependent (in the shipped parity fixture it is a never-treated unit dummy, not the `treated` main effect itself). The per-period interaction coefficients (`treated:period_X`) and `avg_att` are identified and invariant to that choice; parity tests target those rather than the `treated` main effect. **Survey-design scope (replicate weights):** when `survey_design=` uses replicate weights, the auto-route short-circuits the absorb-refit branch at `estimators.py:1693` and routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design — correct because the design does not depend on replicate weights so no per-replicate refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `<time>_<X>` dummies for that entry because the design already absorbs the time dimension via the non-reference period dummies; without the skip, the two blocks would collide on dummy names and the `coefficients` dict would silently collapse duplicates under `var_names`-keyed construction, breaking the coefficients-vs-vcov alignment that downstream consumers rely on. This applies to both the new `absorb=` auto-route and the pre-existing `fixed_effects=[<time_col>]` invocation.
 - **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:382`). Previously raised `NotImplementedError` because the HC2 leverage correction and CR2 Bell-McCaffrey DOF depend on the FULL FE hat matrix, while within-transformation (FWL) preserves coefficients and residuals but not the hat. Lift via internal auto-route: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, the fit promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` at ~1e-10 (verified via new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `absorbed_fe_did`, with the R generator using the singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF). HC1/CR1 paths unchanged. `MultiPeriodDiD(absorb=...)` and `TwoWayFixedEffects` rejections remain as follow-ups (different fit-path structure). **Behavioral note (full `DiDResults` surface change under auto-route):** under the auto-route, the entire returned `DiDResults` reflects the full-dummy fit rather than the within-transformed fit. Specifically, `result.coefficients` and `result.vcov` include the FE-dummy entries (matching the `fixed_effects=` path), `result.residuals` and `result.fitted_values` are on the un-demeaned outcome scale, and `result.r_squared` is computed on the un-demeaned outcome (so it absorbs the FE variance and will typically be higher than the within-R²). `result.att` is invariant to this routing (FWL guarantee). Downstream consumers reading `result.att` are unaffected; consumers reading the broader result surface should expect the full-dummy values. 
**Survey-design scope:** the auto-route changes the FE handling (and removes the prior absorbed-FE rejection), but `survey_design=` continues to drive its own variance path (Taylor-series linearization or replicate-weight variance, per the existing survey contract) rather than the analytical HC2/HC2-BM sandwich. The auto-route is therefore methodologically meaningful for non-survey fits and for the FE-handling side of survey fits; analytical small-sample inference under `vcov_type in {"hc2","hc2_bm"}` is bypassed when a survey design is supplied.
 - **`SpilloverDiD` Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D).** Closes the documented Wave B/C "SEs biased downward by a few percent" caveat. **Documented synthesis** of Butts (2021) Section 3.1 (the IF construction for spillover-aware DiD) + Gardner (2022) Section 4 (the two-stage GMM sandwich) + Conley (1999) (the spatial kernel). No reference software combines all three — `did2s` (Butts & Gardner) implements the Gardner correction without rings or Conley; `conleyreg` and `acreg` implement Conley without the two-stage correction. Wave D is the synthesis. Applies unconditionally under `vcov_type ∈ {"hc1", "conley", "cluster"}` for both `event_study=False` AND `event_study=True`. **Formula** (Butts 2021 §3.1 + Gardner 2022 §4): `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` where `gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)` is the stage-1-projection-of-stage-2 cross-moment; meat = `Psi' K Psi` with `K` dispatched by `vcov_type` (identity for HC1, block-indicator for cluster, spatial kernel for Conley); vcov = `(X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}`. **Finite-sample multipliers:** `n/(n-p)` for HC1; `G/(G-1) * (n-1)/(n-p)` for cluster CR1; no multiplier for Conley (preserves `conleyreg` / Wave B convention). **Public surface:** `vcov_type="classical"` now raises `NotImplementedError` upfront (the Wave D synthesis has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`); REGISTRY's "vcov_type restrictions" block updated accordingly. **Point estimates unchanged** (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk` are byte-identical to Wave B/C); SE values shift upward by 1-few percent depending on first-stage residual variance. 
**Implementation:** new module-level helper `_compute_gmm_corrected_meat` in `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new module-level helper `_build_butts_fe_design_csr` in `diff_diff/spillover.py`; new module-level helper `_compute_conley_meat` in `diff_diff/conley.py` factored out of `_compute_conley_vcov` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`) cases. **No new public API kwarg** — the correction is unconditional. Wave D variance mode dispatch derives from the public contract: `vcov_type="conley"` → `"conley"`; `cluster=<col>` → `"cluster"` (CR1); otherwise `"hc1"`. **Wave B/C SE goldens re-pinned** at `tests/test_spillover.py::TestSpilloverDiDEventStudyBackwardCompat` (constants renamed `_WAVE_B_GOLDEN_*` → `_WAVE_D_GOLDEN_*`; pre-Wave-D references retained as commented baselines for the directional inflation invariant `_WAVE_B_UNCORRECTED_*`). **Tests:** new test classes `TestSpilloverDiDWaveDGmmCorrectedHc1Hand` (hand-derived `Psi` on a 4-unit × 3-period over-identified panel — matches at `atol=1e-12`), `TestSpilloverDiDWaveDGmmCorrectedEventStudy` (vcov shape on event-study path), `TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract` (rank-deficient column propagation), `TestSpilloverDiDWaveDGmmCorrectedValidatorWiring` (Conley validator fires from the new helper), `TestSpilloverDiDWaveDGmmCorrectedFitIdempotence` (clone + repeat-fit bit-identity per `feedback_fit_does_not_mutate_config`), `TestSpilloverDiDWaveDPublicVarianceContract` (end-to-end public `cluster=<col>` CR1 routing, single-cluster rejection, classical NotImplementedError). Closes the Gardner-GMM follow-up row in `TODO.md`.
@@ -20,12 +26,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. 
Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
 
 ### Changed
+- **PreTrendsPower: default `pretest_form` flipped from implicit Wald to explicit `'nis'` (PR-B methodology audit, Roth 2022).** The new default uses the paper-analyzed NIS box probability — the form Roth (2022) actually tabulates in his Section I.C empirical exercise and the form the R `pretrends` package implements. `pretest_form='wald'` preserves the **acceptance-region form** (noncentral-χ² on the quadratic form `δ' Σ_22^{-1} δ`) byte-identically — the methods are renamed `_compute_power_wald` + `_compute_mdv_wald` with unchanged bodies, dispatched on `self.pretest_form`. **Caveat on bit-identity for fitted results**: the linear-weight contract changed independently in PR-B Step 4 (see the next bullet), so a Wald fit on an irregular pre-period grid produces γ-unit MDV via the new `relative_times`-threaded path, NOT the pre-PR-B count-based L2-normalized MDV. Pre-PR-B Wald numerics are bit-identical to post-PR-B Wald output only on the legacy `relative_times=None` callable path (callers that bypass `fit()` and call `_get_violation_weights(n_pre)` directly) and on the regular-grid case where `|t| ∝ [n_pre-1, ..., 0]`. All existing `tests/test_pretrends.py` numerical assertions (101 helper/class references; only 3 tests depended on the exact Wald size-at-null property and were pinned to `pretest_form='wald'`) continue to produce identical numerical output. The `docs/tutorials/07_pretrends_power.ipynb` walkthrough re-render to reflect the default flip is tracked as a follow-up (the existing tutorial does not exercise the irregular-grid regime).
+- **PreTrendsPower: `_get_violation_weights('linear')` now honors actual pre-period relative-time labels and skips L2 normalization → reported MDV is in Roth's γ units (PR-B Step 4).** Pre-PR-B, the linear-violation direction was constructed as `[n_pre-1, ..., 1, 0] / ||·||_2` from `n_pre` count alone — irregular pre-period grids like `{-5, -3, -1}` were treated as if the periods were `{-3, -2, -1}`, and the L2-normalization meant the reported MDV equaled `γ · ||t||_2`, not γ. PR-B threads the actual `relative_times` array from `_extract_pre_period_params` into `_get_violation_weights` and, for `violation_type='linear'` with `relative_times not None`, uses `weights = |t|` directly with NO L2 normalization. Then `δ_pre = M · |t|` reflects Roth's `δ_t = γ · t` convention and the reported MDV equals γ exactly. Verified: regular grid `[-3, -2, -1]` → weights `[3, 2, 1]`; irregular grid `[-5, -3, -1]` → weights `[5, 3, 1]`; backwards-compat callers that bypass `fit()` and pass only `n_pre` retain the legacy normalized `[n_pre-1, ..., 0] / ||·||_2` behavior. The `_extract_pre_period_params` return type widened from a 4-tuple to a 6-tuple `(effects, ses, vcov, n_pre, relative_times, covariance_source)`; the `relative_times` element is populated by all three adapter branches from their respective sorted pre-period lists (MPD via `pandas.Period` / `Timestamp` / `np.datetime64` arithmetic when applicable, falling back to a warn + count-based normalized direction for genuinely non-numeric labels), and the new `covariance_source` element records the actual extraction path for downstream report-layer tier classification.
 - **BaconDecomposition: default `weights` flipped from `"approximate"` to `"exact"` (PR-B methodology audit).** The new default uses Goodman-Bacon (2021) Theorem 1's exact Eqs. 7-9 + 10e-g weights, matching R `bacondecomp::bacon()` at `atol=1e-6` (validated via `tests/test_methodology_bacon.py::TestBaconParityR`; see the new Added entry above for the convention divergence on always-treated cohorts). Hand-calculation + TWFE-vs-weighted-sum identity also hold at `atol=1e-10`. The `weights="approximate"` path remains available as an opt-in fast diagnostic for speed-sensitive loops; its numerical output may differ from R. Three entry points were flipped: `BaconDecomposition(weights="exact")` (`bacon.py:397`), `bacon_decompose(weights="exact")` (`bacon.py:1064`), `TwoWayFixedEffects.decompose(weights="exact")` (`twfe.py:684`). **Behavior change for users not passing explicit `weights=`**: the decomposition weights are now paper-faithful by default. Users who depended on the previous `"approximate"` numerics for diagnostic plots or comparison-type weight shares can preserve the old behavior by passing `weights="approximate"` explicitly. **Survey-design behavior change**: `weights="exact"` (now the default) routes through `_validate_unit_constant_survey`, which rejects survey designs whose weights / strata / PSU / FPC columns vary within a unit across periods (the exact-mode path collapses to per-unit aggregation via `groupby().first()`). The previous `weights="approximate"` default tolerated time-varying within-unit survey weights via observation-level weighted means. Users whose survey-weighted Bacon calls used time-varying within-unit weights must now either (a) collapse their weights to be unit-constant or (b) pass explicit `weights="approximate"` to retain the legacy obs-level path. The production diagnostic surface (`diff_diff/diagnostic_report.py:1740`) was updated to pass explicit `weights="exact"`. 
Existing test assertions in `tests/test_bacon.py` continue to pass with the new default; the `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the Theorem 1 algebraic-identity contract.
 
 - **`ChaisemartinDHaultfoeuille.predict_het` inference: t-distribution df threading (closes TODO pilot-412).** `_compute_heterogeneity_test` now passes `df = n_obs - rank(design)` to `safe_inference` on the non-survey OLS path, matching R `did_multiplegt_dyn(predict_het=...)`'s t-distribution inference (`DIDmultiplegtDYN:::did_multiplegt_main` `t_stat <- qt(0.975, df.residual(model))` site). Pre-PR Python used `df=None` (normal Z critical), producing 0.1-2% rtol gaps on `p_value` and `conf_int` vs R. Parity tolerance tightened on the existing forward-horizon scenarios (`multi_path_reversible_predict_het`, `multi_path_reversible_by_path_predict_het`) from "unpinned" to `INFERENCE_RTOL=1e-4` on `p_value` and `conf_int`; `beta` / `se` / `t_stat` continue at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5`. **Post-drop rank (post-2026-05-16 wrap-up):** the df denominator uses the post-drop numerical rank via `_detect_rank_deficiency`, which `solve_ols` already calls internally. For full-rank designs `rank == n_params` and behavior is bit-identical to the pre-PR `n_obs - n_params` path; for near-rank-deficient designs that `solve_ols` retains rather than NaN-out (e.g., cohort-collinearity at high horizons), the post-drop rank is strictly lower and the post-PR `df` is larger, matching R's `lm()` convention. The Z-vs-t REGISTRY deviation note is replaced with an "R parity (post-2026-05-15 df threading)" positive-claim note.
 
 - **`ChaisemartinDHaultfoeuille.by_path` negative-baseline path regression coverage.** New `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary::test_negative_baseline_path_supported` exercises switchers with `D_{g,1} = -1` and asserts that `path_effects` correctly contains negative-baseline tuple keys (e.g., `(-1, 0, 0, 0)`, `(-1, 1, 1, 1)`). This closes the test-coverage gap from PR #419: the existing `test_negative_integer_D_supported` only covered paths with negative values in non-baseline positions (e.g., `(0, -1, -1, -1)`), which does not trigger R's documented `substr(path, 1, 1)` baseline-extraction bug. Python's tuple-key matching is correct under any baseline value; this test pins the contract. No R-parity fixture is added because R is the buggy side on this regime — the deviation is documented in the REGISTRY non-binary treatment Note.
 
+### Fixed
+- **PreTrendsPower: unit-consistent level-scale ratio for tier classification (PR-B R12 follow-up).** PR-B Step 4 made the linear MDV report Roth's γ units (a slope on relative time), but downstream tier-classification heuristics still divided the raw γ by level-scale quantities — `DiagnosticReport.pretrends_power` computed `mdv_share_of_att = mdv / abs(att)`, `is_informative` checked `mdv < 2 * max(pre_period_ses)`, and `sensitivity_to_honest_did` reported `mdv_in_ses = mdv / max_pre_se`. On irregular pre-period grids this silently mixed slope and level scales and could mis-tier the same fit as `well_powered` / `moderately_powered` / `underpowered`. Fix: new `PreTrendsPowerResults.max_abs_pre_violation` property exposes the level-scale scalar `mdv * max(|violation_weights|)` — the largest level-scale pre-period deviation under the MDV. `is_informative`, `sensitivity_to_honest_did`, `DiagnosticReport._check_pretrends_power`, and `_format_precomputed_pretrends_power` all switched to consume `max_abs_pre_violation` instead of raw `mdv` for level-scale comparisons. `mdv_share_of_att` is now defined as `max_abs_pre_violation / abs(att)`; the schema also surfaces the new `max_abs_pre_violation` field for inspection. Legacy serialized results without `violation_weights` fall back to raw `mdv` (preserves pre-PR-B count-based L2-normalized behavior where `mdv` was already roughly level-scale). On the live `cs_fit` fixture the ratio moves from `0.053` (slope/level mismatch) to `0.211` (level/level) — still `well_powered`, but now interpretable. New regressions: `test_max_abs_pre_violation_uses_weight_scale_on_irregular_grid` (γ * 5 on `[-5, -3, -1]`), `test_is_informative_uses_level_scale_not_raw_gamma` (level-scale check beats raw-γ check on a constructed mismatch), plus the updated BR `test_full_vcov_path_no_downgrade_on_real_cs_fit` which now pins `0.35 < max_abs_pre_violation < 0.40`.
+- **PreTrendsPower: `PreTrendsPowerResults.power_at(M)` for `violation_type='custom'` (PR-B Step 5).** PR-A R18 added a `NotImplementedError` guard to prevent silent equal-weights output when `power_at()` couldn't reconstruct the fitted custom weights. PR-B Step 5 persists the normalized `violation_weights` on `PreTrendsPowerResults` at fit time, so `power_at(M)` now works correctly for all four violation types (linear / constant / last_period / custom) on fresh fits. The PR-A guard is retained only for legacy serialized results lacking the new `violation_weights` field (refit with the current library version to lift it). Verified by the new `test_power_at_works_for_custom_violation_type` regression test and the companion `test_power_at_raises_on_legacy_custom_result_without_weights` (simulates a legacy serialized result by clearing `violation_weights` to `None`).
+- **`DiagnosticReport` / `BusinessReport` covariance-source provenance propagation (PR-B Step 3, R3 follow-up).** Before PR-B, `DiagnosticReport._infer_cov_source` flagged CS / SA fits with populated `event_study_vcov` as `"diag_fallback_available_full_vcov_unused"`, and `_apply_diag_fallback_downgrade` then conservatively downgraded the `well_powered` tier to `moderately_powered`. PR-B Step 3 routes those fits through the full `Σ_22` sub-block at the estimator layer — but the report layer kept the old type-based inference, so correctly-computed full-VCV power results were silently being downgraded. Fix: `PreTrendsPowerResults` gains a new `covariance_source` field that `pretrends.py:_extract_pre_period_params` populates with `"full_pre_period_vcov"` or `"diag_fallback"` based on the actual extraction path taken; `DiagnosticReport._check_pretrends_power` and `_format_precomputed_pretrends_power` prefer that persisted label and fall back to type-based inference only for legacy serialized results that lack the field. Two paths now coexist through the report layer: **new fits** (post-PR-B, `covariance_source` is persisted) consume the persisted label directly — non-bootstrap CS / SA report `"full_pre_period_vcov"` and are NOT downgraded; **legacy serialized results** (pre-PR-B, no `covariance_source` field on the object) fall through to `_infer_cov_source`, which STILL emits the conservative `"diag_fallback_available_full_vcov_unused"` sentinel for CS / SA + populated `event_study_vcov` because without the persisted label we cannot distinguish a pre-PR-B fit (which used `diag(ses^2)`) from a post-PR-B fit, and the PR-A conservative downgrade still applies to preserve backwards-compat. For `MultiPeriodDiDResults` without `interaction_indices`, the legacy fallback reports `"diag_fallback"` (a genuine fallback, no downgrade applies). Effect: non-bootstrap CS / SA pre-trends power blocks on fresh fits now keep their well_powered tier through the report layer (instead of being downgraded by the conservative sentinel); legacy serialized results are unchanged. Verified by `test_precomputed_pretrends_power_persisted_full_vcov_no_downgrade` (new fits), `test_precomputed_pretrends_power_legacy_missing_field_still_downgraded` (legacy fallback contract), `test_precomputed_pretrends_power_consumes_persisted_cov_source` (persisted label takes precedence over legacy inference), and `test_precomputed_pretrends_power_legacy_mpd_without_interaction_indices_reports_diag`.
+
 ## [3.3.3] - 2026-05-15
 
 ### Added
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index ffe4f720..e61ca04e 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -80,7 +80,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
 |------|--------|-------------|--------|-------------|
 | BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** | 2026-05-16 |
 | HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
-| PreTrendsPower | `pretrends.py` | `pretrends` package | **In Progress** | — |
+| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** (R parity pending) | 2026-05-18 |
 | PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** | — |
 | PlaceboTests | `diagnostics.py` | (no canonical reference) | **In Progress** | — |
 
@@ -1047,18 +1047,29 @@ and covariate-adjusted specifications.)
 | Module | `pretrends.py` |
 | Primary Reference | Roth (2022), *Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends*, AER:I 4(3), 305-322 |
 | R Reference | `pretrends` package |
-| Status | **In Progress** |
-| Last Review | — |
+| Status | **Complete** (R parity pending) |
+| Last Review | 2026-05-18 |
 
 **Documentation in place:**
-- REGISTRY.md section: `## PreTrendsPower` (MDV at target power, four violation types — linear/constant/last_period/custom, power curve plotting, HonestDiD integration)
-- Implementation: `tests/test_pretrends.py` (point-estimator, MDV, power curve, sensitivity) plus event-study coverage in `tests/test_pretrends_event_study.py`
-- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17; non-authoritative source audit — registry entry remains authoritative until the follow-up audit PR)
+- REGISTRY.md section: `## PreTrendsPower` — NIS-framed audit per Roth (2022) Section II.A-B with full equation blocks for both NIS and Wald forms; paper-supported alternative + γ-unit MDV + full-Σ_22 routing all locked.
+- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17 via PR #463).
+- Implementation: `tests/test_pretrends.py` (67 tests — point-estimator, MDV, power curve, sensitivity, plus the PR-A R18 silent-failure regression and the PR-B custom-weight persistence regression) + event-study coverage in `tests/test_pretrends_event_study.py` (27 tests).
+- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes, 30-40 tests covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API).
 
-**Outstanding for promotion:**
-- Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
-- R parity fixture against the `pretrends` R package at a **pinned revision** (TODO.md tracks the revision-pin follow-up; until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` are provisional). Covers the four power calculations: linear, constant, last-period, custom. Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
-- Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
+**Verified Components:**
+- [x] NIS box probability implemented via `scipy.stats.multivariate_normal.cdf` (Roth Section II.A-B primary form)
+- [x] Wald noncentral-χ² form retained as paper-supported alternative (Propositions 1+3+4 all apply — convex ellipsoid acceptance region)
+- [x] Both forms produce form-consistent MDV via doubling + brentq bisection with 1000-cap non-convergence fallback
+- [x] Non-bootstrap CS adapter consumes full `event_study_vcov` sub-block (not diag)
+- [x] Non-bootstrap SA adapter consumes full `event_study_vcov` sub-block (W-matrix construction `event_study_vcov = W @ vcov_cohort @ W.T` added to `SunAbrahamResults`)
+- [x] Bootstrap CS/SA and replicate-weight survey paths fall through to `diag(ses^2)` (analytical VCV cleared to prevent mixing with bootstrap/replicate SE overrides)
+- [x] `_get_violation_weights('linear')` honors actual pre-period relative-time labels via `fit()` threading → reported MDV is in Roth's γ units on irregular and anticipation-shifted grids. For `MultiPeriodDiDResults`, supported label types are numeric (`int` / `float` / `np.int64`) and `pandas.Period` / `pandas.Timestamp` / `np.datetime64`; **genuinely non-numeric labels** (string period IDs, unranked categoricals) emit an explicit `UserWarning` and fall through to the legacy count-based normalized direction (MDV is NOT in γ units in that case — re-fit with numeric labels)
+- [x] `PreTrendsPowerResults` persists fitted `violation_weights` + `pretest_form` + `nis_box_probability`; `power_at(M)` works for all four violation types on fresh fits
+- [x] Helper API (`compute_pretrends_power`, `compute_mdv`) accepts `violation_weights` and `pretest_form`; closes the PR-A R18 helper/class API gap
+- [x] Summary, `to_dict`, `to_dataframe` dispatch on `pretest_form` (NIS prints box probability; Wald prints noncentrality)
+
+**Outstanding for promotion to fully Complete:**
+- R parity fixture against the `pretrends` R package at a **pinned revision** (deferred to PR-C). The generator script `benchmarks/R/generate_pretrends_golden.R` is committed in PR-B with a placeholder commit reference; PR-C will install the package, generate the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate `TestPretrendsParityR` (currently skips when goldens missing), and record the audited R-package revision. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` Gaps section remain provisional.
 
 ---
 
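Reviewer note on the "doubling + brentq bisection" item in the Verified Components list above: the sketch below illustrates that style of MDV search under stated assumptions. It uses a toy K=1 pretest (one pre-period coefficient with a known standard error) and plain bisection in place of `scipy.optimize.brentq`, so it stays standard-library only. The names `rejection_power` and `find_mdv`, the starting bracket, and the NaN fallback are hypothetical and are not taken from `pretrends.py`.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def rejection_power(gamma: float, se: float = 1.0, alpha: float = 0.05) -> float:
    """Toy K=1 pretest: beta_hat ~ N(gamma, se^2), rejected when |beta_hat| / se > z_crit."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = gamma / se
    # Probability that the coefficient lands inside the acceptance interval.
    accept = _STD_NORMAL.cdf(z_crit - shift) - _STD_NORMAL.cdf(-z_crit - shift)
    return 1.0 - accept


def find_mdv(
    target_power: float = 0.80,
    se: float = 1.0,
    alpha: float = 0.05,
    max_iter: int = 1000,  # iteration cap (assumed to mirror the "1000-cap")
    tol: float = 1e-10,
) -> float:
    """Smallest gamma whose rejection power reaches target_power (NaN if not bracketed)."""
    lo, hi = 0.0, se
    # Phase 1: double the upper bound until the target power is bracketed.
    for _ in range(max_iter):
        if rejection_power(hi, se, alpha) >= target_power:
            break
        lo, hi = hi, hi * 2.0
    else:
        return float("nan")  # hypothetical non-convergence fallback
    # Phase 2: bisect inside [lo, hi]; power is monotone in gamma >= 0.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if rejection_power(mid, se, alpha) < target_power:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


mdv = find_mdv()
```

With `se=1` and `alpha=0.05` the result sits close to the textbook `z_{0.975} + z_{0.80}` value; the multivariate NIS case replaces the interval probability with a box probability, which this sketch does not attempt.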
diff --git a/TODO.md b/TODO.md
index 9aa28973..83f35525 100644
--- a/TODO.md
+++ b/TODO.md
@@ -94,11 +94,9 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
-| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
-| PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
-| PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
-| PreTrendsPower: `PreTrendsPowerResults.power_at()` does not yet support `violation_type="custom"`. **Silent-failure path was mitigated** in PR-A (2026-05-17, R18 of the codex review): `power_at()` now raises `NotImplementedError` for custom fits rather than returning equal-weights output, locked in by `test_power_at_raises_on_custom_violation_type`. Remaining follow-up: persist the normalized fitted `violation_weights` on `PreTrendsPowerResults` (currently absent at `pretrends.py:77-90`) and re-enable `power_at()` for custom fits, with a parity test comparing `results.power_at(M)` to a fresh `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, ~196-235, ~878-892` | PR-A (Roth paper review, 2026-05-17) | Medium |
-| PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
+| PreTrendsPower R parity goldens (PR-C): pin the R `pretrends` package commit/release, run `benchmarks/R/generate_pretrends_golden.R` (committed in PR-B), commit the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate the `TestPretrendsParityR` class in `tests/test_methodology_pretrends.py` (currently skips when goldens missing), and flip the METHODOLOGY_REVIEW.md `PreTrendsPower` row from `**Complete** (R parity pending)` → `**Complete**`. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` remain provisional. | `benchmarks/R/generate_pretrends_golden.R`, `benchmarks/data/r_pretrends_golden.json` (new), `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-C (PreTrendsPower R parity) | Low |
+<!-- The remaining four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV) were all resolved in PR-B 2026-05-18 — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
+
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
 | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
 | HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gate: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally). The DiD sub-gate and the MultiPeriodDiD sub-gate were both lifted via auto-route to `fixed_effects=` internally (DiD: PR #458, ~1e-10 vs clubSandwich; MPD: this release, ~1e-10 vs sandwich::vcovHC and clubSandwich::vcovCR). TWFE has no equivalent `fixed_effects=` code path (always within-transforms), so the same auto-route surgery is not directly applicable — lifting requires either building the full-dummy design inline or refactoring TWFE to delegate to DiD. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit` | follow-up | Medium |
diff --git a/benchmarks/R/generate_pretrends_golden.R b/benchmarks/R/generate_pretrends_golden.R
new file mode 100644
index 00000000..78af170e
--- /dev/null
+++ b/benchmarks/R/generate_pretrends_golden.R
@@ -0,0 +1,223 @@
+#!/usr/bin/env Rscript
+# Generate R `pretrends` parity goldens for diff-diff PreTrendsPower (PR-C).
+#
+# This script is committed in PR-B (PreTrendsPower implementation audit,
+# Roth 2022); the JSON goldens at ../data/r_pretrends_golden.json are
+# DEFERRED to PR-C. Running this script writes the JSON to that path; PR-C
+# pins the R `pretrends` package commit / release, runs this script, and
+# commits the resulting JSON to land the parity tests.
+#
+# Requires:
+#   - R 4.4+ (tested on 4.5.2)
+#   - install.packages("remotes")
+#   - remotes::install_github("jonathandroth/pretrends", ref = "<PR-C-PIN>")
+#   - install.packages("jsonlite")
+#
+# **R `pretrends` commit pin (TODO — PR-C):** the audited revision MUST be
+# recorded here before parity assertions are committed. As of 2026-05-18
+# (PR-B implementation date) the script targets the default `main` branch
+# at https://github.com/jonathandroth/pretrends with no pin. PR-C will
+# replace `<PR-C-PIN>` with the exact commit hash AND verify the surface
+# claims documented in REGISTRY.md `## PreTrendsPower` and the paper
+# review's "R `pretrends` package version pin (provisional)" Gaps bullet.
+#
+# Output: ../data/r_pretrends_golden.json
+#
+# diff-diff PreTrendsPower with `pretest_form='nis'` (the new default per
+# PR-B Step 2) is expected to match the values in this JSON at atol=1e-6
+# along a three-tier contract:
+#   (1) NIS box probability `P(β̂_pre ∈ B_NIS(Σ))` at fixed M values on
+#       all 3 fixtures;
+#   (2) MDV / gamma_p (slope at target power 0.5 and 0.8) on regular and
+#       irregular pre-period grids;
+#   (3) γ-unit MDV invariance: PR-B's "skip L2 norm for linear with
+#       relative_times" path produces MDV in Roth's γ units exactly,
+#       matching R's `slope_for_power()` which also reports γ.
+#
+# Three fixtures (matched to test_methodology_pretrends.py expectations):
+#   1. uniform_3_pre_periods_no_anticipation — K=3 regular grid (t ∈ {-3, -2, -1}),
+#      never-treated control. Default-case parity baseline.
+#   2. irregular_pre_periods — K=3 with relative_times = [-5, -3, -1].
+#      Exercises the PR-B γ-unit linear-pattern fix.
+#   3. anticipation_shifted — K=4 with anticipation=1 (pre-cutoff at t<-1,
+#      so pre-periods are {-5, -4, -3, -2}). Verifies the pre-period filter
+#      logic in `_extract_pre_period_params`.
+#
+# Run:
+#   cd benchmarks/R && Rscript generate_pretrends_golden.R
+
+suppressPackageStartupMessages({
+  library(pretrends)
+  library(jsonlite)
+})
+
+stopifnot(packageVersion("pretrends") >= "0.1.0")
+
+# ---------------------------------------------------------------------------
+# DGP helper: build a synthetic event-study coefficient vector + VCV under a
+# stylized null DGP (β = 0, Σ_22 ~ correlated). Mirrors the simulation
+# fixtures in test_methodology_pretrends.py.
+# ---------------------------------------------------------------------------
+
+build_event_study_fixture <- function(
+  pre_periods,
+  post_periods,
+  sigma2 = 0.04,
+  rho = 0.3,
+  seed = 42L
+) {
+  # Generate an equicorrelated Σ across all (pre + post) periods.
+  # Realized β̂ drawn from N(0, Σ) — null DGP, no real treatment effect.
+  set.seed(seed)
+  all_periods <- c(pre_periods, post_periods)
+  K_total <- length(all_periods)
+  Sigma <- sigma2 * (rho * matrix(1, K_total, K_total) + (1 - rho) * diag(K_total))
+  beta_hat <- MASS::mvrnorm(1, mu = rep(0, K_total), Sigma = Sigma)
+
+  list(
+    beta_hat = beta_hat,
+    Sigma = Sigma,
+    all_periods = all_periods,
+    pre_periods = pre_periods,
+    post_periods = post_periods
+  )
+}
+
+# ---------------------------------------------------------------------------
+# Extract R pretrends() output into a fixture-shaped list.
+# ---------------------------------------------------------------------------
+
+extract_pretrends <- function(fixture_data, fixture_name) {
+  beta_hat <- fixture_data$beta_hat
+  Sigma <- fixture_data$Sigma
+  pre_periods <- fixture_data$pre_periods
+  post_periods <- fixture_data$post_periods
+  all_periods <- fixture_data$all_periods
+
+  # R `pretrends` expects: betahat (coefficient vector), sigma (VCV matrix),
+  # tVec (relative-time labels including the reference period 0, omitted
+  # from betahat / sigma per convention), referencePeriod = 0, alpha = 0.05.
+
+  # The `slope_for_power` helper returns gamma values at target power.
+  # For the three-tier parity contract, we capture both NIS power at a fixed
+  # slope and the inverse (γ_p MDV) at target power 0.5 and 0.8.
+
+  # NIS power at fixed gamma values (for tier-1 parity):
+  gamma_test_values <- c(0.0, 0.2, 0.5, 1.0)
+  power_values <- sapply(gamma_test_values, function(g) {
+    # Build δ = γ * |t| for pre-periods (Roth's δ_t = γ·t convention,
+    # using |t| since pre-period t < 0).
+    delta_pre <- g * abs(pre_periods)
+    # `pretrends` package: pretrends() with explicit delta vector.
+    # The exact R API: pretrends(betahat, sigma, tVec, referencePeriod,
+    #                            deltahypothesis, ...).
+    # PR-C: replace this stub with the actual R pretrends() call and
+    # extract the rejection probability.
+    NA_real_  # PR-C will populate
+  })
+
+  # γ_p MDV: solve for γ such that NIS rejection probability = target power.
+  # R `slope_for_power(betahat, sigma, tVec, referencePeriod, power)`.
+  gamma_p_values <- sapply(c(0.5, 0.8), function(p) {
+    # PR-C: replace with actual R slope_for_power() call.
+    NA_real_
+  })
+
+  list(
+    panel = list(
+      pre_periods = as.integer(pre_periods),
+      post_periods = as.integer(post_periods),
+      all_periods = as.integer(all_periods),
+      beta_hat = as.numeric(beta_hat),
+      Sigma = Sigma
+    ),
+    r_power_at_gamma = list(
+      gamma_test_values = as.numeric(gamma_test_values),
+      power_values = as.numeric(power_values)
+    ),
+    r_gamma_p = list(
+      target_power = c(0.5, 0.8),
+      gamma_p_values = as.numeric(gamma_p_values)
+    ),
+    fixture_name = fixture_name
+  )
+}
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+cat("Building fixture 1: uniform_3_pre_periods_no_anticipation...\n")
+f1 <- build_event_study_fixture(
+  pre_periods = c(-3L, -2L, -1L),
+  post_periods = c(1L, 2L, 3L),
+  seed = 101L
+)
+fixture_1 <- extract_pretrends(f1, "uniform_3_pre_periods_no_anticipation")
+
+cat("Building fixture 2: irregular_pre_periods...\n")
+# K=3 with t ∈ {-5, -3, -1}. Tests PR-B's γ-unit linear-pattern fix:
+# pre-PR-B Python with normalized count-based weights would silently report
+# MDV in [0.45, 0.30, 0.15] / sqrt(0.3) units, not γ. R `slope_for_power()`
+# always reports γ; Python's PR-B Step 4 makes the two match at atol=1e-6.
+f2 <- build_event_study_fixture(
+  pre_periods = c(-5L, -3L, -1L),
+  post_periods = c(1L, 2L, 3L),
+  seed = 202L
+)
+fixture_2 <- extract_pretrends(f2, "irregular_pre_periods")
+
+cat("Building fixture 3: anticipation_shifted...\n")
+# K=4 pre-periods with anticipation=1. Real pre-treatment cutoff is t < -1,
+# so the {-5, -4, -3, -2} cells are the genuine pre-periods; t=-1 is the
+# anticipation window. Tests the pre-period filtering logic.
+f3 <- build_event_study_fixture(
+  pre_periods = c(-5L, -4L, -3L, -2L),  # genuine pre-periods (cutoff = -1)
+  post_periods = c(1L, 2L, 3L),
+  seed = 303L
+)
+fixture_3 <- extract_pretrends(f3, "anticipation_shifted")
+
+# ---------------------------------------------------------------------------
+# Write JSON
+# ---------------------------------------------------------------------------
+
+out <- list(
+  meta = list(
+    generated_at = format(Sys.Date()),
+    pretrends_version = as.character(packageVersion("pretrends")),
+    pretrends_commit = "<PR-C-PIN>",  # TODO PR-C: replace with actual git SHA
+    r_version = R.version.string,
+    description = paste(
+      "Roth (2022) PreTrendsPower parity goldens for diff-diff",
+      "compute_pretrends_power / PreTrendsPower (PR-C parity target).",
+      "Parity at atol=1e-6 along a three-tier contract:",
+      "(1) NIS box probability at fixed γ values on all 3 fixtures;",
+      "(2) γ_p MDV (slope at target power 0.5 and 0.8) on regular and",
+      "irregular grids;",
+      "(3) γ-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
+      "in Roth's γ units exactly, matching R's slope_for_power().",
+      "See diff-diff/docs/methodology/papers/roth-2022-review.md for",
+      "the full derivation."
+    )
+  ),
+  uniform_3_pre_periods_no_anticipation = fixture_1,
+  irregular_pre_periods = fixture_2,
+  anticipation_shifted = fixture_3
+)
+
+out_path <- "../data/r_pretrends_golden.json"
+write_json(out, out_path, pretty = TRUE, digits = NA, auto_unbox = TRUE)
+cat(sprintf("Wrote %s\n", out_path))
+cat("\n")
+cat("PR-C TODO checklist:\n")
+cat("  [ ] Replace <PR-C-PIN> commit-hash placeholder above with actual\n")
+cat("      git SHA from https://github.com/jonathandroth/pretrends.\n")
+cat("  [ ] Replace the NA_real_ stubs in extract_pretrends() with the\n")
+cat("      actual pretrends::pretrends() / slope_for_power() calls.\n")
+cat("  [ ] Verify the surface claims in REGISTRY.md PreTrendsPower\n")
+cat("      Reference implementations section against the pinned revision.\n")
+cat("  [ ] Activate tests/test_methodology_pretrends.py::TestPretrendsParityR\n")
+cat("      (currently skips via @pytest.mark.skipif when the JSON is missing).\n")
+cat("  [ ] Flip METHODOLOGY_REVIEW.md PreTrendsPower row from\n")
+cat("      **Complete** (R parity pending) → **Complete**.\n")
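Reviewer note on the generator script above: when the Python parity tests consume these fixtures, it may help to rebuild the same inputs independently. The sketch below mirrors the equicorrelated Σ from `build_event_study_fixture` and the `δ = γ·|t|` vector from `extract_pretrends` in standard-library Python. The function names are hypothetical, and the sketch reproduces only the deterministic parts; the `MASS::mvrnorm` draw cannot be replicated from Python by seed alone.

```python
from typing import List, Sequence


def equicorrelated_sigma(
    n_periods: int, sigma2: float = 0.04, rho: float = 0.3
) -> List[List[float]]:
    """sigma2 * (rho * ones + (1 - rho) * I): variance sigma2, covariance sigma2 * rho."""
    return [
        [sigma2 * (1.0 if i == j else rho) for j in range(n_periods)]
        for i in range(n_periods)
    ]


def linear_violation(gamma: float, pre_periods: Sequence[int]) -> List[float]:
    """delta_t = gamma * |t| for each pre-period relative time t < 0."""
    return [gamma * abs(t) for t in pre_periods]


# Fixture 2 from the script: irregular pre-periods plus three post-periods.
pre_periods = [-5, -3, -1]
post_periods = [1, 2, 3]
sigma = equicorrelated_sigma(len(pre_periods) + len(post_periods))
delta_pre = linear_violation(0.2, pre_periods)
```

On the irregular grid the violation at `t = -5` is five times the one at `t = -1`, which is the spacing information a count-based weight vector would lose.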
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
index fb169820..6ceeb4a5 100644
--- a/diff_diff/business_report.py
+++ b/diff_diff/business_report.py
@@ -924,6 +924,13 @@ def _lift_pre_trends(dr: Optional[Dict[str, Any]]) -> Dict[str, Any]:
         "power_reason": pp.get("reason"),
         "power_tier": pp.get("tier"),
         "mdv": pp.get("mdv"),
+        # Level-scale max pre-period violation under the MDV
+        # (PR-B R12: `mdv * max(|violation_weights|)`). Carried alongside
+        # the raw `mdv` so BR schema consumers and the full-report
+        # renderer can show both quantities. Pre-R14 this was silently
+        # dropped at the BR lift boundary so the new renderer line never
+        # fired even though DR emitted the value.
+        "max_abs_pre_violation": pp.get("max_abs_pre_violation"),
         "mdv_share_of_att": pp.get("mdv_share_of_att"),
         # Carry the covariance-source annotation through so BR can hedge the
         # power-tier phrasing when compute_pretrends_power silently used a
@@ -2158,8 +2165,9 @@ def _render_summary(schema: Dict[str, Any]) -> str:
             if tier == "well_powered":
                 sentences.append(
                     f"{subject} are consistent with parallel trends, and "
-                    "the test is well-powered (the minimum-detectable "
-                    "violation is small relative to the estimated effect)."
+                    "the test is well-powered (the max pre-period level "
+                    "deviation at the MDV is small relative to the "
+                    "estimated effect)."
                 )
             elif tier == "moderately_powered":
                 sentences.append(
@@ -2467,11 +2475,18 @@ def _render_full_report(schema: Dict[str, Any]) -> str:
         if tier:
             lines.append(f"- Power tier: `{tier}`")
         mdv = pt.get("mdv")
+        max_abs_pre = pt.get("max_abs_pre_violation")
         ratio = pt.get("mdv_share_of_att")
         if isinstance(mdv, (int, float)):
             lines.append(f"- Minimum detectable violation (MDV): {mdv:.3g}")
+        if isinstance(max_abs_pre, (int, float)):
+            lines.append(f"- Max pre-period level deviation at MDV: {max_abs_pre:.3g}")
         if isinstance(ratio, (int, float)):
-            lines.append(f"- MDV / |ATT|: {ratio:.2g}")
+            # PR-B R12: ratio is now max_abs_pre_violation / |ATT|, the
+            # level-scale comparable to ATT (not raw γ-unit mdv on linear
+            # fits). Label updated to match the numerator definition in
+            # REPORTING.md "Power-aware phrasing" Note.
+            lines.append(f"- Max pre-period level deviation / |ATT|: {ratio:.2g}")
     else:
         lines.append(f"- Pre-trends not computed: {pt.get('reason', 'unavailable')}")
     lines.append("")
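Reviewer note on the renderer change above: the new label depends on the numerator being a level, not a slope. The sketch below works through `mdv * max(|violation_weights|)` and the resulting ratio to `|ATT|` on an irregular grid. It assumes the linear weights are the unnormalized `|t|` values, as the "skip L2 norm" comment in the R generator suggests; the helper names and the numbers are illustrative and are not the library's API.

```python
import math
from typing import Optional, Sequence


def max_abs_pre_violation(mdv: float, violation_weights: Sequence[float]) -> float:
    """Largest pre-period level deviation implied by the MDV."""
    return mdv * max(abs(w) for w in violation_weights)


def level_ratio(
    mdv: float, violation_weights: Sequence[float], att: Optional[float]
) -> Optional[float]:
    """Level-scale deviation over |ATT|; None when the ratio is undefined."""
    if att is None or not math.isfinite(att) or att == 0:
        return None
    level = max_abs_pre_violation(mdv, violation_weights)
    if not math.isfinite(level):
        return None
    return level / abs(att)


# Illustrative numbers: slope MDV of 0.1 per period on t in {-5, -3, -1}.
weights = [5.0, 3.0, 1.0]
level = max_abs_pre_violation(0.1, weights)   # deviation at t = -5
ratio = level_ratio(0.1, weights, att=-2.0)   # compared against |ATT| = 2
raw_ratio = 0.1 / abs(-2.0)                   # what a raw-mdv ratio would give
```

Here the raw slope ratio is five times smaller than the level ratio, so tiering on raw `mdv` would look more favorable than the level comparison supports.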
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
index e68f1c0f..6645bb0d 100644
--- a/diff_diff/diagnostic_report.py
+++ b/diff_diff/diagnostic_report.py
@@ -1425,21 +1425,37 @@ def _check_pretrends_power(self) -> Dict[str, Any]:
                 "reason": f"compute_pretrends_power raised " f"{type(exc).__name__}: {exc}",
             }
 
-        # Build the schema section and compute the MDV/|ATT| ratio for BR.
+        # Build the schema section and compute the level-scale max-pre-
+        # violation / |ATT| ratio for BR tier classification. Post-PR-B
+        # Step 4 the linear `mdv` is in Roth's γ units (a slope on
+        # relative time), so the level-scale comparable quantity is
+        # `max_abs_pre_violation = mdv * max(|violation_weights|)` —
+        # the largest pre-period level deviation under the MDV. Using
+        # raw `mdv` here would mix slope and level scales on irregular
+        # grids and mis-tier well_powered / moderately_powered /
+        # underpowered.
         headline_metric = self._extract_headline_metric()
         att = headline_metric.get("value") if headline_metric else None
         mdv = _to_python_float(getattr(pp, "mdv", None))
+        max_abs_pre_violation = _to_python_float(getattr(pp, "max_abs_pre_violation", mdv))
         ratio: Optional[float] = None
         if (
-            mdv is not None
+            max_abs_pre_violation is not None
             and att is not None
             and np.isfinite(att)
             and abs(att) > 0
-            and np.isfinite(mdv)
+            and np.isfinite(max_abs_pre_violation)
         ):
-            ratio = mdv / abs(att)
-
-        cov_source = self._infer_cov_source(self._results)
+            ratio = max_abs_pre_violation / abs(att)
+
+        # Prefer the provenance label `pretrends.py` records on the result
+        # itself (PR-B: `PreTrendsPowerResults.covariance_source` captures
+        # which extraction path was actually taken — full Σ_22 sub-block
+        # vs diag fallback). Fall back to type-based inference for legacy
+        # serialized results pre-PR-B that lack the field.
+        cov_source = getattr(pp, "covariance_source", "unknown")
+        if cov_source == "unknown":
+            cov_source = self._infer_cov_source(self._results)
         tier = _apply_diag_fallback_downgrade(_power_tier(ratio), cov_source)
         return {
             "status": "ran",
@@ -1448,6 +1464,7 @@ def _check_pretrends_power(self) -> Dict[str, Any]:
             "alpha": _to_python_float(getattr(pp, "alpha", self._alpha)),
             "target_power": _to_python_float(getattr(pp, "target_power", 0.80)),
             "mdv": mdv,
+            "max_abs_pre_violation": max_abs_pre_violation,
             "mdv_share_of_att": ratio,
             # Power is reported at ``violation_magnitude`` — the M that
             # the helper actually evaluated (defaults to the MDV when
@@ -1475,13 +1492,29 @@ def _format_precomputed_pretrends_power(self, obj: Any) -> Dict[str, Any]:
         populates at construction time), falling back to ``self._results``.
         """
         mdv = _to_python_float(getattr(obj, "mdv", None))
+        # PR-B Step 4: use level-scale max_abs_pre_violation rather than
+        # raw γ-unit mdv to tier (see ``_check_pretrends_power`` for the
+        # rationale). Legacy precomputed PreTrendsPowerResults objects
+        # without the property fall back to raw ``mdv``.
+        max_abs_pre_violation = _to_python_float(getattr(obj, "max_abs_pre_violation", mdv))
         hm = self._extract_headline_metric()
         att = hm.get("value") if hm else None
         ratio: Optional[float] = None
-        if mdv is not None and att is not None and np.isfinite(att) and abs(att) > 0:
-            ratio = mdv / abs(att)
+        if (
+            max_abs_pre_violation is not None
+            and att is not None
+            and np.isfinite(att)
+            and abs(att) > 0
+            and np.isfinite(max_abs_pre_violation)
+        ):
+            ratio = max_abs_pre_violation / abs(att)
         source_fit = getattr(obj, "original_results", None) or self._results
-        cov_source = self._infer_cov_source(source_fit)
+        # PR-B: prefer the provenance label `pretrends.py` records on the
+        # precomputed result; fall back to type-based inference only for
+        # legacy serialized results that lack the field.
+        cov_source = getattr(obj, "covariance_source", "unknown")
+        if cov_source == "unknown":
+            cov_source = self._infer_cov_source(source_fit)
         tier = _apply_diag_fallback_downgrade(_power_tier(ratio), cov_source)
         return {
             "status": "ran",
@@ -1490,6 +1523,7 @@ def _format_precomputed_pretrends_power(self, obj: Any) -> Dict[str, Any]:
             "alpha": _to_python_float(getattr(obj, "alpha", self._alpha)),
             "target_power": _to_python_float(getattr(obj, "target_power", 0.80)),
             "mdv": mdv,
+            "max_abs_pre_violation": max_abs_pre_violation,
             "mdv_share_of_att": ratio,
             "violation_magnitude": _to_python_float(getattr(obj, "violation_magnitude", None)),
             "power_at_violation_magnitude": _to_python_float(getattr(obj, "power", None)),
@@ -1504,12 +1538,39 @@ def _infer_cov_source(source_fit: Any) -> str:
         """Classify whether ``compute_pretrends_power`` had access to the
         full pre-period covariance on ``source_fit``.
 
-        CS / SA / ImputationDiD / EfficientDiD / Stacked / etc. currently
-        fall back to ``np.diag(ses**2)`` inside ``pretrends.py``, even when
-        ``event_study_vcov`` is populated on the result; the returned
-        ``PreTrendsPowerResults.vcov`` therefore ignores off-diagonal pre-
-        period correlations. Annotating the source explicitly lets BR
-        downgrade the tier conservatively.
+        Backwards-compatibility helper for legacy ``PreTrendsPowerResults``
+        objects produced before PR-B (which records the actual extraction
+        path on ``PreTrendsPowerResults.covariance_source`` at fit time).
+        New fits read provenance directly off the result object; this
+        fallback is only invoked when that field is missing or set to
+        ``"unknown"`` (legacy-ambiguous).
+
+        Classification rules:
+
+        - ``"full_pre_period_vcov"`` — basic ``DiDResults`` and other
+          non-event-study, non-MPD result types that historically expose
+          the full pre-period covariance. ``MultiPeriodDiDResults`` is
+          handled by an explicit branch below because its
+          ``pretrends.py`` MPD branch only takes the full sub-block path
+          when ``interaction_indices`` is populated, otherwise falling
+          through to ``diag(ses**2)``.
+        - ``"diag_fallback_available_full_vcov_unused"`` — event-study
+          result types with populated ``event_study_vcov``. Under PR-B,
+          new fits route through the full sub-block, but a legacy
+          ``PreTrendsPowerResults`` lacking ``covariance_source`` may
+          have been computed from ``diag(ses**2)`` even though the full
+          matrix was attached on the source fit (PR-A behavior). Without
+          the persisted provenance label we cannot distinguish the two,
+          and the conservative default is to apply the PR-A downgrade.
+          New PR-B fits set ``covariance_source`` directly and bypass
+          this fallback entirely.
+        - ``"diag_fallback"`` — event-study result types with
+          ``event_study_vcov is None`` (bootstrap or replicate-weight
+          CS / SA fits, plus ImputationDiD / Stacked / EfficientDiD /
+          TwoStageDiD / etc. which don't yet expose ``event_study_vcov``);
+          OR ``MultiPeriodDiDResults`` without ``interaction_indices``
+          (genuine diag-only path inside ``pretrends.py:_extract_pre_period_params``,
+          no "available but unused" concern, so no downgrade applies).
         """
         is_event_study_type = type(source_fit).__name__ in {
             "CallawaySantAnnaResults",
@@ -1527,9 +1588,29 @@ def _infer_cov_source(source_fit: Any) -> str:
             and getattr(source_fit, "event_study_vcov_index", None) is not None
         )
         if is_event_study_type and has_full_es_vcov:
+            # Legacy-ambiguous: we don't know whether this serialized
+            # result was computed pre- or post-PR-B; conservatively
+            # downgrade. New PR-B fits will set covariance_source
+            # explicitly on the result and never reach this branch.
             return "diag_fallback_available_full_vcov_unused"
         if is_event_study_type:
             return "diag_fallback"
+        # Non-event-study path. MultiPeriodDiDResults takes the full
+        # ``vcov[ix_]`` sub-block only when ``interaction_indices`` is
+        # populated (pretrends.py MPD branch); otherwise it falls
+        # through to ``diag(ses**2)``. That is a genuine diag-only
+        # fallback (not "available but unused"), so no conservative
+        # downgrade applies. Legacy MPD result
+        # objects without ``interaction_indices`` should be reported as
+        # ``diag_fallback`` rather than overclaiming full-Σ_22.
+        if type(source_fit).__name__ == "MultiPeriodDiDResults":
+            mpd_has_full_vcov = (
+                getattr(source_fit, "vcov", None) is not None
+                and getattr(source_fit, "interaction_indices", None) is not None
+            )
+            return "full_pre_period_vcov" if mpd_has_full_vcov else "diag_fallback"
+        # Other non-event-study types (basic DiDResults, TWFE, etc.)
+        # historically expose the full covariance.
         return "full_pre_period_vcov"
 
     def _check_sensitivity(self) -> Dict[str, Any]:
@@ -2711,6 +2792,16 @@ def _apply_diag_fallback_downgrade(tier: str, cov_source: str) -> str:
     ``summary()`` all read the same adjusted tier. Round-14 CI review
     flagged per-surface divergence; round-20 flagged that the precomputed
     adapter bypassed the downgrade entirely.
+
+    PR-B (Roth 2022 audit) note: new fits set
+    ``PreTrendsPowerResults.covariance_source`` directly at fit time
+    based on the actual extraction path, so the report-layer adapters
+    bypass ``_infer_cov_source`` whenever the persisted field is set.
+    The "available but unused" sentinel is still produced for legacy
+    ``PreTrendsPowerResults`` objects that lack the field — there we
+    cannot distinguish a pre-PR-B fit (which DID drop to diag despite
+    the populated source-fit matrix) from a post-PR-B fit, so the
+    conservative downgrade still applies to legacy-ambiguous results.
     """
     if tier == "well_powered" and cov_source == "diag_fallback_available_full_vcov_unused":
         return "moderately_powered"
@@ -3190,9 +3281,10 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
             if tier == "well_powered":
                 sentences.append(
                     f"{subject} are consistent with parallel trends"
-                    f"{jp_str} and the test is well-powered (MDV is a small "
-                    "share of the estimated effect), so a material pre-trend "
-                    "would likely have been detected."
+                    f"{jp_str} and the test is well-powered (the max pre-period "
+                    "level deviation at the MDV is a small share of the estimated "
+                    "effect), so a material pre-trend would likely have been "
+                    "detected."
                 )
             elif tier == "moderately_powered":
                 sentences.append(
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index a310d621..98c2755f 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -75,7 +75,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [Parallel Trends Testing](https://diff-diff.readthedocs.io/en/stable/api/diagnostics.html): Simple and Wasserstein-robust parallel trends tests, equivalence testing (TOST)
 - [Placebo Tests](https://diff-diff.readthedocs.io/en/stable/api/diagnostics.html): Placebo timing, group, permutation, and leave-one-out diagnostics
 - [Honest DiD](https://diff-diff.readthedocs.io/en/stable/api/honest_did.html): Rambachan & Roth (2023) sensitivity analysis — robust CI under parallel trends violations, breakdown values
-- [Pre-Trends Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/pretrends.html): Roth (2022) minimum detectable violation and pre-trends test power curves
+- [Pre-Trends Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/pretrends.html): Roth (2022) Section II.A-B no-individually-significant (NIS) box-probability pretest power and minimum detectable violation (MDV); `pretest_form='nis'` (default) implements the paper's primary form, while `pretest_form='wald'` is retained as a paper-supported alternative (Propositions 1, 3, and 4 all apply); the linear-violation MDV is in Roth's γ units when relative-time labels are threaded through `fit()`; full Σ_22 routing on non-bootstrap CallawaySantAnna and SunAbraham adapters
 - [Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/power.html): Analytical and simulation-based power analysis — MDE, sample size, power curves for study design
 - Conley spatial HAC SE (`vcov_type="conley"`) on cross-sectional `LinearRegression` / `compute_robust_vcov` PLUS panel `DifferenceInDifferences` / `MultiPeriodDiD` / `TwoWayFixedEffects` (with `conley_lag_cutoff=<int>` for within-unit Bartlett temporal HAC) — Conley (1999) spatial-correlation-aware SEs with haversine/euclidean/callable distance metric and Bartlett/uniform spatial kernel; panel path uses the R `conleyreg`-form block-decomposed sandwich (within-period spatial + within-unit Bartlett serial, same-time excluded); parity vs R `conleyreg` (Düsterhöft 2021) on cross-sectional AND panel `lag_cutoff > 0` fixtures. Combining with explicit `cluster=<col>` applies the combined spatial + cluster product kernel `K_total[i,j] = K_space · 1{c_i = c_j}` (cluster must be constant within each unit across periods on the panel path; validator-enforced). DiD takes `unit=<col>` as a fit-time kwarg when `vcov_type="conley"` (not on `__init__`). Sparse k-d-tree fast path auto-activates for `n > 5_000` with bartlett kernel + haversine/euclidean metric
 
diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
index 8b32c471..fbc68b09 100644
--- a/diff_diff/pretrends.py
+++ b/diff_diff/pretrends.py
@@ -25,6 +25,7 @@
 diff_diff.honest_did - Sensitivity analysis for parallel trends violations
 """
 
+import warnings
 from dataclasses import dataclass, field
 from typing import Any, Dict, List, Literal, Optional, Tuple, Union
 
@@ -34,6 +35,208 @@
 
 from diff_diff.results import MultiPeriodDiDResults
 
+
+def _compute_nis_acceptance_prob(
+    M: float,
+    weights: np.ndarray,
+    vcov: np.ndarray,
+    z_alpha: float,
+) -> float:
+    """
+    Compute the NIS box acceptance probability ``P(β̂_pre ∈ B_NIS(Σ))``.
+
+    Used by both ``PreTrendsPower._compute_power_nis`` and
+    ``PreTrendsPowerResults.power_at()`` to avoid code duplication and
+    centralize the analytical-or-MC fallback path.
+
+    Returns
+    -------
+    accept_prob : float
+        Acceptance probability in [0, 1]. Always finite — falls back to
+        Monte Carlo (N=20000) if the analytical scipy MVN CDF raises OR
+        returns a non-finite value (e.g., on numerically degenerate Σ).
+    """
+    sigma = np.sqrt(np.maximum(np.diag(vcov), 0))
+    delta = M * weights
+    upper = z_alpha * sigma - delta
+    lower = -z_alpha * sigma - delta
+
+    accept_prob: float
+    try:
+        accept_prob = float(
+            stats.multivariate_normal.cdf(  # type: ignore[arg-type]
+                upper,
+                lower_limit=lower,
+                mean=np.zeros(len(weights)),
+                cov=vcov,
+                allow_singular=True,
+            )
+        )
+    except (ValueError, np.linalg.LinAlgError):
+        accept_prob = float("nan")
+
+    # MC fallback on non-finite analytical output. The scipy CDF can return
+    # nan on numerically degenerate Σ even when no exception is raised
+    # (Genz algorithm internal cancellation); detecting nan and falling
+    # back to simulation keeps the downstream MDV solver from silently
+    # propagating nan and returning a wrong-but-finite MDV.
+    if not np.isfinite(accept_prob):
+        rng = np.random.default_rng(0)
+        samples = rng.multivariate_normal(mean=np.zeros(len(weights)), cov=vcov, size=20000)
+        in_box = np.all((samples >= lower[None, :]) & (samples <= upper[None, :]), axis=1)
+        accept_prob = float(in_box.mean())
+
+    return float(np.clip(accept_prob, 0.0, 1.0))
+
+
+def _coerce_relative_times_from_reference(
+    estimated_pre_periods: List[Any],
+    reference_period: Any,
+) -> Optional[np.ndarray]:
+    """
+    Convert ``estimated_pre_periods`` to Roth-style relative-time offsets
+    from a numeric / Period / datetime ``reference_period``.
+
+    Returns ``np.ndarray`` of float relative times when conversion succeeds,
+    or ``None`` when the labels are genuinely non-numeric / unordered
+    (string period IDs, categoricals, etc.). In the ``None`` case, the
+    caller's downstream linear-violation weight construction falls back to
+    the legacy count-based normalized direction — the reported MDV is then
+    NOT in Roth's γ units. We emit a ``UserWarning`` so the user knows
+    the γ-unit contract did not hold and can re-fit with numeric labels.
+
+    Supported regimes:
+
+    - Numeric (``int`` / ``float`` / ``np.int64``): direct ``float()``
+      coercion gives the correct relative offset.
+    - ``pandas.Period`` / ``pandas.Timestamp`` / ``np.datetime64``: period
+      arithmetic returns an offset / ``Timedelta`` that we coerce to a
+      float via ``.n`` (for Period frequencies) or ``.days`` (for
+      Timedelta-like). The result is in units of the reference's
+      frequency for Period, days for Timestamp / datetime64 — the linear
+      γ-units scale is per-unit-of-frequency.
+    - Anything else (string period IDs, categoricals with no ordering,
+      mixed types): returns ``None`` with a warning.
+    """
+    # Path 1: direct float coercion (numeric scalars).
+    try:
+        ref_float = float(reference_period)
+        return np.asarray(
+            [float(p) - ref_float for p in estimated_pre_periods],
+            dtype=float,
+        )
+    except (TypeError, ValueError):
+        pass
+
+    # Path 2: pandas.Period / pandas.Timestamp / datetime64 — try
+    # subtraction-based offset arithmetic.
+    try:
+        diffs = [p - reference_period for p in estimated_pre_periods]
+        floats: List[float] = []
+        for d in diffs:
+            # pandas.tseries.offsets.* or pandas.Period offset — has `.n`.
+            n_attr = getattr(d, "n", None)
+            if n_attr is not None:
+                floats.append(float(n_attr))
+                continue
+            # pandas.Timedelta / datetime.timedelta: whole days via `.days`.
+            days_attr = getattr(d, "days", None)
+            if days_attr is not None:
+                floats.append(float(days_attr))
+                continue
+            # Bare numpy.timedelta64 fallback.
+            try:
+                floats.append(float(d / np.timedelta64(1, "D")))
+                continue
+            except (TypeError, ValueError):
+                raise TypeError(
+                    f"cannot coerce difference {d!r} of type {type(d).__name__} "
+                    "to float days/periods"
+                )
+        return np.asarray(floats, dtype=float)
+    except (TypeError, ValueError):
+        pass
+
+    # Path 3: genuinely non-numeric labels — warn and fall back to legacy.
+    warnings.warn(
+        f"PreTrendsPower: reference_period {reference_period!r} (type "
+        f"{type(reference_period).__name__}) is not numeric or datetime-like, "
+        "so per-period relative times cannot be derived. Linear-violation "
+        "weights will use the legacy count-based [n_pre-1, ..., 0]/||·||_2 "
+        "direction; the reported MDV is NOT in Roth (2022) γ units. Re-fit "
+        "with numeric period labels (int year, pandas.Period, datetime) to "
+        "obtain γ-unit MDV.",
+        UserWarning,
+        stacklevel=3,
+    )
+    return None
+
+
+def _extract_event_study_vcov_subblock(
+    results: Any,
+    pre_periods: List[int],
+    ses: np.ndarray,
+) -> Tuple[np.ndarray, str]:
+    """
+    Extract the pre-period sub-block of ``results.event_study_vcov`` when
+    available; otherwise fall back to ``diag(ses**2)``.
+
+    This is the canonical Σ_22 routing path for ``compute_pretrends_power``
+    when the event-study result type exposes a full event-study covariance
+    matrix (CallawaySantAnnaResults non-bootstrap fits at
+    ``staggered_results.py:126-128`` and SunAbrahamResults non-bootstrap
+    fits via the W-matrix construction added in PR-B Step 3). Bootstrap
+    fits and replicate-weight survey fits clear ``event_study_vcov`` so
+    the analytical VCV is not mixed with bootstrap / replicate SE
+    overrides — those cases naturally fall through to the diag fallback.
+
+    Parameters
+    ----------
+    results : event-study results object
+        Must have ``event_study_vcov`` and ``event_study_vcov_index``
+        attributes (CallawaySantAnnaResults and SunAbrahamResults both
+        expose them; either may be None for the bootstrap / replicate
+        paths).
+    pre_periods : list of int
+        Sorted relative-time labels of the pre-period coefficients to
+        extract.
+    ses : np.ndarray
+        Per-period standard errors (used for the ``diag(ses**2)`` fallback
+        path; must be in the same order as ``pre_periods``).
+
+    Returns
+    -------
+    vcov : np.ndarray
+        The (n_pre, n_pre) covariance sub-block. Full event_study_vcov
+        sub-block when available; diag(ses**2) otherwise.
+    source : str
+        Provenance label for downstream report-layer tier classification:
+        ``"full_pre_period_vcov"`` when the full event-study sub-block
+        was used (no off-diagonal information was discarded), or
+        ``"diag_fallback"`` when ``event_study_vcov`` was missing /
+        cleared (bootstrap / replicate-weight CS or SA paths).
+    """
+    es_vcov = getattr(results, "event_study_vcov", None)
+    es_vcov_index = getattr(results, "event_study_vcov_index", None)
+    if es_vcov is None or es_vcov_index is None:
+        return np.diag(ses**2), "diag_fallback"
+
+    try:
+        indices = [list(es_vcov_index).index(t) for t in pre_periods]
+    except ValueError as e:
+        # event_study_vcov_index out of sync with the filtered pre_periods.
+        # This is a defensive guard — should not happen on the canonical
+        # construction paths, but if it does we fail loud rather than
+        # silently substituting diag.
+        raise ValueError(
+            f"event_study_vcov_index is missing one of the pre-period labels "
+            f"{pre_periods}; cannot extract sub-block. Available index: "
+            f"{list(es_vcov_index)}. Original error: {e}"
+        ) from e
+
+    return np.asarray(es_vcov)[np.ix_(indices, indices)], "full_pre_period_vcov"
+
+
 # =============================================================================
 # Results Classes
 # =============================================================================
@@ -61,17 +264,47 @@ class PreTrendsPowerResults:
     n_pre_periods : int
         Number of pre-treatment periods in the event study.
     test_statistic : float
-        Expected test statistic under the specified violation.
+        Expected test statistic under the specified violation (Wald only;
+        NaN for NIS fits).
     critical_value : float
         Critical value for the pre-trends test.
     noncentrality : float
-        Non-centrality parameter under the alternative hypothesis.
+        Non-centrality parameter under the alternative hypothesis (Wald only;
+        NaN for NIS fits).
     pre_period_effects : np.ndarray
         Estimated pre-period effects from the event study.
     pre_period_ses : np.ndarray
         Standard errors of pre-period effects.
     vcov : np.ndarray
         Variance-covariance matrix of pre-period effects.
+    pretest_form : str
+        Pretest acceptance-region form used: ``'nis'`` (no-individually-
+        significant box probability — Roth 2022 Section II.A-B, default for new
+        fits) or ``'wald'`` (noncentral-chi-squared on the quadratic form
+        ``delta' Sigma_22^{-1} delta`` — paper-supported alternative, retained
+        for backwards compatibility with shipped numerical baselines).
+    nis_box_probability : float
+        Acceptance probability ``P(beta_hat_pre in B_NIS(Sigma))`` under the
+        alternative ``M * weights``. NIS-only; NaN for Wald fits.
+    violation_weights : np.ndarray, optional
+        The violation-direction vector used at fit time. Populated for all
+        violation types on fresh fits. Normalization depends on the type
+        so that ``M`` always matches the documented per-pattern contract:
+
+        - ``linear`` threaded with ``relative_times`` (post PR-B Step 4):
+          ``|t|`` directly, NOT L2-normalized, so ``δ_t = M·|t|`` and the
+          reported MDV equals Roth's γ exactly.
+        - ``linear`` without ``relative_times`` (legacy):
+          ``[n_pre-1, ..., 0]`` L2-normalized.
+        - ``constant`` (post PR-B R13): ``[1, ..., 1]`` directly, NOT
+          L2-normalized, so ``δ_t = M`` is a true per-period level shift.
+        - ``last_period``: ``[0, ..., 0, 1]`` (already unit-norm).
+        - ``custom``: user vector L2-normalized to unit norm.
+
+        Old serialized results may have ``None`` here; ``power_at()``
+        falls back to reconstruction in that case (with the PR-A
+        ``NotImplementedError`` guard retained only for
+        ``violation_type='custom'`` with ``violation_weights=None``).
     """
 
     power: float
@@ -88,6 +321,16 @@ class PreTrendsPowerResults:
     pre_period_ses: np.ndarray = field(repr=False)
     vcov: np.ndarray = field(repr=False)
     original_results: Optional[Any] = field(default=None, repr=False)
+    pretest_form: Literal["nis", "wald"] = "wald"
+    nis_box_probability: float = np.nan
+    violation_weights: Optional[np.ndarray] = field(default=None, repr=False)
+    # Provenance for downstream tier classification. Populated at fit time
+    # from `_extract_pre_period_params`. ``"full_pre_period_vcov"`` when
+    # off-diagonal pre-period covariances were used; ``"diag_fallback"``
+    # when only per-period SEs were available; ``"unknown"`` for legacy
+    # serialized results pre-PR-B (backwards-compat default). See
+    # ``diagnostic_report._infer_cov_source`` for consumer-side use.
+    covariance_source: str = "unknown"
 
     def __repr__(self) -> str:
         return (
@@ -100,13 +343,57 @@ def is_informative(self) -> bool:
         """
         Check if the pre-trends test is informative.
 
-        A pre-trends test is considered informative if the MDV is reasonably
-        small relative to typical effect sizes. This is a heuristic check;
-        see the summary for interpretation guidance.
+        A pre-trends test is considered informative if the maximum
+        level-scale pre-period violation under the MDV is below twice the
+        largest per-period standard error (a heuristic). Post PR-B Step 4
+        the ``linear`` MDV is in Roth's γ units (a slope), so comparing the
+        raw ``mdv`` scalar to the level-scale ``max(pre_period_ses)`` would
+        mix units on irregular pre-period grids. The comparable level-scale
+        scalar is ``mdv * max(|violation_weights|)`` (the largest pre-period
+        deviation under the MDV; see ``max_abs_pre_violation``).
         """
-        # Heuristic: MDV < 2x the max observed pre-period SE
         max_se = np.max(self.pre_period_ses) if len(self.pre_period_ses) > 0 else 1.0
-        return bool(self.mdv < 2 * max_se)
+        return bool(self.max_abs_pre_violation < 2 * max_se)
+
+    @property
+    def max_abs_pre_violation(self) -> float:
+        """
+        Largest level-scale pre-period deviation under the MDV.
+
+        Returns ``mdv * max(|violation_weights|)`` — the maximum
+        absolute pre-period violation ``δ_t`` when the violation
+        magnitude equals the MDV. This is the right level-scale
+        scalar for comparing pre-trends sensitivity against
+        coefficient-scale quantities (post-treatment ATT, per-period
+        SEs, HonestDiD's M bound).
+
+        Why this matters: PR-B Step 4 made the linear ``mdv`` report
+        Roth's γ units (a slope on relative time). On a regular grid
+        ``[-3, -2, -1]`` the max deviation is ``γ * 3``; on an
+        irregular grid ``[-5, -3, -1]`` it is ``γ * 5``. Raw ``mdv``
+        alone cannot be compared to level effects without applying
+        the weight scale.
+
+        For non-linear violation types under the PR-B R13 level-shift
+        convention: ``constant`` weights ``[1, ..., 1]`` (unnormalized)
+        yield ``max_abs_pre_violation = mdv * 1 = mdv``, so raw ``mdv``
+        IS the per-period level shift and the level and γ scales coincide.
+        ``last_period`` weights ``[0, ..., 0, 1]`` yield
+        ``max_abs_pre_violation = mdv`` for the same reason. ``custom``
+        uses the L2-normalized user-supplied weight vector, so
+        ``max_abs_pre_violation`` depends on the user's direction.
+
+        Backwards-compat: legacy serialized results without
+        ``violation_weights`` (pre-PR-B) fall back to the raw ``mdv``
+        (which under the pre-PR-B count-based L2-normalized linear
+        convention already had a roughly level-scale magnitude).
+        """
+        if self.violation_weights is None or len(self.violation_weights) == 0:
+            return float(self.mdv)
+        if not np.isfinite(self.mdv):
+            return float(self.mdv)
+        max_w = float(np.max(np.abs(self.violation_weights)))
+        return float(self.mdv * max_w)
 
     @property
     def power_adequate(self) -> bool:
@@ -132,6 +419,7 @@ def summary(self) -> str:
             f"{'Significance level (alpha):':<35} {self.alpha:.3f}",
             f"{'Target power:':<35} {self.target_power:.1%}",
             f"{'Violation type:':<35} {self.violation_type}",
+            f"{'Pretest form:':<35} {self.pretest_form}",
             "",
             "-" * 70,
             "Power Analysis".center(70),
@@ -140,14 +428,23 @@ def summary(self) -> str:
             f"{'Power to detect this violation:':<35} {self.power:.1%}",
             f"{'Minimum detectable violation:':<35} {self.mdv:.4f}",
             "",
-            f"{'Test statistic (expected):':<35} {self.test_statistic:.4f}",
             f"{'Critical value:':<35} {self.critical_value:.4f}",
-            f"{'Non-centrality parameter:':<35} {self.noncentrality:.4f}",
-            "",
-            "-" * 70,
-            "Interpretation".center(70),
-            "-" * 70,
         ]
+        # Dispatch on pretest_form: NIS reports the MVN box acceptance
+        # probability, Wald reports the noncentral-chi-squared noncentrality.
+        if self.pretest_form == "nis":
+            lines.append(f"{'NIS box probability (accept):':<35} {self.nis_box_probability:.4f}")
+        else:
+            lines.append(f"{'Test statistic (expected):':<35} {self.test_statistic:.4f}")
+            lines.append(f"{'Non-centrality parameter:':<35} {self.noncentrality:.4f}")
+        lines.extend(
+            [
+                "",
+                "-" * 70,
+                "Interpretation".center(70),
+                "-" * 70,
+            ]
+        )
 
         if self.power_adequate:
             lines.append(f"✓ Power ({self.power:.0%}) meets target ({self.target_power:.0%}).")
@@ -173,7 +470,25 @@ def print_summary(self) -> None:
         print(self.summary())
 
     def to_dict(self) -> Dict[str, Any]:
-        """Convert results to dictionary."""
+        """Convert results to JSON-serializable dictionary.
+
+        Includes the post-PR-B provenance fields (``violation_weights``,
+        ``covariance_source``) so callers that round-trip the result
+        through ``to_dict``/``to_dataframe`` (e.g., for serialization
+        or downstream transport) preserve the same information the
+        reporting layer reads off the dataclass directly.
+
+        ``violation_weights`` is emitted as ``list[float]`` (or ``None``)
+        so ``json.dumps(result.to_dict())`` works out of the box. Use
+        ``self.violation_weights`` directly on the dataclass when an
+        ndarray is needed.
+        """
+        weights = self.violation_weights
+        weights_list: Optional[List[float]]
+        if weights is None:
+            weights_list = None
+        else:
+            weights_list = [float(w) for w in np.asarray(weights).ravel()]
         return {
             "power": self.power,
             "mdv": self.mdv,
@@ -185,20 +500,30 @@ def to_dict(self) -> Dict[str, Any]:
             "test_statistic": self.test_statistic,
             "critical_value": self.critical_value,
             "noncentrality": self.noncentrality,
+            "pretest_form": self.pretest_form,
+            "nis_box_probability": self.nis_box_probability,
+            "violation_weights": weights_list,
+            "covariance_source": self.covariance_source,
             "is_informative": self.is_informative,
             "power_adequate": self.power_adequate,
         }
 
     def to_dataframe(self) -> pd.DataFrame:
-        """Convert results to DataFrame."""
+        """Convert results to DataFrame.
+
+        ``violation_weights`` is stored as a Python list in the single
+        row (pandas-friendly); ``covariance_source`` is a plain string.
+        Mirrors ``to_dict``.
+        """
         return pd.DataFrame([self.to_dict()])
 
     def power_at(self, M: float) -> float:
         """
         Compute power to detect a specific violation magnitude.
 
-        This method allows computing power at different M values without
-        re-fitting the model, using the stored variance-covariance matrix.
+        Uses the stored fitted ``violation_weights`` and the stored
+        ``pretest_form`` to dispatch to the NIS or Wald power computation
+        without re-fitting.
 
         Parameters
         ----------
@@ -213,69 +538,78 @@ def power_at(self, M: float) -> float:
         Raises
         ------
         NotImplementedError
-            If the fit was made with ``violation_type="custom"``. The
-            ``PreTrendsPowerResults`` dataclass does not currently persist
-            the fitted ``violation_weights``, so this method cannot
-            reconstruct the custom weights. Refit
-            ``PreTrendsPower(violation_type="custom", violation_weights=...)``
-            with the new ``M`` instead. Tracked in TODO.md as a planned
-            follow-up to persist the fitted weights.
+            If the result was produced by an older library version (before
+            the ``violation_weights`` field was added to ``PreTrendsPowerResults``)
+            AND ``violation_type='custom'``. The reconstruction fallback can
+            handle ``linear``/``constant``/``last_period`` from stored
+            metadata, but custom weights cannot be reconstructed; refit
+            ``PreTrendsPower(violation_type='custom', violation_weights=...)``
+            with the new ``M`` instead.
         """
         from scipy import stats
 
-        if self.violation_type == "custom":
-            raise NotImplementedError(
-                "PreTrendsPowerResults.power_at() does not support "
-                "violation_type='custom': fitted violation_weights are "
-                "not persisted on the result object, so the custom weights "
-                "cannot be reconstructed. Refit "
-                "PreTrendsPower(violation_type='custom', "
-                "violation_weights=...) with the new M instead. "
-                "See TODO.md (PreTrendsPower power_at custom path)."
-            )
-
         n_pre = self.n_pre_periods
 
-        # Reconstruct violation weights based on violation type
-        # Must match PreTrendsPower._get_violation_weights() exactly
-        if self.violation_type == "linear":
-            # Linear trend: weights decrease toward treatment
-            # [n-1, n-2, ..., 1, 0] for n pre-periods
-            weights = np.arange(-n_pre + 1, 1, dtype=float)
-            weights = -weights  # Now [n-1, n-2, ..., 1, 0]
-        elif self.violation_type == "constant":
-            weights = np.ones(n_pre)
-        elif self.violation_type == "last_period":
-            weights = np.zeros(n_pre)
-            weights[-1] = 1.0
+        # Prefer the persisted fitted weights (populated for all violation
+        # types on fresh fits after PR-B). Fall back to reconstruction only
+        # for old serialized results lacking the field.
+        if self.violation_weights is not None:
+            weights = np.asarray(self.violation_weights, dtype=float)
         else:
-            # Fail loud on unknown violation_type values. Mirrors the raise
-            # at the end of _get_violation_weights(); prevents silent
-            # equal-weights output if a future violation_type is added to
-            # fit() but not threaded through power_at().
-            raise ValueError(
-                f"Unknown violation_type: {self.violation_type!r}. "
-                f"Expected one of: 'linear', 'constant', 'last_period', 'custom'."
+            if self.violation_type == "custom":
+                raise NotImplementedError(
+                    "PreTrendsPowerResults.power_at() cannot reconstruct "
+                    "custom violation weights from an older serialized result "
+                    "(violation_weights field is None). Refit "
+                    "PreTrendsPower(violation_type='custom', "
+                    "violation_weights=...) with the new M instead. "
+                    "Fresh fits from the current library version persist "
+                    "violation_weights and do not hit this guard."
+                )
+            # Reconstruction fallback for legacy serialized results.
+            # Matches the pre-PR-B count-based linear behavior (no
+            # relative_times available on an old result). Only used when
+            # violation_weights is None.
+            if self.violation_type == "linear":
+                weights = np.arange(-n_pre + 1, 1, dtype=float)
+                weights = -weights  # [n-1, n-2, ..., 1, 0]
+            elif self.violation_type == "constant":
+                weights = np.ones(n_pre)
+            elif self.violation_type == "last_period":
+                weights = np.zeros(n_pre)
+                weights[-1] = 1.0
+            else:
+                raise ValueError(
+                    f"Unknown violation_type: {self.violation_type!r}. "
+                    f"Expected one of: 'linear', 'constant', 'last_period', 'custom'."
+                )
+            # Normalize to unit L2 norm — matches the legacy normalize-at-end
+            # path in _get_violation_weights for non-relative_times callers.
+            norm = np.linalg.norm(weights)
+            if norm > 0:
+                weights = weights / norm
+
+        # Dispatch on the stored pretest_form. Old serialized results default
+        # to pretest_form='wald' (the dataclass default) which preserves the
+        # previous power_at numerical output for backwards compat.
+        if self.pretest_form == "nis":
+            z_alpha = float(
+                self.critical_value
+                if np.isfinite(self.critical_value)
+                else stats.norm.ppf(1 - self.alpha / 2)
             )
+            # Centralized analytical-or-MC fallback (module-level helper).
+            accept_prob = _compute_nis_acceptance_prob(M, weights, self.vcov, z_alpha)
+            return float(1.0 - accept_prob)
 
-        # Normalize weights to unit L2 norm
-        norm = np.linalg.norm(weights)
-        if norm > 0:
-            weights = weights / norm
-
-        # Compute non-centrality parameter
+        # Wald path (legacy default, also opt-in for new fits with
+        # pretest_form='wald'). Matches the pre-PR-B numerical output.
         try:
             vcov_inv = np.linalg.inv(self.vcov)
         except np.linalg.LinAlgError:
             vcov_inv = np.linalg.pinv(self.vcov)
-
-        # delta = M * weights
-        # nc = delta' * V^{-1} * delta
         noncentrality = M**2 * (weights @ vcov_inv @ weights)
-
-        # Compute power using non-central chi-squared
         power = 1 - stats.ncx2.cdf(self.critical_value, df=n_pre, nc=noncentrality)
-
         return float(power)
 
 
@@ -298,6 +632,11 @@ class PreTrendsPowerCurve:
         Target power level.
     violation_type : str
         Type of violation pattern.
+    pretest_form : str
+        Pretest acceptance-region form (``'nis'`` or ``'wald'``) used to
+        compute the curve. NIS and Wald curves can differ materially under
+        correlated Σ_22; persisting the form prevents callers from
+        misinterpreting a serialized/plotted curve.
     """
 
     M_values: np.ndarray
@@ -306,16 +645,18 @@ class PreTrendsPowerCurve:
     alpha: float
     target_power: float
     violation_type: str
+    pretest_form: Literal["nis", "wald"] = "wald"
 
     def __repr__(self) -> str:
         return f"PreTrendsPowerCurve(n_points={len(self.M_values)}, " f"mdv={self.mdv:.4f})"
 
     def to_dataframe(self) -> pd.DataFrame:
-        """Convert to DataFrame with M and power columns."""
+        """Convert to DataFrame with M, power, and pretest_form columns."""
         return pd.DataFrame(
             {
                 "M": self.M_values,
                 "power": self.powers,
+                "pretest_form": self.pretest_form,
             }
         )
 
@@ -425,6 +766,20 @@ class PreTrendsPower:
     violation_weights : array-like, optional
         Custom weights for violation pattern. Length must equal number of
         pre-periods. Only used when violation_type='custom'.
+    pretest_form : {'nis', 'wald'}, default='nis'
+        Pre-trends test acceptance-region form:
+
+        - ``'nis'``: Roth (2022) no-individually-significant pretest (Section
+          II.A-B). Acceptance region is ``B_NIS(Σ) = { b : |b_t| <= z_{1-α/2}
+          σ_t for all t }``. Power computed via multivariate normal box
+          probability. This is the new default (PR-B 2026-05-17), matching
+          both the paper's primary analysis and the R ``pretrends`` package.
+        - ``'wald'``: Noncentral chi-squared on the quadratic form
+          ``δ' Σ_22^{-1} δ`` (the shipped behavior prior to PR-B 2026-05-17).
+          Retained as a paper-supported alternative under Propositions 1+3+4
+          (the Wald acceptance region is a convex ellipsoid, so all three
+          propositions apply). Use this for backwards compatibility with
+          shipped numerical baselines.
 
     Examples
     --------
@@ -473,6 +828,7 @@ def __init__(
         power: float = 0.80,
         violation_type: Literal["linear", "constant", "last_period", "custom"] = "linear",
         violation_weights: Optional[np.ndarray] = None,
+        pretest_form: Literal["nis", "wald"] = "nis",
     ):
         if not 0 < alpha < 1:
             raise ValueError(f"alpha must be between 0 and 1, got {alpha}")
@@ -485,6 +841,8 @@ def __init__(
             )
         if violation_type == "custom" and violation_weights is None:
             raise ValueError("violation_weights must be provided when violation_type='custom'")
+        if pretest_form not in ("nis", "wald"):
+            raise ValueError(f"pretest_form must be 'nis' or 'wald', got '{pretest_form}'")
 
         self.alpha = alpha
         self.target_power = power
@@ -492,6 +850,7 @@ def __init__(
         self.violation_weights = (
             np.asarray(violation_weights) if violation_weights is not None else None
         )
+        self.pretest_form = pretest_form
 
     def get_params(self) -> Dict[str, Any]:
         """Get parameters for this estimator."""
@@ -500,6 +859,7 @@ def get_params(self) -> Dict[str, Any]:
             "power": self.target_power,
             "violation_type": self.violation_type,
             "violation_weights": self.violation_weights,
+            "pretest_form": self.pretest_form,
         }
 
     def set_params(self, **params) -> "PreTrendsPower":
@@ -513,7 +873,11 @@ def set_params(self, **params) -> "PreTrendsPower":
                 raise ValueError(f"Invalid parameter: {key}")
         return self
 
-    def _get_violation_weights(self, n_pre: int) -> np.ndarray:
+    def _get_violation_weights(
+        self,
+        n_pre: int,
+        relative_times: Optional[np.ndarray] = None,
+    ) -> np.ndarray:
         """
         Get violation weights based on violation type.
 
@@ -521,11 +885,46 @@ def _get_violation_weights(self, n_pre: int) -> np.ndarray:
         ----------
         n_pre : int
             Number of pre-treatment periods.
+        relative_times : np.ndarray, optional
+            Sorted relative-time labels for the pre-period coefficients
+            (e.g., ``[-3, -2, -1]`` for a regular grid, ``[-5, -3, -1]``
+            for an irregular grid, ``[-3, -2]`` for an anticipation-shifted
+            grid with ``anticipation=1``). When provided AND
+            ``violation_type='linear'``, weights are set to ``|t|`` directly
+            with NO L2 normalization, so ``δ_t = M * |t|`` and the reported
+            MDV is in Roth's γ units (δ_t = γ·t convention). When None,
+            falls back to the legacy count-based ``[n_pre-1, ..., 1, 0] /
+            ||·||_2`` direction (preserves the pre-PR-B shipped behavior
+            for callers that bypass ``fit()`` and call this helper
+            directly without relative-time labels).
 
         Returns
         -------
         np.ndarray
-            Violation weights, normalized to have L2 norm of 1.
+            Violation weights, with per-violation-type normalization
+            conventions chosen so the magnitude ``M`` matches what
+            ``REGISTRY.md`` documents for the pattern:
+
+            - ``'linear'`` with ``relative_times``: ``|t|`` directly,
+              NOT L2-normalized (so ``δ_t = M * |t|`` and the reported
+              MDV is in Roth's γ units). PR-B Step 4.
+            - ``'linear'`` without ``relative_times`` (legacy): the
+              count-based ``[n_pre-1, ..., 0]`` direction, L2-normalized
+              to unit norm (preserves pre-PR-B shipped behavior).
+            - ``'constant'``: ``[1, 1, ..., 1]`` directly, NOT
+              normalized — ``δ_t = M`` per period (a true level shift,
+              matching the documented ``δ_t = c`` convention). PR-B R13
+              fix: pre-R13 normalization gave ``δ_t = M/√K``, a silent
+              rescaling that the REGISTRY/API did not document.
+            - ``'last_period'``: ``[0, ..., 0, 1]`` directly. Already
+              unit-norm so the post-normalization output was identical;
+              the unconditional early return locks the level-shift
+              contract.
+            - ``'custom'``: user-supplied ``violation_weights``,
+              L2-normalized to unit norm (M is the magnitude along the
+              user's direction; downstream
+              ``max_abs_pre_violation = M * max(|weights|)`` exposes
+              the level-scale max under the MDV).
         """
         if self.violation_type == "custom":
             assert self.violation_weights is not None
@@ -536,22 +935,58 @@ def _get_violation_weights(self, n_pre: int) -> np.ndarray:
                 )
             weights = self.violation_weights.copy()
         elif self.violation_type == "linear":
-            # Linear trend: weights = [-n+1, -n+2, ..., -1, 0] for periods ending at -1
-            # Normalized so that violation at period -1 = 0 and grows linearly backward
+            if relative_times is not None:
+                # Roth (2022) δ_t = γ · t convention. Use |t| because
+                # pre-period labels are negative; the resulting violation
+                # vector δ_pre = M * |t| satisfies M = γ exactly.
+                # NO L2 normalization — keep the γ-unit scale so the
+                # reported MDV is in Roth's γ units on irregular and
+                # anticipation-shifted grids. Early return; skip the
+                # normalize-at-end block below. See PR-A REGISTRY ##
+                # PreTrendsPower "Note (deviation — linear violation
+                # pattern)" — PR-B Step 4 resolves the deviation when
+                # relative_times is threaded through.
+                if len(relative_times) != n_pre:
+                    raise ValueError(
+                        f"relative_times has length {len(relative_times)}, "
+                        f"but there are {n_pre} pre-periods"
+                    )
+                return np.abs(np.asarray(relative_times)).astype(float)
+            # Backwards-compatible fallback (no relative_times threaded):
+            # legacy count-based [n_pre-1, ..., 1, 0] / ||·||_2 direction.
+            # Used by callers that bypass fit() (e.g., direct
+            # _get_violation_weights() unit tests) or by code paths that
+            # don't have access to the actual pre-period labels.
             weights = np.arange(-n_pre + 1, 1, dtype=float)
-            # Shift so that weights are positive and represent deviation from PT
             weights = -weights  # Now [n-1, n-2, ..., 1, 0]
         elif self.violation_type == "constant":
-            # Same violation in all periods
-            weights = np.ones(n_pre)
+            # δ_t = M for all pre-periods (level shift). Skip L2
+            # normalization so M is exactly the per-period level shift
+            # the REGISTRY documents (`δ_t = c`). Pre-PR-B (and the
+            # pre-R13 PR-B state) divided by sqrt(K), making `δ_t =
+            # M/sqrt(K)` and silently re-scaling reported MDV/power on
+            # constant fits by sqrt(K). PR-B R13 fix: skip the norm
+            # so the public contract matches the docs.
+            return np.ones(n_pre, dtype=float)
         elif self.violation_type == "last_period":
-            # Violation only in last pre-period (period -1)
-            weights = np.zeros(n_pre)
+            # Violation only in last pre-period (period -1). Unnormalized
+            # `[0, ..., 0, 1]` already has L2 norm 1, so this path was
+            # always equivalent to the post-normalization output; keep
+            # the early return for symmetry with constant + linear-with-
+            # relative_times so the level-shift contract is uniform
+            # across all level-pattern violation types.
+            weights = np.zeros(n_pre, dtype=float)
             weights[-1] = 1.0
+            return weights
         else:
             raise ValueError(f"Unknown violation_type: {self.violation_type}")
 
-        # Normalize to unit norm (if not all zeros)
+        # Normalize to unit norm (if not all zeros). The early-return
+        # branches above for linear-with-relative_times, constant, and
+        # last_period intentionally skip this normalization to preserve
+        # the level-shift contract documented in REGISTRY.md
+        # `## PreTrendsPower`. This block only fires for the linear-
+        # legacy-fallback path and `violation_type='custom'`.
         norm = np.linalg.norm(weights)
         if norm > 0:
             weights = weights / norm
@@ -562,7 +997,7 @@ def _extract_pre_period_params(
         self,
         results: Union[MultiPeriodDiDResults, Any],
         pre_periods: Optional[List[int]] = None,
-    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, int]:
+    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, int, Optional[np.ndarray], str]:
         """
         Extract pre-period parameters from results.
 
@@ -583,6 +1018,27 @@ def _extract_pre_period_params(
             Variance-covariance matrix for pre-period effects.
         n_pre : int
             Number of pre-periods.
+        relative_times : np.ndarray or None
+            Pre-period relative-time labels (Roth's δ_t = γ·t convention),
+            or None for callers that bypass the labeled-grid path.
+        covariance_source : str
+            Provenance label describing which covariance path the
+            extraction actually took:
+
+            - ``"full_pre_period_vcov"`` when a full pre-period
+              covariance sub-block was used (MPD with
+              ``interaction_indices``, or CS/SA with populated
+              ``event_study_vcov``).
+            - ``"diag_fallback"`` when only the per-period standard
+              errors were available (bootstrap / replicate-weight CS or
+              SA fits, MPD without ``interaction_indices``).
+
+            ``DiagnosticReport`` consumes this label downstream to
+            decide whether the power-tier should be conservatively
+            downgraded (REPORTING.md "conservative deviation" rule),
+            rather than re-inferring covariance provenance from the
+            result type (which would diverge from the actual extraction
+            path the moment the routing changes — see PR-B Step 3).
         """
         if isinstance(results, MultiPeriodDiDResults):
             # Get pre-period information - use explicit pre_periods if provided
@@ -625,10 +1081,36 @@ def _extract_pre_period_params(
             ):
                 indices = [results.interaction_indices[p] for p in estimated_pre_periods]
                 vcov = results.vcov[np.ix_(indices, indices)]
+                covariance_source = "full_pre_period_vcov"
             else:
                 vcov = np.diag(ses**2)
-
-            return effects, ses, vcov, n_pre
+                covariance_source = "diag_fallback"
+
+            # For MultiPeriodDiDResults, period identifiers are generic
+            # (often calendar years, sometimes pre-shifted relative times).
+            # Roth's δ_t = γ·t convention needs RELATIVE offsets from the
+            # treatment / reference period. Three label-type regimes:
+            #
+            #   1. Numeric (int / float / np.int64) — direct float() coercion
+            #      gives the correct relative offset.
+            #   2. pandas.Period — period arithmetic works on the Period
+            #      object directly (``p - ref`` returns ordinal-difference);
+            #      we cast via the `n` attribute on the resulting offset for
+            #      sub-period frequencies. Datetime-like labels (Timestamp,
+            #      np.datetime64) are caught the same way and converted to
+            #      days via numpy timedelta semantics.
+            #   3. Genuinely non-numeric / unordered labels (string period
+            #      IDs, categoricals without a ranking) — emit an explicit
+            #      UserWarning and fall back to the legacy count-based
+            #      [n_pre-1, ..., 0] / ||·||_2 normalized direction. The
+            #      reported MDV under this fallback is NOT in Roth's γ
+            #      units; users on non-numeric labels who need γ-unit MDV
+            #      should re-fit with numeric period labels.
+            ref = getattr(results, "reference_period", None)
+            relative_times: Optional[np.ndarray] = None
+            if ref is not None:
+                relative_times = _coerce_relative_times_from_reference(estimated_pre_periods, ref)
+            return effects, ses, vcov, n_pre, relative_times, covariance_source
 
         # Try CallawaySantAnnaResults
         try:
@@ -675,9 +1157,17 @@ def _extract_pre_period_params(
 
                 effects = np.array([pre_effects[t]["effect"] for t in pre_periods])
                 ses = np.array([pre_effects[t]["se"] for t in pre_periods])
-                vcov = np.diag(ses**2)
 
-                return effects, ses, vcov, n_pre
+                # Route through full event_study_vcov when available
+                # (non-bootstrap CS fits at staggered_results.py:126-128).
+                # Bootstrap CS fits clear event_study_vcov at
+                # staggered.py:2032-2036, falling through to diag.
+                vcov, covariance_source = _extract_event_study_vcov_subblock(
+                    results, pre_periods, ses
+                )
+
+                relative_times = np.asarray(pre_periods, dtype=float)
+                return effects, ses, vcov, n_pre, relative_times, covariance_source
         except ImportError:
             pass
 
@@ -712,9 +1202,18 @@ def _extract_pre_period_params(
 
                 effects = np.array([pre_effects[t]["effect"] for t in pre_periods])
                 ses = np.array([pre_effects[t]["se"] for t in pre_periods])
-                vcov = np.diag(ses**2)
 
-                return effects, ses, vcov, n_pre
+                # Route through full event_study_vcov when available
+                # (non-bootstrap SA fits — sun_abraham.py builds the matrix
+                # via W @ vcov_cohort @ W.T after _compute_iw_effects).
+                # Bootstrap SA fits and replicate-weight survey fits clear
+                # event_study_vcov, falling through to diag.
+                vcov, covariance_source = _extract_event_study_vcov_subblock(
+                    results, pre_periods, ses
+                )
+
+                relative_times = np.asarray(pre_periods, dtype=float)
+                return effects, ses, vcov, n_pre, relative_times, covariance_source
         except ImportError:
             pass
 
@@ -728,13 +1227,26 @@ def _compute_power(
         M: float,
         weights: np.ndarray,
         vcov: np.ndarray,
+    ) -> Tuple[float, float, float, float]:
+        """Dispatch to the configured pretest form (NIS by default)."""
+        if self.pretest_form == "nis":
+            return self._compute_power_nis(M, weights, vcov)
+        return self._compute_power_wald(M, weights, vcov)
+
+    def _compute_power_wald(
+        self,
+        M: float,
+        weights: np.ndarray,
+        vcov: np.ndarray,
     ) -> Tuple[float, float, float, float]:
         """
-        Compute power to detect violation of magnitude M.
+        Compute power to detect violation of magnitude M under the Wald form.
 
-        The pre-trends test is a Wald test: H0: delta = 0 vs H1: delta != 0
-        Under H1 with violation delta = M * weights, the test statistic follows
-        a non-central chi-squared distribution.
+        Wald pre-trends test: H0: delta = 0 vs H1: delta != 0. Under H1 with
+        violation delta = M * weights, the test statistic ``delta' V^{-1} delta``
+        follows a non-central chi-squared distribution with df=K and
+        noncentrality lambda = M^2 * (w' V^{-1} w). Convex (ellipsoid)
+        acceptance region, so Propositions 1+3+4 of Roth (2022) all apply.
 
         Parameters
         ----------
@@ -785,15 +1297,86 @@ def _compute_power(
 
         return power, noncentrality, test_stat, critical_value
 
+    def _compute_power_nis(
+        self,
+        M: float,
+        weights: np.ndarray,
+        vcov: np.ndarray,
+    ) -> Tuple[float, float, float, float]:
+        """
+        Compute power to detect violation of magnitude M under the NIS form.
+
+        NIS (no-individually-significant) pre-trends test: passes iff every
+        pre-period coefficient lies within its own ``+/- z_{1-alpha/2} * sigma_t``
+        confidence interval. Roth (2022) Section II.A-B; matches the empirical
+        convention used in 12 of 12 surveyed papers (Section I.B).
+
+        Under H1 with violation ``delta_pre = M * weights``, the rejection
+        probability is computed via the centered change-of-variable
+        ``Y = beta_hat_pre - delta_pre ~ N(0, Sigma_22)``:
+
+        .. math::
+            \\text{Power} = 1 - P\\bigl(Y_t \\in [-z\\sigma_t - \\delta_t,
+                                                 z\\sigma_t - \\delta_t]
+                                       \\text{ for all } t\\bigr)
+
+        Implemented via ``scipy.stats.multivariate_normal.cdf`` with
+        rectangular bounds (Genz method; supports K up to ~20 cleanly).
+
+        Parameters
+        ----------
+        M : float
+            Violation magnitude.
+        weights : np.ndarray
+            Violation pattern as returned by ``_get_violation_weights`` (only
+            the legacy linear fallback and custom weights are L2-normalized).
+        vcov : np.ndarray
+            Variance-covariance matrix Sigma_22 of the pre-period coefficients.
+
+        Returns
+        -------
+        power : float
+            Probability the NIS test rejects under the alternative.
+        noncentrality : float
+            ``np.nan``. NIS does not have a noncentrality scalar; the
+            equivalent NIS-specific output is ``nis_box_probability`` (the
+            acceptance probability ``1 - power``) stored on
+            ``PreTrendsPowerResults``.
+        test_stat : float
+            ``np.nan``. NIS rejects via a rectangular acceptance event,
+            not a scalar test statistic.
+        critical_value : float
+            ``z_{1-alpha/2}``, the per-period normal critical value used
+            to define ``B_NIS(Sigma)``.
+        """
+        z_alpha = float(stats.norm.ppf(1 - self.alpha / 2))
+        # Centralized analytical-or-MC fallback (module-level helper);
+        # handles both exception and non-finite-CDF cases.
+        accept_prob = _compute_nis_acceptance_prob(M, weights, vcov, z_alpha)
+        power = float(1.0 - accept_prob)
+        return power, float("nan"), float("nan"), z_alpha
+
     def _compute_mdv(
         self,
         weights: np.ndarray,
         vcov: np.ndarray,
+    ) -> float:
+        """Dispatch to the configured pretest form (NIS by default)."""
+        if self.pretest_form == "nis":
+            return self._compute_mdv_nis(weights, vcov)
+        return self._compute_mdv_wald(weights, vcov)
+
+    def _compute_mdv_wald(
+        self,
+        weights: np.ndarray,
+        vcov: np.ndarray,
     ) -> float:
         """
-        Compute minimum detectable violation.
+        Compute minimum detectable violation under the Wald form.
 
-        Find the smallest M such that power >= target_power.
+        Find the smallest M such that ``_compute_power_wald(M, weights, vcov)
+        >= target_power``. Uses binary search on the noncentrality parameter,
+        then converts back to M via ``nc = M^2 * (w' V^{-1} w)``.
 
         Parameters
         ----------
@@ -805,7 +1388,10 @@ def _compute_mdv(
         Returns
         -------
         mdv : float
-            Minimum detectable violation.
+            Minimum detectable violation in units of M (interpreted relative
+            to the ``weights`` direction; for linear weights threaded with
+            ``relative_times``, the MDV is in Roth's gamma units; see
+            ``_get_violation_weights``).
         """
         n_pre = len(weights)
 
@@ -860,6 +1446,74 @@ def power_minus_target(nc):
 
         return mdv
 
+    def _compute_mdv_nis(
+        self,
+        weights: np.ndarray,
+        vcov: np.ndarray,
+    ) -> float:
+        """
+        Compute minimum detectable violation under the NIS form.
+
+        Solves ``_compute_power_nis(M, weights, vcov) = target_power`` for M
+        via a doubling expansion to bracket the root, then ``brentq``. Returns
+        ``np.inf`` only if power is still below target once ``M_high >= 1000``
+        (the cap mirrors the Wald path's existing 1000-cap fallback).
+
+        Parameters
+        ----------
+        weights : np.ndarray
+            Violation pattern.
+        vcov : np.ndarray
+            Variance-covariance matrix Sigma_22.
+
+        Returns
+        -------
+        mdv : float
+            Minimum detectable violation. For linear weights threaded with
+            ``relative_times``, this is Roth's gamma at the target power.
+        """
+
+        def power_minus_target(M: float) -> float:
+            return self._compute_power_nis(M, weights, vcov)[0] - self.target_power
+
+        # Boundary short-circuit: if the NIS size under the null
+        # (≈ 1 - (1-α)^K under independence) already meets target_power,
+        # the MDV is zero — no violation needed to reject at target rate.
+        # NIS size is generally LARGER than α (chi² size), so this case
+        # is reachable for small target_power (e.g., target=0.10, α=0.05,
+        # K=3 → null size ≈ 0.143 > 0.10).
+        if power_minus_target(0.0) >= 0:
+            return 0.0
+
+        # Doubling expansion to find an upper bound where power >= target.
+        # Cap M_high at 1000 to avoid pathological infinite doubling on
+        # numerically extreme Σ_22, but the cap itself does NOT mean
+        # "unreachable" — explicitly check power at the capped endpoint
+        # before returning inf (codex R2 P0 fix: previously the cap
+        # short-circuited to inf even when power(M_high) >= target,
+        # producing silently wrong MDV=inf for finite-root cases like
+        # vcov=[[50000]] where MDV lies between 512 and 1024).
+        M_high = 1.0
+        while power_minus_target(M_high) < 0 and M_high < 1000:
+            M_high *= 2
+
+        # Defensive: if the doubling exited because M_high reached the cap
+        # (M_high = 1024 >= 1000), power at that final M_high may be above or
+        # below target. Evaluate explicitly at the final M_high to decide.
+        if power_minus_target(M_high) < 0:
+            # Power at the cap still fails to reach target_power.
+            # Genuinely unreachable in the practical range.
+            return np.inf
+
+        # Solve on [0, M_high] with brentq. Both sign-change endpoints verified above.
+        try:
+            mdv = float(optimize.brentq(power_minus_target, 0.0, M_high))
+        except ValueError:
+            # Defensive fallback. Should be unreachable.
+            mdv = float(M_high)
+
+        return mdv
+
     def fit(
         self,
         results: Union[MultiPeriodDiDResults, Any],
@@ -887,22 +1541,38 @@ def fit(
         PreTrendsPowerResults
             Power analysis results including power and MDV.
         """
-        # Extract pre-period parameters
-        effects, ses, vcov, n_pre = self._extract_pre_period_params(results, pre_periods)
-
-        # Get violation weights
-        weights = self._get_violation_weights(n_pre)
-
-        # Compute MDV
+        # Extract pre-period parameters (now includes relative_times for
+        # γ-unit MDV under linear violation_type, plus the covariance-source
+        # provenance label for downstream DiagnosticReport / BusinessReport
+        # tier classification).
+        (
+            effects,
+            ses,
+            vcov,
+            n_pre,
+            relative_times,
+            covariance_source,
+        ) = self._extract_pre_period_params(results, pre_periods)
+
+        # Get violation weights. relative_times threaded through so the
+        # linear-violation path produces γ-unit MDV per Roth's δ_t = γ·t
+        # convention (skip L2 normalization for linear-with-relative_times).
+        weights = self._get_violation_weights(n_pre, relative_times=relative_times)
+
+        # Compute MDV (dispatches on self.pretest_form)
         mdv = self._compute_mdv(weights, vcov)
 
         # Default M: use MDV if not specified
         if M is None:
             M = mdv if np.isfinite(mdv) else np.max(ses)
 
-        # Compute power at specified M
+        # Compute power at specified M (dispatches on self.pretest_form)
         power, noncentrality, test_stat, critical_value = self._compute_power(M, weights, vcov)
 
+        # NIS-specific output: the box acceptance probability. Wald fits leave
+        # this as NaN; the meaningful Wald-specific scalar is `noncentrality`.
+        nis_box_probability = 1.0 - power if self.pretest_form == "nis" else float("nan")
+
         return PreTrendsPowerResults(
             power=power,
             mdv=mdv,
@@ -918,6 +1588,10 @@ def fit(
             pre_period_ses=ses,
             vcov=vcov,
             original_results=results,
+            pretest_form=self.pretest_form,
+            nis_box_probability=nis_box_probability,
+            violation_weights=weights,
+            covariance_source=covariance_source,
         )
 
     def power_at(
@@ -973,9 +1647,13 @@ def power_curve(
         PreTrendsPowerCurve
             Power curve data with plot method.
         """
-        # Extract parameters
-        _, ses, vcov, n_pre = self._extract_pre_period_params(results, pre_periods)
-        weights = self._get_violation_weights(n_pre)
+        # Extract parameters (6-tuple includes relative_times + covariance
+        # source; the source label is currently unused on the curve path but
+        # the unpack must match the helper's signature).
+        _, ses, vcov, n_pre, relative_times, _ = self._extract_pre_period_params(
+            results, pre_periods
+        )
+        weights = self._get_violation_weights(n_pre, relative_times=relative_times)
 
         # Compute MDV
         mdv = self._compute_mdv(weights, vcov)
@@ -998,6 +1676,7 @@ def power_curve(
             alpha=self.alpha,
             target_power=self.target_power,
             violation_type=self.violation_type,
+            pretest_form=self.pretest_form,
         )
 
     def sensitivity_to_honest_did(
@@ -1028,22 +1707,30 @@ def sensitivity_to_honest_did(
         """
         pt_results = self.fit(results, pre_periods=pre_periods)
         mdv = pt_results.mdv
+        # Level-scale scalar for comparison against the level-scale
+        # per-period SEs. PR-B Step 4: raw `mdv` for `linear` violations
+        # is now Roth's γ units (a slope); the level-scale quantity is
+        # `mdv * max(|violation_weights|)`. See PreTrendsPowerResults.
+        max_abs_pre_violation = pt_results.max_abs_pre_violation
 
-        # The MDV represents the size of violation the test could detect
+        # The MDV represents the size of violation the test could detect.
         # In HonestDiD's relative magnitudes framework, M=1 means
-        # post-treatment violations can be as large as the max pre-period violation
-        # The MDV gives us a sense of how large that max violation could be
+        # post-treatment violations can be as large as the max pre-period
+        # violation. ``max_abs_pre_violation`` gives us that level-scale
+        # number directly.
 
         max_pre_se = np.max(pt_results.pre_period_ses)
 
         interpretation = []
         interpretation.append(f"Minimum Detectable Violation (MDV): {mdv:.4f}")
+        interpretation.append(f"Max pre-period level deviation at MDV: {max_abs_pre_violation:.4f}")
         interpretation.append(f"Max pre-period SE: {max_pre_se:.4f}")
 
-        if np.isfinite(mdv):
-            # Ratio of MDV to max SE - gives sense of how many SEs the MDV is
-            mdv_in_ses = mdv / max_pre_se if max_pre_se > 0 else np.inf
-            interpretation.append(f"MDV / max(SE): {mdv_in_ses:.2f}")
+        if np.isfinite(max_abs_pre_violation):
+            # Ratio of max-level-deviation to max SE — how many SEs the
+            # largest pre-period violation under the MDV would be.
+            mdv_in_ses = max_abs_pre_violation / max_pre_se if max_pre_se > 0 else np.inf
+            interpretation.append(f"Max level deviation / max(SE): {mdv_in_ses:.2f}")
 
             if mdv_in_ses < 1:
                 interpretation.append("→ Pre-trends test is fairly sensitive to violations.")
@@ -1062,8 +1749,13 @@ def sensitivity_to_honest_did(
 
         return {
             "mdv": mdv,
+            "max_abs_pre_violation": float(max_abs_pre_violation),
             "max_pre_se": max_pre_se,
-            "mdv_in_ses": mdv / max_pre_se if max_pre_se > 0 and np.isfinite(mdv) else np.inf,
+            "mdv_in_ses": (
+                max_abs_pre_violation / max_pre_se
+                if max_pre_se > 0 and np.isfinite(max_abs_pre_violation)
+                else np.inf
+            ),
             "interpretation": "\n".join(interpretation),
         }
 
@@ -1080,6 +1772,8 @@ def compute_pretrends_power(
     target_power: float = 0.80,
     violation_type: str = "linear",
     pre_periods: Optional[List[int]] = None,
+    violation_weights: Optional[np.ndarray] = None,
+    pretest_form: Literal["nis", "wald"] = "nis",
 ) -> PreTrendsPowerResults:
     """
     Convenience function for pre-trends power analysis.
@@ -1095,21 +1789,21 @@ def compute_pretrends_power(
     target_power : float, default=0.80
         Target power for MDV calculation.
     violation_type : str, default='linear'
-        Type of violation pattern. This convenience helper supports
-        ``linear`` / ``constant`` / ``last_period`` only and does NOT
-        accept ``violation_weights``, so passing
-        ``violation_type='custom'`` will raise ``ValueError`` from the
-        underlying ``PreTrendsPower`` constructor (which requires
-        ``violation_weights`` when ``violation_type='custom'``). To use a
-        custom violation pattern, instantiate ``PreTrendsPower(...,
-        violation_weights=...)`` directly. Note that
-        ``PreTrendsPowerResults.power_at()`` on such a fit raises
-        ``NotImplementedError`` because fitted weights are not yet
-        persisted on the result object; refit with the new ``M`` instead.
-        Both gaps are tracked in TODO.md until the follow-up audit lands.
+        Type of violation pattern: ``linear`` / ``constant`` / ``last_period``
+        / ``custom``. For ``custom``, also pass ``violation_weights``.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.
+    violation_weights : np.ndarray, optional
+        Custom violation pattern weights. Required when
+        ``violation_type='custom'``; ignored for other violation types.
+    pretest_form : {'nis', 'wald'}, default='nis'
+        Pretest acceptance-region form. ``'nis'`` (default) implements the
+        Roth (2022) Section II.A-B no-individually-significant (NIS) box
+        probability via ``scipy.stats.multivariate_normal.cdf``; ``'wald'``
+        is the noncentral-chi-squared form, retained for backwards
+        compatibility with the pre-PR-B shipped numerical output (also a
+        paper-supported alternative under Propositions 1+3+4).
 
     Returns
     -------
@@ -1130,6 +1824,8 @@ def compute_pretrends_power(
         alpha=alpha,
         power=target_power,
         violation_type=violation_type,
+        violation_weights=violation_weights,
+        pretest_form=pretest_form,
     )
     return pt.fit(results, M=M, pre_periods=pre_periods)
 
@@ -1140,6 +1836,8 @@ def compute_mdv(
     target_power: float = 0.80,
     violation_type: str = "linear",
     pre_periods: Optional[List[int]] = None,
+    violation_weights: Optional[np.ndarray] = None,
+    pretest_form: Literal["nis", "wald"] = "nis",
 ) -> float:
     """
     Compute minimum detectable violation.
@@ -1153,21 +1851,17 @@ def compute_mdv(
     target_power : float, default=0.80
         Target power for MDV calculation.
     violation_type : str, default='linear'
-        Type of violation pattern. This convenience helper supports
-        ``linear`` / ``constant`` / ``last_period`` only and does NOT
-        accept ``violation_weights``, so passing
-        ``violation_type='custom'`` will raise ``ValueError`` from the
-        underlying ``PreTrendsPower`` constructor (which requires
-        ``violation_weights`` when ``violation_type='custom'``). To use a
-        custom violation pattern, instantiate ``PreTrendsPower(...,
-        violation_weights=...)`` directly. Note that
-        ``PreTrendsPowerResults.power_at()`` on such a fit raises
-        ``NotImplementedError`` because fitted weights are not yet
-        persisted on the result object; refit with the new ``M`` instead.
-        Both gaps are tracked in TODO.md until the follow-up audit lands.
+        Type of violation pattern: ``linear`` / ``constant`` / ``last_period``
+        / ``custom``. For ``custom``, also pass ``violation_weights``.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.
+    violation_weights : np.ndarray, optional
+        Custom violation pattern weights. Required when
+        ``violation_type='custom'``; ignored for other violation types.
+    pretest_form : {'nis', 'wald'}, default='nis'
+        Pretest acceptance-region form. See ``compute_pretrends_power`` and
+        ``PreTrendsPower`` for the NIS-vs-Wald discussion.
 
     Returns
     -------
@@ -1178,6 +1872,8 @@ def compute_mdv(
         alpha=alpha,
         power=target_power,
         violation_type=violation_type,
+        violation_weights=violation_weights,
+        pretest_form=pretest_form,
     )
     result = pt.fit(results, pre_periods=pre_periods)
     return result.mdv
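Reviewer sketch (not part of the patch): the level-scale conversion used in `sensitivity_to_honest_did` above can be illustrated with plain NumPy. The numbers below (`mdv_gamma`, `ses`, and the irregular grid) are hypothetical, and the snippet only mirrors the arithmetic described in the comments (`mdv * max(|violation_weights|)`, then a ratio to the largest pre-period SE); it does not call the library.

```python
import numpy as np

# Hypothetical irregular pre-period grid (relative times to treatment).
relative_times = np.array([-5, -3, -1])

# Linear violation with relative_times threaded through: weights = |t|,
# with no L2 normalization, so the MDV is a slope (Roth's gamma units).
weights = np.abs(relative_times).astype(float)

# Hypothetical slope-scale MDV and per-period standard errors.
mdv_gamma = 0.04
ses = np.array([0.15, 0.12, 0.10])

# Level-scale quantity: the largest pre-period deviation implied by the MDV.
max_abs_pre_violation = mdv_gamma * np.max(np.abs(weights))

# Ratio to the largest pre-period SE, guarded against a zero SE.
max_pre_se = np.max(ses)
mdv_in_ses = max_abs_pre_violation / max_pre_se if max_pre_se > 0 else np.inf

print(f"Max pre-period level deviation at MDV: {max_abs_pre_violation:.4f}")
print(f"Max level deviation / max(SE): {mdv_in_ses:.2f}")
```

On a grid like this the raw slope (0.04) and the level-scale deviation (0.20) differ by a factor of five, which is why the ratio is taken against the level-scale number rather than the raw `mdv`.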
diff --git a/diff_diff/sun_abraham.py b/diff_diff/sun_abraham.py
index c33569e6..56040429 100644
--- a/diff_diff/sun_abraham.py
+++ b/diff_diff/sun_abraham.py
@@ -91,6 +91,17 @@ class SunAbrahamResults:
     )
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
+    # Full event-study VCV matrix (PR-B 2026-05-17 for PreTrendsPower
+    # canonical Σ_22 fidelity). Built via W @ vcov_cohort @ W.T where W
+    # is the |event_times| × n_interactions cohort-aggregation matrix.
+    # Set to None for bootstrap fits (analytical VCV is invalidated by
+    # bootstrap SE overrides) and for replicate-weight survey fits
+    # (analytical vcov_cohort is overridden by replicate refit variance).
+    # Consumed by ``compute_pretrends_power`` to route SA through the full
+    # pre-period sub-Σ_22 block. Rows/columns follow the order of the
+    # relative-time labels in ``event_study_vcov_index``.
+    event_study_vcov: Optional["np.ndarray"] = field(default=None, repr=False)
+    event_study_vcov_index: Optional[list] = field(default=None, repr=False)
 
     # --- Inference-field aliases (balance/external-adapter compatibility) ---
     @property
@@ -768,6 +779,36 @@ def _refit_sa(w_r):
             survey_df=_sa_survey_df,
         )
 
+        # Build full event-study VCV via W-matrix aggregation (PR-B 2026-05-17).
+        # event_study_effects[e] = Σ_g w_{g,e} * cohort_effects[(g, e)] with
+        # w_{g,e} = cohort_weights[e][g]. The full event-study VCV is
+        #   event_study_vcov = W @ vcov_cohort @ W.T
+        # where W is the |event_times| × n_interactions sparse aggregation matrix
+        # whose row i has nonzero entries only at columns j = coef_index_map[(g, e_i)]
+        # for cohorts g appearing in cohort_weights[e_i]. The diagonal entry
+        # [i, i] of this product reproduces the existing per-event-time SE
+        # computation in _compute_iw_effects (weight_vec @ vcov_subset @ weight_vec);
+        # the off-diagonals give Cov(β̂_{e_i}, β̂_{e_k}) which is what
+        # ``compute_pretrends_power`` needs to consume full Σ_22 instead of
+        # falling back to diag(ses^2).
+        es_vcov_index: Optional[List[int]] = None
+        es_vcov: Optional[np.ndarray] = None
+        if cohort_weights:
+            es_vcov_index = sorted(cohort_weights.keys())
+            n_event_times = len(es_vcov_index)
+            n_interactions = vcov_cohort.shape[0]
+            W_mat = np.zeros((n_event_times, n_interactions))
+            for i, e in enumerate(es_vcov_index):
+                for g, w in cohort_weights[e].items():
+                    # Defensive: only populate when the (g, e) coefficient
+                    # actually exists (cohorts with zero observations at e
+                    # are filtered upstream by _compute_iw_effects but we
+                    # guard explicitly here for clarity).
+                    if (g, e) in coef_index_map:
+                        j = coef_index_map[(g, e)]
+                        W_mat[i, j] = w
+            es_vcov = W_mat @ vcov_cohort @ W_mat.T
+
         # Compute overall ATT (average of post-treatment effects)
         overall_att, overall_se = self._compute_overall_att(
             df,
@@ -904,6 +945,15 @@ def _refit_sa_cohort(w_r):
                 "weight": weight,
             }
 
+        # Clear analytical event_study_vcov when bootstrap or replicate-weight
+        # survey overrides the analytical SEs. Mirrors the CS pattern at
+        # staggered.py:2032-2036 — prevents mixing analytical VCV with
+        # bootstrap/replicate SEs downstream in PreTrendsPower (which would
+        # silently produce mis-scaled MDV/power output).
+        if bootstrap_results is not None or _uses_replicate_sa:
+            es_vcov = None
+            es_vcov_index = None
+
         # Store results
         self.results_ = SunAbrahamResults(
             event_study_effects=event_study_effects,
@@ -924,6 +974,8 @@ def _refit_sa_cohort(w_r):
             bootstrap_results=bootstrap_results,
             cohort_effects=cohort_effects_storage,
             survey_metadata=survey_metadata,
+            event_study_vcov=es_vcov,
+            event_study_vcov_index=es_vcov_index,
         )
 
         self.is_fitted_ = True
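Reviewer sketch (not part of the patch): the `W @ vcov_cohort @ W.T` aggregation added above can be checked on a toy example. The cohort labels, weights, and covariance matrix below are made up for illustration; only the construction of `W` follows the hunk. The check confirms that the diagonal of the product equals the per-event-time variance `weight_vec @ vcov_subset @ weight_vec`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 2 event times x 2 cohorts = 4 interaction coefficients.
coef_index_map = {("g1", -2): 0, ("g2", -2): 1, ("g1", -1): 2, ("g2", -1): 3}
cohort_weights = {-2: {"g1": 0.6, "g2": 0.4}, -1: {"g1": 0.5, "g2": 0.5}}

# Any symmetric positive semi-definite matrix stands in for vcov_cohort.
A = rng.normal(size=(4, 4))
vcov_cohort = A @ A.T

# Build the aggregation matrix: row i holds the cohort weights for event time e_i.
event_times = sorted(cohort_weights)
W = np.zeros((len(event_times), vcov_cohort.shape[0]))
for i, e in enumerate(event_times):
    for g, w in cohort_weights[e].items():
        if (g, e) in coef_index_map:
            W[i, coef_index_map[(g, e)]] = w

es_vcov = W @ vcov_cohort @ W.T

# Per-event-time variance computed the "subset" way, for comparison.
per_event_var = []
for e in event_times:
    cohorts = list(cohort_weights[e])
    cols = [coef_index_map[(g, e)] for g in cohorts]
    w_vec = np.array([cohort_weights[e][g] for g in cohorts])
    vcov_subset = vcov_cohort[np.ix_(cols, cols)]
    per_event_var.append(w_vec @ vcov_subset @ w_vec)

print(np.allclose(np.diag(es_vcov), per_event_var))
```

The off-diagonal entry `es_vcov[0, 1]` is the cross-event-time covariance that a `diag(ses^2)` fallback would drop.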
diff --git a/docs/api/_autosummary/diff_diff.PreTrendsPowerCurve.rst b/docs/api/_autosummary/diff_diff.PreTrendsPowerCurve.rst
index 64584465..aa679532 100644
--- a/docs/api/_autosummary/diff_diff.PreTrendsPowerCurve.rst
+++ b/docs/api/_autosummary/diff_diff.PreTrendsPowerCurve.rst
@@ -28,4 +28,5 @@
       ~PreTrendsPowerCurve.alpha
       ~PreTrendsPowerCurve.target_power
       ~PreTrendsPowerCurve.violation_type
+      ~PreTrendsPowerCurve.pretest_form
 
diff --git a/docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst b/docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst
index cfbbe639..247da6e6 100644
--- a/docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst
+++ b/docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst
@@ -26,6 +26,7 @@
    .. autosummary::
 
       ~PreTrendsPowerResults.is_informative
+      ~PreTrendsPowerResults.max_abs_pre_violation
       ~PreTrendsPowerResults.original_results
       ~PreTrendsPowerResults.power_adequate
       ~PreTrendsPowerResults.power
@@ -41,4 +42,8 @@
       ~PreTrendsPowerResults.pre_period_effects
       ~PreTrendsPowerResults.pre_period_ses
       ~PreTrendsPowerResults.vcov
+      ~PreTrendsPowerResults.pretest_form
+      ~PreTrendsPowerResults.nis_box_probability
+      ~PreTrendsPowerResults.violation_weights
+      ~PreTrendsPowerResults.covariance_source
 
diff --git a/docs/api/pretrends.rst b/docs/api/pretrends.rst
index 8924cb13..61a2716e 100644
--- a/docs/api/pretrends.rst
+++ b/docs/api/pretrends.rst
@@ -54,12 +54,25 @@ Example
                        time='period', unit='unit_id',
                        post_periods=[5, 6, 7], reference_period=4)
 
-   # Compute pre-trends power for linear violations
+   # Compute pre-trends power for linear violations.
+   # Default acceptance region is the Roth (2022) NIS box probability.
    pt = PreTrendsPower(alpha=0.05, power=0.80, violation_type='linear')
    pt_results = pt.fit(results)
 
    print(f"MDV: {pt_results.mdv:.3f}")
    print(f"Power: {pt_results.power:.2%}")
+   print(f"NIS box probability (accept H0): {pt_results.nis_box_probability:.4f}")
+
+   # Select the Wald (noncentral-χ²) acceptance-region form instead of the
+   # default NIS box probability. The Wald acceptance-region math is
+   # unchanged from pre-PR-B, but fitted output is bit-identical to pre-PR-B
+   # results only on regular pre-period grids and on the legacy
+   # `relative_times=None` path. PR-B Step 4's `relative_times` threading
+   # applies to BOTH NIS and Wald, so on irregular grids the Wald MDV is
+   # also in Roth's γ units (see REGISTRY linear-pattern Note).
+   pt_wald = PreTrendsPower(
+       alpha=0.05, power=0.80, violation_type='linear', pretest_form='wald'
+   )
 
 PreTrendsPowerResults
 ---------------------
@@ -125,7 +138,9 @@ The module supports several types of pre-trends violations:
    ``delta[-1] = M``, all other pre-periods are zero.
 
 **custom**
-   User-specified violation pattern via the ``custom_delta`` parameter.
+   User-specified violation pattern via the ``violation_weights`` parameter.
+   Accepted by both ``PreTrendsPower`` (constructor kwarg) and the convenience
+   helpers ``compute_pretrends_power`` / ``compute_mdv`` (forwarded kwarg).
 
 Complete Example
 ----------------
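Reviewer sketch (not part of the patch): the two acceptance-region forms selected via `pretest_form` can be written in a few lines of SciPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function names `nis_power` and `wald_power`, the covariance matrix, and the violation size are hypothetical, and the `lower_limit` keyword of `scipy.stats.multivariate_normal.cdf` is assumed available (SciPy 1.10 or later).

```python
import numpy as np
from scipy import stats


def nis_power(delta, sigma, alpha=0.05):
    """Rejection probability of the no-individually-significant pretest."""
    delta = np.asarray(delta, dtype=float)
    se = np.sqrt(np.diag(sigma))
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    # Centered box for Y = beta_hat - delta ~ N(0, sigma).
    lower = -z * se - delta
    upper = z * se - delta
    box = stats.multivariate_normal.cdf(
        upper,
        mean=np.zeros(len(delta)),
        cov=sigma,
        allow_singular=True,
        lower_limit=lower,
    )
    return 1.0 - float(box)


def wald_power(delta, sigma, alpha=0.05):
    """Rejection probability of the Wald pretest (noncentral chi-squared)."""
    delta = np.asarray(delta, dtype=float)
    k = len(delta)
    lam = float(delta @ np.linalg.solve(sigma, delta))  # noncentrality
    crit = stats.chi2.ppf(1.0 - alpha, df=k)
    return float(stats.ncx2.sf(crit, df=k, nc=lam))


# Hypothetical diagonal covariance and a linear violation on t = -3, -2, -1.
sigma = np.diag([0.04, 0.09, 0.16])
weights = np.array([3.0, 2.0, 1.0])
gamma = 0.2

size_nis = nis_power(np.zeros(3), sigma)      # rejection rate under no violation
power_nis = nis_power(gamma * weights, sigma)
power_wald = wald_power(gamma * weights, sigma)

print(f"NIS size: {size_nis:.4f}")
print(f"NIS power: {power_nis:.4f}, Wald power: {power_wald:.4f}")
```

With a diagonal covariance the NIS size under no violation is `1 - (1 - alpha)^K`, which exceeds `alpha` for `K > 1`; that is a property of the NIS pretest itself, whereas the Wald form has size `alpha` by construction.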
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 195601ee..3efdcdbc 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2770,66 +2770,90 @@ CRITICAL: δ_pre = β_pre pins pre-treatment violations to observed coefficients
 
 ## PreTrendsPower
 
-**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236). Paper review on file: `docs/methodology/papers/roth-2022-review.md` (non-authoritative source audit; this REGISTRY entry remains the authoritative methodology contract).
+**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236). Paper review on file: `docs/methodology/papers/roth-2022-review.md`.
 
 **Key implementation requirements:**
 
 *Assumption checks / warnings:*
-- Requires specification of variance-covariance matrix of pre-treatment estimates
-- Warns if pre-trends test has low power (uninformative)
-- Different violation types have different power properties
+- Requires specification of variance-covariance matrix Σ_22 of pre-period coefficients
+- Pre-trend zero-anticipation: τ_pre = 0 (so β̂_pre estimates δ_pre directly) — same convention as Rambachan-Roth (2023) HonestDiD
+- Warns if pre-trends test has low power (uninformative) relative to typical effect sizes
+- Different violation types and pretest forms have different power properties
 
-*Estimator equation (as implemented):*
+*Estimator equation (primary form — NIS box probability; Roth 2022 Section II.A-B):*
+
+The paper-analyzed pretest is the **no-individually-significant (NIS)** test: reject parallel trends if any pre-period coefficient is individually significant at level α, i.e. its (1 - α) CI excludes zero. The acceptance region is
 
-Pre-trends test statistic (Wald):
 ```
-W = δ̂_pre' V̂_pre^{-1} δ̂_pre ~ χ²(k)
+B_NIS(Σ) = { b ∈ R^K : |b_t| ≤ z_{1-α/2} · σ_t,  for all t ∈ pre-periods }
 ```
 
-Power function:
+Under H1 with violation `δ_pre = M · weights` and `β̂_pre ~ N(δ_pre, Σ_22)`, the rejection probability is computed via the centered change-of-variable `Y = β̂_pre - δ_pre ~ N(0, Σ_22)`:
+
 ```
-Power(δ_true) = P(W > χ²_{α,k} | δ = δ_true)
+Power(δ_pre) = 1 - P( Y_t ∈ [-z·σ_t - δ_t, z·σ_t - δ_t]  for all t )
+             = 1 - F_MVN(upper, lower; mean=0, cov=Σ_22)
 ```
 
-Minimum detectable violation (MDV):
+where `F_MVN` is the multivariate normal CDF over the rectangular box. Computed via `scipy.stats.multivariate_normal.cdf(upper, lower_limit=lower, mean=zeros, cov=Σ_22, allow_singular=True)` (Genz method; supports K up to ~20). Falls back to MC simulation (N=20000 draws) when the analytical CDF returns NaN on degenerate Σ.
+
+MDV: solve `Power(γ · weights) = target_power` for γ via doubling expansion + `optimize.brentq` bisection. Non-convergence cap at γ_high = 1000 returns `np.inf`.
+
+*Estimator equation (paper-supported alternative — Wald pretest form):*
+
 ```
-MDV(power=0.8) = min{|δ| : Power(δ) ≥ 0.8}
+W = β̂_pre' Σ_22^{-1} β̂_pre ~ χ²(K) under H0
+Power(δ_pre) = 1 - F_ncχ²(c_α; K, λ),  where λ = δ_pre' Σ_22^{-1} δ_pre
+                                        (noncentrality parameter)
 ```
 
+The Wald acceptance region is a convex ellipsoid, so Propositions 1+3+4 of Roth (2022) all apply. Retained for backwards compatibility with the pre-PR-B shipped numerical output (Wald was the implicit default before PR-B 2026-05-17). Activated via `pretest_form='wald'`.
+
 Violation types:
-- **Linear**: δ_t = c × t (linear pre-trend)
-- **Constant**: δ_t = c (level shift)
-- **Last period**: δ_{-1} = c, others zero
-- **Custom**: user-specified pattern
+- **Linear**: `δ_t = γ · t` (Roth's slope convention). When `relative_times` is threaded through `fit()`, weights = `|t|` directly with no L2 normalization, so the reported MDV is in Roth's γ units.
+- **Constant**: `δ_t = c` (level shift)
+- **Last period**: `δ_{-1} = c`, others zero
+- **Custom**: user-specified `violation_weights` pattern
 
-- **Note (deviation from paper — `linear` violation pattern):** the shipped `PreTrendsPower._get_violation_weights("linear")` constructs `[n_pre-1, ..., 1, 0]` from `n_pre` alone and `PreTrendsPower.fit()` never threads the actual relative-time labels into that construction (`pretrends.py:488-531`, `pretrends.py:862-866`). For irregular or anticipation-shifted pre-period grids (e.g., `t ∈ {-5, -3, -1}`), this means the slope reported as MDV is NOT in Roth's `γ` units — the shifted/normalized direction effectively assumes contiguous relative times `{-(n_pre-1), ..., -1}`. The follow-up audit (tracked in TODO.md) will either rebuild `linear` weights from the sorted actual relative-time values and expose the parameter in Roth's `γ` units, or formally retain the current shifted/normalized contract with this Note as the deviation record.
+- **Note (paper-supported alternative: Wald pretest form):** the library retains the Wald noncentral-χ² form as `pretest_form='wald'`. NIS is the paper's primary analysis convention (used for all 12 surveyed papers' empirical exercises in Section I), but the Wald form is also a paper-supported alternative: Roth's Propositions 1 and 3 (conditional moments) apply to any measurable acceptance region, and Proposition 4 (the variance-reduction guarantee) applies to any convex acceptance region. The Wald ellipsoid is convex, so all three propositions apply. Wald is faster (no MVN CDF call) and matches the pre-PR-B shipped numerical baseline. Use Wald for backwards compatibility or speed; use NIS for canonical paper alignment and R `pretrends` parity.
 
-- **Note (silent-failure guard — `power_at()` with `violation_type="custom"`):** `PreTrendsPowerResults` does not currently persist the fitted `violation_weights`, so `power_at(M)` cannot reconstruct the custom direction. As of this commit, `PreTrendsPowerResults.power_at()` raises `NotImplementedError` for `violation_type="custom"` rather than silently returning equal-weights output. To compute power at a new `M` for a custom fit, refit `PreTrendsPower(violation_type="custom", violation_weights=...)` with the new `M`. Tracked in TODO.md as a planned follow-up to persist the fitted weights and lift the guard.
+- **Note (convention — `linear` violation pattern, γ-unit MDV):** `_get_violation_weights('linear')` consumes actual pre-period relative-time labels threaded through `fit()` (PR-B 2026-05-17 resolution of the PR-A linear-pattern deviation). When `relative_times` is provided (e.g., `[-3, -2, -1]` for a regular grid or `[-5, -3, -1]` for an irregular grid), weights = `|t|` directly with NO L2 normalization, so `δ_pre = M · |t|` reflects Roth's `δ_t = γ · t` convention and the reported MDV equals γ. Callers that bypass `fit()` and supply only `n_pre` retain the previous count-based, L2-normalized `[n_pre-1, ..., 0]` direction (preserves shipped Wald numerical baselines for unit tests). **MPD period-label coverage:** for `MultiPeriodDiDResults`, the relative-time derivation in `_extract_pre_period_params` supports numeric labels (`int` / `float` / `np.int64`) and `pandas.Period` / `pandas.Timestamp` / `np.datetime64` (via Period or Timedelta arithmetic with units of frequency / days respectively). For genuinely non-numeric or unordered labels (string period IDs, unranked categoricals), the helper emits an explicit `UserWarning` and falls back to the legacy count-based normalized direction — the reported MDV is then NOT in Roth's γ units. Users on string period IDs who need γ-unit MDV should re-fit with numeric labels.
 
 *Standard errors:*
-- Power calculations are exact (no sampling variability)
-- Uncertainty comes from estimated Σ
+- Power calculations are exact (no sampling variability — power is computed against a hypothesized population trend, not estimated)
+- Uncertainty comes from the user-supplied Σ_22
+- Footnote 7 equivariance: the distribution of `β̂_post` conditional on `β̂_pre` passing the pretest is equivariant w.r.t. `τ_post` (Roth 2022 Section I.C); MDV/power do not depend on the value of `τ_post`
 
 *Edge cases:*
-- Perfect collinearity in pre-periods: test not well-defined
-- Single pre-period: power calculation trivial
-- Very high power: MDV approaches zero
+- Perfect collinearity in pre-periods: test not well-defined; `multivariate_normal.cdf(allow_singular=True)` may return NaN — MC simulation fallback kicks in.
+- Single pre-period (K=1): NIS power reduces to a univariate normal-tail probability; closed-form match with Roth Section II.B Proposition 2 proof: `E[β̂_pre | β̂_pre ∈ B_NIS] - β_pre ∝ φ(-z - β_pre/σ) - φ(z - β_pre/σ)`.
+- Very high power: MDV approaches zero.
+- Symmetric two-sided pretests under parallel trends: `β̂_post` remains unbiased for `τ_post` (Roth Section II.B paragraph after Prop 1 — `E[β̂_pre | β̂_pre ∈ B] = 0` if B is symmetric and `β_pre = 0`).
+
+- **Note (deviation from paper: diagonal pre-period VCV fallback, limited after PR-B to fits without a usable analytical VCV):** Roth (2022)'s power and bias objects operate on the full pre-period covariance block Σ_22. After PR-B 2026-05-17, the shipped `compute_pretrends_power` adapter consumes full Σ_22 on the non-bootstrap paths for all three result types:
+  - `MultiPeriodDiDResults`: full pre-period sub-block from `results.vcov` when `interaction_indices` is populated; diag fallback only when `interaction_indices` is None.
+  - `CallawaySantAnnaResults`: full `event_study_vcov` sub-block on non-bootstrap fits (the matrix is persisted at `staggered_results.py:126-128`). Bootstrap CS fits clear `event_study_vcov` at `staggered.py:2032-2036` to prevent mixing analytical VCV with bootstrap SEs, so they fall through to `diag(ses^2)`.
+  - `SunAbrahamResults`: full `event_study_vcov` sub-block on non-bootstrap fits, constructed in `sun_abraham.py` via `W @ vcov_cohort @ W.T` where W is the cohort-aggregation matrix (PR-B Step 3 SA extension). Bootstrap SA fits and replicate-weight survey fits clear `event_study_vcov` for the same reason as CS.
 
-- **Note (deviation from paper — diagonal pre-period VCV fallback):** Roth (2022)'s power and bias objects (both the paper-analyzed NIS box probability and the library's Wald / noncentral-χ² form) operate on the full pre-period covariance block Σ_22. The shipped `compute_pretrends_power` adapter currently uses different sources for the pre-period covariance by result type:
-  - `MultiPeriodDiDResults` (`pretrends.py:592-601`): extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated; falls back to `diag(ses^2)` otherwise.
-  - `CallawaySantAnnaResults` (`pretrends.py:609-652`): hard-codes `vcov = diag(ses^2)`. Non-bootstrap CS fits persist a full `event_study_vcov` matrix (`staggered_results.py:126-128`), so the diag fallback is a deliberate choice in that path. Bootstrap CS fits clear `event_study_vcov` before storing results (`staggered.py:2032-2036`) to prevent mixing analytical VCV with bootstrap SEs, so the full-Σ22 route is not available for bootstrap fits at all.
-  - `SunAbrahamResults` (`pretrends.py:660-687`): hard-codes `vcov = diag(ses^2)`; the diag fallback is *forced* because `SunAbrahamResults` does not currently expose an event-study or cohort covariance matrix.
+  The diag-fallback path is therefore reserved for cases where the analytical VCV is genuinely unavailable (bootstrap fits, replicate-weight survey fits, MPD without `interaction_indices`). In those cases dropping off-diagonals is documented as a non-paper approximation — not provably conservative, since the direction of the discrepancy with the full-Σ_22 calc depends on the sign and magnitude of the dropped correlations. See `docs/methodology/papers/roth-2022-review.md` for the full derivation.
 
-  Dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The PR-B follow-up audit (tracked in `TODO.md`) will either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work on `SunAbrahamResults`) or formally retain the diag fallback with explicit miscalibration framing. See `docs/methodology/papers/roth-2022-review.md` for the full derivation.
+- **Backwards-compat addendum (`power_at()` for `violation_type='custom'`):** `PreTrendsPowerResults` now persists `violation_weights` on fresh fits (PR-B Step 5), so `power_at(M)` works for all four violation types including custom. Old serialized results from before PR-B's field addition have `violation_weights=None`; for those legacy results, `power_at(M)` falls back to weight reconstruction from `violation_type + n_pre_periods`, but for `violation_type='custom'` the custom weights cannot be reconstructed and `power_at(M)` raises `NotImplementedError` with a "refit with current library version" message. Fresh fits do not hit this guard.
 
 **Reference implementation(s):**
-- R: `pretrends` package (Roth's official package)
+- R: [`pretrends`](https://github.com/jonathandroth/pretrends) (Roth's official package). NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers). R-parity goldens deferred to PR-C; the generator script `benchmarks/R/generate_pretrends_golden.R` ships in PR-B with a placeholder commit reference pending an R-package revision pin.
+- R dependency: [`tmvtnorm`](https://cran.r-project.org/package=tmvtnorm) (Manjunath & Wilhelm 2012) — used by R `pretrends` for truncated multivariate normal moments. The Python library uses `scipy.stats.multivariate_normal.cdf` directly for the box probability (does not require a `tmvtnorm` port).
 
 **Requirements checklist:**
-- [ ] MDV = minimum detectable violation at target power level
-- [ ] Violation types: linear, constant, last_period, custom all implemented
-- [ ] Power curve plotting over violation magnitudes
-- [ ] Integrates with HonestDiD for combined sensitivity analysis
+- [x] NIS box probability implemented via scipy MVN CDF (PR-B)
+- [x] Wald form retained as paper-supported alternative under `pretest_form='wald'` (PR-B)
+- [x] Non-bootstrap CS/SA route through full `event_study_vcov` sub-block (PR-B Step 3)
+- [x] Linear-violation weights honor actual relative-time labels → γ-unit MDV (PR-B Step 4)
+- [x] Custom-violation weights persisted on `PreTrendsPowerResults`; `power_at(custom)` works on fresh fits (PR-B Step 5)
+- [x] Helper API (`compute_pretrends_power` / `compute_mdv`) supports `violation_weights` + `pretest_form` (PR-B Step 6)
+- [x] Methodology test file with paper-equation-numbered Verified Components walk-through (PR-B Step 7 — `tests/test_methodology_pretrends.py`)
+- [ ] R `pretrends` parity at pinned commit (deferred to PR-C; generator script committed in PR-B)
+- [x] Power curve plotting over violation magnitudes (preserved from pre-PR-B)
+- [x] Integrates with HonestDiD for combined sensitivity analysis (preserved from pre-PR-B)
 
 ---
 
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
index f459dc8a..ea16530f 100644
--- a/docs/methodology/REPORTING.md
+++ b/docs/methodology/REPORTING.md
@@ -298,7 +298,16 @@ a library setting.
   while `schema["pre_trends"]["power_status"]` carries the
   machine-readable enum (`"ran"` / `"skipped"` / `"error"` /
   `"not_applicable"`). BusinessReport then reads
-  `mdv_share_of_att = mdv / abs(att)` and selects a tier:
+  `mdv_share_of_att = max_abs_pre_violation / abs(att)` and selects a tier.
+  The numerator is the **level-scale max pre-period violation under the
+  MDV**, computed as `mdv * max(|violation_weights|)` — NOT the raw `mdv`
+  scalar. Post PR-B Step 4, raw `mdv` for `violation_type='linear'` is in
+  Roth's γ units (a slope on relative time), so comparing it directly to
+  a level-scale `|att|` would mix units on irregular pre-period grids and
+  mis-tier the result. The level-scale quantity is exposed via the new
+  `PreTrendsPowerResults.max_abs_pre_violation` property and the
+  `DiagnosticReport.pretrends_power` block schema field of the same name.
+  Tier thresholds:
 
   - `< 0.25` &rarr; `well_powered` &mdash; "the test has 80% power to
     detect a violation of magnitude M, which is only X% of the
@@ -321,27 +330,44 @@ a library setting.
   The library already ships `compute_pretrends_power()`, so using it
   is the honest default rather than hedging every non-violation.
 
-- **Note:** Diagonal-covariance fallback for staggered-estimator power.
-  `compute_pretrends_power()` currently drops to `np.diag(ses**2)` for
-  CS / SA / ImputationDiD / Stacked / etc. even when the full
-  `event_study_vcov` is attached on the result. The
-  `DiagnosticReport.pretrends_power` block records
-  `covariance_source: "diag_fallback_available_full_vcov_unused"` in
-  that case, and `BusinessReport` downgrades a `well_powered` tier to
-  `moderately_powered` before rendering prose. This is a documented
-  deviation from the paper-derived "use the full pre-period covariance"
-  position. **Not provably conservative**: under Roth (2022)'s NIS
-  framework and the library's Wald form, the MDV/power objects depend
-  on the off-diagonals of Σ_22, and the direction of the discrepancy
-  between full-Σ_22 and diag(ses^2) depends on the sign and magnitude
-  of the dropped correlations — see the `**Note (deviation from paper
-  — diagonal pre-period VCV fallback):**` block under `## PreTrendsPower`
-  in `docs/methodology/REGISTRY.md`. The `well_powered → moderately_powered`
-  downgrade in BusinessReport reduces the chance of an overly optimistic
-  claim in practice, but it is not a proof of conservatism. The right
-  long-term fix is to teach `compute_pretrends_power()` to consume
-  `event_study_vcov` and `event_study_vcov_index`; until that lands the
-  downgrade stays.
+- **Note:** Pre-period covariance routing for staggered-estimator power.
+  As of the PR-B PreTrendsPower implementation audit (Roth 2022),
+  `compute_pretrends_power()` consumes the full `event_study_vcov`
+  sub-block when it is available — non-bootstrap CS fits
+  (`staggered_results.py` populates the matrix) and non-bootstrap SA
+  fits (`sun_abraham.py` builds it via `W @ vcov_cohort @ W.T`). The
+  `PreTrendsPowerResults.covariance_source` field records the actual
+  extraction path (`"full_pre_period_vcov"` vs `"diag_fallback"`), and
+  the `DiagnosticReport.pretrends_power` block surfaces that label
+  unchanged. There are two paths through the report layer with
+  different downgrade semantics:
+
+  - **New fits** (post-PR-B, `PreTrendsPowerResults.covariance_source`
+    is populated): `DiagnosticReport` reads the persisted label
+    directly. Non-bootstrap CS / SA fits report
+    `"full_pre_period_vcov"` and are NOT downgraded; bootstrap /
+    replicate-weight paths report `"diag_fallback"` and also pass
+    through unchanged (no "available but unused" concern — the
+    estimator did its best with what was available).
+  - **Legacy serialized results** (pre-PR-B, no
+    `covariance_source` field on the object): the report layer falls
+    back to type-based inference in
+    `_infer_cov_source(source_fit)`. For event-study result types
+    (CS / SA / etc.) with populated `event_study_vcov`, the
+    legacy-ambiguous case still emits the conservative
+    `"diag_fallback_available_full_vcov_unused"` sentinel and the
+    `well_powered → moderately_powered` downgrade still applies —
+    because without the persisted provenance we cannot rule out that
+    the stored power was computed from `diag(ses^2)` under PR-A
+    semantics. For `MultiPeriodDiDResults` without
+    `interaction_indices`, the legacy fallback reports
+    `"diag_fallback"` (a genuine fallback, not the "available but
+    unused" case, so no downgrade applies).
+
+  Remaining `"diag_fallback"` cases on new fits — bootstrap /
+  replicate-weight CS and SA, plus ImputationDiD / Stacked /
+  EfficientDiD / TwoStageDiD — pass through unchanged because
+  nothing better is available on those result types yet.
 
 - **Note:** Unit-translation policy. BusinessReport does not
   arithmetically translate log-points to percents or level effects to
diff --git a/pyproject.toml b/pyproject.toml
index 4b0433b6..c37b06f9 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -47,7 +47,12 @@ classifiers = [
 dependencies = [
     "numpy>=1.20.0",
     "pandas>=1.3.0",
-    "scipy>=1.7.0",
+    # scipy>=1.10 required for scipy.stats.multivariate_normal.cdf(..., lower_limit=...)
+    # — used by diff_diff.pretrends._compute_nis_acceptance_prob for the
+    # rectangular box probability in Roth (2022) NIS pretest power. The
+    # lower_limit parameter was added in scipy 1.10 (release notes
+    # https://docs.scipy.org/doc/scipy/release/1.10.0-notes.html).
+    "scipy>=1.10",
 ]
 
 [project.optional-dependencies]
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
index 276c75c6..cf96bc22 100644
--- a/tests/test_business_report.py
+++ b/tests/test_business_report.py
@@ -2420,6 +2420,113 @@ def test_center_downgrade_fires_on_real_cs_fit(self, cs_fit):
         # ``well_powered`` — centralized downgrade guarantees this.
         assert pp["tier"] != "well_powered"
 
+    def test_full_vcov_path_no_downgrade_on_real_cs_fit(self, cs_fit):
+        """PR-B R4 regression: when ``compute_pretrends_power`` actually
+        consumes the full ``event_study_vcov`` sub-block (PR-B Step 3),
+        the DR / BR layer must NOT downgrade ``well_powered``.
+
+        Exercises the live PR-B path on the deterministic ``cs_fit``
+        fixture (analytical non-bootstrap CS, ``seed=7``,
+        ``treatment_effect=1.5``). On this fixture the level-scale
+        ``max_abs_pre_violation / |att|`` ratio (≈ 0.211) is under the
+        ``0.25`` well_powered threshold, so the expected tier is unconditionally
+        ``well_powered`` — no skip-on-different-tier branch (R6 codex:
+        previous version would silently bypass the key assertion if a
+        regression reintroduced the downgrade).
+
+        ``pretrends.py`` records
+        ``covariance_source='full_pre_period_vcov'`` on the result, which
+        the DR adapter consumes directly. The BR ``summary()`` prose
+        (the actual surface the well-powered phrasing is rendered on)
+        must contain the well-powered text and lack the conservative
+        moderately-informative text.
+        """
+        from diff_diff import BusinessReport, DiagnosticReport
+        from diff_diff.pretrends import compute_pretrends_power
+
+        fit, sdf = cs_fit
+        dr = DiagnosticReport(
+            fit,
+            data=sdf,
+            outcome="outcome",
+            unit="unit",
+            time="period",
+            first_treat="first_treat",
+        )
+        block = dr.to_dict()["pretrends_power"]
+        assert block.get("status") == "ran", "pretrends_power should run on cs_fit"
+
+        # Deterministic fixture pins (cs_fit at seed=7, treatment_effect=1.5):
+        # cov_source = full_pre_period_vcov; max_abs_pre_violation ≈ 0.375
+        # (γ * max(|t|) where pre-periods are [-4, -3, -2]); |att| ≈ 1.779;
+        # mdv_share_of_att ≈ 0.211, under the 0.25 cutoff → tier = well_powered.
+        # Codex R12 P1: this ratio is now `max_abs_pre_violation / |att|`,
+        # the level-scale max pre-period violation under the MDV (post-PR-B
+        # Step 4 linear MDV is in Roth's γ units, a slope; the level-scale
+        # comparable is mdv * max(|violation_weights|)).
+        assert block["covariance_source"] == "full_pre_period_vcov", (
+            "cs_fit is analytical CS with event_study_vcov populated — "
+            "PR-B routing must report full_pre_period_vcov"
+        )
+        # max_abs_pre_violation = mdv * max(|t|) = 0.0937 * 4 ≈ 0.375
+        assert block.get("max_abs_pre_violation") is not None
+        assert 0.35 < block["max_abs_pre_violation"] < 0.40, (
+            f"cs_fit max_abs_pre_violation={block['max_abs_pre_violation']} "
+            "should be ≈ 0.375 (γ ≈ 0.094 × max|t|=4)"
+        )
+        ratio = block["mdv_share_of_att"]
+        assert ratio is not None and ratio < 0.25, (
+            f"cs_fit mdv_share_of_att={ratio} (level-scale max_abs_pre_violation / "
+            "|att|) must be in the well_powered range (<0.25) for this assertion "
+            "to pin the no-downgrade contract"
+        )
+        assert (
+            block["tier"] == "well_powered"
+        ), "well-powered ratio must NOT be downgraded under the PR-B full-VCV path"
+
+        # Architectural fix: the same provenance label appears on the
+        # compute_pretrends_power output's persisted field, locking that
+        # provenance is recorded at fit time and consumed at the report
+        # layer (not re-inferred from the source-fit type).
+        pp = compute_pretrends_power(fit, alpha=0.05, target_power=0.80)
+        assert pp.covariance_source == "full_pre_period_vcov"
+
+        # Positive prose contract on the rendered surfaces.
+        br = BusinessReport(fit, data=sdf)
+        summary = br.summary()
+        full = br.full_report()
+        # Primary surface: summary() renders the tier prose.
+        assert "well-powered" in summary, (
+            "BR.summary() should surface well-powered phrasing under the "
+            "PR-B full-VCV no-downgrade path"
+        )
+        assert "moderately informative" not in summary
+        assert "moderately-informative" not in summary
+        # Secondary defensive check on full_report().
+        assert "moderately informative" not in full.lower()
+        assert "moderately-informative" not in full.lower()
+
+        # PR-B R14 P2: max_abs_pre_violation must round-trip through the
+        # BR schema lift AND render in full_report(). Pre-R14 the field
+        # was emitted by DR, the renderer printed it, but the BR lift
+        # boundary at `_lift_pre_trends` silently dropped it — so the
+        # rendered line never fired even though the renderer had the
+        # branch.
+        br_schema = br.to_dict()
+        pt_block = br_schema.get("pre_trends", {})
+        assert "max_abs_pre_violation" in pt_block, (
+            "BR.to_dict()['pre_trends'] must surface max_abs_pre_violation "
+            "post-PR-B R14 — _lift_pre_trends regression"
+        )
+        assert pt_block["max_abs_pre_violation"] is not None
+        assert np.isclose(pt_block["max_abs_pre_violation"], 0.375, atol=0.05)
+        # full_report() must render the new "Max pre-period level
+        # deviation at MDV" line.
+        assert "Max pre-period level deviation at MDV:" in full, (
+            "BR.full_report() must render the max_abs_pre_violation line "
+            "(renderer wired in R12; lift boundary fixed in R14)"
+        )
+
 
 class TestCSNotYetTreatedControlGroupSemantics:
     """Round-13 P1 regression: ``BusinessReport`` must not relabel
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
index 810eca20..32e7a50d 100644
--- a/tests/test_diagnostic_report.py
+++ b/tests/test_diagnostic_report.py
@@ -375,17 +375,15 @@ def test_precomputed_pretrends_power_parity_with_default_path(self, cs_fit):
         assert default_block["tier"] == precomp_block["tier"]
         assert default_block["covariance_source"] == precomp_block["covariance_source"]
 
-    def test_precomputed_pretrends_power_downgrades_when_full_vcov_unused(self):
-        """Stub-based regression: when the source fit has both
-        ``event_study_vcov`` and ``event_study_vcov_index`` populated but
-        the diagonal fallback was used, the precomputed adapter must emit
-        ``covariance_source='diag_fallback_available_full_vcov_unused'`` and
-        downgrade a ``well_powered`` tier to ``moderately_powered`` — just
-        like the default compute path. Complements the live-fit parity test
-        by exercising the tier-bumping edge explicitly.
+    def test_precomputed_pretrends_power_persisted_full_vcov_no_downgrade(self):
+        """PR-B R3+R4 regression: a precomputed ``PreTrendsPowerResults``
+        carrying ``covariance_source='full_pre_period_vcov'`` (the value
+        ``compute_pretrends_power`` records post-PR-B) must NOT be
+        downgraded by ``DiagnosticReport``. Locks the new contract that
+        full-VCV CS / SA fits keep their ``well_powered`` tier.
         """
+        from diff_diff.pretrends import PreTrendsPowerResults
 
-        # Minimal CS-shaped stub with full vcov flagged.
         class _CSStub:
             overall_att = 1.0
             overall_se = 0.25
@@ -404,8 +402,66 @@ class _CSStub:
         stub = _CSStub()
         stub.__class__.__name__ = "CallawaySantAnnaResults"
 
-        class _PPStub:
-            mdv = 0.1  # |ATT| = 1.0 -> ratio = 0.1 -> well_powered before downgrade
+        pp = PreTrendsPowerResults(
+            power=0.80,
+            mdv=0.1,
+            violation_magnitude=0.1,
+            violation_type="linear",
+            alpha=0.05,
+            target_power=0.80,
+            n_pre_periods=2,
+            test_statistic=np.nan,
+            critical_value=1.96,
+            noncentrality=np.nan,
+            pre_period_effects=np.zeros(2),
+            pre_period_ses=np.ones(2),
+            vcov=np.eye(2),
+            original_results=stub,
+            covariance_source="full_pre_period_vcov",
+        )
+
+        dr = DiagnosticReport(stub, precomputed={"pretrends_power": pp})
+        block = dr.to_dict()["pretrends_power"]
+        assert block["status"] == "ran"
+        assert block["covariance_source"] == "full_pre_period_vcov"
+        assert block["tier"] == "well_powered"
+
+    def test_precomputed_pretrends_power_legacy_missing_field_still_downgraded(self):
+        """R4 regression: legacy ``PreTrendsPowerResults`` pre-PR-B has no
+        ``covariance_source`` field. We cannot tell from the source-fit
+        object whether the stored power was computed from full Σ_22 or
+        from the diag fallback (PR-A behavior was diag even when
+        ``event_study_vcov`` was attached). The adapter MUST treat the
+        missing-field case as legacy-ambiguous and apply the conservative
+        downgrade — otherwise an old serialized CS result silently
+        upgrades to ``well_powered``.
+
+        Pairs with the
+        ``test_precomputed_pretrends_power_persisted_full_vcov_no_downgrade``
+        positive case to lock both legs of the legacy-fallback contract.
+        """
+
+        # Minimal CS-shaped stub with full vcov populated.
+        class _CSStub:
+            overall_att = 1.0
+            overall_se = 0.25
+            overall_t_stat = 4.0
+            overall_p_value = 0.001
+            overall_conf_int = (0.5, 1.5)
+            alpha = 0.05
+            n_obs = 400
+            n_treated = 80
+            n_control = 320
+            survey_metadata = None
+            event_study_effects = None
+            event_study_vcov = np.eye(3)
+            event_study_vcov_index = {-2: 0, -1: 1, 0: 2}
+
+        stub = _CSStub()
+        stub.__class__.__name__ = "CallawaySantAnnaResults"
+
+        class _LegacyPPStub:
+            mdv = 0.1
             violation_type = "linear"
             alpha = 0.05
             target_power = 0.80
@@ -413,14 +469,127 @@ class _PPStub:
             power = 0.80
             n_pre_periods = 2
             original_results = stub
+            # No covariance_source attribute — simulates an old serialized
+            # PreTrendsPowerResults from a pre-PR-B fit.
 
-        dr = DiagnosticReport(stub, precomputed={"pretrends_power": _PPStub()})
+        dr = DiagnosticReport(stub, precomputed={"pretrends_power": _LegacyPPStub()})
         block = dr.to_dict()["pretrends_power"]
         assert block["status"] == "ran"
+        # Legacy-ambiguous → conservative sentinel + downgrade applies.
         assert block["covariance_source"] == "diag_fallback_available_full_vcov_unused"
-        # Downgrade must apply: pre-tier is well_powered, post-tier is moderately_powered.
         assert block["tier"] == "moderately_powered"
 
+    def test_precomputed_pretrends_power_legacy_mpd_without_interaction_indices_reports_diag(
+        self,
+    ):
+        """PR-B R5 regression: ``MultiPeriodDiDResults`` legacy fits without
+        ``interaction_indices`` truly take the ``np.diag(ses**2)`` fallback
+        inside ``pretrends.py:_extract_pre_period_params`` MPD branch. The
+        report-layer's ``_infer_cov_source`` fallback must surface that
+        accurately as ``"diag_fallback"`` rather than overclaiming
+        ``"full_pre_period_vcov"`` (MPD is not in the event-study type set,
+        so the previous non-event-study branch unconditionally returned
+        ``"full_pre_period_vcov"`` — wrong for MPD without interaction
+        indices).
+        """
+
+        class _LegacyMPDStub:
+            avg_att = 1.0
+            avg_se = 0.25
+            avg_t_stat = 4.0
+            avg_p_value = 0.001
+            avg_conf_int = (0.5, 1.5)
+            alpha = 0.05
+            n_obs = 400
+            n_treated = 80
+            n_control = 320
+            survey_metadata = None
+            # No interaction_indices, no full vcov — pretrends.py MPD
+            # branch falls through to diag(ses**2).
+            vcov = None
+            interaction_indices = None
+
+        stub = _LegacyMPDStub()
+        stub.__class__.__name__ = "MultiPeriodDiDResults"
+
+        class _LegacyPPStub:
+            mdv = 0.1
+            violation_type = "linear"
+            alpha = 0.05
+            target_power = 0.80
+            violation_magnitude = 0.1
+            power = 0.80
+            n_pre_periods = 2
+            original_results = stub
+            # Legacy — no covariance_source field set.
+
+        dr = DiagnosticReport(stub, precomputed={"pretrends_power": _LegacyPPStub()})
+        block = dr.to_dict()["pretrends_power"]
+        assert block["status"] == "ran"
+        # Legacy MPD without interaction_indices reports diag_fallback —
+        # the conservative downgrade does NOT fire (this isn't an
+        # "available but unused" case, just a normal fallback).
+        assert block["covariance_source"] == "diag_fallback"
+        # No downgrade applies on "diag_fallback" (vs the sentinel label).
+        assert block["tier"] == "well_powered"
+
+    def test_precomputed_pretrends_power_consumes_persisted_cov_source(self):
+        """PR-B R3 regression: the precomputed adapter must prefer the
+        ``covariance_source`` recorded on ``PreTrendsPowerResults`` over
+        the legacy type-based inference. Demonstrates the architectural
+        fix the R3 codex review called out (provenance should be recorded
+        on the result, not re-inferred from result type each time).
+
+        Constructs a stub fit whose source-side type-based inference would
+        produce the LEGACY conservative downgrade label
+        ``diag_fallback_available_full_vcov_unused`` — and verifies that
+        the explicit persisted ``full_pre_period_vcov`` label wins,
+        keeping the ``well_powered`` tier. The legacy fallback only
+        activates when the persisted field is missing or ``"unknown"``.
+        """
+        from diff_diff.pretrends import PreTrendsPowerResults
+
+        class _CSStub:
+            overall_att = 1.0
+            overall_se = 0.25
+            overall_t_stat = 4.0
+            overall_p_value = 0.001
+            overall_conf_int = (0.5, 1.5)
+            alpha = 0.05
+            n_obs = 400
+            n_treated = 80
+            n_control = 320
+            survey_metadata = None
+            event_study_effects = None
+            event_study_vcov = np.eye(3)
+            event_study_vcov_index = {-2: 0, -1: 1, 0: 2}
+
+        stub = _CSStub()
+        stub.__class__.__name__ = "CallawaySantAnnaResults"
+
+        pp = PreTrendsPowerResults(
+            power=0.80,
+            mdv=0.1,
+            violation_magnitude=0.1,
+            violation_type="linear",
+            alpha=0.05,
+            target_power=0.80,
+            n_pre_periods=2,
+            test_statistic=np.nan,
+            critical_value=1.96,
+            noncentrality=np.nan,
+            pre_period_effects=np.zeros(2),
+            pre_period_ses=np.ones(2),
+            vcov=np.eye(2),
+            original_results=stub,
+            covariance_source="full_pre_period_vcov",
+        )
+
+        dr = DiagnosticReport(stub, precomputed={"pretrends_power": pp})
+        block = dr.to_dict()["pretrends_power"]
+        assert block["covariance_source"] == "full_pre_period_vcov"
+        assert block["tier"] == "well_powered"
+
     def test_precomputed_parallel_trends_bypasses_applicability_gate(self, cs_fit):
         """Round-22 P1 regression: ``precomputed["parallel_trends"]`` was
         documented as supported but ``_instance_skip_reason`` skipped the
diff --git a/tests/test_methodology_pretrends.py b/tests/test_methodology_pretrends.py
new file mode 100644
index 00000000..53b594d3
--- /dev/null
+++ b/tests/test_methodology_pretrends.py
@@ -0,0 +1,1164 @@
+"""
+PreTrendsPower methodology test file — Roth (2022) Section II.A-B walkthrough.
+
+Companion to ``tests/test_pretrends.py`` (basic unit-test surface): this file
+validates the library against Roth's specific paper equations and propositions,
+with paper-equation-numbered assertions. Mirrors the structure of
+``tests/test_methodology_bacon.py``.
+
+Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for
+    Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.
+    https://doi.org/10.1257/aeri.20210236
+
+Paper review on file: ``docs/methodology/papers/roth-2022-review.md``.
+
+Class structure:
+
+- ``TestPretrendsHandCalculation`` — K=1 closed-form match against
+  Proposition 2 proof's univariate truncated-normal expression; NIS power
+  against Monte Carlo simulation at small K; MDV inversion sanity.
+- ``TestPretrendsPropositions`` — Roth Propositions 1-4 numerical
+  verification via Monte Carlo simulation.
+- ``TestPretrendsLinearGrid`` — γ-unit MDV on regular, irregular, and
+  anticipation-shifted pre-period grids (PR-B Step 4 regression).
+- ``TestPretrendsCustomWeightPersistence`` — custom weights stored on
+  PreTrendsPowerResults; power_at(M) for custom matches a refit (PR-B
+  Step 5 regression).
+- ``TestPretrendsCovarianceSource`` — CS/SA full-VCV routing through
+  event_study_vcov (PR-B Step 3 regression).
+- ``TestPretrendsHelperAPI`` — compute_pretrends_power + compute_mdv accept
+  violation_weights + pretest_form end-to-end (PR-B Step 6 regression).
+- ``TestPretrendsNISvsWald`` — NIS and Wald forms produce form-consistent
+  output; backwards-compat regression on the Wald path.
+- ``TestPretrendsParityR`` — R `pretrends` package parity (skips when
+  goldens at ``benchmarks/data/r_pretrends_golden.json`` are missing;
+  populated in PR-C).
+"""
+
+import json
+import os
+
+import numpy as np
+import pandas as pd
+import pytest
+from scipy import stats
+
+from diff_diff.pretrends import (
+    PreTrendsPower,
+    PreTrendsPowerResults,
+    compute_mdv,
+    compute_pretrends_power,
+)
+from diff_diff.sun_abraham import SunAbraham
+
+# =============================================================================
+# Shared fixtures
+# =============================================================================
+
+
+def _make_sa_panel(n_units_per_cohort=20, cohorts=(3, 4, 5), n_periods=6, seed=0):
+    """Build a staggered-adoption panel for SunAbraham fitting.
+
+    Default: 3 timing cohorts (3, 4, 5) of 20 units each + 20 never-treated,
+    panel length 6. K=3 pre-period event times ({-4, -3, -2}) across cohorts under default
+    `anticipation=0`. Null DGP (no real treatment effect) — useful for
+    SE-and-power tests without confounding.
+    """
+    rng = np.random.default_rng(seed)
+    rows = []
+    uid = 0
+    for g in cohorts:
+        for _ in range(n_units_per_cohort):
+            for t in range(1, n_periods + 1):
+                rows.append((uid, g, t))
+            uid += 1
+    for _ in range(n_units_per_cohort):
+        for t in range(1, n_periods + 1):
+            rows.append((uid, 0, t))
+        uid += 1
+    df = pd.DataFrame(rows, columns=["unit", "first_treat", "time"])
+    df["y"] = rng.normal(0, 0.5, len(df))
+    return df
+
+
+@pytest.fixture
+def sa_results():
+    """Fitted SunAbraham results on a 3-cohort + never-treated panel.
+
+    Returns a SunAbrahamResults with event_study_vcov populated (post-PR-B
+    Step 3 SA extension). Pre-periods at first-treated cohort g=3 are
+    {-2, -1} under default anticipation=0 — but the full event_study_vcov_index
+    spans {-4, -3, -2, 0, 1, 2, 3} across all cohorts.
+    """
+    df = _make_sa_panel()
+    return SunAbraham().fit(df, outcome="y", unit="unit", first_treat="first_treat", time="time")
+
+
+# =============================================================================
+# TestPretrendsHandCalculation — paper-equation closed-forms + small-K MC
+# =============================================================================
+
+
+class TestPretrendsHandCalculation:
+    """Closed-form sanity checks against Roth (2022) Section II.A-B equations."""
+
+    def test_z_critical_value_matches_paper_default(self):
+        """B_NIS critical value z_{1-α/2} = 1.96 at α=0.05 (Roth Eq. for B_NIS)."""
+        pt = PreTrendsPower(alpha=0.05, pretest_form="nis")
+        # The critical_value field on results is exactly z_{1-α/2} for NIS
+        # (set in _compute_power_nis).
+        # Build a minimal SunAbraham fit so we can extract it via the results.
+        df = _make_sa_panel(n_units_per_cohort=15)
+        sa_res = SunAbraham().fit(
+            df, outcome="y", unit="unit", first_treat="first_treat", time="time"
+        )
+        result = pt.fit(sa_res)
+        assert np.isclose(result.critical_value, 1.96, atol=0.01)
+
+    def test_nis_power_at_h0_matches_independent_normals_formula(self):
+        """Under H0 (M=0) with diagonal Σ, NIS power = 1 - (1 - α)^K.
+
+        Roth Section II.A: B_NIS is the joint individual-CI acceptance event.
+        Under H0 with independent normals, P(reject) = 1 - (1 - α)^K.
+        """
+        pt = PreTrendsPower(alpha=0.05, pretest_form="nis")
+        # K=3, independent Σ_22 = 0.25 * I, M=0 (null)
+        weights = np.array([1.0, 1.0, 1.0])
+        vcov_diag = np.eye(3) * 0.25
+        power, _, _, z_alpha = pt._compute_power_nis(0.0, weights, vcov_diag)
+        expected = 1.0 - (1.0 - 0.05) ** 3
+        assert np.isclose(power, expected, atol=0.005)
+        assert np.isclose(z_alpha, stats.norm.ppf(0.975), atol=1e-10)
+
+    def test_wald_power_at_h0_equals_alpha(self):
+        """Under H0 (M=0), Wald noncentral-χ² power = alpha (size).
+
+        Roth Section II.A: Wald form `W ~ χ²(K)` under H0 by construction;
+        rejection probability at the (1-α) chi-squared critical value is α.
+        """
+        pt = PreTrendsPower(alpha=0.05, pretest_form="wald")
+        weights = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)  # L2-normalized
+        vcov = np.eye(3) * 0.25
+        power, _, _, _ = pt._compute_power_wald(0.0, weights, vcov)
+        assert np.isclose(power, 0.05, atol=0.01)
+
+    def test_nis_power_matches_monte_carlo_K2_diagonal(self):
+        """NIS power via scipy MVN matches MC simulation at K=2, diag Σ_22."""
+        pt = PreTrendsPower(alpha=0.05, pretest_form="nis")
+        weights = np.array([1.0, 1.0])  # equal weights, K=2
+        vcov = np.eye(2) * 0.16  # σ = 0.4 each
+        M = 0.6
+
+        # Analytical via _compute_power_nis
+        power_analytical, _, _, z_alpha = pt._compute_power_nis(M, weights, vcov)
+
+        # MC: draw N samples from N(M * weights, vcov), check NIS rejection
+        rng = np.random.default_rng(42)
+        delta = M * weights
+        samples = rng.multivariate_normal(mean=delta, cov=vcov, size=50_000)
+        sigma = np.sqrt(np.diag(vcov))
+        reject = np.any(np.abs(samples) > z_alpha * sigma, axis=1)
+        power_mc = float(reject.mean())
+
+        # MC SE on N=50k with power ~ 0.5: ~0.002. Allow 0.01 tolerance.
+        assert np.isclose(
+            power_analytical, power_mc, atol=0.01
+        ), f"analytical={power_analytical:.4f}, mc={power_mc:.4f}"
+
+    def test_nis_power_matches_monte_carlo_K3_correlated(self):
+        """NIS power matches MC at K=3 with correlated Σ_22 (off-diagonals).
+
+        This is the regime where Wald and NIS genuinely differ; this test
+        checks the NIS analytical path against its simulation truth.
+        """
+        pt = PreTrendsPower(alpha=0.05, pretest_form="nis")
+        weights = np.array([1.0, 1.0, 1.0])
+        # ρ=0.3 equicorrelation, σ²=0.25
+        rho = 0.3
+        sigma2 = 0.25
+        vcov = sigma2 * (rho * np.ones((3, 3)) + (1 - rho) * np.eye(3))
+        M = 0.5
+
+        power_analytical, _, _, z_alpha = pt._compute_power_nis(M, weights, vcov)
+
+        rng = np.random.default_rng(123)
+        delta = M * weights
+        samples = rng.multivariate_normal(mean=delta, cov=vcov, size=50_000)
+        sigma_per = np.sqrt(np.diag(vcov))
+        reject = np.any(np.abs(samples) > z_alpha * sigma_per, axis=1)
+        power_mc = float(reject.mean())
+
+        assert np.isclose(
+            power_analytical, power_mc, atol=0.01
+        ), f"analytical={power_analytical:.4f}, mc={power_mc:.4f}"
+
+    def test_mdv_inversion_round_trip_nis(self):
+        """MDV(target_power) recovers target_power (within atol=0.01) when evaluated.
+
+        Both NIS and Wald: M = MDV computed at target_power=0.8 should give
+        power(M) ≈ 0.8.
+        """
+        for form in ("nis", "wald"):
+            pt = PreTrendsPower(alpha=0.05, power=0.80, pretest_form=form)
+            weights = np.array([3.0, 2.0, 1.0])
+            if form == "wald":
+                weights = weights / np.linalg.norm(weights)
+            vcov = np.eye(3) * 0.16
+            mdv = pt._compute_mdv(weights, vcov)
+            power_at_mdv = pt._compute_power(mdv, weights, vcov)[0]
+            assert np.isclose(
+                power_at_mdv, 0.80, atol=0.01
+            ), f"form={form}: MDV={mdv:.4f}, power(MDV)={power_at_mdv:.4f}"
+
+    def test_power_monotone_in_M_nis(self):
+        """NIS power is monotone non-decreasing in |M| (basic sanity)."""
+        pt = PreTrendsPower(pretest_form="nis")
+        weights = np.array([3.0, 2.0, 1.0])
+        vcov = np.eye(3) * 0.16
+        powers = [pt._compute_power_nis(M, weights, vcov)[0] for M in [0, 0.5, 1.0, 2.0]]
+        # Non-decreasing, up to 1e-10 numerical slack
+        for i in range(1, len(powers)):
+            assert powers[i] >= powers[i - 1] - 1e-10, f"NIS power not monotone: {powers}"
+
+    def test_mdv_nis_returns_zero_when_target_below_null_size(self):
+        """NIS MDV returns 0.0 when target_power ≤ null rejection probability.
+
+        NIS size under the null (with independent Σ) is `1 - (1-α)^K`, not α.
+        For α=0.05, K=3 that's ≈ 0.143. Calling MDV with target_power=0.10
+        should return 0.0 — no violation needed because the null already
+        rejects at the target rate. Pre-fix: `_compute_mdv_nis` silently
+        fell through to `M_high=1.0` because `brentq(0, 1)` raised
+        ValueError on the boundary (power_minus_target(0) > 0).
+        Post-fix: short-circuit at the boundary check.
+        """
+        pt = PreTrendsPower(alpha=0.05, power=0.10, pretest_form="nis")
+        weights = np.array([1.0, 1.0, 1.0])
+        vcov = np.eye(3) * 0.25  # diagonal, independence
+        mdv = pt._compute_mdv_nis(weights, vcov)
+        assert mdv == 0.0, f"target=0.10 < null size≈0.143; MDV should be 0.0, got {mdv}"
+
+    def test_nis_power_handles_non_finite_cdf_via_mc_fallback(self):
+        """NIS power_at falls back to MC when MVN CDF returns NaN (not just raises).
+
+        The pre-fix code only triggered MC fallback on ValueError /
+        LinAlgError exceptions; if scipy's Genz algorithm returns NaN
+        directly (e.g., extreme numerical degeneracy), the NaN propagated
+        through np.clip and into the MDV solver. Post-fix: explicit
+        `np.isfinite(accept_prob)` check triggers MC fallback uniformly.
+
+        We exercise this by monkey-patching `scipy.stats.multivariate_normal.cdf`
+        to return NaN; the helper should fall through to simulation and
+        produce a finite power in [0, 1].
+        """
+        from unittest.mock import patch
+
+        from diff_diff.pretrends import _compute_nis_acceptance_prob
+
+        weights = np.array([1.0, 1.0, 1.0])
+        vcov = np.eye(3) * 0.16
+
+        # Force the CDF to return NaN — verify MC fallback engages.
+        with patch(
+            "diff_diff.pretrends.stats.multivariate_normal.cdf",
+            return_value=float("nan"),
+        ):
+            accept_prob = _compute_nis_acceptance_prob(0.5, weights, vcov, 1.96)
+
+        # MC fallback should produce a valid probability in [0, 1].
+        assert np.isfinite(accept_prob), "MC fallback did not engage"
+        assert 0.0 <= accept_prob <= 1.0, f"MC accept_prob={accept_prob} out of [0, 1]"
+
+    def test_mdv_nis_nonconvergence_cap_returns_inf(self):
+        """NIS MDV returns ∞ when target power is unreachable in M ≤ 1000.
+
+        With K=1 and σ = 1e4, the per-period acceptance prob remains very
+        close to 1-α even at M=1000 (since δ/σ = 0.1 is still small relative
+        to z=1.96). Power stays below target=0.99 throughout the doubling
+        expansion → 1000-cap fires → return ∞.
+
+        The Wald path's 1000-cap is on the noncentrality parameter and is
+        structurally impossible to trigger for any finite target_power < 1
+        on a finite-Σ scalar problem (ncx2.sf(cv, K, nc=1000) → 1 quickly),
+        so we test the cap only on the NIS path.
+        """
+        pt = PreTrendsPower(alpha=0.05, power=0.99, pretest_form="nis")
+        weights = np.array([1.0])
+        vcov = np.array([[1e8]])  # σ = 1e4
+        mdv = pt._compute_mdv_nis(weights, vcov)
+        assert np.isinf(mdv), f"NIS MDV cap should return ∞, got {mdv}"
+
+    def test_mdv_nis_finite_root_at_doubling_endpoint(self):
+        """NIS MDV returns a finite root even when M_high lands at the 1024 cap.
+
+        Concrete counter-example from R2 codex review: with σ ≈ 224
+        (vcov=[[50000]]) and target_power=0.8, the doubling expansion
+        sweeps M_high = 1, 2, 4, ..., 512, 1024. Power(M=512) ≈ 0.63 < 0.8
+        and power(M=1024) ≈ 0.996 > 0.8, so the root sits in [512, 1024].
+        Pre-fix the cap-check fired on the >=1000 condition and returned
+        inf even though brentq could have bracketed the finite root.
+        Post-fix the cap-check only triggers when power(M_high) is still
+        below target — finite-root cases pass through to brentq.
+        """
+        pt = PreTrendsPower(alpha=0.05, power=0.8, pretest_form="nis")
+        weights = np.array([1.0])
+        vcov = np.array([[50000.0]])  # σ ≈ 223.6, root in [512, 1024]
+        mdv = pt._compute_mdv_nis(weights, vcov)
+        assert np.isfinite(mdv), f"finite-root case should NOT return ∞, got {mdv}"
+        assert 512.0 < mdv < 1024.0, f"root expected in (512, 1024), got {mdv}"
+        # Spot-check: the brentq result actually achieves target power.
+        achieved, _, _, _ = pt._compute_power_nis(mdv, weights, vcov)
+        assert abs(achieved - 0.8) < 1e-3, f"brentq root power={achieved}, expected ≈ 0.8"
+
+
+# =============================================================================
+# TestPretrendsPropositions — Roth Props 1-4 numerical verification (MC)
+# =============================================================================
+
+
+class TestPretrendsPropositions:
+    """Roth (2022) Props 1 and 4 verified by Monte Carlo; Prop 3 stated for reference.
+
+    These tests validate that the LIBRARY's downstream consumers can rely on
+    the conditional moments + variance reduction guarantees Roth proves. The
+    library does not compute conditional moments in production code (it only
+    needs the box probability for power), but the methodology test file
+    exercises them via simulation to lock the contract that future audit
+    rounds can compare against.
+
+    Roth Proposition 1 (Section II.B):
+        E[β̂_post | β̂_pre ∈ B(Σ)] = τ_post + δ_post
+          + Σ_{12} Σ_{22}^{-1} ( E[β̂_pre | β̂_pre ∈ B(Σ)] - β_pre )
+
+    Roth Proposition 3 (Section II.C):
+        Var[β̂_post | β̂_pre ∈ B(Σ)]
+          = Var[β̂_post] + (Σ_{12} Σ_{22}^{-1}) (Var[β̂_pre | β̂_pre ∈ B(Σ)]
+            - Var[β̂_pre]) (Σ_{12} Σ_{22}^{-1})'
+
+    Roth Proposition 4 (Section II.C): for convex B(Σ),
+        Var[β̂_post | β̂_pre ∈ B(Σ)] ≤ Var[β̂_post]
+    """
+
+    @pytest.mark.slow
+    def test_proposition_1_conditional_mean_matches_mc(self):
+        """Prop 1: conditional mean E[β̂_post | NIS] matches MC at atol=0.01."""
+        # Simple joint normal setup: K=2 pre-periods, M=1 post-period
+        rng = np.random.default_rng(0)
+        K, M_post = 2, 1
+        # Σ structure: K+M-dim joint covariance
+        # Block form: Σ = [[Σ_post, Σ_post,pre], [Σ_pre,post, Σ_pre]]
+        sigma_pre = np.eye(K) * 0.16
+        sigma_post = np.eye(M_post) * 0.16
+        sigma_cross = 0.05 * np.ones((M_post, K))  # post-pre covariance
+        # Build full joint Σ via block stacking — but for the test we just need
+        # the regression coefficient Σ_{12} Σ_{22}^{-1} from post-on-pre.
+        # Truth: β_pre = (0.3, 0.2), τ_post = 0, δ_post = 0.1
+        beta_pre = np.array([0.3, 0.2])
+        tau_post = np.array([0.0])
+        delta_post = np.array([0.1])
+
+        # Draw N samples from joint normal
+        N = 200_000
+        # Sample jointly via NumPy's Generator with mean = [beta_post; beta_pre]
+        # beta_post = tau_post + delta_post under Roth's decomposition
+        mean_post = tau_post + delta_post
+        full_mean = np.concatenate([mean_post, beta_pre])
+        full_cov = np.block(
+            [
+                [sigma_post, sigma_cross],
+                [sigma_cross.T, sigma_pre],
+            ]
+        )
+        joint = rng.multivariate_normal(full_mean, full_cov, size=N)
+        beta_post_samples = joint[:, :M_post]
+        beta_pre_samples = joint[:, M_post:]
+
+        # NIS acceptance: |β̂_pre,t| ≤ 1.96 σ_t for all t
+        sigma_pre_diag = np.sqrt(np.diag(sigma_pre))
+        accept = np.all(np.abs(beta_pre_samples) <= 1.96 * sigma_pre_diag, axis=1)
+        cond_post_mean_mc = beta_post_samples[accept].mean(axis=0)
+
+        # Prop 1 prediction
+        cond_pre_mean_mc = beta_pre_samples[accept].mean(axis=0)
+        gamma = sigma_cross @ np.linalg.inv(sigma_pre)
+        prop1_prediction = tau_post + delta_post + gamma @ (cond_pre_mean_mc - beta_pre)
+
+        # MC standard error at this N is ~1e-3 (accept rate ~0.8), well inside atol=0.01.
+        assert np.allclose(
+            cond_post_mean_mc, prop1_prediction, atol=0.01
+        ), f"MC={cond_post_mean_mc}, Prop1={prop1_prediction}"
+
+    @pytest.mark.slow
+    def test_proposition_4_variance_reduction_under_convex_B(self):
+        """Prop 4: Var[β̂_post | β̂_pre ∈ B_NIS] ≤ Var[β̂_post] (B_NIS convex).
+
+        B_NIS is convex (a Cartesian product of intervals), so Prop 4 applies.
+        """
+        rng = np.random.default_rng(1)
+        K, M_post = 3, 1
+        sigma_pre = np.eye(K) * 0.16
+        sigma_post = np.eye(M_post) * 0.16
+        sigma_cross = 0.04 * np.ones((M_post, K))
+        full_cov = np.block(
+            [
+                [sigma_post, sigma_cross],
+                [sigma_cross.T, sigma_pre],
+            ]
+        )
+        # Parallel trends: β_pre = 0 → δ_pre = 0
+        full_mean = np.zeros(K + M_post)
+        N = 200_000
+        joint = rng.multivariate_normal(full_mean, full_cov, size=N)
+        beta_post_samples = joint[:, :M_post]
+        beta_pre_samples = joint[:, M_post:]
+
+        sigma_pre_diag = np.sqrt(np.diag(sigma_pre))
+        accept = np.all(np.abs(beta_pre_samples) <= 1.96 * sigma_pre_diag, axis=1)
+
+        var_unconditional = float(beta_post_samples.var(ddof=1))
+        var_conditional = float(beta_post_samples[accept].var(ddof=1))
+
+        # Prop 4: conditional variance should be NO LARGER than unconditional.
+        # Allow small MC slop.
+        assert (
+            var_conditional <= var_unconditional + 0.01
+        ), f"Prop 4 violated: unc={var_unconditional:.4f}, cond={var_conditional:.4f}"
+
+
+# =============================================================================
+# TestPretrendsLinearGrid — γ-unit MDV (PR-B Step 4 regression)
+# =============================================================================
+
+
+class TestPretrendsLinearGrid:
+    """Linear weights honor actual pre-period relative-time labels.
+
+    PR-B Step 4 closed the PR-A linear-pattern deviation by threading
+    `relative_times` through `_get_violation_weights('linear')` and skipping
+    L2 normalization on that path so the reported MDV is in Roth's γ units.
+    """
+
+    def test_regular_grid_produces_decreasing_weights(self):
+        """Regular grid [-3, -2, -1] → linear weights = |t| = [3, 2, 1]."""
+        pt = PreTrendsPower(violation_type="linear", pretest_form="nis")
+        weights = pt._get_violation_weights(3, relative_times=np.array([-3, -2, -1]))
+        np.testing.assert_allclose(weights, [3.0, 2.0, 1.0])
+
+    def test_irregular_grid_reflects_actual_spacing(self):
+        """Irregular grid [-5, -3, -1] → weights = [5, 3, 1] (not [3, 2, 1])."""
+        pt = PreTrendsPower(violation_type="linear", pretest_form="nis")
+        weights = pt._get_violation_weights(3, relative_times=np.array([-5, -3, -1]))
+        np.testing.assert_allclose(weights, [5.0, 3.0, 1.0])
+
+    def test_max_abs_pre_violation_uses_weight_scale_on_irregular_grid(self):
+        """PR-B R12 P1 regression: ``PreTrendsPowerResults.max_abs_pre_violation``
+        scales raw γ-unit ``mdv`` by ``max(|violation_weights|)`` so the
+        level-scale comparison against ``|att|`` / per-period SEs is
+        unit-consistent.
+
+        On an irregular grid ``[-5, -3, -1]`` with linear weights
+        ``[5, 3, 1]``, the largest level-scale pre-period violation under
+        the MDV is ``mdv * 5``, NOT ``mdv * 1`` (the wrong unit-mixed
+        scalar the report layer used pre-R12). Locks the architectural
+        fix: raw γ should NEVER be compared to a level effect; always go
+        through ``max_abs_pre_violation``.
+
+        Uses synthetic Σ_22 + sa_results-shaped inputs so the fixture
+        runs deterministically across pure-Python and Rust backends.
+        """
+        from diff_diff.pretrends import _coerce_relative_times_from_reference
+
+        # Smoke-call the helper on the irregular grid (return value is not asserted).
+        _ = _coerce_relative_times_from_reference([-5, -3, -1], 0)
+
+        # K=3, ρ=0.4 equicorrelated, σ²=0.04 → moderate-power regime
+        # so we get a finite mdv and can spot-check the level-scale scalar.
+        K = 3
+        rho = 0.4
+        sigma2 = 0.04
+        vcov = sigma2 * (rho * np.ones((K, K)) + (1 - rho) * np.eye(K))
+
+        # Construct a synthetic result skeleton directly to exercise the
+        # max_abs_pre_violation property end-to-end.
+        relative_times = np.array([-5.0, -3.0, -1.0])
+        pt = PreTrendsPower(violation_type="linear", pretest_form="nis", power=0.5)
+        weights = pt._get_violation_weights(3, relative_times=relative_times)
+        np.testing.assert_allclose(weights, [5.0, 3.0, 1.0])
+
+        mdv = pt._compute_mdv_nis(weights, vcov)
+        assert np.isfinite(mdv), f"MDV should be finite, got {mdv}"
+
+        # Hand-construct the result with the right weights field so the
+        # property exercises the new code path. Use minimal repr=False
+        # field placeholders.
+        from diff_diff.pretrends import PreTrendsPowerResults
+
+        res = PreTrendsPowerResults(
+            power=0.5,
+            mdv=mdv,
+            violation_magnitude=mdv,
+            violation_type="linear",
+            alpha=0.05,
+            target_power=0.5,
+            n_pre_periods=3,
+            test_statistic=np.nan,
+            critical_value=1.96,
+            noncentrality=np.nan,
+            pre_period_effects=np.zeros(3),
+            pre_period_ses=np.full(3, np.sqrt(sigma2)),
+            vcov=vcov,
+            violation_weights=weights,
+            covariance_source="full_pre_period_vcov",
+        )
+        # Level-scale scalar: mdv * max(|weights|) = mdv * 5 (the
+        # `t=-5` slot dominates on irregular grids).
+        expected = float(mdv * 5.0)
+        assert np.isclose(res.max_abs_pre_violation, expected, atol=1e-10), (
+            f"max_abs_pre_violation={res.max_abs_pre_violation} should equal "
+            f"mdv * max(|w|) = {expected} on irregular grid [-5, -3, -1]"
+        )
+        # Sanity: raw mdv is materially smaller — confirms the unit-fix
+        # actually moves the scalar (regression against a future revert
+        # back to raw γ).
+        assert (
+            res.max_abs_pre_violation > 4 * mdv
+        ), "max_abs_pre_violation must scale by max(|w|)=5, not collapse to mdv"
+
+    def test_constant_violation_pattern_is_level_shift(self, sa_results):
+        """``violation_type='constant'`` produces a per-period level shift,
+        not an L2-normalized direction (PR-B R13 fix).
+
+        REGISTRY ``## PreTrendsPower`` documents constant as ``δ_t = c``.
+        The implementation now returns unnormalized ``[1, 1, ..., 1]``
+        weights so the contract holds at the public API surface:
+
+        - ``violation_weights == [1, 1, ..., 1]`` after fit (no L2 norm).
+        - ``max_abs_pre_violation == mdv * 1 == mdv`` (level-scale and
+          γ-scale coincide for the constant pattern).
+        - ``power_at(M)`` evaluates the violation `δ_t = M` per period,
+          not `δ_t = M/√K`.
+
+        Pre-PR-B-R13 the constant path was silently divided by √K,
+        so a constant MDV of 0.5 was a per-period shift of 0.5/√K,
+        not 0.5 as the docs claimed. Locks the level-shift contract
+        end-to-end on a real fit.
+        """
+        pt = PreTrendsPower(violation_type="constant", pretest_form="nis")
+        result = pt.fit(sa_results)
+
+        n_pre = result.n_pre_periods
+        # Weights are exactly [1, 1, ..., 1] — NOT L2-normalized.
+        assert result.violation_weights is not None
+        np.testing.assert_allclose(result.violation_weights, np.ones(n_pre))
+        # L2 norm of weights is √K, not 1.
+        assert np.isclose(np.linalg.norm(result.violation_weights), np.sqrt(n_pre))
+        # Level-scale max coincides with raw mdv (max(|w|) = 1).
+        assert np.isclose(result.max_abs_pre_violation, result.mdv)
+
+        # power_at(M) round-trip: under the level-shift contract,
+        # power_at(0.1) for constant must equal the power of a refit at M=0.1.
+        # Loose atol because scipy MVN CDF and the centered helper take
+        # slightly different paths that differ by ~1e-6 numerically.
+        refit = pt.fit(sa_results, M=0.1)
+        assert np.isclose(result.power_at(0.1), refit.power, atol=1e-4)
+
+    def test_is_informative_uses_level_scale_not_raw_gamma(self):
+        """``is_informative`` consumes ``max_abs_pre_violation`` (level scale)
+        rather than raw ``mdv`` (slope scale) — locks the R12 fix on the
+        property surface so future regressions cannot flip back to the
+        wrong-unit heuristic.
+        """
+        from diff_diff.pretrends import PreTrendsPowerResults
+
+        # SE = 0.5 across pre-periods; MDV = 0.4 (raw γ); weights have
+        # max(|w|)=3 on a regular `[-3, -2, -1]` grid → level-scale max
+        # violation = 1.2, well above 2 * max(SE) = 1.0 → NOT informative.
+        res = PreTrendsPowerResults(
+            power=0.5,
+            mdv=0.4,
+            violation_magnitude=0.4,
+            violation_type="linear",
+            alpha=0.05,
+            target_power=0.5,
+            n_pre_periods=3,
+            test_statistic=np.nan,
+            critical_value=1.96,
+            noncentrality=np.nan,
+            pre_period_effects=np.zeros(3),
+            pre_period_ses=np.full(3, 0.5),
+            vcov=np.eye(3) * 0.25,
+            violation_weights=np.array([3.0, 2.0, 1.0]),
+        )
+        # max_abs_pre_violation = 0.4 * 3 = 1.2 > 2 * 0.5 = 1.0 → not informative
+        assert np.isclose(res.max_abs_pre_violation, 1.2, atol=1e-10)
+        assert res.is_informative is False, (
+            "raw mdv=0.4 < 2*SE=1.0 would say 'informative', but the level-scale "
+            "violation 1.2 > 1.0 says 'not informative' — the level-scale check wins"
+        )
+
+    def test_no_l2_normalization_when_relative_times_provided(self):
+        """Linear-with-relative_times skips L2 norm → ||weights||_2 ≠ 1."""
+        pt = PreTrendsPower(violation_type="linear", pretest_form="nis")
+        weights = pt._get_violation_weights(3, relative_times=np.array([-3, -2, -1]))
+        norm = np.linalg.norm(weights)
+        # Norm should NOT be 1.0 — that's the bug we're regressing against.
+        assert (
+            norm > 1.5
+        ), f"Linear-with-relative_times should NOT be L2-normalized, got ||·||_2 = {norm}"
+
+    def test_mpd_calendar_period_ids_derive_relative_times_from_reference(self):
+        """MPD calendar period IDs are correctly converted to Roth relative times.
+
+        For MPD with `pre_periods=[0, 1, 2, 3]` and `reference_period=4`,
+        the Roth-style relative times are `[-4, -3, -2, -1]`, not the raw
+        period IDs `[0, 1, 2, 3]`. Pre-fix: the MPD adapter passed raw
+        period IDs into `_get_violation_weights` as relative times,
+        producing linear weights `[0, 1, 2, 3]` instead of Roth-style
+        `[4, 3, 2, 1]`. Post-fix: derive
+        `relative_times = estimated_pre_periods - reference_period`.
+
+        Constructs a real ``MultiPeriodDiDResults`` and calls
+        ``_extract_pre_period_params`` directly so the MPD branch is
+        actually exercised (R2 P2 fix — prior version did manual
+        arithmetic and never hit the production code path).
+        """
+        from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
+
+        period_ids = [0, 1, 2, 3]
+        reference_period = 4
+
+        period_effects = {
+            p: PeriodEffect(
+                period=p, effect=0.1 * p, se=0.2, t_stat=0.0, p_value=0.5, conf_int=(0.0, 0.0)
+            )
+            for p in period_ids
+        }
+        mpd_results = MultiPeriodDiDResults(
+            period_effects=period_effects,
+            avg_att=0.0,
+            avg_se=0.2,
+            avg_t_stat=0.0,
+            avg_p_value=0.5,
+            avg_conf_int=(0.0, 0.0),
+            n_obs=100,
+            n_treated=50,
+            n_control=50,
+            pre_periods=period_ids,
+            post_periods=[5, 6, 7],
+            reference_period=reference_period,
+        )
+
+        pt = PreTrendsPower(pretest_form="nis", violation_type="linear")
+        (
+            _,
+            ses,
+            vcov,
+            n_pre,
+            relative_times,
+            covariance_source,
+        ) = pt._extract_pre_period_params(mpd_results)
+
+        # End-to-end assertion: the MPD branch produced Roth-style relative
+        # times derived from `reference_period`, not the raw period IDs.
+        assert relative_times is not None, "MPD branch should produce relative_times"
+        np.testing.assert_allclose(relative_times, [-4.0, -3.0, -2.0, -1.0])
+        assert n_pre == 4
+        # vcov falls through to diag(ses**2) because the mock has no
+        # interaction_indices and no full vcov.
+        np.testing.assert_allclose(np.diag(vcov), np.array(ses) ** 2)
+        # MPD without `interaction_indices` records the diag-fallback source.
+        assert covariance_source == "diag_fallback"
+
+        # Plumbed through to _get_violation_weights: weights = |t| = [4, 3, 2, 1].
+        weights = pt._get_violation_weights(n_pre, relative_times=relative_times)
+        np.testing.assert_allclose(weights, [4.0, 3.0, 2.0, 1.0])
+
+    def test_mpd_non_numeric_reference_warns_and_falls_back_to_legacy_weights(self):
+        """MPD with non-numeric reference_period warns + falls back to legacy.
+
+        When ``reference_period`` is a genuinely non-numeric / non-datetime
+        label (e.g., the string "REF_STRING"), the MPD branch emits an
+        explicit ``UserWarning`` and returns ``relative_times=None`` so
+        ``_get_violation_weights('linear')`` uses the legacy count-based
+        direction. The warning surfaces the contract that the reported
+        MDV is NOT in Roth's γ units under this fallback (R8 CI codex
+        fix: was previously a silent fallback, undocumented as a
+        deviation in REGISTRY).
+        """
+        import warnings as _warnings
+
+        from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
+
+        period_ids = ["A", "B", "C"]
+        period_effects = {
+            p: PeriodEffect(
+                period=p, effect=0.1, se=0.2, t_stat=0.0, p_value=0.5, conf_int=(0.0, 0.0)
+            )
+            for p in period_ids
+        }
+        mpd_results = MultiPeriodDiDResults(
+            period_effects=period_effects,
+            avg_att=0.0,
+            avg_se=0.2,
+            avg_t_stat=0.0,
+            avg_p_value=0.5,
+            avg_conf_int=(0.0, 0.0),
+            n_obs=100,
+            n_treated=50,
+            n_control=50,
+            pre_periods=period_ids,
+            post_periods=["D", "E"],
+            reference_period="REF_STRING",  # non-numeric, non-datetime
+        )
+
+        pt = PreTrendsPower(pretest_form="nis", violation_type="linear")
+        with _warnings.catch_warnings(record=True) as caught:
+            _warnings.simplefilter("always")
+            _, _, _, _, relative_times, _ = pt._extract_pre_period_params(mpd_results)
+
+        assert relative_times is None, "Non-numeric reference should yield None"
+        nis_warns = [
+            w
+            for w in caught
+            if "reference_period" in str(w.message) and "γ units" in str(w.message)
+        ]
+        assert len(nis_warns) >= 1, (
+            "Non-numeric reference_period must emit an explicit UserWarning "
+            f"noting the γ-unit contract is not held; got warnings: {[str(w.message) for w in caught]}"
+        )
+
+    def test_mpd_pandas_period_reference_yields_numeric_relative_times(self):
+        """MPD with pandas.Period reference_period produces γ-unit weights.
+
+        Quarterly-Period labels ``[2019Q1, 2019Q2, 2019Q3]`` with
+        ``reference_period=2019Q4`` produce relative offsets in units of
+        quarters: ``[-3, -2, -1]``. Validates the R8 CI codex fix that
+        datetime-like labels do NOT silently fall through: Period
+        / Timestamp arithmetic supplies the γ-unit relative times the
+        legacy fallback would have lost.
+        """
+        from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
+
+        periods = [pd.Period(f"2019Q{q}", freq="Q") for q in (1, 2, 3)]
+        reference_period = pd.Period("2019Q4", freq="Q")
+        period_effects = {
+            p: PeriodEffect(
+                period=p, effect=0.1, se=0.2, t_stat=0.0, p_value=0.5, conf_int=(0.0, 0.0)
+            )
+            for p in periods
+        }
+        mpd_results = MultiPeriodDiDResults(
+            period_effects=period_effects,
+            avg_att=0.0,
+            avg_se=0.2,
+            avg_t_stat=0.0,
+            avg_p_value=0.5,
+            avg_conf_int=(0.0, 0.0),
+            n_obs=100,
+            n_treated=50,
+            n_control=50,
+            pre_periods=periods,
+            post_periods=[pd.Period(f"2020Q{q}", freq="Q") for q in (1, 2)],
+            reference_period=reference_period,
+        )
+
+        pt = PreTrendsPower(pretest_form="nis", violation_type="linear")
+        _, _, _, n_pre, relative_times, _ = pt._extract_pre_period_params(mpd_results)
+
+        # Period subtraction yields a pandas DateOffset whose `.n` is the
+        # number-of-frequencies difference; signs matter and pre-periods
+        # are NEGATIVE offsets from the reference.
+        assert relative_times is not None
+        np.testing.assert_allclose(relative_times, [-3.0, -2.0, -1.0])
+
+        # Plumbed through to linear weights: |t| = [3, 2, 1] in γ units.
+        weights = pt._get_violation_weights(n_pre, relative_times=relative_times)
+        np.testing.assert_allclose(weights, [3.0, 2.0, 1.0])
+
+    def test_backwards_compat_no_relative_times_uses_legacy_normalized(self):
+        """Without relative_times: legacy [n-1, ..., 0]/||·||_2 direction.
+
+        Preserves the pre-PR-B shipped behavior for callers that bypass fit()
+        and call _get_violation_weights(n_pre) directly without relative_times.
+        """
+        pt = PreTrendsPower(violation_type="linear", pretest_form="nis")
+        weights = pt._get_violation_weights(3)  # no relative_times
+        # Legacy: [2, 1, 0] / sqrt(5) = [0.894, 0.447, 0]
+        expected_legacy = np.array([2.0, 1.0, 0.0]) / np.sqrt(5.0)
+        np.testing.assert_allclose(weights, expected_legacy, atol=1e-10)
+
+
+# =============================================================================
+# TestPretrendsCustomWeightPersistence — power_at(custom) (PR-B Step 5)
+# =============================================================================
+
+
+class TestPretrendsCustomWeightPersistence:
+    """Custom violation weights are persisted on PreTrendsPowerResults.
+
+    Per PR-B Step 5, the new ``violation_weights`` field on the result class
+    enables ``power_at(M)`` to work for ``violation_type='custom'`` without
+    re-fitting (lifting the PR-A R18 NotImplementedError guard for fresh fits).
+    """
+
+    def test_custom_weights_stored_on_results(self, sa_results):
+        """After fit, results.violation_weights matches the L2-normalized input.
+
+        The custom path in ``_get_violation_weights`` L2-normalizes the input
+        weights to unit norm before fitting. The persisted
+        ``violation_weights`` field on the result reflects the NORMALIZED
+        weights (matching what `power_at()` and `_compute_power_*` actually
+        operated on).
+        """
+        # Probe via a linear fit to learn n_pre (panel-dependent).
+        probe = PreTrendsPower(violation_type="linear", pretest_form="nis").fit(sa_results)
+        n_pre = probe.n_pre_periods
+        # Build a length-n_pre custom weights vector deterministically.
+        custom_w_raw = np.linspace(0.1, 0.6, n_pre)
+        custom_w_normalized = custom_w_raw / np.linalg.norm(custom_w_raw)
+
+        pt = PreTrendsPower(
+            violation_type="custom", violation_weights=custom_w_raw, pretest_form="nis"
+        )
+        result = pt.fit(sa_results)
+        assert result.violation_weights is not None
+        np.testing.assert_allclose(result.violation_weights, custom_w_normalized)
+
+    def test_power_at_custom_matches_refit(self, sa_results):
+        """results.power_at(M) for custom matches a fresh fit(M=M)."""
+        probe = PreTrendsPower(violation_type="linear", pretest_form="nis").fit(sa_results)
+        n_pre = probe.n_pre_periods
+        custom_w = np.array([0.2, 0.3, 0.5][:n_pre])
+        if len(custom_w) < n_pre:
+            custom_w = np.concatenate([custom_w, np.zeros(n_pre - len(custom_w))])
+
+        pt = PreTrendsPower(violation_type="custom", violation_weights=custom_w, pretest_form="nis")
+        results_base = pt.fit(sa_results)
+        results_at_target = pt.fit(sa_results, M=0.5)
+
+        power_via_method = results_base.power_at(0.5)
+        power_via_refit = results_at_target.power
+
+        # Tight tolerance — both paths use the same _compute_power_nis call.
+        assert np.isclose(
+            power_via_method, power_via_refit, atol=1e-6
+        ), f"power_at={power_via_method:.6f}, refit={power_via_refit:.6f}"
+
+    def test_to_dict_is_json_serializable(self, sa_results):
+        """PR-B R5 regression: ``to_dict()`` must produce JSON-serializable
+        output. ``violation_weights`` is emitted as ``list[float]`` (not raw
+        ``np.ndarray``) so ``json.dumps`` works out of the box.
+
+        Pre-R5 the dict carried a raw ``np.ndarray`` for ``violation_weights``;
+        ``json.dumps(result.to_dict())`` raised ``TypeError``. Post-R5 the
+        helper coerces to a Python list of floats.
+        """
+        probe = PreTrendsPower(violation_type="linear", pretest_form="nis").fit(sa_results)
+        n_pre = probe.n_pre_periods
+        custom_w = np.linspace(0.1, 0.6, n_pre)
+
+        pt = PreTrendsPower(violation_type="custom", violation_weights=custom_w, pretest_form="nis")
+        result = pt.fit(sa_results)
+
+        d = result.to_dict()
+        # Type contract: violation_weights round-trips as list[float] or None.
+        assert isinstance(d["violation_weights"], list)
+        for w in d["violation_weights"]:
+            assert isinstance(w, float)
+
+        # End-to-end JSON round-trip. Any NaN fields are emitted as the
+        # non-standard `NaN` token, which allow_nan=True (the default) permits.
+        encoded = json.dumps(d, allow_nan=True)
+        decoded = json.loads(encoded)
+        # Spot-check provenance fields round-trip intact.
+        assert decoded["covariance_source"] == result.covariance_source
+        assert decoded["pretest_form"] == result.pretest_form
+
+
+# =============================================================================
+# TestPretrendsCovarianceSource — CS/SA full-VCV routing (PR-B Step 3)
+# =============================================================================
+
+
+class TestPretrendsCovarianceSource:
+    """CS and SA adapters route through event_study_vcov on non-bootstrap fits.
+
+    Pre-PR-B, both CS and SA branches in _extract_pre_period_params hard-coded
+    diag(ses^2). PR-B Step 3 added the W-matrix construction for SA and
+    routed both branches through the new module-level helper
+    _extract_event_study_vcov_subblock when event_study_vcov is available.
+    """
+
+    def test_sa_non_bootstrap_persists_event_study_vcov(self, sa_results):
+        """SunAbrahamResults.event_study_vcov is populated on non-bootstrap fits."""
+        assert sa_results.event_study_vcov is not None
+        assert sa_results.event_study_vcov_index is not None
+        # Shape: |event_times| × |event_times|
+        n_et = len(sa_results.event_study_vcov_index)
+        assert sa_results.event_study_vcov.shape == (n_et, n_et)
+        # Symmetric
+        np.testing.assert_allclose(
+            sa_results.event_study_vcov, sa_results.event_study_vcov.T, atol=1e-12
+        )
+
+    def test_sa_event_study_vcov_diagonal_matches_per_event_se(self, sa_results):
+        """event_study_vcov diagonal[i, i] = se(e_i)^2 (W-matrix sanity).
+
+        The diagonal entries should reproduce the existing per-event-time SE
+        computation in _compute_iw_effects at atol=1e-10.
+        """
+        es_vcov = sa_results.event_study_vcov
+        es_index = sa_results.event_study_vcov_index
+        for i, e in enumerate(es_index):
+            diag_se = float(np.sqrt(es_vcov[i, i]))
+            es_effect = sa_results.event_study_effects.get(e, {})
+            if "se" in es_effect:
+                assert np.isclose(
+                    diag_se, es_effect["se"], atol=1e-10
+                ), f"e={e}: diag_se={diag_se}, es_effects[e][se]={es_effect['se']}"
+
+    def test_sa_pretrends_consumes_full_vcov_not_diag(self, sa_results):
+        """compute_pretrends_power on SA uses the full sub-VCV, not diag(ses^2)."""
+        from diff_diff.pretrends import _extract_event_study_vcov_subblock
+
+        # The new helper should produce a sub-block that differs from the
+        # diag(ses**2) fallback IF the off-diagonals are nonzero.
+        # Find the pre-periods of the SA panel.
+        pre_periods = [t for t in sa_results.event_study_effects if t < 0]
+        if not pre_periods:
+            pytest.skip("No pre-periods in fixture")
+
+        ses = np.array([sa_results.event_study_effects[t]["se"] for t in sorted(pre_periods)])
+        sub, source = _extract_event_study_vcov_subblock(sa_results, sorted(pre_periods), ses)
+        diag_fallback = np.diag(ses**2)
+
+        # Source label reflects the full-VCV path being actually taken.
+        assert source == "full_pre_period_vcov"
+        # The sub-block should NOT equal diag_fallback (assuming the panel
+        # produces nonzero off-diagonal cohort overlap). At minimum the shape matches.
+        assert sub.shape == diag_fallback.shape
+        # Off-diagonals should generally be nonzero (cohort weights overlap
+        # at adjacent event times).
+        off_diag_sum = float(np.abs(sub - np.diag(np.diag(sub))).sum())
+        assert off_diag_sum > 1e-8, (
+            "SA event_study_vcov sub-block has all-zero off-diagonals — "
+            "either the panel is degenerate or the W-matrix routing didn't fire."
+        )
+
+
+# =============================================================================
+# TestPretrendsHelperAPI — helper-API extension (PR-B Step 6)
+# =============================================================================
+
+
+class TestPretrendsHelperAPI:
+    """Helper functions accept violation_weights and pretest_form end-to-end."""
+
+    def test_compute_pretrends_power_accepts_violation_weights_custom(self, sa_results):
+        """compute_pretrends_power(..., violation_type='custom', violation_weights=...)"""
+        # Probe n_pre
+        probe = compute_pretrends_power(sa_results, violation_type="linear")
+        n_pre = probe.n_pre_periods
+
+        custom_w = np.arange(1, n_pre + 1, dtype=float)
+        custom_w = custom_w / np.linalg.norm(custom_w)  # arbitrary unit-norm direction
+
+        result = compute_pretrends_power(
+            sa_results,
+            violation_type="custom",
+            violation_weights=custom_w,
+        )
+        assert isinstance(result, PreTrendsPowerResults)
+        assert result.violation_type == "custom"
+        assert result.violation_weights is not None
+        np.testing.assert_allclose(result.violation_weights, custom_w)
+
+    def test_compute_mdv_accepts_violation_weights_custom(self, sa_results):
+        """compute_mdv mirrors compute_pretrends_power for custom support."""
+        probe = compute_pretrends_power(sa_results, violation_type="linear")
+        n_pre = probe.n_pre_periods
+        custom_w = np.arange(1, n_pre + 1, dtype=float)
+        custom_w = custom_w / np.linalg.norm(custom_w)
+
+        mdv = compute_mdv(sa_results, violation_type="custom", violation_weights=custom_w)
+        assert isinstance(mdv, float)
+        assert mdv >= 0
+
+    def test_compute_pretrends_power_accepts_pretest_form_wald(self, sa_results):
+        """pretest_form='wald' opt-in selects the Wald acceptance-region form.
+
+        Routes through ``_compute_power_wald`` / ``_compute_mdv_wald`` (the
+        renamed pre-PR-B math), preserving the noncentral-χ² ellipsoidal
+        acceptance region. NOTE: bit-identity to pre-PR-B numerical output
+        on a fitted result is only guaranteed on the legacy `relative_times=None`
+        path; new fits via `compute_pretrends_power(...)` thread `relative_times`
+        into both NIS and Wald linear-weight construction, so a Wald fit on an
+        irregular grid produces γ-unit MDV (not the pre-PR-B count-based L2-
+        normalized MDV). See REGISTRY `## PreTrendsPower` linear-pattern Note.
+        """
+        wald_result = compute_pretrends_power(sa_results, pretest_form="wald")
+        nis_result = compute_pretrends_power(sa_results, pretest_form="nis")
+
+        assert wald_result.pretest_form == "wald"
+        assert nis_result.pretest_form == "nis"
+        # Wald has a finite noncentrality; NIS has NaN noncentrality.
+        assert np.isfinite(wald_result.noncentrality)
+        assert np.isnan(nis_result.noncentrality)
+        # NIS has a finite box probability; Wald has NaN box probability.
+        assert np.isfinite(nis_result.nis_box_probability)
+        assert np.isnan(wald_result.nis_box_probability)
+
+
+# =============================================================================
+# TestPretrendsNISvsWald — form-comparison + backwards-compat (PR-B Step 2)
+# =============================================================================
+
+
+class TestPretrendsNISvsWald:
+    """NIS and Wald form-comparison; Wald backwards-compat regression."""
+
+    def test_default_pretest_form_is_nis(self):
+        """PR-B Step 2 flipped the default from implicit-Wald to explicit-NIS."""
+        pt = PreTrendsPower()
+        assert pt.pretest_form == "nis"
+
+    def test_wald_path_preserves_pre_pr_b_acceptance_region_form(self, sa_results):
+        """pretest_form='wald' preserves the pre-PR-B acceptance-region form.
+
+        The Wald math (noncentral-χ² on the quadratic form
+        ``δ' Σ_22^{-1} δ``) is byte-identical to pre-PR-B: the methods
+        are renamed to ``_compute_power_wald`` + ``_compute_mdv_wald``
+        with unchanged function bodies, and the dispatcher in
+        ``_compute_power`` / ``_compute_mdv`` selects this branch when
+        ``pretest_form='wald'``.
+
+        **Backward-compat scope**: this test locks the form-of-the-test
+        contract, NOT bit-identity to pre-PR-B fitted-result numerics.
+        Bit-identity for fitted results is regime-dependent:
+
+        - On the **legacy `relative_times=None` path** (callers that
+          bypass `fit()` and call `_get_violation_weights(n_pre)`
+          directly), the count-based L2-normalized direction is
+          unchanged, so Wald numerics ARE bit-identical to pre-PR-B.
+        - On the **new `fit()`-threaded path** (PR-B Step 4), both NIS
+          and Wald consume `relative_times` for linear violations and
+          skip L2 normalization → γ-unit MDV. A Wald fit on an
+          irregular grid `{-5, -3, -1}` therefore produces an MDV in
+          γ units that differs from pre-PR-B. See REGISTRY linear-pattern
+          Note for the convention.
+        """
+        pt = PreTrendsPower(pretest_form="wald")
+        result = pt.fit(sa_results)
+        # Wald-specific fields populated (acceptance-region form contract)
+        assert np.isfinite(result.noncentrality)
+        assert np.isfinite(result.test_statistic)
+        # NIS-specific fields are NaN under Wald
+        assert np.isnan(result.nis_box_probability)
+        # Power is in [0, 1]
+        assert 0.0 <= result.power <= 1.0
+
+    def test_nis_and_wald_differ_in_general(self):
+        """NIS and Wald produce different power at the same M (general case).
+
+        Under correlated Σ_22, the rectangular (NIS) and ellipsoidal (Wald)
+        acceptance regions cover different probability mass under H1. Use a
+        synthetic vcov with non-trivial off-diagonals at a small M so power
+        is well-inside (0, 1) and the differentiation is observable.
+        """
+        # K=3, ρ=0.6 equicorrelated, σ²=0.04 — moderate-power regime
+        rho = 0.6
+        sigma2 = 0.04
+        K = 3
+        vcov = sigma2 * (rho * np.ones((K, K)) + (1 - rho) * np.eye(K))
+        weights = np.array([3.0, 2.0, 1.0])
+        weights_wald = weights / np.linalg.norm(weights)
+
+        pt_nis = PreTrendsPower(pretest_form="nis")
+        pt_wald = PreTrendsPower(pretest_form="wald")
+
+        # Use a small M so power isn't saturated at 1
+        M = 0.3
+        power_nis, _, _, _ = pt_nis._compute_power_nis(M, weights, vcov)
+        power_wald, _, _, _ = pt_wald._compute_power_wald(M, weights_wald, vcov)
+
+        # The two forms should produce different power values
+        assert not np.isclose(power_nis, power_wald, atol=0.02), (
+            f"NIS and Wald produced essentially-equal power: "
+            f"NIS={power_nis:.4f}, Wald={power_wald:.4f}"
+        )
+
+
+# =============================================================================
+# TestPretrendsParityR — R parity (skips when goldens missing; PR-C)
+# =============================================================================
+
+
+@pytest.mark.skipif(
+    not os.path.exists(
+        os.path.join(
+            os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
+            "benchmarks",
+            "data",
+            "r_pretrends_golden.json",
+        )
+    ),
+    reason="R `pretrends` parity goldens not yet committed — see PR-C",
+)
+class TestPretrendsParityR:
+    """R `pretrends` package parity at `atol=1e-6`.
+
+    All tests skip when `benchmarks/data/r_pretrends_golden.json` is absent
+    (the canonical PR-B-vs-PR-C handoff: the generator script ships in PR-B
+    with a placeholder commit reference; PR-C pins the audited revision,
+    runs the script, commits the JSON, and activates these tests). See
+    REGISTRY.md `## PreTrendsPower` requirements checklist for the R-parity
+    deferred-to-PR-C status.
+    """
+
+    @staticmethod
+    def _load_r_golden():
+        path = os.path.join(
+            os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
+            "benchmarks",
+            "data",
+            "r_pretrends_golden.json",
+        )
+        with open(path) as f:
+            return json.load(f)
+
+    def test_nis_power_matches_r_pretrends_at_atol_1e_6(self):
+        """Python NIS power matches R `pretrends::pretrends()` at atol=1e-6.
+
+        Stub — PR-C populates with concrete fixture iteration.
+        """
+        goldens = self._load_r_golden()
+        for fixture_name, fixture in goldens.items():
+            if fixture_name == "meta":
+                continue
+            # PR-C will iterate fixture['panel'] + fixture['r_power_at_gamma'] etc.
+            assert isinstance(fixture, dict)
+
+    def test_mdv_gamma_p_matches_r_slope_for_power_at_atol_1e_6(self):
+        """Python MDV (γ_p) matches R `slope_for_power()` at atol=1e-6.
+
+        Stub — PR-C populates with concrete fixture iteration.
+        """
+        goldens = self._load_r_golden()
+        for fixture_name, fixture in goldens.items():
+            if fixture_name == "meta":
+                continue
+            assert isinstance(fixture, dict)
+
+    def test_irregular_grid_gamma_unit_matches_r(self):
+        """γ-unit MDV on irregular pre-period grids matches R at atol=1e-6.
+
+        Specifically tests the PR-B linear-units fix: irregular grid
+        {-5, -3, -1} should produce a γ value that R's pretrends package
+        also reports as the slope, not a normalized direction.
+
+        Stub — PR-C populates with concrete fixture iteration.
+        """
+        goldens = self._load_r_golden()
+        for fixture_name, fixture in goldens.items():
+            if fixture_name == "meta":
+                continue
+            assert isinstance(fixture, dict)
diff --git a/tests/test_pretrends.py b/tests/test_pretrends.py
index c42d305f..ce222576 100644
--- a/tests/test_pretrends.py
+++ b/tests/test_pretrends.py
@@ -19,7 +19,6 @@
 )
 from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
 
-
 # =============================================================================
 # Fixtures
 # =============================================================================
@@ -49,13 +48,15 @@ def simple_panel_data():
 
             y += np.random.normal(0, 0.5)
 
-            data.append({
-                'unit': unit,
-                'period': period,
-                'treated': int(is_treated),
-                'post': int(post),
-                'outcome': y
-            })
+            data.append(
+                {
+                    "unit": unit,
+                    "period": period,
+                    "treated": int(is_treated),
+                    "post": int(post),
+                    "outcome": y,
+                }
+            )
 
     return pd.DataFrame(data)
 
@@ -66,10 +67,10 @@ def multiperiod_results(simple_panel_data):
     mp_did = MultiPeriodDiD()
     results = mp_did.fit(
         simple_panel_data,
-        outcome='outcome',
-        treatment='treated',
-        time='period',
-        post_periods=[4, 5, 6, 7]
+        outcome="outcome",
+        treatment="treated",
+        time="period",
+        post_periods=[4, 5, 6, 7],
     )
     return results
 
@@ -86,53 +87,39 @@ def mock_multiperiod_results():
     # Pre-period effects (excluding reference period 3)
     period_effects = {
         0: PeriodEffect(
-            period=0, effect=0.1, se=0.5,
-            t_stat=0.2, p_value=0.84,
-            conf_int=(-0.88, 1.08)
+            period=0, effect=0.1, se=0.5, t_stat=0.2, p_value=0.84, conf_int=(-0.88, 1.08)
         ),
         1: PeriodEffect(
-            period=1, effect=-0.05, se=0.5,
-            t_stat=-0.1, p_value=0.92,
-            conf_int=(-1.03, 0.93)
+            period=1, effect=-0.05, se=0.5, t_stat=-0.1, p_value=0.92, conf_int=(-1.03, 0.93)
         ),
         2: PeriodEffect(
-            period=2, effect=0.08, se=0.5,
-            t_stat=0.16, p_value=0.87,
-            conf_int=(-0.90, 1.06)
+            period=2, effect=0.08, se=0.5, t_stat=0.16, p_value=0.87, conf_int=(-0.90, 1.06)
         ),
         # Period 3 is reference - not in period_effects
         # Post-period effects
         4: PeriodEffect(
-            period=4, effect=5.0, se=0.5,
-            t_stat=10.0, p_value=0.0001,
-            conf_int=(4.02, 5.98)
+            period=4, effect=5.0, se=0.5, t_stat=10.0, p_value=0.0001, conf_int=(4.02, 5.98)
         ),
         5: PeriodEffect(
-            period=5, effect=5.2, se=0.5,
-            t_stat=10.4, p_value=0.0001,
-            conf_int=(4.22, 6.18)
+            period=5, effect=5.2, se=0.5, t_stat=10.4, p_value=0.0001, conf_int=(4.22, 6.18)
         ),
         6: PeriodEffect(
-            period=6, effect=4.8, se=0.5,
-            t_stat=9.6, p_value=0.0001,
-            conf_int=(3.82, 5.78)
+            period=6, effect=4.8, se=0.5, t_stat=9.6, p_value=0.0001, conf_int=(3.82, 5.78)
         ),
         7: PeriodEffect(
-            period=7, effect=5.0, se=0.5,
-            t_stat=10.0, p_value=0.0001,
-            conf_int=(4.02, 5.98)
+            period=7, effect=5.0, se=0.5, t_stat=10.0, p_value=0.0001, conf_int=(4.02, 5.98)
         ),
     }
 
     # Coefficients for estimated periods (excludes reference period 3)
     coefficients = {
-        'treated:period_0': 0.1,
-        'treated:period_1': -0.05,
-        'treated:period_2': 0.08,
-        'treated:period_4': 5.0,
-        'treated:period_5': 5.2,
-        'treated:period_6': 4.8,
-        'treated:period_7': 5.0,
+        "treated:period_0": 0.1,
+        "treated:period_1": -0.05,
+        "treated:period_2": 0.08,
+        "treated:period_4": 5.0,
+        "treated:period_5": 5.2,
+        "treated:period_6": 4.8,
+        "treated:period_7": 5.0,
     }
 
     # Create vcov matrix (diagonal for simplicity)
@@ -240,15 +227,23 @@ def test_linear_weights(self):
         assert len(weights) == 4
 
     def test_constant_weights(self):
-        """Test constant violation weights."""
+        """Constant violation weights are ``[1, 1, ..., 1]`` (no L2 norm).
+
+        REGISTRY ``## PreTrendsPower`` documents ``δ_t = c`` (per-period
+        level shift) for the constant violation pattern; PR-B R13 fix
+        flipped ``_get_violation_weights('constant')`` to return the
+        unnormalized direction so ``δ_t = M`` exactly. The previous
+        L2-normalized ``[1/√K, ..., 1/√K]`` direction silently re-scaled
+        the reported MDV by ``1/√K`` relative to the documented contract.
+        """
         pt = PreTrendsPower(violation_type="constant")
         weights = pt._get_violation_weights(4)
 
-        # Should be normalized to unit norm
-        assert np.isclose(np.linalg.norm(weights), 1.0)
-        # All weights should be equal
-        assert np.allclose(weights[0], weights[1])
-        assert np.allclose(weights[1], weights[2])
+        # PR-B R13: unnormalized [1, 1, 1, 1] (NOT L2-normalized) so
+        # δ_t = M reflects a per-period level shift of magnitude M.
+        np.testing.assert_allclose(weights, [1.0, 1.0, 1.0, 1.0])
+        # L2 norm should be √K = 2 for K = 4, not 1.
+        assert np.isclose(np.linalg.norm(weights), 2.0)
 
     def test_last_period_weights(self):
         """Test last_period violation weights."""
@@ -278,8 +273,15 @@ class TestPowerComputation:
     """Tests for power computation."""
 
     def test_power_at_zero_equals_alpha(self):
-        """Test that power at M=0 equals alpha (size of test)."""
-        pt = PreTrendsPower(alpha=0.05)
+        """Test that power at M=0 equals alpha (size of test).
+
+        This is a Wald-form property: under H0, the noncentrality is 0 and
+        the rejection probability equals alpha exactly. Under NIS with
+        independent pre-period estimates, the joint rejection probability
+        at H0 is 1 - (1 - alpha)^K ≈ K*alpha for small alpha (~0.14 for
+        K=3 at alpha=0.05). Pin Wald to test the Wald-specific size property.
+        """
+        pt = PreTrendsPower(alpha=0.05, pretest_form="wald")
 
         # Create simple vcov
         n_pre = 3
@@ -395,16 +397,16 @@ def test_results_has_expected_attributes(self, mock_multiperiod_results):
         pt = PreTrendsPower()
         results = pt.fit(mock_multiperiod_results)
 
-        assert hasattr(results, 'power')
-        assert hasattr(results, 'mdv')
-        assert hasattr(results, 'violation_magnitude')
-        assert hasattr(results, 'violation_type')
-        assert hasattr(results, 'alpha')
-        assert hasattr(results, 'target_power')
-        assert hasattr(results, 'n_pre_periods')
-        assert hasattr(results, 'test_statistic')
-        assert hasattr(results, 'critical_value')
-        assert hasattr(results, 'noncentrality')
+        assert hasattr(results, "power")
+        assert hasattr(results, "mdv")
+        assert hasattr(results, "violation_magnitude")
+        assert hasattr(results, "violation_type")
+        assert hasattr(results, "alpha")
+        assert hasattr(results, "target_power")
+        assert hasattr(results, "n_pre_periods")
+        assert hasattr(results, "test_statistic")
+        assert hasattr(results, "critical_value")
+        assert hasattr(results, "noncentrality")
 
     def test_results_n_pre_periods(self, mock_multiperiod_results):
         """Test that n_pre_periods matches estimated pre-periods (excluding reference)."""
@@ -413,10 +415,13 @@ def test_results_n_pre_periods(self, mock_multiperiod_results):
 
         # n_pre_periods should be the number of estimated coefficients (3)
         # not the total number of pre-periods (4), since period 3 is the reference
-        expected_n_pre = len([
-            p for p in mock_multiperiod_results.pre_periods
-            if f"treated:period_{p}" in mock_multiperiod_results.coefficients
-        ])
+        expected_n_pre = len(
+            [
+                p
+                for p in mock_multiperiod_results.pre_periods
+                if f"treated:period_{p}" in mock_multiperiod_results.coefficients
+            ]
+        )
         assert results.n_pre_periods == expected_n_pre
         assert results.n_pre_periods == 3  # 4 pre-periods minus 1 reference
 
@@ -468,8 +473,8 @@ def test_power_curve_to_dataframe(self, mock_multiperiod_results):
         df = curve.to_dataframe()
 
         assert isinstance(df, pd.DataFrame)
-        assert 'M' in df.columns
-        assert 'power' in df.columns
+        assert "M" in df.columns
+        assert "power" in df.columns
 
 
 # =============================================================================
@@ -497,9 +502,9 @@ def test_results_to_dict(self, mock_multiperiod_results):
 
         d = results.to_dict()
         assert isinstance(d, dict)
-        assert 'power' in d
-        assert 'mdv' in d
-        assert 'violation_type' in d
+        assert "power" in d
+        assert "mdv" in d
+        assert "violation_type" in d
 
     def test_results_to_dataframe(self, mock_multiperiod_results):
         """Test to_dataframe method."""
@@ -524,26 +529,45 @@ def test_power_adequate_property(self, mock_multiperiod_results):
 
         assert isinstance(results.power_adequate, bool)
 
-    def test_power_at_raises_on_custom_violation_type(self, mock_multiperiod_results):
-        """power_at(M) must raise NotImplementedError for violation_type='custom'.
-
-        The PreTrendsPowerResults dataclass does not currently persist the
-        fitted violation_weights, so power_at() cannot reconstruct the
-        custom direction. To prevent silent wrong output (equal-weights
-        fallback), the method raises NotImplementedError and points users
-        to refit with the new M. See REGISTRY.md PreTrendsPower section's
-        silent-failure-guard Note, the audit at
-        docs/methodology/papers/roth-2022-review.md, and the TODO.md row
-        tracking the planned weight-persistence follow-up.
+    def test_power_at_works_for_custom_violation_type(self, mock_multiperiod_results):
+        """power_at(M) now works for custom violation type (PR-B Step 5).
+
+        PR-A R18 added a NotImplementedError guard because
+        PreTrendsPowerResults did not persist fitted violation_weights.
+        PR-B persisted them on the result dataclass and refactored
+        power_at() to read them directly. This test confirms the guard
+        is lifted for fresh fits: a custom-weights PreTrendsPower fit
+        produces a result whose power_at(M) returns a finite, in-[0,1]
+        power value.
         """
-        # mock_multiperiod_results has 4 pre-periods but period 3 is the
-        # reference, so n_pre_periods after fit is 3 (matches
-        # test_results_n_pre_periods expectation in this class).
         weights = np.array([0.1, 0.3, 0.6])
         pt = PreTrendsPower(violation_type="custom", violation_weights=weights)
         results = pt.fit(mock_multiperiod_results)
 
-        with pytest.raises(NotImplementedError, match="violation_type='custom'"):
+        # No longer raises; returns a finite power value in [0, 1].
+        power = results.power_at(0.5)
+        assert np.isfinite(power)
+        assert 0.0 <= power <= 1.0
+
+    def test_power_at_raises_on_legacy_custom_result_without_weights(
+        self, mock_multiperiod_results
+    ):
+        """power_at(M) still raises for old serialized results lacking
+        violation_weights (backwards-compat guard).
+
+        The dataclass default for violation_weights is None; old serialized
+        PreTrendsPowerResults objects from before PR-B's field addition will
+        have None there. For custom fits, power_at() cannot reconstruct
+        custom weights from violation_type + n_pre_periods alone, so the
+        PR-A R18 guard is retained for that specific backwards-compat path.
+        """
+        weights = np.array([0.1, 0.3, 0.6])
+        pt = PreTrendsPower(violation_type="custom", violation_weights=weights)
+        results = pt.fit(mock_multiperiod_results)
+        # Simulate a legacy-result scenario by clearing the persisted weights.
+        results.violation_weights = None
+
+        with pytest.raises(NotImplementedError, match="custom violation weights"):
             results.power_at(0.5)
 
 
@@ -564,15 +588,12 @@ def test_compute_pretrends_power(self, mock_multiperiod_results):
     def test_compute_pretrends_power_custom_params(self, mock_multiperiod_results):
         """Test compute_pretrends_power with custom parameters."""
         results = compute_pretrends_power(
-            mock_multiperiod_results,
-            alpha=0.10,
-            target_power=0.90,
-            violation_type='constant'
+            mock_multiperiod_results, alpha=0.10, target_power=0.90, violation_type="constant"
         )
 
         assert results.alpha == 0.10
         assert results.target_power == 0.90
-        assert results.violation_type == 'constant'
+        assert results.violation_type == "constant"
 
     def test_compute_mdv(self, mock_multiperiod_results):
         """Test compute_mdv function."""
@@ -581,29 +602,28 @@ def test_compute_mdv(self, mock_multiperiod_results):
         assert isinstance(mdv, float)
         assert mdv > 0
 
-    def test_compute_pretrends_power_rejects_custom_violation_type(
-        self, mock_multiperiod_results
-    ):
-        """compute_pretrends_power(..., violation_type='custom') must raise ValueError.
-
-        The helper does not accept ``violation_weights``, so a custom-type
-        call cannot supply the required weights vector. The underlying
-        PreTrendsPower constructor must raise to prevent the helper from
-        silently coercing a custom request into a degenerate fit. See
-        REGISTRY.md PreTrendsPower section + docs/methodology/papers/
-        roth-2022-review.md (helper/class API gap).
+    def test_compute_pretrends_power_rejects_custom_violation_type(self, mock_multiperiod_results):
+        """compute_pretrends_power(..., violation_type='custom') without explicit
+        ``violation_weights`` must raise ValueError.
+
+        PR-B Step 6 added the ``violation_weights`` kwarg to both helpers, so
+        ``violation_type='custom'`` is now usable from the helper API when the
+        weights vector is supplied. This regression locks the loud-fail
+        contract for the unsupplied-weights case: silently coercing a custom
+        request into a degenerate (zero / equal-weights) fit was the PR-A
+        R18 silent-failure that the loud guard prevents. See REGISTRY.md
+        PreTrendsPower section + docs/methodology/papers/roth-2022-review.md.
         """
         with pytest.raises(ValueError, match="violation_weights"):
-            compute_pretrends_power(
-                mock_multiperiod_results, violation_type="custom"
-            )
+            compute_pretrends_power(mock_multiperiod_results, violation_type="custom")
 
     def test_compute_mdv_rejects_custom_violation_type(self, mock_multiperiod_results):
-        """compute_mdv(..., violation_type='custom') must raise ValueError.
+        """compute_mdv(..., violation_type='custom') without ``violation_weights``
+        must raise ValueError.
 
-        Same contract as ``compute_pretrends_power``: the helper does not
-        accept ``violation_weights``, so the custom path is unusable from
-        the helper.
+        Same contract as ``compute_pretrends_power``: PR-B Step 6 made the
+        helper accept ``violation_weights``, so the rejection is now scoped
+        to the unsupplied-weights case rather than the entire custom path.
         """
         with pytest.raises(ValueError, match="violation_weights"):
             compute_mdv(mock_multiperiod_results, violation_type="custom")
@@ -619,12 +639,12 @@ class TestGetSetParams:
 
     def test_get_params(self):
         """Test get_params method."""
-        pt = PreTrendsPower(alpha=0.10, power=0.90, violation_type='constant')
+        pt = PreTrendsPower(alpha=0.10, power=0.90, violation_type="constant")
         params = pt.get_params()
 
-        assert params['alpha'] == 0.10
-        assert params['power'] == 0.90
-        assert params['violation_type'] == 'constant'
+        assert params["alpha"] == 0.10
+        assert params["power"] == 0.90
+        assert params["violation_type"] == "constant"
 
     def test_set_params(self):
         """Test set_params method."""
@@ -675,9 +695,9 @@ def test_sensitivity_to_honest_did(self, mock_multiperiod_results):
         pt = PreTrendsPower()
         sensitivity = pt.sensitivity_to_honest_did(mock_multiperiod_results)
 
-        assert 'mdv' in sensitivity
-        assert 'interpretation' in sensitivity
-        assert isinstance(sensitivity['interpretation'], str)
+        assert "mdv" in sensitivity
+        assert "interpretation" in sensitivity
+        assert isinstance(sensitivity["interpretation"], str)
 
 
 # =============================================================================
@@ -690,30 +710,30 @@ class TestViolationTypes:
 
     def test_linear_violation(self, mock_multiperiod_results):
         """Test power analysis with linear violation."""
-        pt = PreTrendsPower(violation_type='linear')
+        pt = PreTrendsPower(violation_type="linear")
         results = pt.fit(mock_multiperiod_results)
 
-        assert results.violation_type == 'linear'
+        assert results.violation_type == "linear"
 
     def test_constant_violation(self, mock_multiperiod_results):
         """Test power analysis with constant violation."""
-        pt = PreTrendsPower(violation_type='constant')
+        pt = PreTrendsPower(violation_type="constant")
         results = pt.fit(mock_multiperiod_results)
 
-        assert results.violation_type == 'constant'
+        assert results.violation_type == "constant"
 
     def test_last_period_violation(self, mock_multiperiod_results):
         """Test power analysis with last_period violation."""
-        pt = PreTrendsPower(violation_type='last_period')
+        pt = PreTrendsPower(violation_type="last_period")
         results = pt.fit(mock_multiperiod_results)
 
-        assert results.violation_type == 'last_period'
+        assert results.violation_type == "last_period"
 
     def test_different_types_give_different_results(self, mock_multiperiod_results):
         """Test that different violation types can give different MDV."""
-        pt_linear = PreTrendsPower(violation_type='linear')
-        pt_constant = PreTrendsPower(violation_type='constant')
-        pt_last = PreTrendsPower(violation_type='last_period')
+        pt_linear = PreTrendsPower(violation_type="linear")
+        pt_constant = PreTrendsPower(violation_type="constant")
+        pt_last = PreTrendsPower(violation_type="last_period")
 
         mdv_linear = pt_linear.fit(mock_multiperiod_results).mdv
         mdv_constant = pt_constant.fit(mock_multiperiod_results).mdv
@@ -745,21 +765,17 @@ def test_single_pre_period(self):
         """
         period_effects = {
             2: PeriodEffect(
-                period=2, effect=0.1, se=0.5,
-                t_stat=0.2, p_value=0.84,
-                conf_int=(-0.88, 1.08)
+                period=2, effect=0.1, se=0.5, t_stat=0.2, p_value=0.84, conf_int=(-0.88, 1.08)
             ),
             # Period 3 is reference - not estimated
             4: PeriodEffect(
-                period=4, effect=5.0, se=0.5,
-                t_stat=10.0, p_value=0.0001,
-                conf_int=(4.02, 5.98)
+                period=4, effect=5.0, se=0.5, t_stat=10.0, p_value=0.0001, conf_int=(4.02, 5.98)
             ),
         }
 
         coefficients = {
-            'treated:period_2': 0.1,
-            'treated:period_4': 5.0,
+            "treated:period_2": 0.1,
+            "treated:period_4": 5.0,
         }
 
         results = MultiPeriodDiDResults(
@@ -796,25 +812,31 @@ def test_many_pre_periods(self):
         period_effects = {}
         for i in range(n_pre_estimated):
             period_effects[i] = PeriodEffect(
-                period=i, effect=0.05 * (i - 4), se=0.5,
-                t_stat=0.1 * (i - 4), p_value=0.92,
-                conf_int=(-0.88, 1.08)
+                period=i,
+                effect=0.05 * (i - 4),
+                se=0.5,
+                t_stat=0.1 * (i - 4),
+                p_value=0.92,
+                conf_int=(-0.88, 1.08),
             )
 
         # Post-period effects
         for i in range(4):
             period_effects[n_pre_total + i] = PeriodEffect(
-                period=n_pre_total + i, effect=5.0, se=0.5,
-                t_stat=10.0, p_value=0.0001,
-                conf_int=(4.02, 5.98)
+                period=n_pre_total + i,
+                effect=5.0,
+                se=0.5,
+                t_stat=10.0,
+                p_value=0.0001,
+                conf_int=(4.02, 5.98),
             )
 
         # Coefficients (excluding reference period 9)
         coefficients = {}
         for i in range(n_pre_estimated):
-            coefficients[f'treated:period_{i}'] = 0.05 * (i - 4)
+            coefficients[f"treated:period_{i}"] = 0.05 * (i - 4)
         for i in range(4):
-            coefficients[f'treated:period_{n_pre_total + i}'] = 5.0
+            coefficients[f"treated:period_{n_pre_total + i}"] = 5.0
 
         results = MultiPeriodDiDResults(
             period_effects=period_effects,
@@ -857,16 +879,16 @@ def test_callaway_santanna_universal_base_period(self):
         cs = CallawaySantAnna(base_period="universal")
         results = cs.fit(
             data,
-            outcome='outcome',
-            unit='unit',
-            time='period',
-            first_treat='first_treat',
-            aggregate='event_study'
+            outcome="outcome",
+            unit="unit",
+            time="period",
+            first_treat="first_treat",
+            aggregate="event_study",
         )
 
         # Verify reference period exists with NaN SE
         assert -1 in results.event_study_effects
-        assert np.isnan(results.event_study_effects[-1]['se'])
+        assert np.isnan(results.event_study_effects[-1]["se"])
 
         # PreTrendsPower should work without errors (reference period filtered out)
         pt = PreTrendsPower()
@@ -890,7 +912,7 @@ def test_power_curve_has_plot_method(self, mock_multiperiod_results):
         pt = PreTrendsPower()
         curve = pt.power_curve(mock_multiperiod_results)
 
-        assert hasattr(curve, 'plot')
+        assert hasattr(curve, "plot")
         assert callable(curve.plot)
 
 
@@ -921,13 +943,17 @@ def test_power_at_basic(self, mock_multiperiod_results):
         assert 0 <= power_5 <= 1
 
     def test_power_at_zero(self, mock_multiperiod_results):
-        """Test power_at with M=0 (should equal alpha)."""
-        pt = PreTrendsPower(alpha=0.05)
+        """Test power_at with M=0 (should equal alpha under Wald form).
+
+        See note on TestPowerComputation.test_power_at_zero_equals_alpha:
+        the exact-equals-alpha property is Wald-specific, so pin pretest_form="wald".
+        """
+        pt = PreTrendsPower(alpha=0.05, pretest_form="wald")
         results = pt.fit(mock_multiperiod_results)
 
         power_0 = results.power_at(0.0)
 
-        # At M=0, power should equal size (alpha)
+        # At M=0, power should equal size (alpha) under Wald.
         assert np.isclose(power_0, 0.05, atol=0.01)
 
     def test_power_at_matches_fit(self, mock_multiperiod_results):
@@ -993,18 +1019,26 @@ def event_study_all_periods_results(self):
         # Pre-periods (0, 1, 2) - period 3 would be reference
         for p in [0, 1, 2]:
             period_effects[p] = PeriodEffect(
-                period=p, effect=np.random.normal(0, 0.1), se=0.5,
-                t_stat=0.2, p_value=0.84, conf_int=(-0.88, 1.08)
+                period=p,
+                effect=np.random.normal(0, 0.1),
+                se=0.5,
+                t_stat=0.2,
+                p_value=0.84,
+                conf_int=(-0.88, 1.08),
             )
-            coefficients[f'treated:period_{p}'] = period_effects[p].effect
+            coefficients[f"treated:period_{p}"] = period_effects[p].effect
 
         # Post-periods (4, 5, 6, 7)
         for p in [4, 5, 6, 7]:
             period_effects[p] = PeriodEffect(
-                period=p, effect=5.0 + np.random.normal(0, 0.1), se=0.5,
-                t_stat=10.0, p_value=0.0001, conf_int=(4.02, 5.98)
+                period=p,
+                effect=5.0 + np.random.normal(0, 0.1),
+                se=0.5,
+                t_stat=10.0,
+                p_value=0.0001,
+                conf_int=(4.02, 5.98),
             )
-            coefficients[f'treated:period_{p}'] = period_effects[p].effect
+            coefficients[f"treated:period_{p}"] = period_effects[p].effect
 
         # In this scenario, pre_periods=[3] (only reference), post_periods=[0,1,2,4,5,6,7]
         vcov = np.diag([0.25] * 7)
@@ -1032,10 +1066,7 @@ def test_fit_with_explicit_pre_periods(self, event_study_all_periods_results):
         # Without pre_periods, would fail because results.pre_periods=[3]
         # and period 3 has no coefficient (it's the reference)
         # With explicit pre_periods=[0,1,2], should work
-        results = pt.fit(
-            event_study_all_periods_results,
-            pre_periods=[0, 1, 2]
-        )
+        results = pt.fit(event_study_all_periods_results, pre_periods=[0, 1, 2])
 
         assert results.n_pre_periods == 3
         assert results.power >= 0
@@ -1046,10 +1077,7 @@ def test_pre_periods_overrides_results(self, event_study_all_periods_results):
         pt = PreTrendsPower()
 
         # Explicitly set pre_periods to [0, 1]
-        results = pt.fit(
-            event_study_all_periods_results,
-            pre_periods=[0, 1]
-        )
+        results = pt.fit(event_study_all_periods_results, pre_periods=[0, 1])
 
         # Should use 2 pre-periods, not what's in results
         assert results.n_pre_periods == 2
@@ -1058,11 +1086,7 @@ def test_power_at_with_pre_periods(self, event_study_all_periods_results):
         """Test power_at() method with pre_periods parameter."""
         pt = PreTrendsPower()
 
-        power = pt.power_at(
-            event_study_all_periods_results,
-            M=1.0,
-            pre_periods=[0, 1, 2]
-        )
+        power = pt.power_at(event_study_all_periods_results, M=1.0, pre_periods=[0, 1, 2])
 
         assert 0 <= power <= 1
 
@@ -1070,11 +1094,7 @@ def test_power_curve_with_pre_periods(self, event_study_all_periods_results):
         """Test power_curve() with pre_periods parameter."""
         pt = PreTrendsPower()
 
-        curve = pt.power_curve(
-            event_study_all_periods_results,
-            n_points=10,
-            pre_periods=[0, 1, 2]
-        )
+        curve = pt.power_curve(event_study_all_periods_results, n_points=10, pre_periods=[0, 1, 2])
 
         assert len(curve.M_values) == 10
         assert len(curve.powers) == 10
@@ -1084,26 +1104,20 @@ def test_sensitivity_to_honest_did_with_pre_periods(self, event_study_all_period
         pt = PreTrendsPower()
 
         sensitivity = pt.sensitivity_to_honest_did(
-            event_study_all_periods_results,
-            pre_periods=[0, 1, 2]
+            event_study_all_periods_results, pre_periods=[0, 1, 2]
         )
 
-        assert 'mdv' in sensitivity
-        assert sensitivity['mdv'] > 0
+        assert "mdv" in sensitivity
+        assert sensitivity["mdv"] > 0
 
     def test_convenience_functions_with_pre_periods(self, event_study_all_periods_results):
         """Test convenience functions with pre_periods parameter."""
         # compute_mdv
-        mdv = compute_mdv(
-            event_study_all_periods_results,
-            pre_periods=[0, 1, 2]
-        )
+        mdv = compute_mdv(event_study_all_periods_results, pre_periods=[0, 1, 2])
         assert mdv > 0
 
         # compute_pretrends_power
         results = compute_pretrends_power(
-            event_study_all_periods_results,
-            M=1.0,
-            pre_periods=[0, 1, 2]
+            event_study_all_periods_results, M=1.0, pre_periods=[0, 1, 2]
         )
         assert results.n_pre_periods == 3