diff --git a/CHANGELOG.md b/CHANGELOG.md index 278ce96f..8b3f2d5c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **`MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` now supported** (`diff_diff/estimators.py:1657`). Pre-PR the combination raised `NotImplementedError` because the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF for the post-period-average ATT (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`) was not implemented — only the per-coefficient case existed in `_compute_cr2_bm`. New `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop to arbitrary `(k, m)` contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra; `_compute_cr2_bm` is refactored to call it with `contrasts=eye(k)` so the existing per-coefficient parity to clubSandwich's `coef_test$df_Satt` is preserved (refactor regression at atol=1e-10). `MultiPeriodDiD.fit()` extends its existing avg_att DOF block to branch on `effective_cluster_ids`: one-way `_compute_bm_dof_from_contrasts` when None, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. Cluster IDs are per-observation length `n` and are NOT subscripted by the rank-deficient column-drop mask. R parity verified at atol=1e-10 against clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` on the new `mpd_clustered_avg_att_dof` fixture in `benchmarks/data/clubsandwich_cr2_golden.json` (Wald_test's HTZ on a 1-row constraint matrix yields the Satterthwaite t-test DOF). Per-coefficient `period_effects[t].p_value` / `conf_int` and `avg_att` `avg_p_value` / `avg_conf_int` now reflect the correct Satterthwaite DOF rather than the n-k fallback under cluster+hc2_bm. Weighted CR2-BM (`survey_design=` paths) remains a separate gate. New tests: `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF` (4 tests: refactor regression, R-parity, shape validation, cluster-count validation); existing `test_multi_period_cluster_plus_hc2_bm_rejected` flipped to behavioral `test_multi_period_cluster_plus_hc2_bm_produces_finite_inference`. +- **PreTrendsPower: NIS box probability as the new primary test form (PR-B methodology audit, Roth 2022).** Implements Roth (2022) Section II.A-B no-individually-significant (NIS) box probability `P(β̂_pre ∈ B_NIS(Σ))` as the new default `pretest_form='nis'` on `PreTrendsPower`, `compute_pretrends_power`, and `compute_mdv`. The Wald noncentral-χ² form previously shipped as the implicit default is now opt-in via `pretest_form='wald'` and remains as a paper-supported alternative (Propositions 1+3+4 all apply — the Wald ellipsoid is convex). Computation uses `scipy.stats.multivariate_normal.cdf` with `lower_limit=` for the rectangular box probability on the centered change-of-variable `Y = β̂_pre - δ_pre ~ N(0, Σ_22)`; the MDV is solved via doubling expansion + `optimize.brentq` bisection with a 1000-cap non-convergence fallback returning `np.inf`. New private helpers `_compute_power_nis` and `_compute_mdv_nis`; the existing methods are renamed `_compute_power_wald` and `_compute_mdv_wald` with byte-identical math, and `_compute_power` / `_compute_mdv` become dispatchers on `self.pretest_form`. `power_curve()` and `PreTrendsPowerResults.power_at()` inherit the dispatch (power_at via the new persisted `pretest_form` field on the result). The `summary()` / `to_dict()` / `to_dataframe()` outputs dispatch on `pretest_form` — NIS fits print "NIS box probability: ..." instead of "Non-centrality parameter: ...". +- **PreTrendsPower: full Σ_22 routing on CS and SA event-study adapters (PR-B methodology audit, Σ_22 fidelity).** The shipped `compute_pretrends_power` adapter previously hard-coded `np.diag(ses**2)` for both `CallawaySantAnnaResults` and `SunAbrahamResults` regardless of whether the analytical event-study VCV was available, dropping the off-diagonal correlations Roth's framework relies on. PR-B routes non-bootstrap CS fits through the full `event_study_vcov` sub-block (already persisted at `staggered_results.py:126-128`) and extends `SunAbrahamResults` to also persist `event_study_vcov` + `event_study_vcov_index` constructed via the W-matrix aggregation `event_study_vcov = W @ vcov_cohort @ W.T` where W is the cohort-aggregation matrix (`|event_times| × n_interactions` sparse matrix with `W[i, j] = cohort_weights[e_i][g]` at column `j = coef_index_map[(g, e_i)]`). The new shared helper `_extract_event_study_vcov_subblock` at module level in `pretrends.py` consumes the full VCV when available with a `.index()` lookup on `event_study_vcov_index`; defensive ValueError on label mismatch. Bootstrap fits and replicate-weight survey fits clear `event_study_vcov` (mirroring the CS bootstrap-clear pattern at `staggered.py:2032-2036`) so they fall through to `diag(ses^2)` and the analytical VCV is never mixed with bootstrap/replicate SE overrides downstream. Diagonal-entry sanity check verifies that `event_study_vcov[i, i] = se(e_i)^2` matches the existing per-event-time SE computation in `_compute_iw_effects` at `atol=1e-10`. **Backwards-compatible field additions**: new `event_study_vcov` + `event_study_vcov_index` fields on `SunAbrahamResults` default to `None`, so existing consumers that don't read them see no change. +- **`PreTrendsPowerResults` now persists fitted `violation_weights` + `pretest_form` + `nis_box_probability` (PR-B Step 5).** New optional fields on the result dataclass enable `power_at(M)` to work for ALL four violation types (linear / constant / last_period / **custom**) on fresh fits, by reading the stored weights directly instead of reconstructing from `violation_type` alone. The PR-A R18 NotImplementedError silent-failure guard for `violation_type='custom'` is retained ONLY for legacy serialized results (`violation_weights=None`) — fresh fits no longer hit it. +- **Helper API: `compute_pretrends_power` and `compute_mdv` now accept `violation_weights` and `pretest_form` (PR-B Step 6).** Closes the PR-A R18 helper/class API gap that previously made `violation_type='custom'` unusable from the helper functions. Helpers now forward both new parameters to the underlying `PreTrendsPower` class. Default `pretest_form='nis'` matches the class default. All existing helper call sites in `test_pretrends.py` and `test_pretrends_event_study.py` continue to pass without changes because the form-invariance of most assertions allowed the default flip with only 3 tests needing targeted updates. +- **NEW `tests/test_methodology_pretrends.py` (PR-B Step 7).** Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through. 8 classes, 30+ tests covering K=1 closed-form (Proposition 2 proof), NIS box probability via MC simulation cross-check, Propositions 1-4 simulation parity, linear-units γ-scale verification on regular / irregular / pandas.Period grids, custom-weight persistence regression, JSON-serializability of `to_dict`, CS/SA full-VCV adapter regression, helper API end-to-end, NIS-vs-Wald differentiation, and skip-gated `TestPretrendsParityR` stubs for PR-C R-package goldens. +- **`benchmarks/R/generate_pretrends_golden.R` (PR-B Step 12).** R generator script for the PR-C deferred goldens. Script committed with a `` placeholder commit reference; PR-C pins the audited `pretrends` revision, runs the script, commits the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, and activates the parity tests. - **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:1476`). Mirrors the DiD-absorb auto-route shipped earlier in this release: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, `MultiPeriodDiD.fit()` promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture (new `tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `mpd_absorbed_fe_did`). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` rejection remains as a follow-up (different fit-path structure — no `fixed_effects=` equivalent inside TWFE). **Behavioral note (full `MultiPeriodDiDResults` surface change under auto-route):** under the auto-route, the entire returned `MultiPeriodDiDResults` reflects the full-dummy fit rather than the within-transformed fit — `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, `result.r_squared` all include the FE-dummy entries / un-demeaned values. `result.period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` and `result.avg_att` / `.avg_se` are invariant to this routing (FWL guarantee). MPD requires a time-invariant ever-treated indicator that lies in the span of the intercept and the post-auto-route unit FE dummies (the exact alias depends on the omitted FE reference category under `pd.get_dummies(drop_first=True)`, not just on "the sum of treated-cohort unit dummies"), so `solve_ols` drops one column from that collinear set under R-style rank-deficiency handling. Which specific column is dropped is pivot-order and dummy-coding dependent (in the shipped parity fixture it is a never-treated unit dummy, not the `treated` main effect itself). The per-period interaction coefficients (`treated:period_X`) and `avg_att` are identified and invariant to that choice; parity tests target those rather than the `treated` main effect. **Survey-design scope (replicate weights):** when `survey_design=` uses replicate weights, the auto-route short-circuits the absorb-refit branch at `estimators.py:1693` and routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design — correct because the design does not depend on replicate weights so no per-replicate refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `