Thread vcov_type through SunAbraham (Phase 1b 1/8) by igerber · Pull Request #472 · igerber/diff-diff

igerber · 2026-05-19T23:27:10Z

Summary

Threads vcov_type ∈ {classical, hc1, hc2, hc2_bm} through SunAbraham, mirroring the DiD/MPD/TWFE chain from Phase 1a (PR Lift Gate 1: HC2/HC2-BM for TwoWayFixedEffects via full-dummy auto-route #469 et al). Defaults to "hc1" — preserves prior behavior bit-equally (SA historically hard-coded HC1).
When vcov_type ∈ {classical, hc2, hc2_bm}, _fit_saturated_regression auto-routes to a full-dummy saturated design (intercept + cohort × event-time interactions + unit dummies + time dummies). FWL preserves cohort coefficients but not the hat matrix — HC2 leverage and Bell-McCaffrey Satterthwaite DOF require the full FE projection; classical also routes through full-dummy so the (n-k) finite-sample correction matches R's lm() interpretation. Same Part B surgery shape as TWFE Gate 1 (PR Lift Gate 1: HC2/HC2-BM for TwoWayFixedEffects via full-dummy auto-route #469).
hc1 keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix; matches fixest::sunab(cluster=~unit) convention).
Auto-cluster-at-unit dropped on explicit one-way families (hc2, classical); preserved for hc1 and hc2_bm (which routes to CR2-BM at unit).
SurveyDesign (any kind — analytical weights / stratified / PSU / replicate-weight) combined with vcov_type ∈ {classical, hc2, hc2_bm} raises NotImplementedError: the survey TSL (or replicate-weight refit) variance overrides the analytical sandwich family, AND the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use vcov_type="hc1" (default) for survey designs.
vcov_type="conley" rejected at __init__ with a deferral message (TODO row tracks the threading needed for conley_* params on the saturated regression call).
vcov_type propagated to SunAbrahamResults.vcov_type for downstream introspection.
First PR of Phase 1b — 7 follow-ups for the remaining standalone estimators: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD.

Methodology references (required if estimator / math changes)

Method name(s): SunAbraham interaction-weighted event study (variance/inference layer only — IW estimand and cohort weighting unchanged)
Paper / source link(s): Sun & Abraham (2021), Journal of Econometrics, doi:10.1016/j.jeconom.2020.09.006
Any intentional deviations from the source (and why): HC1 finite-sample correction. SA's within-transform HC1 SE differs from fixest::sunab() by ~1-2% (~2e-3 absolute) due to a different (n-k) count: fixest counts absorbed FE in k_total; SA's solve_ols counts only within-transformed columns. The IW aggregation step is otherwise identical. Documented in docs/methodology/REGISTRY.md SunAbraham section, pinned at atol=5e-3 in tests/test_methodology_sun_abraham.py, tracked in TODO.md for follow-up harmonization.

Validation

Tests added/updated:
- tests/test_sun_abraham.py — 17 new behavioral tests in TestSunAbrahamVcovType (all vcov_type values finite-and-distinct, auto-cluster drop/preserve, replicate/survey rejects, get_params/set_params + _vcov_type_explicit refresh, clone+repeat-fit idempotence, invalid-value rejection, vcov_type propagated to SunAbrahamResults, n_psu + df_survey regression for survey path, full-dummy-vs-within-transform HC2 divergence)
- tests/test_methodology_sun_abraham.py — NEW file, 5 R-parity tests: classical / hc2 / hc2_bm cohort SE at atol=1e-10 vs lm()+sandwich/clubSandwich, BM Satterthwaite DOF (singleton + cluster=unit) at atol=1e-10, HC1 event-study e=0 vs fixest::sunab(cluster=~unit) at atol=5e-3 (documented deviation)
- benchmarks/R/generate_clubsandwich_golden.R — new sun_abraham_two_cohort scenario (5-cohort × 8-period balanced panel; saturated full-dummy lm() + sandwich::vcovHC + clubSandwich::vcovCR at unit + singleton-cluster + fixest::sunab event-study e=0). JSON golden regenerated.
Backtest / simulation / notebook evidence (if applicable): N/A (no tutorials touched; existing SA tutorial uses default HC1 — unchanged)
Pre-flight: standalone smoke test (R + Python) verified Python solve_ols full-dummy vs R lm()+vcovHC/vcovCR at atol=1e-12 to 1e-15 before any source edit (per feedback_r_source_smoke_test_before_implementing.md).
Local Codex review: 5 rounds via /ai-review-local --backend codex until ✅ clean — only P3 informational items remain (HC1 finite-sample-correction deviation + Conley deferral, both tracked in TODO.md).

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-19T23:34:41Z

Overall Assessment

⚠️ Needs changes

I cross-checked the PR against the Sun-Abraham registry entry and the in-code Sun & Abraham references. I did not find an untracked variance-formula or identification mismatch in the implemented analytical paths, and the HC1/fixest gap plus the Conley deferral are both documented and tracked. The blocking issue is narrower: the new vcov_type surface is not tested across several existing SA code paths it now exercises, which is a P1 under the project’s parameter-interaction checklist.

Executive Summary

Affected method: Sun-Abraham interaction-weighted event study; this PR changes the variance/inference layer, not the IW estimand or cohort-weight aggregation.
The main methodology deviations in the diff are documented in the registry: the full-dummy auto-route for classical/hc2/hc2_bm, the HC1 finite-sample-correction gap vs fixest::sunab(), and the fit-time reject for survey_design + {classical,hc2,hc2_bm}. See REGISTRY.md and TODO.md.
I did not find an untracked P0/P1 variance/SE bug in the new analytical implementation itself.
I am still at ⚠️ Needs changes because the new vcov_type parameter is not covered on important existing SA paths: manual full-dummy builds with covariates / control_group="not_yet_treated", and bootstrap refits that now re-enter the new vcov_type plumbing.
There is also a small code-quality inconsistency: bootstrap helpers still type cluster_var as non-null even though the new one-way paths legitimately pass None.

Methodology

Severity P3 (informational). Impact: none/blocking. The PR’s substantive deviations from prior behavior are documented and therefore mitigated: full-dummy routing for classical/hc2/hc2_bm, the HC1 finite-sample-correction difference vs fixest::sunab(), and rejecting survey_design with non-HC1 analytical families. Concrete fix: none required in this PR; the documentation/tracking is already present in REGISTRY.md and TODO.md.

Code Quality

Severity P3. Impact: low. fit() now legitimately sets cluster_var = None for explicit one-way families, but both bootstrap helpers still advertise cluster_var: str, which hides a real nullable state from readers and type checkers. See sun_abraham.py, sun_abraham.py, and sun_abraham.py. Concrete fix: change both helper signatures to Optional[str] and align the docstrings/comments.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3 (informational). Impact: none/blocking. The PR properly tracks its deferred work for vcov_type="conley" and the HC1/fixest finite-sample-correction harmonization in TODO.md. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P1. Impact: the new full-dummy branch in _fit_saturated_regression() is a manual second implementation of the SA regression design, but the added tests only exercise the default control group with no covariates. There is no coverage that point estimates stay invariant when control_group="not_yet_treated" or covariates are present, even though those paths now flow through the rewritten design builder for {"classical","hc2","hc2_bm"}. A regression here would silently change the estimator, not just the SEs. See sun_abraham.py, tests/test_sun_abraham.py, and tests/test_methodology_sun_abraham.py. Concrete fix: add invariance tests comparing overall_att, event_study_effects, and representative cohort coefficients across vcov_type="hc1" vs {"classical","hc2","hc2_bm"} on panels with non-empty covariates and with control_group="not_yet_treated".
Severity P1. Impact: the new vcov_type values are not covered on the existing bootstrap paths. Both _run_bootstrap() and _run_rao_wu_bootstrap() now re-enter _fit_saturated_regression(..., vcov_type=self.vcov_type), so bootstrap fits with the new families execute the new full-dummy plumbing on every refit, but TestSunAbrahamVcovType has no n_bootstrap > 0 cases. See sun_abraham.py, sun_abraham.py, and tests/test_sun_abraham.py. Concrete fix: add bootstrap coverage for vcov_type="classical", "hc2", and "hc2_bm" (including explicit cluster= for hc2_bm if supported) asserting finite bootstrap outputs and point-estimate invariance to vcov_type.

Path to Approval

Add analytical invariance tests for the new full-dummy vcov_type paths with control_group="not_yet_treated" and with at least one covariate column.
Add bootstrap tests for vcov_type in {"classical","hc2","hc2_bm"} so the new refit path is exercised under n_bootstrap > 0.

github-actions · 2026-05-20T00:13:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 269e9045c65a5827b43b68ebbd98f59cdacff559

Overall Assessment

⚠️ Needs changes

The re-review resolves the two prior P1 test-coverage findings, but there is still one methodology-significant inference bug in the new hc2_bm path for Sun-Abraham’s aggregated outputs.

Executive Summary

Affected method: Sun-Abraham interaction-weighted event study; this PR changes the variance/inference layer, not the IW estimand or cohort-weight formula.
The previously raised P1 coverage gaps appear addressed: there are now explicit invariance tests for covariates, control_group="not_yet_treated", and n_bootstrap > 0 across vcov_type values in tests/test_sun_abraham.py.
The HC1/fixest finite-sample-correction gap, the full-dummy auto-route, the survey-design reject for non-HC1 families, and the Conley deferral are all documented/tracked in docs/methodology/REGISTRY.md and TODO.md, so those are non-blocking.
[Newly identified] vcov_type="hc2_bm" does not propagate Bell-McCaffrey/Satterthwaite DOF to the user-facing aggregated event_study_effects or overall_att inference, so p_value/CI/significance output silently falls back to normal inference despite the registry and docstrings advertising CR2-BM inference.

Methodology

Severity P1 [Newly identified]. Impact: hc2_bm only gets Bell-McCaffrey DOF at the raw saturated-regression coefficient level via LinearRegression.get_inference(), but Sun-Abraham’s user-facing event-study and overall ATT inference is recomputed later with safe_inference(..., df=None). That means event_study_effects[*]["p_value"/"conf_int"] and overall_p_value/overall_conf_int silently use normal inference instead of CR2 Bell-McCaffrey contrast DOF, which is an undocumented deviation from the registry contract for vcov_type="hc2_bm". See diff_diff/sun_abraham.py, diff_diff/sun_abraham.py, docs/methodology/REGISTRY.md, and the existing contrast-DOF pattern in diff_diff/estimators.py / diff_diff/linalg.py. Concrete fix: for analytical hc2_bm, build the aggregation contrasts for each event-time and for the overall ATT on the full-dummy design, compute CR2-BM contrast DOF with the existing helper, and pass those DOF values into the safe_inference calls that populate event_study_effects and overall_att.
Severity P3. Impact: none/blocking. The PR’s substantive deviations that are documented in the registry are properly mitigated: the full-dummy auto-route for classical/hc2/hc2_bm, the HC1/fixest finite-sample-correction gap, the survey-design reject for non-HC1 families, and the Conley deferral. See docs/methodology/REGISTRY.md and TODO.md. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new findings beyond the already tracked TODO items for Conley threading and the HC1/fixest finite-sample correction gap in TODO.md.

Security

No findings.

Documentation/Tests

Severity P2. Impact: the new methodology suite pins only raw cohort hc2_bm DOF/SE, so it would not catch the user-facing aggregated-inference bug above. tests/test_methodology_sun_abraham.py stops at LinearRegression._bm_dof rather than checking SunAbrahamResults.event_study_effects / overall_att. See tests/test_methodology_sun_abraham.py. Concrete fix: add end-to-end hc2_bm tests for at least one aggregated event-study contrast and the overall ATT contrast, asserting the reported p_value/CI (or denominator df) matches the CR2-BM contrast-DOF calculation.
No other findings. The prior review’s coverage gaps on covariates, not_yet_treated, and bootstrap refits appear resolved in tests/test_sun_abraham.py.

Path to Approval

For analytical vcov_type="hc2_bm", compute and use contrast-specific Bell-McCaffrey DOF for each aggregated event-study effect and for overall_att, instead of calling safe_inference with df=None.
Add methodology tests on the committed golden panel that verify hc2_bm aggregated event-study and overall ATT inference matches the CR2-BM contrast-DOF calculation end to end.

github-actions · 2026-05-20T01:08:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d7bb9da6f2d0e22112e4241a1d7ad96fd61d0d73

Overall Assessment

⚠️ Needs changes

Affected method: Sun-Abraham interaction-weighted event study. The PR correctly fixes the prior hc2_bm aggregated-inference methodology bug and documents the intended deviations in the registry/TODO, but it introduces a new validation regression that can silently change the requested variance estimator.

Executive Summary

The prior re-review blocker is resolved: aggregated hc2_bm event-study and overall-ATT inference now use CR2 Bell-McCaffrey contrast DOF in diff_diff/sun_abraham.py:L972-L1072, with R-parity tests in tests/test_methodology_sun_abraham.py:L180-L233.
The PR’s documented methodology deviations are properly mitigated: the full-dummy auto-route, the HC1/fixest finite-sample-correction gap, the survey reject for non-HC1 families, and the Conley deferral are all recorded in docs/methodology/REGISTRY.md:L977-L1048 and TODO.md:L101-L105.
Severity P1 [Newly identified]: an explicit cluster= that is absent from the data is now silently treated as None, which downgrades clustered inference to one-way HC1 / HC2 / classical / one-way BM instead of raising.
The earlier coverage gaps on covariates, control_group="not_yet_treated", and bootstrap refits appear addressed in tests/test_sun_abraham.py:L1811-L1915.

Methodology

Severity P1 [Newly identified]. Impact: explicit cluster= is no longer validated, so a missing cluster column can silently produce the wrong SEs/p-values. fit() still validates only outcome/unit/time/first_treat/covariates, not cluster (diff_diff/sun_abraham.py:L621-L628); then the new cluster-resolution path turns a missing cluster column into cluster_ids_raw=None (diff_diff/sun_abraham.py:L786-L792), and both regression branches accept that downgrade instead of erroring (diff_diff/sun_abraham.py:L1381-L1411). For vcov_type="hc2_bm" this silently changes CR2-BM to one-way HC2+BM; for vcov_type="hc1" it silently drops clustering entirely. Concrete fix: front-door validate self.cluster against data.columns before cluster resolution, and do not use the ... in df.columns else None fallback for explicit user-supplied clusters.
Severity P3. Impact: none/blocking. The PR’s substantive deviations from prior behavior are documented and tracked, so they are not defects: full-dummy routing for classical/hc2/hc2_bm, aggregated BM contrast DOF, the HC1/fixest finite-sample-correction gap, and the Conley/survey follow-ups in docs/methodology/REGISTRY.md:L977-L1048 and TODO.md:L101-L105. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked tech-debt findings. The remaining SA follow-ups are already tracked in TODO.md:L101-L105.

Security

No findings.

Documentation/Tests

No additional findings. The new methodology coverage for aggregated BM contrast DOF is present in tests/test_methodology_sun_abraham.py:L180-L233, and the broader regression coverage added for covariates / not_yet_treated / bootstrap is in tests/test_sun_abraham.py:L1811-L1915. Review note: I could not execute the suite in this environment because the local runtime here lacks pytest and core test dependencies.

Path to Approval

Add explicit validation in SunAbraham.fit() that cluster exists in data.columns when the user supplies it; do this before the new vcov_type auto-cluster logic so cluster="missing" cannot degrade to cluster_ids=None.
Add regression tests that SunAbraham(cluster="missing").fit(...) raises for at least vcov_type="hc1" and vcov_type="hc2_bm"; ideally also cover vcov_type="classical" so an absent explicit cluster cannot mask the intended one-way-family rejection.

github-actions · 2026-05-20T01:19:06Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5996d43b1b705b427e169ad4feb14e31e43c89e3

Overall Assessment

⚠️ Needs changes

Affected method: Sun-Abraham interaction-weighted event study. The paper’s load-bearing point is the contamination of staggered-TWFE lead/lag coefficients under heterogeneous treatment effects and the interaction-weighted alternative that avoids it; this PR keeps that estimand intact and changes only the variance/inference layer around it. (ideas.repec.org)

Executive Summary

The prior blocker is fixed: explicit missing cluster= columns now fail fast in diff_diff/sun_abraham.py:L630-L639, with regression coverage in tests/test_sun_abraham.py:L1881-L1903.
The substantive methodology changes are otherwise aligned with the registry: full-dummy routing for classical/hc2/hc2_bm, BM contrast DOF for aggregated hc2_bm, the HC1-vs-fixest finite-sample-correction gap, survey rejection for non-HC1 families, and the Conley deferral are all documented/tracked in docs/methodology/REGISTRY.md:L977-L1048 and TODO.md:L101-L103.
Severity P1 [Newly identified]: explicit cluster columns with missing labels are still not rejected on the non-survey hc1/hc2_bm paths, so clustered inference can remain silently malformed.
There is one non-blocking documentation inconsistency: the SunAbraham registry section now documents hc2_bm aggregated BM contrast DOF, but a later bullet still states aggregated SA inference uses the normal distribution.

Methodology

Severity P1 [Newly identified]. Impact: the new front-door cluster validation in diff_diff/sun_abraham.py:L630-L639 checks only column existence, not missing cluster labels. On the non-survey cluster-robust path, diff_diff/linalg.py:L2049-L2068 computes n_clusters from np.unique(cluster_ids) but forms cluster scores with DataFrame.groupby(cluster_ids), which drops NA groups by default; that means rows with missing cluster labels can be omitted from the meat while still affecting the cluster-count adjustment, yielding silently wrong SEs/p-values for explicit-cluster hc1/hc2_bm fits. The survey branch already fails closed on missing injected PSUs at diff_diff/survey.py:L1307-L1313. Concrete fix: add an upfront data[self.cluster].isna().any() guard in SunAbraham.fit() for explicit cluster=, before cluster resolution, and raise ValueError with a clear message. (pandas.pydata.org)
Severity P3. Impact: none/blocking. The PR’s estimator-level deviations from prior behavior are documented in the methodology registry, so under the review policy they are not defects: full-dummy auto-routing, BM contrast DOF on aggregated hc2_bm outputs, the HC1/fixest finite-sample-correction gap, survey rejection for non-HC1 families, and the Conley deferral in docs/methodology/REGISTRY.md:L997-L1048 and TODO.md:L101-L103. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3. Impact: none/blocking. The remaining SunAbraham follow-ups introduced or referenced by this PR are properly tracked in TODO.md:L101-L105, so they do not block approval. Concrete fix: none.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the SunAbraham registry section now correctly documents BM contrast DOF for aggregated hc2_bm inference at docs/methodology/REGISTRY.md:L1010-L1027, but the later “Inference distribution” bullets still say aggregated event-study and overall ATT p-values use the normal distribution at docs/methodology/REGISTRY.md:L1070-L1074. That contradiction will mislead future methodology reviews. Concrete fix: update the later bullets to carve out vcov_type="hc2_bm" or restate the rule by vcov_type.
Reviewer note: I could not execute the tests locally because pytest is not installed in this environment; the assessment is based on static review of the code and the added test coverage.

Path to Approval

Add explicit NA/NaN validation for user-supplied cluster= in SunAbraham.fit() before cluster resolution, so non-survey clustered fits fail closed instead of entering the malformed clustered-vcov path.
Add regression tests that SunAbraham(cluster="<col>", vcov_type="hc1") and SunAbraham(cluster="<col>", vcov_type="hc2_bm") raise on missing cluster labels on the non-survey path.

…1b 1/8) Adds `vcov_type` parameter to `SunAbraham`, mirroring the DiD/MPD/TWFE chain from Phase 1a. Defaults to "hc1" (preserves prior bit-equal behavior - SA historically hard-coded HC1). First PR of Phase 1b, which threads `vcov_type` through the 8 standalone estimators that expose `cluster=` but not yet `vcov_type=`. Methodology: when `vcov_type ∈ {classical, hc2, hc2_bm}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (intercept + cohort × event-time interactions + unit dummies + time dummies). FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey DOF must be computed on the full FE projection. Mirrors TWFE Gate 1 from PR #469. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `clubSandwich::vcovCR(..., type="CR2") + coef_test()$df_Satt` at atol=1e-10 (pinned in tests/test_methodology_sun_abraham.py). Scope limits: replicate-weight survey + hc2/hc2_bm raises NotImplementedError (per-replicate full-dummy refit not implemented). `vcov_type="conley"` rejected at __init__ with a deferral message (threading conley_* params is a follow-up). Auto-cluster-at-unit is dropped when the user opts into explicit `vcov_type="hc2"` or `"classical"` (both one-way only); preserved for `"hc1"` and `"hc2_bm"`. Documented deviation from R: SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% on typical panel sizes (different (n-k) finite-sample correction). The IW aggregation is otherwise identical; parity at atol=5e-3. Test surface: 15 new behavioral tests in test_sun_abraham.py covering default-vs-explicit bit-equality, all four vcov_type values finite-and-distinct, auto-cluster drop/preserve, replicate-weight reject, get_params/set_params, clone+repeat-fit idempotence, invalid value rejection, cluster_var=None cascade through survey-PSU injection, full-dummy vs within-transform HC2 divergence. 4 new R-parity tests in test_methodology_sun_abraham.py against sandwich/clubSandwich/fixest goldens. New R golden scenario `sun_abraham_two_cohort` in benchmarks/data/clubsandwich_cr2_golden.json (5 cohorts × 8 periods panel; pins classical_se, hc2_se, cr2_bm_singleton_se+dof, cr2_bm_unit_se+dof, sunab_hc1_event_study_e0_se). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-20T01:27:22Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 119db85403ae24f1ec605d3be69d90f5fb52b916

Overall Assessment

✅ Looks good

Affected method: Sun-Abraham interaction-weighted event study. The PR changes the variance/inference layer, not the IW estimand. On re-review, the previous P1 is resolved and I do not see any unmitigated P0/P1 issues in the changed code.

Executive Summary

The prior blocker is fixed: explicit cluster= now fails fast on both missing columns and missing cluster labels in diff_diff/sun_abraham.py:L630-L653, with regression coverage in tests/test_sun_abraham.py:L1881-L1927.
The estimator-level methodology changes are aligned with the Methodology Registry: full-dummy routing for classical/hc2/hc2_bm, CR2 Bell-McCaffrey contrast DOF for aggregated hc2_bm outputs, survey rejection for non-hc1, and the HC1 finite-sample-correction gap are documented in docs/methodology/REGISTRY.md:L977-L1048.
The earlier registry inconsistency is fixed: aggregated inference under vcov_type="hc2_bm" is now documented consistently in docs/methodology/REGISTRY.md:L1010-L1027 and docs/methodology/REGISTRY.md:L1070-L1080.
Added R-parity coverage is appropriately targeted: tests/test_methodology_sun_abraham.py:L63-L264 pins classical/HC2/CR2-BM cohort SEs and BM contrast DOF against R goldens.
Remaining SunAbraham follow-ups are properly tracked in TODO.md:L101-L105, so they are non-blocking.
I could not execute the test suite locally because pytest is not installed in this environment.

Methodology

Severity: P3 informational
Impact: The previous methodology blocker is resolved. SunAbraham.fit() now rejects malformed explicit clustering before cluster resolution, closing the silent wrong-inference path on non-survey clustered fits. See diff_diff/sun_abraham.py:L630-L653; regression coverage is in tests/test_sun_abraham.py:L1881-L1927.
Concrete fix: None.
Severity: P3 informational
Impact: The changed inference behavior is documented rather than silent. Full-dummy routing for classical/hc2/hc2_bm, CR2 BM contrast DOF on aggregated hc2_bm outputs, the HC1-vs-fixest finite-sample-correction gap, and survey rejection for non-hc1 are all recorded in docs/methodology/REGISTRY.md:L977-L1048 and exercised in tests/test_methodology_sun_abraham.py:L63-L264. Under the review policy, these documented deviations are not defects.
Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 informational
Impact: Deferred follow-up work is properly tracked in TODO.md:L101-L105 (remaining standalone estimator threading, SunAbraham Conley support, HC1 harmonization). That tracking is sufficient and does not block approval.
Concrete fix: None.

Security

No findings.

Documentation/Tests

Severity: P3 informational
Impact: The documentation/test surface now matches the implementation more closely. The registry’s aggregated-inference description is consistent, and the new parity/regression coverage targets the high-risk parts of this PR in docs/methodology/REGISTRY.md:L1010-L1027, docs/methodology/REGISTRY.md:L1070-L1080, tests/test_methodology_sun_abraham.py:L63-L264, and tests/test_sun_abraham.py:L1881-L1927.
Concrete fix: None.
Reviewer note: Local execution was not possible because pytest is not installed in this environment.

… (R1 P0) Local codex R1 caught a P0: StackedDiD(vcov_type="hc2_bm") computed CR2 vcov correctly but never propagated the Bell-McCaffrey Satterthwaite DOF into safe_inference() calls. event_study_effects[h]['p_value']/['conf_int'] and overall_p_value/overall_conf_int silently fell back to normal-theory inference (df=None ⇒ scipy.norm), contradicting the registry contract. Fix mirrors the SunAbraham aggregated-inference pattern from PR igerber#472 (sun_abraham.py:997-1097). After solve_ols(), if vcov_type=="hc2_bm" and not on the survey replicate-refit path, build contrast matrix: - Per-event-time: unit vector at each interaction_indices[h] - Overall ATT: 1/K average across post-period interaction columns Call _compute_cr2_bm_contrast_dof(X, cluster_ids, bread, contrasts, weights=composed_weights). Apply per-event-time DOFs to event_study_effects inference, overall DOF to overall_* inference. Wrap in try/except so any rank-deficient or linalg failure emits a UserWarning and falls back to normal-theory (visible deviation, not silent). R fixture extended with the post-period-average ATT contrast DOF via Wald_test(constraints=row_avg, vcov=CR2, test="HTZ")$df_denom (mirrors PR igerber#465's MPD avg_att approach). New goldens at both cluster=unit and cluster=unit_subexp. Test additions / strengthening (addresses R1 P2 + P3): - test_hc2_bm_per_event_dof_matches_coef_test_df_satt_unit_cluster: uses brentq inversion of CI half-width to recover the DOF safe_inference actually used. If propagation failed (the R1 P0 bug), the inversion raises ValueError → test FAILS instead of silently skipping (replaces the prior `continue`-on-failure pattern which could vacuously pass). Hard-asserts validated_count == len(event_times). - test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_cluster (NEW): pins overall ATT DOF at atol=1e-6 against R Wald_test(HTZ)$df_denom. - test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_subexp_cluster (NEW): symmetric coverage at alternate cluster level. - Renamed CR1 → CR1S throughout docs/tests/REGISTRY for consistency (diff-diff's HC1+cluster uses Stata-style G/(G-1)*(n-1)/(n-p); plain CR1 omits the (n-1)/(n-p) term and diverges by ~1.4%). 192 tests pass (74 stacked + 19 wls_cr2 + 47 SA + 52 estimators_vcov_type). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Release notes consolidate 8 PRs since 3.4.0 (2026-05-19): Public-surface variance lifts: - SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468) - SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477) - SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472) - WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475) Methodology-review-tracker promotions (mostly docs/tests): - PreTrendsPower R pretrends parity goldens (PR-C, igerber#471) - HAD methodology-review-tracker promotion (igerber#473) - ContinuousDiD methodology-review-tracker promotion (igerber#476) All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ath (Phase 1b 3/8) WooldridgeDiD now accepts `vcov_type` for the OLS path, mirroring the SunAbraham PR igerber#472 / StackedDiD PR igerber#479 pattern: - `hc1` (default) preserves bit-equal within-transform CR1 behavior - `hc2_bm` / `hc2` / `classical` auto-route to full-dummy saturated design (FWL doesn't preserve the hat matrix; HC2 leverage + BM DOF need the full FE projection). Matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at atol=1e-10 on the 6 R-parity tests in tests/test_methodology_wooldridge.py. - Bell-McCaffrey Satterthwaite DOF threaded into overall ATT inference via `_compute_cr2_bm_contrast_dof`; fail-closed (all-NaN) when DOF unavailable, per feedback_bm_contrast_dof_fail_closed. - One-way `hc2`/`classical` auto-drop the unit auto-cluster (one-way families don't compose with cluster_ids). Explicit `cluster="X"` + one-way raises at the linalg validator. - `method ∈ {logit, poisson}` + `vcov_type != "hc1"` rejected at `__init__` (GLM CR2-BM derivation deferred to follow-up TODO row). - `SurveyDesign` + `vcov_type != "hc1"` rejected at `fit()` (survey TSL overrides analytical sandwich). - `n_bootstrap > 0` + one-way + `cluster=None` rejected at `fit()` (bootstrap is intrinsically clustered). WooldridgeDiDResults gains `vcov_type`, `cluster_name`, `n_clusters` fields for downstream introspection. Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs remaining). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber force-pushed the sa-vcov-type-phase1b-1 branch from c227c35 to 269e904 Compare May 20, 2026 00:06

igerber force-pushed the sa-vcov-type-phase1b-1 branch from 269e904 to d7bb9da Compare May 20, 2026 01:01

igerber force-pushed the sa-vcov-type-phase1b-1 branch from d7bb9da to 5996d43 Compare May 20, 2026 01:11

igerber force-pushed the sa-vcov-type-phase1b-1 branch from 5996d43 to 119db85 Compare May 20, 2026 01:21

igerber added the ready-for-ci Triggers CI test workflows label May 20, 2026

igerber merged commit 96965eb into main May 20, 2026
33 of 34 checks passed

igerber deleted the sa-vcov-type-phase1b-1 branch May 20, 2026 10:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread vcov_type through SunAbraham (Phase 1b 1/8)#472

Thread vcov_type through SunAbraham (Phase 1b 1/8)#472
igerber merged 1 commit into
mainfrom
sa-vcov-type-phase1b-1

igerber commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 19, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant