Thread vcov_type through SunAbraham (Phase 1b 1/8) #472
Overall Assessment
I cross-checked the PR against the Sun-Abraham registry entry and the in-code Sun & Abraham references. I did not find an untracked variance-formula or identification mismatch in the implemented analytical paths, and the HC1/fixest gap plus the Conley deferral are both documented and tracked. The blocking issue is narrower: the new …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
c227c35 to 269e904
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
The re-review resolves the two prior P1 test-coverage findings, but there is still one methodology-significant inference bug in the new …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
269e904 to d7bb9da
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Affected method: Sun-Abraham interaction-weighted event study. The PR correctly fixes the prior …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
d7bb9da to 5996d43
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Affected method: Sun-Abraham interaction-weighted event study. The paper’s load-bearing point is the contamination of staggered-TWFE lead/lag coefficients under heterogeneous treatment effects and the interaction-weighted alternative that avoids it; this PR keeps that estimand intact and changes only the variance/inference layer around it. (ideas.repec.org)
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
…1b 1/8)
Adds `vcov_type` parameter to `SunAbraham`, mirroring the DiD/MPD/TWFE
chain from Phase 1a. Defaults to "hc1" (preserves prior bit-equal
behavior - SA historically hard-coded HC1). First PR of Phase 1b, which
threads `vcov_type` through the 8 standalone estimators that expose
`cluster=` but not yet `vcov_type=`.
Methodology: when `vcov_type ∈ {classical, hc2, hc2_bm}`,
`_fit_saturated_regression` auto-routes to a full-dummy saturated design
(intercept + cohort × event-time interactions + unit dummies + time
dummies). FWL preserves cohort coefficients but not the hat matrix, so
HC2 leverage and Bell-McCaffrey DOF must be computed on the full FE
projection. Mirrors TWFE Gate 1 from PR #469. Empirically matches
`lm() + sandwich::vcovHC(type="HC2")` and
`clubSandwich::vcovCR(..., type="CR2") + coef_test()$df_Satt` at
atol=1e-10 (pinned in tests/test_methodology_sun_abraham.py).
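
The FWL point in the paragraph above can be checked numerically. This is a minimal NumPy sketch on a toy panel with a single regressor and unit dummies only; the data, sizes, and variable names are illustrative assumptions, not the SunAbraham design or code from this PR. It compares the slope and the leverage values from the full-dummy design against the within-transformed regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 6, 4                      # toy balanced panel (illustrative)
unit = np.repeat(np.arange(n_units), n_periods)
x = rng.normal(size=unit.size)
y = 1.5 * x + 0.3 * unit + rng.normal(size=unit.size)

# Full-dummy design: the regressor plus one dummy per unit.
D = (unit[:, None] == np.arange(n_units)).astype(float)
X_full = np.column_stack([x, D])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
# Leverage h_ii = diag(X (X'X)^{-1} X'), computed via the pseudo-inverse.
h_full = np.einsum("ij,ji->i", X_full, np.linalg.pinv(X_full))

# Within transform (FWL): demean x and y by unit, then regress.
def demean(v):
    return v - D @ (D.T @ v) / n_periods

x_w, y_w = demean(x), demean(y)
beta_w = (x_w @ y_w) / (x_w @ x_w)
h_w = x_w**2 / (x_w @ x_w)

# Same slope, different leverage: the within-transform hat matrix omits the
# projection onto the unit dummies, whose diagonal is 1/n_periods here.
print("slopes agree:", np.isclose(beta_full[0], beta_w))
print("leverage gap is 1/T:", np.allclose(h_full - h_w, 1.0 / n_periods))
```

Because HC2 rescales each squared residual by 1/(1 - h_ii), a leverage computed on the within-transformed columns alone would understate h_ii, which is why the non-HC1 families are routed to the full-dummy design.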
Scope limits: replicate-weight survey + hc2/hc2_bm raises
NotImplementedError (per-replicate full-dummy refit not implemented).
`vcov_type="conley"` rejected at __init__ with a deferral message
(threading conley_* params is a follow-up). Auto-cluster-at-unit is
dropped when the user opts into explicit `vcov_type="hc2"` or
`"classical"` (both one-way only); preserved for `"hc1"` and
`"hc2_bm"`.
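
The scope rules above can be sketched as two small checks. The helper names (`check_vcov_type`, `effective_cluster`), the exception types, and the message text are hypothetical; only the `vcov_type` values and the drop/preserve behavior come from the commit message.

```python
# Hypothetical sketch; not the SunAbraham implementation.
SUPPORTED = ("classical", "hc1", "hc2", "hc2_bm")
ONE_WAY = ("classical", "hc2")  # families without a cluster structure


def check_vcov_type(vcov_type):
    """Reject unsupported values up front, as an __init__-time check would."""
    if vcov_type == "conley":
        # Exception type is an assumption; the source only says "rejected
        # at __init__ with a deferral message".
        raise NotImplementedError(
            "vcov_type='conley' is deferred: conley_* params are not threaded yet"
        )
    if vcov_type not in SUPPORTED:
        raise ValueError(f"vcov_type must be one of {SUPPORTED}, got {vcov_type!r}")
    return vcov_type


def effective_cluster(vcov_type, cluster=None, unit="unit"):
    """Return the cluster column the fit would use.

    With no explicit cluster, one-way families drop the unit auto-cluster;
    hc1 and hc2_bm keep it. An explicit cluster is passed through unchanged
    in this sketch (how the real code treats that case is not shown here).
    """
    if cluster is not None:
        return cluster
    return None if vcov_type in ONE_WAY else unit


for vt in SUPPORTED:
    print(vt, "->", effective_cluster(check_vcov_type(vt)))
```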
Documented deviation from R: SA's within-transform HC1 SE differs from
`fixest::sunab()` by ~1-2% on typical panel sizes (different (n-k)
finite-sample correction). The IW aggregation is otherwise identical;
parity at atol=5e-3.
Test surface: 15 new behavioral tests in test_sun_abraham.py covering
default-vs-explicit bit-equality, all four vcov_type values
finite-and-distinct, auto-cluster drop/preserve, replicate-weight
reject, get_params/set_params, clone+repeat-fit idempotence, invalid
value rejection, cluster_var=None cascade through survey-PSU injection,
full-dummy vs within-transform HC2 divergence. 4 new R-parity tests in
test_methodology_sun_abraham.py against sandwich/clubSandwich/fixest
goldens.
New R golden scenario `sun_abraham_two_cohort` in
benchmarks/data/clubsandwich_cr2_golden.json (5 cohorts × 8 periods
panel; pins classical_se, hc2_se, cr2_bm_singleton_se+dof,
cr2_bm_unit_se+dof, sunab_hc1_event_study_e0_se).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5996d43 to 119db85
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Affected method: Sun-Abraham interaction-weighted event study. The PR changes the variance/inference layer, not the IW estimand. On re-review, the previous P1 is resolved and I do not see any unmitigated P0/P1 issues in the changed code.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
… (R1 P0)

Local codex R1 caught a P0: StackedDiD(vcov_type="hc2_bm") computed CR2 vcov correctly but never propagated the Bell-McCaffrey Satterthwaite DOF into safe_inference() calls. event_study_effects[h]['p_value']/['conf_int'] and overall_p_value/overall_conf_int silently fell back to normal-theory inference (df=None ⇒ scipy.stats.norm), contradicting the registry contract.

Fix mirrors the SunAbraham aggregated-inference pattern from PR igerber#472 (sun_abraham.py:997-1097). After solve_ols(), if vcov_type=="hc2_bm" and not on the survey replicate-refit path, build contrast matrix:
- Per-event-time: unit vector at each interaction_indices[h]
- Overall ATT: 1/K average across post-period interaction columns

Call _compute_cr2_bm_contrast_dof(X, cluster_ids, bread, contrasts, weights=composed_weights). Apply per-event-time DOFs to event_study_effects inference, overall DOF to overall_* inference. Wrap in try/except so any rank-deficient or linalg failure emits a UserWarning and falls back to normal-theory (visible deviation, not silent).

R fixture extended with the post-period-average ATT contrast DOF via Wald_test(constraints=row_avg, vcov=CR2, test="HTZ")$df_denom (mirrors PR igerber#465's MPD avg_att approach). New goldens at both cluster=unit and cluster=unit_subexp.

Test additions / strengthening (addresses R1 P2 + P3):
- test_hc2_bm_per_event_dof_matches_coef_test_df_satt_unit_cluster: uses brentq inversion of CI half-width to recover the DOF safe_inference actually used. If propagation failed (the R1 P0 bug), the inversion raises ValueError → test FAILS instead of silently skipping (replaces the prior `continue`-on-failure pattern which could vacuously pass). Hard-asserts validated_count == len(event_times).
- test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_cluster (NEW): pins overall ATT DOF at atol=1e-6 against R Wald_test(HTZ)$df_denom.
- test_hc2_bm_overall_att_dof_matches_wald_test_htz_unit_subexp_cluster (NEW): symmetric coverage at alternate cluster level.
- Renamed CR1 → CR1S throughout docs/tests/REGISTRY for consistency (diff-diff's HC1+cluster uses Stata-style G/(G-1)*(n-1)/(n-p); plain CR1 omits the (n-1)/(n-p) term and diverges by ~1.4%).

192 tests pass (74 stacked + 19 wls_cr2 + 47 SA + 52 estimators_vcov_type).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
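
The brentq inversion described in the first test bullet can be sketched as follows. This is an illustrative reconstruction, not the test's code: `recover_dof` and its bracket `[1, 1e6]` are assumptions. It solves for the degrees of freedom whose Student-t critical value reproduces a reported confidence-interval half-width.

```python
from scipy import stats
from scipy.optimize import brentq


def recover_dof(se, ci_low, ci_high, alpha=0.05, lo=1.0, hi=1e6):
    """Solve t.ppf(1 - alpha/2, df) * se == CI half-width for df.

    Hypothetical helper. The t critical value decreases in df toward the
    normal quantile, so a normal-theory interval has no root inside the
    bracket and brentq raises ValueError.
    """
    target = 0.5 * (ci_high - ci_low) / se
    return brentq(lambda df: stats.t.ppf(1 - alpha / 2, df) - target, lo, hi)


# Round trip: build a CI with a known DOF, then recover that DOF.
est, se, true_df = 0.8, 0.4, 7.3
crit = stats.t.ppf(0.975, true_df)
print(recover_dof(se, est - crit * se, est + crit * se))

# A normal-theory interval (the failure mode described above) has no root.
z = stats.norm.ppf(0.975)
try:
    recover_dof(se, est - z * se, est + z * se)
except ValueError as exc:
    print("no t-based DOF reproduces this interval:", exc)
```

Letting the ValueError propagate is what turns a silent normal-theory fallback into a test failure rather than a skipped check.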
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ath (Phase 1b 3/8)

WooldridgeDiD now accepts `vcov_type` for the OLS path, mirroring the SunAbraham PR igerber#472 / StackedDiD PR igerber#479 pattern:
- `hc1` (default) preserves bit-equal within-transform CR1 behavior
- `hc2_bm` / `hc2` / `classical` auto-route to full-dummy saturated design (FWL doesn't preserve the hat matrix; HC2 leverage + BM DOF need the full FE projection). Matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at atol=1e-10 on the 6 R-parity tests in tests/test_methodology_wooldridge.py.
- Bell-McCaffrey Satterthwaite DOF threaded into overall ATT inference via `_compute_cr2_bm_contrast_dof`; fail-closed (all-NaN) when DOF unavailable, per feedback_bm_contrast_dof_fail_closed.
- One-way `hc2`/`classical` auto-drop the unit auto-cluster (one-way families don't compose with cluster_ids). Explicit `cluster="X"` + one-way raises at the linalg validator.
- `method ∈ {logit, poisson}` + `vcov_type != "hc1"` rejected at `__init__` (GLM CR2-BM derivation deferred to follow-up TODO row).
- `SurveyDesign` + `vcov_type != "hc1"` rejected at `fit()` (survey TSL overrides analytical sandwich).
- `n_bootstrap > 0` + one-way + `cluster=None` rejected at `fit()` (bootstrap is intrinsically clustered).

WooldridgeDiDResults gains `vcov_type`, `cluster_name`, `n_clusters` fields for downstream introspection.

Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs remaining).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
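
The fail-closed rule mentioned above (all-NaN inference when the Bell-McCaffrey DOF is unavailable) can be illustrated with a small helper. `t_inference` is a hypothetical name and signature, not the library's `safe_inference`; the sketch only shows the contract of returning NaN rather than substituting a normal-theory result.

```python
import math

from scipy import stats


def t_inference(estimate, se, df, alpha=0.05):
    """Hypothetical fail-closed t inference.

    If the DOF (or the SE) is missing or unusable, every inferential
    quantity is NaN instead of quietly switching to the normal distribution.
    """
    nan = float("nan")
    usable = (
        df is not None
        and math.isfinite(df) and df > 0
        and math.isfinite(se) and se > 0
    )
    if not usable:
        return {"t_stat": nan, "p_value": nan, "conf_int": (nan, nan)}
    t_stat = estimate / se
    crit = stats.t.ppf(1 - alpha / 2, df)
    return {
        "t_stat": t_stat,
        "p_value": 2 * stats.t.sf(abs(t_stat), df),
        "conf_int": (estimate - crit * se, estimate + crit * se),
    }


print(t_inference(0.8, 0.4, df=7.3))   # finite t-based inference
print(t_inference(0.8, 0.4, df=None))  # fail-closed: all NaN
```

The trade-off is that downstream consumers see an obviously missing result instead of a plausible-looking p-value computed under the wrong reference distribution.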
Summary
- Thread `vcov_type ∈ {classical, hc1, hc2, hc2_bm}` through `SunAbraham`, mirroring the DiD/MPD/TWFE chain from Phase 1a (PR #469, "Lift Gate 1: HC2/HC2-BM for TwoWayFixedEffects via full-dummy auto-route", et al.). Defaults to `"hc1"`, which preserves prior behavior bit-equally (SA historically hard-coded HC1).
- When `vcov_type ∈ {classical, hc2, hc2_bm}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (intercept + cohort × event-time interactions + unit dummies + time dummies). FWL preserves cohort coefficients but not the hat matrix: HC2 leverage and Bell-McCaffrey Satterthwaite DOF require the full FE projection; classical also routes through full-dummy so the `(n-k)` finite-sample correction matches R's `lm()` interpretation. Same Part B surgery shape as TWFE Gate 1 (PR #469).
- `hc1` keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix; matches the `fixest::sunab(cluster=~unit)` convention).
- Auto-cluster-at-unit is dropped for explicit one-way families (`hc2`, `classical`); preserved for `hc1` and `hc2_bm` (which routes to CR2-BM at unit).
- `SurveyDesign` (any kind: analytical weights / stratified / PSU / replicate-weight) combined with `vcov_type ∈ {classical, hc2, hc2_bm}` raises `NotImplementedError`: the survey TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs.
- `vcov_type="conley"` rejected at `__init__` with a deferral message (TODO row tracks the threading needed for `conley_*` params on the saturated regression call).
- `vcov_type` propagated to `SunAbrahamResults.vcov_type` for downstream introspection.

Methodology references (required if estimator / math changes)
- Documented deviation from R: SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) due to a different `(n-k)` count: fixest counts absorbed FE in `k_total`; SA's `solve_ols` counts only within-transformed columns. The IW aggregation step is otherwise identical. Documented in the `docs/methodology/REGISTRY.md` SunAbraham section, pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`, tracked in TODO.md for follow-up harmonization.

Validation
- `tests/test_sun_abraham.py`: 17 new behavioral tests in `TestSunAbrahamVcovType` (all `vcov_type` values finite-and-distinct, auto-cluster drop/preserve, replicate/survey rejects, `get_params`/`set_params` + `_vcov_type_explicit` refresh, clone+repeat-fit idempotence, invalid-value rejection, `vcov_type` propagated to `SunAbrahamResults`, `n_psu` + `df_survey` regression for survey path, full-dummy-vs-within-transform HC2 divergence)
- `tests/test_methodology_sun_abraham.py`: NEW file, 5 R-parity tests: classical / hc2 / hc2_bm cohort SE at `atol=1e-10` vs `lm()` + `sandwich`/`clubSandwich`, BM Satterthwaite DOF (singleton + cluster=unit) at `atol=1e-10`, HC1 event-study e=0 vs `fixest::sunab(cluster=~unit)` at `atol=5e-3` (documented deviation)
- `benchmarks/R/generate_clubsandwich_golden.R`: new `sun_abraham_two_cohort` scenario (5-cohort × 8-period balanced panel; saturated full-dummy `lm()` + `sandwich::vcovHC` + `clubSandwich::vcovCR` at unit + singleton-cluster + `fixest::sunab` event-study e=0). JSON golden regenerated.
- `solve_ols` full-dummy vs R `lm()` + `vcovHC`/`vcovCR` at `atol=1e-12` to `1e-15` before any source edit (per `feedback_r_source_smoke_test_before_implementing.md`).
- `/ai-review-local --backend codex` until ✅ clean: only P3 informational items remain (HC1 finite-sample-correction deviation + Conley deferral, both tracked in TODO.md).

Security / privacy
Generated with Claude Code