SpilloverDiD Wave E.3: SurveyDesign.subpopulation() full-design retention by igerber · Pull Request #482 · igerber/diff-diff

igerber · 2026-05-21T18:08:17Z

Summary

Wave E.3 closes the user-facing P3 limitation documented at REGISTRY.md:3249 since Wave E.1: SurveyDesign.subpopulation()-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design (n_psu / n_strata / df_survey / Binder TSL per-stratum centering all reflect the FULL domain rather than the post-finite_mask fit sample).
Documented synthesis (library-convention adoption, NOT new methodology): adopts the R survey::svyrecvar(subset()) zero-pad convention (Lumley 2010 §2.5) already established in imputation.py:2175-2183 and prep.py:1401-1432. Propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat.
New score_pad_mask kwarg on _compute_gmm_corrected_meat: gamma_hat/Psi are built on survey_finite_mask = finite_mask & (survey_weights > 0) (the R6 fix preserves FE drop-first column-space invariance against zero-weight subpop rows that pd.factorize compaction would otherwise reorder); Psi is zero-padded back to full panel length inside the helper before kernel dispatch; kernel-dispatch arrays (cluster_ids, conley_coords, conley_time, conley_unit, resolved_survey) are passed at FULL length so meat helpers see full-domain PSU/strata/centroid/time geometry.
count_mask invariant: res.n_obs / res.n_treated / res.n_control / res.n_far_away_obs + event-study per-cell n_obs all reflect survey_finite_mask on the survey path (matches the effective weighted estimation sample).
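
The mask construction and zero-pad step described in the summary can be sketched in isolation. This is a minimal illustration under stated assumptions, not the library's code: `zero_pad_scores` is a hypothetical helper and the arrays are toy data; only the names `finite_mask`, `survey_weights`, `survey_finite_mask`, and `score_pad_mask` come from the PR text.

```python
import numpy as np

def zero_pad_scores(psi_fit, score_pad_mask):
    """Scatter fit-sample score rows back to full panel length (hypothetical helper).

    Rows where score_pad_mask is False receive all-zero scores.
    """
    score_pad_mask = np.asarray(score_pad_mask, dtype=bool)
    if psi_fit.shape[0] != int(score_pad_mask.sum()):
        raise ValueError("psi_fit must have one row per True entry in score_pad_mask")
    psi_padded = np.zeros((score_pad_mask.shape[0], psi_fit.shape[1]), dtype=psi_fit.dtype)
    psi_padded[score_pad_mask] = psi_fit
    return psi_padded

# Toy panel of six rows: one non-finite row, two zero-weight (out-of-domain) rows.
finite_mask = np.array([True, True, False, True, True, True])
survey_weights = np.array([1.5, 0.0, 2.0, 1.0, 0.0, 3.0])

# Parentheses matter: in Python, & binds tighter than >.
survey_finite_mask = finite_mask & (survey_weights > 0)

# Stand-in for scores built on the active sample only (three rows, two parameters).
psi_fit = np.arange(6, dtype=float).reshape(3, 2)
psi_padded = zero_pad_scores(psi_fit, survey_finite_mask)
```

The padded array has one row per panel observation, so it lines up with full-length PSU, strata, and coordinate arrays, while the excluded rows add nothing to any score total.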

Methodology references (required if estimator / math changes)

Method name(s): SpilloverDiD survey-design analytical variance under Wave E.3 full-design retention (extending Wave E.1 Binder TSL + Wave D Gardner GMM + Wave E.2/follow-up stratified-Conley + serial Bartlett)
Paper / source link(s):
- Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. §2.5 Domains and subpopulations. https://doi.org/10.1002/9780470580066
- In-library precedents: diff_diff/imputation.py:2175-2183 (PreTrendsImputation lead regression), diff_diff/prep.py:1401-1432 (DCDH cell variance)
- Inherited synthesis: Binder (1983), Gerber (2026) Prop 1, Butts (2021) §3.1, Gardner (2022) §4, Conley (1999), Newey-West (1987)
Any intentional deviations from the source: None. Library-convention adoption — Wave E.3 propagates an existing in-library zero-pad pattern to the SpilloverDiD survey path; no new methodology beyond aligning with R svyrecvar(subset()) semantics.
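
For orientation, the zero-pad convention can be written as a stratified Taylor-linearization meat in which out-of-domain rows contribute zero scores while their PSUs and strata stay in the design. The notation is an assumption introduced here for illustration (it is not taken from REGISTRY.md): $\psi_i$ is the weighted score row for observation $i$, $\mathcal{D}$ is the active domain (survey_finite_mask), $n_h$ is the full-design PSU count in stratum $h$, and finite-population corrections are omitted.

```latex
z_{hj} = \sum_{i \in \mathrm{PSU}(h,j)} \mathbf{1}\{i \in \mathcal{D}\}\,\psi_i ,
\qquad
\bar z_h = \frac{1}{n_h} \sum_{j=1}^{n_h} z_{hj}

\hat M = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{j=1}^{n_h}
         (z_{hj} - \bar z_h)(z_{hj} - \bar z_h)^{\top} ,
\qquad
\mathrm{df} = \sum_{h=1}^{H} n_h - H
```

A PSU with no in-domain rows has $z_{hj} = 0$ but still counts toward $n_h$, which is why n_psu, n_strata, and df_survey reflect the full domain.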

Validation

Tests added/updated: tests/test_spillover.py — new TestSpilloverDiDWaveE3SubpopulationFullDesign (14 tests) + TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy (5 tests). 316 spillover regression tests pass on both DIFF_DIFF_BACKEND=python and DIFF_DIFF_BACKEND=rust. Pre-existing Wave E.1 test_p2_finite_mask_forces_drop_under_survey assertion flipped from n_psu=8 (subset) to n_psu=10 (full domain) to reflect the new contract.
Backtest / simulation / notebook evidence: N/A — analytical methodology change.

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

…tion

Closes the user-facing limitation documented at REGISTRY.md:3249 since Wave E.1: SurveyDesign.subpopulation()-derived designs and warn-and-drop fits now preserve the full-domain resolved survey design. n_psu / n_strata / df_survey / Binder TSL per-stratum centering reflect the FULL domain rather than the post-finite_mask fit sample.

Documented synthesis (library-convention adoption, not new methodology): adopts the R survey::svyrecvar(subset()) zero-pad convention (Lumley 2010 §2.5) already established in imputation.py:2175-2183 and prep.py:1401-1432. Propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat.

Mechanical realization (one new _compute_gmm_corrected_meat kwarg): the gamma_hat / Psi build stays on the SURVEY-FINITE-MASK subset of fit-sample inputs (= finite_mask & (survey_weights > 0)) so the drop-first stage-1 FE column space is INVARIANT to zero-weight subpop rows (critical because _build_butts_fe_design_csr re-factorizes via pd.factorize and drops the first unit/time code; including zero-weight rows would silently shift gamma_hat). _compute_gmm_corrected_meat gains a new optional kwarg score_pad_mask: when supplied, the helper zero-pads the survey-finite-mask Psi to full panel length AFTER construction but BEFORE kernel dispatch via Psi_padded[score_pad_mask] = Psi. Kernel-dispatch arrays (cluster_ids, conley_coords, conley_time, conley_unit, resolved_survey) are passed at FULL length so meat helpers see the full-domain PSU / strata / centroid / time geometry. The stage-2 OLS solve still runs on X_2_kept / y_tilde_fit (active sample); only the meat-helper boundary sees full-length arrays.

count_mask invariant: top-level res.n_obs / res.n_treated / res.n_control / res.n_far_away_obs AND event-study event_study_meta["n_obs_per_col"] / att_dynamic.n_obs / event_study_effects[k]["n_obs"] / spillover_effects.n_obs all use count_mask (= survey_finite_mask on the survey path) so the reported counts match the effective weighted sample.

Cross-surface n_psu consistency: top-level res.n_psu reads from len(resolved_survey_fit.weights) on the implicit-PSU branch (was int(finite_mask.sum())), so res.n_psu == res.survey_metadata.n_psu on weights-only / strata-only survey designs under warn-and-drop.

A2 invariant (locked in _scratch/wave_e3_smoke.py): warn-and-drop and SurveyDesign.subpopulation() drops apply the same zero-pad mechanism; both produce identical meat output for identical row-level exclusions.

Restrictions inherited: replicate-weight variance + subpopulation continues to raise NotImplementedError at the Wave E.1 gate. TwoStageDiD's analogous finite_mask + design-subset pattern at two_stage.py:567-601 does NOT yet adopt Wave E.3; a separate parity follow-up is tracked in TODO.md.

Tests: 19 new tests in TestSpilloverDiDWaveE3SubpopulationFullDesign (+EventStudy mirror):
- pre-E.3 baseline parity via pinned goldens
- n_psu cross-surface consistency on implicit-PSU branch
- zero-pad mechanics via mock-spy
- cluster-as-PSU + subpop parity
- conley + lag>0 + subpop × {explicit PSU / cluster injection / weights-only NotImplementedError}
- unit with BOTH zero weight AND no Omega_0 support
- gamma_hat-build sample excludes zero-weight rows (R6 P1 regression)
- n_obs metadata excludes zero-weight rows
- n_far_away_obs excludes zero-weight rows
- warn-drop SE drift golden
- ATT bit-equality under PSU-last-sort exclusion
- exact event-study n_obs propagation across att_dynamic / event_study_effects / spillover_effects under treated-PSU exclusion
- event-study on both is_staggered branches with analytical + conley+lag variants

Pre-existing Wave E.1 test_p2_finite_mask_forces_drop_under_survey assertion flipped from n_psu=8 (subset) to n_psu=10 (full domain) to reflect the new contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
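
The drop-first concern raised in the commit message can be shown with pd.factorize alone. The sketch below is an assumption-laden illustration, not the library's _build_butts_fe_design_csr: `drop_first_fe_columns` is a hypothetical helper that only reports which unit labels would keep a dummy column once the first factorized code is dropped.

```python
import numpy as np
import pandas as pd

def drop_first_fe_columns(unit_ids):
    """Unit labels that keep a dummy column after dropping factorize code 0 (hypothetical)."""
    codes, uniques = pd.factorize(unit_ids)
    # Code 0 is the first label encountered; drop-first makes it the reference level.
    return list(uniques[1:])

units = np.array(["A", "A", "B", "B", "C", "C"])
weights = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # unit A lies outside the subpopulation

cols_with_zero_weight_rows = drop_first_fe_columns(units)          # reference level is A
cols_active_sample = drop_first_fe_columns(units[weights > 0])     # reference level is B
```

In this toy case the reference level moves from A to B depending on whether the zero-weight rows are included, so the fixed-effect coefficients are parameterized differently. Building gamma_hat on the active sample only keeps that parameterization tied to the rows that actually enter the weighted fit.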

github-actions · 2026-05-21T18:16:05Z

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

The Wave E.3 estimator changes match the Methodology Registry: gamma_hat/Psi are built on survey_finite_mask, score_pad_mask zero-pads only at the meat boundary, and full-design PSU/strata/df bookkeeping is retained as documented in docs/methodology/REGISTRY.md:L3270-L3352, diff_diff/spillover.py:L2876-L3332, and diff_diff/two_stage.py:L211-L355.
I did not find new inference anti-patterns: the touched paths still use safe_inference(), and I did not see new partial-NaN or inline t = effect / se regressions in the modified estimator code.
The changed count surfaces appear internally consistent with the new active-sample contract: n_obs, n_treated, n_control, n_far_away_obs, and event-study n_obs_per_col all now derive from the same count_mask/survey_finite_mask logic in diff_diff/spillover.py:L3019-L3035 and diff_diff/spillover.py:L3387-L3566.
Test coverage in the diff is targeted at the new risk surfaces: subpopulation zero-padding, warn-drop retention, explicit PSU vs cluster injection, no-effective-PSU fail-closed Conley, and both event-study branches are all covered in tests/test_spillover.py:L7521-L8555.
Remaining follow-ups are explicitly tracked in TODO.md, so they are non-blocking for this PR. See TODO.md:L148-L149.
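
The difference between full-design bookkeeping and a subsetted design can be seen on toy data. This is a sketch under assumptions, not the library's meat helper: `stratified_psu_meat` is a hypothetical function implementing a textbook stratified between-PSU meat without finite-population correction, and df is taken as n_psu minus n_strata.

```python
import numpy as np

def stratified_psu_meat(scores, strata, psu):
    """Sum over strata of n_h/(n_h-1) * centered PSU-total outer products (hypothetical)."""
    k = scores.shape[1]
    meat = np.zeros((k, k))
    n_psu = 0
    strata_levels = np.unique(strata)
    for h in strata_levels:
        in_h = strata == h
        psu_h = np.unique(psu[in_h])
        n_h = len(psu_h)
        n_psu += n_h
        if n_h < 2:
            continue  # singleton strata contribute nothing in this sketch
        totals = np.vstack([scores[in_h & (psu == j)].sum(axis=0) for j in psu_h])
        centered = totals - totals.mean(axis=0)
        meat += n_h / (n_h - 1) * (centered.T @ centered)
    return meat, n_psu, n_psu - len(strata_levels)

# Two strata, three PSUs each, two rows per PSU; PSU 2 is entirely out of domain.
strata = np.repeat([0, 1], 6)
psu = np.repeat([0, 1, 2, 3, 4, 5], 2)
scores = np.arange(1.0, 13.0).reshape(-1, 1)
in_domain = psu != 2

# Full-design retention: zero the excluded rows, keep every PSU and stratum.
padded = np.where(in_domain[:, None], scores, 0.0)
meat_full, n_psu_full, df_full = stratified_psu_meat(padded, strata, psu)

# Subsetted design: drop the rows, which also drops PSU 2 from the design.
meat_sub, n_psu_sub, df_sub = stratified_psu_meat(
    scores[in_domain], strata[in_domain], psu[in_domain]
)
```

Under zero-padding the excluded PSU still counts toward the stratum's PSU total and the per-stratum centering, so both the meat and the degrees of freedom differ from the subsetted computation.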

Methodology

Severity P3 — Informational. Impact: none. Concrete fix: none. The only notable source-material extension is treating warn-and-drop rows the same as subpopulation-excluded rows for zero-padding/full-design retention, but that behavior is explicitly documented in the registry, so it is not a defect. See docs/methodology/REGISTRY.md:L3270-L3352.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity P3 — Informational. Impact: low drift risk only. Concrete fix: later extract the duplicated raw_w_for_meta construction into the shared helper already tracked in TODO.md:L149.

Tech Debt

Severity P3 — Informational. Impact: SpilloverDiD now has Wave E.3 parity, but TwoStageDiD still does not on the analogous subpopulation full-design-retention path. Concrete fix: implement the same zero-pad/full-design pattern in TwoStageDiD.fit() and add the targeted regression called out in TODO.md:L148.

Security

No findings.

Documentation/Tests

No findings in the diff. The docs and registry updates align with the shipped behavior, and the test additions target the right edge cases. I could not execute the suite locally because this review environment is missing pytest and numpy, so this review is static-only.

igerber · 2026-05-21T18:36:32Z

/ai-review

github-actions · 2026-05-21T18:45:47Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3f075ba34a70101f20ee99a355659f83488c4019

Overall Assessment

⚠️ Needs changes — one unmitigated P1 finding.

Executive Summary

[Newly identified] The Wave E.3 survey path now defines the effective estimation sample as survey_finite_mask, but treated-support validation still runs on full-domain D_it, so a SurveyDesign.subpopulation() that zeros out every treated row can pass the front door and only fail later via generic rank-deficiency handling.
The core Wave E.3 methodology changes themselves look aligned with the registry: gamma_hat/Psi are built on survey_finite_mask, score_pad_mask zero-pads at the meat boundary, and full-design PSU/strata bookkeeping is retained.
I did not find new inline-inference or partial-NaN regressions in the modified estimator code.
The remaining follow-ups in this diff are properly tracked in TODO.md and are non-blocking.
Static-only review: this environment is missing pytest and pandas, so I could not execute the new tests.

Path to Approval

Add an active-sample treatment-support check after survey_finite_mask is computed in SpilloverDiD.fit; raise ValueError when the weighted subpopulation has zero treated observations (D_it[survey_finite_mask].sum() == 0).
Add a regression in tests/test_spillover.py where SurveyDesign.subpopulation() excludes all treated rows and assert the new front-door error.

Methodology

Severity P1 [Newly identified]. Impact: Wave E.3’s registry note defines the survey estimating sample as finite_mask & (survey_weights > 0), but SpilloverDiD.fit() still checks treatment support on full-domain D_it.sum() before that mask is applied. A subpopulation can therefore remove every treated observation from the actual estimating sample, yet the fit still passes the identification guard and falls through to the generic rank-deficient OLS path instead of failing with a clear assumption error. Concrete fix: validate treated support on survey_finite_mask (or the later count_mask) once the survey-domain mask is known, and add a regression for the all-treated-removed subpopulation case. References: docs/methodology/REGISTRY.md:L3332-L3335, diff_diff/spillover.py:L2550-L2566, diff_diff/spillover.py:L2727-L2741, diff_diff/spillover.py:L3512-L3534, diff_diff/linalg.py:L1011-L1048.
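
The requested guard can be sketched as a standalone function. This is a hypothetical illustration of the check described above, not the code that ships in SpilloverDiD.fit(); the function name and error wording are assumptions, and only D_it and survey_finite_mask are names from the review.

```python
import numpy as np

def check_active_treated_support(D_it, survey_finite_mask):
    """Raise if the active survey sample contains no treated observations (hypothetical)."""
    D_it = np.asarray(D_it)
    mask = np.asarray(survey_finite_mask, dtype=bool)
    if D_it[mask].sum() == 0:
        raise ValueError(
            "No treated observations remain in the active survey sample; "
            "expand the subpopulation mask or verify the weight column."
        )

# Full-domain treatment indicator: rows 0 and 2 are treated.
D_it = np.array([1, 0, 1, 0])

# This mask keeps treated row 0, so the check passes silently.
check_active_treated_support(D_it, np.array([True, True, False, True]))
```

Checking D_it.sum() on the full domain would pass for any mask here, which is the gap the finding describes; the guard has to run on the masked indicator.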

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3. Impact: TwoStageDiD still lacks the analogous Wave E.3 full-design-retention path, and the metadata-resolution helper remains duplicated, but both items are explicitly tracked and are non-blocking for this PR. Concrete fix: none in this PR; keep the follow-ups in TODO.md. References: TODO.md:L148-L149.

Security

No findings.

Documentation/Tests

No additional findings. The Wave E.3 test additions are broad, but they do not cover the no-treated-active-subpopulation case above. Static-only review: pytest and pandas are unavailable in this environment, so I could not run the suite. References: tests/test_spillover.py:L7526-L8680.

…ubpop

Issue: the front-door treatment-support gate at spillover.py:2556 runs on full-domain D_it.sum() BEFORE survey_finite_mask is computed. A SurveyDesign.subpopulation() mask that zeros out every treated row passes that gate (because full-domain D_it.sum() > 0) but lands on a rank-deficient stage-2 OLS solve downstream, surfacing as a generic rank-deficiency warning instead of a clean assumption-violation error.

Fix at spillover.py:~2745 (immediately after survey_finite_mask is built): raise ValueError when resolved_survey is not None AND D_it[survey_finite_mask].sum() == 0. The error message names the survey_finite_mask construction + Wave E.3 contract + suggests remediation (expand the mask or verify the weight column). This matches the documented R svyrecvar(subset()) convention: domain estimation requires the domain to contain identifying variation.

Added test_q4_subpop_excludes_all_treated_raises: subpopulation that excludes all ever-treated units raises ValueError matching "removes EVERY treated observation". Pre-fix would have fallen through to rank-deficient OLS.

All 317 spillover tests pass + lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-21T18:59:08Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d2d660e2e9f7c17c4eb30b55c7fc0f9d6f02e569

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review outcome: the prior P1 is addressed. SpilloverDiD.fit() now checks treated support on the active survey sample (survey_finite_mask) and raises before falling through to generic rank-deficiency handling when a subpopulation removes all treated rows. diff_diff/spillover.py:L2727-L2766, tests/test_spillover.py:L8352-L8390
Methodology affected: SpilloverDiD’s survey-design analytical variance path (Wave E.1 Binder TSL + Wave D Gardner GMM + Wave E.2/E.2-follow-up survey Conley) plus the shared _compute_gmm_corrected_meat() helper. The Wave E.3 zero-pad implementation matches the newly documented REGISTRY contract and the cited Lumley §2.5 domain-estimation convention. diff_diff/spillover.py:L3106-L3358, diff_diff/two_stage.py:L211-L355, docs/methodology/REGISTRY.md:L3328-L3352
I did not find new inline-inference or partial-NaN regressions in the modified estimator path; changed inference still routes through safe_inference(). diff_diff/spillover.py:L783-L840, diff_diff/spillover.py:L3484-L3503
Remaining follow-up work is properly tracked in TODO.md and is non-blocking for this PR. TODO.md:L148-L149
Static-only review: this environment lacks numpy, pandas, scipy, and pytest, so I could not execute the new tests.

Methodology

No findings. The affected method is the Wave E.3 SpilloverDiD survey-variance/full-design-retention path, and the code now implements the documented “build on survey_finite_mask, then zero-pad at the meat boundary” contract without an undocumented deviation from the REGISTRY entry. diff_diff/spillover.py:L2727-L2766, diff_diff/spillover.py:L3106-L3358, diff_diff/two_stage.py:L330-L478, docs/methodology/REGISTRY.md:L3328-L3352

Code Quality

No findings. The modified paths continue to centralize inference through safe_inference() rather than adding new inline t-stat / p-value / CI computation. diff_diff/spillover.py:L783-L840, diff_diff/spillover.py:L3484-L3503

Performance

No findings from static review. The new zero-pad step is localized and linear in the number of rows. diff_diff/two_stage.py:L330-L355

Maintainability

No findings. score_pad_mask is optional and backward-compatible, so the helper change is contained to the SpilloverDiD caller in this PR. diff_diff/two_stage.py:L62-L80, diff_diff/spillover.py:L3340-L3358

Tech Debt

Severity P3. Impact: TwoStageDiD still lacks the analogous Wave E.3 full-design-retention path, so survey-subpopulation behavior remains intentionally divergent across estimators. Concrete fix: leave this as the already-tracked parity follow-up in TODO.md. TODO.md:L148
Severity P3. Impact: compute_survey_metadata(...) assembly remains duplicated in SpilloverDiD after the Wave E.3 edits, which raises drift risk but does not change behavior today. Concrete fix: follow the tracked helper-extraction item in TODO.md. TODO.md:L149

Security

No findings.

Documentation/Tests

No findings. The docs/registry updates are consistent with the shipped behavior, and the re-review target is covered by a dedicated regression for the “subpopulation excludes all treated rows” case. docs/methodology/REGISTRY.md:L3328-L3352, docs/api/spillover.rst:L412-L483, tests/test_spillover.py:L8352-L8390
Static-only limitation: I could not run the suite in this environment because numpy, pandas, scipy, and pytest are unavailable.

…n survey design

Mechanical transfer of the Wave E.3 SpilloverDiD invariant (PR igerber#482, merge 24de906) to TwoStageDiD. When the always-treated handler drops units from the OLS sample, the resolved survey design retains its FULL-DOMAIN n_psu / n_strata / df_survey / strata / fpc / psu arrays instead of being subsetted via replace(...). Per-cluster scores aggregate at fit-length then zero-pad onto the full-domain unique-PSU list via two new optional kwargs on _compute_gmm_variance: score_pad_mask and cluster_ids_full. PSUs containing only always-treated rows get zero score rows but still count toward G_full for n_psu / df_survey accounting.

Documented synthesis (library-convention adoption): adopts the canonical R survey::svyrecvar(subset()) convention (Lumley 2010 §2.5), already established at imputation.py:2175-2183 (PreTrendsImputation), prep.py:1401-1432 (DCDH cell variance), and spillover.py (PR igerber#482).

Implementation:
- diff_diff/two_stage.py: delete L1485-1525 design-subset block; promote keep_mask to fit()-level scope (always defined; defaults all-True); cluster injection sources cluster_ids_raw from FULL-DOMAIN data[cluster] (not post-drop df[cluster]) so _inject_cluster_as_psu's zip against resolved_survey.strata stays length-aligned; df["_survey_cluster"] aligned to post-drop length via resolved_survey.psu[keep_mask.values]; _compute_gmm_variance + 3 inner _stage2_* methods gain score_pad_mask / cluster_ids_full kwargs; zero-pad expansion after per-cluster aggregation; strata/fpc obs_idx lookups use cluster_ids_full under padding; G < 2 unidentified gate fires on G_full when padding active.
- diff_diff/two_stage.py: _refit_ts callback subsets each replicate weight w_r via keep_mask.values before threading into _fit_untreated_model and _stage2_*, matching the keep_mask-subsetting applied to survey_weights in the main fit (otherwise solve_ols rejects the length mismatch and compute_replicate_refit_variance swallows the ValueError so replicate inference NaNs out).
- Always-treated warning text updated to reflect the new contract: weights are subsetted for OLS, design retained for variance.
- Replicate variance + always-treated: existing path now works correctly (score_pad_mask_arg stays None on _uses_replicate_ts paths; per-replicate refit handles resampling separately).

Tests (tests/test_two_stage.py):
- New TestTwoStageDiDWaveE3ParityAlwaysTreated class with 8 tests: (a) no-always-treated baseline, (b) full-domain df_survey preservation under drop, (c) full-domain n_psu reporting, (d) per-cluster zero-pad mock-spy on _compute_stratified_meat_from_psu_scores, (e) subpopulation + always-treated composition, (f) cluster-as-PSU + always-treated, (g) no-survey path unchanged, (h) PSU entirely-always-treated.

Tests (tests/test_replicate_weight_expansion.py):
- Strengthened test_two_stage_always_treated to assert finite overall_se / overall_p_value / overall_conf_int (was only asserting finite ATT, missing the replicate-SE regression class).
- New test_two_stage_always_treated_event_study_and_group_replicate exercising the event-study + group replicate refit branches end-to-end under always-treated drop with aggregate="all"; asserts finite se + p_value + conf_int on non-reference horizons and cohorts.

Docs:
- docs/methodology/REGISTRY.md: TwoStageDiD section gains "documented synthesis — Wave E.3 parity" note; SpilloverDiD Wave E.3 follow-up note updated to mark TwoStageDiD parity as shipped.
- CHANGELOG.md: Unreleased Added entry leading with documented-synthesis framing.
- TODO.md: drop parity follow-up row.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
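
The "aggregate at fit-length, then zero-pad onto the full-domain PSU list" step from this commit message can be sketched as follows. This is a minimal illustration under assumptions, not the _compute_gmm_variance implementation: `pad_cluster_scores` is a hypothetical helper, and the toy data assume one PSU (20) consists entirely of dropped always-treated rows.

```python
import numpy as np

def pad_cluster_scores(psi_fit, cluster_ids_fit, cluster_ids_full):
    """Sum fit-sample scores per PSU, placed on the full-domain PSU list (hypothetical)."""
    full_psus = np.unique(cluster_ids_full)                 # sorted full-domain PSU labels
    row_of = np.searchsorted(full_psus, cluster_ids_fit)    # target row for each fit row
    padded = np.zeros((len(full_psus), psi_fit.shape[1]))
    np.add.at(padded, row_of, psi_fit)                      # unbuffered per-PSU accumulation
    return full_psus, padded

cluster_ids_full = np.array([10, 10, 20, 20, 30, 30])
keep_mask = np.array([True, True, False, False, True, True])  # PSU 20 is dropped from OLS

# Stand-in for score rows computed on the kept (fit-length) sample only.
psi_fit = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 2.0]])

full_psus, psu_scores = pad_cluster_scores(
    psi_fit, cluster_ids_full[keep_mask], cluster_ids_full
)
G_full = len(full_psus)  # the dropped PSU still counts toward the PSU total
```

PSU 20 ends up with an all-zero score row, so it adds nothing to the score totals yet still enters the PSU count used for n_psu and df_survey accounting.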

igerber added the ready-for-ci label (Triggers CI test workflows) May 21, 2026

igerber merged commit fb1c6c9 into main May 21, 2026
33 of 34 checks passed

igerber deleted the spillover-conley-wave-e3-subpopulation branch May 21, 2026 21:32

igerber mentioned this pull request May 22, 2026

TwoStageDiD Wave E.3 parity: always-treated full-design retention #485

Merged

This was referenced May 25, 2026

T23 SpilloverDiD tutorial: TVA-style ~40% understatement walkthrough #493

Merged

ConleySpatialHAC methodology-review-tracker promotion + Bertanha-Imbens 2014 citation correction #496

Merged

igerber merged 2 commits into main from spillover-conley-wave-e3-subpopulation
