SpilloverDiD Wave E.3: SurveyDesign.subpopulation() full-design retention#482
Merged
Conversation
…tion
Closes the user-facing limitation documented at REGISTRY.md:3249 since
Wave E.1: SurveyDesign.subpopulation()-derived designs and warn-and-drop
fits now preserve the full-domain resolved survey design. n_psu /
n_strata / df_survey / Binder TSL per-stratum centering reflect the
FULL domain rather than the post-finite_mask fit sample.
Documented synthesis (library-convention adoption, not new methodology):
adopts the R survey::svyrecvar(subset()) zero-pad convention (Lumley
2010 §2.5) already established in imputation.py:2175-2183 and
prep.py:1401-1432. Propagates the same convention to SpilloverDiD's
Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up
stratified-Conley + serial Bartlett meat.
Mechanical realization (one new _compute_gmm_corrected_meat kwarg):
the gamma_hat / Psi build stays on the SURVEY-FINITE-MASK subset of
fit-sample inputs (= finite_mask & survey_weights > 0) so the
drop-first stage-1 FE column space is INVARIANT to zero-weight subpop
rows (critical because _build_butts_fe_design_csr re-factorizes via
pd.factorize and drops the first unit/time code; including zero-weight
rows would silently shift gamma_hat). _compute_gmm_corrected_meat
gains a new optional kwarg score_pad_mask: when supplied, the helper
zero-pads the survey-finite-mask Psi to full panel length AFTER
construction but BEFORE kernel dispatch via Psi_padded[score_pad_mask]
= Psi. Kernel-dispatch arrays (cluster_ids, conley_coords, conley_time,
conley_unit, resolved_survey) are passed at FULL length so meat
helpers see the full-domain PSU / strata / centroid / time geometry.
The stage-2 OLS solve still runs on X_2_kept / y_tilde_fit (active
sample); only the meat-helper boundary sees full-length arrays.
count_mask invariant: top-level res.n_obs / res.n_treated /
res.n_control / res.n_far_away_obs AND event-study
event_study_meta["n_obs_per_col"] / att_dynamic.n_obs /
event_study_effects[k]["n_obs"] / spillover_effects.n_obs all use
count_mask (= survey_finite_mask on the survey path) so the reported
counts match the effective weighted sample.
Cross-surface n_psu consistency: top-level res.n_psu reads from
len(resolved_survey_fit.weights) on the implicit-PSU branch (was
int(finite_mask.sum())), so res.n_psu == res.survey_metadata.n_psu
on weights-only / strata-only survey designs under warn-and-drop.
A2 invariant (locked in _scratch/wave_e3_smoke.py): warn-and-drop and
SurveyDesign.subpopulation() drops apply the same zero-pad mechanism —
both produce identical meat output for identical row-level exclusions.
Restrictions inherited: replicate-weight variance + subpopulation
continues to raise NotImplementedError at the Wave E.1 gate.
TwoStageDiD's analogous finite_mask + design-subset pattern at
two_stage.py:567-601 is NOT yet adopted to Wave E.3 — separate parity
follow-up tracked in TODO.md.
Tests: 19 new tests in TestSpilloverDiDWaveE3SubpopulationFullDesign
(+EventStudy mirror): pre-E.3 baseline parity via pinned goldens,
n_psu cross-surface consistency on implicit-PSU branch, zero-pad
mechanics via mock-spy, cluster-as-PSU + subpop parity, conley +
lag>0 + subpop × {explicit PSU / cluster injection / weights-only
NotImplementedError}, unit with BOTH zero weight AND no Omega_0
support, gamma_hat-build sample excludes zero-weight rows (R6 P1
regression), n_obs metadata excludes zero-weight rows, n_far_away_obs
excludes zero-weight rows, warn-drop SE drift golden, ATT bit-equality
under PSU-last-sort exclusion, exact event-study n_obs propagation
across att_dynamic / event_study_effects / spillover_effects under
treated-PSU exclusion, event-study on both is_staggered branches with
analytical + conley+lag variants. Pre-existing Wave E.1
test_p2_finite_mask_forces_drop_under_survey assertion flipped from
n_psu=8 (subset) to n_psu=10 (full domain) to reflect the new contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Owner
Author
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Path to Approval
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ubpop Issue: the front-door treatment-support gate at spillover.py:2556 runs on full-domain D_it.sum() BEFORE survey_finite_mask is computed. A SurveyDesign.subpopulation() mask that zeros out every treated row passes that gate (because full-domain D_it.sum() > 0) but lands on a rank-deficient stage-2 OLS solve downstream, surfacing as a generic rank-deficiency warning instead of a clean assumption-violation error. Fix at spillover.py:~2745 (immediately after survey_finite_mask is built): raise ValueError when resolved_survey is not None AND D_it[survey_finite_mask].sum() == 0. The error message names the survey_finite_mask construction + Wave E.3 contract + suggests remediation (expand the mask or verify the weight column). This matches the documented R svyrecvar(subset()) convention — domain estimation requires the domain to contain identifying variation. Added test_q4_subpop_excludes_all_treated_raises: subpopulation that excludes all ever-treated units raises ValueError matching "removes EVERY treated observation". Pre-fix would have fallen through to rank-deficient OLS. All 317 spillover tests pass + lint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
HanomicsIMF
pushed a commit
to HanomicsIMF/diff-diff
that referenced
this pull request
May 22, 2026
…n survey design Mechanical transfer of the Wave E.3 SpilloverDiD invariant (PR igerber#482, merge 24de906) to TwoStageDiD. When the always-treated handler drops units from the OLS sample, the resolved survey design retains its FULL-DOMAIN n_psu / n_strata / df_survey / strata / fpc / psu arrays instead of being subsetted via replace(...). Per-cluster scores aggregate at fit-length then zero-pad onto the full-domain unique-PSU list via two new optional kwargs on _compute_gmm_variance: score_pad_mask and cluster_ids_full. PSUs containing only always-treated rows get zero score rows but still count toward G_full for n_psu / df_survey accounting. Documented synthesis (library-convention adoption): adopts the canonical R survey::svyrecvar(subset()) convention (Lumley 2010 §2.5), already established at imputation.py:2175-2183 (PreTrendsImputation), prep.py:1401-1432 (DCDH cell variance), and spillover.py (PR igerber#482). Implementation: - diff_diff/two_stage.py: delete L1485-1525 design-subset block; promote keep_mask to fit()-level scope (always defined; defaults all-True); cluster injection sources cluster_ids_raw from FULL-DOMAIN data[cluster] (not post-drop df[cluster]) so _inject_cluster_as_psu's zip against resolved_survey.strata stays length-aligned; df["_survey_cluster"] aligned to post-drop length via resolved_survey.psu[keep_mask.values]; _compute_gmm_variance + 3 inner _stage2_* methods gain score_pad_mask / cluster_ids_full kwargs; zero-pad expansion after per-cluster aggregation; strata/fpc obs_idx lookups use cluster_ids_full under padding; G < 2 unidentified gate fires on G_full when padding active. - diff_diff/two_stage.py: _refit_ts callback subsets each replicate weight w_r via keep_mask.values before threading into _fit_untreated_model and _stage2_*, matching the keep_mask-subsetting applied to survey_weights in the main fit (otherwise solve_ols rejects the length mismatch and compute_replicate_refit_variance swallows the ValueError so replicate inference NaNs out). - Always-treated warning text updated to reflect the new contract: weights are subsetted for OLS, design retained for variance. - Replicate variance + always-treated: existing path now works correctly (score_pad_mask_arg stays None on _uses_replicate_ts paths; per-replicate refit handles resampling separately). Tests (tests/test_two_stage.py): - New TestTwoStageDiDWaveE3ParityAlwaysTreated class with 8 tests: (a) no-always-treated baseline, (b) full-domain df_survey preservation under drop, (c) full-domain n_psu reporting, (d) per-cluster zero-pad mock-spy on _compute_stratified_meat_from_psu_scores, (e) subpopulation + always-treated composition, (f) cluster-as-PSU + always-treated, (g) no-survey path unchanged, (h) PSU entirely-always-treated. Tests (tests/test_replicate_weight_expansion.py): - Strengthened test_two_stage_always_treated to assert finite overall_se / overall_p_value / overall_conf_int (was only asserting finite ATT, missing the replicate-SE regression class). - New test_two_stage_always_treated_event_study_and_group_replicate exercising the event-study + group replicate refit branches end-to-end under always-treated drop with aggregate="all"; asserts finite se + p_value + conf_int on non-reference horizons and cohorts. Docs: - docs/methodology/REGISTRY.md: TwoStageDiD section gains "documented synthesis — Wave E.3 parity" note; SpilloverDiD Wave E.3 follow-up note updated to mark TwoStageDiD parity as shipped. - CHANGELOG.md: Unreleased Added entry leading with documented-synthesis framing. - TODO.md: drop parity follow-up row. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced May 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SurveyDesign.subpopulation()-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design (n_psu/n_strata/df_survey/ Binder TSL per-stratum centering all reflect the FULL domain rather than the post-finite_maskfit sample).survey::svyrecvar(subset())zero-pad convention (Lumley 2010 §2.5) already established inimputation.py:2175-2183andprep.py:1401-1432. Propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat.score_pad_maskkwarg on_compute_gmm_corrected_meat: gamma_hat/Psi built onsurvey_finite_mask = finite_mask & survey_weights > 0(the R6 fix preserves FE drop-first column-space invariance against zero-weight subpop rows that pd.factorize compaction would otherwise reorder); Psi zero-padded back to full panel length inside the helper before kernel dispatch; kernel-dispatch arrays (cluster_ids, conley_coords, conley_time, conley_unit, resolved_survey) passed at FULL length so meat helpers see full-domain PSU/strata/centroid/time geometry.res.n_obs/res.n_treated/res.n_control/res.n_far_away_obs+ event-study per-cell n_obs all reflectsurvey_finite_maskon the survey path (matches the effective weighted estimation sample).Methodology references (required if estimator / math changes)
diff_diff/imputation.py:2175-2183(PreTrendsImputation lead regression),diff_diff/prep.py:1401-1432(DCDH cell variance)svyrecvar(subset())semantics.Validation
tests/test_spillover.py— newTestSpilloverDiDWaveE3SubpopulationFullDesign(14 tests) +TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy(5 tests). 316 spillover regression tests pass on bothDIFF_DIFF_BACKEND=pythonandDIFF_DIFF_BACKEND=rust. Pre-existing Wave E.1test_p2_finite_mask_forces_drop_under_surveyassertion flipped fromn_psu=8(subset) ton_psu=10(full domain) to reflect the new contract.Security / privacy
🤖 Generated with Claude Code