igerber · igerber · May 21, 2026 · May 21, 2026 · May 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/README.md b/README.md
@@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
 - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
 - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`); `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores (Wave E.3, R `svyrecvar(subset())` form)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves

diff --git a/TODO.md b/TODO.md
@@ -145,7 +145,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
 | `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low |
-| `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium |
+| `TwoStageDiD(survey_design=SurveyDesign.subpopulation(...))` full-design retention (Wave E.3 parity follow-up). SpilloverDiD shipped Wave E.3's zero-pad convention (R `survey::svyrecvar(subset())` form) but TwoStageDiD's analogous `finite_mask + design-subset + recompute n_psu/n_strata` pattern at `two_stage.py:567-601` was NOT updated in the same PR (scope discipline). Fix template: apply the same Wave E.3 surgical edits to TwoStageDiD (delete design-subset block, zero-pad Psi inputs at the meat call site, gate fit-length sparse FE behind `if resolved_survey is None:`). Parity follow-up should ALSO add a targeted regression on a fixture that actually exercises the TwoStageDiD finite-mask subset path under subpopulation (e.g. a DGP where TwoStageDiD's always-treated handling at `two_stage.py:294-336` triggers warn-and-drop AND `SurveyDesign.subpopulation()` is used) — Wave E.3 attempted such a test on a SpilloverDiD warn-and-drop fixture but found the two estimators have different warn-and-drop semantics so the divergence didn't materialize, so the parity-anchor test was deferred to this follow-up. | `two_stage.py::TwoStageDiD.fit` | follow-up (Wave E.3 parity) | Low |
+| `compute_survey_metadata(resolved_survey, raw_w_for_meta)` helper extraction. Wave E.1/E.3 contain two near-duplicate `raw_w_for_meta` constructions (upstream + post-cluster-injection) that differ only in which point of the resolution pipeline they fire at. Factor out a shared helper that takes `(survey_design, data, [finite_mask])` and returns `(resolved_survey, raw_w_for_meta)` to reduce drift risk between the two paths. Cosmetic; behaviour unchanged. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | Serial Bartlett kernel logic duplicated between `diff_diff/two_stage.py::_compute_stratified_serial_bartlett_meat` (survey path) and `diff_diff/conley.py::_compute_conley_meat` panel-block branch (no-survey path). Both compute `K[t,s] = (1 - |t-s|/(L+1)) * 1{|t-s| <= L, t != s}` over dense panel-period codes. Factor out a shared `_serial_bartlett_kernel_matrix(t_codes, L)` helper and a shared post-meat finite + PSD-warning guard so the survey and no-survey paths can't drift on diagnostics or kernel weights. Cosmetic; refactor doesn't change behavior. | `two_stage.py::_compute_stratified_serial_bartlett_meat`, `conley.py::_compute_conley_meat` | follow-up | Low |
 | `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` no-effective-PSU serial Bartlett HAC. Wave E.2 follow-up ships the panel-block composition when an effective PSU exists (explicit `survey_design.psu` OR injected via `cluster=<col>` per `_inject_cluster_as_psu`). Weights-only / strata-only survey designs WITHOUT a cluster fallback raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Fix would either derive a unit-level serial fallback for no-PSU designs (mixes IF allocators with the pseudo-PSU spatial term — needs methodology work) or route the serial loop through `conley_unit` with explicit documentation of the IF-allocator asymmetry. Regression goldens vs the effective-PSU shipped path. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_serial_bartlett_meat` | follow-up (Wave E.2 follow-up tail) | Low |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |

diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
@@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
 - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`), both composed with the Wave D Gardner GMM correction; `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores at the meat-helper boundary (Wave E.3, R `svyrecvar(subset())` form) (replicate weights queued as follow-up)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves

diff --git a/diff_diff/results.py b/diff_diff/results.py
@@ -388,7 +388,16 @@ class SpilloverDiDResults(DiDResults):
     n_far_away_obs : int
         Number of observations with ``D_it = 0`` AND ``d_it > d_bar``;
         these observations identify the counterfactual trend (Butts
-        Assumption 5(ii)).
+        Assumption 5(ii)). Under Wave E.3, on the survey path this count
+        is reported on the EFFECTIVE WEIGHTED ESTIMATION SAMPLE
+        (``count_mask = survey_finite_mask`` = finite_mask AND
+        ``survey_weights > 0``), so zero-weight rows from
+        ``SurveyDesign.subpopulation()`` are excluded — matches the
+        Wave E.3 ``n_obs`` / ``n_treated`` / ``n_control`` contract.
+        The upstream ``_validate_far_away_exists`` gate still fires on
+        the FULL DOMAIN at fit-time (Assumption 5(ii) requires at
+        least one far-away identifying row); this metadata only
+        reports the count on the active sample.
     is_staggered : bool
         True if multiple distinct treatment-onset times were detected.
     event_study : bool