diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3a851f73..66f4405d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,9 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
+- **`SpilloverDiD(vcov_type="conley", survey_design=...)` integration via stratified-Conley sandwich on PSU totals (Wave E.2).** Lifts the Wave E.1 `NotImplementedError` (`spillover.py:2201` upfront, `two_stage.py:217` helper-level) and adds spatial-HAC + design-based variance for the previously deferred composition. **Documented synthesis** of Conley (1999) spatial-HAC × Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) × Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three ingredients on a two-stage influence function. **Mechanical composition (panel-aware):** preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only — exclude cross-period spatial pairs") by looping over periods. For each period `t`, SpilloverDiD's per-obs Hájek-weighted Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` (via `np.add.at`); per-PSU spatial centroids are panel-constant (mean of per-observation `conley_coords` within each PSU, vectorized `np.add.at` sums / `np.bincount` counts); for each stratum the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / conley_cutoff_km) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it; the survey helper accepts `"uniform"` too but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop matches the library's panel Conley contract exactly. **Reduction semantics (load-bearing for tests):** the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals (pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`); single stratum (H = 1, FPC = inf) reduces to `sum_t` plain Conley sandwich on per-period PSU totals (NOT on time-collapsed totals). **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing `_compute_stratified_meat_from_psu_scores` 3-tuple `(meat, variance_computed, legitimate_zero_count)` contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(scores=centered, coords=psu_coords_h, ...)` in cross-sectional mode); new dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to existing `_compute_binder_tsl_meat`, performs per-obs Psi → PSU aggregation + centroid derivation + dispatch to survey helper, intentionally drops `cluster_ids` at the dispatch boundary — see Restrictions). `_compute_gmm_corrected_meat` conley branch extended with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D. **Singleton-stratum `lonely_psu="adjust"` parity:** the survey helper mirrors the Binder helper's `continue` to skip the FPC scale on singleton strata (with `n_h = 1` the scale `n_h / (n_h - 1)` would divide by zero); the degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust output. **Saturated `df_survey = 0` NaN-fail:** mirrors Wave E.1 (`_compute_stratified_conley_meat` returns NaN meat with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..." so callers can `pytest.warns(UserWarning, match="Wave E.2 stratified-Conley")`). **Public surface restrictions:** replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope); `cluster=
+ survey_design.psu + vcov_type="conley"` coerces `cluster=` to PSU per Wave E.1's warn-and-use-PSU pattern (the Conley cluster product kernel becomes a no-op after PSU aggregation, so `cluster_ids` is intentionally not threaded into the inner Conley kernel call — every PSU is its own cluster post-aggregation, which would zero all cross-PSU pairs); LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E); DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` requires `_APPLICABILITY` / `_PT_METHOD` registration (separate Wave F PR). **Tests:** new `TestSpilloverDiDWaveE2ConleySurveyDesign` and `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` classes in `tests/test_spillover.py` (bit-identical no-survey fallback; panel-aware per-period sum invariant on the orchestrator + helper composition; hand-computation methodology anchor; single-stratum ≡ plain Conley on PSU totals; cross-stratum independence as a unit test on the survey helper with interleaved cross-stratum centroids; Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU sensitivity across three modes; FPC large ≡ no-FPC and FPC = n_h zeros stratum; saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`; replicate-weight + non-pweight rejections; cluster warn-and-use-PSU; fit idempotency; `finite_mask` survey-array subsetting; no-PSU coverage — weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts; event-study path on both `is_staggered=True`/`False` branches per `feedback_cohort_loop_trigger_cache_both_branches`; drift goldens at `rtol=1e-12 / atol=1e-14`). The pre-existing `tests/test_spillover.py::test_fit_conley_plus_survey_design_not_implemented` Wave E.1-era gate-assertion test is removed (replaced by the positive-path tests above). Wave E.1 entry's "Public surface restrictions" bullet updated to past-tense the conley+survey gate reference.
- **SunAbraham `vcov_type` parameter (Phase 1b PR 1/8).** `SunAbraham(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` (defaults to `"hc1"`, which preserves prior behavior bit-equally - SA historically hard-coded HC1). Auto-cluster-at-unit dropped when the user opts into explicit `vcov_type="hc2"` or `vcov_type="classical"` (one-way only); preserved for `"hc1"` and `"hc2_bm"`. When `vcov_type in {"classical","hc2","hc2_bm"}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (mirrors TWFE Gate 1 from PR #469): FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection. Empirically matches R `lm()` summary classical SE, `sandwich::vcovHC(type="HC2")`, and `clubSandwich::vcovCR(..., type="CR2")` + `coef_test()$df_Satt` at atol=1e-10 (cohort SE and BM DOF pinned in `tests/test_methodology_sun_abraham.py`). For `vcov_type="hc2_bm"`, the user-facing aggregated inference (`event_study_effects[e]['p_value']`/`['conf_int']`, `overall_p_value`/`overall_conf_int`) uses CR2 Bell-McCaffrey contrast DOF — matches `clubSandwich::Wald_test(test="HTZ")$df_denom` at atol=1e-10 (mirrors PR #465's `_compute_cr2_bm_contrast_dof` pattern for MultiPeriodDiD's post-period-average ATT). `vcov_type` is now propagated to `SunAbrahamResults.vcov_type` for downstream introspection. `SurveyDesign` (any kind — analytical weights, stratified, PSU, or replicate-weight) combined with `vcov_type in {"classical","hc2","hc2_bm"}` raises `NotImplementedError`: the survey-design TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs. `conley` rejected at `__init__` with a deferral message (would require threading 6+ `conley_*` params through the saturated regression call). **Deviation from R:** SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) on typical panel sizes due to a different `(n-k)` finite-sample correction (fixest counts absorbed FE in k_total; SA's `solve_ols` counts only within-transformed columns); the IW aggregation step is otherwise identical (pinned at atol=5e-3, tracked in TODO.md). First PR of the Phase 1b standalone-estimator threading initiative (7 PRs to follow: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
- **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
-- **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). **Cluster + survey resolution:** when `cluster=` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). **Public surface restrictions:** `vcov_type="conley" + survey_design=` raises `NotImplementedError` pointing at planned Wave E.2 (Conley × survey product-kernel synthesis with within-stratum Conley sandwich on PSU totals); replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. **Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection.
+- **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). **Cluster + survey resolution:** when `cluster=` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). **Public surface restrictions:** `vcov_type="conley" + survey_design=` originally raised `NotImplementedError` pointing at planned Wave E.2; lifted in the Wave E.2 entry above (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. **Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection.
## [3.4.0] - 2026-05-19
diff --git a/README.md b/README.md
index 761d7046..a9b7e238 100644
--- a/README.md
+++ b/README.md
@@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1/CR1
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up)
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
diff --git a/TODO.md b/TODO.md
index 9098ef1e..361ca9a2 100644
--- a/TODO.md
+++ b/TODO.md
@@ -134,9 +134,9 @@ Deferred items from PR reviews that were not addressed before merge.
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
| `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
-| `SpilloverDiD(vcov_type="conley", survey_design=...)` composition (Wave E.2). Wave E.1 ships HC1 / CR1 + survey via Binder TSL (Gerber 2026 Prop 1 + Wave D GMM). Wave E.2 needs the novel within-stratum Conley sandwich on PSU totals — no reference software combines Conley spatial-HAC with Binder TSL on a two-stage IF. Methodologically: aggregate Psi to PSU totals first, demean within stratum, then apply within-stratum Conley sandwich. Strata become an exact independence partition (no kernel weight crosses stratum boundaries by sampling design). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_corrected_meat` | follow-up (Wave E.2) | Medium |
| `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low |
| `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium |
+| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` serial Bartlett HAC composition. Wave E.2 ships the panel-aware `conley_lag_cutoff = 0` case ("within-period spatial only" — `sum_t sum_h M_h_t` per `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`) and raises `NotImplementedError` upfront at `spillover.py:fit` on `conley_lag_cutoff > 0`. The serial Bartlett component (within-unit / within-PSU temporal HAC at lag ≤ L) needs to compose with the panel-aware stratified-Conley spatial sandwich — the natural addition is `meat_serial = sum_g sum_{|t-s|<=L, t!=s} (1 - |t-s|/(L+1)) * (S_psu_t[g] - S_bar_h_t)(S_psu_s[g] - S_bar_h_s)'` per PSU, summed across all PSUs in each stratum, with appropriate Binder FPC scaling — plus a methodology call on whether to include cross-period spatial pairs in the serial term. Regression goldens vs the cross-sectional limit (lag=0, which is now the shipped path). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_conley_meat` | follow-up (Wave E.2 follow-up) | Medium |
| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index c0dc4936..6ae688fe 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL composed with Wave D Gardner GMM correction (Conley × survey + replicate weights queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
index fe0466a3..a1f0db11 100644
--- a/diff_diff/spillover.py
+++ b/diff_diff/spillover.py
@@ -2190,22 +2190,43 @@ def fit(
heteroskedasticity-robust SE with the GMM correction.
"""
# Wave E.1: lift the Wave B/C/D upfront survey_design rejection.
- # The full resolution block (pweight gate, replicate gate, unit-constant
+ # Wave E.2 (this PR): conley × survey is now supported via a
+ # stratified-Conley sandwich on PSU totals (composition of Conley
+ # 1999 + Gerber 2026 Prop 1 Binder TSL + Wave D Gardner GMM). The
+ # full resolution block (pweight gate, replicate gate, unit-constant
# check, cluster-vs-PSU warn) runs AFTER `_validate_spillover_inputs`
# below so it sees the panel columns the validator guarantees.
#
- # The conley × survey composition is genuinely novel methodology
- # (no reference software combines spatial-HAC + Binder TSL on a
- # two-stage IF) and ships separately as Wave E.2. Reject upfront so
- # users get the pointer without waiting through stage 1 / 2 work.
- if survey_design is not None and self.vcov_type == "conley":
+ # Wave E.2 scope-limit (upfront, before resolution / panel work):
+ # the panel-block Conley HAC (`conley_lag_cutoff > 0`) is NOT
+ # composed with the survey path in this PR. The stratified-Conley
+ # helper applies a cross-sectional kernel on PSU-aggregated totals;
+ # composing the within-unit serial Bartlett HAC with the within-
+ # stratum cross-PSU spatial kernel requires carrying PSU-by-time
+ # scores into the meat construction, which is a separate Wave E.x
+ # follow-up tracked in TODO.md. Reject upfront with a clear pointer
+ # so users running `survey_design=` + `conley_lag_cutoff > 0` get
+ # the error before stage-1 / 2 work (per `feedback_no_silent_failures`).
+ if (
+ survey_design is not None
+ and self.vcov_type == "conley"
+ and self.conley_lag_cutoff is not None
+ and self.conley_lag_cutoff > 0
+ ):
raise NotImplementedError(
- "SpilloverDiD does not yet support vcov_type='conley' "
- "combined with survey_design=. Wave E.2 (planned) will "
- "compose Conley spatial-HAC with within-stratum Conley "
- "sandwich on PSU totals; see TODO.md for the planned PR. "
- "For Wave E.1, use vcov_type='hc1' (with optional "
- "cluster= for CR1) plus survey_design=."
+ "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) "
+ "combined with survey_design= is not supported in Wave E.2. "
+ "The Wave E.2 stratified-Conley sandwich aggregates Psi to "
+ "PSU totals before applying the cross-sectional Conley "
+ "kernel; the panel-block decomposition (within-unit serial "
+ "Bartlett HAC over time) would require carrying PSU-by-time "
+ "scores and composing the serial kernel with the within-"
+ "stratum cross-PSU spatial kernel. This composition is "
+ "queued as a follow-up (see TODO.md). For Wave E.2, use "
+ "conley_lag_cutoff=0 (cross-sectional Conley) with "
+ "survey_design=, or use survey_design= with "
+ "vcov_type='hc1' (+ cluster= for CR1) for the full "
+ "Wave E.1 path."
)
# Validate `anticipation` up front: must be a non-negative integer.
# Accepting fractional or negative values would silently shift
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index f2461501..89a75a60 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -1898,6 +1898,160 @@ def _compute_stratified_meat_from_psu_scores(
return meat, _variance_computed, legitimate_zero_count
+def _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores: np.ndarray,
+ psu_strata: np.ndarray,
+ psu_coords: np.ndarray,
+ *,
+ cutoff: float,
+ metric,
+ kernel: str,
+ fpc_per_psu: "Optional[np.ndarray]" = None,
+ lonely_psu: str = "remove",
+) -> Tuple[np.ndarray, bool, int]:
+ """Wave E.2 stratified-Conley meat on PSU-aggregated scores.
+
+ Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124)
+ Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D
+ Gardner GMM first-stage uncertainty correction (Butts 2021 ss3.1 +
+ Gardner 2022 ss4). Used by SpilloverDiD's Wave E.2 GMM sandwich when
+ ``vcov_type="conley"`` is combined with ``survey_design=``.
+
+ Per-stratum loop: demean PSU scores within the stratum, apply the
+ cross-sectional Conley kernel between PSU centroids in that stratum,
+ scale by the Binder finite-population correction
+ ``(1 - f_h) * n_h/(n_h-1)``. Cross-stratum kernel weights are zero by
+ sampling design (strata are exact independence partitions); total meat
+ is the sum across strata.
+
+ Parameters
+ ----------
+ psu_scores : np.ndarray
+ Score matrix of shape (G, k) — one row per PSU.
+ psu_strata : np.ndarray
+ Stratum assignment per PSU, shape (G,).
+ psu_coords : np.ndarray
+ Per-PSU spatial centroid coordinates, shape (G, 2). Typically the
+ mean of per-observation ``conley_coords`` within each PSU.
+ cutoff : float
+ Conley spatial-HAC bandwidth in the same units as ``psu_coords``
+ (km when ``metric="haversine"``).
+ metric : str or callable
+ Distance metric; ``"haversine"`` / ``"euclidean"`` / callable per
+ :mod:`diff_diff.conley` (``ConleyMetric``).
+ kernel : str
+ Spatial kernel: ``"bartlett"`` or ``"uniform"``.
+ fpc_per_psu : np.ndarray, optional
+ FPC population size per PSU, shape (G,). All PSUs in the same
+ stratum should share the same FPC value (first occurrence used).
+ lonely_psu : str
+ How to handle singleton strata: ``"remove"``, ``"certainty"``, or
+ ``"adjust"``. Matches the existing
+ :func:`_compute_stratified_meat_from_psu_scores` behaviour exactly,
+ including the ``"adjust"`` branch's ``continue`` that skips FPC
+ scaling (with ``n_h=1`` the scale ``n_h/(n_h-1)`` would divide by
+ zero).
+
+ Returns
+ -------
+ meat : np.ndarray
+ Meat matrix of shape (k, k).
+ variance_computed : bool
+ Whether any actual variance computation happened.
+ legitimate_zero_count : int
+ Number of strata that legitimately contribute zero variance.
+
+ Notes
+ -----
+ Reduction semantics (load-bearing for tests):
+
+ - bandwidth -> 0 (Bartlett: ``K(d/tiny) = 0`` for ``d > 0`` and
+ ``K(0) = 1`` on the diagonal so K is the identity matrix): the
+ within-stratum sandwich ``sum_{j,k} K_jk c_j c_k' = sum_j c_j c_j'
+ = centered.T @ centered``, which is precisely Binder's formula at
+ :func:`_compute_stratified_meat_from_psu_scores`.
+ - Single stratum (H = 1, FPC = inf): reduces to ordinary Conley
+ sandwich on PSU totals via :func:`diff_diff.conley._compute_conley_meat`.
+
+ No reference software combines all three ingredients (Conley
+ spatial-HAC + Binder TSL + Gardner GMM correction) on a two-stage
+ influence function.
+ """
+ from diff_diff.conley import _compute_conley_meat
+
+ if psu_scores.ndim == 1:
+ psu_scores = psu_scores[:, np.newaxis]
+ k = psu_scores.shape[1]
+ meat = np.zeros((k, k))
+
+ unique_strata = np.unique(psu_strata)
+ _variance_computed = False
+ legitimate_zero_count = 0
+
+ _global_psu_mean = None
+ if lonely_psu == "adjust":
+ _global_psu_mean = psu_scores.mean(axis=0, keepdims=True)
+
+ for h in unique_strata:
+ mask_h = psu_strata == h
+ scores_h = psu_scores[mask_h]
+ coords_h = psu_coords[mask_h]
+ n_psu_h = scores_h.shape[0]
+
+ if n_psu_h < 2:
+ if lonely_psu == "remove":
+ continue
+ elif lonely_psu == "certainty":
+ legitimate_zero_count += 1
+ continue
+ elif lonely_psu == "adjust":
+ # Degenerate one-PSU kernel K = [[K(0)]] = [[1.0]] for both
+ # Bartlett and uniform; equivalent to centered.T @ centered.
+ # MUST `continue` to skip the FPC block below — with n_h = 1
+ # the scale n_h/(n_h-1) divides by zero. Mirrors the Binder
+ # helper's singleton-adjust branch exactly.
+ centered = scores_h - _global_psu_mean
+ with np.errstate(invalid="ignore", over="ignore"):
+ meat += centered.T @ centered
+ _variance_computed = True
+ continue
+
+ f_h = 0.0
+ if fpc_per_psu is not None:
+ N_h = fpc_per_psu[mask_h][0]
+ if N_h < n_psu_h:
+ raise ValueError(
+ f"FPC ({N_h}) is less than the number of PSUs "
+ f"({n_psu_h}) in stratum. FPC must be >= n_PSU."
+ )
+ f_h = n_psu_h / N_h
+ if f_h >= 1.0:
+ legitimate_zero_count += 1
+
+ psu_mean_h = scores_h.mean(axis=0, keepdims=True)
+ centered = scores_h - psu_mean_h
+
+ # Within-stratum Conley sandwich on PSU-centered scores. Pass
+ # ``cluster_ids=None`` explicitly: after PSU aggregation every PSU
+ # is its own cluster, so a cluster product kernel would zero all
+ # cross-PSU pairs. See Wave E.2 plan Chunk 3 step 4.
+ conley_meat_h = _compute_conley_meat(
+ centered,
+ coords_h,
+ cutoff,
+ metric,
+ kernel,
+ cluster_ids=None,
+ )
+
+ adjustment = (1.0 - f_h) * (n_psu_h / (n_psu_h - 1))
+ with np.errstate(invalid="ignore", over="ignore"):
+ meat += adjustment * conley_meat_h
+ _variance_computed = True
+
+ return meat, _variance_computed, legitimate_zero_count
+
+
def compute_survey_vcov(
X: np.ndarray,
residuals: np.ndarray,
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index 370560f4..51c15bed 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -125,9 +125,20 @@ def _compute_gmm_corrected_meat(
TSL has its own ``(1-f_h) * n_h/(n_h-1)`` correction).
- ``vcov_type="cluster"``: ``cluster_ids`` IS the PSU (via upstream
``_inject_cluster_as_psu``); identical to the HC1+survey branch.
- - ``vcov_type="conley"``: raises ``NotImplementedError``. Wave E.2
- (planned) will compose Conley spatial-HAC with within-stratum
- Conley sandwich on PSU totals.
+ - ``vcov_type="conley"`` (cross-sectional only — ``conley_lag_cutoff = 0``):
+ Wave E.2 stratified-Conley sandwich on PSU totals via
+ :func:`_compute_stratified_conley_meat`. Aggregates Psi to PSU
+ totals + derives per-PSU centroids as the mean of per-obs
+ ``conley_coords``; for each stratum applies the Conley kernel
+ between PSU centroids scaled by ``(1 - f_h) * n_h/(n_h-1)``.
+ Cross-stratum kernel weights are zero by sampling design.
+ - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block
+ Conley): raises ``NotImplementedError`` upstream at
+ ``SpilloverDiD.fit``. The panel-block decomposition would need to
+ compose the within-unit serial Bartlett HAC with the within-stratum
+ cross-PSU spatial kernel on PSU-by-time scores rather than the
+ collapsed PSU totals; out of Wave E.2 scope and tracked as a
+ follow-up in ``TODO.md``.
**`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance`
pattern at `two_stage.py:1886-1917`): factorize ``X_10' W X_10`` via
@@ -212,16 +223,10 @@ def _compute_gmm_corrected_meat(
cluster_ids=cluster_ids,
)
- # Wave E.1: reject the conley × survey composition. Wave E.2 (planned)
- # will add the within-stratum Conley sandwich on PSU totals.
- if vcov_type == "conley" and resolved_survey is not None:
- raise NotImplementedError(
- "SpilloverDiD does not yet support vcov_type='conley' combined "
- "with survey_design=. Wave E.2 (planned) will compose Conley "
- "spatial-HAC with within-stratum Conley sandwich on PSU totals; "
- "see TODO.md for the planned PR. For now, use vcov_type='hc1' "
- "(+ cluster= for CR1) with survey_design=."
- )
+ # Wave E.2 (this PR): conley × survey is now supported via the
+ # stratified-Conley sandwich on PSU totals. Dispatch happens inside
+ # the vcov_type == "conley" branch below (Wave E.1 already routed
+ # hc1 / cluster + survey to the Binder TSL helper).
# 1. gamma_hat = (X_10' W X_10)^{-1} (X_1' W X_2). Mirror the existing
# TwoStageDiD method at two_stage.py:1886-1917 — sparse_factorized
@@ -358,19 +363,37 @@ def _compute_gmm_corrected_meat(
"_compute_gmm_corrected_meat: vcov_type='conley' requires "
"conley_coords, conley_cutoff_km, and conley_metric."
)
- # Delegate to the shared kernel-application helper. No finite-sample
- # multiplier on the Conley path (matches conleyreg / Wave B convention).
- meat = _compute_conley_meat(
- Psi,
- conley_coords,
- conley_cutoff_km,
- conley_metric,
- conley_kernel,
- time=conley_time,
- unit=conley_unit,
- lag_cutoff=conley_lag_cutoff,
- cluster_ids=cluster_ids,
- )
+ if resolved_survey is not None:
+ # Wave E.2: stratified-Conley sandwich on PSU totals. cluster_ids
+ # is intentionally NOT threaded through — after PSU aggregation
+ # every PSU is its own cluster, so a cluster product kernel
+ # would zero all cross-PSU pairs. Wave E.1's
+ # _resolve_effective_cluster path already coerced any
+ # user-supplied cluster= into PSU upstream.
+ meat = _compute_stratified_conley_meat(
+ Psi,
+ conley_coords=np.asarray(conley_coords, dtype=np.float64),
+ conley_cutoff_km=conley_cutoff_km,
+ conley_metric=conley_metric,
+ conley_kernel=conley_kernel,
+ resolved_survey=resolved_survey,
+ conley_time=conley_time, # panel-aware per-period sandwich
+ )
+ else:
+ # Wave D no-survey Conley path UNCHANGED — bit-identical fallback.
+ # No finite-sample multiplier on the Conley path (matches conleyreg
+ # / Wave B convention).
+ meat = _compute_conley_meat(
+ Psi,
+ conley_coords,
+ conley_cutoff_km,
+ conley_metric,
+ conley_kernel,
+ time=conley_time,
+ unit=conley_unit,
+ lag_cutoff=conley_lag_cutoff,
+ cluster_ids=cluster_ids,
+ )
else:
raise ValueError(
f"_compute_gmm_corrected_meat: vcov_type must be one of "
@@ -514,6 +537,263 @@ def _compute_binder_tsl_meat(
return meat
+def _compute_stratified_conley_meat(
+ Psi: np.ndarray,
+ *,
+ conley_coords: np.ndarray,
+ conley_cutoff_km: float,
+ conley_metric,
+ conley_kernel: str,
+ resolved_survey: "ResolvedSurveyDesign",
+ conley_time: Optional[np.ndarray] = None,
+) -> np.ndarray:
+ """Wave E.2 panel-aware stratified-Conley meat on PSU-by-time scores.
+
+ Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124)
+ Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D
+ Gardner GMM first-stage uncertainty correction (Butts 2021 ss3.1 +
+ Gardner 2022 ss4) applied to SpilloverDiD's ring-indicator stage-2
+ design. No reference software combines all three ingredients on a
+ two-stage influence function.
+
+ **Panel-aware composition (preserves the library's panel Conley
+ contract):** for each period ``t``, aggregate per-obs Psi to PSU
+ totals WITHIN that period (``S_psu_t[g] = sum_{i in PSU g, time t}
+ Psi[i]``); derive each PSU's spatial centroid as the mean of
+ per-observation ``conley_coords`` (panel-constant — PSU is a sampling
+ unit with fixed location); apply the per-stratum Conley sandwich on
+ ``S_psu_t`` via
+ :func:`diff_diff.survey._compute_stratified_conley_meat_from_psu_scores`
+ (Binder FPC factor ``(1 - f_h) * n_h/(n_h-1)``); sum across periods.
+ Cross-period spatial pairs are excluded by construction, matching the
+ library's existing ``conley_lag_cutoff = 0`` semantic ("within-period
+ spatial only") at :func:`diff_diff.conley._compute_conley_meat`.
+ Cross-stratum kernel weights are zero by sampling design (strata are
+ exact independence partitions).
+
+ Parameters
+ ----------
+ Psi : np.ndarray of shape (n, p_2)
+ Per-obs Wave D Gardner GMM influence-function scores (already
+ Hajek-weighted via the Wave E.1 upstream eps multiplication).
+ conley_coords : np.ndarray of shape (n, 2)
+ Per-observation lat/lon (or generic 2D coordinates). Already
+ validated finite upstream at ``spillover.py:_validate_spillover_inputs``;
+ no defensive finiteness check on derived PSU centroids.
+ conley_cutoff_km : float
+ Conley spatial-HAC bandwidth in km (haversine) or the
+ coord units (euclidean / callable).
+ conley_metric : ConleyMetric
+ ``"haversine"`` / ``"euclidean"`` / callable, per
+ :mod:`diff_diff.conley`.
+ conley_kernel : str
+ ``"bartlett"`` or ``"uniform"``.
+ resolved_survey : ResolvedSurveyDesign
+ ``.psu`` may be None; when absent, each observation is treated as
+ its own singleton PSU (matches the implicit-PSU convention of
+ :class:`ResolvedSurveyDesign` no-PSU branches). ``.strata`` and
+ ``.fpc`` are optional; absent strata synthesize a single stratum.
+ conley_time : np.ndarray of shape (n,), optional
+ Per-observation period label. When None, all observations are
+ treated as a single period (T = 1; the per-period loop reduces to
+ one iteration on the full Psi, which is the cross-sectional
+ Wave E.2 design). When provided (the standard SpilloverDiD case),
+ the per-period loop preserves the within-period spatial semantic.
+
+ Returns
+ -------
+ meat : np.ndarray of shape (p_2, p_2)
+ Wave E.2 panel-aware stratified-Conley meat
+ (``sum_t meat_t`` where ``meat_t`` is the within-stratum Conley
+ sandwich on the period-``t`` PSU totals).
+
+ Notes
+ -----
+ ``cluster_ids`` is intentionally not accepted: after PSU aggregation
+ every PSU is its own cluster, so threading a cluster product kernel
+ into the inner :func:`_compute_stratified_conley_meat_from_psu_scores`
+ would zero all cross-PSU pairs (``1{cluster_j == cluster_k}`` = 0 for
+ j != k). The Wave E.1 ``_resolve_effective_cluster`` path already
+ collapsed any user-supplied ``cluster=`` into PSU upstream.
+
+ NaN-fails (with ``UserWarning``) when the inner survey helper
+ returns ``(False, 0)`` for every period — i.e. no stratum contributed
+ variance and none was a legitimate zero across any period. Mirrors the
+ Wave E.1 Binder TSL saturation behavior; departs from TwoStageDiD's
+ silent NaN-VCV at ``two_stage.py:2003-2005`` per
+ ``feedback_no_silent_failures``.
+
+ Reductions:
+
+ - ``T = 1`` (single period or ``conley_time is None``): single-pass
+ stratified-Conley sandwich on the full PSU totals (the original
+ cross-sectional Wave E.2 design).
+ - ``H = 1`` stratum, ``FPC = inf``: reduces to ``sum_t`` plain
+ Conley sandwich on per-period PSU totals.
+ - Bandwidth -> 0 (``K = I``): reduces to ``sum_t`` per-period
+ within-stratum HC sandwich on PSU totals (NOT Wave E.1 Binder,
+ which is over time-collapsed PSU totals).
+
+ Out of scope (deferred follow-up, tracked in TODO.md):
+
+ - ``conley_lag_cutoff > 0`` panel-block: the within-PSU serial
+ Bartlett HAC over time would compose with the spatial sandwich
+ here. Rejected upfront at ``SpilloverDiD.fit``.
+ """
+ from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+ p_2 = Psi.shape[1]
+ n_obs = Psi.shape[0]
+ coords_arr = np.asarray(conley_coords, dtype=np.float64)
+
+ # No-PSU fallback: each obs is its own singleton PSU. Matches Wave E.1
+ # Binder TSL convention at _compute_binder_tsl_meat L450-451.
+ if resolved_survey.psu is None:
+ psu_arr: np.ndarray = np.arange(n_obs, dtype=np.int64)
+ else:
+ psu_arr = np.asarray(resolved_survey.psu)
+ strata_arr_full = (
+ np.asarray(resolved_survey.strata) if resolved_survey.strata is not None else None
+ )
+ fpc_arr_full = (
+ np.asarray(resolved_survey.fpc, dtype=np.float64)
+ if resolved_survey.fpc is not None
+ else None
+ )
+
+ # Panel-constant PSU centroids for explicit-PSU layouts (R4 P1 fix).
+ # The Wave E.2 registry / api contract specifies
+ # ``centroid_g = mean over i in PSU g of conley_coords[i]`` (panel-wide,
+ # not per-period). For a PSU containing multiple units at different
+ # coordinates with finite_mask dropping different members across
+ # periods, per-period recomputation would silently shift the spatial
+ # kernel weights — that would be a documented-contract violation.
+ # Compute once on the full active sample so each period's helper call
+ # sees the SAME centroid for the same PSU.
+ #
+ # For implicit-PSU (pseudo-PSU = obs index), every pseudo-PSU appears
+ # in exactly one period, so the per-period slice naturally produces
+ # the obs's own coordinate as that pseudo-PSU's centroid — no precompute
+ # needed. The dictionary stays None on that branch.
+ coord_dim = coords_arr.shape[1]
+ psu_value_to_centroid: Optional[dict] = None
+ if resolved_survey.psu is not None:
+ unique_psus_full, _, psu_indices_full = np.unique(
+ psu_arr, return_index=True, return_inverse=True
+ )
+ G_full = len(unique_psus_full)
+ psu_coord_sums_full = np.zeros((G_full, coord_dim))
+ for d in range(coord_dim):
+ np.add.at(psu_coord_sums_full[:, d], psu_indices_full, coords_arr[:, d])
+ psu_counts_full = np.bincount(psu_indices_full, minlength=G_full).astype(np.float64)
+ psu_centroids_full = psu_coord_sums_full / psu_counts_full[:, None]
+ psu_value_to_centroid = {unique_psus_full[g]: psu_centroids_full[g] for g in range(G_full)}
+
+ # Per-period loop: preserves the library's "within-period spatial only"
+ # contract for conley_lag_cutoff = 0. PSU set, centroids, strata, and
+ # FPC are re-built from the ACTIVE rows in each period (not from the
+ # full panel) so implicit-PSU layouts (`resolved_survey.psu is None`,
+ # i.e. one pseudo-PSU per observation) don't drag off-period
+ # zero-padded entries into the kernel via centering. For explicit-PSU
+ # balanced-panel layouts the per-period centroids equal the
+ # panel-constant centroids (obs coords are time-invariant), so this
+ # re-indexing is bit-identical to the prior naive panel-wide PSU
+ # mapping on that branch.
+ if conley_time is None:
+ # Treat all obs as one period (cross-sectional fallback).
+ time_arr = np.zeros(n_obs, dtype=np.int64)
+ else:
+ time_arr = np.asarray(conley_time)
+ unique_times = np.unique(time_arr)
+
+ # Saturation guard for unstratified single-PSU on the FULL panel.
+ # The per-period helper invocation will also NaN-fail when no period
+ # contributes variance, but this front-door check matches Wave E.1's
+ # ergonomic "df_survey is undefined" message for the panel-level
+ # degenerate case.
+ if strata_arr_full is None and len(np.unique(psu_arr)) < 2:
+ G_total = len(np.unique(psu_arr))
+ warnings.warn(
+ "SpilloverDiD Wave E.2 stratified-Conley sandwich: df_survey is "
+ f"undefined (single PSU, no strata; G={G_total}). Returning NaN "
+ "meat so downstream inference NaN-propagates.",
+ UserWarning,
+ stacklevel=2,
+ )
+ return np.full((p_2, p_2), np.nan)
+
+ meat = np.zeros((p_2, p_2))
+ _variance_computed = False
+ _legit_zero = 0
+ for t in unique_times:
+ period_mask = time_arr == t
+ Psi_t = Psi[period_mask]
+ psu_arr_t = psu_arr[period_mask]
+ coords_arr_t = coords_arr[period_mask]
+ unique_psus_t, first_idx_t, psu_indices_t = np.unique(
+ psu_arr_t, return_index=True, return_inverse=True
+ )
+ G_t = len(unique_psus_t)
+
+ # Per-period PSU totals.
+ S_psu_t = np.zeros((G_t, p_2))
+ for j in range(p_2):
+ np.add.at(S_psu_t[:, j], psu_indices_t, Psi_t[:, j])
+
+ # Per-period PSU centroids: panel-constant for explicit-PSU
+ # (look up from the precomputed dict to match the documented
+ # ``centroid_g = mean over i in PSU g of conley_coords[i]``
+ # panel-wide contract); per-period mean for implicit-PSU
+ # (pseudo-PSU = obs, each appears in exactly one period, so the
+ # per-period mean IS the obs's own coord).
+ if psu_value_to_centroid is not None:
+ psu_centroids_t = np.array([psu_value_to_centroid[v] for v in unique_psus_t])
+ else:
+ psu_coord_sums_t = np.zeros((G_t, coord_dim))
+ for d in range(coord_dim):
+ np.add.at(psu_coord_sums_t[:, d], psu_indices_t, coords_arr_t[:, d])
+ psu_counts_t = np.bincount(psu_indices_t, minlength=G_t).astype(np.float64)
+ psu_centroids_t = psu_coord_sums_t / psu_counts_t[:, None]
+
+ # Per-period strata + fpc.
+ if strata_arr_full is not None:
+ psu_strata_t = strata_arr_full[period_mask][first_idx_t]
+ else:
+ psu_strata_t = np.zeros(G_t, dtype=int)
+ psu_fpc_t: Optional[np.ndarray] = None
+ if fpc_arr_full is not None:
+ psu_fpc_t = fpc_arr_full[period_mask][first_idx_t]
+
+ # Stratified Conley sandwich for period t.
+ meat_t, var_t, legit_zero_t = _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores=S_psu_t,
+ psu_strata=psu_strata_t,
+ psu_coords=psu_centroids_t,
+ cutoff=conley_cutoff_km,
+ metric=conley_metric,
+ kernel=conley_kernel,
+ fpc_per_psu=psu_fpc_t,
+ lonely_psu=resolved_survey.lonely_psu,
+ )
+ meat += meat_t
+ _variance_computed = _variance_computed or var_t
+ _legit_zero += legit_zero_t
+
+ # Wave E.2 survey-saturated NaN-fail per `feedback_no_silent_failures`.
+ if not _variance_computed and _legit_zero == 0:
+ warnings.warn(
+ "SpilloverDiD Wave E.2 stratified-Conley sandwich: df_survey = 0 "
+ "(all strata removed by lonely_psu='remove' on single-PSU "
+ "strata; no PSU contributed to the meat). Returning NaN meat "
+ "so downstream inference NaN-propagates.",
+ UserWarning,
+ stacklevel=2,
+ )
+ return np.full((p_2, p_2), np.nan)
+
+ return meat
+
+
# =============================================================================
# Main Estimator
# =============================================================================
diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst
index a78f4f25..5f883619 100644
--- a/docs/api/spillover.rst
+++ b/docs/api/spillover.rst
@@ -243,10 +243,6 @@ and planned follow-up enhancements:
Restrictions:
- - ``vcov_type="conley" + survey_design=`` raises
- ``NotImplementedError``; Wave E.2 (planned) will add the Conley × survey
- product-kernel synthesis with within-stratum Conley sandwich on PSU
- totals.
- Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises
``NotImplementedError``; per Gerber (2026) Appendix A, the
IF-reweighting shortcut does not apply because ``gamma_hat`` is
@@ -266,6 +262,70 @@ and planned follow-up enhancements:
centering reflect the reduced fit sample rather than the full
domain design. See the REGISTRY note for details and the Wave E.3
follow-up tracked in ``TODO.md``.
+- **Survey-design integration (Wave E.2 — Conley × survey via
+ stratified-Conley sandwich on PSU totals).** SHIPPED in Wave E.2.
+ ``vcov_type="conley" + survey_design=`` is now supported via a
+ per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner
+ GMM influence functions.
+
+ .. note::
+
+ Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026,
+ arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation)
+ and the Wave D Gardner GMM first-stage uncertainty correction
+ (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's
+ ring-indicator stage-2 design. The composition is **panel-aware** —
+ it preserves the library's existing ``conley_lag_cutoff = 0``
+ semantic ("within-period spatial only, exclude cross-period pairs")
+ by looping over periods and aggregating Psi to PSU totals WITHIN
+ each period (not over the whole panel). For each period ``t``,
+ ``S_psu_t[g] = sum_{i in PSU g, time t} psi_i``; per-PSU centroids
+ are panel-constant (mean of per-observation ``conley_coords``);
+ for each stratum the within-stratum sandwich is
+ ``M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h}
+ K(d(centroid_j, centroid_k) / cutoff) *
+ (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'``, where K is the
+ Bartlett kernel (SpilloverDiD currently exposes Bartlett only and
+ hardcodes it at the fit-call site; the survey helper's ``kernel``
+ parameter can also take ``"uniform"``, but exposing that on the
+ SpilloverDiD constructor is a separate follow-up). Cross-stratum
+ kernel weights are exactly zero by sampling design (strata are
+ exact independence partitions). Total meat is ``sum_t sum_h M_h_t``.
+ Cross-period spatial pairs are excluded by construction. No
+ reference software combines all three ingredients on a two-stage
+ influence function.
+
+ Reduction semantics:
+
+ - Per-period sum invariant: ``sum_t`` of per-period within-stratum
+ stratified-Conley sandwiches on per-period PSU totals. Pinned at
+ ``tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant``
+ (pure unit test on the orchestrator + helper composition).
+ - Single stratum (H = 1, FPC = inf): reduces to ``sum_t`` plain
+ Conley sandwich on per-period PSU totals (NOT on time-collapsed
+ PSU totals — the per-period loop preserves ``lag_cutoff = 0``
+ semantics).
+ - All PSUs singleton + ``lonely_psu="remove"``: ``df_survey = 0`` and
+ the stratified-Conley meat NaN-fails (matches Wave E.1 saturation
+ behaviour, with ``UserWarning`` template "Wave E.2 stratified-Conley
+ sandwich: df_survey = 0...").
+
+ Restrictions:
+
+ - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises
+ ``NotImplementedError`` (inherits Wave E.1 gate; per-replicate refit
+ is separate follow-up scope).
+ - ``cluster= + survey_design.psu + vcov_type="conley"``:
+ ``cluster=`` is coerced to PSU per Wave E.1's warn-and-use-PSU
+ pattern; the Conley cluster product kernel becomes a no-op after
+ PSU aggregation.
+ - The LinearRegression-side ``vcov_type="conley" + survey_design=``
+ gate at ``diff_diff/linalg.py`` is a separate Bertanha-Imbens 2014
+ weighted-Conley roadmap (not Wave E).
+ - DiagnosticReport routing for ``SpilloverDiDResults(vcov_type="conley",
+ survey_design=)`` is queued for a follow-up (the
+ ``_APPLICABILITY`` / ``_PT_METHOD`` wiring must register the new
+ combination first).
- **Count-of-treated-in-ring** — only the "nearest-treated ring"
specification is implemented. The "count" form re-introduces
functional-form dependence (paper Section 3.2 end) and is queued.
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index ce624616..62b2223a 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -3215,7 +3215,7 @@ The `_compute_stratified_meat_from_psu_scores` helper at `diff_diff/survey.py` i
Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_survey` (the standard survey 4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`). Threaded through all four `safe_inference` call sites: aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, and the scalar `att` lincom in event-study mode.
-- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 (planned) will extend with the Conley × survey product-kernel composition.
+- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 extends with the Conley × survey product-kernel composition — see "Variance (Wave E.2)" subsection below.
- **Note (warn-and-use-PSU for cluster + survey):** when both `cluster=` and `survey_design.psu` are supplied with **different groupings**, the cluster argument emits a `UserWarning` and is overridden by PSU (mirrors `TwoStageDiD._resolve_effective_cluster`). PSU is the design-relevant cluster on survey panels; `cluster=` on SpilloverDiD is more often a spatial / unit-level label, so the design constraint wins. When both knobs are supplied with the **same** groupings, no warning fires and PSU still takes precedence (the inference is unchanged either way).
- **Note (limitation — `SurveyDesign.subpopulation()` with FE-undefined zero-weight rows):** when `survey_design` is built via `SurveyDesign.subpopulation()` (or otherwise carries zero-weight padding rows) AND those zero-weight rows lose stage-1 FE support (warn-and-drop unit path), Wave E.1's `finite_mask` block physically removes them from the survey design rather than retaining them as zero-score padding. Consequently `n_psu`, `df_survey`, and the Binder TSL centering are recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (e.g. R's `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts. Practitioners using subpopulation-derived designs should expect SEs that may differ slightly from textbook domain expectations on warn-and-drop fits. Tracked as Wave E.3 follow-up — see TODO.md.
- **Note (saturated `df_survey = 0` NaN-fail):** when `lonely_psu="remove"` removes all strata (single PSU per stratum), `_compute_stratified_meat_from_psu_scores` returns `(_, var_computed=False, legit_zero=0)`. SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can pin via `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`.
@@ -3224,6 +3224,29 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su
**Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56`; new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with the Wave E.1 NaN-fail + warning. `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback). `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. Tests: `TestSpilloverDiDWaveE1SurveyDesignHc1` + `TestSpilloverDiDWaveE1SurveyDesignEventStudy` at `tests/test_spillover.py`.
+### Variance (Wave E.2 — Conley × survey via stratified-Conley sandwich on PSU totals)
+
+`vcov_type="conley" + survey_design=` is now supported via a per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner GMM influence functions. SHIPPED in Wave E.2.
+
+- **Note (documented synthesis):** Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is **panel-aware** — it preserves the library's existing `conley_lag_cutoff = 0` semantic ("within-period spatial only — exclude cross-period pairs") at `diff_diff.conley._compute_conley_meat`. Per-PSU centroids are computed as `centroid_g = mean over i in PSU g of conley_coords[i]` (panel-constant — PSU is a sampling unit with fixed location). For each period `t`, SpilloverDiD's per-obs Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i`; the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / cutoff) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it at the fit-call site; the survey helper's `kernel` parameter can also take `"uniform"`, but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop aggregates only within-period observations into each `S_psu_t`, matching the Wave D `conley_lag_cutoff = 0` block decomposition. No reference software combines all three ingredients (Conley spatial-HAC + Binder TSL + Gardner GMM correction) on a two-stage influence function.
+
+- **Reduction semantics (load-bearing for tests):**
+ - Per-period sum invariant: the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals. Pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant` (pure unit test on the orchestrator + helper composition).
+ - Single stratum (H = 1, FPC = inf): reduces to `sum_t` plain Conley sandwich on per-period PSU totals via `_compute_conley_meat(S_psu_t_centered, centroids, ...)`. Note this is NOT plain Conley on time-collapsed PSU totals — the per-period loop preserves the library's `lag_cutoff = 0` semantic.
+ - All PSUs singleton in their stratum + `lonely_psu="remove"`: `df_survey = 0` and the stratified-Conley meat NaN-fails (matches Wave E.1 saturation behaviour, with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0...").
+ - Cross-stratum kernel weight is exactly zero (sampling-design assumption — no kernel pair crosses a stratum boundary).
+
+- **Note (singleton-stratum `lonely_psu="adjust"` FPC skip parity):** when a stratum has `n_h = 1` and `lonely_psu="adjust"`, the new `_compute_stratified_conley_meat_from_psu_scores` helper mirrors the Binder helper's `continue`-skip-FPC pattern exactly (the FPC scale `(1 - f_h) * n_h / (n_h - 1)` would divide by zero with `n_h = 1`). The degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust contribution bit-identically.
+
+- **Cluster + Conley + survey routing:** `cluster= + survey_design.psu + vcov_type="conley"` coerces `cluster=` to PSU per Wave E.1's `_resolve_effective_cluster` warn-and-use-PSU pattern. The dispatch wrapper `_compute_stratified_conley_meat` intentionally does NOT thread `cluster_ids` into the inner Conley kernel call — after PSU aggregation every PSU is its own cluster, so a cluster product kernel `1{cluster_j == cluster_k}` would be zero for all `j != k` and the cross-PSU kernel weights would be silently dropped. The Wave E.2 architectural choice: PSU-aggregation handles within-PSU clustering exactly; cross-PSU spatial dependence enters via the kernel; cross-stratum independence is exact.
+
+- **Restrictions / out-of-scope (Wave E.2):**
+ - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope).
+ - LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E).
+ - DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` is queued for a follow-up Wave F PR — `_APPLICABILITY` / `_PT_METHOD` registration is required before the new combination can be claimed consumable downstream (per `feedback_audit_diagnostic_report_wiring_before_claim`).
+
+**Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing Binder helper; 3-tuple `(meat, variance_computed, legitimate_zero_count)` return contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(centered, coords_h, cutoff, metric, kernel)` cross-sectional mode). New dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to `_compute_binder_tsl_meat`; per-obs Psi → PSU aggregation via `np.add.at` + PSU centroid derivation via vectorized `np.add.at` sums / `np.bincount` counts + dispatch to survey helper; intentionally no `cluster_ids` parameter). `_compute_gmm_corrected_meat` conley branch extended at `diff_diff/two_stage.py` with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D no-survey Conley. Saturation NaN-fail mirrors Wave E.1 (`UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..."). Wave E.1 stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading to `safe_inference` call sites, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of `vcov_type`. Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` + `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` at `tests/test_spillover.py`.
+
**Edge cases (from paper Section 3.2 / Discussion):**
| # | Edge case | Handling |
@@ -3239,7 +3262,7 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su
**Restrictions / deferred features:**
- `event_study=True` SHIPPED in Wave C — see Event-study mode subsection above. Emits `att_dynamic`, MultiIndex `spillover_effects`, and a TwoStageDiD-compatible `event_study_effects` dict alias.
-- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` still raises `NotImplementedError` and points at planned Wave E.2 (Conley × survey product-kernel composition with within-stratum Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up.
+- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up.
- `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
- `ring_method="count"` not exposed — only the nearest-treated-ring specification.
- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction.
diff --git a/docs/references.rst b/docs/references.rst
index 21410d98..d8acabdb 100644
--- a/docs/references.rst
+++ b/docs/references.rst
@@ -190,6 +190,10 @@ Multi-Period and Staggered Adoption
Identifies the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Section 2-3 covers non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``.
+- **Conley, T. G. (1999).** "GMM Estimation with Cross Sectional Dependence." *Journal of Econometrics*, 92(1), 1-45. https://doi.org/10.1016/S0304-4076(98)00084-0
+
+ Primary source for the Conley spatial-HAC variance estimator. Equations 5-9 derive the spatial-kernel cross-product meat. Our ``diff_diff/conley.py`` implements the practitioner specializations (Bartlett / uniform kernels with haversine / euclidean metrics) cited in our ``SpilloverDiD`` Wave A/D Conley path and composed with Binder TSL + Gardner GMM in Wave E.2 (``_compute_stratified_conley_meat`` at ``diff_diff/two_stage.py``).
+
- **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169
- **de Chaisemartin, C., & D'Haultfœuille, X. (2022, revised 2024).** "Difference-in-Differences Estimators of Intertemporal Treatment Effects." *NBER Working Paper* 29873. https://www.nber.org/papers/w29873
diff --git a/tests/test_spillover.py b/tests/test_spillover.py
index dd02aaf5..81e4f844 100644
--- a/tests/test_spillover.py
+++ b/tests/test_spillover.py
@@ -577,43 +577,6 @@ def test_set_params_rejects_unknown_key(self):
with pytest.raises(ValueError, match="Unknown parameter"):
est.set_params(nonexistent_kwarg=42)
- def test_fit_conley_plus_survey_design_not_implemented(self):
- """Wave E.1 ships HC1 / CR1 + survey_design; Conley × survey is
- deferred to Wave E.2 (the novel within-stratum Conley sandwich on
- PSU totals). Confirm the upfront rejection points at the planned
- follow-up PR.
- """
- from diff_diff import SurveyDesign
-
- est = SpilloverDiD(
- rings=[0.0, 50.0],
- conley_coords=("lat", "lon"),
- conley_metric="euclidean",
- conley_cutoff_km=100.0,
- vcov_type="conley",
- )
- df = pd.DataFrame(
- {
- "unit": ["A", "A"],
- "time": [0, 1],
- "y": [1.0, 2.0],
- "D": [0, 1],
- "lat": [0.0, 0.0],
- "lon": [0.0, 0.0],
- "w": [1.0, 1.0],
- "psu": [0, 0],
- }
- )
- with pytest.raises(NotImplementedError, match="Wave E.2"):
- est.fit(
- df,
- outcome="y",
- unit="unit",
- time="time",
- treatment="D",
- survey_design=SurveyDesign(weights="w", psu="psu"),
- )
-
# =============================================================================
# Step 3: Two-stage Gardner fit() integration
@@ -5684,3 +5647,935 @@ def test_o_drift_golden(self):
assert res.n_psu == 8
assert res.n_strata == 2
assert res.survey_metadata.df_survey == 6
+
+
+class TestSpilloverDiDWaveE2ConleySurveyDesign:
+ """Wave E.2 conley + survey via stratified-Conley sandwich on PSU totals.
+
+ Methodology anchor: Conley (1999) spatial-HAC composed with Gerber
+ (2026) Prop 1 Binder TSL (Wave E.1 foundation) and the Wave D Gardner
+ GMM correction. Verifies reduction semantics (bandwidth -> 0 ≡ Binder;
+ H=1 ≡ plain Conley on PSU totals), cross-stratum independence,
+ singleton-adjust FPC skip parity with Binder, and the saturation
+ NaN-fail.
+ """
+
+ _CUTOFF_KM = 1000.0 # large enough that within-stratum PSU pairs are inside
+
+ def _fit(self, df, **kwargs):
+ design = kwargs.pop("design", None)
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ event_study=False,
+ **kwargs,
+ )
+ return est.fit(
+ df,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+
+ def test_a_no_survey_conley_path_matches_wave_d_golden(self):
+ """The `resolved_survey is None` branch of the new dispatch must
+ produce the SAME no-survey Conley SE as the pre-Wave-E.2 (Wave D)
+ Conley path. The Wave D path is `_compute_conley_meat(...)` with
+ no changes; the new dispatch only ADDS an `if resolved_survey is
+ not None` branch above the existing call. Pin the SE to a golden
+ captured on this fixture so any future refactor that disturbs the
+ no-survey path is caught by a behavioral test, not just by
+ determinism.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=0)
+ res = self._fit(df)
+ # Wave D no-survey Conley golden captured on this fixture (seed=0,
+ # 2-period non-staggered Butts DGP, cutoff=1000 km, Bartlett kernel).
+ # These values reflect the pre-Wave-E.2 no-survey Conley path.
+ # The dispatch in `_compute_gmm_corrected_meat` only ADDS a new
+ # `if resolved_survey is not None` branch above the existing
+ # `_compute_conley_meat` call, so the `resolved_survey is None`
+ # path is bit-identical to Wave D; any future refactor that
+ # disturbs it must update these goldens deliberately.
+ _WAVE_D_NO_SURVEY_CONLEY_ATT = -0.07471658104745109
+ _WAVE_D_NO_SURVEY_CONLEY_SE = 0.0018453344099259904
+ np.testing.assert_allclose(res.att, _WAVE_D_NO_SURVEY_CONLEY_ATT, rtol=1e-12, atol=1e-14)
+ np.testing.assert_allclose(res.se, _WAVE_D_NO_SURVEY_CONLEY_SE, rtol=1e-12, atol=1e-14)
+ assert np.isfinite(res.se) and res.se > 0
+
+ def test_a2_no_survey_conley_path_routes_through_wave_d_helper(self):
+ """Structural anchor: a no-survey conley fit invokes the Wave D
+ `_compute_conley_meat` helper directly, NOT the Wave E.2
+ `_compute_stratified_conley_meat` orchestrator. Pins the dispatch
+ branch in `_compute_gmm_corrected_meat` (no leak into the new
+ path when `resolved_survey is None`).
+ """
+ from unittest.mock import patch
+
+ df = generate_butts_nonstaggered_dgp(seed=2)
+ with patch("diff_diff.two_stage._compute_stratified_conley_meat") as mock_panel_aware:
+ self._fit(df)
+ assert not mock_panel_aware.called, (
+ "No-survey conley fit must NOT call _compute_stratified_conley_meat "
+ "(the Wave E.2 panel-aware survey path); it should route through "
+ "the Wave D _compute_conley_meat directly."
+ )
+
+ def test_b_panel_aware_per_period_sum_invariant(self):
+ """Panel-aware Wave E.2 meat == sum-across-periods of per-period
+ within-stratum Conley sandwich on per-period PSU totals.
+
+ Pure unit test on the orchestrator + helper composition: with T
+ periods of synthetic PSU-level data, ``_compute_stratified_conley_meat``'s
+ per-period loop must produce the same result as manually calling
+ the survey helper T times (once per period, on per-period PSU
+ totals) and summing. This pins the library's panel Conley contract
+ (``conley_lag_cutoff = 0`` means "within-period spatial only") on
+ the survey path — no cross-period spatial pairs leak through the
+ collapsed PSU totals.
+
+ Replaces the original "bandwidth → 0 reduces to Wave E.1 Binder"
+ claim, which only holds under T=1 (the cross-sectional limit).
+ SpilloverDiD's panel-only contract precludes a T=1 fit, so the
+ Wave E.1-equivalence claim is meaningful only on this synthetic
+ unit-test fixture.
+ """
+ from diff_diff.survey import (
+ ResolvedSurveyDesign,
+ _compute_stratified_conley_meat_from_psu_scores,
+ )
+ from diff_diff.two_stage import _compute_stratified_conley_meat
+
+ rng = np.random.default_rng(31)
+ # 4 PSUs × 2 periods × 3 obs per PSU-period = 24 obs.
+ n_obs, T, G, p_2 = 24, 2, 4, 3
+ obs_per_psu_period = 3
+ psu_id = np.repeat(np.arange(G), obs_per_psu_period * T)
+ time_arr = np.tile(np.repeat(np.arange(T), obs_per_psu_period), G)
+ Psi = rng.standard_normal((n_obs, p_2))
+ psu_centroids = np.array([[40.0, -120.0], [40.1, -120.0], [40.2, -120.0], [40.3, -120.0]])
+ coords = psu_centroids[psu_id]
+ psu_strata = np.array([0, 0, 1, 1]) # 2 PSUs per stratum
+ fpc_per_psu = np.full(G, 20.0)
+ resolved = ResolvedSurveyDesign(
+ weights=np.ones(n_obs),
+ weight_type="pweight",
+ strata=np.repeat(psu_strata, obs_per_psu_period * T),
+ psu=psu_id,
+ fpc=np.full(n_obs, 20.0),
+ n_strata=2,
+ n_psu=4,
+ lonely_psu="remove",
+ )
+ # Orchestrator (panel-aware).
+ meat = _compute_stratified_conley_meat(
+ Psi,
+ conley_coords=coords,
+ conley_cutoff_km=0.30,
+ conley_metric="euclidean",
+ conley_kernel="bartlett",
+ resolved_survey=resolved,
+ conley_time=time_arr,
+ )
+ # Hand: aggregate Psi to PSU WITHIN each period, run the survey
+ # helper per period, sum.
+ expected = np.zeros((p_2, p_2))
+ for t in range(T):
+ period_mask = time_arr == t
+ Psi_t = Psi[period_mask]
+ psu_id_t = psu_id[period_mask]
+ S_psu_t = np.zeros((G, p_2))
+ for g in range(G):
+ S_psu_t[g] = Psi_t[psu_id_t == g].sum(axis=0)
+ meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ S_psu_t,
+ psu_strata,
+ psu_centroids,
+ cutoff=0.30,
+ metric="euclidean",
+ kernel="bartlett",
+ fpc_per_psu=fpc_per_psu,
+ )
+ expected += meat_t
+ np.testing.assert_allclose(meat, expected, rtol=1e-12, atol=1e-14)
+ # Sanity: a time-collapsed naive computation (the OLD pre-R2 design)
+ # would DIFFER from the panel-aware meat on the same inputs.
+ S_psu_collapsed = np.zeros((G, p_2))
+ for g in range(G):
+ S_psu_collapsed[g] = Psi[psu_id == g].sum(axis=0)
+ meat_collapsed, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ S_psu_collapsed,
+ psu_strata,
+ psu_centroids,
+ cutoff=0.30,
+ metric="euclidean",
+ kernel="bartlett",
+ fpc_per_psu=fpc_per_psu,
+ )
+ # Differs by the cross-period off-diagonal mass (the panel-aware
+ # contract drops these by construction).
+ assert not np.allclose(meat, meat_collapsed, rtol=1e-3, atol=1e-3)
+
+ def test_c_hand_computation_methodology_anchor(self):
+ """Hand-compute the stratified-Conley meat formula on synthetic
+ PSU-level inputs and assert parity with the new survey helper.
+
+ Mirrors `_scratch/wave_e2_smoke.py` Chunk 1 methodology anchor.
+ """
+ from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+ rng = np.random.default_rng(7)
+ G, k = 8, 3
+ psu_strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])
+ psu_coords = np.array(
+ [
+ [40.00, -120.0],
+ [40.10, -120.0],
+ [40.20, -120.0],
+ [40.30, -120.0],
+ [40.05, -120.0],
+ [40.15, -120.0],
+ [40.25, -120.0],
+ [40.35, -120.0],
+ ]
+ )
+ psu_scores = rng.standard_normal((G, k))
+ fpc = np.full(G, 20.0)
+ cutoff = 0.30
+
+ meat, var_ok, _ = _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores,
+ psu_strata,
+ psu_coords,
+ cutoff=cutoff,
+ metric="euclidean",
+ kernel="bartlett",
+ fpc_per_psu=fpc,
+ lonely_psu="remove",
+ )
+ assert var_ok
+
+ # Hand: per stratum, demean, apply Bartlett K on PSU coords,
+ # FPC-scale, sum across strata.
+ expected = np.zeros((k, k))
+ for h in [0, 1]:
+ mask = psu_strata == h
+ s_h = psu_scores[mask]
+ c_h = psu_coords[mask]
+ n_h = s_h.shape[0]
+ centered = s_h - s_h.mean(axis=0, keepdims=True)
+ d = np.sqrt(((c_h[:, None, :] - c_h[None, :, :]) ** 2).sum(axis=2))
+ K = np.maximum(0.0, 1.0 - d / cutoff)
+ M_h = centered.T @ K @ centered
+ f_h = n_h / fpc[mask][0]
+ M_h *= (1.0 - f_h) * n_h / (n_h - 1)
+ expected += M_h
+ np.testing.assert_allclose(meat, expected, rtol=1e-12, atol=1e-14)
+
+ def test_d_single_stratum_reduces_to_plain_conley_on_psu_totals(self):
+ """H = 1 stratum, FPC = inf: reduces to ordinary Conley sandwich
+ on PSU totals (modulo the n/(n-1) finite-sample scale).
+ """
+ from diff_diff.conley import _compute_conley_meat
+ from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+ rng = np.random.default_rng(11)
+ G = 8
+ psu_strata = np.zeros(G, dtype=int)
+ psu_coords = np.array(
+ [
+ [40.00, -120.0],
+ [40.10, -120.0],
+ [40.20, -120.0],
+ [40.30, -120.0],
+ [40.05, -120.0],
+ [40.15, -120.0],
+ [40.25, -120.0],
+ [40.35, -120.0],
+ ]
+ )
+ psu_scores = rng.standard_normal((G, 3))
+ cutoff = 0.30
+
+ meat, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores,
+ psu_strata,
+ psu_coords,
+ cutoff=cutoff,
+ metric="euclidean",
+ kernel="bartlett",
+ )
+ # Plain Conley sandwich on PSU totals (no FPC). n/(n-1) scale
+ # comes from the survey helper's adjustment; FPC term is 1.
+ centered = psu_scores - psu_scores.mean(axis=0, keepdims=True)
+ plain = _compute_conley_meat(centered, psu_coords, cutoff, "euclidean", "bartlett")
+ plain *= G / (G - 1)
+ np.testing.assert_allclose(meat, plain, rtol=1e-12, atol=1e-14)
+
+ def test_e_cross_stratum_independence_invariant(self):
+ """Cross-stratum kernel weights are exactly zero by sampling design.
+
+ Pure unit test on the new survey helper: full meat ≡ partition-then-sum
+ when each partition is fit as a separate single-stratum call. Uses
+ interleaved cross-stratum centroids so cross-stratum pairs are
+ CLOSER in km than within-stratum pairs — any kernel leak across
+ strata would produce a large numerical difference.
+ """
+ from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+ rng = np.random.default_rng(13)
+ G, k = 8, 3
+ psu_strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])
+ # Interleaved: stratum 0 at lats 40.00/40.10/40.20/40.30; stratum 1
+ # at 40.05/40.15/40.25/40.35. Cross-stratum nearest pair = 0.05 vs
+ # within-stratum nearest = 0.10 — kernel would weight them DOUBLE
+ # if it leaked.
+ psu_coords = np.array(
+ [
+ [40.00, -120.0],
+ [40.10, -120.0],
+ [40.20, -120.0],
+ [40.30, -120.0],
+ [40.05, -120.0],
+ [40.15, -120.0],
+ [40.25, -120.0],
+ [40.35, -120.0],
+ ]
+ )
+ psu_scores = rng.standard_normal((G, k))
+ fpc = np.full(G, 20.0)
+ cutoff = 0.30
+
+ meat_full, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores,
+ psu_strata,
+ psu_coords,
+ cutoff=cutoff,
+ metric="euclidean",
+ kernel="bartlett",
+ fpc_per_psu=fpc,
+ )
+ partitioned = np.zeros((k, k))
+ for h in [0, 1]:
+ mask = psu_strata == h
+ sub_strata = np.zeros(mask.sum(), dtype=int)
+ part_meat, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores[mask],
+ sub_strata,
+ psu_coords[mask],
+ cutoff=cutoff,
+ metric="euclidean",
+ kernel="bartlett",
+ fpc_per_psu=fpc[mask],
+ )
+ partitioned += part_meat
+ np.testing.assert_allclose(meat_full, partitioned, rtol=1e-12, atol=1e-14)
+
+ def test_f_lonely_psu_modes_accepted(self):
+ """All three lonely_psu modes flow through the conley+survey path."""
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=14)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ df_s.loc[df_s["stratum"] == 0, "psu"] = 0 # collapse stratum 0 to a singleton PSU
+ for mode in ("remove", "certainty", "adjust"):
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ design = SurveyDesign(
+ weights="w",
+ strata="stratum",
+ psu="psu",
+ fpc="N_h",
+ lonely_psu=mode,
+ )
+ res = self._fit(df_s, design=design)
+ if np.isfinite(res.se):
+ assert res.se >= 0
+ else:
+ assert np.isnan(res.t_stat) and np.isnan(res.p_value)
+
+ def test_f2_singleton_adjust_fpc_skip_parity_binder_vs_conley(self):
+ """Binder helper and Conley helper produce bit-identical output on
+ a singleton stratum with lonely_psu="adjust".
+
+ Load-bearing: pins the Chunk 2 `continue`-skip-FPC pattern. Without
+ the `continue`, the Conley helper would divide by `n_h - 1 = 0` on
+ the singleton stratum and the meat would NaN-propagate while
+ Binder's meat stays finite. With the kernel reducing to identity
+ on a singleton (K = [[K(0)]] = [[1.0]]) the two outputs MUST match.
+ """
+ from diff_diff.survey import (
+ _compute_stratified_conley_meat_from_psu_scores,
+ _compute_stratified_meat_from_psu_scores,
+ )
+
+ rng = np.random.default_rng(15)
+ # 5 PSUs: 1 in stratum 0 (singleton), 4 in stratum 1.
+ psu_scores = rng.standard_normal((5, 3))
+ psu_strata = np.array([0, 1, 1, 1, 1])
+ psu_coords = np.array(
+ [
+ [40.0, -120.0],
+ [40.1, -120.0],
+ [40.2, -120.0],
+ [40.3, -120.0],
+ [40.4, -120.0],
+ ]
+ )
+ fpc = np.full(5, 20.0)
+ binder_meat, _, _ = _compute_stratified_meat_from_psu_scores(
+ psu_scores,
+ psu_strata,
+ fpc_per_psu=fpc,
+ lonely_psu="adjust",
+ )
+ conley_meat, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ psu_scores,
+ psu_strata,
+ psu_coords,
+ cutoff=1e-10,
+ metric="euclidean",
+ kernel="bartlett",
+ fpc_per_psu=fpc,
+ lonely_psu="adjust",
+ )
+ # Conley with bandwidth -> 0 collapses K to identity in EVERY stratum,
+ # so the entire meat (singleton + multi-PSU stratum) reduces to Binder.
+ np.testing.assert_allclose(conley_meat, binder_meat, rtol=1e-12, atol=1e-14)
+ # And both are finite (the singleton FPC skip prevents divide-by-zero).
+ assert np.all(np.isfinite(conley_meat))
+
+ def test_g_fpc_large_matches_no_fpc(self):
+ """Very-large FPC (1-f_h ≈ 1) produces SE close to the no-FPC path."""
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=16)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=1e9)
+ design_fpc_large = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ design_no_fpc = SurveyDesign(weights="w", strata="stratum", psu="psu")
+ res_large = self._fit(df_s, design=design_fpc_large)
+ res_no = self._fit(df_s, design=design_no_fpc)
+ np.testing.assert_allclose(res_large.se, res_no.se, rtol=1e-6)
+
+ def test_h_fpc_equals_n_zeros_stratum(self):
+ """FPC = n_h per stratum makes (1-f_h) = 0; meat is zero, SE = 0."""
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=17)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=4.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ res = self._fit(df_s, design=design)
+ np.testing.assert_allclose(res.se, 0.0, atol=1e-14)
+
+ def test_i_saturated_design_nan_fails(self):
+ """All-singleton strata + lonely_psu='remove' -> df_survey = 0 ->
+ NaN meat + UserWarning matching 'Wave E.2 stratified-Conley'.
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=18)
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ units_sorted = sorted(df_s["unit"].unique())
+ unit_to_idx = {u: idx for idx, u in enumerate(units_sorted)}
+ df_s["psu"] = df_s["unit"].map(unit_to_idx)
+ df_s["stratum"] = df_s["unit"].map(unit_to_idx) # H = n_units; every stratum singleton
+ df_s["N_h"] = 20.0
+ design = SurveyDesign(
+ weights="w",
+ strata="stratum",
+ psu="psu",
+ fpc="N_h",
+ lonely_psu="remove",
+ )
+ with pytest.warns(UserWarning, match="Wave E.2 stratified-Conley"):
+ res = self._fit(df_s, design=design)
+ assert np.isnan(res.se)
+ assert np.isnan(res.t_stat)
+ assert np.isnan(res.p_value)
+
+ def test_j0_panel_conley_lag_cutoff_rejected_under_survey(self):
+ """vcov_type='conley' + conley_lag_cutoff > 0 + survey_design raises
+ NotImplementedError upfront. Wave E.2 ships cross-sectional only;
+ the panel-block decomposition (within-unit serial Bartlett HAC over
+ time) would need PSU-by-time scores rather than the collapsed PSU
+ totals. Tracked as a Wave E.2 follow-up in TODO.md.
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=180)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=2, # panel-block path
+ vcov_type="conley",
+ )
+ with pytest.raises(NotImplementedError, match="conley_lag_cutoff > 0"):
+ est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+
+ def test_j_replicate_weights_rejection_inherits_wave_e1(self):
+ """Replicate-weight variance still raises NotImplementedError under
+ conley+survey (inherits Wave E.1 gate). SurveyDesign requires
+ replicate_weights to be set WITHOUT strata/psu/fpc (they encode
+ the design implicitly).
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=19)
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ # Add 10 replicate-weight columns; must be constant within units
+ # (panel survey constraint).
+ rng = np.random.default_rng(19)
+ units = sorted(df_s["unit"].unique())
+ for r in range(10):
+ rep_by_unit = dict(zip(units, rng.uniform(0.5, 2.0, size=len(units))))
+ df_s[f"rep_{r}"] = df_s["unit"].map(rep_by_unit)
+ design = SurveyDesign(
+ weights="w",
+ replicate_weights=[f"rep_{r}" for r in range(10)],
+ replicate_method="JK1",
+ )
+ with pytest.raises(NotImplementedError, match="(?i)replicate|follow-up"):
+ self._fit(df_s, design=design)
+
+ def test_k_non_pweight_rejection_inherits_wave_e1(self):
+ """Non-pweight weight_type still raises ValueError under conley+survey
+ (inherits Wave E.1 gate).
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=20)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(
+ weights="w",
+ strata="stratum",
+ psu="psu",
+ fpc="N_h",
+ weight_type="aweight",
+ )
+ with pytest.raises((NotImplementedError, ValueError), match="(?i)pweight|aweight"):
+ self._fit(df_s, design=design)
+
+ def test_l_cluster_plus_conley_plus_survey_warn_and_use_psu(self):
+ """cluster= + conley + survey with different cluster vs PSU ->
+ UserWarning fires; PSU wins (mirrors Wave E.1 warn-and-use-PSU).
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=21)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ # Inject a coarser cluster column distinct from PSU (1 cluster
+ # per unit). The warn-and-use-PSU path requires that cluster and
+ # PSU are NOT identical groupings.
+ units_sorted = sorted(df_s["unit"].unique())
+ unit_to_cluster = {u: idx // 2 for idx, u in enumerate(units_sorted)}
+ df_s["my_cluster"] = df_s["unit"].map(unit_to_cluster)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with pytest.warns(UserWarning, match="(?i)cluster"):
+ res = self._fit(df_s, design=design, cluster="my_cluster")
+ assert np.isfinite(res.se) and res.se > 0
+
+ def test_m_fit_idempotency_under_conley_survey(self):
+ """clone() + repeat fit produces identical results; survey state
+ not mutated on fit() (per feedback_fit_does_not_mutate_config).
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=22)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ )
+ res_1 = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ # Second fit on the SAME estimator instance (idempotency).
+ res_2 = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ assert res_1.coefficients == res_2.coefficients
+ np.testing.assert_array_equal(res_1.vcov, res_2.vcov)
+ assert res_1.n_psu == res_2.n_psu
+ assert res_1.n_strata == res_2.n_strata
+
+ def test_n0_no_psu_weights_only_survey_design(self):
+ """`SurveyDesign(weights=...)` without explicit PSU — each obs is
+ its own pseudo-PSU. Panel-aware path must re-index PSUs WITHIN
+ each period (not pad zeros across the full panel) or the centering
+ leaks off-period spurious structure into the spatial meat.
+
+ Regression for the R3 P0 fix.
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=240)
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ design = SurveyDesign(weights="w")
+ res = self._fit(df_s, design=design)
+ assert np.isfinite(res.att)
+ assert np.isfinite(res.se) and res.se > 0
+
+ def test_n1_no_psu_strata_only_survey_design(self):
+ """`SurveyDesign(weights=..., strata=...)` without explicit PSU —
+ each obs is its own pseudo-PSU under stratified sampling. Same
+ per-period re-indexing requirement as test_n0.
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=241)
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ units_sorted = sorted(df_s["unit"].unique())
+ unit_to_stratum = {u: idx % 2 for idx, u in enumerate(units_sorted)}
+ df_s["stratum"] = df_s["unit"].map(unit_to_stratum)
+ design = SurveyDesign(weights="w", strata="stratum")
+ res = self._fit(df_s, design=design)
+ assert np.isfinite(res.att)
+ assert np.isfinite(res.se) and res.se > 0
+
+ def test_b2_explicit_psu_centroid_panel_constant_under_finite_mask(self):
+ """When a PSU contains multiple units at DIFFERENT coordinates
+ (simulating a finite_mask drop that varies coverage across
+ periods), the orchestrator must use PANEL-CONSTANT centroids
+ (mean across all obs in PSU, regardless of period) — NOT
+ per-period centroids. This matches the documented Wave E.2
+ contract "centroid_g = mean over i in PSU g of conley_coords[i]"
+ at REGISTRY.md and prevents support-sample-dependent kernel
+ weights.
+
+ Pure unit test on the orchestrator + helper composition with
+ synthetic per-obs inputs.
+ """
+ from diff_diff.survey import (
+ ResolvedSurveyDesign,
+ _compute_stratified_conley_meat_from_psu_scores,
+ )
+ from diff_diff.two_stage import _compute_stratified_conley_meat
+
+ rng = np.random.default_rng(331)
+ # 2 strata × 2 PSUs × 1 obs per PSU-period = 8 obs.
+ # PSU 0 obs coords differ across periods (simulating finite_mask
+ # variation): period 0 at [40.0, 0]; period 1 at [42.0, 0].
+ # PSU 1/2/3 have constant coords across periods.
+ n, p_2 = 8, 3
+ Psi = rng.standard_normal((n, p_2))
+ psu_id = np.array([0, 1, 2, 3, 0, 1, 2, 3]) # PSUs alternate per period
+ time_arr = np.array([0, 0, 0, 0, 1, 1, 1, 1])
+ # Coords vary across periods for PSU 0 only.
+ coords = np.array(
+ [
+ [40.0, 0.0], # PSU 0, period 0
+ [40.5, 0.0], # PSU 1, period 0
+ [50.0, 0.0], # PSU 2, period 0
+ [50.5, 0.0], # PSU 3, period 0
+ [42.0, 0.0], # PSU 0, period 1 — DIFFERENT coord
+ [40.5, 0.0], # PSU 1, period 1
+ [50.0, 0.0], # PSU 2, period 1
+ [50.5, 0.0], # PSU 3, period 1
+ ]
+ )
+ psu_strata_obs = np.array([0, 0, 1, 1, 0, 0, 1, 1])
+ resolved = ResolvedSurveyDesign(
+ weights=np.ones(n),
+ weight_type="pweight",
+ strata=psu_strata_obs,
+ psu=psu_id,
+ fpc=None,
+ n_strata=2,
+ n_psu=4,
+ lonely_psu="remove",
+ )
+ meat_panel = _compute_stratified_conley_meat(
+ Psi,
+ conley_coords=coords,
+ conley_cutoff_km=5.0,
+ conley_metric="euclidean",
+ conley_kernel="bartlett",
+ resolved_survey=resolved,
+ conley_time=time_arr,
+ )
+ # Hand calculation using PANEL-CONSTANT centroids (the contract).
+ # PSU 0 centroid = mean([40.0, 0], [42.0, 0]) = [41.0, 0].
+ # Other PSUs have constant coords → centroid equals that coord.
+ panel_centroids = np.array([[41.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]])
+ # Per-period PSU totals (each PSU appears once per period in this
+ # fixture, so the PSU total per period IS the single obs's Psi).
+ psu_strata = np.array([0, 0, 1, 1])
+ expected = np.zeros((p_2, p_2))
+ for t in [0, 1]:
+ mask = time_arr == t
+ Psi_t = Psi[mask]
+ psu_id_t = psu_id[mask]
+ S_psu_t = np.zeros((4, p_2))
+ for g in range(4):
+ rows = psu_id_t == g
+ if rows.any():
+ S_psu_t[g] = Psi_t[rows].sum(axis=0)
+ meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ S_psu_t,
+ psu_strata,
+ panel_centroids, # panel-constant — same across periods
+ cutoff=5.0,
+ metric="euclidean",
+ kernel="bartlett",
+ )
+ expected += meat_t
+ np.testing.assert_allclose(meat_panel, expected, rtol=1e-12, atol=1e-14)
+ # Counter-check: per-period centroids (the OLD pre-fix design)
+ # would give a different meat for PSU 0 because the centroid
+ # used in period 1 (42.0) differs from the one used in period 0
+ # (40.0). Verify the orchestrator does NOT match that buggy
+ # construction.
+ buggy_expected = np.zeros((p_2, p_2))
+ period_centroids = {
+ 0: np.array([[40.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]),
+ 1: np.array([[42.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]),
+ }
+ for t in [0, 1]:
+ mask = time_arr == t
+ Psi_t = Psi[mask]
+ psu_id_t = psu_id[mask]
+ S_psu_t = np.zeros((4, p_2))
+ for g in range(4):
+ rows = psu_id_t == g
+ if rows.any():
+ S_psu_t[g] = Psi_t[rows].sum(axis=0)
+ meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ S_psu_t,
+ psu_strata,
+ period_centroids[t], # per-period (buggy)
+ cutoff=5.0,
+ metric="euclidean",
+ kernel="bartlett",
+ )
+ buggy_expected += meat_t
+ # The buggy construction MUST differ measurably from the
+ # panel-constant orchestrator output.
+ assert not np.allclose(
+ meat_panel, buggy_expected, rtol=1e-3, atol=1e-3
+ ), "orchestrator unexpectedly matches per-period (buggy) centroid construction"
+
+ def test_n2_no_psu_per_period_reindex_unit_invariant(self):
+ """Direct unit test on the orchestrator: the no-PSU per-period
+ re-indexing must NOT mix off-period rows into the kernel. With
+ synthetic data where obs 0/1 are in period 0 (close in km) and
+ obs 2/3 are in period 1 (far away), the meat must reflect
+ ONLY within-period spatial pairs.
+ """
+ from diff_diff.survey import (
+ ResolvedSurveyDesign,
+ _compute_stratified_conley_meat_from_psu_scores,
+ )
+ from diff_diff.two_stage import _compute_stratified_conley_meat
+
+ rng = np.random.default_rng(243)
+ n, p_2 = 4, 2
+ Psi = rng.standard_normal((n, p_2))
+ # Period 0: obs 0, 1 at lat 40.00 / 40.01 (close in km).
+ # Period 1: obs 2, 3 at lat 50.00 / 50.01 (far from period-0 obs).
+ coords = np.array([[40.00, 0.0], [40.01, 0.0], [50.00, 0.0], [50.01, 0.0]])
+ time_arr = np.array([0, 0, 1, 1])
+ strata_arr = np.array([0, 0, 1, 1])
+ resolved = ResolvedSurveyDesign(
+ weights=np.ones(n),
+ weight_type="pweight",
+ strata=strata_arr,
+ psu=None, # implicit per-obs pseudo-PSU
+ fpc=None,
+ n_strata=2,
+ n_psu=n,
+ lonely_psu="remove",
+ )
+ meat_panel = _compute_stratified_conley_meat(
+ Psi,
+ conley_coords=coords,
+ conley_cutoff_km=0.05,
+ conley_metric="euclidean",
+ conley_kernel="bartlett",
+ resolved_survey=resolved,
+ conley_time=time_arr,
+ )
+ # Hand: per-period only on active rows.
+ meat_p0, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ Psi[:2],
+ np.array([0, 0]),
+ coords[:2],
+ cutoff=0.05,
+ metric="euclidean",
+ kernel="bartlett",
+ )
+ meat_p1, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+ Psi[2:],
+ np.array([0, 0]),
+ coords[2:],
+ cutoff=0.05,
+ metric="euclidean",
+ kernel="bartlett",
+ )
+ np.testing.assert_allclose(meat_panel, meat_p0 + meat_p1, rtol=1e-12, atol=1e-14)
+
+ def test_n_finite_mask_survey_array_subsetting(self):
+ """finite_mask drops baseline-treated rows; survey metadata
+ reflects the SUBSET sample, not the original.
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_staggered_dgp(seed=23)
+ # Pin a unit to always-treated (g = period 0); finite_mask will
+ # drop its rows from stage 2.
+ first_unit = sorted(df["unit"].unique())[0]
+ df.loc[df["unit"] == first_unit, "first_treat"] = 0
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res = self._fit(df_s, design=design)
+ # Survey metadata reflects subset (post-finite_mask), not the full panel.
+ assert res.survey_metadata is not None
+ assert res.n_obs <= len(df_s) # at least the always-treated unit's rows dropped
+
+
+class TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy:
+ """Event-study branch + conley + survey, both is_staggered branches."""
+
+ _CUTOFF_KM = 1000.0
+
+ def test_o_event_study_conley_survey_is_staggered_true(self):
+ """Full plumbing end-to-end on the staggered event-study path."""
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_staggered_dgp(seed=24)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ event_study=True,
+ horizon_max=2,
+ )
+ res = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ # Event-study + spillover finite end-to-end
+ assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0
+ # spillover_effects populated (non-empty)
+ assert res.spillover_effects is not None
+ # df_survey lookup uses the survey branch
+ assert res.survey_metadata is not None
+ assert res.survey_metadata.df_survey == 6
+
+ def test_p_event_study_conley_survey_is_staggered_false(self):
+ """The non-staggered branch of the event-study path also works
+ (mirrors `feedback_cohort_loop_trigger_cache_both_branches`).
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=25)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ event_study=True,
+ horizon_max=1,
+ )
+ res = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0
+ assert res.survey_metadata is not None
+
+ def test_r_drift_goldens(self):
+ """Pinned ATT + SE on a fixed-seed conley+survey fit.
+
+ Drift goldens captured on initial Wave E.2 implementation
+ (seed=999, standard 2-strata x 4-PSU augmentation, cutoff=1000km).
+ `assert_allclose` tolerance acknowledges PSU-aggregation BLAS
+ reduction order variation across CI runners.
+ """
+ from diff_diff import SurveyDesign
+
+ df = generate_butts_nonstaggered_dgp(seed=999)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ )
+ res = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ # Goldens — pinned on initial Wave E.2 implementation (seed=999,
+ # 2-strata x 4-PSU augmentation, cutoff=1000km). ATT is invariant
+ # to vcov_type, so it matches the Wave E.1 binder golden exactly.
+ # SE is Wave E.2-specific (stratified-Conley sandwich on PSU totals).
+ _WAVE_E2_GOLDEN_ATT = -0.07749624543132044
+ _WAVE_E2_GOLDEN_SE = 0.0006771937420330884
+ np.testing.assert_allclose(res.att, _WAVE_E2_GOLDEN_ATT, rtol=1e-12, atol=1e-14)
+ np.testing.assert_allclose(res.se, _WAVE_E2_GOLDEN_SE, rtol=1e-12, atol=1e-14)
+ # Lock down DOF + n_psu (deterministic).
+ assert res.n_psu == 8
+ assert res.n_strata == 2
+ assert res.survey_metadata.df_survey == 6