diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3a851f73..66f4405d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,9 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`SpilloverDiD(vcov_type="conley", survey_design=...)` integration via stratified-Conley sandwich on PSU totals (Wave E.2).** Lifts the Wave E.1 `NotImplementedError` (`spillover.py:2201` upfront, `two_stage.py:217` helper-level) and adds spatial-HAC + design-based variance for the previously deferred composition. **Documented synthesis** of Conley (1999) spatial-HAC × Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) × Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three ingredients on a two-stage influence function. **Mechanical composition (panel-aware):** preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only — exclude cross-period spatial pairs") by looping over periods. For each period `t`, SpilloverDiD's per-obs Hájek-weighted Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` (via `np.add.at`); per-PSU spatial centroids are panel-constant (mean of per-observation `conley_coords` within each PSU, vectorized `np.add.at` sums / `np.bincount` counts); for each stratum the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / conley_cutoff_km) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it; the survey helper accepts `"uniform"` too but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. 
Cross-period spatial pairs are excluded by construction: the per-period loop matches the library's panel Conley contract exactly. **Reduction semantics (load-bearing for tests):** the orchestrator's panel-aware meat equals the `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals (pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`); with a single stratum (H = 1, FPC = inf) it reduces to the `sum_t` of plain Conley sandwiches on per-period PSU totals (NOT on time-collapsed totals). **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (follows the existing `_compute_stratified_meat_from_psu_scores` 3-tuple `(meat, variance_computed, legitimate_zero_count)` contract; its per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(scores=centered, coords=psu_coords_h, ...)` in cross-sectional mode); new dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to the existing `_compute_binder_tsl_meat`; it performs per-obs Psi → PSU aggregation, centroid derivation, and dispatch to the survey helper, and intentionally drops `cluster_ids` at the dispatch boundary, see Public surface restrictions). The `_compute_gmm_corrected_meat` conley branch gains an `if resolved_survey is not None` route to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D. **Singleton-stratum `lonely_psu="adjust"` parity:** the survey helper mirrors the Binder helper's `continue` to skip the FPC scale on singleton strata (with `n_h = 1` the scale `n_h / (n_h - 1)` would divide by zero); the degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust output. 
**Saturated `df_survey = 0` NaN-fail:** mirrors Wave E.1 (`_compute_stratified_conley_meat` returns NaN meat with a `UserWarning` using the template "Wave E.2 stratified-Conley sandwich: df_survey = 0..." so callers can `pytest.warns(UserWarning, match="Wave E.2 stratified-Conley")`). **Public surface restrictions:** replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits the Wave E.1 gate; per-replicate full refit is separate follow-up scope); `cluster=<col> + survey_design.psu + vcov_type="conley"` warns and uses the PSU in place of `cluster=<col>`, following Wave E.1's warn-and-use-PSU pattern. After PSU aggregation every PSU is its own cluster, so threading `cluster_ids` into the inner Conley kernel call would zero all cross-PSU pairs; `cluster_ids` is therefore intentionally not passed through. The LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E); DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` still requires `_APPLICABILITY` / `_PT_METHOD` registration (separate Wave F PR). 
**Tests:** new `TestSpilloverDiDWaveE2ConleySurveyDesign` and `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` classes in `tests/test_spillover.py` (bit-identical no-survey fallback; panel-aware per-period sum invariant on the orchestrator + helper composition; hand-computation methodology anchor; single-stratum ≡ plain Conley on PSU totals; cross-stratum independence as a unit test on the survey helper with interleaved cross-stratum centroids; Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU sensitivity across three modes; FPC large ≡ no-FPC, and FPC = n_h zeroes the stratum; saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`; replicate-weight + non-pweight rejections; cluster warn-and-use-PSU; fit idempotency; `finite_mask` survey-array subsetting; no-PSU coverage: weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts; event-study path on both `is_staggered=True`/`False` branches per `feedback_cohort_loop_trigger_cache_both_branches`; drift goldens at `rtol=1e-12 / atol=1e-14`). The pre-existing `tests/test_spillover.py::test_fit_conley_plus_survey_design_not_implemented` Wave E.1-era gate-assertion test is removed (replaced by the positive-path tests above). The Wave E.1 entry's "Public surface restrictions" bullet now describes the conley+survey gate in the past tense.
 - **SunAbraham `vcov_type` parameter (Phase 1b PR 1/8).** `SunAbraham(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` (defaults to `"hc1"`, which preserves prior behavior bit-equally - SA historically hard-coded HC1). Auto-cluster-at-unit dropped when the user opts into explicit `vcov_type="hc2"` or `vcov_type="classical"` (one-way only); preserved for `"hc1"` and `"hc2_bm"`. When `vcov_type in {"classical","hc2","hc2_bm"}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (mirrors TWFE Gate 1 from PR #469): FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection. Empirically matches R `lm()` summary classical SE, `sandwich::vcovHC(type="HC2")`, and `clubSandwich::vcovCR(..., type="CR2")` + `coef_test()$df_Satt` at atol=1e-10 (cohort SE and BM DOF pinned in `tests/test_methodology_sun_abraham.py`). For `vcov_type="hc2_bm"`, the user-facing aggregated inference (`event_study_effects[e]['p_value']`/`['conf_int']`, `overall_p_value`/`overall_conf_int`) uses CR2 Bell-McCaffrey contrast DOF — matches `clubSandwich::Wald_test(test="HTZ")$df_denom` at atol=1e-10 (mirrors PR #465's `_compute_cr2_bm_contrast_dof` pattern for MultiPeriodDiD's post-period-average ATT). `vcov_type` is now propagated to `SunAbrahamResults.vcov_type` for downstream introspection. `SurveyDesign` (any kind — analytical weights, stratified, PSU, or replicate-weight) combined with `vcov_type in {"classical","hc2","hc2_bm"}` raises `NotImplementedError`: the survey-design TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs. `conley` rejected at `__init__` with a deferral message (would require threading 6+ `conley_*` params through the saturated regression call). 
**Deviation from R:** SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) on typical panel sizes due to a different `(n-k)` finite-sample correction (fixest counts absorbed FE in k_total; SA's `solve_ols` counts only within-transformed columns); the IW aggregation step is otherwise identical (pinned at atol=5e-3, tracked in TODO.md). First PR of the Phase 1b standalone-estimator threading initiative (7 PRs to follow: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
 - **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
-- **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=<col>` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). 
**Cluster + survey resolution:** when `cluster=<col>` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=<col>` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). 
**Public surface restrictions:** `vcov_type="conley" + survey_design=` raises `NotImplementedError` pointing at planned Wave E.2 (Conley × survey product-kernel synthesis with within-stratum Conley sandwich on PSU totals); replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. 
**Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection.
+- **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=<col>` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). 
**Cluster + survey resolution:** when `cluster=<col>` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=<col>` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). **Public surface restrictions:** `vcov_type="conley" + survey_design=` originally raised `NotImplementedError` pointing at planned Wave E.2; lifted in the Wave E.2 entry above (stratified-Conley sandwich on PSU totals). 
Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. 
**Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection.
 
 ## [3.4.0] - 2026-05-19
 
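The Wave E.2 CHANGELOG entry above describes the panel-aware meat `sum_t sum_h M_h_t` only in prose. The NumPy sketch below restates that formula for illustration. `stratified_conley_meat`, `bartlett`, and `euclidean` are hypothetical names; they are not the library's `_compute_stratified_conley_meat` or `_compute_conley_meat`. The sketch assumes:

- integer PSU codes `0..G-1` that all occur in the data;
- PSUs nested in strata;
- Euclidean distances in place of haversine;
- the PSUs observed in period `t` serve as that period's `n_h`.

Singleton strata are simply skipped, so the sketch does not reproduce any of the `lonely_psu` modes.

```python
import numpy as np


def bartlett(u):
    """Bartlett kernel K(u) = max(0, 1 - |u|), so K(0) = 1."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def euclidean(a, b):
    """Pairwise Euclidean distances between the rows of a and b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def stratified_conley_meat(psi, psu, stratum, period, coords, cutoff,
                           fpc_frac=None, dist=euclidean):
    """Hypothetical panel-aware stratified-Conley meat, sum_t sum_h M_h_t.

    psi      (n, p) per-observation influence-function rows
    psu      (n,)   integer PSU codes 0..G-1, each PSU inside one stratum
    stratum  (n,)   integer stratum codes
    period   (n,)   period labels
    coords   (n, d) per-observation coordinates
    fpc_frac optional {stratum: f_h} sampling fractions (default f_h = 0)
    """
    n_psu = psu.max() + 1
    p = psi.shape[1]
    # Panel-constant PSU centroids: coordinate sums / observation counts.
    coord_sums = np.zeros((n_psu, coords.shape[1]))
    np.add.at(coord_sums, psu, coords)
    centroids = coord_sums / np.bincount(psu, minlength=n_psu)[:, None]
    psu_stratum = np.zeros(n_psu, dtype=int)
    psu_stratum[psu] = stratum

    meat = np.zeros((p, p))
    for t in np.unique(period):  # lag 0: only within-period pairs
        in_t = period == t
        totals = np.zeros((n_psu, p))  # S_psu_t[g]
        np.add.at(totals, psu[in_t], psi[in_t])
        present = np.zeros(n_psu, dtype=bool)
        present[psu[in_t]] = True
        for h in np.unique(stratum[in_t]):  # no kernel weight across strata
            idx = np.flatnonzero(present & (psu_stratum == h))
            n_h = idx.size
            if n_h < 2:
                continue  # singleton stratum: skipped in this sketch
            centered = totals[idx] - totals[idx].mean(axis=0)
            K = bartlett(dist(centroids[idx], centroids[idx]) / cutoff)
            f_h = 0.0 if fpc_frac is None else fpc_frac.get(h, 0.0)
            meat += (1.0 - f_h) * n_h / (n_h - 1) * (centered.T @ K @ centered)
    return meat
```

The kernel matrix is built one stratum at a time, so cross-stratum pairs never enter the sum. The outer period loop likewise keeps cross-period pairs out. Provided each PSU's centroid is the same in every period, calling the function on each period separately and adding the results gives the same matrix as a single call on the full panel.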
diff --git a/README.md b/README.md
index 761d7046..a9b7e238 100644
--- a/README.md
+++ b/README.md
@@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
 - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
 - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1/CR1
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
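The Wave E.1 CHANGELOG entry describes stage-1 WLS fixed effects computed with a weighted bincount: numerator `bincount(w*resid)`, denominator `bincount(w)`. Below is a minimal sketch of that update for unit and time effects. It assumes plain alternating weighted group means, repeated until the estimates stop moving. `weighted_two_way_fe` is a hypothetical helper; it does not reproduce `_iterative_fe_subset`, its subsetting, or its convergence rules.

```python
import numpy as np


def weighted_two_way_fe(y, unit, time, w, max_iter=1000, tol=1e-12):
    """Hypothetical WLS unit + time fixed effects via weighted bincount.

    Each sweep sets one set of effects to the weighted group mean of the
    partial residual, bincount(w * resid) / bincount(w), holding the other
    set fixed. Every unit and period needs positive total weight.
    """
    n_u, n_t = unit.max() + 1, time.max() + 1
    w_u = np.bincount(unit, weights=w, minlength=n_u)
    w_t = np.bincount(time, weights=w, minlength=n_t)
    if np.any(w_u <= 0) or np.any(w_t <= 0):
        raise ValueError("every unit and period needs positive total weight")
    alpha, lam = np.zeros(n_u), np.zeros(n_t)
    for _ in range(max_iter):
        alpha_new = np.bincount(unit, weights=w * (y - lam[time]), minlength=n_u) / w_u
        lam_new = np.bincount(time, weights=w * (y - alpha_new[unit]), minlength=n_t) / w_t
        step = max(np.abs(alpha_new - alpha).max(), np.abs(lam_new - lam).max())
        alpha, lam = alpha_new, lam_new
        if step < tol:
            break
    return alpha, lam
```

Only the fitted values `alpha[unit] + lam[time]` are identified, because any constant can move between the two sets of effects. A check against a dense weighted least squares fit should therefore compare fitted values, not the effects themselves.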
diff --git a/TODO.md b/TODO.md
index 9098ef1e..361ca9a2 100644
--- a/TODO.md
+++ b/TODO.md
@@ -134,9 +134,9 @@ Deferred items from PR reviews that were not addressed before merge.
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
 | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
-| `SpilloverDiD(vcov_type="conley", survey_design=...)` composition (Wave E.2). Wave E.1 ships HC1 / CR1 + survey via Binder TSL (Gerber 2026 Prop 1 + Wave D GMM). Wave E.2 needs the novel within-stratum Conley sandwich on PSU totals — no reference software combines Conley spatial-HAC with Binder TSL on a two-stage IF. Methodologically: aggregate Psi to PSU totals first, demean within stratum, then apply within-stratum Conley sandwich. Strata become an exact independence partition (no kernel weight crosses stratum boundaries by sampling design). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_corrected_meat` | follow-up (Wave E.2) | Medium |
 | `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low |
 | `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium |
+| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` serial Bartlett HAC composition. Wave E.2 ships the panel-aware `conley_lag_cutoff = 0` case ("within-period spatial only": `sum_t sum_h M_h_t` per `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`) and raises `NotImplementedError` upfront in `SpilloverDiD.fit` when `conley_lag_cutoff > 0`. The serial Bartlett component (within-unit / within-PSU temporal HAC at lag ≤ L) still has to be composed with the panel-aware stratified-Conley spatial sandwich. The natural addition is `meat_serial = sum_g sum_{|t-s|<=L, t!=s} (1 - |t-s|/(L+1)) * (S_psu_t[g] - S_bar_h_t)(S_psu_s[g] - S_bar_h_s)'`, with `g` running over the PSUs of each stratum `h` and the Binder FPC scaling still to be specified; a methodology call is also needed on whether the serial term should include cross-period spatial pairs. Regression goldens should confirm that the new path reproduces the `conley_lag_cutoff = 0` limit (the currently shipped path). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_conley_meat` | follow-up (Wave E.2 follow-up) | Medium |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
 | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
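The serial Bartlett term proposed in the TODO row above can be written directly from its formula. The sketch assumes the PSU totals have already been centered within stratum and period and stacked into a `(T, G, p)` array. `serial_bartlett_meat` is a hypothetical helper. It deliberately omits two choices the row leaves open: it applies no Binder FPC or `n_h / (n_h - 1)` scaling, and it includes no cross-period spatial pairs.

```python
import numpy as np


def serial_bartlett_meat(centered, L):
    """Hypothetical serial term: sum_g sum_{0 < |t-s| <= L} w(|t-s|) c_gt c_gs'.

    centered  (T, G, p) PSU totals S_psu_t[g] - S_bar_h_t, centered within
              their stratum in each period
    L         maximum lag; Bartlett weight w(lag) = 1 - lag / (L + 1)

    Only same-PSU pairs across periods enter; no FPC scaling is applied.
    """
    T, _, p = centered.shape
    out = np.zeros((p, p))
    for lag in range(1, L + 1):
        w = 1.0 - lag / (L + 1)
        for t in range(T - lag):
            cross = centered[t].T @ centered[t + lag]  # sum_g c_{g,t} c_{g,t+lag}'
            out += w * (cross + cross.T)  # count both (t, s) and (s, t)
    return out
```

At `L = 0` the sum is empty and the function returns a zero matrix, so adding this term to the shipped lag-0 meat leaves that meat unchanged.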
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index c0dc4936..6ae688fe 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
 - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL composed with Wave D Gardner GMM correction (Conley × survey + replicate weights queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
index fe0466a3..a1f0db11 100644
--- a/diff_diff/spillover.py
+++ b/diff_diff/spillover.py
@@ -2190,22 +2190,43 @@ def fit(
         heteroskedasticity-robust SE with the GMM correction.
         """
         # Wave E.1: lift the Wave B/C/D upfront survey_design rejection.
-        # The full resolution block (pweight gate, replicate gate, unit-constant
+        # Wave E.2 (this PR): conley × survey is now supported via a
+        # stratified-Conley sandwich on PSU totals (composition of Conley
+        # 1999 + Gerber 2026 Prop 1 Binder TSL + Wave D Gardner GMM). The
+        # full resolution block (pweight gate, replicate gate, unit-constant
         # check, cluster-vs-PSU warn) runs AFTER `_validate_spillover_inputs`
         # below so it sees the panel columns the validator guarantees.
         #
-        # The conley × survey composition is genuinely novel methodology
-        # (no reference software combines spatial-HAC + Binder TSL on a
-        # two-stage IF) and ships separately as Wave E.2. Reject upfront so
-        # users get the pointer without waiting through stage 1 / 2 work.
-        if survey_design is not None and self.vcov_type == "conley":
+        # Wave E.2 scope-limit (upfront, before resolution / panel work):
+        # the panel-block Conley HAC (`conley_lag_cutoff > 0`) is NOT
+        # composed with the survey path in this PR. The stratified-Conley
+        # helper applies a within-period kernel on per-period PSU totals;
+        # composing the within-unit serial Bartlett HAC with the within-
+        # stratum cross-PSU spatial kernel requires cross-period PSU-by-time
+        # score pairs in the meat construction, which is a separate Wave E.x
+        # follow-up tracked in TODO.md. Reject upfront with a clear pointer
+        # so users running `survey_design=` + `conley_lag_cutoff > 0` get
+        # the error before stage-1 / 2 work (per `feedback_no_silent_failures`).
+        if (
+            survey_design is not None
+            and self.vcov_type == "conley"
+            and self.conley_lag_cutoff is not None
+            and self.conley_lag_cutoff > 0
+        ):
             raise NotImplementedError(
-                "SpilloverDiD does not yet support vcov_type='conley' "
-                "combined with survey_design=. Wave E.2 (planned) will "
-                "compose Conley spatial-HAC with within-stratum Conley "
-                "sandwich on PSU totals; see TODO.md for the planned PR. "
-                "For Wave E.1, use vcov_type='hc1' (with optional "
-                "cluster=<col> for CR1) plus survey_design=."
+                "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) "
+                "combined with survey_design= is not supported in Wave E.2. "
+                "The Wave E.2 stratified-Conley sandwich aggregates Psi to "
+                "per-period PSU totals and applies the Conley kernel within "
+                "each period only; the panel-block decomposition (within-unit "
+                "serial Bartlett HAC over time) would require cross-period "
+                "PSU pairs and composing the serial kernel with the within-"
+                "stratum cross-PSU spatial kernel. This composition is "
+                "queued as a follow-up (see TODO.md). For Wave E.2, use "
+                "conley_lag_cutoff=0 (cross-sectional Conley) with "
+                "survey_design=, or use survey_design= with "
+                "vcov_type='hc1' (+ cluster=<col> for CR1) for the full "
+                "Wave E.1 path."
             )
         # Validate `anticipation` up front: must be a non-negative integer.
         # Accepting fractional or negative values would silently shift
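Read together, the upfront gate in `SpilloverDiD.fit` and the dispatch in `_compute_gmm_corrected_meat` (further down in this diff) accept some `vcov_type` / survey / lag-cutoff combinations and reject others. A minimal sketch of that routing as a pure function, assuming only what the diff shows; `route_meat` and its string labels are hypothetical names for illustration, not `diff_diff` APIs.

```python
from typing import Optional


def route_meat(vcov_type: str, has_survey: bool, lag_cutoff: Optional[int]) -> str:
    """Hypothetical restatement of the Wave E.2 meat routing (labels are illustrative)."""
    if vcov_type == "conley":
        if not has_survey:
            return "conley"  # Wave D no-survey path, unchanged
        if lag_cutoff is not None and lag_cutoff > 0:
            # Mirrors the upfront gate in SpilloverDiD.fit.
            raise NotImplementedError("conley_lag_cutoff > 0 with survey_design= is deferred")
        return "stratified_conley"  # per-period stratified-Conley on PSU totals
    if vcov_type in ("hc1", "cluster"):
        # Wave E.1 routes both to Binder TSL when a survey design is present.
        return "binder_tsl" if has_survey else vcov_type
    raise ValueError(f"unsupported vcov_type: {vcov_type!r}")
```

Writing the rule as one predicate makes it easy to see that `conley_lag_cutoff=None` passes the upfront gate and reaches the stratified-Conley branch.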
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index f2461501..89a75a60 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -1898,6 +1898,160 @@ def _compute_stratified_meat_from_psu_scores(
     return meat, _variance_computed, legitimate_zero_count
 
 
+def _compute_stratified_conley_meat_from_psu_scores(
+    psu_scores: np.ndarray,
+    psu_strata: np.ndarray,
+    psu_coords: np.ndarray,
+    *,
+    cutoff: float,
+    metric,
+    kernel: str,
+    fpc_per_psu: "Optional[np.ndarray]" = None,
+    lonely_psu: str = "remove",
+) -> Tuple[np.ndarray, bool, int]:
+    """Wave E.2 stratified-Conley meat on PSU-aggregated scores.
+
+    Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124)
+    Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D
+    Gardner GMM first-stage uncertainty correction (Butts 2021 Sec. 3.1 +
+    Gardner 2022 Sec. 4). Used by SpilloverDiD's Wave E.2 GMM sandwich when
+    ``vcov_type="conley"`` is combined with ``survey_design=``.
+
+    Per-stratum loop: demean PSU scores within the stratum, apply the
+    cross-sectional Conley kernel between PSU centroids in that stratum,
+    scale by the Binder finite-population correction
+    ``(1 - f_h) * n_h/(n_h-1)``. Cross-stratum kernel weights are zero by
+    sampling design (strata are exact independence partitions); total meat
+    is the sum across strata.
+
+    Parameters
+    ----------
+    psu_scores : np.ndarray
+        Score matrix of shape (G, k) — one row per PSU.
+    psu_strata : np.ndarray
+        Stratum assignment per PSU, shape (G,).
+    psu_coords : np.ndarray
+        Per-PSU spatial centroid coordinates, shape (G, 2). Typically the
+        mean of per-observation ``conley_coords`` within each PSU.
+    cutoff : float
+        Conley spatial-HAC bandwidth in the same units as ``psu_coords``
+        (km when ``metric="haversine"``).
+    metric : str or callable
+        Distance metric; ``"haversine"`` / ``"euclidean"`` / callable per
+        :mod:`diff_diff.conley` (``ConleyMetric``).
+    kernel : str
+        Spatial kernel: ``"bartlett"`` or ``"uniform"``.
+    fpc_per_psu : np.ndarray, optional
+        FPC population size per PSU, shape (G,). All PSUs in the same
+        stratum should share the same FPC value (first occurrence used).
+    lonely_psu : str
+        How to handle singleton strata: ``"remove"``, ``"certainty"``, or
+        ``"adjust"``. Matches the existing
+        :func:`_compute_stratified_meat_from_psu_scores` behaviour exactly,
+        including the ``"adjust"`` branch's ``continue`` that skips FPC
+        scaling (with ``n_h=1`` the scale ``n_h/(n_h-1)`` would divide by
+        zero).
+
+    Returns
+    -------
+    meat : np.ndarray
+        Meat matrix of shape (k, k).
+    variance_computed : bool
+        Whether any actual variance computation happened.
+    legitimate_zero_count : int
+        Number of strata that legitimately contribute zero variance.
+
+    Notes
+    -----
+    Reduction semantics (load-bearing for tests):
+
+    - bandwidth -> 0 with distinct PSU centroids (Bartlett: ``K(d/tiny) = 0``
+      for ``d > 0`` and ``K(0) = 1`` on the diagonal, so K = I): the
+      within-stratum sandwich ``sum_{j,k} K_jk c_j c_k' = sum_j c_j c_j'
+      = centered.T @ centered``, which is precisely Binder's formula at
+      :func:`_compute_stratified_meat_from_psu_scores`.
+    - Single stratum (H = 1, FPC = inf): ``G/(G-1)`` times the ordinary Conley
+      sandwich of :func:`diff_diff.conley._compute_conley_meat` on centered PSU totals.
+
+    No reference software combines all three ingredients (Conley
+    spatial-HAC + Binder TSL + Gardner GMM correction) on a two-stage
+    influence function.
+    """
+    from diff_diff.conley import _compute_conley_meat
+
+    if psu_scores.ndim == 1:
+        psu_scores = psu_scores[:, np.newaxis]
+    k = psu_scores.shape[1]
+    meat = np.zeros((k, k))
+
+    unique_strata = np.unique(psu_strata)
+    _variance_computed = False
+    legitimate_zero_count = 0
+
+    _global_psu_mean = None
+    if lonely_psu == "adjust":
+        _global_psu_mean = psu_scores.mean(axis=0, keepdims=True)
+
+    for h in unique_strata:
+        mask_h = psu_strata == h
+        scores_h = psu_scores[mask_h]
+        coords_h = psu_coords[mask_h]
+        n_psu_h = scores_h.shape[0]
+
+        if n_psu_h < 2:
+            if lonely_psu == "remove":
+                continue
+            elif lonely_psu == "certainty":
+                legitimate_zero_count += 1
+                continue
+            elif lonely_psu == "adjust":
+                # Degenerate one-PSU kernel K = [[K(0)]] = [[1.0]] for both
+                # Bartlett and uniform; equivalent to centered.T @ centered.
+                # MUST `continue` to skip the FPC block below — with n_h = 1
+                # the scale n_h/(n_h-1) divides by zero. Mirrors the Binder
+                # helper's singleton-adjust branch exactly.
+                centered = scores_h - _global_psu_mean
+                with np.errstate(invalid="ignore", over="ignore"):
+                    meat += centered.T @ centered
+                _variance_computed = True
+                continue
+
+        f_h = 0.0
+        if fpc_per_psu is not None:
+            N_h = fpc_per_psu[mask_h][0]
+            if N_h < n_psu_h:
+                raise ValueError(
+                    f"FPC ({N_h}) is less than the number of PSUs "
+                    f"({n_psu_h}) in stratum. FPC must be >= n_PSU."
+                )
+            f_h = n_psu_h / N_h
+            if f_h >= 1.0:
+                legitimate_zero_count += 1
+
+        psu_mean_h = scores_h.mean(axis=0, keepdims=True)
+        centered = scores_h - psu_mean_h
+
+        # Within-stratum Conley sandwich on PSU-centered scores. Pass
+        # ``cluster_ids=None`` explicitly: after PSU aggregation every PSU
+        # is its own cluster, so a cluster product kernel would zero all
+        # cross-PSU pairs. See Wave E.2 plan Chunk 3 step 4.
+        conley_meat_h = _compute_conley_meat(
+            centered,
+            coords_h,
+            cutoff,
+            metric,
+            kernel,
+            cluster_ids=None,
+        )
+
+        adjustment = (1.0 - f_h) * (n_psu_h / (n_psu_h - 1))
+        with np.errstate(invalid="ignore", over="ignore"):
+            meat += adjustment * conley_meat_h
+        _variance_computed = True
+
+    return meat, _variance_computed, legitimate_zero_count
+
+
 def compute_survey_vcov(
     X: np.ndarray,
     residuals: np.ndarray,
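A minimal NumPy sketch of the single-period meat computed by `_compute_stratified_conley_meat_from_psu_scores`, useful for checking the reduction claims in its docstring. It assumes Euclidean distance, the textbook Bartlett form `K(u) = max(0, 1 - u)`, and `lonely_psu="remove"` only; `bartlett_weights` and `stratified_conley_meat` are hypothetical helpers, not the library's code.

```python
import numpy as np


def bartlett_weights(coords, cutoff):
    """Pairwise Bartlett weights max(0, 1 - d / cutoff) with Euclidean distance d."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.clip(1.0 - dist / cutoff, 0.0, None)


def stratified_conley_meat(scores, strata, coords, cutoff, fpc=None):
    """Single-period stratified-Conley meat on PSU scores (one row per PSU)."""
    k = scores.shape[1]
    meat = np.zeros((k, k))
    for h in np.unique(strata):
        mask = strata == h
        n_h = int(mask.sum())
        if n_h < 2:
            continue  # singleton stratum dropped, as with lonely_psu="remove"
        f_h = 0.0 if fpc is None else n_h / fpc[mask][0]  # first FPC value in the stratum
        centered = scores[mask] - scores[mask].mean(axis=0)
        K = bartlett_weights(coords[mask], cutoff)
        # sum_{j,k} K_jk c_j c_k' is the quadratic form C' K C
        meat += (1.0 - f_h) * n_h / (n_h - 1) * (centered.T @ K @ centered)
    return meat
```

With a vanishing bandwidth the kernel matrix is the identity and the result matches the Binder sum `n_h/(n_h-1) * centered.T @ centered`; an FPC equal to the number of PSUs (`f_h = 1`) zeroes each stratum. In this sketch a very large bandwidth pushes every Bartlett weight toward 1, and because scores are centered within each stratum the meat then collapses toward zero.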
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index 370560f4..51c15bed 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -125,9 +125,20 @@ def _compute_gmm_corrected_meat(
       TSL has its own ``(1-f_h) * n_h/(n_h-1)`` correction).
     - ``vcov_type="cluster"``: ``cluster_ids`` IS the PSU (via upstream
       ``_inject_cluster_as_psu``); identical to the HC1+survey branch.
-    - ``vcov_type="conley"``: raises ``NotImplementedError``. Wave E.2
-      (planned) will compose Conley spatial-HAC with within-stratum
-      Conley sandwich on PSU totals.
+    - ``vcov_type="conley"`` with ``conley_lag_cutoff = 0`` (within-period
+      spatial only): Wave E.2 stratified-Conley sandwich via
+      :func:`_compute_stratified_conley_meat`. Aggregates Psi to PSU totals
+      within each period, derives panel-constant per-PSU centroids as the
+      mean of per-obs ``conley_coords``, applies the Conley kernel between
+      PSU centroids within each stratum scaled by ``(1 - f_h) * n_h/(n_h-1)``,
+      and sums over periods. Cross-stratum kernel weights are zero by design.
+    - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block
+      Conley): raises ``NotImplementedError`` upstream at
+      ``SpilloverDiD.fit``. The panel-block decomposition would need to
+      compose the within-unit serial Bartlett HAC with the within-stratum
+      cross-PSU spatial kernel, which needs the cross-period PSU-by-time
+      pairs that the per-period sandwich excludes; out of Wave E.2 scope
+      and tracked as a follow-up in ``TODO.md``.
 
     **`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance`
     pattern at `two_stage.py:1886-1917`): factorize ``X_10' W X_10`` via
@@ -212,16 +223,10 @@ def _compute_gmm_corrected_meat(
             cluster_ids=cluster_ids,
         )
 
-    # Wave E.1: reject the conley × survey composition. Wave E.2 (planned)
-    # will add the within-stratum Conley sandwich on PSU totals.
-    if vcov_type == "conley" and resolved_survey is not None:
-        raise NotImplementedError(
-            "SpilloverDiD does not yet support vcov_type='conley' combined "
-            "with survey_design=. Wave E.2 (planned) will compose Conley "
-            "spatial-HAC with within-stratum Conley sandwich on PSU totals; "
-            "see TODO.md for the planned PR. For now, use vcov_type='hc1' "
-            "(+ cluster=<col> for CR1) with survey_design=."
-        )
+    # Wave E.2 (this PR): conley × survey is now supported via the
+    # stratified-Conley sandwich on PSU totals. Dispatch happens inside
+    # the vcov_type == "conley" branch below (Wave E.1 already routed
+    # hc1 / cluster + survey to the Binder TSL helper).
 
     # 1. gamma_hat = (X_10' W X_10)^{-1} (X_1' W X_2). Mirror the existing
     #    TwoStageDiD method at two_stage.py:1886-1917 — sparse_factorized
@@ -358,19 +363,37 @@ def _compute_gmm_corrected_meat(
                 "_compute_gmm_corrected_meat: vcov_type='conley' requires "
                 "conley_coords, conley_cutoff_km, and conley_metric."
             )
-        # Delegate to the shared kernel-application helper. No finite-sample
-        # multiplier on the Conley path (matches conleyreg / Wave B convention).
-        meat = _compute_conley_meat(
-            Psi,
-            conley_coords,
-            conley_cutoff_km,
-            conley_metric,
-            conley_kernel,
-            time=conley_time,
-            unit=conley_unit,
-            lag_cutoff=conley_lag_cutoff,
-            cluster_ids=cluster_ids,
-        )
+        if resolved_survey is not None:
+            # Wave E.2: stratified-Conley sandwich on PSU totals. cluster_ids
+            # is intentionally NOT threaded through — after PSU aggregation
+            # every PSU is its own cluster, so a cluster product kernel
+            # would zero all cross-PSU pairs. Wave E.1's
+            # _resolve_effective_cluster path already coerced any
+            # user-supplied cluster=<col> into PSU upstream.
+            meat = _compute_stratified_conley_meat(
+                Psi,
+                conley_coords=np.asarray(conley_coords, dtype=np.float64),
+                conley_cutoff_km=conley_cutoff_km,
+                conley_metric=conley_metric,
+                conley_kernel=conley_kernel,
+                resolved_survey=resolved_survey,
+                conley_time=conley_time,  # panel-aware per-period sandwich
+            )
+        else:
+            # Wave D no-survey Conley path UNCHANGED — bit-identical fallback.
+            # No finite-sample multiplier on the Conley path (matches conleyreg
+            # / Wave B convention).
+            meat = _compute_conley_meat(
+                Psi,
+                conley_coords,
+                conley_cutoff_km,
+                conley_metric,
+                conley_kernel,
+                time=conley_time,
+                unit=conley_unit,
+                lag_cutoff=conley_lag_cutoff,
+                cluster_ids=cluster_ids,
+            )
     else:
         raise ValueError(
             f"_compute_gmm_corrected_meat: vcov_type must be one of "
@@ -514,6 +537,263 @@ def _compute_binder_tsl_meat(
     return meat
 
 
+def _compute_stratified_conley_meat(
+    Psi: np.ndarray,
+    *,
+    conley_coords: np.ndarray,
+    conley_cutoff_km: float,
+    conley_metric,
+    conley_kernel: str,
+    resolved_survey: "ResolvedSurveyDesign",
+    conley_time: Optional[np.ndarray] = None,
+) -> np.ndarray:
+    """Wave E.2 panel-aware stratified-Conley meat on PSU-by-time scores.
+
+    Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124)
+    Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D
+    Gardner GMM first-stage uncertainty correction (Butts 2021 Sec. 3.1 +
+    Gardner 2022 Sec. 4) applied to SpilloverDiD's ring-indicator stage-2
+    design. No reference software combines all three ingredients on a
+    two-stage influence function.
+
+    **Panel-aware composition (preserves the library's panel Conley
+    contract):** for each period ``t``, aggregate per-obs Psi to PSU
+    totals WITHIN that period (``S_psu_t[g] = sum_{i in PSU g, time t}
+    Psi[i]``); derive each PSU's spatial centroid as the mean of
+    per-observation ``conley_coords`` (panel-constant — PSU is a sampling
+    unit with fixed location); apply the per-stratum Conley sandwich on
+    ``S_psu_t`` via
+    :func:`diff_diff.survey._compute_stratified_conley_meat_from_psu_scores`
+    (Binder FPC factor ``(1 - f_h) * n_h/(n_h-1)``); sum across periods.
+    Cross-period spatial pairs are excluded by construction, matching the
+    library's existing ``conley_lag_cutoff = 0`` semantic ("within-period
+    spatial only") at :func:`diff_diff.conley._compute_conley_meat`.
+    Cross-stratum kernel weights are zero by sampling design (strata are
+    exact independence partitions).
+
+    Parameters
+    ----------
+    Psi : np.ndarray of shape (n, p_2)
+        Per-obs Wave D Gardner GMM influence-function scores (already
+        Hajek-weighted via the Wave E.1 upstream eps multiplication).
+    conley_coords : np.ndarray of shape (n, 2)
+        Per-observation lat/lon (or generic 2D coordinates). Already
+        validated finite upstream at ``spillover.py:_validate_spillover_inputs``;
+        no defensive finiteness check on derived PSU centroids.
+    conley_cutoff_km : float
+        Conley spatial-HAC bandwidth in km (haversine) or the
+        coord units (euclidean / callable).
+    conley_metric : ConleyMetric
+        ``"haversine"`` / ``"euclidean"`` / callable, per
+        :mod:`diff_diff.conley`.
+    conley_kernel : str
+        ``"bartlett"`` or ``"uniform"``.
+    resolved_survey : ResolvedSurveyDesign
+        ``.psu`` may be None; when absent, each observation is treated as
+        its own singleton PSU (matches the implicit-PSU convention of
+        :class:`ResolvedSurveyDesign` no-PSU branches). ``.strata`` and
+        ``.fpc`` are optional; absent strata synthesize a single stratum.
+    conley_time : np.ndarray of shape (n,), optional
+        Per-observation period label. When None, all observations are
+        treated as a single period (T = 1; the per-period loop reduces to
+        one iteration on the full Psi, which is the cross-sectional
+        Wave E.2 design). When provided (the standard SpilloverDiD case),
+        the per-period loop preserves the within-period spatial semantic.
+
+    Returns
+    -------
+    meat : np.ndarray of shape (p_2, p_2)
+        Wave E.2 panel-aware stratified-Conley meat
+        (``sum_t meat_t`` where ``meat_t`` is the within-stratum Conley
+        sandwich on the period-``t`` PSU totals).
+
+    Notes
+    -----
+    ``cluster_ids`` is intentionally not accepted: after PSU aggregation
+    every PSU is its own cluster, so threading a cluster product kernel
+    into the inner :func:`_compute_stratified_conley_meat_from_psu_scores`
+    would zero all cross-PSU pairs (``1{cluster_j == cluster_k}`` = 0 for
+    j != k). The Wave E.1 ``_resolve_effective_cluster`` path already
+    collapsed any user-supplied ``cluster=<col>`` into PSU upstream.
+
+    NaN-fails (with ``UserWarning``) when the panel has < 2 PSUs and no
+    strata, or when the inner helper returns ``(False, 0)`` for every period
+    (no stratum contributed variance, none a legitimate zero). Mirrors the
+    Wave E.1 Binder TSL saturation behavior; departs from TwoStageDiD's
+    silent NaN-VCV at ``two_stage.py:2003-2005`` per
+    ``feedback_no_silent_failures``.
+
+    Reductions:
+
+    - ``T = 1`` (single period or ``conley_time is None``): single-pass
+      stratified-Conley sandwich on the full PSU totals (the original
+      cross-sectional Wave E.2 design).
+    - ``H = 1`` stratum, ``FPC = inf``: reduces to ``sum_t`` of ``G_t/(G_t-1)``
+      times the plain Conley sandwich on centered per-period PSU totals.
+    - Bandwidth -> 0 (``K = I``): reduces to ``sum_t`` per-period
+      within-stratum HC sandwich on PSU totals (NOT Wave E.1 Binder,
+      which is over time-collapsed PSU totals).
+
+    Out of scope (deferred follow-up, tracked in TODO.md):
+
+    - ``conley_lag_cutoff > 0`` panel-block: the within-PSU serial
+      Bartlett HAC over time would compose with the spatial sandwich
+      here. Rejected upfront at ``SpilloverDiD.fit``.
+    """
+    from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+    p_2 = Psi.shape[1]
+    n_obs = Psi.shape[0]
+    coords_arr = np.asarray(conley_coords, dtype=np.float64)
+
+    # No-PSU fallback: each obs is its own singleton PSU. Matches Wave E.1
+    # Binder TSL convention at _compute_binder_tsl_meat L450-451.
+    if resolved_survey.psu is None:
+        psu_arr: np.ndarray = np.arange(n_obs, dtype=np.int64)
+    else:
+        psu_arr = np.asarray(resolved_survey.psu)
+    strata_arr_full = (
+        np.asarray(resolved_survey.strata) if resolved_survey.strata is not None else None
+    )
+    fpc_arr_full = (
+        np.asarray(resolved_survey.fpc, dtype=np.float64)
+        if resolved_survey.fpc is not None
+        else None
+    )
+
+    # Panel-constant PSU centroids for explicit-PSU layouts.
+    # The Wave E.2 registry / api contract specifies
+    # ``centroid_g = mean over i in PSU g of conley_coords[i]`` (panel-wide,
+    # not per-period). For a PSU containing multiple units at different
+    # coordinates with finite_mask dropping different members across
+    # periods, per-period recomputation would silently shift the spatial
+    # kernel weights — that would be a documented-contract violation.
+    # Compute once on the full active sample so each period's helper call
+    # sees the SAME centroid for the same PSU.
+    #
+    # For implicit-PSU (pseudo-PSU = obs index), every pseudo-PSU appears
+    # in exactly one period, so the per-period slice naturally produces
+    # the obs's own coordinate as that pseudo-PSU's centroid — no precompute
+    # needed. The dictionary stays None on that branch.
+    coord_dim = coords_arr.shape[1]
+    psu_value_to_centroid: Optional[dict] = None
+    if resolved_survey.psu is not None:
+        unique_psus_full, _, psu_indices_full = np.unique(
+            psu_arr, return_index=True, return_inverse=True
+        )
+        G_full = len(unique_psus_full)
+        psu_coord_sums_full = np.zeros((G_full, coord_dim))
+        for d in range(coord_dim):
+            np.add.at(psu_coord_sums_full[:, d], psu_indices_full, coords_arr[:, d])
+        psu_counts_full = np.bincount(psu_indices_full, minlength=G_full).astype(np.float64)
+        psu_centroids_full = psu_coord_sums_full / psu_counts_full[:, None]
+        psu_value_to_centroid = {unique_psus_full[g]: psu_centroids_full[g] for g in range(G_full)}
+
+    # Per-period loop: preserves the library's "within-period spatial only"
+    # contract for conley_lag_cutoff = 0. The PSU set, strata, and FPC are
+    # re-built from the ACTIVE rows in each period (not from the full
+    # panel) so implicit-PSU layouts (`resolved_survey.psu is None`, i.e.
+    # one pseudo-PSU per observation) don't drag off-period zero-padded
+    # entries into the kernel via centering. Centroids are not re-built
+    # per period for explicit-PSU layouts: they are looked up from the
+    # panel-constant `psu_value_to_centroid` map built above, so a PSU has
+    # the same location in every period. Implicit-PSU centroids reduce to
+    # each pseudo-PSU's own observation coordinate.
+    if conley_time is None:
+        # Treat all obs as one period (cross-sectional fallback).
+        time_arr = np.zeros(n_obs, dtype=np.int64)
+    else:
+        time_arr = np.asarray(conley_time)
+    unique_times = np.unique(time_arr)
+
+    # Saturation guard for unstratified single-PSU on the FULL panel.
+    # The post-loop saturation check also NaN-fails when no period
+    # contributes variance, but this front-door check matches Wave E.1's
+    # ergonomic "df_survey is undefined" message for the panel-level
+    # degenerate case.
+    G_total = len(np.unique(psu_arr))
+    if strata_arr_full is None and G_total < 2:
+        warnings.warn(
+            "SpilloverDiD Wave E.2 stratified-Conley sandwich: df_survey is "
+            f"undefined (single PSU, no strata; G={G_total}). Returning NaN "
+            "meat so downstream inference NaN-propagates.",
+            UserWarning,
+            stacklevel=2,
+        )
+        return np.full((p_2, p_2), np.nan)
+
+    meat = np.zeros((p_2, p_2))
+    _variance_computed = False
+    _legit_zero = 0
+    for t in unique_times:
+        period_mask = time_arr == t
+        Psi_t = Psi[period_mask]
+        psu_arr_t = psu_arr[period_mask]
+        coords_arr_t = coords_arr[period_mask]
+        unique_psus_t, first_idx_t, psu_indices_t = np.unique(
+            psu_arr_t, return_index=True, return_inverse=True
+        )
+        G_t = len(unique_psus_t)
+
+        # Per-period PSU totals.
+        S_psu_t = np.zeros((G_t, p_2))
+        for j in range(p_2):
+            np.add.at(S_psu_t[:, j], psu_indices_t, Psi_t[:, j])
+
+        # Per-period PSU centroids: panel-constant for explicit-PSU
+        # (look up from the precomputed dict to match the documented
+        # ``centroid_g = mean over i in PSU g of conley_coords[i]``
+        # panel-wide contract); per-period mean for implicit-PSU
+        # (pseudo-PSU = obs, each appears in exactly one period, so the
+        # per-period mean IS the obs's own coord).
+        if psu_value_to_centroid is not None:
+            psu_centroids_t = np.array([psu_value_to_centroid[v] for v in unique_psus_t])
+        else:
+            psu_coord_sums_t = np.zeros((G_t, coord_dim))
+            for d in range(coord_dim):
+                np.add.at(psu_coord_sums_t[:, d], psu_indices_t, coords_arr_t[:, d])
+            psu_counts_t = np.bincount(psu_indices_t, minlength=G_t).astype(np.float64)
+            psu_centroids_t = psu_coord_sums_t / psu_counts_t[:, None]
+
+        # Per-period strata + fpc.
+        if strata_arr_full is not None:
+            psu_strata_t = strata_arr_full[period_mask][first_idx_t]
+        else:
+            psu_strata_t = np.zeros(G_t, dtype=int)
+        psu_fpc_t: Optional[np.ndarray] = None
+        if fpc_arr_full is not None:
+            psu_fpc_t = fpc_arr_full[period_mask][first_idx_t]
+
+        # Stratified Conley sandwich for period t.
+        meat_t, var_t, legit_zero_t = _compute_stratified_conley_meat_from_psu_scores(
+            psu_scores=S_psu_t,
+            psu_strata=psu_strata_t,
+            psu_coords=psu_centroids_t,
+            cutoff=conley_cutoff_km,
+            metric=conley_metric,
+            kernel=conley_kernel,
+            fpc_per_psu=psu_fpc_t,
+            lonely_psu=resolved_survey.lonely_psu,
+        )
+        meat += meat_t
+        _variance_computed = _variance_computed or var_t
+        _legit_zero += legit_zero_t
+
+    # Wave E.2 survey-saturated NaN-fail per `feedback_no_silent_failures`.
+    if not _variance_computed and _legit_zero == 0:
+        warnings.warn(
+            "SpilloverDiD Wave E.2 stratified-Conley sandwich: df_survey = 0 "
+            "(all strata removed by lonely_psu='remove' on single-PSU "
+            "strata; no PSU contributed to the meat). Returning NaN meat "
+            "so downstream inference NaN-propagates.",
+            UserWarning,
+            stacklevel=2,
+        )
+        return np.full((p_2, p_2), np.nan)
+
+    return meat
+
+
 # =============================================================================
 # Main Estimator
 # =============================================================================
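The panel-aware orchestration in `_compute_stratified_conley_meat` (per-period PSU totals via `np.add.at`, panel-constant centroids, one within-stratum sandwich per period, then a sum over periods) can be sketched end to end in plain NumPy. This is an illustrative reimplementation, not the library's code: the function names are hypothetical, and it assumes Euclidean distance, Bartlett weights `max(0, 1 - d / cutoff)`, no FPC, singleton strata dropped, and strata constant within each PSU.

```python
import numpy as np


def panel_constant_centroids(psu, coords):
    """Map each PSU id to the mean of its observations' coordinates over the whole panel."""
    ids, inv = np.unique(psu, return_inverse=True)
    sums = np.zeros((len(ids), coords.shape[1]))
    np.add.at(sums, inv, coords)  # scatter-add each row into its PSU
    counts = np.bincount(inv, minlength=len(ids)).astype(float)
    return {pid: sums[g] / counts[g] for g, pid in enumerate(ids.tolist())}


def panel_stratified_conley_meat(psi, psu, strata, time, coords, cutoff):
    """Sum over periods of the within-stratum Conley sandwich on per-period PSU totals."""
    centroid = panel_constant_centroids(psu, coords)
    psu_stratum = dict(zip(psu.tolist(), strata.tolist()))  # assumes strata constant within PSU
    p = psi.shape[1]
    meat = np.zeros((p, p))
    for t in np.unique(time):
        m = time == t
        ids, inv = np.unique(psu[m], return_inverse=True)
        totals = np.zeros((len(ids), p))
        np.add.at(totals, inv, psi[m])  # S_psu_t[g] = sum of psi over PSU g in period t
        cents = np.array([centroid[g] for g in ids.tolist()])
        strat = np.array([psu_stratum[g] for g in ids.tolist()])
        for h in np.unique(strat):
            mk = strat == h
            n_h = int(mk.sum())
            if n_h < 2:
                continue  # singleton stratum dropped, as with lonely_psu="remove"
            cen = totals[mk] - totals[mk].mean(axis=0)
            diff = cents[mk][:, None, :] - cents[mk][None, :, :]
            K = np.clip(1.0 - np.linalg.norm(diff, axis=-1) / cutoff, 0.0, None)
            meat += n_h / (n_h - 1) * (cen.T @ K @ cen)
    return meat
```

Each period enters its own kernel product, so observations from different periods never meet in a cross term; that is what keeps the `conley_lag_cutoff = 0` semantic. Only the centroids are shared across periods, which is why they are computed once over the full panel.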
diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst
index a78f4f25..5f883619 100644
--- a/docs/api/spillover.rst
+++ b/docs/api/spillover.rst
@@ -243,10 +243,6 @@ and planned follow-up enhancements:
 
   Restrictions:
 
-  - ``vcov_type="conley" + survey_design=`` raises
-    ``NotImplementedError``; Wave E.2 (planned) will add the Conley × survey
-    product-kernel synthesis with within-stratum Conley sandwich on PSU
-    totals.
   - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises
     ``NotImplementedError``; per Gerber (2026) Appendix A, the
     IF-reweighting shortcut does not apply because ``gamma_hat`` is
@@ -266,6 +262,70 @@ and planned follow-up enhancements:
     centering reflect the reduced fit sample rather than the full
     domain design. See the REGISTRY note for details and the Wave E.3
     follow-up tracked in ``TODO.md``.
+- **Survey-design integration (Wave E.2 — Conley × survey via
+  stratified-Conley sandwich on PSU totals).** SHIPPED in Wave E.2.
+  ``vcov_type="conley" + survey_design=`` is now supported via a
+  per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner
+  GMM influence functions.
+
+  .. note::
+
+     Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026,
+     arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation)
+     and the Wave D Gardner GMM first-stage uncertainty correction
+     (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's
+     ring-indicator stage-2 design. The composition is **panel-aware** —
+     it preserves the library's existing ``conley_lag_cutoff = 0``
+     semantic ("within-period spatial only, exclude cross-period pairs")
+     by looping over periods and aggregating Psi to PSU totals WITHIN
+     each period (not over the whole panel). For each period ``t``,
+     ``S_psu_t[g] = sum_{i in PSU g, time t} psi_i``; per-PSU centroids
+     are panel-constant (mean of per-observation ``conley_coords``);
+     for each stratum the within-stratum sandwich is
+     ``M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h}
+     K(d(centroid_j, centroid_k) / cutoff) *
+     (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'``, where K is the
+     Bartlett kernel (SpilloverDiD currently exposes Bartlett only and
+     hardcodes it at the fit-call site; the survey helper's ``kernel``
+     parameter can also take ``"uniform"``, but exposing that on the
+     SpilloverDiD constructor is a separate follow-up). Cross-stratum
+     kernel weights are exactly zero by sampling design (strata are
+     exact independence partitions). Total meat is ``sum_t sum_h M_h_t``.
+     Cross-period spatial pairs are excluded by construction. No
+     reference software combines all three ingredients on a two-stage
+     influence function.
+
+  Reduction semantics:
+
+  - Per-period sum invariant: the meat equals ``sum_t`` of per-period
+    within-stratum stratified-Conley sandwiches on per-period PSU totals.
+    Pinned at
+    ``tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant``
+    (pure unit test on the orchestrator + helper composition).
+  - Single stratum (H = 1, FPC = inf): reduces to ``sum_t`` plain
+    Conley sandwich on demeaned per-period PSU totals, up to the
+    ``n/(n-1)`` finite-sample scale (NOT on time-collapsed PSU totals;
+    the per-period loop preserves ``lag_cutoff = 0`` semantics).
+  - All PSUs singleton + ``lonely_psu="remove"``: ``df_survey = 0`` and
+    the stratified-Conley meat NaN-fails (matches Wave E.1 saturation
+    behaviour, with ``UserWarning`` template "Wave E.2 stratified-Conley
+    sandwich: df_survey = 0...").
+
+  Restrictions:
+
+  - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises
+    ``NotImplementedError`` (inherits the Wave E.1 gate; per-replicate
+    refit is queued as a separate follow-up).
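+  - ``cluster=<col> + survey_design.psu + vcov_type="conley"``:
+    ``cluster=<col>`` is coerced to PSU per Wave E.1's warn-and-use-PSU
+    pattern. The Conley cluster product kernel is deliberately not
+    applied after PSU aggregation: every PSU is then its own cluster, so
+    the product kernel would zero out all cross-PSU spatial weights.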
+  - The LinearRegression-side ``vcov_type="conley" + survey_design=``
+    gate at ``diff_diff/linalg.py`` remains in place; lifting it belongs
+    to the separate Bertanha-Imbens 2014 weighted-Conley roadmap (not
+    Wave E).
+  - DiagnosticReport routing for ``SpilloverDiDResults(vcov_type="conley",
+    survey_design=)`` is queued for a follow-up (the
+    ``_APPLICABILITY`` / ``_PT_METHOD`` wiring must register the new
+    combination first).
 - **Count-of-treated-in-ring** — only the "nearest-treated ring"
   specification is implemented. The "count" form re-introduces
   functional-form dependence (paper Section 3.2 end) and is queued.
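The per-stratum sandwich in the docstring above can be illustrated with a short NumPy sketch for a single period of PSU totals. This is not the library's `_compute_stratified_conley_meat_from_psu_scores`. The function names `bartlett_weights` and `stratified_conley_meat` are hypothetical, the metric is assumed to be Euclidean, and singleton strata are simply skipped (roughly the `lonely_psu="remove"` case; the `"adjust"` branch is omitted). The sketch is meant to make two of the stated reduction claims concrete. First, strata contribute additively. Second, when PSU centroids are distinct, a vanishing cutoff collapses the kernel to the identity, which leaves a Binder-style `centered.T @ centered` term.

```python
import numpy as np


def bartlett_weights(coords, cutoff):
    """Euclidean Bartlett kernel matrix K[j, k] = max(0, 1 - d(j, k) / cutoff)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return np.maximum(0.0, 1.0 - dist / cutoff)


def stratified_conley_meat(psu_scores, psu_strata, psu_coords, cutoff, fpc_per_psu=None):
    """Sketch of sum_h (1 - f_h) * n_h/(n_h-1) * C_h' K_h C_h for one period.

    psu_scores: (G, k) PSU totals; psu_strata: (G,) stratum labels;
    psu_coords: (G, 2) PSU centroids; fpc_per_psu: (G,) population PSU
    count of each PSU's stratum, or None for FPC = inf (f_h = 0).
    """
    k = psu_scores.shape[1]
    meat = np.zeros((k, k))
    for h in np.unique(psu_strata):
        in_h = psu_strata == h
        scores_h = psu_scores[in_h]
        n_h = scores_h.shape[0]
        if n_h < 2:
            # Singleton stratum: n_h / (n_h - 1) is undefined; skipped here.
            continue
        centered = scores_h - scores_h.mean(axis=0, keepdims=True)
        # Kernel built from within-stratum PSUs only: no cross-stratum pairs.
        K_h = bartlett_weights(psu_coords[in_h], cutoff)
        f_h = 0.0 if fpc_per_psu is None else n_h / fpc_per_psu[in_h][0]
        meat += (1.0 - f_h) * n_h / (n_h - 1) * (centered.T @ K_h @ centered)
    return meat


rng = np.random.default_rng(0)
scores = rng.standard_normal((6, 2))
strata = np.array([0, 0, 0, 1, 1, 1])
coords = np.array([[0.00, 0.0], [0.10, 0.0], [0.20, 0.0],
                   [0.05, 0.0], [0.15, 0.0], [0.25, 0.0]])
meat = stratified_conley_meat(scores, strata, coords, cutoff=0.30)
```

Each kernel matrix is built only from PSUs inside one stratum, so no cross-stratum pair ever receives weight. Summing single-stratum calls therefore reproduces the full result. This is the partition-then-sum property that `test_e_cross_stratum_independence_invariant` checks against the real helper.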
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index ce624616..62b2223a 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -3215,7 +3215,7 @@ The `_compute_stratified_meat_from_psu_scores` helper at `diff_diff/survey.py` i
 
 Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_survey` (the standard survey 4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`). Threaded through all four `safe_inference` call sites: aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, and the scalar `att` lincom in event-study mode.
 
-- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 (planned) will extend with the Conley × survey product-kernel composition.
+- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients. Wave E.2 extends this with the Conley × survey composition (stratified-Conley sandwich on PSU totals); see the "Variance (Wave E.2)" subsection below.
 - **Note (warn-and-use-PSU for cluster + survey):** when both `cluster=<col>` and `survey_design.psu` are supplied with **different groupings**, the cluster argument emits a `UserWarning` and is overridden by PSU (mirrors `TwoStageDiD._resolve_effective_cluster`). PSU is the design-relevant cluster on survey panels; `cluster=<col>` on SpilloverDiD is more often a spatial / unit-level label, so the design constraint wins. When both knobs are supplied with the **same** groupings, no warning fires and PSU still takes precedence (the inference is unchanged either way).
 - **Note (limitation — `SurveyDesign.subpopulation()` with FE-undefined zero-weight rows):** when `survey_design` is built via `SurveyDesign.subpopulation()` (or otherwise carries zero-weight padding rows) AND those zero-weight rows lose stage-1 FE support (warn-and-drop unit path), Wave E.1's `finite_mask` block physically removes them from the survey design rather than retaining them as zero-score padding. Consequently `n_psu`, `df_survey`, and the Binder TSL centering are recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (e.g. R's `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts. Practitioners using subpopulation-derived designs should expect SEs that may differ slightly from textbook domain expectations on warn-and-drop fits. Tracked as Wave E.3 follow-up — see TODO.md.
 - **Note (saturated `df_survey = 0` NaN-fail):** when `lonely_psu="remove"` removes all strata (single PSU per stratum), `_compute_stratified_meat_from_psu_scores` returns `(_, var_computed=False, legit_zero=0)`. SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can pin via `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`.
@@ -3224,6 +3224,29 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su
 
 **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56`; new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with the Wave E.1 NaN-fail + warning. `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback). `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. Tests: `TestSpilloverDiDWaveE1SurveyDesignHc1` + `TestSpilloverDiDWaveE1SurveyDesignEventStudy` at `tests/test_spillover.py`.
 
+### Variance (Wave E.2 — Conley × survey via stratified-Conley sandwich on PSU totals)
+
+`vcov_type="conley" + survey_design=` is now supported via a per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner GMM influence functions. SHIPPED in Wave E.2.
+
+- **Note (documented synthesis):** Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is **panel-aware** — it preserves the library's existing `conley_lag_cutoff = 0` semantic ("within-period spatial only — exclude cross-period pairs") at `diff_diff.conley._compute_conley_meat`. Per-PSU centroids are computed as `centroid_g = mean over i in PSU g of conley_coords[i]` (panel-constant — PSU is a sampling unit with fixed location). For each period `t`, SpilloverDiD's per-obs Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i`; the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / cutoff) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it at the fit-call site; the survey helper's `kernel` parameter can also take `"uniform"`, but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop aggregates only within-period observations into each `S_psu_t`, matching the Wave D `conley_lag_cutoff = 0` block decomposition. No reference software combines all three ingredients (Conley spatial-HAC + Binder TSL + Gardner GMM correction) on a two-stage influence function.
+
+- **Reduction semantics (load-bearing for tests):**
+  - Per-period sum invariant: the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals. Pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant` (pure unit test on the orchestrator + helper composition).
+  - Single stratum (H = 1, FPC = inf): reduces to `sum_t` plain Conley sandwich on per-period PSU totals via `_compute_conley_meat(S_psu_t_centered, centroids, ...)`, scaled by the `n/(n-1)` finite-sample factor. Note this is NOT plain Conley on time-collapsed PSU totals: the per-period loop preserves the library's `lag_cutoff = 0` semantic.
+  - All PSUs singleton in their stratum + `lonely_psu="remove"`: `df_survey = 0` and the stratified-Conley meat NaN-fails (matches Wave E.1 saturation behaviour, with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0...").
+  - Cross-stratum kernel weight is exactly zero (sampling-design assumption — no kernel pair crosses a stratum boundary).
+
+- **Note (singleton-stratum `lonely_psu="adjust"` FPC skip parity):** when a stratum has `n_h = 1` and `lonely_psu="adjust"`, the new `_compute_stratified_conley_meat_from_psu_scores` helper mirrors the Binder helper's `continue`-skip-FPC pattern exactly (the FPC scale `(1 - f_h) * n_h / (n_h - 1)` would divide by zero with `n_h = 1`). The degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust contribution bit-identically.
+
+- **Cluster + Conley + survey routing:** `cluster=<col> + survey_design.psu + vcov_type="conley"` coerces `cluster=<col>` to PSU per Wave E.1's `_resolve_effective_cluster` warn-and-use-PSU pattern. The dispatch wrapper `_compute_stratified_conley_meat` intentionally does NOT thread `cluster_ids` into the inner Conley kernel call — after PSU aggregation every PSU is its own cluster, so a cluster product kernel `1{cluster_j == cluster_k}` would be zero for all `j != k` and the cross-PSU kernel weights would be silently dropped. The Wave E.2 architectural choice: PSU-aggregation handles within-PSU clustering exactly; cross-PSU spatial dependence enters via the kernel; cross-stratum independence is exact.
+
+- **Restrictions / out-of-scope (Wave E.2):**
+  - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope).
+  - LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E).
+  - DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` is queued for a follow-up Wave F PR — `_APPLICABILITY` / `_PT_METHOD` registration is required before the new combination can be claimed consumable downstream (per `feedback_audit_diagnostic_report_wiring_before_claim`).
+
+**Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing Binder helper; 3-tuple `(meat, variance_computed, legitimate_zero_count)` return contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(centered, coords_h, cutoff, metric, kernel)` cross-sectional mode). New dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to `_compute_binder_tsl_meat`; per-obs Psi → PSU aggregation via `np.add.at` + PSU centroid derivation via vectorized `np.add.at` sums / `np.bincount` counts + dispatch to survey helper; intentionally no `cluster_ids` parameter). `_compute_gmm_corrected_meat` conley branch extended at `diff_diff/two_stage.py` with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D no-survey Conley. Saturation NaN-fail mirrors Wave E.1 (`UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..."). Wave E.1 stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading to `safe_inference` call sites, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of `vcov_type`. Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` + `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` at `tests/test_spillover.py`.
+
 **Edge cases (from paper Section 3.2 / Discussion):**
 
 | # | Edge case | Handling |
@@ -3239,7 +3262,7 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su
 **Restrictions / deferred features:**
 
 - `event_study=True` SHIPPED in Wave C — see Event-study mode subsection above. Emits `att_dynamic`, MultiIndex `spillover_effects`, and a TwoStageDiD-compatible `event_study_effects` dict alias.
-- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=<col>` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` still raises `NotImplementedError` and points at planned Wave E.2 (Conley × survey product-kernel composition with within-stratum Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up.
+- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=<col>` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up.
 - `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
 - `ring_method="count"` not exposed — only the nearest-treated-ring specification.
 - `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction.
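The panel-aware composition in the Wave E.2 registry note has three steps. PSU centroids are held constant across the panel, PSU totals are formed within each period, and the per-period stratified sandwiches are summed. The sketch below walks through those steps. It is not the library's `_compute_stratified_conley_meat` wrapper. The names `panel_aware_meat` and `_stratum_meat` are hypothetical, and the sketch assumes a Euclidean Bartlett kernel, no FPC, and skipped singleton strata. It does use the same `np.add.at` / `np.bincount` aggregation pattern that the note describes.

```python
import numpy as np


def _stratum_meat(scores, coords, cutoff):
    """n_h/(n_h-1) * C' K C with a Euclidean Bartlett kernel (no FPC)."""
    n_h = scores.shape[0]
    centered = scores - scores.mean(axis=0, keepdims=True)
    diff = coords[:, None, :] - coords[None, :, :]
    K = np.maximum(0.0, 1.0 - np.sqrt((diff ** 2).sum(axis=2)) / cutoff)
    return n_h / (n_h - 1) * (centered.T @ K @ centered)


def panel_aware_meat(psi, psu_id, time_id, obs_coords, psu_strata, cutoff):
    """Sketch of sum_t sum_h M_h_t on per-period PSU totals.

    psi: (n_obs, k) per-observation influence-function rows;
    psu_id: (n_obs,) integer PSU index in [0, G); time_id: (n_obs,) period;
    obs_coords: (n_obs, 2); psu_strata: (G,) stratum of each PSU.
    """
    n_psu, k = psu_strata.shape[0], psi.shape[1]
    # Panel-constant centroids: mean of per-observation coordinates per PSU.
    coord_sums = np.zeros((n_psu, obs_coords.shape[1]))
    np.add.at(coord_sums, psu_id, obs_coords)
    centroids = coord_sums / np.bincount(psu_id, minlength=n_psu)[:, None]
    meat = np.zeros((k, k))
    for t in np.unique(time_id):
        in_t = time_id == t
        # S_psu_t[g] = sum of psi_i over observations in PSU g at period t.
        s_psu_t = np.zeros((n_psu, k))
        np.add.at(s_psu_t, psu_id[in_t], psi[in_t])
        for h in np.unique(psu_strata):
            in_h = psu_strata == h
            if in_h.sum() < 2:
                continue  # singleton stratum skipped in this sketch
            meat += _stratum_meat(s_psu_t[in_h], centroids[in_h], cutoff)
    return meat


rng = np.random.default_rng(1)
G, T, m = 4, 2, 3  # PSUs, periods, observations per PSU-period
psu_id = np.repeat(np.arange(G), T * m)
time_id = np.tile(np.repeat(np.arange(T), m), G)
obs_coords = np.column_stack([0.1 * psu_id, np.zeros(psu_id.size)])
psi = rng.standard_normal((psu_id.size, 2))
strata = np.array([0, 0, 1, 1])
meat = panel_aware_meat(psi, psu_id, time_id, obs_coords, strata, cutoff=0.30)
collapsed = panel_aware_meat(psi, psu_id, np.zeros_like(time_id), obs_coords, strata, cutoff=0.30)
```

Passing a constant `time_id` collapses each PSU's Psi over the whole panel. Because centering is linear, `collapsed` equals `meat` plus the cross-period terms `a' K b + b' K a` that the panel-aware loop drops. This mirrors the sanity check in `test_b_panel_aware_per_period_sum_invariant`.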
diff --git a/docs/references.rst b/docs/references.rst
index 21410d98..d8acabdb 100644
--- a/docs/references.rst
+++ b/docs/references.rst
@@ -190,6 +190,10 @@ Multi-Period and Staggered Adoption
 
   Identifies the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Section 2-3 covers non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``.
 
+- **Conley, T. G. (1999).** "GMM Estimation with Cross Sectional Dependence." *Journal of Econometrics*, 92(1), 1-45. https://doi.org/10.1016/S0304-4076(98)00084-0
+
+  Primary source for the Conley spatial-HAC variance estimator. Equations 5-9 derive the spatial-kernel cross-product meat. Our ``diff_diff/conley.py`` implements the practitioner specializations (Bartlett / uniform kernels with haversine / euclidean metrics) cited in our ``SpilloverDiD`` Wave A/D Conley path and composed with Binder TSL + Gardner GMM in Wave E.2 (``_compute_stratified_conley_meat`` at ``diff_diff/two_stage.py``).
+
 - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169
 
 - **de Chaisemartin, C., & D'Haultfœuille, X. (2022, revised 2024).** "Difference-in-Differences Estimators of Intertemporal Treatment Effects." *NBER Working Paper* 29873. https://www.nber.org/papers/w29873
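The Conley (1999) entry above notes that `diff_diff/conley.py` implements Bartlett and uniform kernels with haversine and euclidean metrics. The pure-Python sketch below shows what one such weight looks like. It computes a haversine great-circle distance and the resulting kernel weight at a kilometre cutoff. Several details are illustrative assumptions, and `diff_diff/conley.py` may differ on them: the function names `haversine_km` and `kernel_weight`, the 6371 km mean Earth radius, and giving the uniform kernel weight 1 at exactly the cutoff.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def kernel_weight(distance_km, cutoff_km, kernel="bartlett"):
    """Spatial weight K(d / cutoff) for the Bartlett or uniform kernel."""
    u = distance_km / cutoff_km
    if kernel == "bartlett":
        return max(0.0, 1.0 - u)  # linear decay, zero beyond the cutoff
    if kernel == "uniform":
        return 1.0 if u <= 1.0 else 0.0  # equal weight inside the cutoff
    raise ValueError(f"unknown kernel: {kernel!r}")


# One degree of latitude is roughly 111 km, so at a 100 km cutoff this
# pair falls outside the Bartlett window and receives zero weight.
d = haversine_km(40.0, -120.0, 41.0, -120.0)
w = kernel_weight(d, cutoff_km=100.0)
```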
diff --git a/tests/test_spillover.py b/tests/test_spillover.py
index dd02aaf5..81e4f844 100644
--- a/tests/test_spillover.py
+++ b/tests/test_spillover.py
@@ -577,43 +577,6 @@ def test_set_params_rejects_unknown_key(self):
         with pytest.raises(ValueError, match="Unknown parameter"):
             est.set_params(nonexistent_kwarg=42)
 
-    def test_fit_conley_plus_survey_design_not_implemented(self):
-        """Wave E.1 ships HC1 / CR1 + survey_design; Conley × survey is
-        deferred to Wave E.2 (the novel within-stratum Conley sandwich on
-        PSU totals). Confirm the upfront rejection points at the planned
-        follow-up PR.
-        """
-        from diff_diff import SurveyDesign
-
-        est = SpilloverDiD(
-            rings=[0.0, 50.0],
-            conley_coords=("lat", "lon"),
-            conley_metric="euclidean",
-            conley_cutoff_km=100.0,
-            vcov_type="conley",
-        )
-        df = pd.DataFrame(
-            {
-                "unit": ["A", "A"],
-                "time": [0, 1],
-                "y": [1.0, 2.0],
-                "D": [0, 1],
-                "lat": [0.0, 0.0],
-                "lon": [0.0, 0.0],
-                "w": [1.0, 1.0],
-                "psu": [0, 0],
-            }
-        )
-        with pytest.raises(NotImplementedError, match="Wave E.2"):
-            est.fit(
-                df,
-                outcome="y",
-                unit="unit",
-                time="time",
-                treatment="D",
-                survey_design=SurveyDesign(weights="w", psu="psu"),
-            )
-
 
 # =============================================================================
 # Step 3: Two-stage Gardner fit() integration
@@ -5684,3 +5647,935 @@ def test_o_drift_golden(self):
         assert res.n_psu == 8
         assert res.n_strata == 2
         assert res.survey_metadata.df_survey == 6
+
+
+class TestSpilloverDiDWaveE2ConleySurveyDesign:
+    """Wave E.2 conley + survey via stratified-Conley sandwich on PSU totals.
+
+    Methodology anchor: Conley (1999) spatial-HAC composed with Gerber
+    (2026) Prop 1 Binder TSL (Wave E.1 foundation) and the Wave D Gardner
+    GMM correction. Verifies reduction semantics (per-period sum invariant
+    under ``conley_lag_cutoff = 0``; H=1 ≡ plain Conley on demeaned PSU
+    totals up to the n/(n-1) scale), cross-stratum independence,
+    singleton-adjust FPC skip parity with Binder, and the saturation
+    NaN-fail.
+    """
+
+    _CUTOFF_KM = 1000.0  # large enough that within-stratum PSU pairs fall inside the kernel cutoff
+
+    def _fit(self, df, **kwargs):
+        design = kwargs.pop("design", None)
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            conley_metric="haversine",
+            conley_cutoff_km=self._CUTOFF_KM,
+            conley_lag_cutoff=0,
+            vcov_type="conley",
+            event_study=False,
+            **kwargs,
+        )
+        return est.fit(
+            df,
+            outcome="y",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+
+    def test_a_no_survey_conley_path_matches_wave_d_golden(self):
+        """The `resolved_survey is None` branch of the new dispatch must
+        produce the SAME no-survey Conley SE as the pre-Wave-E.2 (Wave D)
+        Conley path. The Wave D path is `_compute_conley_meat(...)` with
+        no changes; the new dispatch only ADDS an `if resolved_survey is
+        not None` branch above the existing call. Pin the SE to a golden
+        captured on this fixture so any future refactor that disturbs the
+        no-survey path is caught by a behavioral test, not just by
+        determinism.
+        """
+        df = generate_butts_nonstaggered_dgp(seed=0)
+        res = self._fit(df)
+        # Wave D no-survey Conley golden captured on this fixture (seed=0,
+        # 2-period non-staggered Butts DGP, cutoff=1000 km, Bartlett kernel).
+        # The `resolved_survey is None` path is bit-identical to Wave D, so
+        # any refactor that disturbs it must update these goldens deliberately.
+        _WAVE_D_NO_SURVEY_CONLEY_ATT = -0.07471658104745109
+        _WAVE_D_NO_SURVEY_CONLEY_SE = 0.0018453344099259904
+        np.testing.assert_allclose(res.att, _WAVE_D_NO_SURVEY_CONLEY_ATT, rtol=1e-12, atol=1e-14)
+        np.testing.assert_allclose(res.se, _WAVE_D_NO_SURVEY_CONLEY_SE, rtol=1e-12, atol=1e-14)
+        assert np.isfinite(res.se) and res.se > 0
+
+    def test_a2_no_survey_conley_path_routes_through_wave_d_helper(self):
+        """Structural anchor: a no-survey conley fit invokes the Wave D
+        `_compute_conley_meat` helper directly, NOT the Wave E.2
+        `_compute_stratified_conley_meat` orchestrator. Pins the dispatch
+        branch in `_compute_gmm_corrected_meat` (no leak into the new
+        path when `resolved_survey is None`).
+        """
+        from unittest.mock import patch
+
+        df = generate_butts_nonstaggered_dgp(seed=2)
+        with patch("diff_diff.two_stage._compute_stratified_conley_meat") as mock_panel_aware:
+            self._fit(df)
+            assert not mock_panel_aware.called, (
+                "No-survey conley fit must NOT call _compute_stratified_conley_meat "
+                "(the Wave E.2 panel-aware survey path); it should route through "
+                "the Wave D _compute_conley_meat directly."
+            )
+
+    def test_b_panel_aware_per_period_sum_invariant(self):
+        """Panel-aware Wave E.2 meat == sum-across-periods of per-period
+        within-stratum Conley sandwich on per-period PSU totals.
+
+        Pure unit test on the orchestrator + helper composition: with T
+        periods of synthetic observation-level scores,
+        ``_compute_stratified_conley_meat``'s per-period loop must produce
+        the same result as manually calling the survey helper T times
+        (once per period, on per-period PSU totals) and summing. This pins
+        the library's panel Conley contract (``conley_lag_cutoff = 0``
+        means "within-period spatial only") on the survey path: no
+        cross-period spatial pairs leak in, as they would if Psi were first
+        collapsed to whole-panel PSU totals.
+
+        Replaces the original "bandwidth → 0 reduces to Wave E.1 Binder"
+        claim, which only holds under T=1 (the cross-sectional limit).
+        SpilloverDiD's panel-only contract precludes a T=1 fit, so that
+        equivalence cannot be exercised end-to-end; this synthetic fixture
+        pins the panel-aware contract instead.
+        """
+        from diff_diff.survey import (
+            ResolvedSurveyDesign,
+            _compute_stratified_conley_meat_from_psu_scores,
+        )
+        from diff_diff.two_stage import _compute_stratified_conley_meat
+
+        rng = np.random.default_rng(31)
+        # 4 PSUs × 2 periods × 3 obs per PSU-period = 24 obs.
+        n_obs, T, G, p_2 = 24, 2, 4, 3
+        obs_per_psu_period = 3
+        psu_id = np.repeat(np.arange(G), obs_per_psu_period * T)
+        time_arr = np.tile(np.repeat(np.arange(T), obs_per_psu_period), G)
+        Psi = rng.standard_normal((n_obs, p_2))
+        psu_centroids = np.array([[40.0, -120.0], [40.1, -120.0], [40.2, -120.0], [40.3, -120.0]])
+        coords = psu_centroids[psu_id]
+        psu_strata = np.array([0, 0, 1, 1])  # 2 PSUs per stratum
+        fpc_per_psu = np.full(G, 20.0)
+        resolved = ResolvedSurveyDesign(
+            weights=np.ones(n_obs),
+            weight_type="pweight",
+            strata=np.repeat(psu_strata, obs_per_psu_period * T),
+            psu=psu_id,
+            fpc=np.full(n_obs, 20.0),
+            n_strata=2,
+            n_psu=4,
+            lonely_psu="remove",
+        )
+        # Orchestrator (panel-aware).
+        meat = _compute_stratified_conley_meat(
+            Psi,
+            conley_coords=coords,
+            conley_cutoff_km=0.30,
+            conley_metric="euclidean",
+            conley_kernel="bartlett",
+            resolved_survey=resolved,
+            conley_time=time_arr,
+        )
+        # Hand: aggregate Psi to PSU WITHIN each period, run the survey
+        # helper per period, sum.
+        expected = np.zeros((p_2, p_2))
+        for t in range(T):
+            period_mask = time_arr == t
+            Psi_t = Psi[period_mask]
+            psu_id_t = psu_id[period_mask]
+            S_psu_t = np.zeros((G, p_2))
+            for g in range(G):
+                S_psu_t[g] = Psi_t[psu_id_t == g].sum(axis=0)
+            meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+                S_psu_t,
+                psu_strata,
+                psu_centroids,
+                cutoff=0.30,
+                metric="euclidean",
+                kernel="bartlett",
+                fpc_per_psu=fpc_per_psu,
+            )
+            expected += meat_t
+        np.testing.assert_allclose(meat, expected, rtol=1e-12, atol=1e-14)
+        # Sanity: the naive time-collapsed computation (the pre-R2 design)
+        # must DIFFER from the panel-aware meat on the same inputs.
+        S_psu_collapsed = np.zeros((G, p_2))
+        for g in range(G):
+            S_psu_collapsed[g] = Psi[psu_id == g].sum(axis=0)
+        meat_collapsed, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+            S_psu_collapsed,
+            psu_strata,
+            psu_centroids,
+            cutoff=0.30,
+            metric="euclidean",
+            kernel="bartlett",
+            fpc_per_psu=fpc_per_psu,
+        )
+        # Differs by the cross-period off-diagonal mass (the panel-aware
+        # contract drops these by construction).
+        assert not np.allclose(meat, meat_collapsed, rtol=1e-3, atol=1e-3)
+
+    def test_c_hand_computation_methodology_anchor(self):
+        """Hand-compute the stratified-Conley meat formula on synthetic
+        PSU-level inputs and assert parity with the new survey helper.
+
+        Mirrors `_scratch/wave_e2_smoke.py` Chunk 1 methodology anchor.
+        """
+        from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+        rng = np.random.default_rng(7)
+        G, k = 8, 3
+        psu_strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])
+        psu_coords = np.array(
+            [
+                [40.00, -120.0],
+                [40.10, -120.0],
+                [40.20, -120.0],
+                [40.30, -120.0],
+                [40.05, -120.0],
+                [40.15, -120.0],
+                [40.25, -120.0],
+                [40.35, -120.0],
+            ]
+        )
+        psu_scores = rng.standard_normal((G, k))
+        fpc = np.full(G, 20.0)
+        cutoff = 0.30
+
+        meat, var_ok, _ = _compute_stratified_conley_meat_from_psu_scores(
+            psu_scores,
+            psu_strata,
+            psu_coords,
+            cutoff=cutoff,
+            metric="euclidean",
+            kernel="bartlett",
+            fpc_per_psu=fpc,
+            lonely_psu="remove",
+        )
+        assert var_ok
+
+        # Hand: per stratum, demean, apply Bartlett K on PSU coords,
+        # FPC-scale, sum across strata.
+        expected = np.zeros((k, k))
+        for h in [0, 1]:
+            mask = psu_strata == h
+            s_h = psu_scores[mask]
+            c_h = psu_coords[mask]
+            n_h = s_h.shape[0]
+            centered = s_h - s_h.mean(axis=0, keepdims=True)
+            d = np.sqrt(((c_h[:, None, :] - c_h[None, :, :]) ** 2).sum(axis=2))
+            K = np.maximum(0.0, 1.0 - d / cutoff)
+            M_h = centered.T @ K @ centered
+            f_h = n_h / fpc[mask][0]
+            M_h *= (1.0 - f_h) * n_h / (n_h - 1)
+            expected += M_h
+        np.testing.assert_allclose(meat, expected, rtol=1e-12, atol=1e-14)
+
+    def test_d_single_stratum_reduces_to_plain_conley_on_psu_totals(self):
+        """H = 1 stratum, FPC = inf: reduces to ordinary Conley sandwich
+        on PSU totals (modulo the n/(n-1) finite-sample scale).
+        """
+        from diff_diff.conley import _compute_conley_meat
+        from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+        rng = np.random.default_rng(11)
+        G = 8
+        psu_strata = np.zeros(G, dtype=int)
+        psu_coords = np.array(
+            [
+                [40.00, -120.0],
+                [40.10, -120.0],
+                [40.20, -120.0],
+                [40.30, -120.0],
+                [40.05, -120.0],
+                [40.15, -120.0],
+                [40.25, -120.0],
+                [40.35, -120.0],
+            ]
+        )
+        psu_scores = rng.standard_normal((G, 3))
+        cutoff = 0.30
+
+        meat, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+            psu_scores,
+            psu_strata,
+            psu_coords,
+            cutoff=cutoff,
+            metric="euclidean",
+            kernel="bartlett",
+        )
+        # Plain Conley sandwich on PSU totals (no FPC). n/(n-1) scale
+        # comes from the survey helper's adjustment; FPC term is 1.
+        centered = psu_scores - psu_scores.mean(axis=0, keepdims=True)
+        plain = _compute_conley_meat(centered, psu_coords, cutoff, "euclidean", "bartlett")
+        plain *= G / (G - 1)
+        np.testing.assert_allclose(meat, plain, rtol=1e-12, atol=1e-14)
+
+    def test_e_cross_stratum_independence_invariant(self):
+        """Cross-stratum kernel weights are exactly zero by sampling design.
+
+        Pure unit test on the new survey helper: the full meat must equal the
+        sum of per-stratum meats, each computed by a separate single-stratum
+        call. The centroids are interleaved so that cross-stratum pairs are
+        CLOSER (in euclidean coordinate distance) than within-stratum pairs;
+        any kernel leak across strata would produce a large numerical
+        difference.
+        """
+        from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
+
+        rng = np.random.default_rng(13)
+        G, k = 8, 3
+        psu_strata = np.array([0, 0, 0, 0, 1, 1, 1, 1])
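+        # Interleaved: stratum 0 at lats 40.00/40.10/40.20/40.30; stratum 1
+        # at 40.05/40.15/40.25/40.35. The nearest cross-stratum pair is 0.05
+        # apart vs 0.10 for the nearest within-stratum pair, so a leaking
+        # Bartlett kernel (cutoff 0.30) would give cross-stratum pairs the
+        # HEAVIER weight: 1 - 0.05/0.30 ≈ 0.83 vs 1 - 0.10/0.30 ≈ 0.67.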
+        psu_coords = np.array(
+            [
+                [40.00, -120.0],
+                [40.10, -120.0],
+                [40.20, -120.0],
+                [40.30, -120.0],
+                [40.05, -120.0],
+                [40.15, -120.0],
+                [40.25, -120.0],
+                [40.35, -120.0],
+            ]
+        )
+        psu_scores = rng.standard_normal((G, k))
+        fpc = np.full(G, 20.0)
+        cutoff = 0.30
+
+        meat_full, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+            psu_scores,
+            psu_strata,
+            psu_coords,
+            cutoff=cutoff,
+            metric="euclidean",
+            kernel="bartlett",
+            fpc_per_psu=fpc,
+        )
+        partitioned = np.zeros((k, k))
+        for h in [0, 1]:
+            mask = psu_strata == h
+            sub_strata = np.zeros(mask.sum(), dtype=int)
+            part_meat, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+                psu_scores[mask],
+                sub_strata,
+                psu_coords[mask],
+                cutoff=cutoff,
+                metric="euclidean",
+                kernel="bartlett",
+                fpc_per_psu=fpc[mask],
+            )
+            partitioned += part_meat
+        np.testing.assert_allclose(meat_full, partitioned, rtol=1e-12, atol=1e-14)
+
+    def test_f_lonely_psu_modes_accepted(self):
+        """All three lonely_psu modes fit through the conley+survey path.
+
+        Each fit yields either a finite SE >= 0 or a NaN SE with NaN
+        t_stat / p_value.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=14)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        df_s.loc[df_s["stratum"] == 0, "psu"] = 0  # collapse stratum 0 to a singleton PSU
+        for mode in ("remove", "certainty", "adjust"):
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", UserWarning)
+                design = SurveyDesign(
+                    weights="w",
+                    strata="stratum",
+                    psu="psu",
+                    fpc="N_h",
+                    lonely_psu=mode,
+                )
+                res = self._fit(df_s, design=design)
+                if np.isfinite(res.se):
+                    assert res.se >= 0
+                else:
+                    assert np.isnan(res.t_stat) and np.isnan(res.p_value)
+
+    def test_f2_singleton_adjust_fpc_skip_parity_binder_vs_conley(self):
+        """Binder helper and Conley helper agree to rtol=1e-12 on a design
+        with a singleton stratum under lonely_psu="adjust".
+
+        Load-bearing: pins the Chunk 2 `continue`-skip-FPC pattern. Without
+        the `continue`, the Conley helper would divide by `n_h - 1 = 0` on
+        the singleton stratum and the meat would NaN-propagate while
+        Binder's meat stays finite. With cutoff=1e-10 every off-diagonal
+        kernel weight is zero, so K is the identity in every stratum (on the
+        singleton, trivially K = [[K(0)]] = [[1.0]]) and the two outputs
+        MUST match.
+        """
+        from diff_diff.survey import (
+            _compute_stratified_conley_meat_from_psu_scores,
+            _compute_stratified_meat_from_psu_scores,
+        )
+
+        rng = np.random.default_rng(15)
+        # 5 PSUs: 1 in stratum 0 (singleton), 4 in stratum 1.
+        psu_scores = rng.standard_normal((5, 3))
+        psu_strata = np.array([0, 1, 1, 1, 1])
+        psu_coords = np.array(
+            [
+                [40.0, -120.0],
+                [40.1, -120.0],
+                [40.2, -120.0],
+                [40.3, -120.0],
+                [40.4, -120.0],
+            ]
+        )
+        fpc = np.full(5, 20.0)
+        binder_meat, _, _ = _compute_stratified_meat_from_psu_scores(
+            psu_scores,
+            psu_strata,
+            fpc_per_psu=fpc,
+            lonely_psu="adjust",
+        )
+        conley_meat, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+            psu_scores,
+            psu_strata,
+            psu_coords,
+            cutoff=1e-10,
+            metric="euclidean",
+            kernel="bartlett",
+            fpc_per_psu=fpc,
+            lonely_psu="adjust",
+        )
+        # Conley with bandwidth -> 0 collapses K to identity in EVERY stratum,
+        # so the entire meat (singleton + multi-PSU stratum) reduces to Binder.
+        np.testing.assert_allclose(conley_meat, binder_meat, rtol=1e-12, atol=1e-14)
+        # And both are finite (the singleton FPC skip prevents divide-by-zero).
+        assert np.all(np.isfinite(conley_meat))
+
+    def test_g_fpc_large_matches_no_fpc(self):
+        """Very-large FPC (1-f_h ≈ 1) produces SE close to the no-FPC path."""
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=16)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=1e9)
+        design_fpc_large = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        design_no_fpc = SurveyDesign(weights="w", strata="stratum", psu="psu")
+        res_large = self._fit(df_s, design=design_fpc_large)
+        res_no = self._fit(df_s, design=design_no_fpc)
+        np.testing.assert_allclose(res_large.se, res_no.se, rtol=1e-6)
+
+    def test_h_fpc_equals_n_zeros_stratum(self):
+        """FPC = n_h per stratum (fpc=4.0 with 4 PSUs per stratum) makes
+        (1 - f_h) = 0 in every stratum, so the meat is zero and SE = 0.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=17)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=4.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        res = self._fit(df_s, design=design)
+        np.testing.assert_allclose(res.se, 0.0, atol=1e-14)
+
+    def test_i_saturated_design_nan_fails(self):
+        """All-singleton strata + lonely_psu='remove' -> df_survey = 0 ->
+        NaN meat + UserWarning matching 'Wave E.2 stratified-Conley'.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=18)
+        df_s = df.copy()
+        df_s["w"] = 1.0
+        units_sorted = sorted(df_s["unit"].unique())
+        unit_to_idx = {u: idx for idx, u in enumerate(units_sorted)}
+        df_s["psu"] = df_s["unit"].map(unit_to_idx)
+        df_s["stratum"] = df_s["unit"].map(unit_to_idx)  # H = n_units; every stratum singleton
+        df_s["N_h"] = 20.0
+        design = SurveyDesign(
+            weights="w",
+            strata="stratum",
+            psu="psu",
+            fpc="N_h",
+            lonely_psu="remove",
+        )
+        with pytest.warns(UserWarning, match="Wave E.2 stratified-Conley"):
+            res = self._fit(df_s, design=design)
+        assert np.isnan(res.se)
+        assert np.isnan(res.t_stat)
+        assert np.isnan(res.p_value)
+
+    def test_j0_panel_conley_lag_cutoff_rejected_under_survey(self):
+        """vcov_type='conley' + conley_lag_cutoff > 0 + survey_design raises
+        NotImplementedError upfront. Wave E.2 ships only the
+        conley_lag_cutoff = 0 semantic (within-period spatial pairs, with
+        per-period PSU totals summed over periods); the panel-block
+        decomposition (within-unit serial Bartlett HAC over time) would also
+        need cross-period products of the PSU-by-time scores, which the
+        per-period loop never forms. Tracked as a Wave E.2 follow-up in
+        TODO.md.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=180)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            conley_metric="haversine",
+            conley_cutoff_km=self._CUTOFF_KM,
+            conley_lag_cutoff=2,  # panel-block path
+            vcov_type="conley",
+        )
+        with pytest.raises(NotImplementedError, match="conley_lag_cutoff > 0"):
+            est.fit(
+                df_s,
+                outcome="y",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                survey_design=design,
+            )
+
+    def test_j_replicate_weights_rejection_inherits_wave_e1(self):
+        """Replicate-weight variance still raises NotImplementedError under
+        conley+survey (inherits Wave E.1 gate). SurveyDesign requires
+        replicate_weights to be set WITHOUT strata/psu/fpc (the replicate
+        weights encode the design implicitly).
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=19)
+        df_s = df.copy()
+        df_s["w"] = 1.0
+        # Add 10 replicate-weight columns; must be constant within units
+        # (panel survey constraint).
+        rng = np.random.default_rng(19)
+        units = sorted(df_s["unit"].unique())
+        for r in range(10):
+            rep_by_unit = dict(zip(units, rng.uniform(0.5, 2.0, size=len(units))))
+            df_s[f"rep_{r}"] = df_s["unit"].map(rep_by_unit)
+        design = SurveyDesign(
+            weights="w",
+            replicate_weights=[f"rep_{r}" for r in range(10)],
+            replicate_method="JK1",
+        )
+        with pytest.raises(NotImplementedError, match="(?i)replicate|follow-up"):
+            self._fit(df_s, design=design)
+
+    def test_k_non_pweight_rejection_inherits_wave_e1(self):
+        """Non-pweight weight_type is still rejected under conley+survey
+        (ValueError or NotImplementedError; inherits Wave E.1 gate).
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=20)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(
+            weights="w",
+            strata="stratum",
+            psu="psu",
+            fpc="N_h",
+            weight_type="aweight",
+        )
+        with pytest.raises((NotImplementedError, ValueError), match="(?i)pweight|aweight"):
+            self._fit(df_s, design=design)
+
+    def test_l_cluster_plus_conley_plus_survey_warn_and_use_psu(self):
+        """cluster=<col> + conley + survey with different cluster vs PSU ->
+        UserWarning fires; PSU wins (mirrors Wave E.1 warn-and-use-PSU).
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=21)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        # Inject a cluster column distinct from PSU (one cluster per pair
+        # of units, via idx // 2). The warn-and-use-PSU path requires that
+        # cluster and PSU are NOT identical groupings.
+        units_sorted = sorted(df_s["unit"].unique())
+        unit_to_cluster = {u: idx // 2 for idx, u in enumerate(units_sorted)}
+        df_s["my_cluster"] = df_s["unit"].map(unit_to_cluster)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        with pytest.warns(UserWarning, match="(?i)cluster"):
+            res = self._fit(df_s, design=design, cluster="my_cluster")
+        assert np.isfinite(res.se) and res.se > 0
+
+    def test_m_fit_idempotency_under_conley_survey(self):
+        """Refitting the same estimator instance produces identical results;
+        fit() does not mutate survey state (per
+        feedback_fit_does_not_mutate_config).
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=22)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            conley_metric="haversine",
+            conley_cutoff_km=self._CUTOFF_KM,
+            conley_lag_cutoff=0,
+            vcov_type="conley",
+        )
+        res_1 = est.fit(
+            df_s,
+            outcome="y",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+        # Second fit on the SAME estimator instance (idempotency).
+        res_2 = est.fit(
+            df_s,
+            outcome="y",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+        assert res_1.coefficients == res_2.coefficients
+        np.testing.assert_array_equal(res_1.vcov, res_2.vcov)
+        assert res_1.n_psu == res_2.n_psu
+        assert res_1.n_strata == res_2.n_strata
+
+    def test_n0_no_psu_weights_only_survey_design(self):
+        """`SurveyDesign(weights=...)` without explicit PSU — each obs is
+        its own pseudo-PSU. Panel-aware path must re-index PSUs WITHIN
+        each period (not pad zeros across the full panel) or the centering
+        leaks off-period spurious structure into the spatial meat.
+
+        Regression for the R3 P0 fix.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=240)
+        df_s = df.copy()
+        df_s["w"] = 1.0
+        design = SurveyDesign(weights="w")
+        res = self._fit(df_s, design=design)
+        assert np.isfinite(res.att)
+        assert np.isfinite(res.se) and res.se > 0
+
+    def test_n1_no_psu_strata_only_survey_design(self):
+        """`SurveyDesign(weights=..., strata=...)` without explicit PSU —
+        each obs is its own pseudo-PSU under stratified sampling. Same
+        per-period re-indexing requirement as test_n0.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=241)
+        df_s = df.copy()
+        df_s["w"] = 1.0
+        units_sorted = sorted(df_s["unit"].unique())
+        unit_to_stratum = {u: idx % 2 for idx, u in enumerate(units_sorted)}
+        df_s["stratum"] = df_s["unit"].map(unit_to_stratum)
+        design = SurveyDesign(weights="w", strata="stratum")
+        res = self._fit(df_s, design=design)
+        assert np.isfinite(res.att)
+        assert np.isfinite(res.se) and res.se > 0
+
+    def test_b2_explicit_psu_centroid_panel_constant_under_finite_mask(self):
+        """When a PSU contains multiple units at DIFFERENT coordinates
+        (simulating a finite_mask drop that varies coverage across
+        periods), the orchestrator must use PANEL-CONSTANT centroids
+        (mean across all obs in PSU, regardless of period) — NOT
+        per-period centroids. This matches the documented Wave E.2
+        contract "centroid_g = mean over i in PSU g of conley_coords[i]"
+        in REGISTRY.md and prevents support-sample-dependent kernel
+        weights.
+
+        Pure unit test on the orchestrator + helper composition with
+        synthetic per-obs inputs.
+        """
+        from diff_diff.survey import (
+            ResolvedSurveyDesign,
+            _compute_stratified_conley_meat_from_psu_scores,
+        )
+        from diff_diff.two_stage import _compute_stratified_conley_meat
+
+        rng = np.random.default_rng(331)
+        # 2 strata × 2 PSUs × 2 periods × 1 obs per PSU-period = 8 obs.
+        # PSU 0 obs coords differ across periods (simulating finite_mask
+        # variation): period 0 at [40.0, 0]; period 1 at [42.0, 0].
+        # PSU 1/2/3 have constant coords across periods.
+        n, p_2 = 8, 3
+        Psi = rng.standard_normal((n, p_2))
+        psu_id = np.array([0, 1, 2, 3, 0, 1, 2, 3])  # each PSU appears once per period
+        time_arr = np.array([0, 0, 0, 0, 1, 1, 1, 1])
+        # Coords vary across periods for PSU 0 only.
+        coords = np.array(
+            [
+                [40.0, 0.0],  # PSU 0, period 0
+                [40.5, 0.0],  # PSU 1, period 0
+                [50.0, 0.0],  # PSU 2, period 0
+                [50.5, 0.0],  # PSU 3, period 0
+                [42.0, 0.0],  # PSU 0, period 1 — DIFFERENT coord
+                [40.5, 0.0],  # PSU 1, period 1
+                [50.0, 0.0],  # PSU 2, period 1
+                [50.5, 0.0],  # PSU 3, period 1
+            ]
+        )
+        psu_strata_obs = np.array([0, 0, 1, 1, 0, 0, 1, 1])
+        resolved = ResolvedSurveyDesign(
+            weights=np.ones(n),
+            weight_type="pweight",
+            strata=psu_strata_obs,
+            psu=psu_id,
+            fpc=None,
+            n_strata=2,
+            n_psu=4,
+            lonely_psu="remove",
+        )
+        meat_panel = _compute_stratified_conley_meat(
+            Psi,
+            conley_coords=coords,
+            conley_cutoff_km=5.0,
+            conley_metric="euclidean",
+            conley_kernel="bartlett",
+            resolved_survey=resolved,
+            conley_time=time_arr,
+        )
+        # Hand calculation using PANEL-CONSTANT centroids (the contract).
+        # PSU 0 centroid = mean([40.0, 0], [42.0, 0]) = [41.0, 0].
+        # Other PSUs have constant coords → centroid equals that coord.
+        panel_centroids = np.array([[41.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]])
+        # Per-period PSU totals (each PSU appears once per period in this
+        # fixture, so the PSU total per period IS the single obs's Psi).
+        psu_strata = np.array([0, 0, 1, 1])
+        expected = np.zeros((p_2, p_2))
+        for t in [0, 1]:
+            mask = time_arr == t
+            Psi_t = Psi[mask]
+            psu_id_t = psu_id[mask]
+            S_psu_t = np.zeros((4, p_2))
+            for g in range(4):
+                rows = psu_id_t == g
+                if rows.any():
+                    S_psu_t[g] = Psi_t[rows].sum(axis=0)
+            meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+                S_psu_t,
+                psu_strata,
+                panel_centroids,  # panel-constant — same across periods
+                cutoff=5.0,
+                metric="euclidean",
+                kernel="bartlett",
+            )
+            expected += meat_t
+        np.testing.assert_allclose(meat_panel, expected, rtol=1e-12, atol=1e-14)
+        # Counter-check: per-period centroids (the OLD pre-fix design)
+        # would give a different meat. In period 1 they place PSU 0 at
+        # 42.0 (distance 1.5 from PSU 1, Bartlett weight 1 - 1.5/5 = 0.7)
+        # instead of at its panel-constant centroid 41.0 (distance 0.5,
+        # weight 0.9). Verify the orchestrator does NOT match that buggy
+        # construction.
+        buggy_expected = np.zeros((p_2, p_2))
+        period_centroids = {
+            0: np.array([[40.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]),
+            1: np.array([[42.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]),
+        }
+        for t in [0, 1]:
+            mask = time_arr == t
+            Psi_t = Psi[mask]
+            psu_id_t = psu_id[mask]
+            S_psu_t = np.zeros((4, p_2))
+            for g in range(4):
+                rows = psu_id_t == g
+                if rows.any():
+                    S_psu_t[g] = Psi_t[rows].sum(axis=0)
+            meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+                S_psu_t,
+                psu_strata,
+                period_centroids[t],  # per-period (buggy)
+                cutoff=5.0,
+                metric="euclidean",
+                kernel="bartlett",
+            )
+            buggy_expected += meat_t
+        # The buggy construction MUST differ measurably from the
+        # panel-constant orchestrator output.
+        assert not np.allclose(
+            meat_panel, buggy_expected, rtol=1e-3, atol=1e-3
+        ), "orchestrator unexpectedly matches per-period (buggy) centroid construction"
+
+    def test_n2_no_psu_per_period_reindex_unit_invariant(self):
+        """Direct unit test on the orchestrator: the no-PSU per-period
+        re-indexing must NOT mix off-period rows into the kernel. With
+        synthetic data where obs 0/1 are in period 0 (0.01 apart, within
+        the 0.05 euclidean cutoff) and obs 2/3 are in period 1 (far away),
+        the meat must reflect ONLY within-period spatial pairs.
+        """
+        from diff_diff.survey import (
+            ResolvedSurveyDesign,
+            _compute_stratified_conley_meat_from_psu_scores,
+        )
+        from diff_diff.two_stage import _compute_stratified_conley_meat
+
+        rng = np.random.default_rng(243)
+        n, p_2 = 4, 2
+        Psi = rng.standard_normal((n, p_2))
+        # Period 0: obs 0, 1 at lat 40.00 / 40.01 (0.01 apart; cutoff is 0.05).
+        # Period 1: obs 2, 3 at lat 50.00 / 50.01 (far from period-0 obs).
+        coords = np.array([[40.00, 0.0], [40.01, 0.0], [50.00, 0.0], [50.01, 0.0]])
+        time_arr = np.array([0, 0, 1, 1])
+        strata_arr = np.array([0, 0, 1, 1])
+        resolved = ResolvedSurveyDesign(
+            weights=np.ones(n),
+            weight_type="pweight",
+            strata=strata_arr,
+            psu=None,  # implicit per-obs pseudo-PSU
+            fpc=None,
+            n_strata=2,
+            n_psu=n,
+            lonely_psu="remove",
+        )
+        meat_panel = _compute_stratified_conley_meat(
+            Psi,
+            conley_coords=coords,
+            conley_cutoff_km=0.05,
+            conley_metric="euclidean",
+            conley_kernel="bartlett",
+            resolved_survey=resolved,
+            conley_time=time_arr,
+        )
+        # Hand: per-period only on active rows.
+        meat_p0, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+            Psi[:2],
+            np.array([0, 0]),
+            coords[:2],
+            cutoff=0.05,
+            metric="euclidean",
+            kernel="bartlett",
+        )
+        meat_p1, _, _ = _compute_stratified_conley_meat_from_psu_scores(
+            Psi[2:],
+            np.array([0, 0]),
+            coords[2:],
+            cutoff=0.05,
+            metric="euclidean",
+            kernel="bartlett",
+        )
+        np.testing.assert_allclose(meat_panel, meat_p0 + meat_p1, rtol=1e-12, atol=1e-14)
+
+    def test_n_finite_mask_survey_array_subsetting(self):
+        """finite_mask drops baseline-treated rows; survey metadata
+        reflects the SUBSET sample, not the original.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_staggered_dgp(seed=23)
+        # Pin a unit to always-treated (g = period 0); finite_mask will
+        # drop its rows from stage 2.
+        first_unit = sorted(df["unit"].unique())[0]
+        df.loc[df["unit"] == first_unit, "first_treat"] = 0
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            res = self._fit(df_s, design=design)
+        # Survey metadata reflects subset (post-finite_mask), not the full panel.
+        assert res.survey_metadata is not None
+        assert res.n_obs <= len(df_s)  # upper bound; the always-treated unit's rows should be dropped
+
+
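+# Hypothetical reference sketch (illustrative only; NOT part of diff_diff and
+# not called by any test in this module): a dense NumPy transcription of the
+# documented Wave E.2 within-stratum formula
+#   M_h = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in h} K(d_jk / c) (S_j - S_bar_h)(S_k - S_bar_h)'
+# Assumptions, labelled because the helper internals are not shown here:
+# the Bartlett kernel is K(u) = max(0, 1 - u); d is plain euclidean distance
+# on the raw coordinates (no haversine / callable metrics); `fpc_per_psu`
+# holds the stratum population PSU count N_h, so f_h = n_h / N_h. Singleton
+# strata are simply dropped (lonely_psu="remove"); "certainty" and "adjust"
+# are not reproduced.
+def _reference_stratified_bartlett_meat(
+    psu_scores, psu_strata, psu_coords, cutoff, fpc_per_psu=None
+):
+    psu_scores = np.asarray(psu_scores, dtype=float)
+    psu_strata = np.asarray(psu_strata)
+    psu_coords = np.asarray(psu_coords, dtype=float)
+    k = psu_scores.shape[1]
+    meat = np.zeros((k, k))
+    for h in np.unique(psu_strata):
+        mask = psu_strata == h
+        n_h = int(mask.sum())
+        if n_h < 2:
+            continue  # a single PSU carries no within-stratum variance information
+        # Center PSU scores on the stratum mean S_bar_h.
+        S_c = psu_scores[mask] - psu_scores[mask].mean(axis=0, keepdims=True)
+        xy = psu_coords[mask]
+        # Pairwise euclidean distances between the stratum's PSU centroids.
+        d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
+        K = np.clip(1.0 - d / cutoff, 0.0, None)  # Bartlett weights; K(0) = 1
+        if fpc_per_psu is None:
+            one_minus_f = 1.0
+        else:
+            N_h = float(np.asarray(fpc_per_psu, dtype=float)[mask][0])
+            one_minus_f = 1.0 - n_h / N_h
+        # Only within-stratum pairs enter; cross-stratum weights are zero by design.
+        meat += one_minus_f * n_h / (n_h - 1) * (S_c.T @ K @ S_c)
+    return meat
+
+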
+class TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy:
+    """Event-study branch + conley + survey, both is_staggered branches."""
+
+    _CUTOFF_KM = 1000.0
+
+    def test_o_event_study_conley_survey_is_staggered_true(self):
+        """Full plumbing end-to-end on the staggered event-study path."""
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_staggered_dgp(seed=24)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            conley_metric="haversine",
+            conley_cutoff_km=self._CUTOFF_KM,
+            conley_lag_cutoff=0,
+            vcov_type="conley",
+            event_study=True,
+            horizon_max=2,
+        )
+        res = est.fit(
+            df_s,
+            outcome="y",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+        # Event-study + spillover finite end-to-end
+        assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0
+        # spillover_effects populated (not None)
+        assert res.spillover_effects is not None
+        # df_survey uses the survey branch: n_psu - n_strata = 8 - 2 = 6
+        assert res.survey_metadata is not None
+        assert res.survey_metadata.df_survey == 6
+
+    def test_p_event_study_conley_survey_is_staggered_false(self):
+        """The non-staggered branch of the event-study path also works
+        (mirrors `feedback_cohort_loop_trigger_cache_both_branches`).
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=25)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            conley_metric="haversine",
+            conley_cutoff_km=self._CUTOFF_KM,
+            conley_lag_cutoff=0,
+            vcov_type="conley",
+            event_study=True,
+            horizon_max=1,
+        )
+        res = est.fit(
+            df_s,
+            outcome="y",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+        assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0
+        assert res.survey_metadata is not None
+
+    def test_r_drift_goldens(self):
+        """Pinned ATT + SE on a fixed-seed conley+survey fit.
+
+        Drift goldens captured on initial Wave E.2 implementation
+        (seed=999, standard 2-strata x 4-PSU augmentation, cutoff=1000km).
+        `assert_allclose` tolerance acknowledges PSU-aggregation BLAS
+        reduction order variation across CI runners.
+        """
+        from diff_diff import SurveyDesign
+
+        df = generate_butts_nonstaggered_dgp(seed=999)
+        df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+        design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+        est = SpilloverDiD(
+            rings=[0.0, 100.0],
+            conley_coords=("lat", "lon"),
+            conley_metric="haversine",
+            conley_cutoff_km=self._CUTOFF_KM,
+            conley_lag_cutoff=0,
+            vcov_type="conley",
+        )
+        res = est.fit(
+            df_s,
+            outcome="y",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+        # Goldens — pinned on initial Wave E.2 implementation (seed=999,
+        # 2-strata x 4-PSU augmentation, cutoff=1000km). ATT is invariant
+        # to vcov_type, so it matches the Wave E.1 binder golden exactly.
+        # SE is Wave E.2-specific (stratified-Conley sandwich on PSU totals).
+        _WAVE_E2_GOLDEN_ATT = -0.07749624543132044
+        _WAVE_E2_GOLDEN_SE = 0.0006771937420330884
+        np.testing.assert_allclose(res.att, _WAVE_E2_GOLDEN_ATT, rtol=1e-12, atol=1e-14)
+        np.testing.assert_allclose(res.se, _WAVE_E2_GOLDEN_SE, rtol=1e-12, atol=1e-14)
+        # Lock down DOF + n_psu (deterministic).
+        assert res.n_psu == 8
+        assert res.n_strata == 2
+        assert res.survey_metadata.df_survey == 6