SpilloverDiD vcov_type='conley' + survey_design= via panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2) by igerber · Pull Request #474 · igerber/diff-diff

igerber · 2026-05-20T12:51:17Z

Summary

SpilloverDiD(vcov_type="conley", survey_design=...) is now supported via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2). This lifts both Wave E.1 NotImplementedError gates: the upfront gate at spillover.py:2201 and the helper-level gate at two_stage.py:217.
Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage correction (Butts 2021 §3.1 + Gardner 2022 §4). No reference software combines all three on a two-stage influence function.
Panel-aware: preserves the library's existing conley_lag_cutoff = 0 semantic in diff_diff/conley.py:_compute_conley_meat ("within-period spatial only, exclude cross-period pairs"). Per-PSU centroids are panel-constant: the mean of per-obs conley_coords within each PSU, computed ONCE on the full active sample. For each period t, the per-obs Hájek-weighted Wave D IF psi_i is aggregated to per-period PSU totals S_psu_t[g] = sum_{i in PSU g, time t} psi_i. The within-stratum sandwich then applies the Conley kernel between PSU centroids, scaled by the Binder FPC factor (1 - f_h) * n_h/(n_h-1). Cross-stratum kernel weights are exactly zero by sampling design. Total meat is sum_t sum_h M_h_t.
Out of scope (deferred follow-ups in TODO.md): conley_lag_cutoff > 0 serial Bartlett HAC composition (fail-closed upfront); replicate-weight variance (inherits Wave E.1 gate); LinearRegression-side conley + survey_design at linalg.py:2853 (separate Bertanha-Imbens Phase 5 roadmap); DiagnosticReport routing for the new combination (Wave F).
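The sketch below illustrates the sum_t sum_h M_h_t structure described above. It is a minimal NumPy sketch, not the diff_diff implementation. Several pieces are assumptions made for illustration:

- the function names conley_weights and stratified_conley_meat;
- the Bartlett-in-distance spatial kernel;
- using the per-period count of active PSUs in a stratum as n_h.

Each (stratum, period) cell is centered, weighted by the kernel between panel-constant centroids, and scaled by (1 - f_h) * n_h/(n_h-1). The loop never pairs rows from different strata, so cross-stratum weights are zero by construction.

```python
import numpy as np


def conley_weights(centroids, cutoff):
    """Bartlett-in-distance kernel between PSU centroids (illustrative choice).

    centroids: (G, 2) planar coordinates; cutoff: distance bandwidth.
    Returns a (G, G) matrix with 1 on the diagonal and 0 beyond the cutoff.
    """
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.clip(1.0 - dist / cutoff, 0.0, None)


def stratified_conley_meat(S, psu_time, psu_stratum, centroids, cutoff, fpc_pop=None):
    """Hypothetical sketch of sum_t sum_h M_h_t on per-period PSU totals.

    S:           (R, k) per-period PSU score totals, one row per active (PSU, period) cell
    psu_time:    (R,) period code of each row
    psu_stratum: (R,) stratum code of each row
    centroids:   (R, 2) panel-constant centroid of each row's PSU
    fpc_pop:     optional dict mapping stratum -> population PSU count N_h
    """
    k = S.shape[1]
    meat = np.zeros((k, k))
    for t in np.unique(psu_time):
        for h in np.unique(psu_stratum[psu_time == t]):
            rows = (psu_time == t) & (psu_stratum == h)
            n_h = int(rows.sum())
            if n_h < 2:
                continue  # singleton stratum-period cell: skipped in this sketch
            f_h = 0.0 if fpc_pop is None else n_h / fpc_pop[h]
            scale = (1.0 - f_h) * n_h / (n_h - 1)  # Binder FPC factor
            Sc = S[rows] - S[rows].mean(axis=0)  # within-stratum centering
            K = conley_weights(centroids[rows], cutoff)  # within-stratum pairs only
            meat += scale * Sc.T @ K @ Sc
    return meat
```

If the cutoff is smaller than every inter-centroid distance, K becomes the identity and each cell reduces to the plain Binder TSL term.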

Methodology references (required if estimator / math changes)

Method name(s): SpilloverDiD Wave E.2 — panel-aware stratified-Conley on PSU totals
Paper / source link(s):
- Conley (1999), "GMM Estimation with Cross Sectional Dependence," Journal of Econometrics 92(1) — https://doi.org/10.1016/S0304-4076(98)00084-0
- Gerber (2026), Proposition 1 (Binder TSL for IF representations) — https://arxiv.org/abs/2605.04124
- Butts (2023, originally 2021), "Difference-in-Differences with Spatial Spillovers" §3.1 — https://arxiv.org/abs/2105.03737
- Gardner (2022), "Two-stage differences in differences" §4 — https://arxiv.org/abs/2207.05943
Any intentional deviations from the source (and why): Wave E.2 is a documented novel synthesis. No reference software combines Conley spatial-HAC + Binder TSL + Gardner GMM correction on a two-stage IF. All three sources are cited in the docs/methodology/REGISTRY.md Wave E.2 subsection (~L3227) and in the docs/api/spillover.rst Wave E.2 note block. Following the project's documented-synthesis convention, every documented surface leads with this synthesis framing, starting from the first draft.

Validation

Tests added/updated: tests/test_spillover.py — new TestSpilloverDiDWaveE2ConleySurveyDesign (21 tests) and TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy (3 tests). Coverage includes:
- no-survey conley path bit-identical-to-Wave-D goldens + mock-spy on dispatch routing
- panel-aware per-period sum invariant on the orchestrator + helper composition
- multi-coord PSU + simulated finite_mask centroid-stability regression
- hand-computation methodology anchor
- single-stratum ≡ plain Conley on per-period PSU totals
- cross-stratum independence unit test on the survey helper with interleaved centroids
- Binder vs Conley singleton-adjust FPC skip parity
- lonely-PSU sensitivity across three modes
- FPC large ≡ no-FPC, FPC = n_h zeros stratum
- saturated NaN-fail with pytest.warns(match="Wave E.2 stratified-Conley")
- replicate-weight + non-pweight + panel-Conley-lag (lag > 0) rejections
- cluster warn-and-use-PSU, fit idempotency, finite_mask survey-array subsetting
- no-PSU coverage: weights-only SurveyDesign(weights=...), strata-only SurveyDesign(weights=..., strata=...), per-period re-index unit invariant
- event-study path on both is_staggered=True/False branches per feedback_cohort_loop_trigger_cache_both_branches; drift goldens at rtol=1e-12 / atol=1e-14
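Several of the FPC-related tests above pin limits of the Binder scale factor. The tiny sketch below shows why those limits hold. It uses a hypothetical helper name, binder_scale, and assumes that N_h is the stratum's population PSU count; that is an assumption about how the FPC is supplied.

- f_h = n_h/N_h = 1 drives the stratum's contribution to zero.
- A very large N_h recovers the no-FPC factor n_h/(n_h-1).
- A singleton stratum has no defined factor, so it is skipped.

```python
def binder_scale(n_h, N_h=None):
    """Hypothetical Binder stratum scale (1 - f_h) * n_h / (n_h - 1), f_h = n_h / N_h.

    N_h=None means no finite-population correction (f_h = 0).
    Returns None for a singleton stratum (n_h < 2), mirroring a "skip" branch.
    """
    if n_h < 2:
        return None  # n_h / (n_h - 1) is undefined for a singleton stratum
    f_h = 0.0 if N_h is None else n_h / N_h
    return (1.0 - f_h) * n_h / (n_h - 1)


print(binder_scale(4, 4))     # FPC equal to n_h: the stratum contributes nothing
print(binder_scale(4, 1e15))  # huge population: essentially the no-FPC factor 4/3
```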
The full SpilloverDiD (250 tests) and TwoStageDiD survey (94 tests) suites pass locally. The Rust-backend Wave E.2 subset (DIFF_DIFF_BACKEND=rust pytest -k WaveE2) also passes.
Backtest / simulation / notebook evidence (if applicable): N/A (methodology PR; no tutorials touched)

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…tified-Conley sandwich on per-period PSU totals (Wave E.2)

Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4), applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three on a two-stage influence function.

Panel-aware composition: this preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only, exclude cross-period pairs"). Per-PSU spatial centroids are panel-constant: the mean of per-obs `conley_coords` within each PSU, computed once on the full active sample. For each period t, SpilloverDiD's per-obs Hájek-weighted Wave D IF psi_i is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i`. The within-stratum sandwich applies the Conley kernel between panel-constant PSU centroids, scaled by the Binder FPC factor `(1 - f_h) * n_h/(n_h-1)`. Cross-stratum kernel weights are exactly zero by sampling design. Total meat is `sum_t sum_h M_h_t`.

Implementation:
- New `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py`. It runs parallel to the existing Binder helper and computes the per-stratum Conley sandwich. Singleton strata under lonely_psu="adjust" hit `continue` and skip the FPC, for parity with Binder.
- New panel-aware dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py`. It precomputes panel-constant centroids per explicit PSU. Its per-period loop rebuilds the PSU set from the ACTIVE rows in each period, which handles both explicit-PSU and implicit-PSU=obs layouts correctly without zero-padding off-period rows.
- The `_compute_gmm_corrected_meat` conley branch routes to the new wrapper when `resolved_survey is not None`; the `resolved_survey is None` branch is bit-identical to Wave D.
- Lifts the `spillover.py:2201` upfront and `two_stage.py:217` helper-level NotImplementedError gates on conley+survey.
- The upfront gate stays for `conley_lag_cutoff > 0`; serial Bartlett HAC composition is a separate follow-up in TODO.md.
- The saturated-design NaN-fail mirrors Wave E.1 ("Wave E.2 stratified-Conley sandwich: df_survey = 0..." UserWarning).
- `cluster_ids` are intentionally dropped at the dispatch boundary. After PSU aggregation every PSU is its own cluster, so threading cluster_ids through would zero all cross-PSU kernel pairs.

Out of scope (deferred to follow-up):
- `conley_lag_cutoff > 0` serial Bartlett composition with the panel-aware stratified-Conley spatial sandwich;
- replicate-weight variance (inherits the Wave E.1 gate);
- LinearRegression-side conley+survey at `linalg.py:2853` (separate Bertanha-Imbens Phase 5 roadmap);
- DiagnosticReport routing for the new combination (Wave F).

Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` (21 tests), covering:
- no-survey conley path bit-identical to Wave D + mock-spy on dispatch
- panel-aware per-period sum invariant on orchestrator + helper composition
- multi-coord PSU + finite_mask centroid-stability regression
- hand-computation methodology anchor
- single-stratum ≡ plain Conley on PSU totals
- cross-stratum independence on the survey helper
- Binder vs Conley singleton-adjust FPC skip parity
- lonely-PSU sensitivity
- FPC large ≡ no-FPC, FPC = n_h zeros stratum
- saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`
- replicate-weight + non-pweight + panel-Conley-lag rejections
- cluster warn-and-use-PSU
- fit idempotency
- finite_mask survey-array subsetting
- no-PSU coverage: weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts
Plus `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` (3 tests: event-study path on both `is_staggered` branches; drift goldens at `rtol=1e-12 / atol=1e-14`). The full SpilloverDiD (250 tests) and TwoStageDiD survey (94 tests) suites pass. The Rust backend Wave E.2 tests (`DIFF_DIFF_BACKEND=rust pytest -k WaveE2`) all pass.

Docs:
- REGISTRY + spillover.rst + CHANGELOG + llms.txt + README + references.rst carry the synthesis framing from the first draft.
- The Wave E.1 entry's "Public surface restrictions" bullet now refers to the conley+survey gate in the past tense.
- The TODO.md Wave E.2 row is deleted.
- A new follow-up row is added for the `conley_lag_cutoff > 0` serial Bartlett composition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
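The commit message above describes three pieces of panel-aware bookkeeping:

- centroids are computed once per PSU;
- per-period totals are rebuilt from active rows only;
- cluster_ids are dropped after aggregation.

These can be sketched with pandas. The column names (psu, time, x, y, psi) and the toy kernel matrix K are hypothetical, and this is not the `_compute_stratified_conley_meat` wrapper itself.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format panel: one row per observation, several per (PSU, period).
panel = pd.DataFrame({
    "psu":  ["a", "a", "a", "a", "b", "b"],
    "time": [1, 1, 2, 2, 1, 2],
    "x":    [0.0, 2.0, 0.0, 2.0, 5.0, 5.0],
    "y":    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    "psi":  [0.5, -0.2, 0.1, 0.3, -0.4, 0.2],
})

# Panel-constant centroids: computed ONCE over the full active sample,
# so a PSU's location does not drift from period to period.
centroids = panel.groupby("psu")[["x", "y"]].mean()

# Per-period PSU totals S_psu_t[g]: only (period, PSU) cells with active rows
# appear; absent cells are not zero-padded.
totals = panel.groupby(["time", "psu"])["psi"].sum()

# Implicit PSU = observation: each row is its own pseudo-PSU inside its own
# period, so no cross-period pairs can form between pseudo-PSUs.
implicit_totals = (
    panel.assign(obs=np.arange(len(panel)))
    .groupby(["time", "obs"])["psi"]
    .sum()
)

# Why cluster ids are dropped after aggregation: if each PSU total is its own
# cluster, the same-cluster mask is the identity and erases every cross-PSU weight.
labels = np.asarray(centroids.index, dtype=str)
same_cluster = labels[:, None] == labels[None, :]
K = np.array([[1.0, 0.6], [0.6, 1.0]])  # toy spatial kernel between PSUs a and b
masked = K * same_cluster
```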

github-actions · 2026-05-20T13:00:16Z

Overall Assessment

✅ Looks good

Executive Summary

No unmitigated P0/P1 findings in the changed SpilloverDiD Conley+survey path.
The estimator change is implemented as the documented Wave E.2 synthesis in the Methodology Registry, not as an undocumented methodology deviation, and the code matches that documented scope in docs/methodology/REGISTRY.md:L3227-L3248, diff_diff/two_stage.py:L540-L794, and diff_diff/survey.py:L1901-L2052.
The new path preserves the pre-existing no-survey Conley branch and keeps inference NaN-safe by returning NaN meat on survey saturation and routing downstream inference through safe_inference in diff_diff/two_stage.py:L360-L396 and diff_diff/spillover.py:L3188-L3286.
Edge-case coverage is strong for the changed code paths: no-PSU survey layouts, cluster-vs-PSU routing, lonely-PSU modes, finite-mask subsetting, panel-aware per-period aggregation, centroid stability, and both event-study branches all have targeted tests in tests/test_spillover.py:L5652-L6581.
Remaining limitations such as conley_lag_cutoff > 0, replicate-weight variance, and DiagnosticReport routing are explicitly rejected and/or tracked in TODO.md, so they are not blockers under the stated review rubric.
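The review mentions NaN-safe behavior under survey saturation. Below is a minimal sketch of that behavior. It assumes the conventional survey design degrees of freedom, the number of PSUs minus the number of strata. The helper names survey_df and meat_or_nan are hypothetical, and the library's actual check and warning text may differ.

```python
import warnings

import numpy as np


def survey_df(n_psu, n_strata):
    """Conventional design degrees of freedom: #PSUs minus #strata (assumption)."""
    return n_psu - n_strata


def meat_or_nan(meat, n_psu, n_strata):
    """Return the meat unchanged, or an all-NaN matrix plus a warning if saturated."""
    df = survey_df(n_psu, n_strata)
    if df <= 0:
        warnings.warn(
            f"stratified-Conley sandwich: df_survey = {df}; returning NaN meat",
            UserWarning,
        )
        return np.full_like(meat, np.nan)
    return meat
```

A NaN meat propagates through the sandwich, so downstream standard errors and p-values come out NaN instead of as spuriously small finite values.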

Methodology

Severity: P3-informational. Impact: Wave E.2 is a documented synthesis, and I did not find an undocumented mismatch in the changed estimator math, weighting, variance construction, or default behavior. The implementation matches the registry’s stated panel-aware sum_t sum_h M_h_t structure, centering/FPC treatment, cluster-to-PSU routing, and explicit conley_lag_cutoff > 0 rejection in docs/methodology/REGISTRY.md:L3231-L3248, diff_diff/two_stage.py:L559-L794, diff_diff/survey.py:L1920-L2052, and diff_diff/spillover.py:L2200-L2229. Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3-informational. Impact: The main out-of-scope items fail closed and are already tracked: serial Bartlett HAC composition for conley_lag_cutoff > 0 is rejected in diff_diff/spillover.py:L2200-L2229 and tracked in TODO.md:L139-L139; replicate-weight variance and DiagnosticReport routing are likewise tracked in TODO.md:L137-L145. Concrete fix: None for this PR.

Security

No findings.

Documentation/Tests

No findings. The registry/docs were updated alongside the implementation, and the new tests cover the major changed paths in tests/test_spillover.py:L5652-L6581. I could not execute the suite in this environment because pytest is not installed.

igerber · 2026-05-20T13:41:56Z

/ai-review

github-actions · 2026-05-20T13:50:23Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f076614709e6688a5493036e21ee4fd575b47fce

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: the prior AI review had no P1+ findings to re-check, and I did not identify any new unmitigated P0/P1 issues in the changed diff.
The new Wave E.2 SpilloverDiD(vcov_type="conley", survey_design=...) path matches the project’s documented synthesis: per-period PSU aggregation, panel-constant PSU centroids, within-stratum Conley application, Binder-style FPC scaling, and explicit fail-closed rejection of conley_lag_cutoff > 0 are all aligned between code and registry. docs/methodology/REGISTRY.md:L3227-L3248, diff_diff/two_stage.py:L540-L794, diff_diff/survey.py:L1901-L2052, diff_diff/spillover.py:L2200-L2229.
Edge-case handling on the changed path is strong: no-survey dispatch parity, survey-saturation NaN-fail, no-PSU layouts, centroid-stability regression coverage, cluster-vs-PSU routing, and both event-study branches are all exercised in tests/test_spillover.py:L5686-L6581.
Remaining limitations are fail-closed and tracked rather than silently accepted: replicate-weight variance, conley_lag_cutoff > 0 under survey, and DiagnosticReport routing. TODO.md:L137-L145
I could not execute the test suite here because this environment does not have pytest, numpy, scipy, or pandas installed.

Methodology

Severity: P3-informational
Impact: I did not find an undocumented methodology mismatch. The implementation stays within the cited source scope:
- Butts’ spillover paper uses Conley-style spatial HAC for inference.
- Conley is the spatial-HAC foundation.
- Gardner’s two-stage paper requires a consistent variance estimator for valid asymptotic inference.
- Gerber’s survey paper extends Binder-style stratified-cluster variance logic to IF-based modern DiD estimators.

The changed code composes those pieces in the same way the registry documents: Psi is aggregated to PSU totals within period, centroids are panel-constant, the inner meat is computed within stratum, and unsupported serial HAC composition is rejected up front. docs/methodology/REGISTRY.md:L3231-L3248, diff_diff/two_stage.py:L366-L381, diff_diff/two_stage.py:L540-L794, diff_diff/survey.py:L1901-L2052, diff_diff/spillover.py:L2200-L2229.
Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3-informational
Impact: The main deferred items introduced or referenced by this PR are properly fail-closed and tracked in TODO.md, so they are not blockers under the review rubric: serial Bartlett composition for conley_lag_cutoff > 0, replicate-weight variance, and downstream DiagnosticReport wiring. TODO.md:L137-L145, diff_diff/spillover.py:L2210-L2229
Concrete fix: None for this PR.

Security

No findings.

Documentation/Tests

No findings.
Residual risk: I could not run the new tests in this environment. On inspection, the changed tests are appropriately targeted to the new variance path, especially tests/test_spillover.py:L5686-L6581.

…e E.2 follow-up)

Extends the panel-aware stratified-Conley spatial sandwich (Wave E.2 cross-sectional, PR igerber#474) to `conley_lag_cutoff > 0` by adding a within-PSU serial Bartlett HAC term (Newey-West 1987 separable form). The composition `meat = meat_spatial + meat_serial` has disjoint index sets, exactly matching the no-survey panel-block decomposition at `diff_diff.conley._compute_conley_meat`.

Methodology (documented synthesis of):
- Conley (1999) spatial-HAC
- Newey-West (1987) serial Bartlett kernel weights `(1 - |t-s|/(L+1))`
- Binder (1983) / Gerber (2026) Prop 1 stratified TSL on Wave D Gardner GMM influence functions

The serial term uses:
- per-period within-stratum centering (Binder TSL form, matching the spatial helper);
- a panel-wide per-stratum FPC (the serial sum is a panel-level construct, so the cluster set is panel-wide);
- a hardcoded Bartlett serial kernel regardless of `conley_kernel` (mirrors `conley.py:951-965`);
- panel-wide dense time codes for lag math (matches the `conley.py:940` R deviation).

Supported surface: requires an effective PSU. That is either an explicit `survey_design.psu` OR a `cluster=<col>` argument, which is injected as the effective PSU per Wave E.1's `_inject_cluster_as_psu` routing. No-effective-PSU survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise `NotImplementedError` post-resolution at `SpilloverDiD.fit`, per `feedback_no_silent_failures`. The pseudo-PSU = obs-index fallback would silently zero the serial sum, because each pseudo-PSU appears in exactly one period. Routing the serial loop to `conley_unit` would mix IF allocators with the spatial term and is queued as a follow-up.
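The serial term can be sketched as a loop over same-PSU, different-period pairs, weighted by the Bartlett factor 1 - |t-s|/(L+1). This is a minimal sketch with several assumptions:

- the per-period PSU totals are already centered within stratum-period;
- the period codes are dense integers;
- the names serial_bartlett_meat and bartlett_lag_weight are hypothetical;
- the per-stratum FPC scale is omitted for brevity.

Only t != s pairs inside one PSU enter, so the index set is disjoint from the within-period spatial term. For the same reason, a layout where every pseudo-PSU appears in a single period yields an all-zero matrix.

```python
import numpy as np


def bartlett_lag_weight(t, s, L):
    """Newey-West Bartlett weight 1 - |t - s| / (L + 1), zero at lags beyond L."""
    return max(0.0, 1.0 - abs(t - s) / (L + 1))


def serial_bartlett_meat(C, psu, time, L):
    """Hypothetical within-PSU serial term: sum_g sum_{t != s} w(t, s) c_gt c_gs'.

    C:    (R, k) per-period PSU totals, ALREADY centered within stratum-period
    psu:  (R,) PSU label of each row
    time: (R,) dense integer period code of each row
    L:    lag cutoff; L = 0 gives an all-zero matrix
    """
    k = C.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(psu):
        idx = np.flatnonzero(psu == g)
        for a in idx:
            for b in idx:
                if time[a] == time[b]:
                    continue  # same-period pairs belong to the spatial term
                w = bartlett_lag_weight(time[a], time[b], L)
                meat += w * np.outer(C[a], C[b])  # both (t, s) and (s, t) orders
    return meat
```

For one PSU observed in periods 0 and 1 with scalar totals 1 and 2, L=1 gives weight 0.5 on each ordered pair, for a total of 0.5*2 + 0.5*2 = 2.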
Code changes:
- New sibling helper `_compute_stratified_serial_bartlett_meat` in `diff_diff/two_stage.py`:
  - T=1 short-circuit;
  - three-mode singleton-stratum branching, with the FPC inside the multi-PSU block to avoid divide-by-zero;
  - panel-wide mean for `lonely_psu='adjust'`;
  - zeroed centering for singleton-active-period cells, so raw scores don't leak into the serial Bartlett cross-products under unbalanced panels.
- Orchestrator `_compute_stratified_conley_meat` extended with a `conley_lag_cutoff` kwarg. The spatial loop is unchanged; the serial helper is called after the spatial loop when `L > 0`.
- Dispatch in the `_compute_gmm_corrected_meat` conley branch threads `conley_lag_cutoff` through.
- The `spillover.py:2210` Wave E.2-era `NotImplementedError` gate for lag>0 + survey is deleted. It is replaced with a post-resolution fail-closed gate that fires only when `resolved_survey_fit.psu` is None AFTER cluster injection, so the documented `cluster=<col>` injection surface continues to work.

Tests: 24 new methods across two classes (`TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` and `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy`):
- `test_a` lag=0 strict bit-identity to shipped Wave E.2 meat
- `test_a2` lag=0 does NOT invoke serial helper (mock-spy)
- `test_b` lag=1 invokes serial helper exactly once (mock-spy)
- `test_c0` raw-vs-centered hand-check pins Binder TSL centering
- `test_c1`/`test_c2` hand-computation methodology anchors at L=1 and L=2
- `test_c3` AR(1) DGP serial inflation behavioral pin (rho=0.7, > 5%)
- `test_d` single-stratum lag=1 finite output
- `test_e` cross-stratum independence of serial term (partition + sum)
- `test_f` singleton-adjust + lag=1 no divide-by-zero
- `test_f2` all-singleton-remove + lag=1 returns zero meat
- `test_g` unbalanced panel + panel-wide dense time codes (hand-computed)
- `test_g2` lag > T-1 well-defined
- `test_h` singleton-active-period centering zeros (sparse-period regression)
- `test_j` no-survey panel-block conley unchanged after gate relaxation
- `test_k` replicate-weight rejection still fires
- `test_l` cluster + lag=1 + survey warn-and-use-PSU
- `test_m` fit-idempotency under lag=1 + survey
- `test_n`/`test_n2` no-effective-PSU survey + lag>0 raises NotImplementedError
- `test_n3` cluster-injected effective-PSU surface fits + matches explicit PSU
- `test_r` drift goldens at lag=1 vs lag=0 (ATT invariant, SE differs)
- `test_o`/`test_p`/`test_r` event-study mirror (both is_staggered branches)

The existing `test_j0_panel_conley_lag_cutoff_rejected_under_survey` (Wave E.2-era gate assertion) is deleted.

Docs:
- REGISTRY `Variance (Wave E.2 follow-up)` subsection with documented-synthesis framing + cross-references + effective-PSU restriction
- `spillover.rst` Wave E.2 follow-up stanza
- CHANGELOG `[Unreleased]` bullet
- `llms.txt` + `README.md` catalog entries updated
- `references.rst` adds Newey-West (1987)
- TODO row deleted (old deferral); new row added for the no-effective-PSU follow-up tail

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
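On unbalanced panels, the "panel-wide dense time codes" point matters; test_g exercises it. The small NumPy sketch below uses a hypothetical toy panel to contrast two ways of coding periods: over the whole panel, and per PSU. With panel-wide codes, a PSU that skips a period keeps its true lag distance; with per-PSU codes, that distance would be compressed.

```python
import numpy as np

# Hypothetical unbalanced panel: PSU "b" is not observed in 2002.
psu = np.array(["a", "a", "a", "b", "b"])
year = np.array([2000, 2002, 2004, 2000, 2004])

# Dense codes over the PANEL-WIDE period set: 2000 -> 0, 2002 -> 1, 2004 -> 2.
periods, time_code = np.unique(year, return_inverse=True)


def bartlett_weight(lag, L):
    """Newey-West Bartlett weight for an integer lag; zero beyond L."""
    return max(0.0, 1.0 - abs(lag) / (L + 1))


# Panel-wide coding: PSU "b" keeps a lag distance of 2 between its observations.
b_codes = time_code[psu == "b"]
lag_b = int(b_codes[1] - b_codes[0])

# Per-PSU coding (what panel-wide codes avoid) would compress that gap to 1.
_, b_codes_per_psu = np.unique(year[psu == "b"], return_inverse=True)
lag_b_per_psu = int(b_codes_per_psu[1] - b_codes_per_psu[0])
```

With L=1, the panel-wide lag of 2 receives weight 0. Per-PSU coding would assign weight 0.5 to a pair that is actually two periods apart.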

Release notes consolidate 8 PRs since 3.4.0 (2026-05-19).

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes are additive; bit-equal defaults are preserved across the affected estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label May 20, 2026

igerber merged commit 32b4c67 into main May 20, 2026
33 of 34 checks passed

igerber deleted the spillover-conley-wave-e2-conley-survey branch May 20, 2026 15:32

This was referenced May 21, 2026

SpilloverDiD conley + survey + lag>0 via panel-block composition (Wave E.2 follow-up) #477

Merged

Release 3.4.1: SpilloverDiD survey + Conley lifts, SunAbraham vcov_type, WLS-CR2 BM, methodology-tracker promotions #480

Merged

igerber merged 1 commit into main from spillover-conley-wave-e2-conley-survey
