
SpilloverDiD vcov_type='conley' + survey_design= via panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2) #474

Merged
igerber merged 1 commit into main from spillover-conley-wave-e2-conley-survey
May 20, 2026

@igerber
Owner

@igerber igerber commented May 20, 2026

Summary

  • SpilloverDiD(vcov_type="conley", survey_design=...) is now supported via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2). This lifts both Wave E.1 NotImplementedError gates: the upfront gate at spillover.py:2201 and the helper-level gate at two_stage.py:217.
  • Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage correction (Butts 2021 §3.1 + Gardner 2022 §4). No reference software combines all three on a two-stage influence function.
  • Panel-aware: preserves the library's existing conley_lag_cutoff = 0 semantic at diff_diff/conley.py:_compute_conley_meat ("within-period spatial only — exclude cross-period pairs"). For each period t, per-obs Hájek-weighted Wave D IF psi_i is aggregated to per-period PSU totals S_psu_t[g] = sum_{i in PSU g, time t} psi_i; per-PSU centroids are panel-constant (mean of per-obs conley_coords within each PSU, computed ONCE on the full active sample); within-stratum sandwich applies the Conley kernel between PSU centroids scaled by Binder FPC (1 - f_h) * n_h/(n_h-1). Cross-stratum kernel weights are exactly zero by sampling design. Total meat is sum_t sum_h M_h_t.
  • Out of scope (deferred follow-ups in TODO.md): conley_lag_cutoff > 0 serial Bartlett HAC composition (fail-closed upfront); replicate-weight variance (inherits Wave E.1 gate); LinearRegression-side conley + survey_design at linalg.py:2853 (separate Bertanha-Imbens Phase 5 roadmap); DiagnosticReport routing for the new combination (Wave F).
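The panel-aware composition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `diff_diff/survey.py` or `diff_diff/two_stage.py`: the function names, the Bartlett spatial kernel, Euclidean distance on planar coordinates, within-stratum centering of the PSU totals, and the skip for single-PSU strata are all assumptions made here for the example.

```python
import numpy as np

def bartlett_kernel(dist, cutoff):
    """Triangular spatial kernel: 1 - d/cutoff inside the cutoff, 0 outside (assumed kernel)."""
    return np.clip(1.0 - dist / cutoff, 0.0, None)

def stratified_conley_meat(psi, period, psu, stratum, coords, cutoff, fpc_frac=None):
    """Hypothetical sketch of sum_t sum_h M_h_t on per-period PSU totals.

    psi      : (n, k) per-observation influence-function rows
    period   : (n,) period labels
    psu      : (n,) PSU labels (each PSU belongs to one stratum)
    stratum  : (n,) stratum labels
    coords   : (n, 2) planar coordinates (Euclidean distance assumed)
    fpc_frac : optional {stratum: f_h} sampling fractions; missing means f_h = 0
    """
    psi, coords = np.asarray(psi, float), np.asarray(coords, float)
    period, psu, stratum = map(np.asarray, (period, psu, stratum))
    k = psi.shape[1]
    # Panel-constant centroids: computed once on the full sample, not per period.
    centroid = {g: coords[psu == g].mean(axis=0) for g in np.unique(psu)}
    meat = np.zeros((k, k))
    for t in np.unique(period):
        in_t = period == t
        for h in np.unique(stratum[in_t]):
            # PSU set is rebuilt from the rows active in this period and stratum.
            active = np.unique(psu[in_t & (stratum == h)])
            n_h = len(active)
            if n_h < 2:
                continue  # single-PSU stratum: skipped in this sketch
            totals = np.stack([psi[in_t & (psu == g)].sum(axis=0) for g in active])
            centered = totals - totals.mean(axis=0)  # assumed Binder-style centering
            cents = np.stack([centroid[g] for g in active])
            dist = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
            weights = bartlett_kernel(dist, cutoff)
            f_h = 0.0 if fpc_frac is None else fpc_frac.get(h, 0.0)
            scale = (1.0 - f_h) * n_h / (n_h - 1)
            # Kernel is applied only within the stratum, so cross-stratum pairs
            # contribute exactly zero by construction.
            meat += scale * centered.T @ weights @ centered
    return meat
```

Two properties follow directly from this construction: with a cutoff small enough that no two centroids fall within it, the kernel matrix is the identity and the result collapses to a Binder-style between-PSU variance of the per-period totals; and because each stratum is handled in its own block, the full meat equals the sum of the meats computed on each stratum separately.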

Methodology references (required if estimator / math changes)

  • Method name(s): SpilloverDiD Wave E.2 — panel-aware stratified-Conley on PSU totals
  • Paper / source link(s):
  • Any intentional deviations from the source (and why): Wave E.2 is a documented novel synthesis — no reference software combines Conley spatial-HAC + Binder TSL + Gardner GMM correction on a two-stage IF. All three sources are cited in docs/methodology/REGISTRY.md Wave E.2 subsection (~L3227) and docs/api/spillover.rst Wave E.2 note block. The synthesis framing leads every documented surface from the first draft per the project's documented-synthesis convention.

Validation

  • Tests added/updated: tests/test_spillover.py — new TestSpilloverDiDWaveE2ConleySurveyDesign (21 tests) and TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy (3 tests). Coverage includes:
    • no-survey conley path bit-identical-to-Wave-D goldens + mock-spy on dispatch routing
    • panel-aware per-period sum invariant on the orchestrator + helper composition
    • multi-coord PSU + simulated finite_mask centroid-stability regression
    • hand-computation methodology anchor
    • single-stratum ≡ plain Conley on per-period PSU totals
    • cross-stratum independence unit test on the survey helper with interleaved centroids
    • Binder vs Conley singleton-adjust FPC skip parity
    • lonely-PSU sensitivity across three modes
    • FPC large ≡ no-FPC, FPC = n_h zeros stratum
    • saturated NaN-fail with pytest.warns(match="Wave E.2 stratified-Conley")
    • replicate-weight + non-pweight + panel-Conley-lag (lag > 0) rejections
    • cluster warn-and-use-PSU, fit idempotency, finite_mask survey-array subsetting
    • no-PSU coverage: weights-only SurveyDesign(weights=...), strata-only SurveyDesign(weights=..., strata=...), per-period re-index unit invariant
    • event-study path on both is_staggered=True/False branches per feedback_cohort_loop_trigger_cache_both_branches; drift goldens at rtol=1e-12 / atol=1e-14
  • Full SpilloverDiD (250 tests) + TwoStageDiD survey (94 tests) suite passes locally. The Rust backend Wave E.2 subset (DIFF_DIFF_BACKEND=rust pytest -k WaveE2) also passes.
  • Backtest / simulation / notebook evidence (if applicable): N/A (methodology PR; no tutorials touched)

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…tified-Conley sandwich on per-period PSU totals (Wave E.2)

Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124)
Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner
GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4)
applied to SpilloverDiD's ring-indicator stage-2 design. No reference
software combines all three on a two-stage influence function.

Panel-aware composition (preserves the library's existing
`conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat`
— "within-period spatial only, exclude cross-period pairs"): per-PSU
spatial centroids are panel-constant (mean of per-obs `conley_coords`
within each PSU, computed once on the full active sample). For each
period t, SpilloverDiD's per-obs Hájek-weighted Wave D IF psi_i is
aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t}
psi_i`; the within-stratum sandwich applies the Conley kernel between
panel-constant PSU centroids scaled by the Binder FPC factor
`(1 - f_h) * n_h/(n_h-1)`. Cross-stratum kernel weights are exactly zero
by sampling design. Total meat is `sum_t sum_h M_h_t`.
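Written out, one reading of the composition in the paragraph above is the following. The symbols `K` (Conley kernel), `d` (distance between centroids), `c_g` (panel-constant centroid of PSU g), `n_h` (number of PSUs in stratum h), `f_h` (sampling fraction in stratum h), and the within-stratum mean `\bar{S}_{t,h}` are notation introduced here for illustration. Centering the PSU totals on their per-period stratum mean is an assumption based on the Binder TSL form rather than something this message states explicitly.

```latex
\hat{M} \;=\; \sum_{t} \sum_{h} (1 - f_h)\,\frac{n_h}{n_h - 1}
  \sum_{g \in h} \sum_{g' \in h}
  K\!\bigl(d(c_g, c_{g'})\bigr)\,
  \bigl(S_{t,g} - \bar{S}_{t,h}\bigr)\bigl(S_{t,g'} - \bar{S}_{t,h}\bigr)^{\top},
\qquad
S_{t,g} \;=\; \sum_{i \,\in\, \text{PSU } g,\ \text{time } t} \psi_i .
```

The inner double sum runs only over PSU pairs in the same stratum and the same period, which is what makes cross-stratum and cross-period kernel weights zero.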

Implementation:
- New `_compute_stratified_conley_meat_from_psu_scores` helper in
  `diff_diff/survey.py` (parallel to existing Binder helper; per-stratum
  Conley sandwich; under lonely_psu="adjust", singleton strata `continue`
  past the FPC scaling, for parity with Binder).
- New panel-aware dispatch wrapper `_compute_stratified_conley_meat` in
  `diff_diff/two_stage.py`: precomputes panel-constant centroids per
  explicit PSU; per-period loop re-builds the PSU set from ACTIVE rows
  in each period (handles both explicit-PSU and implicit-PSU=obs
  layouts correctly without zero-padding off-period rows).
- `_compute_gmm_corrected_meat` conley branch routes to the new wrapper
  when `resolved_survey is not None`; the `resolved_survey is None`
  branch is bit-identical to Wave D.
- Lifts `spillover.py:2201` upfront and `two_stage.py:217` helper-level
  NotImplementedError gates on conley+survey.
- Upfront gate stays for `conley_lag_cutoff > 0` (serial Bartlett HAC
  composition is a separate follow-up in TODO.md).
- Saturated-design NaN-fail mirrors Wave E.1
  ("Wave E.2 stratified-Conley sandwich: df_survey = 0..." UserWarning).
- `cluster_ids` intentionally dropped at the dispatch boundary (after
  PSU aggregation every PSU is its own cluster; threading would zero
  all cross-PSU kernel pairs).

Out of scope (deferred to follow-up): `conley_lag_cutoff > 0` serial
Bartlett composition with the panel-aware stratified-Conley spatial
sandwich; replicate-weight variance (inherits Wave E.1 gate);
LinearRegression-side conley+survey at `linalg.py:2853` (separate
Bertanha-Imbens Phase 5 roadmap); DiagnosticReport routing for the
new combination (Wave F).

Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` (21 tests including
no-survey conley path bit-identical-to-Wave-D + mock-spy on dispatch;
panel-aware per-period sum invariant on orchestrator + helper
composition; multi-coord PSU + finite_mask centroid-stability
regression; hand-computation methodology anchor; single-stratum ≡ plain
Conley on PSU totals; cross-stratum independence on survey helper;
Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU
sensitivity; FPC large ≡ no-FPC, FPC = n_h zeros stratum; saturated
NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`;
replicate-weight + non-pweight + panel-Conley-lag rejections; cluster
warn-and-use-PSU; fit idempotency; finite_mask survey-array
subsetting; no-PSU coverage — weights-only `SurveyDesign(weights=...)`,
strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period
re-index unit invariant pinning that no cross-period spatial pairs leak
into the meat on implicit-PSU layouts). Plus
`TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` (3 tests:
event-study path on both `is_staggered` branches; drift goldens at
`rtol=1e-12 / atol=1e-14`). Full SpilloverDiD (250 tests) + TwoStageDiD
survey (94 tests) suite passes. Rust backend Wave E.2 tests
(`DIFF_DIFF_BACKEND=rust pytest -k WaveE2`) all pass.

Docs: REGISTRY + spillover.rst + CHANGELOG + llms.txt + README +
references.rst synthesis-framing first-draft; Wave E.1 entry's "Public
surface restrictions" bullet updated to past-tense the conley+survey
gate reference; TODO.md Wave E.2 row deleted; new follow-up row added
for the `conley_lag_cutoff > 0` serial Bartlett composition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

Overall Assessment

✅ Looks good

Executive Summary

  • No unmitigated P0/P1 findings in the changed SpilloverDiD Conley+survey path.
  • The estimator change is implemented as the documented Wave E.2 synthesis in the Methodology Registry, not as an undocumented methodology deviation, and the code matches that documented scope in docs/methodology/REGISTRY.md:L3227-L3248, diff_diff/two_stage.py:L540-L794, and diff_diff/survey.py:L1901-L2052.
  • The new path preserves the pre-existing no-survey Conley branch and keeps inference NaN-safe by returning NaN meat on survey saturation and routing downstream inference through safe_inference in diff_diff/two_stage.py:L360-L396 and diff_diff/spillover.py:L3188-L3286.
  • Edge-case coverage is strong for the changed code paths: no-PSU survey layouts, cluster-vs-PSU routing, lonely-PSU modes, finite-mask subsetting, panel-aware per-period aggregation, centroid stability, and both event-study branches all have targeted tests in tests/test_spillover.py:L5652-L6581.
  • Remaining limitations such as conley_lag_cutoff > 0, replicate-weight variance, and DiagnosticReport routing are explicitly rejected and/or tracked in TODO.md, so they are not blockers under the stated review rubric.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3-informational. Impact: The main out-of-scope items fail closed and are already tracked: serial Bartlett HAC composition for conley_lag_cutoff > 0 is rejected in diff_diff/spillover.py:L2200-L2229 and tracked in TODO.md:L139-L139; replicate-weight variance and DiagnosticReport routing are likewise tracked in TODO.md:L137-L145. Concrete fix: None for this PR.

Security

  • No findings.

Documentation/Tests

  • No findings. The registry/docs were updated alongside the implementation, and the new tests cover the major changed paths in tests/test_spillover.py:L5652-L6581. I could not execute the suite in this environment because pytest is not installed.

@igerber
Owner Author

igerber commented May 20, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: f076614709e6688a5493036e21ee4fd575b47fce


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior AI review had no P1+ findings to re-check, and I did not identify any new unmitigated P0/P1 issues in the changed diff.
  • The new Wave E.2 SpilloverDiD(vcov_type="conley", survey_design=...) path matches the project’s documented synthesis: per-period PSU aggregation, panel-constant PSU centroids, within-stratum Conley application, Binder-style FPC scaling, and explicit fail-closed rejection of conley_lag_cutoff > 0 are all aligned between code and registry. docs/methodology/REGISTRY.md:L3227-L3248, diff_diff/two_stage.py:L540-L794, diff_diff/survey.py:L1901-L2052, diff_diff/spillover.py:L2200-L2229.
  • Edge-case handling on the changed path is strong: no-survey dispatch parity, survey-saturation NaN-fail, no-PSU layouts, centroid-stability regression coverage, cluster-vs-PSU routing, and both event-study branches are all exercised in tests/test_spillover.py:L5686-L6581.
  • Remaining limitations are fail-closed and tracked rather than silently accepted: replicate-weight variance, conley_lag_cutoff > 0 under survey, and DiagnosticReport routing. TODO.md:L137-L145
  • I could not execute the test suite here because this environment does not have pytest, numpy, scipy, or pandas installed.

Methodology

  • Severity: P3-informational
    Impact: I did not find an undocumented methodology mismatch. The implementation stays within the cited source scope: Butts’ spillover paper uses Conley-style spatial HAC for inference, Conley is the spatial-HAC foundation, Gardner’s two-stage paper requires a consistent variance estimator for valid asymptotic inference, and Gerber’s survey paper extends Binder-style stratified-cluster variance logic to IF-based modern DiD estimators. The changed code composes those pieces in the same way the registry documents: Psi is aggregated to PSU totals within period, centroids are panel-constant, the inner meat is computed within stratum, and unsupported serial HAC composition is rejected up front. docs/methodology/REGISTRY.md:L3231-L3248, diff_diff/two_stage.py:L366-L381, diff_diff/two_stage.py:L540-L794, diff_diff/survey.py:L1901-L2052, diff_diff/spillover.py:L2200-L2229.
    Concrete fix: None.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3-informational
    Impact: The main deferred items introduced or referenced by this PR are properly fail-closed and tracked in TODO.md, so they are not blockers under the review rubric: serial Bartlett composition for conley_lag_cutoff > 0, replicate-weight variance, and downstream DiagnosticReport wiring. TODO.md:L137-L145, diff_diff/spillover.py:L2210-L2229
    Concrete fix: None for this PR.

Security

  • No findings.

Documentation/Tests

  • No findings.
  • Residual risk: I could not run the new tests in this environment. On inspection, the changed tests are appropriately targeted to the new variance path, especially tests/test_spillover.py:L5686-L6581.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 20, 2026
@igerber igerber merged commit 32b4c67 into main May 20, 2026
33 of 34 checks passed
@igerber igerber deleted the spillover-conley-wave-e2-conley-survey branch May 20, 2026 15:32
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
…e E.2 follow-up)

Extends the panel-aware stratified-Conley spatial sandwich (Wave E.2 cross-
sectional, PR igerber#474) to `conley_lag_cutoff > 0` by adding a within-PSU serial
Bartlett HAC term (Newey-West 1987 separable form). The composition
`meat = meat_spatial + meat_serial` has disjoint index sets, exactly matching
the no-survey panel-block decomposition at
`diff_diff.conley._compute_conley_meat`.

Methodology — documented synthesis of:
- Conley (1999) spatial-HAC
- Newey-West (1987) serial Bartlett kernel weights `(1 - |t-s|/(L+1))`
- Binder (1983) / Gerber (2026) Prop 1 stratified TSL on Wave D Gardner GMM
  influence functions

Serial term uses per-period within-stratum centering (Binder TSL form,
matching the spatial helper); panel-wide per-stratum FPC (the serial sum is a
panel-level construct, so the cluster set is panel-wide); hardcoded Bartlett
serial kernel regardless of `conley_kernel` (mirrors `conley.py:951-965`);
panel-wide dense time codes for lag math (matches `conley.py:940` R deviation).
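As a rough illustration of the serial term described above, the sketch below accumulates Bartlett-weighted cross-period products of the same PSU's centered totals. It is not the `_compute_stratified_serial_bartlett_meat` helper: the function name, the `(T, G, k)` array layout, the balanced-panel assumption, and the skip for single-PSU strata are simplifications made for this example.

```python
import numpy as np

def serial_bartlett_meat(totals, stratum_of_psu, lag_cutoff, fpc_frac=None):
    """Hypothetical sketch of a within-PSU serial Bartlett term.

    totals         : (T, G, k) per-period PSU totals on dense time codes 0..T-1
                     (balanced panel assumed for simplicity)
    stratum_of_psu : (G,) stratum label of each PSU
    lag_cutoff     : L in the Bartlett weight 1 - |t - s| / (L + 1)
    fpc_frac       : optional {stratum: f_h}; missing means f_h = 0
    """
    totals = np.asarray(totals, float)
    stratum_of_psu = np.asarray(stratum_of_psu)
    T, _, k = totals.shape
    meat = np.zeros((k, k))
    if T == 1 or lag_cutoff == 0:
        return meat  # no cross-period pairs to add
    for h in np.unique(stratum_of_psu):
        idx = np.flatnonzero(stratum_of_psu == h)
        n_h = len(idx)
        if n_h < 2:
            continue  # single-PSU stratum: skipped in this sketch
        block = totals[:, idx, :]
        # Per-period within-stratum centering.
        centered = block - block.mean(axis=1, keepdims=True)
        f_h = 0.0 if fpc_frac is None else fpc_frac.get(h, 0.0)
        scale = (1.0 - f_h) * n_h / (n_h - 1)  # panel-wide per-stratum factor
        for t in range(T):
            for s in range(T):
                lag = abs(t - s)
                if lag == 0 or lag > lag_cutoff:
                    continue  # same-period pairs belong to the spatial term
                weight = 1.0 - lag / (lag_cutoff + 1.0)
                # Row g of centered[t] pairs with row g of centered[s]:
                # same PSU, different periods.
                meat += scale * weight * centered[t].T @ centered[s]
    return meat
```

Because the loop skips `lag == 0`, this term only ever touches same-PSU, different-period pairs, while the spatial term only touches same-period pairs. That is the sense in which the two index sets are disjoint and the two meats can simply be added.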

Supported surface — requires an effective PSU: either an explicit
`survey_design.psu` OR a `cluster=<col>` argument that gets injected as the
effective PSU per Wave E.1's `_inject_cluster_as_psu` routing. No-effective-PSU
survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise
`NotImplementedError` post-resolution at `SpilloverDiD.fit` per
`feedback_no_silent_failures`: the pseudo-PSU = obs-index fallback would
silently zero the serial sum (each pseudo-PSU appears in exactly one period).
Routing the serial loop to `conley_unit` would mix IF allocators with the
spatial term and is queued as a follow-up.
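The reason the pseudo-PSU fallback would silently zero the serial sum can be seen by counting the pairs the serial term ranges over. The helper below is a hypothetical illustration, not library code: it counts ordered observation pairs that share a PSU label and sit within the lag window in different periods.

```python
import numpy as np

def count_serial_pairs(psu, period, lag_cutoff):
    """Count ordered (i, j) pairs in the same PSU with 0 < |t_i - t_j| <= lag_cutoff."""
    psu, period = np.asarray(psu), np.asarray(period)
    same_psu = psu[:, None] == psu[None, :]
    lag = np.abs(period[:, None] - period[None, :])
    return int(np.sum(same_psu & (lag > 0) & (lag <= lag_cutoff)))

period = np.repeat([0, 1, 2], 4)          # 3 periods, 4 observations each
explicit_psu = np.tile([0, 0, 1, 1], 3)   # two PSUs observed in every period
pseudo_psu = np.arange(period.size)       # obs-index fallback: one PSU per row

print(count_serial_pairs(explicit_psu, period, lag_cutoff=1))  # 32
print(count_serial_pairs(pseudo_psu, period, lag_cutoff=1))    # 0
```

With one pseudo-PSU per row, no label recurs across periods, so the serial term would have nothing to sum and would return zero without any signal that the requested lag had been ignored.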

Code changes:
- New sibling helper `_compute_stratified_serial_bartlett_meat` in
  `diff_diff/two_stage.py` (T=1 short-circuit, three-mode singleton-stratum
  branching with FPC inside the multi-PSU block to avoid divide-by-zero,
  panel-wide mean for `lonely_psu='adjust'`, zeroed centering for
  singleton-active-period cells so raw scores don't leak into the serial
  Bartlett cross-products under unbalanced panels)
- Orchestrator `_compute_stratified_conley_meat` extended with
  `conley_lag_cutoff` kwarg; spatial loop unchanged; serial helper called
  after spatial loop when `L > 0`
- Dispatch in `_compute_gmm_corrected_meat` conley branch threads
  `conley_lag_cutoff` through
- `spillover.py:2210` Wave E.2-era `NotImplementedError` gate for lag>0 +
  survey deleted; replaced with post-resolution fail-closed gate that fires
  only when `resolved_survey_fit.psu` is None AFTER cluster injection (so
  the documented `cluster=<col>` injection surface continues to work)

Tests — 24 new methods across two classes
(`TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` and
`TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy`):
- `test_a` lag=0 strict bit-identity to shipped Wave E.2 meat
- `test_a2` lag=0 does NOT invoke serial helper (mock-spy)
- `test_b` lag=1 invokes serial helper exactly once (mock-spy)
- `test_c0` raw-vs-centered hand-check pins Binder TSL centering
- `test_c1`/`test_c2` hand-computation methodology anchors at L=1 and L=2
- `test_c3` AR(1) DGP serial inflation behavioral pin (rho=0.7, > 5%)
- `test_d` single-stratum lag=1 finite output
- `test_e` cross-stratum independence of serial term (partition + sum)
- `test_f` singleton-adjust + lag=1 no divide-by-zero
- `test_f2` all-singleton-remove + lag=1 returns zero meat
- `test_g` unbalanced panel + panel-wide dense time codes (hand-computed)
- `test_g2` lag > T-1 well-defined
- `test_h` singleton-active-period centering zeros (sparse-period regression)
- `test_j` no-survey panel-block conley unchanged after gate relaxation
- `test_k` replicate-weight rejection still fires
- `test_l` cluster + lag=1 + survey warn-and-use-PSU
- `test_m` fit-idempotency under lag=1 + survey
- `test_n`/`test_n2` no-effective-PSU survey + lag>0 raises NotImplementedError
- `test_n3` cluster-injected effective-PSU surface fits + matches explicit PSU
- `test_r` drift goldens at lag=1 vs lag=0 (ATT invariant, SE differs)
- `test_o`/`test_p`/`test_r` event-study mirror (both is_staggered branches)

Existing `test_j0_panel_conley_lag_cutoff_rejected_under_survey` (Wave E.2-era
gate-assertion) deleted.

Docs:
- REGISTRY `Variance (Wave E.2 follow-up)` subsection with documented-
  synthesis framing + cross-references + effective-PSU restriction
- `spillover.rst` Wave E.2 follow-up stanza
- CHANGELOG `[Unreleased]` bullet
- `llms.txt` + `README.md` catalog entries updated
- `references.rst` adds Newey-West (1987)
- TODO row deleted (old deferral); new row added for the no-effective-PSU
  follow-up tail

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley
  on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes additive; bit-equal defaults preserved across the affected
estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>