diff --git a/CHANGELOG.md b/CHANGELOG.md index 3a851f73..66f4405d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,9 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added +- **`SpilloverDiD(vcov_type="conley", survey_design=...)` integration via stratified-Conley sandwich on PSU totals (Wave E.2).** Lifts the Wave E.1 `NotImplementedError` (`spillover.py:2201` upfront, `two_stage.py:217` helper-level) and adds spatial-HAC + design-based variance for the previously deferred composition. **Documented synthesis** of Conley (1999) spatial-HAC × Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) × Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three ingredients on a two-stage influence function. **Mechanical composition (panel-aware):** preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only — exclude cross-period spatial pairs") by looping over periods. For each period `t`, SpilloverDiD's per-obs Hájek-weighted Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` (via `np.add.at`); per-PSU spatial centroids are panel-constant (mean of per-observation `conley_coords` within each PSU, vectorized `np.add.at` sums / `np.bincount` counts); for each stratum the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / conley_cutoff_km) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it; the survey helper accepts `"uniform"` too but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop matches the library's panel Conley contract exactly. **Reduction semantics (load-bearing for tests):** the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals (pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`); single stratum (H = 1, FPC = inf) reduces to `sum_t` plain Conley sandwich on per-period PSU totals (NOT on time-collapsed totals). **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing `_compute_stratified_meat_from_psu_scores` 3-tuple `(meat, variance_computed, legitimate_zero_count)` contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(scores=centered, coords=psu_coords_h, ...)` in cross-sectional mode); new dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to existing `_compute_binder_tsl_meat`, performs per-obs Psi → PSU aggregation + centroid derivation + dispatch to survey helper, intentionally drops `cluster_ids` at the dispatch boundary — see Restrictions). `_compute_gmm_corrected_meat` conley branch extended with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D. **Singleton-stratum `lonely_psu="adjust"` parity:** the survey helper mirrors the Binder helper's `continue` to skip the FPC scale on singleton strata (with `n_h = 1` the scale `n_h / (n_h - 1)` would divide by zero); the degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust output. **Saturated `df_survey = 0` NaN-fail:** mirrors Wave E.1 (`_compute_stratified_conley_meat` returns NaN meat with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..." so callers can `pytest.warns(UserWarning, match="Wave E.2 stratified-Conley")`). **Public surface restrictions:** replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope); `cluster= + survey_design.psu + vcov_type="conley"` coerces `cluster=` to PSU per Wave E.1's warn-and-use-PSU pattern (the Conley cluster product kernel becomes a no-op after PSU aggregation, so `cluster_ids` is intentionally not threaded into the inner Conley kernel call — every PSU is its own cluster post-aggregation, which would zero all cross-PSU pairs); LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E); DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` requires `_APPLICABILITY` / `_PT_METHOD` registration (separate Wave F PR). **Tests:** new `TestSpilloverDiDWaveE2ConleySurveyDesign` and `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` classes in `tests/test_spillover.py` (bit-identical no-survey fallback; panel-aware per-period sum invariant on the orchestrator + helper composition; hand-computation methodology anchor; single-stratum ≡ plain Conley on PSU totals; cross-stratum independence as a unit test on the survey helper with interleaved cross-stratum centroids; Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU sensitivity across three modes; FPC large ≡ no-FPC and FPC = n_h zeros stratum; saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`; replicate-weight + non-pweight rejections; cluster warn-and-use-PSU; fit idempotency; `finite_mask` survey-array subsetting; no-PSU coverage — weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts; event-study path on both `is_staggered=True`/`False` branches per `feedback_cohort_loop_trigger_cache_both_branches`; drift goldens at `rtol=1e-12 / atol=1e-14`). The pre-existing `tests/test_spillover.py::test_fit_conley_plus_survey_design_not_implemented` Wave E.1-era gate-assertion test is removed (replaced by the positive-path tests above). Wave E.1 entry's "Public surface restrictions" bullet updated to past-tense the conley+survey gate reference. - **SunAbraham `vcov_type` parameter (Phase 1b PR 1/8).** `SunAbraham(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` (defaults to `"hc1"`, which preserves prior behavior bit-equally - SA historically hard-coded HC1). Auto-cluster-at-unit dropped when the user opts into explicit `vcov_type="hc2"` or `vcov_type="classical"` (one-way only); preserved for `"hc1"` and `"hc2_bm"`. When `vcov_type in {"classical","hc2","hc2_bm"}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (mirrors TWFE Gate 1 from PR #469): FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection. Empirically matches R `lm()` summary classical SE, `sandwich::vcovHC(type="HC2")`, and `clubSandwich::vcovCR(..., type="CR2")` + `coef_test()$df_Satt` at atol=1e-10 (cohort SE and BM DOF pinned in `tests/test_methodology_sun_abraham.py`). For `vcov_type="hc2_bm"`, the user-facing aggregated inference (`event_study_effects[e]['p_value']`/`['conf_int']`, `overall_p_value`/`overall_conf_int`) uses CR2 Bell-McCaffrey contrast DOF — matches `clubSandwich::Wald_test(test="HTZ")$df_denom` at atol=1e-10 (mirrors PR #465's `_compute_cr2_bm_contrast_dof` pattern for MultiPeriodDiD's post-period-average ATT). `vcov_type` is now propagated to `SunAbrahamResults.vcov_type` for downstream introspection. `SurveyDesign` (any kind — analytical weights, stratified, PSU, or replicate-weight) combined with `vcov_type in {"classical","hc2","hc2_bm"}` raises `NotImplementedError`: the survey-design TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs. `conley` rejected at `__init__` with a deferral message (would require threading 6+ `conley_*` params through the saturated regression call). **Deviation from R:** SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) on typical panel sizes due to a different `(n-k)` finite-sample correction (fixest counts absorbed FE in k_total; SA's `solve_ols` counts only within-transformed columns); the IW aggregation step is otherwise identical (pinned at atol=5e-3, tracked in TODO.md). First PR of the Phase 1b standalone-estimator threading initiative (7 PRs to follow: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD). - **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row. -- **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). **Cluster + survey resolution:** when `cluster=` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). **Public surface restrictions:** `vcov_type="conley" + survey_design=` raises `NotImplementedError` pointing at planned Wave E.2 (Conley × survey product-kernel synthesis with within-stratum Conley sandwich on PSU totals); replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. **Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection. +- **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). **Cluster + survey resolution:** when `cluster=` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). **Public surface restrictions:** `vcov_type="conley" + survey_design=` originally raised `NotImplementedError` pointing at planned Wave E.2; lifted in the Wave E.2 entry above (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. **Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection. ## [3.4.0] - 2026-05-19 diff --git a/README.md b/README.md index 761d7046..a9b7e238 100644 --- a/README.md +++ b/README.md @@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`. - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance -- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1/CR1 +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up) - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/TODO.md b/TODO.md index 9098ef1e..361ca9a2 100644 --- a/TODO.md +++ b/TODO.md @@ -134,9 +134,9 @@ Deferred items from PR reviews that were not addressed before merge. | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low | | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium | | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low | -| `SpilloverDiD(vcov_type="conley", survey_design=...)` composition (Wave E.2). Wave E.1 ships HC1 / CR1 + survey via Binder TSL (Gerber 2026 Prop 1 + Wave D GMM). Wave E.2 needs the novel within-stratum Conley sandwich on PSU totals — no reference software combines Conley spatial-HAC with Binder TSL on a two-stage IF. Methodologically: aggregate Psi to PSU totals first, demean within stratum, then apply within-stratum Conley sandwich. Strata become an exact independence partition (no kernel weight crosses stratum boundaries by sampling design). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_corrected_meat` | follow-up (Wave E.2) | Medium | | `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low | | `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium | +| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` serial Bartlett HAC composition. Wave E.2 ships the panel-aware `conley_lag_cutoff = 0` case ("within-period spatial only" — `sum_t sum_h M_h_t` per `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`) and raises `NotImplementedError` upfront at `spillover.py:fit` on `conley_lag_cutoff > 0`. The serial Bartlett component (within-unit / within-PSU temporal HAC at lag ≤ L) needs to compose with the panel-aware stratified-Conley spatial sandwich — the natural addition is `meat_serial = sum_g sum_{|t-s|<=L, t!=s} (1 - |t-s|/(L+1)) * (S_psu_t[g] - S_bar_h_t)(S_psu_s[g] - S_bar_h_s)'` per PSU, summed across all PSUs in each stratum, with appropriate Binder FPC scaling — plus a methodology call on whether to include cross-period spatial pairs in the serial term. Regression goldens vs the cross-sectional limit (lag=0, which is now the shipped path). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_conley_meat` | follow-up (Wave E.2 follow-up) | Medium | | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low | | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low | | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium | diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index c0dc4936..6ae688fe 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance -- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL composed with Wave D Gardner GMM correction (Conley × survey + replicate weights queued as follow-up) +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up) - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py index fe0466a3..a1f0db11 100644 --- a/diff_diff/spillover.py +++ b/diff_diff/spillover.py @@ -2190,22 +2190,43 @@ def fit( heteroskedasticity-robust SE with the GMM correction. """ # Wave E.1: lift the Wave B/C/D upfront survey_design rejection. - # The full resolution block (pweight gate, replicate gate, unit-constant + # Wave E.2 (this PR): conley × survey is now supported via a + # stratified-Conley sandwich on PSU totals (composition of Conley + # 1999 + Gerber 2026 Prop 1 Binder TSL + Wave D Gardner GMM). The + # full resolution block (pweight gate, replicate gate, unit-constant # check, cluster-vs-PSU warn) runs AFTER `_validate_spillover_inputs` # below so it sees the panel columns the validator guarantees. # - # The conley × survey composition is genuinely novel methodology - # (no reference software combines spatial-HAC + Binder TSL on a - # two-stage IF) and ships separately as Wave E.2. Reject upfront so - # users get the pointer without waiting through stage 1 / 2 work. - if survey_design is not None and self.vcov_type == "conley": + # Wave E.2 scope-limit (upfront, before resolution / panel work): + # the panel-block Conley HAC (`conley_lag_cutoff > 0`) is NOT + # composed with the survey path in this PR. The stratified-Conley + # helper applies a cross-sectional kernel on PSU-aggregated totals; + # composing the within-unit serial Bartlett HAC with the within- + # stratum cross-PSU spatial kernel requires carrying PSU-by-time + # scores into the meat construction, which is a separate Wave E.x + # follow-up tracked in TODO.md. Reject upfront with a clear pointer + # so users running `survey_design=` + `conley_lag_cutoff > 0` get + # the error before stage-1 / 2 work (per `feedback_no_silent_failures`). + if ( + survey_design is not None + and self.vcov_type == "conley" + and self.conley_lag_cutoff is not None + and self.conley_lag_cutoff > 0 + ): raise NotImplementedError( - "SpilloverDiD does not yet support vcov_type='conley' " - "combined with survey_design=. Wave E.2 (planned) will " - "compose Conley spatial-HAC with within-stratum Conley " - "sandwich on PSU totals; see TODO.md for the planned PR. " - "For Wave E.1, use vcov_type='hc1' (with optional " - "cluster= for CR1) plus survey_design=." + "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) " + "combined with survey_design= is not supported in Wave E.2. " + "The Wave E.2 stratified-Conley sandwich aggregates Psi to " + "PSU totals before applying the cross-sectional Conley " + "kernel; the panel-block decomposition (within-unit serial " + "Bartlett HAC over time) would require carrying PSU-by-time " + "scores and composing the serial kernel with the within-" + "stratum cross-PSU spatial kernel. This composition is " + "queued as a follow-up (see TODO.md). For Wave E.2, use " + "conley_lag_cutoff=0 (cross-sectional Conley) with " + "survey_design=, or use survey_design= with " + "vcov_type='hc1' (+ cluster= for CR1) for the full " + "Wave E.1 path." ) # Validate `anticipation` up front: must be a non-negative integer. # Accepting fractional or negative values would silently shift diff --git a/diff_diff/survey.py b/diff_diff/survey.py index f2461501..89a75a60 100644 --- a/diff_diff/survey.py +++ b/diff_diff/survey.py @@ -1898,6 +1898,160 @@ def _compute_stratified_meat_from_psu_scores( return meat, _variance_computed, legitimate_zero_count +def _compute_stratified_conley_meat_from_psu_scores( + psu_scores: np.ndarray, + psu_strata: np.ndarray, + psu_coords: np.ndarray, + *, + cutoff: float, + metric, + kernel: str, + fpc_per_psu: "Optional[np.ndarray]" = None, + lonely_psu: str = "remove", +) -> Tuple[np.ndarray, bool, int]: + """Wave E.2 stratified-Conley meat on PSU-aggregated scores. + + Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) + Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D + Gardner GMM first-stage uncertainty correction (Butts 2021 ss3.1 + + Gardner 2022 ss4). Used by SpilloverDiD's Wave E.2 GMM sandwich when + ``vcov_type="conley"`` is combined with ``survey_design=``. + + Per-stratum loop: demean PSU scores within the stratum, apply the + cross-sectional Conley kernel between PSU centroids in that stratum, + scale by the Binder finite-population correction + ``(1 - f_h) * n_h/(n_h-1)``. Cross-stratum kernel weights are zero by + sampling design (strata are exact independence partitions); total meat + is the sum across strata. + + Parameters + ---------- + psu_scores : np.ndarray + Score matrix of shape (G, k) — one row per PSU. + psu_strata : np.ndarray + Stratum assignment per PSU, shape (G,). + psu_coords : np.ndarray + Per-PSU spatial centroid coordinates, shape (G, 2). Typically the + mean of per-observation ``conley_coords`` within each PSU. + cutoff : float + Conley spatial-HAC bandwidth in the same units as ``psu_coords`` + (km when ``metric="haversine"``). + metric : str or callable + Distance metric; ``"haversine"`` / ``"euclidean"`` / callable per + :mod:`diff_diff.conley` (``ConleyMetric``). + kernel : str + Spatial kernel: ``"bartlett"`` or ``"uniform"``. + fpc_per_psu : np.ndarray, optional + FPC population size per PSU, shape (G,). All PSUs in the same + stratum should share the same FPC value (first occurrence used). + lonely_psu : str + How to handle singleton strata: ``"remove"``, ``"certainty"``, or + ``"adjust"``. Matches the existing + :func:`_compute_stratified_meat_from_psu_scores` behaviour exactly, + including the ``"adjust"`` branch's ``continue`` that skips FPC + scaling (with ``n_h=1`` the scale ``n_h/(n_h-1)`` would divide by + zero). + + Returns + ------- + meat : np.ndarray + Meat matrix of shape (k, k). + variance_computed : bool + Whether any actual variance computation happened. + legitimate_zero_count : int + Number of strata that legitimately contribute zero variance. + + Notes + ----- + Reduction semantics (load-bearing for tests): + + - bandwidth -> 0 (Bartlett: ``K(d/tiny) = 0`` for ``d > 0`` and + ``K(0) = 1`` on the diagonal so K is the identity matrix): the + within-stratum sandwich ``sum_{j,k} K_jk c_j c_k' = sum_j c_j c_j' + = centered.T @ centered``, which is precisely Binder's formula at + :func:`_compute_stratified_meat_from_psu_scores`. + - Single stratum (H = 1, FPC = inf): reduces to ordinary Conley + sandwich on PSU totals via :func:`diff_diff.conley._compute_conley_meat`. + + No reference software combines all three ingredients (Conley + spatial-HAC + Binder TSL + Gardner GMM correction) on a two-stage + influence function. + """ + from diff_diff.conley import _compute_conley_meat + + if psu_scores.ndim == 1: + psu_scores = psu_scores[:, np.newaxis] + k = psu_scores.shape[1] + meat = np.zeros((k, k)) + + unique_strata = np.unique(psu_strata) + _variance_computed = False + legitimate_zero_count = 0 + + _global_psu_mean = None + if lonely_psu == "adjust": + _global_psu_mean = psu_scores.mean(axis=0, keepdims=True) + + for h in unique_strata: + mask_h = psu_strata == h + scores_h = psu_scores[mask_h] + coords_h = psu_coords[mask_h] + n_psu_h = scores_h.shape[0] + + if n_psu_h < 2: + if lonely_psu == "remove": + continue + elif lonely_psu == "certainty": + legitimate_zero_count += 1 + continue + elif lonely_psu == "adjust": + # Degenerate one-PSU kernel K = [[K(0)]] = [[1.0]] for both + # Bartlett and uniform; equivalent to centered.T @ centered. + # MUST `continue` to skip the FPC block below — with n_h = 1 + # the scale n_h/(n_h-1) divides by zero. Mirrors the Binder + # helper's singleton-adjust branch exactly. + centered = scores_h - _global_psu_mean + with np.errstate(invalid="ignore", over="ignore"): + meat += centered.T @ centered + _variance_computed = True + continue + + f_h = 0.0 + if fpc_per_psu is not None: + N_h = fpc_per_psu[mask_h][0] + if N_h < n_psu_h: + raise ValueError( + f"FPC ({N_h}) is less than the number of PSUs " + f"({n_psu_h}) in stratum. FPC must be >= n_PSU." + ) + f_h = n_psu_h / N_h + if f_h >= 1.0: + legitimate_zero_count += 1 + + psu_mean_h = scores_h.mean(axis=0, keepdims=True) + centered = scores_h - psu_mean_h + + # Within-stratum Conley sandwich on PSU-centered scores. Pass + # ``cluster_ids=None`` explicitly: after PSU aggregation every PSU + # is its own cluster, so a cluster product kernel would zero all + # cross-PSU pairs. See Wave E.2 plan Chunk 3 step 4. + conley_meat_h = _compute_conley_meat( + centered, + coords_h, + cutoff, + metric, + kernel, + cluster_ids=None, + ) + + adjustment = (1.0 - f_h) * (n_psu_h / (n_psu_h - 1)) + with np.errstate(invalid="ignore", over="ignore"): + meat += adjustment * conley_meat_h + _variance_computed = True + + return meat, _variance_computed, legitimate_zero_count + + def compute_survey_vcov( X: np.ndarray, residuals: np.ndarray, diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py index 370560f4..51c15bed 100644 --- a/diff_diff/two_stage.py +++ b/diff_diff/two_stage.py @@ -125,9 +125,20 @@ def _compute_gmm_corrected_meat( TSL has its own ``(1-f_h) * n_h/(n_h-1)`` correction). - ``vcov_type="cluster"``: ``cluster_ids`` IS the PSU (via upstream ``_inject_cluster_as_psu``); identical to the HC1+survey branch. - - ``vcov_type="conley"``: raises ``NotImplementedError``. Wave E.2 - (planned) will compose Conley spatial-HAC with within-stratum - Conley sandwich on PSU totals. + - ``vcov_type="conley"`` (cross-sectional only — ``conley_lag_cutoff = 0``): + Wave E.2 stratified-Conley sandwich on PSU totals via + :func:`_compute_stratified_conley_meat`. Aggregates Psi to PSU + totals + derives per-PSU centroids as the mean of per-obs + ``conley_coords``; for each stratum applies the Conley kernel + between PSU centroids scaled by ``(1 - f_h) * n_h/(n_h-1)``. + Cross-stratum kernel weights are zero by sampling design. + - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block + Conley): raises ``NotImplementedError`` upstream at + ``SpilloverDiD.fit``. The panel-block decomposition would need to + compose the within-unit serial Bartlett HAC with the within-stratum + cross-PSU spatial kernel on PSU-by-time scores rather than the + collapsed PSU totals; out of Wave E.2 scope and tracked as a + follow-up in ``TODO.md``. **`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance` pattern at `two_stage.py:1886-1917`): factorize ``X_10' W X_10`` via @@ -212,16 +223,10 @@ def _compute_gmm_corrected_meat( cluster_ids=cluster_ids, ) - # Wave E.1: reject the conley × survey composition. Wave E.2 (planned) - # will add the within-stratum Conley sandwich on PSU totals. - if vcov_type == "conley" and resolved_survey is not None: - raise NotImplementedError( - "SpilloverDiD does not yet support vcov_type='conley' combined " - "with survey_design=. Wave E.2 (planned) will compose Conley " - "spatial-HAC with within-stratum Conley sandwich on PSU totals; " - "see TODO.md for the planned PR. For now, use vcov_type='hc1' " - "(+ cluster= for CR1) with survey_design=." - ) + # Wave E.2 (this PR): conley × survey is now supported via the + # stratified-Conley sandwich on PSU totals. Dispatch happens inside + # the vcov_type == "conley" branch below (Wave E.1 already routed + # hc1 / cluster + survey to the Binder TSL helper). # 1. gamma_hat = (X_10' W X_10)^{-1} (X_1' W X_2). Mirror the existing # TwoStageDiD method at two_stage.py:1886-1917 — sparse_factorized @@ -358,19 +363,37 @@ def _compute_gmm_corrected_meat( "_compute_gmm_corrected_meat: vcov_type='conley' requires " "conley_coords, conley_cutoff_km, and conley_metric." ) - # Delegate to the shared kernel-application helper. No finite-sample - # multiplier on the Conley path (matches conleyreg / Wave B convention). - meat = _compute_conley_meat( - Psi, - conley_coords, - conley_cutoff_km, - conley_metric, - conley_kernel, - time=conley_time, - unit=conley_unit, - lag_cutoff=conley_lag_cutoff, - cluster_ids=cluster_ids, - ) + if resolved_survey is not None: + # Wave E.2: stratified-Conley sandwich on PSU totals. cluster_ids + # is intentionally NOT threaded through — after PSU aggregation + # every PSU is its own cluster, so a cluster product kernel + # would zero all cross-PSU pairs. Wave E.1's + # _resolve_effective_cluster path already coerced any + # user-supplied cluster= into PSU upstream. + meat = _compute_stratified_conley_meat( + Psi, + conley_coords=np.asarray(conley_coords, dtype=np.float64), + conley_cutoff_km=conley_cutoff_km, + conley_metric=conley_metric, + conley_kernel=conley_kernel, + resolved_survey=resolved_survey, + conley_time=conley_time, # panel-aware per-period sandwich + ) + else: + # Wave D no-survey Conley path UNCHANGED — bit-identical fallback. + # No finite-sample multiplier on the Conley path (matches conleyreg + # / Wave B convention). + meat = _compute_conley_meat( + Psi, + conley_coords, + conley_cutoff_km, + conley_metric, + conley_kernel, + time=conley_time, + unit=conley_unit, + lag_cutoff=conley_lag_cutoff, + cluster_ids=cluster_ids, + ) else: raise ValueError( f"_compute_gmm_corrected_meat: vcov_type must be one of " @@ -514,6 +537,263 @@ def _compute_binder_tsl_meat( return meat +def _compute_stratified_conley_meat( + Psi: np.ndarray, + *, + conley_coords: np.ndarray, + conley_cutoff_km: float, + conley_metric, + conley_kernel: str, + resolved_survey: "ResolvedSurveyDesign", + conley_time: Optional[np.ndarray] = None, +) -> np.ndarray: + """Wave E.2 panel-aware stratified-Conley meat on PSU-by-time scores. + + Composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) + Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D + Gardner GMM first-stage uncertainty correction (Butts 2021 ss3.1 + + Gardner 2022 ss4) applied to SpilloverDiD's ring-indicator stage-2 + design. No reference software combines all three ingredients on a + two-stage influence function. + + **Panel-aware composition (preserves the library's panel Conley + contract):** for each period ``t``, aggregate per-obs Psi to PSU + totals WITHIN that period (``S_psu_t[g] = sum_{i in PSU g, time t} + Psi[i]``); derive each PSU's spatial centroid as the mean of + per-observation ``conley_coords`` (panel-constant — PSU is a sampling + unit with fixed location); apply the per-stratum Conley sandwich on + ``S_psu_t`` via + :func:`diff_diff.survey._compute_stratified_conley_meat_from_psu_scores` + (Binder FPC factor ``(1 - f_h) * n_h/(n_h-1)``); sum across periods. + Cross-period spatial pairs are excluded by construction, matching the + library's existing ``conley_lag_cutoff = 0`` semantic ("within-period + spatial only") at :func:`diff_diff.conley._compute_conley_meat`. + Cross-stratum kernel weights are zero by sampling design (strata are + exact independence partitions). + + Parameters + ---------- + Psi : np.ndarray of shape (n, p_2) + Per-obs Wave D Gardner GMM influence-function scores (already + Hajek-weighted via the Wave E.1 upstream eps multiplication). + conley_coords : np.ndarray of shape (n, 2) + Per-observation lat/lon (or generic 2D coordinates). Already + validated finite upstream at ``spillover.py:_validate_spillover_inputs``; + no defensive finiteness check on derived PSU centroids. + conley_cutoff_km : float + Conley spatial-HAC bandwidth in km (haversine) or the + coord units (euclidean / callable). + conley_metric : ConleyMetric + ``"haversine"`` / ``"euclidean"`` / callable, per + :mod:`diff_diff.conley`. + conley_kernel : str + ``"bartlett"`` or ``"uniform"``. + resolved_survey : ResolvedSurveyDesign + ``.psu`` may be None; when absent, each observation is treated as + its own singleton PSU (matches the implicit-PSU convention of + :class:`ResolvedSurveyDesign` no-PSU branches). ``.strata`` and + ``.fpc`` are optional; absent strata synthesize a single stratum. + conley_time : np.ndarray of shape (n,), optional + Per-observation period label. When None, all observations are + treated as a single period (T = 1; the per-period loop reduces to + one iteration on the full Psi, which is the cross-sectional + Wave E.2 design). When provided (the standard SpilloverDiD case), + the per-period loop preserves the within-period spatial semantic. + + Returns + ------- + meat : np.ndarray of shape (p_2, p_2) + Wave E.2 panel-aware stratified-Conley meat + (``sum_t meat_t`` where ``meat_t`` is the within-stratum Conley + sandwich on the period-``t`` PSU totals). + + Notes + ----- + ``cluster_ids`` is intentionally not accepted: after PSU aggregation + every PSU is its own cluster, so threading a cluster product kernel + into the inner :func:`_compute_stratified_conley_meat_from_psu_scores` + would zero all cross-PSU pairs (``1{cluster_j == cluster_k}`` = 0 for + j != k). The Wave E.1 ``_resolve_effective_cluster`` path already + collapsed any user-supplied ``cluster=`` into PSU upstream. + + NaN-fails (with ``UserWarning``) when the inner survey helper + returns ``(False, 0)`` for every period — i.e. no stratum contributed + variance and none was a legitimate zero across any period. Mirrors the + Wave E.1 Binder TSL saturation behavior; departs from TwoStageDiD's + silent NaN-VCV at ``two_stage.py:2003-2005`` per + ``feedback_no_silent_failures``. + + Reductions: + + - ``T = 1`` (single period or ``conley_time is None``): single-pass + stratified-Conley sandwich on the full PSU totals (the original + cross-sectional Wave E.2 design). + - ``H = 1`` stratum, ``FPC = inf``: reduces to ``sum_t`` plain + Conley sandwich on per-period PSU totals. + - Bandwidth -> 0 (``K = I``): reduces to ``sum_t`` per-period + within-stratum HC sandwich on PSU totals (NOT Wave E.1 Binder, + which is over time-collapsed PSU totals). + + Out of scope (deferred follow-up, tracked in TODO.md): + + - ``conley_lag_cutoff > 0`` panel-block: the within-PSU serial + Bartlett HAC over time would compose with the spatial sandwich + here. Rejected upfront at ``SpilloverDiD.fit``. + """ + from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores + + p_2 = Psi.shape[1] + n_obs = Psi.shape[0] + coords_arr = np.asarray(conley_coords, dtype=np.float64) + + # No-PSU fallback: each obs is its own singleton PSU. Matches Wave E.1 + # Binder TSL convention at _compute_binder_tsl_meat L450-451. + if resolved_survey.psu is None: + psu_arr: np.ndarray = np.arange(n_obs, dtype=np.int64) + else: + psu_arr = np.asarray(resolved_survey.psu) + strata_arr_full = ( + np.asarray(resolved_survey.strata) if resolved_survey.strata is not None else None + ) + fpc_arr_full = ( + np.asarray(resolved_survey.fpc, dtype=np.float64) + if resolved_survey.fpc is not None + else None + ) + + # Panel-constant PSU centroids for explicit-PSU layouts (R4 P1 fix). + # The Wave E.2 registry / api contract specifies + # ``centroid_g = mean over i in PSU g of conley_coords[i]`` (panel-wide, + # not per-period). For a PSU containing multiple units at different + # coordinates with finite_mask dropping different members across + # periods, per-period recomputation would silently shift the spatial + # kernel weights — that would be a documented-contract violation. + # Compute once on the full active sample so each period's helper call + # sees the SAME centroid for the same PSU. + # + # For implicit-PSU (pseudo-PSU = obs index), every pseudo-PSU appears + # in exactly one period, so the per-period slice naturally produces + # the obs's own coordinate as that pseudo-PSU's centroid — no precompute + # needed. The dictionary stays None on that branch. + coord_dim = coords_arr.shape[1] + psu_value_to_centroid: Optional[dict] = None + if resolved_survey.psu is not None: + unique_psus_full, _, psu_indices_full = np.unique( + psu_arr, return_index=True, return_inverse=True + ) + G_full = len(unique_psus_full) + psu_coord_sums_full = np.zeros((G_full, coord_dim)) + for d in range(coord_dim): + np.add.at(psu_coord_sums_full[:, d], psu_indices_full, coords_arr[:, d]) + psu_counts_full = np.bincount(psu_indices_full, minlength=G_full).astype(np.float64) + psu_centroids_full = psu_coord_sums_full / psu_counts_full[:, None] + psu_value_to_centroid = {unique_psus_full[g]: psu_centroids_full[g] for g in range(G_full)} + + # Per-period loop: preserves the library's "within-period spatial only" + # contract for conley_lag_cutoff = 0. PSU set, centroids, strata, and + # FPC are re-built from the ACTIVE rows in each period (not from the + # full panel) so implicit-PSU layouts (`resolved_survey.psu is None`, + # i.e. one pseudo-PSU per observation) don't drag off-period + # zero-padded entries into the kernel via centering. For explicit-PSU + # balanced-panel layouts the per-period centroids equal the + # panel-constant centroids (obs coords are time-invariant), so this + # re-indexing is bit-identical to the prior naive panel-wide PSU + # mapping on that branch. + if conley_time is None: + # Treat all obs as one period (cross-sectional fallback). + time_arr = np.zeros(n_obs, dtype=np.int64) + else: + time_arr = np.asarray(conley_time) + unique_times = np.unique(time_arr) + + # Saturation guard for unstratified single-PSU on the FULL panel. + # The per-period helper invocation will also NaN-fail when no period + # contributes variance, but this front-door check matches Wave E.1's + # ergonomic "df_survey is undefined" message for the panel-level + # degenerate case. + if strata_arr_full is None and len(np.unique(psu_arr)) < 2: + G_total = len(np.unique(psu_arr)) + warnings.warn( + "SpilloverDiD Wave E.2 stratified-Conley sandwich: df_survey is " + f"undefined (single PSU, no strata; G={G_total}). Returning NaN " + "meat so downstream inference NaN-propagates.", + UserWarning, + stacklevel=2, + ) + return np.full((p_2, p_2), np.nan) + + meat = np.zeros((p_2, p_2)) + _variance_computed = False + _legit_zero = 0 + for t in unique_times: + period_mask = time_arr == t + Psi_t = Psi[period_mask] + psu_arr_t = psu_arr[period_mask] + coords_arr_t = coords_arr[period_mask] + unique_psus_t, first_idx_t, psu_indices_t = np.unique( + psu_arr_t, return_index=True, return_inverse=True + ) + G_t = len(unique_psus_t) + + # Per-period PSU totals. + S_psu_t = np.zeros((G_t, p_2)) + for j in range(p_2): + np.add.at(S_psu_t[:, j], psu_indices_t, Psi_t[:, j]) + + # Per-period PSU centroids: panel-constant for explicit-PSU + # (look up from the precomputed dict to match the documented + # ``centroid_g = mean over i in PSU g of conley_coords[i]`` + # panel-wide contract); per-period mean for implicit-PSU + # (pseudo-PSU = obs, each appears in exactly one period, so the + # per-period mean IS the obs's own coord). + if psu_value_to_centroid is not None: + psu_centroids_t = np.array([psu_value_to_centroid[v] for v in unique_psus_t]) + else: + psu_coord_sums_t = np.zeros((G_t, coord_dim)) + for d in range(coord_dim): + np.add.at(psu_coord_sums_t[:, d], psu_indices_t, coords_arr_t[:, d]) + psu_counts_t = np.bincount(psu_indices_t, minlength=G_t).astype(np.float64) + psu_centroids_t = psu_coord_sums_t / psu_counts_t[:, None] + + # Per-period strata + fpc. + if strata_arr_full is not None: + psu_strata_t = strata_arr_full[period_mask][first_idx_t] + else: + psu_strata_t = np.zeros(G_t, dtype=int) + psu_fpc_t: Optional[np.ndarray] = None + if fpc_arr_full is not None: + psu_fpc_t = fpc_arr_full[period_mask][first_idx_t] + + # Stratified Conley sandwich for period t. + meat_t, var_t, legit_zero_t = _compute_stratified_conley_meat_from_psu_scores( + psu_scores=S_psu_t, + psu_strata=psu_strata_t, + psu_coords=psu_centroids_t, + cutoff=conley_cutoff_km, + metric=conley_metric, + kernel=conley_kernel, + fpc_per_psu=psu_fpc_t, + lonely_psu=resolved_survey.lonely_psu, + ) + meat += meat_t + _variance_computed = _variance_computed or var_t + _legit_zero += legit_zero_t + + # Wave E.2 survey-saturated NaN-fail per `feedback_no_silent_failures`. + if not _variance_computed and _legit_zero == 0: + warnings.warn( + "SpilloverDiD Wave E.2 stratified-Conley sandwich: df_survey = 0 " + "(all strata removed by lonely_psu='remove' on single-PSU " + "strata; no PSU contributed to the meat). Returning NaN meat " + "so downstream inference NaN-propagates.", + UserWarning, + stacklevel=2, + ) + return np.full((p_2, p_2), np.nan) + + return meat + + # ============================================================================= # Main Estimator # ============================================================================= diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst index a78f4f25..5f883619 100644 --- a/docs/api/spillover.rst +++ b/docs/api/spillover.rst @@ -243,10 +243,6 @@ and planned follow-up enhancements: Restrictions: - - ``vcov_type="conley" + survey_design=`` raises - ``NotImplementedError``; Wave E.2 (planned) will add the Conley × survey - product-kernel synthesis with within-stratum Conley sandwich on PSU - totals. - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises ``NotImplementedError``; per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply because ``gamma_hat`` is @@ -266,6 +262,70 @@ and planned follow-up enhancements: centering reflect the reduced fit sample rather than the full domain design. See the REGISTRY note for details and the Wave E.3 follow-up tracked in ``TODO.md``. +- **Survey-design integration (Wave E.2 — Conley × survey via + stratified-Conley sandwich on PSU totals).** SHIPPED in Wave E.2. + ``vcov_type="conley" + survey_design=`` is now supported via a + per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner + GMM influence functions. + + .. note:: + + Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026, + arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) + and the Wave D Gardner GMM first-stage uncertainty correction + (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's + ring-indicator stage-2 design. The composition is **panel-aware** — + it preserves the library's existing ``conley_lag_cutoff = 0`` + semantic ("within-period spatial only, exclude cross-period pairs") + by looping over periods and aggregating Psi to PSU totals WITHIN + each period (not over the whole panel). For each period ``t``, + ``S_psu_t[g] = sum_{i in PSU g, time t} psi_i``; per-PSU centroids + are panel-constant (mean of per-observation ``conley_coords``); + for each stratum the within-stratum sandwich is + ``M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} + K(d(centroid_j, centroid_k) / cutoff) * + (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'``, where K is the + Bartlett kernel (SpilloverDiD currently exposes Bartlett only and + hardcodes it at the fit-call site; the survey helper's ``kernel`` + parameter can also take ``"uniform"``, but exposing that on the + SpilloverDiD constructor is a separate follow-up). Cross-stratum + kernel weights are exactly zero by sampling design (strata are + exact independence partitions). Total meat is ``sum_t sum_h M_h_t``. + Cross-period spatial pairs are excluded by construction. No + reference software combines all three ingredients on a two-stage + influence function. + + Reduction semantics: + + - Per-period sum invariant: ``sum_t`` of per-period within-stratum + stratified-Conley sandwiches on per-period PSU totals. Pinned at + ``tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`` + (pure unit test on the orchestrator + helper composition). + - Single stratum (H = 1, FPC = inf): reduces to ``sum_t`` plain + Conley sandwich on per-period PSU totals (NOT on time-collapsed + PSU totals — the per-period loop preserves ``lag_cutoff = 0`` + semantics). + - All PSUs singleton + ``lonely_psu="remove"``: ``df_survey = 0`` and + the stratified-Conley meat NaN-fails (matches Wave E.1 saturation + behaviour, with ``UserWarning`` template "Wave E.2 stratified-Conley + sandwich: df_survey = 0..."). + + Restrictions: + + - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises + ``NotImplementedError`` (inherits Wave E.1 gate; per-replicate refit + is separate follow-up scope). + - ``cluster= + survey_design.psu + vcov_type="conley"``: + ``cluster=`` is coerced to PSU per Wave E.1's warn-and-use-PSU + pattern; the Conley cluster product kernel becomes a no-op after + PSU aggregation. + - The LinearRegression-side ``vcov_type="conley" + survey_design=`` + gate at ``diff_diff/linalg.py`` is a separate Bertanha-Imbens 2014 + weighted-Conley roadmap (not Wave E). + - DiagnosticReport routing for ``SpilloverDiDResults(vcov_type="conley", + survey_design=)`` is queued for a follow-up (the + ``_APPLICABILITY`` / ``_PT_METHOD`` wiring must register the new + combination first). - **Count-of-treated-in-ring** — only the "nearest-treated ring" specification is implemented. The "count" form re-introduces functional-form dependence (paper Section 3.2 end) and is queued. diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index ce624616..62b2223a 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -3215,7 +3215,7 @@ The `_compute_stratified_meat_from_psu_scores` helper at `diff_diff/survey.py` i Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_survey` (the standard survey 4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`). Threaded through all four `safe_inference` call sites: aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, and the scalar `att` lincom in event-study mode. -- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 (planned) will extend with the Conley × survey product-kernel composition. +- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 extends with the Conley × survey product-kernel composition — see "Variance (Wave E.2)" subsection below. - **Note (warn-and-use-PSU for cluster + survey):** when both `cluster=` and `survey_design.psu` are supplied with **different groupings**, the cluster argument emits a `UserWarning` and is overridden by PSU (mirrors `TwoStageDiD._resolve_effective_cluster`). PSU is the design-relevant cluster on survey panels; `cluster=` on SpilloverDiD is more often a spatial / unit-level label, so the design constraint wins. When both knobs are supplied with the **same** groupings, no warning fires and PSU still takes precedence (the inference is unchanged either way). - **Note (limitation — `SurveyDesign.subpopulation()` with FE-undefined zero-weight rows):** when `survey_design` is built via `SurveyDesign.subpopulation()` (or otherwise carries zero-weight padding rows) AND those zero-weight rows lose stage-1 FE support (warn-and-drop unit path), Wave E.1's `finite_mask` block physically removes them from the survey design rather than retaining them as zero-score padding. Consequently `n_psu`, `df_survey`, and the Binder TSL centering are recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (e.g. R's `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts. Practitioners using subpopulation-derived designs should expect SEs that may differ slightly from textbook domain expectations on warn-and-drop fits. Tracked as Wave E.3 follow-up — see TODO.md. - **Note (saturated `df_survey = 0` NaN-fail):** when `lonely_psu="remove"` removes all strata (single PSU per stratum), `_compute_stratified_meat_from_psu_scores` returns `(_, var_computed=False, legit_zero=0)`. SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can pin via `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. @@ -3224,6 +3224,29 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56`; new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with the Wave E.1 NaN-fail + warning. `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback). `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. Tests: `TestSpilloverDiDWaveE1SurveyDesignHc1` + `TestSpilloverDiDWaveE1SurveyDesignEventStudy` at `tests/test_spillover.py`. +### Variance (Wave E.2 — Conley × survey via stratified-Conley sandwich on PSU totals) + +`vcov_type="conley" + survey_design=` is now supported via a per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner GMM influence functions. SHIPPED in Wave E.2. + +- **Note (documented synthesis):** Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is **panel-aware** — it preserves the library's existing `conley_lag_cutoff = 0` semantic ("within-period spatial only — exclude cross-period pairs") at `diff_diff.conley._compute_conley_meat`. Per-PSU centroids are computed as `centroid_g = mean over i in PSU g of conley_coords[i]` (panel-constant — PSU is a sampling unit with fixed location). For each period `t`, SpilloverDiD's per-obs Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i`; the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / cutoff) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it at the fit-call site; the survey helper's `kernel` parameter can also take `"uniform"`, but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop aggregates only within-period observations into each `S_psu_t`, matching the Wave D `conley_lag_cutoff = 0` block decomposition. No reference software combines all three ingredients (Conley spatial-HAC + Binder TSL + Gardner GMM correction) on a two-stage influence function. + +- **Reduction semantics (load-bearing for tests):** + - Per-period sum invariant: the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals. Pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant` (pure unit test on the orchestrator + helper composition). + - Single stratum (H = 1, FPC = inf): reduces to `sum_t` plain Conley sandwich on per-period PSU totals via `_compute_conley_meat(S_psu_t_centered, centroids, ...)`. Note this is NOT plain Conley on time-collapsed PSU totals — the per-period loop preserves the library's `lag_cutoff = 0` semantic. + - All PSUs singleton in their stratum + `lonely_psu="remove"`: `df_survey = 0` and the stratified-Conley meat NaN-fails (matches Wave E.1 saturation behaviour, with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..."). + - Cross-stratum kernel weight is exactly zero (sampling-design assumption — no kernel pair crosses a stratum boundary). + +- **Note (singleton-stratum `lonely_psu="adjust"` FPC skip parity):** when a stratum has `n_h = 1` and `lonely_psu="adjust"`, the new `_compute_stratified_conley_meat_from_psu_scores` helper mirrors the Binder helper's `continue`-skip-FPC pattern exactly (the FPC scale `(1 - f_h) * n_h / (n_h - 1)` would divide by zero with `n_h = 1`). The degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust contribution bit-identically. + +- **Cluster + Conley + survey routing:** `cluster= + survey_design.psu + vcov_type="conley"` coerces `cluster=` to PSU per Wave E.1's `_resolve_effective_cluster` warn-and-use-PSU pattern. The dispatch wrapper `_compute_stratified_conley_meat` intentionally does NOT thread `cluster_ids` into the inner Conley kernel call — after PSU aggregation every PSU is its own cluster, so a cluster product kernel `1{cluster_j == cluster_k}` would be zero for all `j != k` and the cross-PSU kernel weights would be silently dropped. The Wave E.2 architectural choice: PSU-aggregation handles within-PSU clustering exactly; cross-PSU spatial dependence enters via the kernel; cross-stratum independence is exact. + +- **Restrictions / out-of-scope (Wave E.2):** + - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope). + - LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E). + - DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` is queued for a follow-up Wave F PR — `_APPLICABILITY` / `_PT_METHOD` registration is required before the new combination can be claimed consumable downstream (per `feedback_audit_diagnostic_report_wiring_before_claim`). + +**Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing Binder helper; 3-tuple `(meat, variance_computed, legitimate_zero_count)` return contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(centered, coords_h, cutoff, metric, kernel)` cross-sectional mode). New dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to `_compute_binder_tsl_meat`; per-obs Psi → PSU aggregation via `np.add.at` + PSU centroid derivation via vectorized `np.add.at` sums / `np.bincount` counts + dispatch to survey helper; intentionally no `cluster_ids` parameter). `_compute_gmm_corrected_meat` conley branch extended at `diff_diff/two_stage.py` with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D no-survey Conley. Saturation NaN-fail mirrors Wave E.1 (`UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..."). Wave E.1 stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading to `safe_inference` call sites, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of `vcov_type`. Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` + `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` at `tests/test_spillover.py`. + **Edge cases (from paper Section 3.2 / Discussion):** | # | Edge case | Handling | @@ -3239,7 +3262,7 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su **Restrictions / deferred features:** - `event_study=True` SHIPPED in Wave C — see Event-study mode subsection above. Emits `att_dynamic`, MultiIndex `spillover_effects`, and a TwoStageDiD-compatible `event_study_effects` dict alias. -- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` still raises `NotImplementedError` and points at planned Wave E.2 (Conley × survey product-kernel composition with within-stratum Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up. +- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up. - `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up. - `ring_method="count"` not exposed — only the nearest-treated-ring specification. - `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction. diff --git a/docs/references.rst b/docs/references.rst index 21410d98..d8acabdb 100644 --- a/docs/references.rst +++ b/docs/references.rst @@ -190,6 +190,10 @@ Multi-Period and Staggered Adoption Identifies the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Section 2-3 covers non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``. +- **Conley, T. G. (1999).** "GMM Estimation with Cross Sectional Dependence." *Journal of Econometrics*, 92(1), 1-45. https://doi.org/10.1016/S0304-4076(98)00084-0 + + Primary source for the Conley spatial-HAC variance estimator. Equations 5-9 derive the spatial-kernel cross-product meat. Our ``diff_diff/conley.py`` implements the practitioner specializations (Bartlett / uniform kernels with haversine / euclidean metrics) cited in our ``SpilloverDiD`` Wave A/D Conley path and composed with Binder TSL + Gardner GMM in Wave E.2 (``_compute_stratified_conley_meat`` at ``diff_diff/two_stage.py``). + - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169 - **de Chaisemartin, C., & D'Haultfœuille, X. (2022, revised 2024).** "Difference-in-Differences Estimators of Intertemporal Treatment Effects." *NBER Working Paper* 29873. https://www.nber.org/papers/w29873 diff --git a/tests/test_spillover.py b/tests/test_spillover.py index dd02aaf5..81e4f844 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -577,43 +577,6 @@ def test_set_params_rejects_unknown_key(self): with pytest.raises(ValueError, match="Unknown parameter"): est.set_params(nonexistent_kwarg=42) - def test_fit_conley_plus_survey_design_not_implemented(self): - """Wave E.1 ships HC1 / CR1 + survey_design; Conley × survey is - deferred to Wave E.2 (the novel within-stratum Conley sandwich on - PSU totals). Confirm the upfront rejection points at the planned - follow-up PR. - """ - from diff_diff import SurveyDesign - - est = SpilloverDiD( - rings=[0.0, 50.0], - conley_coords=("lat", "lon"), - conley_metric="euclidean", - conley_cutoff_km=100.0, - vcov_type="conley", - ) - df = pd.DataFrame( - { - "unit": ["A", "A"], - "time": [0, 1], - "y": [1.0, 2.0], - "D": [0, 1], - "lat": [0.0, 0.0], - "lon": [0.0, 0.0], - "w": [1.0, 1.0], - "psu": [0, 0], - } - ) - with pytest.raises(NotImplementedError, match="Wave E.2"): - est.fit( - df, - outcome="y", - unit="unit", - time="time", - treatment="D", - survey_design=SurveyDesign(weights="w", psu="psu"), - ) - # ============================================================================= # Step 3: Two-stage Gardner fit() integration @@ -5684,3 +5647,935 @@ def test_o_drift_golden(self): assert res.n_psu == 8 assert res.n_strata == 2 assert res.survey_metadata.df_survey == 6 + + +class TestSpilloverDiDWaveE2ConleySurveyDesign: + """Wave E.2 conley + survey via stratified-Conley sandwich on PSU totals. + + Methodology anchor: Conley (1999) spatial-HAC composed with Gerber + (2026) Prop 1 Binder TSL (Wave E.1 foundation) and the Wave D Gardner + GMM correction. Verifies reduction semantics (bandwidth -> 0 ≡ Binder; + H=1 ≡ plain Conley on PSU totals), cross-stratum independence, + singleton-adjust FPC skip parity with Binder, and the saturation + NaN-fail. + """ + + _CUTOFF_KM = 1000.0 # large enough that within-stratum PSU pairs are inside + + def _fit(self, df, **kwargs): + design = kwargs.pop("design", None) + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + event_study=False, + **kwargs, + ) + return est.fit( + df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + + def test_a_no_survey_conley_path_matches_wave_d_golden(self): + """The `resolved_survey is None` branch of the new dispatch must + produce the SAME no-survey Conley SE as the pre-Wave-E.2 (Wave D) + Conley path. The Wave D path is `_compute_conley_meat(...)` with + no changes; the new dispatch only ADDS an `if resolved_survey is + not None` branch above the existing call. Pin the SE to a golden + captured on this fixture so any future refactor that disturbs the + no-survey path is caught by a behavioral test, not just by + determinism. + """ + df = generate_butts_nonstaggered_dgp(seed=0) + res = self._fit(df) + # Wave D no-survey Conley golden captured on this fixture (seed=0, + # 2-period non-staggered Butts DGP, cutoff=1000 km, Bartlett kernel). + # These values reflect the pre-Wave-E.2 no-survey Conley path. + # The dispatch in `_compute_gmm_corrected_meat` only ADDS a new + # `if resolved_survey is not None` branch above the existing + # `_compute_conley_meat` call, so the `resolved_survey is None` + # path is bit-identical to Wave D; any future refactor that + # disturbs it must update these goldens deliberately. + _WAVE_D_NO_SURVEY_CONLEY_ATT = -0.07471658104745109 + _WAVE_D_NO_SURVEY_CONLEY_SE = 0.0018453344099259904 + np.testing.assert_allclose(res.att, _WAVE_D_NO_SURVEY_CONLEY_ATT, rtol=1e-12, atol=1e-14) + np.testing.assert_allclose(res.se, _WAVE_D_NO_SURVEY_CONLEY_SE, rtol=1e-12, atol=1e-14) + assert np.isfinite(res.se) and res.se > 0 + + def test_a2_no_survey_conley_path_routes_through_wave_d_helper(self): + """Structural anchor: a no-survey conley fit invokes the Wave D + `_compute_conley_meat` helper directly, NOT the Wave E.2 + `_compute_stratified_conley_meat` orchestrator. Pins the dispatch + branch in `_compute_gmm_corrected_meat` (no leak into the new + path when `resolved_survey is None`). + """ + from unittest.mock import patch + + df = generate_butts_nonstaggered_dgp(seed=2) + with patch("diff_diff.two_stage._compute_stratified_conley_meat") as mock_panel_aware: + self._fit(df) + assert not mock_panel_aware.called, ( + "No-survey conley fit must NOT call _compute_stratified_conley_meat " + "(the Wave E.2 panel-aware survey path); it should route through " + "the Wave D _compute_conley_meat directly." + ) + + def test_b_panel_aware_per_period_sum_invariant(self): + """Panel-aware Wave E.2 meat == sum-across-periods of per-period + within-stratum Conley sandwich on per-period PSU totals. + + Pure unit test on the orchestrator + helper composition: with T + periods of synthetic PSU-level data, ``_compute_stratified_conley_meat``'s + per-period loop must produce the same result as manually calling + the survey helper T times (once per period, on per-period PSU + totals) and summing. This pins the library's panel Conley contract + (``conley_lag_cutoff = 0`` means "within-period spatial only") on + the survey path — no cross-period spatial pairs leak through the + collapsed PSU totals. + + Replaces the original "bandwidth → 0 reduces to Wave E.1 Binder" + claim, which only holds under T=1 (the cross-sectional limit). + SpilloverDiD's panel-only contract precludes a T=1 fit, so the + Wave E.1-equivalence claim is meaningful only on this synthetic + unit-test fixture. + """ + from diff_diff.survey import ( + ResolvedSurveyDesign, + _compute_stratified_conley_meat_from_psu_scores, + ) + from diff_diff.two_stage import _compute_stratified_conley_meat + + rng = np.random.default_rng(31) + # 4 PSUs × 2 periods × 3 obs per PSU-period = 24 obs. + n_obs, T, G, p_2 = 24, 2, 4, 3 + obs_per_psu_period = 3 + psu_id = np.repeat(np.arange(G), obs_per_psu_period * T) + time_arr = np.tile(np.repeat(np.arange(T), obs_per_psu_period), G) + Psi = rng.standard_normal((n_obs, p_2)) + psu_centroids = np.array([[40.0, -120.0], [40.1, -120.0], [40.2, -120.0], [40.3, -120.0]]) + coords = psu_centroids[psu_id] + psu_strata = np.array([0, 0, 1, 1]) # 2 PSUs per stratum + fpc_per_psu = np.full(G, 20.0) + resolved = ResolvedSurveyDesign( + weights=np.ones(n_obs), + weight_type="pweight", + strata=np.repeat(psu_strata, obs_per_psu_period * T), + psu=psu_id, + fpc=np.full(n_obs, 20.0), + n_strata=2, + n_psu=4, + lonely_psu="remove", + ) + # Orchestrator (panel-aware). + meat = _compute_stratified_conley_meat( + Psi, + conley_coords=coords, + conley_cutoff_km=0.30, + conley_metric="euclidean", + conley_kernel="bartlett", + resolved_survey=resolved, + conley_time=time_arr, + ) + # Hand: aggregate Psi to PSU WITHIN each period, run the survey + # helper per period, sum. + expected = np.zeros((p_2, p_2)) + for t in range(T): + period_mask = time_arr == t + Psi_t = Psi[period_mask] + psu_id_t = psu_id[period_mask] + S_psu_t = np.zeros((G, p_2)) + for g in range(G): + S_psu_t[g] = Psi_t[psu_id_t == g].sum(axis=0) + meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores( + S_psu_t, + psu_strata, + psu_centroids, + cutoff=0.30, + metric="euclidean", + kernel="bartlett", + fpc_per_psu=fpc_per_psu, + ) + expected += meat_t + np.testing.assert_allclose(meat, expected, rtol=1e-12, atol=1e-14) + # Sanity: a time-collapsed naive computation (the OLD pre-R2 design) + # would DIFFER from the panel-aware meat on the same inputs. + S_psu_collapsed = np.zeros((G, p_2)) + for g in range(G): + S_psu_collapsed[g] = Psi[psu_id == g].sum(axis=0) + meat_collapsed, _, _ = _compute_stratified_conley_meat_from_psu_scores( + S_psu_collapsed, + psu_strata, + psu_centroids, + cutoff=0.30, + metric="euclidean", + kernel="bartlett", + fpc_per_psu=fpc_per_psu, + ) + # Differs by the cross-period off-diagonal mass (the panel-aware + # contract drops these by construction). + assert not np.allclose(meat, meat_collapsed, rtol=1e-3, atol=1e-3) + + def test_c_hand_computation_methodology_anchor(self): + """Hand-compute the stratified-Conley meat formula on synthetic + PSU-level inputs and assert parity with the new survey helper. + + Mirrors `_scratch/wave_e2_smoke.py` Chunk 1 methodology anchor. + """ + from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores + + rng = np.random.default_rng(7) + G, k = 8, 3 + psu_strata = np.array([0, 0, 0, 0, 1, 1, 1, 1]) + psu_coords = np.array( + [ + [40.00, -120.0], + [40.10, -120.0], + [40.20, -120.0], + [40.30, -120.0], + [40.05, -120.0], + [40.15, -120.0], + [40.25, -120.0], + [40.35, -120.0], + ] + ) + psu_scores = rng.standard_normal((G, k)) + fpc = np.full(G, 20.0) + cutoff = 0.30 + + meat, var_ok, _ = _compute_stratified_conley_meat_from_psu_scores( + psu_scores, + psu_strata, + psu_coords, + cutoff=cutoff, + metric="euclidean", + kernel="bartlett", + fpc_per_psu=fpc, + lonely_psu="remove", + ) + assert var_ok + + # Hand: per stratum, demean, apply Bartlett K on PSU coords, + # FPC-scale, sum across strata. + expected = np.zeros((k, k)) + for h in [0, 1]: + mask = psu_strata == h + s_h = psu_scores[mask] + c_h = psu_coords[mask] + n_h = s_h.shape[0] + centered = s_h - s_h.mean(axis=0, keepdims=True) + d = np.sqrt(((c_h[:, None, :] - c_h[None, :, :]) ** 2).sum(axis=2)) + K = np.maximum(0.0, 1.0 - d / cutoff) + M_h = centered.T @ K @ centered + f_h = n_h / fpc[mask][0] + M_h *= (1.0 - f_h) * n_h / (n_h - 1) + expected += M_h + np.testing.assert_allclose(meat, expected, rtol=1e-12, atol=1e-14) + + def test_d_single_stratum_reduces_to_plain_conley_on_psu_totals(self): + """H = 1 stratum, FPC = inf: reduces to ordinary Conley sandwich + on PSU totals (modulo the n/(n-1) finite-sample scale). + """ + from diff_diff.conley import _compute_conley_meat + from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores + + rng = np.random.default_rng(11) + G = 8 + psu_strata = np.zeros(G, dtype=int) + psu_coords = np.array( + [ + [40.00, -120.0], + [40.10, -120.0], + [40.20, -120.0], + [40.30, -120.0], + [40.05, -120.0], + [40.15, -120.0], + [40.25, -120.0], + [40.35, -120.0], + ] + ) + psu_scores = rng.standard_normal((G, 3)) + cutoff = 0.30 + + meat, _, _ = _compute_stratified_conley_meat_from_psu_scores( + psu_scores, + psu_strata, + psu_coords, + cutoff=cutoff, + metric="euclidean", + kernel="bartlett", + ) + # Plain Conley sandwich on PSU totals (no FPC). n/(n-1) scale + # comes from the survey helper's adjustment; FPC term is 1. + centered = psu_scores - psu_scores.mean(axis=0, keepdims=True) + plain = _compute_conley_meat(centered, psu_coords, cutoff, "euclidean", "bartlett") + plain *= G / (G - 1) + np.testing.assert_allclose(meat, plain, rtol=1e-12, atol=1e-14) + + def test_e_cross_stratum_independence_invariant(self): + """Cross-stratum kernel weights are exactly zero by sampling design. + + Pure unit test on the new survey helper: full meat ≡ partition-then-sum + when each partition is fit as a separate single-stratum call. Uses + interleaved cross-stratum centroids so cross-stratum pairs are + CLOSER in km than within-stratum pairs — any kernel leak across + strata would produce a large numerical difference. + """ + from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores + + rng = np.random.default_rng(13) + G, k = 8, 3 + psu_strata = np.array([0, 0, 0, 0, 1, 1, 1, 1]) + # Interleaved: stratum 0 at lats 40.00/40.10/40.20/40.30; stratum 1 + # at 40.05/40.15/40.25/40.35. Cross-stratum nearest pair = 0.05 vs + # within-stratum nearest = 0.10 — kernel would weight them DOUBLE + # if it leaked. + psu_coords = np.array( + [ + [40.00, -120.0], + [40.10, -120.0], + [40.20, -120.0], + [40.30, -120.0], + [40.05, -120.0], + [40.15, -120.0], + [40.25, -120.0], + [40.35, -120.0], + ] + ) + psu_scores = rng.standard_normal((G, k)) + fpc = np.full(G, 20.0) + cutoff = 0.30 + + meat_full, _, _ = _compute_stratified_conley_meat_from_psu_scores( + psu_scores, + psu_strata, + psu_coords, + cutoff=cutoff, + metric="euclidean", + kernel="bartlett", + fpc_per_psu=fpc, + ) + partitioned = np.zeros((k, k)) + for h in [0, 1]: + mask = psu_strata == h + sub_strata = np.zeros(mask.sum(), dtype=int) + part_meat, _, _ = _compute_stratified_conley_meat_from_psu_scores( + psu_scores[mask], + sub_strata, + psu_coords[mask], + cutoff=cutoff, + metric="euclidean", + kernel="bartlett", + fpc_per_psu=fpc[mask], + ) + partitioned += part_meat + np.testing.assert_allclose(meat_full, partitioned, rtol=1e-12, atol=1e-14) + + def test_f_lonely_psu_modes_accepted(self): + """All three lonely_psu modes flow through the conley+survey path.""" + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=14) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + df_s.loc[df_s["stratum"] == 0, "psu"] = 0 # collapse stratum 0 to a singleton PSU + for mode in ("remove", "certainty", "adjust"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + design = SurveyDesign( + weights="w", + strata="stratum", + psu="psu", + fpc="N_h", + lonely_psu=mode, + ) + res = self._fit(df_s, design=design) + if np.isfinite(res.se): + assert res.se >= 0 + else: + assert np.isnan(res.t_stat) and np.isnan(res.p_value) + + def test_f2_singleton_adjust_fpc_skip_parity_binder_vs_conley(self): + """Binder helper and Conley helper produce bit-identical output on + a singleton stratum with lonely_psu="adjust". + + Load-bearing: pins the Chunk 2 `continue`-skip-FPC pattern. Without + the `continue`, the Conley helper would divide by `n_h - 1 = 0` on + the singleton stratum and the meat would NaN-propagate while + Binder's meat stays finite. With the kernel reducing to identity + on a singleton (K = [[K(0)]] = [[1.0]]) the two outputs MUST match. + """ + from diff_diff.survey import ( + _compute_stratified_conley_meat_from_psu_scores, + _compute_stratified_meat_from_psu_scores, + ) + + rng = np.random.default_rng(15) + # 5 PSUs: 1 in stratum 0 (singleton), 4 in stratum 1. + psu_scores = rng.standard_normal((5, 3)) + psu_strata = np.array([0, 1, 1, 1, 1]) + psu_coords = np.array( + [ + [40.0, -120.0], + [40.1, -120.0], + [40.2, -120.0], + [40.3, -120.0], + [40.4, -120.0], + ] + ) + fpc = np.full(5, 20.0) + binder_meat, _, _ = _compute_stratified_meat_from_psu_scores( + psu_scores, + psu_strata, + fpc_per_psu=fpc, + lonely_psu="adjust", + ) + conley_meat, _, _ = _compute_stratified_conley_meat_from_psu_scores( + psu_scores, + psu_strata, + psu_coords, + cutoff=1e-10, + metric="euclidean", + kernel="bartlett", + fpc_per_psu=fpc, + lonely_psu="adjust", + ) + # Conley with bandwidth -> 0 collapses K to identity in EVERY stratum, + # so the entire meat (singleton + multi-PSU stratum) reduces to Binder. + np.testing.assert_allclose(conley_meat, binder_meat, rtol=1e-12, atol=1e-14) + # And both are finite (the singleton FPC skip prevents divide-by-zero). + assert np.all(np.isfinite(conley_meat)) + + def test_g_fpc_large_matches_no_fpc(self): + """Very-large FPC (1-f_h ≈ 1) produces SE close to the no-FPC path.""" + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=16) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=1e9) + design_fpc_large = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + design_no_fpc = SurveyDesign(weights="w", strata="stratum", psu="psu") + res_large = self._fit(df_s, design=design_fpc_large) + res_no = self._fit(df_s, design=design_no_fpc) + np.testing.assert_allclose(res_large.se, res_no.se, rtol=1e-6) + + def test_h_fpc_equals_n_zeros_stratum(self): + """FPC = n_h per stratum makes (1-f_h) = 0; meat is zero, SE = 0.""" + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=17) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=4.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + res = self._fit(df_s, design=design) + np.testing.assert_allclose(res.se, 0.0, atol=1e-14) + + def test_i_saturated_design_nan_fails(self): + """All-singleton strata + lonely_psu='remove' -> df_survey = 0 -> + NaN meat + UserWarning matching 'Wave E.2 stratified-Conley'. + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=18) + df_s = df.copy() + df_s["w"] = 1.0 + units_sorted = sorted(df_s["unit"].unique()) + unit_to_idx = {u: idx for idx, u in enumerate(units_sorted)} + df_s["psu"] = df_s["unit"].map(unit_to_idx) + df_s["stratum"] = df_s["unit"].map(unit_to_idx) # H = n_units; every stratum singleton + df_s["N_h"] = 20.0 + design = SurveyDesign( + weights="w", + strata="stratum", + psu="psu", + fpc="N_h", + lonely_psu="remove", + ) + with pytest.warns(UserWarning, match="Wave E.2 stratified-Conley"): + res = self._fit(df_s, design=design) + assert np.isnan(res.se) + assert np.isnan(res.t_stat) + assert np.isnan(res.p_value) + + def test_j0_panel_conley_lag_cutoff_rejected_under_survey(self): + """vcov_type='conley' + conley_lag_cutoff > 0 + survey_design raises + NotImplementedError upfront. Wave E.2 ships cross-sectional only; + the panel-block decomposition (within-unit serial Bartlett HAC over + time) would need PSU-by-time scores rather than the collapsed PSU + totals. Tracked as a Wave E.2 follow-up in TODO.md. + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=180) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=2, # panel-block path + vcov_type="conley", + ) + with pytest.raises(NotImplementedError, match="conley_lag_cutoff > 0"): + est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + + def test_j_replicate_weights_rejection_inherits_wave_e1(self): + """Replicate-weight variance still raises NotImplementedError under + conley+survey (inherits Wave E.1 gate). SurveyDesign requires + replicate_weights to be set WITHOUT strata/psu/fpc (they encode + the design implicitly). + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=19) + df_s = df.copy() + df_s["w"] = 1.0 + # Add 10 replicate-weight columns; must be constant within units + # (panel survey constraint). + rng = np.random.default_rng(19) + units = sorted(df_s["unit"].unique()) + for r in range(10): + rep_by_unit = dict(zip(units, rng.uniform(0.5, 2.0, size=len(units)))) + df_s[f"rep_{r}"] = df_s["unit"].map(rep_by_unit) + design = SurveyDesign( + weights="w", + replicate_weights=[f"rep_{r}" for r in range(10)], + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="(?i)replicate|follow-up"): + self._fit(df_s, design=design) + + def test_k_non_pweight_rejection_inherits_wave_e1(self): + """Non-pweight weight_type still raises ValueError under conley+survey + (inherits Wave E.1 gate). + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=20) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign( + weights="w", + strata="stratum", + psu="psu", + fpc="N_h", + weight_type="aweight", + ) + with pytest.raises((NotImplementedError, ValueError), match="(?i)pweight|aweight"): + self._fit(df_s, design=design) + + def test_l_cluster_plus_conley_plus_survey_warn_and_use_psu(self): + """cluster= + conley + survey with different cluster vs PSU -> + UserWarning fires; PSU wins (mirrors Wave E.1 warn-and-use-PSU). + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=21) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + # Inject a coarser cluster column distinct from PSU (1 cluster + # per unit). The warn-and-use-PSU path requires that cluster and + # PSU are NOT identical groupings. + units_sorted = sorted(df_s["unit"].unique()) + unit_to_cluster = {u: idx // 2 for idx, u in enumerate(units_sorted)} + df_s["my_cluster"] = df_s["unit"].map(unit_to_cluster) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with pytest.warns(UserWarning, match="(?i)cluster"): + res = self._fit(df_s, design=design, cluster="my_cluster") + assert np.isfinite(res.se) and res.se > 0 + + def test_m_fit_idempotency_under_conley_survey(self): + """clone() + repeat fit produces identical results; survey state + not mutated on fit() (per feedback_fit_does_not_mutate_config). + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=22) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + ) + res_1 = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + # Second fit on the SAME estimator instance (idempotency). + res_2 = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + assert res_1.coefficients == res_2.coefficients + np.testing.assert_array_equal(res_1.vcov, res_2.vcov) + assert res_1.n_psu == res_2.n_psu + assert res_1.n_strata == res_2.n_strata + + def test_n0_no_psu_weights_only_survey_design(self): + """`SurveyDesign(weights=...)` without explicit PSU — each obs is + its own pseudo-PSU. Panel-aware path must re-index PSUs WITHIN + each period (not pad zeros across the full panel) or the centering + leaks off-period spurious structure into the spatial meat. + + Regression for the R3 P0 fix. + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=240) + df_s = df.copy() + df_s["w"] = 1.0 + design = SurveyDesign(weights="w") + res = self._fit(df_s, design=design) + assert np.isfinite(res.att) + assert np.isfinite(res.se) and res.se > 0 + + def test_n1_no_psu_strata_only_survey_design(self): + """`SurveyDesign(weights=..., strata=...)` without explicit PSU — + each obs is its own pseudo-PSU under stratified sampling. Same + per-period re-indexing requirement as test_n0. + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=241) + df_s = df.copy() + df_s["w"] = 1.0 + units_sorted = sorted(df_s["unit"].unique()) + unit_to_stratum = {u: idx % 2 for idx, u in enumerate(units_sorted)} + df_s["stratum"] = df_s["unit"].map(unit_to_stratum) + design = SurveyDesign(weights="w", strata="stratum") + res = self._fit(df_s, design=design) + assert np.isfinite(res.att) + assert np.isfinite(res.se) and res.se > 0 + + def test_b2_explicit_psu_centroid_panel_constant_under_finite_mask(self): + """When a PSU contains multiple units at DIFFERENT coordinates + (simulating a finite_mask drop that varies coverage across + periods), the orchestrator must use PANEL-CONSTANT centroids + (mean across all obs in PSU, regardless of period) — NOT + per-period centroids. This matches the documented Wave E.2 + contract "centroid_g = mean over i in PSU g of conley_coords[i]" + at REGISTRY.md and prevents support-sample-dependent kernel + weights. + + Pure unit test on the orchestrator + helper composition with + synthetic per-obs inputs. + """ + from diff_diff.survey import ( + ResolvedSurveyDesign, + _compute_stratified_conley_meat_from_psu_scores, + ) + from diff_diff.two_stage import _compute_stratified_conley_meat + + rng = np.random.default_rng(331) + # 2 strata × 2 PSUs × 1 obs per PSU-period = 8 obs. + # PSU 0 obs coords differ across periods (simulating finite_mask + # variation): period 0 at [40.0, 0]; period 1 at [42.0, 0]. + # PSU 1/2/3 have constant coords across periods. + n, p_2 = 8, 3 + Psi = rng.standard_normal((n, p_2)) + psu_id = np.array([0, 1, 2, 3, 0, 1, 2, 3]) # PSUs alternate per period + time_arr = np.array([0, 0, 0, 0, 1, 1, 1, 1]) + # Coords vary across periods for PSU 0 only. + coords = np.array( + [ + [40.0, 0.0], # PSU 0, period 0 + [40.5, 0.0], # PSU 1, period 0 + [50.0, 0.0], # PSU 2, period 0 + [50.5, 0.0], # PSU 3, period 0 + [42.0, 0.0], # PSU 0, period 1 — DIFFERENT coord + [40.5, 0.0], # PSU 1, period 1 + [50.0, 0.0], # PSU 2, period 1 + [50.5, 0.0], # PSU 3, period 1 + ] + ) + psu_strata_obs = np.array([0, 0, 1, 1, 0, 0, 1, 1]) + resolved = ResolvedSurveyDesign( + weights=np.ones(n), + weight_type="pweight", + strata=psu_strata_obs, + psu=psu_id, + fpc=None, + n_strata=2, + n_psu=4, + lonely_psu="remove", + ) + meat_panel = _compute_stratified_conley_meat( + Psi, + conley_coords=coords, + conley_cutoff_km=5.0, + conley_metric="euclidean", + conley_kernel="bartlett", + resolved_survey=resolved, + conley_time=time_arr, + ) + # Hand calculation using PANEL-CONSTANT centroids (the contract). + # PSU 0 centroid = mean([40.0, 0], [42.0, 0]) = [41.0, 0]. + # Other PSUs have constant coords → centroid equals that coord. + panel_centroids = np.array([[41.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]) + # Per-period PSU totals (each PSU appears once per period in this + # fixture, so the PSU total per period IS the single obs's Psi). + psu_strata = np.array([0, 0, 1, 1]) + expected = np.zeros((p_2, p_2)) + for t in [0, 1]: + mask = time_arr == t + Psi_t = Psi[mask] + psu_id_t = psu_id[mask] + S_psu_t = np.zeros((4, p_2)) + for g in range(4): + rows = psu_id_t == g + if rows.any(): + S_psu_t[g] = Psi_t[rows].sum(axis=0) + meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores( + S_psu_t, + psu_strata, + panel_centroids, # panel-constant — same across periods + cutoff=5.0, + metric="euclidean", + kernel="bartlett", + ) + expected += meat_t + np.testing.assert_allclose(meat_panel, expected, rtol=1e-12, atol=1e-14) + # Counter-check: per-period centroids (the OLD pre-fix design) + # would give a different meat for PSU 0 because the centroid + # used in period 1 (42.0) differs from the one used in period 0 + # (40.0). Verify the orchestrator does NOT match that buggy + # construction. + buggy_expected = np.zeros((p_2, p_2)) + period_centroids = { + 0: np.array([[40.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]), + 1: np.array([[42.0, 0.0], [40.5, 0.0], [50.0, 0.0], [50.5, 0.0]]), + } + for t in [0, 1]: + mask = time_arr == t + Psi_t = Psi[mask] + psu_id_t = psu_id[mask] + S_psu_t = np.zeros((4, p_2)) + for g in range(4): + rows = psu_id_t == g + if rows.any(): + S_psu_t[g] = Psi_t[rows].sum(axis=0) + meat_t, _, _ = _compute_stratified_conley_meat_from_psu_scores( + S_psu_t, + psu_strata, + period_centroids[t], # per-period (buggy) + cutoff=5.0, + metric="euclidean", + kernel="bartlett", + ) + buggy_expected += meat_t + # The buggy construction MUST differ measurably from the + # panel-constant orchestrator output. + assert not np.allclose( + meat_panel, buggy_expected, rtol=1e-3, atol=1e-3 + ), "orchestrator unexpectedly matches per-period (buggy) centroid construction" + + def test_n2_no_psu_per_period_reindex_unit_invariant(self): + """Direct unit test on the orchestrator: the no-PSU per-period + re-indexing must NOT mix off-period rows into the kernel. With + synthetic data where obs 0/1 are in period 0 (close in km) and + obs 2/3 are in period 1 (far away), the meat must reflect + ONLY within-period spatial pairs. + """ + from diff_diff.survey import ( + ResolvedSurveyDesign, + _compute_stratified_conley_meat_from_psu_scores, + ) + from diff_diff.two_stage import _compute_stratified_conley_meat + + rng = np.random.default_rng(243) + n, p_2 = 4, 2 + Psi = rng.standard_normal((n, p_2)) + # Period 0: obs 0, 1 at lat 40.00 / 40.01 (close in km). + # Period 1: obs 2, 3 at lat 50.00 / 50.01 (far from period-0 obs). + coords = np.array([[40.00, 0.0], [40.01, 0.0], [50.00, 0.0], [50.01, 0.0]]) + time_arr = np.array([0, 0, 1, 1]) + strata_arr = np.array([0, 0, 1, 1]) + resolved = ResolvedSurveyDesign( + weights=np.ones(n), + weight_type="pweight", + strata=strata_arr, + psu=None, # implicit per-obs pseudo-PSU + fpc=None, + n_strata=2, + n_psu=n, + lonely_psu="remove", + ) + meat_panel = _compute_stratified_conley_meat( + Psi, + conley_coords=coords, + conley_cutoff_km=0.05, + conley_metric="euclidean", + conley_kernel="bartlett", + resolved_survey=resolved, + conley_time=time_arr, + ) + # Hand: per-period only on active rows. + meat_p0, _, _ = _compute_stratified_conley_meat_from_psu_scores( + Psi[:2], + np.array([0, 0]), + coords[:2], + cutoff=0.05, + metric="euclidean", + kernel="bartlett", + ) + meat_p1, _, _ = _compute_stratified_conley_meat_from_psu_scores( + Psi[2:], + np.array([0, 0]), + coords[2:], + cutoff=0.05, + metric="euclidean", + kernel="bartlett", + ) + np.testing.assert_allclose(meat_panel, meat_p0 + meat_p1, rtol=1e-12, atol=1e-14) + + def test_n_finite_mask_survey_array_subsetting(self): + """finite_mask drops baseline-treated rows; survey metadata + reflects the SUBSET sample, not the original. + """ + from diff_diff import SurveyDesign + + df = generate_butts_staggered_dgp(seed=23) + # Pin a unit to always-treated (g = period 0); finite_mask will + # drop its rows from stage 2. + first_unit = sorted(df["unit"].unique())[0] + df.loc[df["unit"] == first_unit, "first_treat"] = 0 + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res = self._fit(df_s, design=design) + # Survey metadata reflects subset (post-finite_mask), not the full panel. + assert res.survey_metadata is not None + assert res.n_obs <= len(df_s) # at least the always-treated unit's rows dropped + + +class TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy: + """Event-study branch + conley + survey, both is_staggered branches.""" + + _CUTOFF_KM = 1000.0 + + def test_o_event_study_conley_survey_is_staggered_true(self): + """Full plumbing end-to-end on the staggered event-study path.""" + from diff_diff import SurveyDesign + + df = generate_butts_staggered_dgp(seed=24) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + event_study=True, + horizon_max=2, + ) + res = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + # Event-study + spillover finite end-to-end + assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0 + # spillover_effects populated (non-empty) + assert res.spillover_effects is not None + # df_survey lookup uses the survey branch + assert res.survey_metadata is not None + assert res.survey_metadata.df_survey == 6 + + def test_p_event_study_conley_survey_is_staggered_false(self): + """The non-staggered branch of the event-study path also works + (mirrors `feedback_cohort_loop_trigger_cache_both_branches`). + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=25) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + event_study=True, + horizon_max=1, + ) + res = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0 + assert res.survey_metadata is not None + + def test_r_drift_goldens(self): + """Pinned ATT + SE on a fixed-seed conley+survey fit. + + Drift goldens captured on initial Wave E.2 implementation + (seed=999, standard 2-strata x 4-PSU augmentation, cutoff=1000km). + `assert_allclose` tolerance acknowledges PSU-aggregation BLAS + reduction order variation across CI runners. + """ + from diff_diff import SurveyDesign + + df = generate_butts_nonstaggered_dgp(seed=999) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + ) + res = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + # Goldens — pinned on initial Wave E.2 implementation (seed=999, + # 2-strata x 4-PSU augmentation, cutoff=1000km). ATT is invariant + # to vcov_type, so it matches the Wave E.1 binder golden exactly. + # SE is Wave E.2-specific (stratified-Conley sandwich on PSU totals). + _WAVE_E2_GOLDEN_ATT = -0.07749624543132044 + _WAVE_E2_GOLDEN_SE = 0.0006771937420330884 + np.testing.assert_allclose(res.att, _WAVE_E2_GOLDEN_ATT, rtol=1e-12, atol=1e-14) + np.testing.assert_allclose(res.se, _WAVE_E2_GOLDEN_SE, rtol=1e-12, atol=1e-14) + # Lock down DOF + n_psu (deterministic). + assert res.n_psu == 8 + assert res.n_strata == 2 + assert res.survey_metadata.df_survey == 6