Implement CallawaySantAnna covariate adjustment for conditional parallel trends by igerber · Pull Request #22 · igerber/diff-diff

igerber · 2026-01-03T23:01:09Z

Adds full covariate support to the CallawaySantAnna staggered DiD estimator:

Outcome regression (reg): Regresses outcome changes on covariates for control
units, predicts counterfactual for treated
Inverse probability weighting (ipw): Estimates propensity scores via logistic
regression, reweights control units to match treated covariate distribution
Doubly robust (dr): Combines both methods for consistent estimation if either
the outcome model or propensity model is correctly specified

Implementation details:

Added _logistic_regression() and _linear_regression() helper functions
Updated _outcome_regression(), _ipw_estimation(), and _doubly_robust() to
accept and use covariate matrices
Modified _compute_att_gt() to extract covariates from base period data
Graceful fallback to unconditional estimation if covariate extraction fails

Also includes:

9 new tests for covariate adjustment across all estimation methods
Updated TODO.md to reflect completed implementation

…lel trends Adds full covariate support to the CallawaySantAnna staggered DiD estimator: - Outcome regression (reg): Regresses outcome changes on covariates for control units, predicts counterfactual for treated - Inverse probability weighting (ipw): Estimates propensity scores via logistic regression, reweights control units to match treated covariate distribution - Doubly robust (dr): Combines both methods for consistent estimation if either the outcome model or propensity model is correctly specified Implementation details: - Added _logistic_regression() and _linear_regression() helper functions - Updated _outcome_regression(), _ipw_estimation(), and _doubly_robust() to accept and use covariate matrices - Modified _compute_att_gt() to extract covariates from base period data - Graceful fallback to unconditional estimation if covariate extraction fails Also includes: - 9 new tests for covariate adjustment across all estimation methods - Updated TODO.md to reflect completed implementation

- Move scattered docstring notes to docs/reviews/pr22_covariate_adjustment_review.md - Keep actual code fixes: removed unused self._covariates, fixed IPW influence function - Keep new test for extreme propensity scores

Changes based on PR #22 review: 1. Remove unused self._covariates instance variable - Was stored but never accessed; covariates passed through method chain 2. Fix empty influence function in unconditional IPW - Previously returned np.array([]) placeholder - Now properly computes influence function matching other methods 3. Add docstring example for covariate adjustment - Shows doubly robust usage with covariates 4. Add edge case tests: - test_extreme_propensity_scores: verifies clipping handles near-perfect separation - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices - test_missing_values_in_covariates_warning: verifies fallback warning path

Phase 2 silent-failures audit — axis-G (backend parity). Closes the coverage gap the audit flagged in three Rust-backed solver surfaces. Test-only PR; any discovered divergences are marked `xfail(strict=True)` and logged to `TODO.md` as P1 follow-ups rather than fixed in-scope. Finding #21 — `solve_ols` skip-rank-check parity (`linalg.py:369-373, 597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity` covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank (cond > 1e10), and rank-deficient collinear designs under `skip_rank_check=True` on HC1. Backends agree on fitted values within `rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed. Finding #22 — `compute_synthetic_weights` parity (`utils.py:1134-1199`): three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular `Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg variations are `xfail(strict=True)` with a baselined ~15-80% weight divergence. Root cause: Rust path is Frank-Wolfe, Python fallback is projected gradient descent (`utils.py:1228`) — same QP, different simplex vertices under near-degenerate inputs. Finding #23 — TROP Rust grid-search + bootstrap parity (`trop_global.py:688-750, 966-1006`): two parity tests in `TestTROPRustEdgeCaseParity`, `@pytest.mark.slow` class-level. Both `xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6% divergence), bootstrap SE under `seed=42` (~28% divergence, RNG backend mismatch — Rust `rand` crate vs numpy `default_rng`). Plan governance: - Per `feedback_ci_reviewer_pattern_checks`, greped adjacent Rust entry points (`_solve_ols_rust`, `_rust_synthetic_weights`, `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`); no additional silent-fallback surfaces identified. - Per plan Non-goal #4, did not open an axis-H finding on TROP's `seed=None → 0` substitution at `trop_global.py:994` (out of scope). - No behavioral changes, no warnings, no REGISTRY changes, no flags. TODO.md logs three P1 follow-up entries: algorithmic unification for `compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on rank-deficient Y, TROP bootstrap RNG unification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…trol_units Post-audit cleanup closing finding #22 from the Phase 2 silent-failures audit. Replaces the previous "fix both backends" approach with "delete the helper and inline correctly in its single caller." Why the scope change. The audit found that Rust and Python backends of `compute_synthetic_weights` solved different QPs (Python shrink-to-uniform, Rust shrink-to-zero) with different step sizes (Python adaptive, Rust broken constant 0.1 that diverges at large Y). Closer inspection revealed: - `compute_synthetic_weights` was private (not in __all__). - Its only non-test caller was `rank_control_units` in prep.py — a user-facing diagnostic that ranks control units by trend similarity. - The `synthetic_weight` column it feeds into the `rank_control_units` DataFrame is informational only — `quality_score` is computed from RMSE + covariate distance, NOT from `synthetic_weight`. So the bug did not affect donor-selection ranking at all. - `SyntheticDiD.fit()` does NOT use this helper. It uses the Frank-Wolfe solver (`compute_sdid_unit_weights` / `_sc_weight_fw_numpy`) directly on a completely independent code path that already matches R synthdid. Maintaining two broken PGD implementations of a one-caller helper was not worth the cost. Delete the shim and inline correctly. Changes: - Delete `compute_synthetic_weights` (utils.py:1134) and `_compute_synthetic_weights_numpy` (utils.py:1202). - Remove `_rust_synthetic_weights` from the `_backend.py` import and None-fallback, and from the `__init__.py` re-export. - Remove the import from `prep.py`. - Inline a Frank-Wolfe computation in `rank_control_units` (prep.py:990) using the shared `_sc_weight_fw` dispatcher — same solver SDID uses, matching R `synthdid::sc.weight.fw`. Threads `zeta=sqrt(lambda_reg/N)` to absorb FW's (1/N) objective scaling, noise-level-scaled `min_decrease` per R convention, and `max_iter=1000` to preserve the existing cost envelope. - Short-circuit `n_control in {0, 1}` to avoid FW loop overhead on trivially-solvable cases. - Suppress non-convergence warnings inside the inline block — the caller uses the output as a heuristic score, not a statistical estimate. Tests: - Delete `TestSyntheticWeightsBackendParity` (3 tests, 2 xfailed) and the 3 direct-Rust-import tests in `test_rust_backend.py`. - Delete `TestComputeSyntheticWeightsEdgeCases` (4 tests) in test_utils.py. - Delete `test_compute_synthetic_weights` in test_estimators.py. - Add `test_extreme_Y_scale_synthetic_weight_column` in test_prep.py — regression guard asserting the `synthetic_weight` column produces a valid non-degenerate simplex vector at `Y ~ 1e9` (the exact input that the old Rust PGD mishandled by collapsing onto a single vertex). SDID invariance. Verified via bit-identical pre/post numerical baseline on a fixed-seed `SyntheticDiD.fit()` (n_units=30, n_pre=8, n_post=2, n_treated=3, n_bootstrap=20, seed=42): before: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 after: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 Full SDID test suite: 40/40 pass before and after (test outcome diff empty). All 305 tests in the touched files pass. Rust-side `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs` is now dead code (no Python code calls it). Tracked as a low-priority cleanup follow-up in TODO.md; removal requires `maturin develop` rebuild. User-visible impact: - `rank_control_units` public signature and return schema unchanged. - `quality_score` values and ranking: unchanged. - `synthetic_weight` column: values agree with old code at ~1e-7 on default-parameter typical data; differ only in edge cases where the old code was mathematically wrong (extreme Y scale, lambda_reg > 0). - Private imports break: `from diff_diff.utils import compute_synthetic_weights`, `from diff_diff._rust_backend import compute_synthetic_weights`. These were not part of the documented stable API. Net diff: +110 / -367 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nding igerber#22) Closes the Rust-side loop on finding igerber#22 from the silent-failures audit. The Python-side wrapper and its sole caller were removed in PR igerber#344; this PR deletes the now-orphaned PyO3 binding, its internal PGD implementation, and the obsolete constants that only serviced it. Deleted - rust/src/weights.rs: compute_synthetic_weights pyfunction + compute_synthetic_weights_internal - rust/src/weights.rs: MAX_ITER, DEFAULT_TOL, DEFAULT_STEP_SIZE (only used by the deleted PGD path) - rust/src/weights.rs: test_compute_weights_sum_to_one (cargo test for the deleted internal helper) - rust/src/lib.rs: m.add_function(wrap_pyfunction!(weights::compute_synthetic_weights, m)?) Retained (still used from Python) - weights::project_simplex — consumed by _rust_project_simplex in utils.py (project_simplex tests in test_rust_backend.py) - weights::compute_sdid_unit_weights, compute_time_weights, sc_weight_fw, compute_noise_level (SDID canonical path) - weights::sum_normalize_internal, sparsify_internal (helpers for SDID) Verification - maturin develop --release --features accelerate: clean rebuild, no warnings - cargo check --lib: clean (cargo test --release hits a known macOS PyO3 Python3.framework rpath issue unrelated to this change) - pytest tests/test_rust_backend.py tests/test_prep.py::TestRankControlUnits: 85 passed - Direct import test: project_simplex / compute_sdid_unit_weights / compute_time_weights / sc_weight_fw still importable; compute_synthetic_weights correctly raises ImportError TODO.md row 86 removed. docs/audits/silent-failures-findings.md row igerber#22 updated to FULLY RESOLVED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber merged commit fc222c2 into main Jan 3, 2026

igerber deleted the claude/implement-roadmap-priority-dyAWY branch January 3, 2026 23:15

igerber mentioned this pull request Apr 19, 2026

Add axis-G Rust vs Python backend parity edge-case tests #337

Merged

igerber mentioned this pull request Apr 20, 2026

Delete compute_synthetic_weights shim; inline Frank-Wolfe in rank_control_units #344

Merged

igerber mentioned this pull request Apr 20, 2026

Remove dead Rust compute_synthetic_weights (follow-up to PR #344) #345

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement CallawaySantAnna covariate adjustment for conditional parallel trends#22

Implement CallawaySantAnna covariate adjustment for conditional parallel trends#22
igerber merged 1 commit intomainfrom
claude/implement-roadmap-priority-dyAWY

igerber commented Jan 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

igerber commented Jan 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants