Skip to content

Implement CallawaySantAnna covariate adjustment for conditional parallel trends#22

Merged
igerber merged 1 commit intomainfrom
claude/implement-roadmap-priority-dyAWY
Jan 3, 2026
Merged

Implement CallawaySantAnna covariate adjustment for conditional parallel trends#22
igerber merged 1 commit intomainfrom
claude/implement-roadmap-priority-dyAWY

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jan 3, 2026

Adds full covariate support to the CallawaySantAnna staggered DiD estimator:

  • Outcome regression (reg): Regresses outcome changes on covariates for control
    units, predicts counterfactual for treated
  • Inverse probability weighting (ipw): Estimates propensity scores via logistic
    regression, reweights control units to match treated covariate distribution
  • Doubly robust (dr): Combines both methods for consistent estimation if either
    the outcome model or propensity model is correctly specified

Implementation details:

  • Added _logistic_regression() and _linear_regression() helper functions
  • Updated _outcome_regression(), _ipw_estimation(), and _doubly_robust() to
    accept and use covariate matrices
  • Modified _compute_att_gt() to extract covariates from base period data
  • Graceful fallback to unconditional estimation if covariate extraction fails

Also includes:

  • 9 new tests for covariate adjustment across all estimation methods
  • Updated TODO.md to reflect completed implementation

…lel trends

Adds full covariate support to the CallawaySantAnna staggered DiD estimator:

- Outcome regression (reg): Regresses outcome changes on covariates for control
  units, predicts counterfactual for treated
- Inverse probability weighting (ipw): Estimates propensity scores via logistic
  regression, reweights control units to match treated covariate distribution
- Doubly robust (dr): Combines both methods for consistent estimation if either
  the outcome model or propensity model is correctly specified

Implementation details:
- Added _logistic_regression() and _linear_regression() helper functions
- Updated _outcome_regression(), _ipw_estimation(), and _doubly_robust() to
  accept and use covariate matrices
- Modified _compute_att_gt() to extract covariates from base period data
- Graceful fallback to unconditional estimation if covariate extraction fails

Also includes:
- 9 new tests for covariate adjustment across all estimation methods
- Updated TODO.md to reflect completed implementation
@igerber igerber merged commit fc222c2 into main Jan 3, 2026
@igerber igerber deleted the claude/implement-roadmap-priority-dyAWY branch January 3, 2026 23:15
igerber pushed a commit that referenced this pull request Jan 3, 2026
- Move scattered docstring notes to docs/reviews/pr22_covariate_adjustment_review.md
- Keep actual code fixes: removed unused self._covariates, fixed IPW influence function
- Keep new test for extreme propensity scores
igerber pushed a commit that referenced this pull request Jan 3, 2026
Changes based on PR #22 review:

1. Remove unused self._covariates instance variable
   - Was stored but never accessed; covariates passed through method chain

2. Fix empty influence function in unconditional IPW
   - Previously returned np.array([]) placeholder
   - Now properly computes influence function matching other methods

3. Add docstring example for covariate adjustment
   - Shows doubly robust usage with covariates

4. Add edge case tests:
   - test_extreme_propensity_scores: verifies clipping handles near-perfect separation
   - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices
   - test_missing_values_in_covariates_warning: verifies fallback warning path
igerber pushed a commit that referenced this pull request Jan 3, 2026
Changes based on PR #22 review:

1. Remove unused self._covariates instance variable
   - Was stored but never accessed; covariates passed through method chain

2. Fix empty influence function in unconditional IPW
   - Previously returned np.array([]) placeholder
   - Now properly computes influence function matching other methods

3. Add docstring example for covariate adjustment
   - Shows doubly robust usage with covariates

4. Add edge case tests:
   - test_extreme_propensity_scores: verifies clipping handles near-perfect separation
   - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices
   - test_missing_values_in_covariates_warning: verifies fallback warning path
igerber pushed a commit that referenced this pull request Jan 3, 2026
Changes based on PR #22 review:

1. Remove unused self._covariates instance variable
   - Was stored but never accessed; covariates passed through method chain

2. Fix empty influence function in unconditional IPW
   - Previously returned np.array([]) placeholder
   - Now properly computes influence function matching other methods

3. Add docstring example for covariate adjustment
   - Shows doubly robust usage with covariates

4. Add edge case tests:
   - test_extreme_propensity_scores: verifies clipping handles near-perfect separation
   - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices
   - test_missing_values_in_covariates_warning: verifies fallback warning path
igerber added a commit that referenced this pull request Apr 19, 2026
Phase 2 silent-failures audit — axis-G (backend parity). Closes the
coverage gap the audit flagged in three Rust-backed solver surfaces.
Test-only PR; any discovered divergences are marked `xfail(strict=True)`
and logged to `TODO.md` as P1 follow-ups rather than fixed in-scope.

Finding #21 — `solve_ols` skip-rank-check parity (`linalg.py:369-373,
597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity`
covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank
(cond > 1e10), and rank-deficient collinear designs under
`skip_rank_check=True` on HC1. Backends agree on fitted values within
`rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.

Finding #22 — `compute_synthetic_weights` parity (`utils.py:1134-1199`):
three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular
`Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg
variations are `xfail(strict=True)` with a baselined ~15-80% weight
divergence. Root cause: Rust path is Frank-Wolfe, Python fallback is
projected gradient descent (`utils.py:1228`) — same QP, different
simplex vertices under near-degenerate inputs.

Finding #23 — TROP Rust grid-search + bootstrap parity
(`trop_global.py:688-750, 966-1006`): two parity tests in
`TestTROPRustEdgeCaseParity`, `@pytest.mark.slow` class-level. Both
`xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6%
divergence), bootstrap SE under `seed=42` (~28% divergence, RNG
backend mismatch — Rust `rand` crate vs numpy `default_rng`).

Plan governance:
- Per `feedback_ci_reviewer_pattern_checks`, greped adjacent Rust
  entry points (`_solve_ols_rust`, `_rust_synthetic_weights`,
  `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`);
  no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's
  `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.

TODO.md logs three P1 follow-up entries: algorithmic unification for
`compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on
rank-deficient Y, TROP bootstrap RNG unification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 20, 2026
…trol_units

Post-audit cleanup closing finding #22 from the Phase 2 silent-failures
audit. Replaces the previous "fix both backends" approach with "delete the
helper and inline correctly in its single caller."

Why the scope change. The audit found that Rust and Python backends of
`compute_synthetic_weights` solved different QPs (Python shrink-to-uniform,
Rust shrink-to-zero) with different step sizes (Python adaptive, Rust broken
constant 0.1 that diverges at large Y). Closer inspection revealed:
- `compute_synthetic_weights` was private (not in __all__).
- Its only non-test caller was `rank_control_units` in prep.py — a
  user-facing diagnostic that ranks control units by trend similarity.
- The `synthetic_weight` column it feeds into the `rank_control_units`
  DataFrame is informational only — `quality_score` is computed from
  RMSE + covariate distance, NOT from `synthetic_weight`. So the bug
  did not affect donor-selection ranking at all.
- `SyntheticDiD.fit()` does NOT use this helper. It uses the Frank-Wolfe
  solver (`compute_sdid_unit_weights` / `_sc_weight_fw_numpy`) directly
  on a completely independent code path that already matches R synthdid.

Maintaining two broken PGD implementations of a one-caller helper was not
worth the cost. Delete the shim and inline correctly.

Changes:
- Delete `compute_synthetic_weights` (utils.py:1134) and
  `_compute_synthetic_weights_numpy` (utils.py:1202).
- Remove `_rust_synthetic_weights` from the `_backend.py` import and
  None-fallback, and from the `__init__.py` re-export.
- Remove the import from `prep.py`.
- Inline a Frank-Wolfe computation in `rank_control_units` (prep.py:990)
  using the shared `_sc_weight_fw` dispatcher — same solver SDID uses,
  matching R `synthdid::sc.weight.fw`. Threads `zeta=sqrt(lambda_reg/N)`
  to absorb FW's (1/N) objective scaling, noise-level-scaled
  `min_decrease` per R convention, and `max_iter=1000` to preserve the
  existing cost envelope.
- Short-circuit `n_control in {0, 1}` to avoid FW loop overhead on
  trivially-solvable cases.
- Suppress non-convergence warnings inside the inline block — the caller
  uses the output as a heuristic score, not a statistical estimate.

Tests:
- Delete `TestSyntheticWeightsBackendParity` (3 tests, 2 xfailed) and the
  3 direct-Rust-import tests in `test_rust_backend.py`.
- Delete `TestComputeSyntheticWeightsEdgeCases` (4 tests) in test_utils.py.
- Delete `test_compute_synthetic_weights` in test_estimators.py.
- Add `test_extreme_Y_scale_synthetic_weight_column` in test_prep.py —
  regression guard asserting the `synthetic_weight` column produces a
  valid non-degenerate simplex vector at `Y ~ 1e9` (the exact input that
  the old Rust PGD mishandled by collapsing onto a single vertex).

SDID invariance. Verified via bit-identical pre/post numerical baseline on
a fixed-seed `SyntheticDiD.fit()` (n_units=30, n_pre=8, n_post=2,
n_treated=3, n_bootstrap=20, seed=42):

    before: att=1.1895009159752075  se=0.2576531609311485
             weight_sum=1.000000000000002e+00  weight_max=6.820411397386437e-02
    after:  att=1.1895009159752075  se=0.2576531609311485
             weight_sum=1.000000000000002e+00  weight_max=6.820411397386437e-02

Full SDID test suite: 40/40 pass before and after (test outcome diff
empty). All 305 tests in the touched files pass.

Rust-side `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs`
is now dead code (no Python code calls it). Tracked as a low-priority
cleanup follow-up in TODO.md; removal requires `maturin develop` rebuild.

User-visible impact:
- `rank_control_units` public signature and return schema unchanged.
- `quality_score` values and ranking: unchanged.
- `synthetic_weight` column: values agree with old code at ~1e-7 on
  default-parameter typical data; differ only in edge cases where the old
  code was mathematically wrong (extreme Y scale, lambda_reg > 0).
- Private imports break: `from diff_diff.utils import compute_synthetic_weights`,
  `from diff_diff._rust_backend import compute_synthetic_weights`. These
  were not part of the documented stable API.

Net diff: +110 / -367 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 20, 2026
…trol_units

Post-audit cleanup closing finding #22 from the Phase 2 silent-failures
audit. Replaces the previous "fix both backends" approach with "delete the
helper and inline correctly in its single caller."

Why the scope change. The audit found that Rust and Python backends of
`compute_synthetic_weights` solved different QPs (Python shrink-to-uniform,
Rust shrink-to-zero) with different step sizes (Python adaptive, Rust broken
constant 0.1 that diverges at large Y). Closer inspection revealed:
- `compute_synthetic_weights` was private (not in __all__).
- Its only non-test caller was `rank_control_units` in prep.py — a
  user-facing diagnostic that ranks control units by trend similarity.
- The `synthetic_weight` column it feeds into the `rank_control_units`
  DataFrame is informational only — `quality_score` is computed from
  RMSE + covariate distance, NOT from `synthetic_weight`. So the bug
  did not affect donor-selection ranking at all.
- `SyntheticDiD.fit()` does NOT use this helper. It uses the Frank-Wolfe
  solver (`compute_sdid_unit_weights` / `_sc_weight_fw_numpy`) directly
  on a completely independent code path that already matches R synthdid.

Maintaining two broken PGD implementations of a one-caller helper was not
worth the cost. Delete the shim and inline correctly.

Changes:
- Delete `compute_synthetic_weights` (utils.py:1134) and
  `_compute_synthetic_weights_numpy` (utils.py:1202).
- Remove `_rust_synthetic_weights` from the `_backend.py` import and
  None-fallback, and from the `__init__.py` re-export.
- Remove the import from `prep.py`.
- Inline a Frank-Wolfe computation in `rank_control_units` (prep.py:990)
  using the shared `_sc_weight_fw` dispatcher — same solver SDID uses,
  matching R `synthdid::sc.weight.fw`. Threads `zeta=sqrt(lambda_reg/N)`
  to absorb FW's (1/N) objective scaling, noise-level-scaled
  `min_decrease` per R convention, and `max_iter=1000` to preserve the
  existing cost envelope.
- Short-circuit `n_control in {0, 1}` to avoid FW loop overhead on
  trivially-solvable cases.
- Suppress non-convergence warnings inside the inline block — the caller
  uses the output as a heuristic score, not a statistical estimate.

Tests:
- Delete `TestSyntheticWeightsBackendParity` (3 tests, 2 xfailed) and the
  3 direct-Rust-import tests in `test_rust_backend.py`.
- Delete `TestComputeSyntheticWeightsEdgeCases` (4 tests) in test_utils.py.
- Delete `test_compute_synthetic_weights` in test_estimators.py.
- Add `test_extreme_Y_scale_synthetic_weight_column` in test_prep.py —
  regression guard asserting the `synthetic_weight` column produces a
  valid non-degenerate simplex vector at `Y ~ 1e9` (the exact input that
  the old Rust PGD mishandled by collapsing onto a single vertex).

SDID invariance. Verified via bit-identical pre/post numerical baseline on
a fixed-seed `SyntheticDiD.fit()` (n_units=30, n_pre=8, n_post=2,
n_treated=3, n_bootstrap=20, seed=42):

    before: att=1.1895009159752075  se=0.2576531609311485
             weight_sum=1.000000000000002e+00  weight_max=6.820411397386437e-02
    after:  att=1.1895009159752075  se=0.2576531609311485
             weight_sum=1.000000000000002e+00  weight_max=6.820411397386437e-02

Full SDID test suite: 40/40 pass before and after (test outcome diff
empty). All 305 tests in the touched files pass.

Rust-side `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs`
is now dead code (no Python code calls it). Tracked as a low-priority
cleanup follow-up in TODO.md; removal requires `maturin develop` rebuild.

User-visible impact:
- `rank_control_units` public signature and return schema unchanged.
- `quality_score` values and ranking: unchanged.
- `synthetic_weight` column: values agree with old code at ~1e-7 on
  default-parameter typical data; differ only in edge cases where the old
  code was mathematically wrong (extreme Y scale, lambda_reg > 0).
- Private imports break: `from diff_diff.utils import compute_synthetic_weights`,
  `from diff_diff._rust_backend import compute_synthetic_weights`. These
  were not part of the documented stable API.

Net diff: +110 / -367 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
…nding igerber#22)

Closes the Rust-side loop on finding igerber#22 from the silent-failures audit.
The Python-side wrapper and its sole caller were removed in PR igerber#344; this
PR deletes the now-orphaned PyO3 binding, its internal PGD implementation,
and the obsolete constants that only serviced it.

Deleted
- rust/src/weights.rs: compute_synthetic_weights pyfunction + compute_synthetic_weights_internal
- rust/src/weights.rs: MAX_ITER, DEFAULT_TOL, DEFAULT_STEP_SIZE (only used by the deleted PGD path)
- rust/src/weights.rs: test_compute_weights_sum_to_one (cargo test for the deleted internal helper)
- rust/src/lib.rs: m.add_function(wrap_pyfunction!(weights::compute_synthetic_weights, m)?)

Retained (still used from Python)
- weights::project_simplex — consumed by _rust_project_simplex in utils.py (project_simplex tests in test_rust_backend.py)
- weights::compute_sdid_unit_weights, compute_time_weights, sc_weight_fw, compute_noise_level (SDID canonical path)
- weights::sum_normalize_internal, sparsify_internal (helpers for SDID)

Verification
- maturin develop --release --features accelerate: clean rebuild, no warnings
- cargo check --lib: clean (cargo test --release hits a known macOS PyO3 Python3.framework rpath issue unrelated to this change)
- pytest tests/test_rust_backend.py tests/test_prep.py::TestRankControlUnits: 85 passed
- Direct import test: project_simplex / compute_sdid_unit_weights / compute_time_weights / sc_weight_fw still importable; compute_synthetic_weights correctly raises ImportError

TODO.md row 86 removed. docs/audits/silent-failures-findings.md row igerber#22 updated to FULLY RESOLVED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants