Conversation
…lel trends Adds full covariate support to the CallawaySantAnna staggered DiD estimator: - Outcome regression (reg): Regresses outcome changes on covariates for control units, predicts counterfactual for treated - Inverse probability weighting (ipw): Estimates propensity scores via logistic regression, reweights control units to match treated covariate distribution - Doubly robust (dr): Combines both methods for consistent estimation if either the outcome model or propensity model is correctly specified Implementation details: - Added _logistic_regression() and _linear_regression() helper functions - Updated _outcome_regression(), _ipw_estimation(), and _doubly_robust() to accept and use covariate matrices - Modified _compute_att_gt() to extract covariates from base period data - Graceful fallback to unconditional estimation if covariate extraction fails Also includes: - 9 new tests for covariate adjustment across all estimation methods - Updated TODO.md to reflect completed implementation
igerber
pushed a commit
that referenced
this pull request
Jan 3, 2026
- Move scattered docstring notes to docs/reviews/pr22_covariate_adjustment_review.md - Keep actual code fixes: removed unused self._covariates, fixed IPW influence function - Keep new test for extreme propensity scores
igerber
pushed a commit
that referenced
this pull request
Jan 3, 2026
Changes based on PR #22 review: 1. Remove unused self._covariates instance variable - Was stored but never accessed; covariates passed through method chain 2. Fix empty influence function in unconditional IPW - Previously returned np.array([]) placeholder - Now properly computes influence function matching other methods 3. Add docstring example for covariate adjustment - Shows doubly robust usage with covariates 4. Add edge case tests: - test_extreme_propensity_scores: verifies clipping handles near-perfect separation - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices - test_missing_values_in_covariates_warning: verifies fallback warning path
igerber
pushed a commit
that referenced
this pull request
Jan 3, 2026
Changes based on PR #22 review: 1. Remove unused self._covariates instance variable - Was stored but never accessed; covariates passed through method chain 2. Fix empty influence function in unconditional IPW - Previously returned np.array([]) placeholder - Now properly computes influence function matching other methods 3. Add docstring example for covariate adjustment - Shows doubly robust usage with covariates 4. Add edge case tests: - test_extreme_propensity_scores: verifies clipping handles near-perfect separation - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices - test_missing_values_in_covariates_warning: verifies fallback warning path
igerber
pushed a commit
that referenced
this pull request
Jan 3, 2026
Changes based on PR #22 review: 1. Remove unused self._covariates instance variable - Was stored but never accessed; covariates passed through method chain 2. Fix empty influence function in unconditional IPW - Previously returned np.array([]) placeholder - Now properly computes influence function matching other methods 3. Add docstring example for covariate adjustment - Shows doubly robust usage with covariates 4. Add edge case tests: - test_extreme_propensity_scores: verifies clipping handles near-perfect separation - test_near_collinear_covariates: verifies lstsq handles ill-conditioned matrices - test_missing_values_in_covariates_warning: verifies fallback warning path
igerber
added a commit
that referenced
this pull request
Apr 19, 2026
Phase 2 silent-failures audit — axis-G (backend parity). Closes the coverage gap the audit flagged in three Rust-backed solver surfaces. Test-only PR; any discovered divergences are marked `xfail(strict=True)` and logged to `TODO.md` as P1 follow-ups rather than fixed in-scope. Finding #21 — `solve_ols` skip-rank-check parity (`linalg.py:369-373, 597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity` covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank (cond > 1e10), and rank-deficient collinear designs under `skip_rank_check=True` on HC1. Backends agree on fitted values within `rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed. Finding #22 — `compute_synthetic_weights` parity (`utils.py:1134-1199`): three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular `Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg variations are `xfail(strict=True)` with a baselined ~15-80% weight divergence. Root cause: Rust path is Frank-Wolfe, Python fallback is projected gradient descent (`utils.py:1228`) — same QP, different simplex vertices under near-degenerate inputs. Finding #23 — TROP Rust grid-search + bootstrap parity (`trop_global.py:688-750, 966-1006`): two parity tests in `TestTROPRustEdgeCaseParity`, `@pytest.mark.slow` class-level. Both `xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6% divergence), bootstrap SE under `seed=42` (~28% divergence, RNG backend mismatch — Rust `rand` crate vs numpy `default_rng`). Plan governance: - Per `feedback_ci_reviewer_pattern_checks`, greped adjacent Rust entry points (`_solve_ols_rust`, `_rust_synthetic_weights`, `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`); no additional silent-fallback surfaces identified. - Per plan Non-goal #4, did not open an axis-H finding on TROP's `seed=None → 0` substitution at `trop_global.py:994` (out of scope). - No behavioral changes, no warnings, no REGISTRY changes, no flags. TODO.md logs three P1 follow-up entries: algorithmic unification for `compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on rank-deficient Y, TROP bootstrap RNG unification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 20, 2026
…trol_units Post-audit cleanup closing finding #22 from the Phase 2 silent-failures audit. Replaces the previous "fix both backends" approach with "delete the helper and inline correctly in its single caller." Why the scope change. The audit found that Rust and Python backends of `compute_synthetic_weights` solved different QPs (Python shrink-to-uniform, Rust shrink-to-zero) with different step sizes (Python adaptive, Rust broken constant 0.1 that diverges at large Y). Closer inspection revealed: - `compute_synthetic_weights` was private (not in __all__). - Its only non-test caller was `rank_control_units` in prep.py — a user-facing diagnostic that ranks control units by trend similarity. - The `synthetic_weight` column it feeds into the `rank_control_units` DataFrame is informational only — `quality_score` is computed from RMSE + covariate distance, NOT from `synthetic_weight`. So the bug did not affect donor-selection ranking at all. - `SyntheticDiD.fit()` does NOT use this helper. It uses the Frank-Wolfe solver (`compute_sdid_unit_weights` / `_sc_weight_fw_numpy`) directly on a completely independent code path that already matches R synthdid. Maintaining two broken PGD implementations of a one-caller helper was not worth the cost. Delete the shim and inline correctly. Changes: - Delete `compute_synthetic_weights` (utils.py:1134) and `_compute_synthetic_weights_numpy` (utils.py:1202). - Remove `_rust_synthetic_weights` from the `_backend.py` import and None-fallback, and from the `__init__.py` re-export. - Remove the import from `prep.py`. - Inline a Frank-Wolfe computation in `rank_control_units` (prep.py:990) using the shared `_sc_weight_fw` dispatcher — same solver SDID uses, matching R `synthdid::sc.weight.fw`. Threads `zeta=sqrt(lambda_reg/N)` to absorb FW's (1/N) objective scaling, noise-level-scaled `min_decrease` per R convention, and `max_iter=1000` to preserve the existing cost envelope. - Short-circuit `n_control in {0, 1}` to avoid FW loop overhead on trivially-solvable cases. - Suppress non-convergence warnings inside the inline block — the caller uses the output as a heuristic score, not a statistical estimate. Tests: - Delete `TestSyntheticWeightsBackendParity` (3 tests, 2 xfailed) and the 3 direct-Rust-import tests in `test_rust_backend.py`. - Delete `TestComputeSyntheticWeightsEdgeCases` (4 tests) in test_utils.py. - Delete `test_compute_synthetic_weights` in test_estimators.py. - Add `test_extreme_Y_scale_synthetic_weight_column` in test_prep.py — regression guard asserting the `synthetic_weight` column produces a valid non-degenerate simplex vector at `Y ~ 1e9` (the exact input that the old Rust PGD mishandled by collapsing onto a single vertex). SDID invariance. Verified via bit-identical pre/post numerical baseline on a fixed-seed `SyntheticDiD.fit()` (n_units=30, n_pre=8, n_post=2, n_treated=3, n_bootstrap=20, seed=42): before: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 after: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 Full SDID test suite: 40/40 pass before and after (test outcome diff empty). All 305 tests in the touched files pass. Rust-side `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs` is now dead code (no Python code calls it). Tracked as a low-priority cleanup follow-up in TODO.md; removal requires `maturin develop` rebuild. User-visible impact: - `rank_control_units` public signature and return schema unchanged. - `quality_score` values and ranking: unchanged. - `synthetic_weight` column: values agree with old code at ~1e-7 on default-parameter typical data; differ only in edge cases where the old code was mathematically wrong (extreme Y scale, lambda_reg > 0). - Private imports break: `from diff_diff.utils import compute_synthetic_weights`, `from diff_diff._rust_backend import compute_synthetic_weights`. These were not part of the documented stable API. Net diff: +110 / -367 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 20, 2026
…trol_units Post-audit cleanup closing finding #22 from the Phase 2 silent-failures audit. Replaces the previous "fix both backends" approach with "delete the helper and inline correctly in its single caller." Why the scope change. The audit found that Rust and Python backends of `compute_synthetic_weights` solved different QPs (Python shrink-to-uniform, Rust shrink-to-zero) with different step sizes (Python adaptive, Rust broken constant 0.1 that diverges at large Y). Closer inspection revealed: - `compute_synthetic_weights` was private (not in __all__). - Its only non-test caller was `rank_control_units` in prep.py — a user-facing diagnostic that ranks control units by trend similarity. - The `synthetic_weight` column it feeds into the `rank_control_units` DataFrame is informational only — `quality_score` is computed from RMSE + covariate distance, NOT from `synthetic_weight`. So the bug did not affect donor-selection ranking at all. - `SyntheticDiD.fit()` does NOT use this helper. It uses the Frank-Wolfe solver (`compute_sdid_unit_weights` / `_sc_weight_fw_numpy`) directly on a completely independent code path that already matches R synthdid. Maintaining two broken PGD implementations of a one-caller helper was not worth the cost. Delete the shim and inline correctly. Changes: - Delete `compute_synthetic_weights` (utils.py:1134) and `_compute_synthetic_weights_numpy` (utils.py:1202). - Remove `_rust_synthetic_weights` from the `_backend.py` import and None-fallback, and from the `__init__.py` re-export. - Remove the import from `prep.py`. - Inline a Frank-Wolfe computation in `rank_control_units` (prep.py:990) using the shared `_sc_weight_fw` dispatcher — same solver SDID uses, matching R `synthdid::sc.weight.fw`. Threads `zeta=sqrt(lambda_reg/N)` to absorb FW's (1/N) objective scaling, noise-level-scaled `min_decrease` per R convention, and `max_iter=1000` to preserve the existing cost envelope. - Short-circuit `n_control in {0, 1}` to avoid FW loop overhead on trivially-solvable cases. - Suppress non-convergence warnings inside the inline block — the caller uses the output as a heuristic score, not a statistical estimate. Tests: - Delete `TestSyntheticWeightsBackendParity` (3 tests, 2 xfailed) and the 3 direct-Rust-import tests in `test_rust_backend.py`. - Delete `TestComputeSyntheticWeightsEdgeCases` (4 tests) in test_utils.py. - Delete `test_compute_synthetic_weights` in test_estimators.py. - Add `test_extreme_Y_scale_synthetic_weight_column` in test_prep.py — regression guard asserting the `synthetic_weight` column produces a valid non-degenerate simplex vector at `Y ~ 1e9` (the exact input that the old Rust PGD mishandled by collapsing onto a single vertex). SDID invariance. Verified via bit-identical pre/post numerical baseline on a fixed-seed `SyntheticDiD.fit()` (n_units=30, n_pre=8, n_post=2, n_treated=3, n_bootstrap=20, seed=42): before: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 after: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 Full SDID test suite: 40/40 pass before and after (test outcome diff empty). All 305 tests in the touched files pass. Rust-side `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs` is now dead code (no Python code calls it). Tracked as a low-priority cleanup follow-up in TODO.md; removal requires `maturin develop` rebuild. User-visible impact: - `rank_control_units` public signature and return schema unchanged. - `quality_score` values and ranking: unchanged. - `synthetic_weight` column: values agree with old code at ~1e-7 on default-parameter typical data; differ only in edge cases where the old code was mathematically wrong (extreme Y scale, lambda_reg > 0). - Private imports break: `from diff_diff.utils import compute_synthetic_weights`, `from diff_diff._rust_backend import compute_synthetic_weights`. These were not part of the documented stable API. Net diff: +110 / -367 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF
pushed a commit
to HanomicsIMF/diff-diff
that referenced
this pull request
Apr 27, 2026
…nding igerber#22) Closes the Rust-side loop on finding igerber#22 from the silent-failures audit. The Python-side wrapper and its sole caller were removed in PR igerber#344; this PR deletes the now-orphaned PyO3 binding, its internal PGD implementation, and the obsolete constants that only serviced it. Deleted - rust/src/weights.rs: compute_synthetic_weights pyfunction + compute_synthetic_weights_internal - rust/src/weights.rs: MAX_ITER, DEFAULT_TOL, DEFAULT_STEP_SIZE (only used by the deleted PGD path) - rust/src/weights.rs: test_compute_weights_sum_to_one (cargo test for the deleted internal helper) - rust/src/lib.rs: m.add_function(wrap_pyfunction!(weights::compute_synthetic_weights, m)?) Retained (still used from Python) - weights::project_simplex — consumed by _rust_project_simplex in utils.py (project_simplex tests in test_rust_backend.py) - weights::compute_sdid_unit_weights, compute_time_weights, sc_weight_fw, compute_noise_level (SDID canonical path) - weights::sum_normalize_internal, sparsify_internal (helpers for SDID) Verification - maturin develop --release --features accelerate: clean rebuild, no warnings - cargo check --lib: clean (cargo test --release hits a known macOS PyO3 Python3.framework rpath issue unrelated to this change) - pytest tests/test_rust_backend.py tests/test_prep.py::TestRankControlUnits: 85 passed - Direct import test: project_simplex / compute_sdid_unit_weights / compute_time_weights / sc_weight_fw still importable; compute_synthetic_weights correctly raises ImportError TODO.md row 86 removed. docs/audits/silent-failures-findings.md row igerber#22 updated to FULLY RESOLVED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds full covariate support to the CallawaySantAnna staggered DiD estimator:
units, predicts counterfactual for treated
regression, reweights control units to match treated covariate distribution
the outcome model or propensity model is correctly specified
Implementation details:
accept and use covariate matrices
Also includes: