Address code review feedback for data prep utilities by igerber · Pull Request #11 · igerber/diff-diff

igerber · 2026-01-03T14:33:25Z

Performance and correctness improvements:

Replace iterrows() with pd.melt() in wide_to_long for better performance
Fix groupby().ffill() bug in balance_panel to correctly fill only non-key columns
Fix multi-period validation logic to properly check both treatment groups
Fix index labeling in summarize_did_data for non-0/1 time values
Guard np.isinf() against non-numeric types in create_event_time

Tests:

Add test for balance_panel forward/backward fill path

Performance and correctness improvements: - Replace iterrows() with pd.melt() in wide_to_long for better performance - Fix groupby().ffill() bug in balance_panel to correctly fill only non-key columns - Fix multi-period validation logic to properly check both treatment groups - Fix index labeling in summarize_did_data for non-0/1 time values - Guard np.isinf() against non-numeric types in create_event_time Tests: - Add test for balance_panel forward/backward fill path

…StageDiD Addresses axis-C findings #8, #9, and #10 from the silent-failures audit: three sites where a sparse factorization failure silently fell back to dense lstsq without any user-facing signal. - diff_diff/imputation.py:1516 (variance path: scipy.sparse.linalg.spsolve on (A_0' W A_0) z = A_1' w). Bare `except Exception` was swallowing the root cause before dense lstsq. Now emits a UserWarning identifying the exception type and explaining the fallback implication. - diff_diff/two_stage.py:1647 (GMM sandwich: sparse_factorized on X'_{10} W X_{10} for Stage 1 normal equations). `except RuntimeError` was silent; now emits a UserWarning. - diff_diff/two_stage_bootstrap.py:134 (bootstrap path: same pattern as above). `except RuntimeError` was silent; now emits a UserWarning. All three are single-call sites (per fit, or per aggregation level, or per bootstrap replicate at most a handful of times) so no aggregation wrapper pattern is needed — one warning per fallback event is appropriate. REGISTRY.md updated under ImputationDiD and TwoStageDiD. New tests (3): monkey-patch the sparse entry point to raise a RuntimeError, run .fit(), assert the UserWarning fires with the expected message prefix. Works against both the variance and bootstrap surfaces. Axis-C baseline: 3 major silent-fallback sites (imputation, two_stage, two_stage_bootstrap) -> 0 remaining in these files. PowerAnalysis simulation counter (finding #11) and ContinuousDiD B-spline (#12) still open as separate follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two silent-failure patterns at `power.py:2241` addressed together: 1. Bare `except Exception` absorbed every exception a user-supplied estimator or result extractor could raise, including programming errors (TypeError, AttributeError, IndexError, ...). Those masked bugs counted as "simulation failures" and were aggregated as a reliability signal instead of propagating. Narrowed the catch to `(ValueError, np.linalg.LinAlgError, KeyError, RuntimeError, ZeroDivisionError)` — the set that reasonable DGPs and fit paths can raise on adversarial samples. 2. The internal `n_failures` counter was per-effect-size and thrown away after the outer loop, leaving users with no programmatic way to check whether a run was clean. Surface the primary-effect failure count on the results object as `SimulationPowerResults.n_simulation_failures` and include it in `summary()` output. The proportional > 10% `UserWarning` and the raise-if-all-fail escape are preserved. Covered by audit axis C (silent fallback). Finding #11 from `docs/audits/silent-failures-findings.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Lifts the gate at chaisemartin_dhaultfoeuille.py:1230-1234 so per-path event-study disaggregation composes with heterogeneity="<col>" (Web Appendix Section 1.5, Lemma 7), mirroring R did_multiplegt_dyn(..., by_path, predict_het) per-by_level dispatch. Per-path heterogeneity is computed by re-running the Lemma 7 regression on each path-restricted switcher subsample. New `path_groups` (Optional[Set[int]]) parameter on _compute_heterogeneity_test restricts eligibility to switchers ON path p; the variance machinery (standard WLS vcov for non-survey, cell-period IF allocator for Binder TSL, group-level allocator for Rao-Wu replicate) is unchanged from the global heterogeneity path. Cohort dummies absorb baseline by construction, so multi-baseline switcher panels do not produce R-divergence (no parallel UserWarning like controls / trends_linear). Surfaces on results.path_heterogeneity_effects keyed {path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}} and on to_dataframe(level="by_path") via new always-present het_* columns, populated for positive horizons and NaN otherwise (mirrors cband_* / cumulated_* convention). Per-(path, horizon) inference is refreshed in the final R2 P1b block so all surfaces use the same df_survey after replicate-weight n_valid appends. R parity: introduces the FIRST predict_het R-parity baseline in the repo. Two new scenarios (multi_path_reversible_predict_het global anchor + multi_path_reversible_by_path_predict_het per-path) use dont_drop_larger_lower=TRUE to match drop_larger_lower=False and provide cohort variation under reversal paths. Per-path beta and SE match R within rtol=1e-6. Multiplier bootstrap (n_bootstrap > 0) under by_path + heterogeneity + survey_design inherits the existing per-path multiplier-bootstrap gate from PR #408. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level)

GitHub Code Scanning flagged two errors on `.github/workflows/ai_pr_review.yml`: - #11 actions/untrusted-checkout/high — "Checkout of untrusted code in trusted context" - #12 actions/untrusted-checkout-toctou/high — "Insufficient protection against execution of untrusted code on a privileged workflow (issue_comment)" CodeQL flags the structural pattern: the workflow runs `actions/checkout@v6` with the PR head's repo+sha in a context that has access to OPENAI_API_KEY and write permissions on PRs/issues. Today we don't execute the checked-out code (just read it via `git diff`), but the pattern is risky — a future edit adding "run tests" or "install deps" would turn it into an active RCE on fork PRs. The PR-author-vs-commenter mismatch (#12) is real: a maintainer commenting `/ai-review` on a fork PR passes the author_association check, but the code being checked out is the fork's, which the maintainer didn't author. Fix: skip fork PRs entirely. Two-layer gate (different mechanics required for the three event types): 1. Workflow `if:` for `pull_request` events: require `github.event.pull_request.head.repo.full_name == github.repository`. Fork PRs opened against this repo no longer start a workflow run. 2. For `issue_comment` and `pull_request_review_comment` events (event payloads don't include head-repo info — must API-fetch), the resolve-pr step now sets a new `is_fork` output. All 7 post-resolve steps are gated on `steps.pr.outputs.is_fork == 'false'`: - actions/checkout (closed PR path) - actions/checkout (open PR path) — the line CodeQL flags - Pre-fetch base SHA - Fetch previous AI review - Build review prompt - Run Codex - Post PR comment For comment-triggered events on fork PRs, the resolve-pr step prints a yellow `core.notice()` banner in the Actions UI explaining the skip; no checkout, no codex, no comment. Background: per a prior thread on the same workflow, fork PRs already fail today (secrets aren't passed to fork-PR `pull_request` events; comment triggers on fork PRs would crash because OPENAI_API_KEY is empty). So skipping cleanly just makes the failure honest — no real loss of behavior. Tests added (TestWorkflowForkSkip class, 3 contract tests): - test_workflow_pull_request_if_block_excludes_fork_prs - test_workflow_resolve_pr_step_sets_is_fork_output - test_workflow_post_resolve_steps_gated_on_is_fork (asserts ≥7 occurrences of `is_fork == 'false'` so a future workflow refactor adding a new PR-content-touching step must extend the gate) Out of scope: - Adding `pull_request_target` for fork PRs (would need much more hardening; not justified for our use case) - Posting a PR comment explaining the skip (Actions UI notice is enough) Verification: - `pytest tests/test_openai_review.py -q` → 214 passed (3 new). - `python3 -c "import yaml; yaml.safe_load(...)"` → no errors. - 7 step-level `is_fork == 'false'` gates confirmed. - CodeQL on this PR should NOT re-flag #11/#12 — early validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Skip fork PRs in AI review workflow (closes CodeQL #11, #12)

CodeQL alert igerber#14 (untrusted-checkout-toctou/critical) is the third re-fire of the same rule family on this workflow (igerber#11/igerber#12 → igerber#13 → igerber#14). PR igerber#427's fork-skip gate was load-bearing for igerber#11/igerber#12 but the rule re-classified as igerber#14 at a shifted line range with escalated severity. A structural fix (checkout BASE_SHA only; `git show` for PR-head reads) would close the rule cleanly but degrade reviewer quality: pr_review.md explicitly instructs Codex to `grep -n "pattern" diff_diff/*.py` for pattern-consistency checks, and Codex's `sandbox: read-only` lets it inspect workspace files. Under restructure, those operations would search BASE content, missing patterns added in the PR. Workarounds that materialize PR-head content via non-`actions/checkout` mechanisms (git worktree, overlay) satisfy CodeQL's pattern matcher without changing runtime risk — security theater dressed up. The honest path: dismiss with documented rationale + guard test. The actual threat model accepted by the dismissal: - Codex runs with sandbox: read-only — can read PR-head files, cannot execute or write them - head_sha is API-pinned in the resolve-pr step before checkout; TOCTOU window is sub-second and bounded by the is_fork gate - Same-repo PR contributors who can push to PR head branches already have write access to push to main; the checkout doesn't expand attack surface - Fork PRs are skipped entirely (PR igerber#427 fork-skip gate) The dismissal becomes invalid only if a future workflow edit adds a step that EXECUTES PR-head content. Guard test TestWorkflowDoesNotExecutePRHeadCode in tests/test_openai_review.py fails CI on any of: pip install, pytest, npm install, cargo run/test, make, ./configure, maturin develop/build, poetry install/run, pdm, uv sync/run, tox, setup.py, etc. The maturin entries are load-bearing for this repo (Rust build per CLAUDE.md is the most likely future regression vector). The dismissal API call (gh api PATCH .../code-scanning/alerts/14) will be run post-merge, separately from this PR.

igerber merged commit 493a432 into main Jan 3, 2026

igerber deleted the claude/add-data-prep-utilities-4Unoc branch January 3, 2026 14:33

igerber mentioned this pull request Apr 18, 2026

Surface PowerAnalysis simulation-failure count and narrow except clause #326

Merged

igerber added a commit that referenced this pull request May 11, 2026

Merge pull request #412 from igerber/dcdh-by-path-heterogeneity

2f77f55

dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level)

igerber mentioned this pull request May 14, 2026

Skip fork PRs in AI review workflow (closes CodeQL #11, #12) #427

Merged

igerber added a commit that referenced this pull request May 14, 2026

Merge pull request #427 from igerber/fix-codeql-untrusted-checkout

cfa63de

Skip fork PRs in AI review workflow (closes CodeQL #11, #12)

igerber mentioned this pull request May 14, 2026

Dismiss CodeQL #14 (untrusted-checkout-toctou) with guard test #436

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address code review feedback for data prep utilities#11

Address code review feedback for data prep utilities#11
igerber merged 1 commit into
mainfrom
claude/add-data-prep-utilities-4Unoc

igerber commented Jan 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

igerber commented Jan 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants