Address code review feedback for data prep utilities#11
Merged
Conversation
Performance and correctness improvements: - Replace iterrows() with pd.melt() in wide_to_long for better performance - Fix groupby().ffill() bug in balance_panel to correctly fill only non-key columns - Fix multi-period validation logic to properly check both treatment groups - Fix index labeling in summarize_did_data for non-0/1 time values - Guard np.isinf() against non-numeric types in create_event_time Tests: - Add test for balance_panel forward/backward fill path
igerber
added a commit
that referenced
this pull request
Apr 18, 2026
…StageDiD Addresses axis-C findings #8, #9, and #10 from the silent-failures audit: three sites where a sparse factorization failure silently fell back to dense lstsq without any user-facing signal. - diff_diff/imputation.py:1516 (variance path: scipy.sparse.linalg.spsolve on (A_0' W A_0) z = A_1' w). Bare `except Exception` was swallowing the root cause before dense lstsq. Now emits a UserWarning identifying the exception type and explaining the fallback implication. - diff_diff/two_stage.py:1647 (GMM sandwich: sparse_factorized on X'_{10} W X_{10} for Stage 1 normal equations). `except RuntimeError` was silent; now emits a UserWarning. - diff_diff/two_stage_bootstrap.py:134 (bootstrap path: same pattern as above). `except RuntimeError` was silent; now emits a UserWarning. All three are single-call sites (per fit, or per aggregation level, or per bootstrap replicate at most a handful of times) so no aggregation wrapper pattern is needed — one warning per fallback event is appropriate. REGISTRY.md updated under ImputationDiD and TwoStageDiD. New tests (3): monkey-patch the sparse entry point to raise a RuntimeError, run .fit(), assert the UserWarning fires with the expected message prefix. Works against both the variance and bootstrap surfaces. Axis-C baseline: 3 major silent-fallback sites (imputation, two_stage, two_stage_bootstrap) -> 0 remaining in these files. PowerAnalysis simulation counter (finding #11) and ContinuousDiD B-spline (#12) still open as separate follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 18, 2026
Two silent-failure patterns at `power.py:2241` addressed together: 1. Bare `except Exception` absorbed every exception a user-supplied estimator or result extractor could raise, including programming errors (TypeError, AttributeError, IndexError, ...). Those masked bugs counted as "simulation failures" and were aggregated as a reliability signal instead of propagating. Narrowed the catch to `(ValueError, np.linalg.LinAlgError, KeyError, RuntimeError, ZeroDivisionError)` — the set that reasonable DGPs and fit paths can raise on adversarial samples. 2. The internal `n_failures` counter was per-effect-size and thrown away after the outer loop, leaving users with no programmatic way to check whether a run was clean. Surface the primary-effect failure count on the results object as `SimulationPowerResults.n_simulation_failures` and include it in `summary()` output. The proportional > 10% `UserWarning` and the raise-if-all-fail escape are preserved. Covered by audit axis C (silent fallback). Finding #11 from `docs/audits/silent-failures-findings.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
May 11, 2026
Lifts the gate at chaisemartin_dhaultfoeuille.py:1230-1234 so per-path
event-study disaggregation composes with heterogeneity="<col>" (Web
Appendix Section 1.5, Lemma 7), mirroring R did_multiplegt_dyn(...,
by_path, predict_het) per-by_level dispatch.
Per-path heterogeneity is computed by re-running the Lemma 7 regression
on each path-restricted switcher subsample. New `path_groups`
(Optional[Set[int]]) parameter on _compute_heterogeneity_test restricts
eligibility to switchers ON path p; the variance machinery (standard
WLS vcov for non-survey, cell-period IF allocator for Binder TSL,
group-level allocator for Rao-Wu replicate) is unchanged from the
global heterogeneity path. Cohort dummies absorb baseline by
construction, so multi-baseline switcher panels do not produce
R-divergence (no parallel UserWarning like controls / trends_linear).
Surfaces on results.path_heterogeneity_effects keyed
{path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}} and on
to_dataframe(level="by_path") via new always-present het_* columns,
populated for positive horizons and NaN otherwise (mirrors cband_* /
cumulated_* convention). Per-(path, horizon) inference is refreshed
in the final R2 P1b block so all surfaces use the same df_survey
after replicate-weight n_valid appends.
R parity: introduces the FIRST predict_het R-parity baseline in the
repo. Two new scenarios (multi_path_reversible_predict_het global
anchor + multi_path_reversible_by_path_predict_het per-path) use
dont_drop_larger_lower=TRUE to match drop_larger_lower=False and
provide cohort variation under reversal paths. Per-path beta and SE
match R within rtol=1e-6.
Multiplier bootstrap (n_bootstrap > 0) under by_path + heterogeneity
+ survey_design inherits the existing per-path multiplier-bootstrap
gate from PR #408.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
May 11, 2026
dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level)
igerber
added a commit
that referenced
this pull request
May 14, 2026
GitHub Code Scanning flagged two errors on `.github/workflows/ai_pr_review.yml`: - #11 actions/untrusted-checkout/high — "Checkout of untrusted code in trusted context" - #12 actions/untrusted-checkout-toctou/high — "Insufficient protection against execution of untrusted code on a privileged workflow (issue_comment)" CodeQL flags the structural pattern: the workflow runs `actions/checkout@v6` with the PR head's repo+sha in a context that has access to OPENAI_API_KEY and write permissions on PRs/issues. Today we don't execute the checked-out code (just read it via `git diff`), but the pattern is risky — a future edit adding "run tests" or "install deps" would turn it into an active RCE on fork PRs. The PR-author-vs-commenter mismatch (#12) is real: a maintainer commenting `/ai-review` on a fork PR passes the author_association check, but the code being checked out is the fork's, which the maintainer didn't author. Fix: skip fork PRs entirely. Two-layer gate (different mechanics required for the three event types): 1. Workflow `if:` for `pull_request` events: require `github.event.pull_request.head.repo.full_name == github.repository`. Fork PRs opened against this repo no longer start a workflow run. 2. For `issue_comment` and `pull_request_review_comment` events (event payloads don't include head-repo info — must API-fetch), the resolve-pr step now sets a new `is_fork` output. All 7 post-resolve steps are gated on `steps.pr.outputs.is_fork == 'false'`: - actions/checkout (closed PR path) - actions/checkout (open PR path) — the line CodeQL flags - Pre-fetch base SHA - Fetch previous AI review - Build review prompt - Run Codex - Post PR comment For comment-triggered events on fork PRs, the resolve-pr step prints a yellow `core.notice()` banner in the Actions UI explaining the skip; no checkout, no codex, no comment. Background: per a prior thread on the same workflow, fork PRs already fail today (secrets aren't passed to fork-PR `pull_request` events; comment triggers on fork PRs would crash because OPENAI_API_KEY is empty). So skipping cleanly just makes the failure honest — no real loss of behavior. Tests added (TestWorkflowForkSkip class, 3 contract tests): - test_workflow_pull_request_if_block_excludes_fork_prs - test_workflow_resolve_pr_step_sets_is_fork_output - test_workflow_post_resolve_steps_gated_on_is_fork (asserts ≥7 occurrences of `is_fork == 'false'` so a future workflow refactor adding a new PR-content-touching step must extend the gate) Out of scope: - Adding `pull_request_target` for fork PRs (would need much more hardening; not justified for our use case) - Posting a PR comment explaining the skip (Actions UI notice is enough) Verification: - `pytest tests/test_openai_review.py -q` → 214 passed (3 new). - `python3 -c "import yaml; yaml.safe_load(...)"` → no errors. - 7 step-level `is_fork == 'false'` gates confirmed. - CodeQL on this PR should NOT re-flag #11/#12 — early validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
TDL77
pushed a commit
to TDL77/diff-diff
that referenced
this pull request
May 18, 2026
CodeQL alert igerber#14 (untrusted-checkout-toctou/critical) is the third re-fire of the same rule family on this workflow (igerber#11/igerber#12 → igerber#13 → igerber#14). PR igerber#427's fork-skip gate was load-bearing for igerber#11/igerber#12 but the rule re-classified as igerber#14 at a shifted line range with escalated severity. A structural fix (checkout BASE_SHA only; `git show` for PR-head reads) would close the rule cleanly but degrade reviewer quality: pr_review.md explicitly instructs Codex to `grep -n "pattern" diff_diff/*.py` for pattern-consistency checks, and Codex's `sandbox: read-only` lets it inspect workspace files. Under restructure, those operations would search BASE content, missing patterns added in the PR. Workarounds that materialize PR-head content via non-`actions/checkout` mechanisms (git worktree, overlay) satisfy CodeQL's pattern matcher without changing runtime risk — security theater dressed up. The honest path: dismiss with documented rationale + guard test. The actual threat model accepted by the dismissal: - Codex runs with sandbox: read-only — can read PR-head files, cannot execute or write them - head_sha is API-pinned in the resolve-pr step before checkout; TOCTOU window is sub-second and bounded by the is_fork gate - Same-repo PR contributors who can push to PR head branches already have write access to push to main; the checkout doesn't expand attack surface - Fork PRs are skipped entirely (PR igerber#427 fork-skip gate) The dismissal becomes invalid only if a future workflow edit adds a step that EXECUTES PR-head content. Guard test TestWorkflowDoesNotExecutePRHeadCode in tests/test_openai_review.py fails CI on any of: pip install, pytest, npm install, cargo run/test, make, ./configure, maturin develop/build, poetry install/run, pdm, uv sync/run, tox, setup.py, etc. The maturin entries are load-bearing for this repo (Rust build per CLAUDE.md is the most likely future regression vector). The dismissal API call (gh api PATCH .../code-scanning/alerts/14) will be run post-merge, separately from this PR.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Performance and correctness improvements:
Tests: