Skip to content

Address code review feedback for data prep utilities#11

Merged
igerber merged 1 commit into
mainfrom
claude/add-data-prep-utilities-4Unoc
Jan 3, 2026
Merged

Address code review feedback for data prep utilities#11
igerber merged 1 commit into
mainfrom
claude/add-data-prep-utilities-4Unoc

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jan 3, 2026

Performance and correctness improvements:

  • Replace iterrows() with pd.melt() in wide_to_long for better performance
  • Fix groupby().ffill() bug in balance_panel to correctly fill only non-key columns
  • Fix multi-period validation logic to properly check both treatment groups
  • Fix index labeling in summarize_did_data for non-0/1 time values
  • Guard np.isinf() against non-numeric types in create_event_time

Tests:

  • Add test for balance_panel forward/backward fill path

Performance and correctness improvements:
- Replace iterrows() with pd.melt() in wide_to_long for better performance
- Fix groupby().ffill() bug in balance_panel to correctly fill only non-key columns
- Fix multi-period validation logic to properly check both treatment groups
- Fix index labeling in summarize_did_data for non-0/1 time values
- Guard np.isinf() against non-numeric types in create_event_time

Tests:
- Add test for balance_panel forward/backward fill path
@igerber igerber merged commit 493a432 into main Jan 3, 2026
@igerber igerber deleted the claude/add-data-prep-utilities-4Unoc branch January 3, 2026 14:33
igerber added a commit that referenced this pull request Apr 18, 2026
…StageDiD

Addresses axis-C findings #8, #9, and #10 from the silent-failures audit:
three sites where a sparse factorization failure silently fell back to
dense lstsq without any user-facing signal.

- diff_diff/imputation.py:1516 (variance path: scipy.sparse.linalg.spsolve
  on (A_0' W A_0) z = A_1' w). Bare `except Exception` was swallowing
  the root cause before dense lstsq. Now emits a UserWarning identifying
  the exception type and explaining the fallback implication.
- diff_diff/two_stage.py:1647 (GMM sandwich: sparse_factorized on
  X'_{10} W X_{10} for Stage 1 normal equations). `except RuntimeError`
  was silent; now emits a UserWarning.
- diff_diff/two_stage_bootstrap.py:134 (bootstrap path: same pattern as
  above). `except RuntimeError` was silent; now emits a UserWarning.

All three are single-call sites (per fit, or per aggregation level, or
per bootstrap replicate at most a handful of times) so no aggregation
wrapper pattern is needed — one warning per fallback event is
appropriate.

REGISTRY.md updated under ImputationDiD and TwoStageDiD.

New tests (3): monkey-patch the sparse entry point to raise a
RuntimeError, run .fit(), assert the UserWarning fires with the
expected message prefix. Works against both the variance and bootstrap
surfaces.

Axis-C baseline: 3 major silent-fallback sites (imputation, two_stage,
two_stage_bootstrap) -> 0 remaining in these files. PowerAnalysis
simulation counter (finding #11) and ContinuousDiD B-spline (#12)
still open as separate follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 18, 2026
Two silent-failure patterns at `power.py:2241` addressed together:

1. Bare `except Exception` absorbed every exception a user-supplied
   estimator or result extractor could raise, including programming
   errors (TypeError, AttributeError, IndexError, ...). Those masked
   bugs counted as "simulation failures" and were aggregated as a
   reliability signal instead of propagating. Narrowed the catch to
   `(ValueError, np.linalg.LinAlgError, KeyError, RuntimeError,
   ZeroDivisionError)` — the set that reasonable DGPs and fit paths
   can raise on adversarial samples.

2. The internal `n_failures` counter was per-effect-size and thrown
   away after the outer loop, leaving users with no programmatic way
   to check whether a run was clean. Surface the primary-effect
   failure count on the results object as
   `SimulationPowerResults.n_simulation_failures` and include it in
   `summary()` output. The proportional > 10% `UserWarning` and the
   raise-if-all-fail escape are preserved.

Covered by audit axis C (silent fallback). Finding #11 from
`docs/audits/silent-failures-findings.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 11, 2026
Lifts the gate at chaisemartin_dhaultfoeuille.py:1230-1234 so per-path
event-study disaggregation composes with heterogeneity="<col>" (Web
Appendix Section 1.5, Lemma 7), mirroring R did_multiplegt_dyn(...,
by_path, predict_het) per-by_level dispatch.

Per-path heterogeneity is computed by re-running the Lemma 7 regression
on each path-restricted switcher subsample. New `path_groups`
(Optional[Set[int]]) parameter on _compute_heterogeneity_test restricts
eligibility to switchers ON path p; the variance machinery (standard
WLS vcov for non-survey, cell-period IF allocator for Binder TSL,
group-level allocator for Rao-Wu replicate) is unchanged from the
global heterogeneity path. Cohort dummies absorb baseline by
construction, so multi-baseline switcher panels do not produce
R-divergence (no parallel UserWarning like controls / trends_linear).

Surfaces on results.path_heterogeneity_effects keyed
{path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}} and on
to_dataframe(level="by_path") via new always-present het_* columns,
populated for positive horizons and NaN otherwise (mirrors cband_* /
cumulated_* convention). Per-(path, horizon) inference is refreshed
in the final R2 P1b block so all surfaces use the same df_survey
after replicate-weight n_valid appends.

R parity: introduces the FIRST predict_het R-parity baseline in the
repo. Two new scenarios (multi_path_reversible_predict_het global
anchor + multi_path_reversible_by_path_predict_het per-path) use
dont_drop_larger_lower=TRUE to match drop_larger_lower=False and
provide cohort variation under reversal paths. Per-path beta and SE
match R within rtol=1e-6.

Multiplier bootstrap (n_bootstrap > 0) under by_path + heterogeneity
+ survey_design inherits the existing per-path multiplier-bootstrap
gate from PR #408.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 11, 2026
dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level)
igerber added a commit that referenced this pull request May 14, 2026
GitHub Code Scanning flagged two errors on `.github/workflows/ai_pr_review.yml`:

  - #11 actions/untrusted-checkout/high — "Checkout of untrusted code in
    trusted context"
  - #12 actions/untrusted-checkout-toctou/high — "Insufficient protection
    against execution of untrusted code on a privileged workflow
    (issue_comment)"

CodeQL flags the structural pattern: the workflow runs `actions/checkout@v6`
with the PR head's repo+sha in a context that has access to OPENAI_API_KEY
and write permissions on PRs/issues. Today we don't execute the checked-out
code (just read it via `git diff`), but the pattern is risky — a future
edit adding "run tests" or "install deps" would turn it into an active RCE
on fork PRs. The PR-author-vs-commenter mismatch (#12) is real: a maintainer
commenting `/ai-review` on a fork PR passes the author_association check,
but the code being checked out is the fork's, which the maintainer didn't
author.

Fix: skip fork PRs entirely. Two-layer gate (different mechanics required
for the three event types):

  1. Workflow `if:` for `pull_request` events: require
     `github.event.pull_request.head.repo.full_name == github.repository`.
     Fork PRs opened against this repo no longer start a workflow run.

  2. For `issue_comment` and `pull_request_review_comment` events (event
     payloads don't include head-repo info — must API-fetch), the
     resolve-pr step now sets a new `is_fork` output. All 7 post-resolve
     steps are gated on `steps.pr.outputs.is_fork == 'false'`:
       - actions/checkout (closed PR path)
       - actions/checkout (open PR path) — the line CodeQL flags
       - Pre-fetch base SHA
       - Fetch previous AI review
       - Build review prompt
       - Run Codex
       - Post PR comment

     For comment-triggered events on fork PRs, the resolve-pr step prints
     a yellow `core.notice()` banner in the Actions UI explaining the
     skip; no checkout, no codex, no comment.

Background: per a prior thread on the same workflow, fork PRs already fail
today (secrets aren't passed to fork-PR `pull_request` events; comment
triggers on fork PRs would crash because OPENAI_API_KEY is empty). So
skipping cleanly just makes the failure honest — no real loss of behavior.

Tests added (TestWorkflowForkSkip class, 3 contract tests):
  - test_workflow_pull_request_if_block_excludes_fork_prs
  - test_workflow_resolve_pr_step_sets_is_fork_output
  - test_workflow_post_resolve_steps_gated_on_is_fork
    (asserts ≥7 occurrences of `is_fork == 'false'` so a future workflow
    refactor adding a new PR-content-touching step must extend the gate)

Out of scope:
  - Adding `pull_request_target` for fork PRs (would need much more
    hardening; not justified for our use case)
  - Posting a PR comment explaining the skip (Actions UI notice is enough)

Verification:
  - `pytest tests/test_openai_review.py -q` → 214 passed (3 new).
  - `python3 -c "import yaml; yaml.safe_load(...)"` → no errors.
  - 7 step-level `is_fork == 'false'` gates confirmed.
  - CodeQL on this PR should NOT re-flag #11/#12 — early validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 14, 2026
Skip fork PRs in AI review workflow (closes CodeQL #11, #12)
TDL77 pushed a commit to TDL77/diff-diff that referenced this pull request May 18, 2026
CodeQL alert igerber#14 (untrusted-checkout-toctou/critical) is the third
re-fire of the same rule family on this workflow (igerber#11/igerber#12igerber#13igerber#14).
PR igerber#427's fork-skip gate was load-bearing for igerber#11/igerber#12 but the rule
re-classified as igerber#14 at a shifted line range with escalated severity.

A structural fix (checkout BASE_SHA only; `git show` for PR-head reads)
would close the rule cleanly but degrade reviewer quality: pr_review.md
explicitly instructs Codex to `grep -n "pattern" diff_diff/*.py` for
pattern-consistency checks, and Codex's `sandbox: read-only` lets it
inspect workspace files. Under restructure, those operations would
search BASE content, missing patterns added in the PR.

Workarounds that materialize PR-head content via non-`actions/checkout`
mechanisms (git worktree, overlay) satisfy CodeQL's pattern matcher
without changing runtime risk — security theater dressed up.

The honest path: dismiss with documented rationale + guard test.

The actual threat model accepted by the dismissal:
- Codex runs with sandbox: read-only — can read PR-head files, cannot
  execute or write them
- head_sha is API-pinned in the resolve-pr step before checkout; TOCTOU
  window is sub-second and bounded by the is_fork gate
- Same-repo PR contributors who can push to PR head branches already
  have write access to push to main; the checkout doesn't expand
  attack surface
- Fork PRs are skipped entirely (PR igerber#427 fork-skip gate)

The dismissal becomes invalid only if a future workflow edit adds a
step that EXECUTES PR-head content. Guard test
TestWorkflowDoesNotExecutePRHeadCode in tests/test_openai_review.py
fails CI on any of: pip install, pytest, npm install, cargo run/test,
make, ./configure, maturin develop/build, poetry install/run, pdm,
uv sync/run, tox, setup.py, etc. The maturin entries are load-bearing
for this repo (Rust build per CLAUDE.md is the most likely future
regression vector).

The dismissal API call (gh api PATCH .../code-scanning/alerts/14) will
be run post-merge, separately from this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants