Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions .github/workflows/ai_pr_review.yml
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,66 @@ jobs:
)

steps:
# ─────────────────────────────────────────────────────────────────
# CodeQL alert #14 (untrusted-checkout-toctou/critical) was
# dismissed on 2026-05-14 as "won't fix" with the rationale
# documented below. The dismissal is valid ONLY while ALL of
# the following hold:
# 1. The Codex action runs with sandbox: read-only (see the
# Run Codex step below). No step EXECUTES PR-head FILE
# BYTES — no `pip install -e .`, `pytest`, `npm install`,
# `cargo run`, `make`, `./configure`, `maturin develop`,
# or `python3 <pr-file>.py` against checked-out PR-head
# files. Inline `python3 -c '...'` invocations operating
# on sanitized environment variables (PR_TITLE, PR_BODY,
# PREV_REVIEW) are SAFE — the script body comes from base,
# not from PR head.
# 2. The fork-skip gate (`is_fork == 'false'`) gates every
# post-resolve step, including both actions/checkout
# invocations.
# 3. The author_association check on issue_comment /
# review-comment events restricts triggers to
# OWNER/MEMBER/COLLABORATOR.
# 4. head_sha is API-pinned in this resolve-pr step before
# checkout.
#
# If you are adding a step that VIOLATES any of these
# invariants, this dismissal is invalid. Either remove the
# offending step OR restructure the workflow to checkout
# BASE_SHA only and use `git show` for PR-head reads (degrades
# reviewer quality — see plan archive).
#
# Guard test: tests/test_openai_review.py
# ::TestWorkflowDoesNotExecutePRHeadCode
# Catches the common ACCIDENTAL-regression forms: pip / npm /
# yarn / cargo / make / maturin / poetry / pdm / uv / tox /
# setup.py install/run, python file execution against PR-head
# paths, bash -c / sh -c (including compound flags -lc/-ec/-exc)
# with python/cp inside, env-var prefixes (`VAR=1 python3 ...`),
# wrapper commands (`env`, `nohup`, `exec`, `time`, `command`),
# shell negation (`!`), backslash line continuations, single-line
# `python3 -c` payloads (literal-allowlisted, currently empty),
# subshell `(...)`, brace group `{ ...; }`, and write-overwrites
# of allowlisted /tmp paths between staging and execution
# (`>`, `>>`, `cp`, `mv`, `tee`, `ln`).
#
# NOT exhaustive against an adversarial maintainer. Known
# unmodeled paths (any of these CAN execute PR-head bytes
# without tripping CI; a maintainer reviewing this workflow
# MUST refuse changes that introduce them):
# - bash <script> / sh <script> / ./<script> / source <script>
# / . <script> — direct shell-script execution
# - multi-line `python3 -c` bodies (line-by-line shlex can't
# reassemble across newlines; the workflow's existing 5
# sanitizer bodies are therefore exempt by invisibility)
# - variable expansion: `SCRIPT="$X"; python3 "$SCRIPT"`
# - `eval`, `find -exec`, `xargs -I {}`
# The dismissal's primary defense is THIS comment block plus
# the `dismissed_comment` field on alert #14. The guard test
# is belt-and-suspenders for accidental regressions, not a
# complete adversarial parser.
# See TODO.md for the long-term tracking of this gap.
# ─────────────────────────────────────────────────────────────────
- name: Resolve PR number + metadata
id: pr
uses: actions/github-script@v9
Expand Down
1 change: 1 addition & 0 deletions TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,7 @@ Deferred items from PR reviews that were not addressed before merge.
| HonestDiD `test_m0_short_circuit` uses wall-clock `elapsed < 0.5s` as a proxy for "short-circuit path taken" instead of calling the full optimizer. Replace with a direct correctness signal (mock/spy the optimizer or check a state flag) so the test doesn't depend on CI timing. Not flaky today at 500ms, but load-bearing correctness on a timing proxy is brittle. | `tests/test_methodology_honest_did.py:246` | — | Low |
| SyntheticDiD: rename internal `placebo_effects` variable to `variance_effects` (or `resampled_effects`). Misleading name across the placebo/bootstrap/jackknife dispatch paths — holds three different contents depending on variance method. Low-risk refactor; user-facing field rename should preserve `placebo_effects` as a deprecated alias for one release. | `synthetic_did.py`, `results.py` | follow-up | Medium |
| AI review CI: pin workflow contract via test (uses `openai/codex-action@v1`, passes `prompt-file`, reads `steps.run_codex.outputs.final-message`, preserves diff-exclude paths and comment markers). Currently only the wrapper-tag and closing-tag-escape strings are asserted. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #416 | Low |
| `TestWorkflowDoesNotExecutePRHeadCode` (CodeQL #14 dismissal guard) does not model: `bash <script>` / `sh <script>` / `./<script>` / `source <script>` / `. <script>` direct shell-script execution; multi-line `python3 -c` bodies (line-by-line shlex can't reassemble across newlines — the workflow's 5 sanitizer bodies are exempt by invisibility); shell-variable-expansion indirection (`SCRIPT="$X"; python3 "$SCRIPT"`); `eval`; `find -exec`; `xargs -I {}`. Each represents a path by which PR-head bytes COULD execute without the test failing. The guard catches accidental regressions of common forms (16 tests covering pip/npm/cargo/maturin/etc. installs, python file exec, bash -c indirection with compound flags, env-var prefixes, line continuations, subshells/brace groups, single-line python -c, write-overwrites of allowlisted /tmp paths). Closing the residuals would require multi-line shell parsing with command-substitution awareness + script-execution allowlists — significant work for diminishing return given the dismissal's primary defense is the documented threat model on the alert and in `.github/workflows/ai_pr_review.yml` comment block. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #436 | Low |
| Render `docs/methodology/REPORTING.md` and `docs/methodology/REGISTRY.md` as in-site Sphinx pages so cross-references can use `:doc:` instead of off-site GitHub `blob/main` URLs. Current state (#410 fix-audit-r2) restores navigable links via `blob/main`, but stable-docs readers can land on a different revision than the package version they are reading. Two viable paths: (a) add `myst-parser` to `docs/conf.py` extensions + docs extras and link with `:doc:`, or (b) convert both files to `.rst`. | `docs/conf.py`, `docs/api/business_report.rst`, `docs/api/diagnostic_report.rst`, `docs/tutorials/18_geo_experiments.ipynb`, `docs/tutorials/19_dcdh_marketing_pulse.ipynb` | follow-up | Low |

---
Expand Down
Loading
Loading