@description Review an existing PR with parallel agents, fix findings, and push. @arguments $PR_NUMBER: GitHub PR number to review and fix
Read PR #$PR_NUMBER thoroughly using gh pr view. Understand the
full context: description, linked issues, commit history, and the
diff against the base branch.
Detect the upstream repository: if a git remote named upstream
exists, use it as the canonical repo. Otherwise, fall back to
origin. Resolve the canonical repo's owner/name (e.g. from
git remote get-url upstream) and store it — use
--repo <owner/name> on every gh command to ensure they target
the correct repository. Run git fetch <upstream-remote> to
ensure you are working with up-to-date code.
Check out the PR branch locally.
Execute every step below sequentially. Do not stop or ask for confirmation at any step.
Run two review passes in parallel, then merge findings.
Launch these Task tool agents in parallel (single message,
multiple tool calls), each with subagent_type from the
pr-review-toolkit plugin. Tell each agent which files changed
(from git diff --name-only <base>...HEAD):
| agent | focus |
|---|---|
pr-review-toolkit:code-reviewer |
Code quality, style, project guidelines |
pr-review-toolkit:silent-failure-hunter |
Silent failures, swallowed errors, bad fallbacks |
pr-review-toolkit:pr-test-analyzer |
Test coverage gaps and missing edge cases |
Launch these Task tool agents in parallel with Pass A — all
5 agents in a single message, multiple tool calls. Each uses
subagent_type: general-purpose.
Codex reviewer — tell the agent to run:
codex review --base <upstream-remote>/<base-branch> \
-c model='"gpt-5.3-codex"' \
-c model_reasoning_effort='"xhigh"'--basedoes not accept custom prompts (codex readsAGENTS.mdat the repo root if one exists)- If
gpt-5.3-codexfails with an auth error, retry withgpt-5.2-codex - Set
timeout: 600000on the Bash call - Tell the agent to summarize findings only — skip
[thinking]/[exec]blocks and sandbox warnings - If
codexis not installed, report and skip
Gemini reviewer — tell the agent to run:
git diff <upstream-remote>/<base-branch>...HEAD > /tmp/pr-review-diff.txt
# Build prompt file (avoids heredoc shell expansion issues)
{
echo "Review this diff for code quality, bugs, and improvements."
if [ -f CLAUDE.md ] || [ -f .claude/CLAUDE.md ]; then
echo ""
echo "Project conventions:"
echo "---"
cat CLAUDE.md .claude/CLAUDE.md 2>/dev/null
echo "---"
fi
echo ""
echo "Diff:"
cat /tmp/pr-review-diff.txt
} > /tmp/pr-review-prompt.txt
# Pipe prompt via stdin to avoid shell metacharacter issues
cat /tmp/pr-review-prompt.txt | gemini -p - \
-m gemini-3-pro-preview \
--yolo- Uses stdin (
-p -) instead of heredoc to avoid shell expansion issues with$, backticks, etc. in diffs - Set
timeout: 600000on the Bash call - If
geminiis not installed, report and skip
Collect results from all 5 sources (3 toolkit agents + Codex + Gemini). Deduplicate overlapping findings — if multiple sources flag the same issue, keep the most specific description and note the consensus. Rank every finding by severity:
- P1 — blocks merge (correctness bugs, security issues)
- P2 — important (missing error handling, test gaps, logic flaws)
- P3 — nice to have (style, naming, minor simplifications)
- P4 — informational (observations, suggestions for future work)
Address all P1–P3 findings. For each finding, either:
- Fix it — apply the change, or
- Dismiss it — explain why it's a false positive or not worth the churn (e.g. a stylistic disagreement or an impossible edge case). Document the reasoning inline.
When a fix requires external context — unfamiliar library behavior,
unclear API semantics, or an error you don't recognize — use Exa
(mcp__exa__web_search_exa) to search for solutions rather than
guessing.
P4 findings are informational — note them but do not fix unless trivial.
After addressing all findings, review your own fixes: read the diff of changes made in this step and verify each fix is correct, doesn't introduce new issues, and doesn't regress other parts of the PR. If you spot a problem, fix it before proceeding.
Before running anything, read the project's CI configuration to learn what the project actually runs. This takes priority over the fallback tables below.
- Read CI workflows. Scan
.github/workflows/for the main CI workflow (typicallyci.yml,test.yml, orbuild.yml). Extract:- Test commands with feature flags (e.g.
cargo test --features foo,bar) - Lint/format commands with non-default flags
- Any step that runs a command then checks
git diff --exit-code— these are codegen sync checks (schema generation, snapshot updates, help text, etc.). Record the command. - Docs/site build commands (e.g.
make site,mkdocs build)
- Test commands with feature flags (e.g.
- Read the Makefile (if present). Cross-reference targets used in CI — these are the ones that matter.
- Read CLAUDE.md (if present at repo root or
.claude/). It may define project-specific quality gates.
Store the discovered commands. They override the fallback table for any overlapping step.
Detect the project language from manifest files (Cargo.toml →
Rust, pyproject.toml/setup.py → Python, package.json →
Node/TypeScript, go.mod → Go). A project may use multiple
languages; run checks for each.
Run checks in this order. For each step, use the CI-discovered command if one was found; otherwise fall back to the default.
- Build — compile or bundle
- Test — run the full test suite with the same feature flags CI uses. Iterate on failures until green.
- Lint and format — fix any issues
- Extended checks — per-language extras (see fallback table)
- Codegen sync — for every codegen check discovered in 3a,
run the command and verify
git diff --exit-code. If the diff is non-empty, the generated files are stale — regenerate and stage them. - Docs build — if the PR changes documentation files and a docs build command exists, run it to verify the docs compile.
Rust (detected by Cargo.toml):
| step | command |
|---|---|
| build | cargo build |
| test | cargo test |
| lint | cargo clippy -- --deny warnings |
| format | cargo fmt --check |
| supply chain | cargo deny check (if deny.toml exists) |
| careful | cargo careful test (if cargo-careful installed) |
Python (detected by pyproject.toml or setup.py):
| step | command |
|---|---|
| test | pytest -q |
| lint | ruff check |
| format | ruff format --check |
| types | ty check (or mypy if configured) |
| supply chain | pip-audit |
Node/TypeScript (detected by package.json):
| step | command |
|---|---|
| build | per project (npm run build, tsc, etc.) |
| test | vitest (or project test script) |
| lint | oxlint (or project lint script) |
| format | oxfmt --check (or project format script) |
| types | tsc --noEmit |
| supply chain | pnpm audit --audit-level=moderate |
Go (detected by go.mod):
| step | command |
|---|---|
| build | go build ./... |
| test | go test ./... |
| lint | golangci-lint run |
| format | gofmt -l . |
| vet | go vet ./... |
If a tool is not installed, skip it with a note rather than failing the pipeline.
- Commit the fixes as a separate commit (do not squash into the original — preserve review history)
- Write a detailed commit message that covers:
- Subject:
fix: resolve code review findings for PR #$PR_NUMBER - Body: list findings by severity, what was fixed vs dismissed (with brief reasoning), and confirmation that the quality pipeline passes
- Subject:
- Push the branch (regular push, not force-push)
- Delete any todo files in
todos/that were created by the review and are now resolved
Post a review summary as a PR comment using
gh pr comment $PR_NUMBER --repo <owner/name>.
Format the comment body as:
## Review Summary
### Findings
[For each severity level that has findings, list them as a table:]
| # | Severity | Finding | Resolution |
|---|----------|---------|------------|
| 1 | P1 | [description] | Fixed: [what was done] |
| 2 | P2 | [description] | Dismissed: [reasoning] |
| ... | ... | ... | ... |
### Verification
- **Tests**: [pass/fail count]
- **Lint**: [clean/issues]
- **Format**: [clean/issues]
### Commit
[commit SHA and subject line]