bborbe · bborbe · Jun 3, 2026 · Jun 3, 2026
diff --git a/scenarios/001-toolchain-preflight.md b/scenarios/001-toolchain-preflight.md
@@ -0,0 +1,42 @@
+---
+status: draft
+---
+
+# Scenario 001: Dispatcher fail-fasts when ast-grep binary is missing
+
+Validates that `/coding:code-review` and `/coding:pr-review` abort with an actionable error when `ast-grep` / `sg` is absent from PATH — closing the silent-empty-review failure mode observed on [bborbe/coding#34](https://github.com/bborbe/coding/pull/34).
+
+## Setup
+
+- [ ] Build a masked PATH that excludes any directory containing `ast-grep` or `sg`:
+  ```bash
+  PATH_MASKED=$(echo "$PATH" | tr ':' '\n' | while read d; do
+    [ -x "$d/ast-grep" ] || [ -x "$d/sg" ] || echo "$d"
+  done | paste -sd: -)
+  ```
+- [ ] `PATH=$PATH_MASKED command -v ast-grep; echo "exit=$?"` prints `exit=1`
+- [ ] `PATH=$PATH_MASKED command -v sg; echo "exit=$?"` prints `exit=1`
+- [ ] Create a minimal Go fixture: `WORK=$(mktemp -d) && cd "$WORK" && git init -q && printf 'package main\n\nfunc main() {}\n' > main.go && git add . && git commit -qm initial`
+- [ ] Host shell `command -v ast-grep` (outside any subshell) still resolves a path
+
+## Action
+
+- [ ] In a fresh Claude Code session launched as `PATH=$PATH_MASKED claude`, run `/coding:code-review master` against `$WORK`; tee stdout to `/tmp/cr-stdout.log`, stderr to `/tmp/cr-stderr.log`, and capture exit code: `... ; echo $? > /tmp/cr-exit`
+- [ ] In a fresh Claude Code session launched as `PATH=$PATH_MASKED claude`, run `/coding:pr-review master` against `$WORK`; same tee + exit-code capture to `/tmp/pr-{stdout,stderr,exit}.log`
+- [ ] Direct runner test: in a `PATH=$PATH_MASKED` subshell, invoke the `coding:ast-grep-runner` agent via `claude` with the prompt `TARGET_DIR=$WORK. Run every YAML in rules/<lang>/*.yml. Return findings grouped by Owner.` — tee stdout to `/tmp/runner-stdout.log`, capture exit code to `/tmp/runner-exit`
+
+## Expected
+
+- [ ] `/tmp/cr-stderr.log` contains the literal string `ast-grep/sg not in PATH`
+- [ ] `cat /tmp/cr-exit` prints `1`
+- [ ] `grep -c 'coding:ast-grep-runner agent:' /tmp/cr-stdout.log` returns `0` (Step 4a was never invoked)
+- [ ] `/tmp/pr-stderr.log` contains `ast-grep/sg not in PATH` and `cat /tmp/pr-exit` prints `1`
+- [ ] `jq -e '.errors[] | select(.kind == "missing-tool" and .tool == "ast-grep")' /tmp/runner-stdout.log` exits 0
+- [ ] `jq '.stats.yamls_run' /tmp/runner-stdout.log` returns `0` and `jq '.findings_by_owner' /tmp/runner-stdout.log` returns `{}`
+- [ ] `cat /tmp/runner-exit` prints `1`
+- [ ] Each invocation's wall-clock under 30 seconds (`time` output's `real` < 30s) — the regression risk being locked down is a 30-min `activeDeadlineSeconds` kill instead of immediate fail
+- [ ] Host shell `command -v ast-grep` (re-run after all subshells) still resolves a path
+
+## Cleanup
+
+- `rm -rf "$WORK" /tmp/cr-* /tmp/pr-* /tmp/runner-*`
diff --git a/scenarios/002-clean-pr-zero-findings.md b/scenarios/002-clean-pr-zero-findings.md
@@ -0,0 +1,32 @@
+---
+status: draft
+---
+
+# Scenario 002: Zero-violation PR in standard mode produces empty findings, no false positives
+
+Validates that a diff with no mechanical-rule violations flows through `/coding:code-review master standard` (Step 4.0 → 4a → 4b → 4c → 4d → 5) and produces a report with empty Must Fix / Should Fix / Nice to Have sections — locking down the regression risk that the LLM tier hallucinates findings to fill the void when the mechanical layer surfaces none.
+
+## Setup
+
+- [ ] Clone master at the current HEAD: `WORK=$(mktemp -d) && cd "$WORK" && git clone --depth=1 --branch master git@github.com:bborbe/coding.git . && git checkout -b clean-pr-fixture`
+- [ ] Apply a README-only whitespace edit (no `.go` / `.py` / `rules/**` / `docs/**` touched): `sed -i.bak 's/^# bborbe\/coding$/# bborbe\/coding  /' README.md && rm README.md.bak && git add README.md && git commit -qm 'docs: typo whitespace'`
+- [ ] `git diff --name-only master..HEAD` prints exactly one line: `README.md`
+- [ ] `ast-grep --version` resolves on host (Step 4.0 preflight will pass)
+
+## Action
+
+- [ ] Run `/coding:code-review master` against `$WORK` in a fresh Claude Code session (standard mode is the default — do not pass a mode argument); tee stdout to `/tmp/code-review-stdout.log`, stderr to `/tmp/code-review-stderr.log`, capture exit code to `/tmp/code-review-exit`
+
+## Expected
+
+- [ ] `cat /tmp/code-review-exit` prints `0`
+- [ ] Runner block in stdout has `findings_count: 0`: extract the JSON block via `awk '/^{$/,/^}$/' /tmp/code-review-stdout.log > /tmp/code-review-runner.json && jq '.stats.findings_count' /tmp/code-review-runner.json` returns `0`
+- [ ] Report includes a `Must Fix (Critical)` section whose body line is exactly `None.` — verified via: `awk '/^#### Must Fix/{flag=1; next} /^#### /{flag=0} flag && NF' /tmp/code-review-stdout.log` prints exactly `None.`
+- [ ] Report includes a `Should Fix (Important)` section whose body line is exactly `None.` (same `awk` shape)
+- [ ] Report includes a `Nice to Have (Optional)` section whose body line is exactly `None.` (same `awk` shape)
+- [ ] `grep -c 'dropped finding' /tmp/code-review-stderr.log` returns `0` — citation validator had nothing to drop because no findings were generated
+- [ ] Run completes in under 5 minutes (`time` output's `real` < 5m)
+
+## Cleanup
+
+- `rm -rf "$WORK" /tmp/code-review-*`
diff --git a/scenarios/003-scaling-funnel-100-files.md b/scenarios/003-scaling-funnel-100-files.md
@@ -0,0 +1,71 @@
+---
+status: draft
+---
+
+# Scenario 003: 100-file synthetic PR completes review in ≤30 LLM calls
+
+Validates that a 100-file synthetic PR with known mechanical violations completes `/coding:pr-review master standard` within ≤30 LLM calls and ≤30 minutes — proving the ast-grep funnel decouples LLM cost from PR size and stays under the prod bot's `activeDeadlineSeconds=1800` ceiling. Companion decoupling scenario (200-file re-run with ≤30 LLM calls) lives separately as scenario 004.
+
+## Setup
+
+- [ ] Build the synthetic-PR fixture:
+  ```bash
+  WORK=$(mktemp -d) && cd "$WORK" && git init -q
+  mkdir -p pkg/{handler,service,store,worker,internal}/{user,order,product,customer,billing}
+  i=0
+  for dir in pkg/handler/{user,order,product,customer,billing} \
+             pkg/service/{user,order,product,customer,billing} \
+             pkg/store/{user,order,product,customer,billing} \
+             pkg/worker/{user,order,product,customer,billing} \
+             pkg/internal/{user,order,product,customer,billing}; do
+    for n in 1 2 3 4; do
+      i=$((i+1))
+      cat > "$dir/file$n.go" <<EOF
+  package $(basename $dir)
+  import "fmt"
+  func NewService$i() *Service$i { return &Service$i{} }
+  type Service$i struct{}
+  func (s *Service$i) Process() error { return fmt.Errorf("processing failed") }
+  EOF
+    done
+  done
+  git add . && git commit -qm initial
+  ```
+- [ ] `git ls-files '*.go' | wc -l` returns exactly `100`
+- [ ] `ast-grep --version` resolves on host
+- [ ] Pin the LLM-call counter via a Claude CLI shim:
+  ```bash
+  CLAUDE_BIN=$(command -v claude)
+  mkdir -p /tmp/llm-count-shim
+  cat > /tmp/llm-count-shim/claude <<SHIM
+  #!/bin/sh
+  printf '1\n' >> /tmp/llm-call-count.log
+  exec "$CLAUDE_BIN" "\$@"
+  SHIM
+  chmod +x /tmp/llm-count-shim/claude
+  : > /tmp/llm-call-count.log
+  export PATH=/tmp/llm-count-shim:$PATH
+  ```
+  Every `claude` invocation under the shim appends one line; final count is `wc -l < /tmp/llm-call-count.log`
+- [ ] Positive control on the shim: `claude --version >/dev/null && [ "$(wc -l < /tmp/llm-call-count.log)" = "1" ]` (after the manual probe, reset with `: > /tmp/llm-call-count.log` before the Action step)
+
+## Action
+
+- [ ] Run `/coding:pr-review master standard` against `$WORK` in a fresh Claude Code session under the shim PATH; tee stdout to `/tmp/scaling-pr-stdout.log`, stderr to `/tmp/scaling-pr-stderr.log`, capture exit code to `/tmp/scaling-pr-exit`
+- [ ] Record wall-clock duration with `time` wrapping the slash command; capture the `real` line to `/tmp/scaling-pr-time.log`
+- [ ] Extract the Step 5 Consolidated Report section: `awk '/^### Step 5:/{flag=1} flag' /tmp/scaling-pr-stdout.log > /tmp/scaling-pr-report.md`
+
+## Expected
+
+- [ ] `cat /tmp/scaling-pr-exit` prints `0`
+- [ ] `wc -l < /tmp/llm-call-count.log` returns ≤ `30` (proves the funnel: ast-grep mechanical layer carries 100-file scale at constant LLM cost)
+- [ ] `/tmp/scaling-pr-time.log` `real` line parses to ≤ `30m00s` (stays under the prod bot's `activeDeadlineSeconds=1800` ceiling — the same one that killed coding#27)
+- [ ] `jq '.stats.findings_count' /tmp/scaling-pr-stdout.log` returns > `0` — negative control: the synthetic violations were actually surfaced (if 0, the funnel didn't run and the LLM count is misleadingly low)
+- [ ] `grep -oE 'go-[a-z-]+/[a-z-]+' /tmp/scaling-pr-report.md | sort -u | wc -l` returns ≥ `3` — at least 3 distinct rule_ids surfaced (the per-Owner adjudication phase processed findings, didn't drop them)
+- [ ] `grep -c 'dropped finding' /tmp/scaling-pr-stderr.log` returns `0` — citation validator dropped nothing; every surfaced finding cites a real rule_id
+
+## Cleanup
+
+- `rm -rf "$WORK" /tmp/scaling-pr-* /tmp/llm-count-shim /tmp/llm-call-count.log`
+
+After the scenario passes, the operator should record the measured `(LLM count, duration, findings count)` tuple in the Progress section of the task page (`[[Refactor coding pr-review to doc-driven rules pipeline]]`) so future Phase-10 reruns have a baseline. This is a follow-up note, not part of the scenario contract.