chore(scenarios): rewrite 001 + 003 to walkable contracts; promote both to active by bborbe · Pull Request #40 · bborbe/coding

bborbe · 2026-06-03T09:49:46Z

Summary

Original scenarios 001 + 003 required infrastructure that was impractical to orchestrate from a single Claude Code session:

001: fresh `PATH=$PATH_MASKED claude` subprocess running `/coding:code-review` and `/coding:pr-review`
003: LLM-call counter PATH-shim wrapping every `claude` invocation inside a real `/coding:pr-review` run

Rewrote both to test the code under test at a tighter scope, so the contract is satisfiable from any operator's terminal without bespoke harnesses. Then walked both end-to-end on master sha `b695450` and promoted to `status: active`.

Scope changes

| Scenario | Before | After |
| --- | --- | --- |
| 001 | "Run /coding:code-review under masked PATH; assert dispatcher aborts" | "Extract Step 4.0 bash block verbatim from pr-review.md + code-review.md + runner Step 0 from ast-grep-runner.md; run each under masked PATH; assert exit 1 + documented stderr + JSON contract" |
| 003 | "LLM-call count ≤30 via shim wrapping claude" | "Distinct Owners-with-findings ≤30 via direct funnel measurement (structurally equivalent because Step 4b invokes one Task per Owner)" |

The full Phase-10 LLM-count measurement is deferred to a companion scenario 004 — not written yet because it still needs the LLM-shim infrastructure; 003 captures the load-bearing decoupling proof via Owner count.
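The decoupling argument behind 003 can be sketched in a few lines. This is only an illustration: the finding records, their field names (`rule_id`, `owner`) and the owner names are hypothetical, since the runner's real output shape is not reproduced in this PR. The point it shows is that the number of Step 4b Tasks tracks the number of distinct Owners, not the number of findings.

```python
from collections import defaultdict

# Hypothetical finding records; field names and owner names are assumptions.
findings = [
    {"rule_id": "rule-a", "owner": "owner-one"},
    {"rule_id": "rule-a", "owner": "owner-one"},
    {"rule_id": "rule-b", "owner": "owner-one"},
    {"rule_id": "rule-c", "owner": "owner-two"},
    {"rule_id": "rule-c", "owner": "owner-two"},
]


def group_by_owner(records):
    """Bucket rule_ids per Owner; one bucket corresponds to one Task."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["owner"]].append(record["rule_id"])
    return dict(grouped)


tasks = group_by_owner(findings)
print(f"{len(findings)} findings -> {len(tasks)} tasks")
```

Doubling the findings per Owner would leave `len(tasks)` unchanged, which is why the Owner count can stand in for the LLM-call ceiling here.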

Walk results

001 — 10/10 PASS

pr-review Step 4.0: exit 1, stderr contains "ast-grep/sg not in PATH"
code-review Step 4.0: exit 1, parity stderr
runner Step 0: exit 1, JSON `yamls_run=0`, `findings_by_owner={}`, `errors[0].kind=missing-tool`
Preflight wall: 10 ms (target ≤ 1s)
Host shell PATH unaffected after all subshells
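A minimal sketch of the masked-PATH check, assuming a stand-in preflight script rather than the real Step 4.0 block (which lives in `commands/pr-review.md` and is not reproduced here). The helper name `run_masked` is hypothetical. The child shell gets a PATH that points at an empty directory, so the host shell's PATH is never touched.

```python
import subprocess
import tempfile

# Stand-in for the extracted preflight block; only the stderr text is taken
# from the walk results above, the rest is an assumption.
PREFLIGHT = r'''
if ! command -v ast-grep >/dev/null 2>&1 && ! command -v sg >/dev/null 2>&1; then
  echo "ast-grep/sg not in PATH" >&2
  exit 1
fi
'''


def run_masked(script):
    """Run a shell script in a child process whose PATH is an empty directory."""
    with tempfile.TemporaryDirectory() as empty_dir:
        return subprocess.run(
            ["/bin/sh", "-c", script],  # absolute path, so PATH lookup is not needed
            env={"PATH": empty_dir},
            capture_output=True,
            text=True,
        )


result = run_masked(PREFLIGHT)
print(result.returncode, result.stderr.strip())
```

Because the mask is applied through `env=` on the child only, no cleanup of the parent environment is required.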

003 — 5/5 PASS

100-file fixture committed
Funnel wall: 64 ms (target ≤ 30000 ms; 469× margin)
Distinct owners: 2 (target ≤ 30; 15× margin) — go-architecture-assistant + go-time-assistant
Findings count: 115 (negative control: > 0)
Distinct rule_ids: 4 (target ≥ 3) — constructor-returns-interface, no-globals-or-singletons, no-raw-go-func, no-time-now-direct
Citation discipline: every surfaced rule_id present in `rules/index.json`
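The margins and the citation check above are simple enough to restate as code. The sketch below assumes the index has already been reduced to a flat collection of rule ids; the real `rules/index.json` schema is not shown in this PR, and the extra id `some-other-rule` is made up for illustration.

```python
def margin(target, measured):
    """How many times the measured value fits into the target."""
    return target / measured


def uncited(surfaced_rule_ids, indexed_rule_ids):
    """Rule ids that surfaced in findings but are missing from the index."""
    return sorted(set(surfaced_rule_ids) - set(indexed_rule_ids))


surfaced = [
    "constructor-returns-interface",
    "no-globals-or-singletons",
    "no-raw-go-func",
    "no-time-now-direct",
]
# Assumption: a flat list of indexed rule ids, with one unrelated entry.
indexed = surfaced + ["some-other-rule"]

print(round(margin(30000, 64)))    # funnel wall: 30000 ms target vs 64 ms
print(round(margin(30, 2)))        # distinct owners: 30 target vs 2
print(uncited(surfaced, indexed))  # empty list means citation discipline holds
```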

Aftermath

All 3 scenarios on master will be `active` after merge.

Test plan

`make precommit` clean (131 rules, 28 mechanical YAMLs, no drift)
First successful walk of each scenario, evidence in commit message
Bot review

…th to active

Original contracts required infrastructure (fresh PATH-masked claude subprocess for 001; LLM-call counter shim wrapping claude for 003) that is impractical to orchestrate from a single Claude Code session. Rewrote both to test the CODE UNDER TEST at a tighter scope:

001: extracts the Step 4.0 bash block verbatim from pr-review.md + code-review.md + the Step 0 block from ast-grep-runner.md, runs each under a masked PATH, asserts exit 1 + documented stderr + JSON contract. That IS the dispatcher's preflight: the slash command just invokes this bash. No fresh-claude subprocess needed.

003: replaces 'LLM-call count ≤30' (which required an LLM-shim + real /coding:pr-review run) with 'distinct Owners with findings ≤30' (structurally equivalent because Step 4b invokes one Task per Owner with findings; the LLM ceiling is the Owner count plus a small fixed overhead). Measurable directly from the mechanical funnel output.

The full Phase-10 LLM-count measurement deferred to companion scenario 004 (not yet written).

First successful walk 2026-06-03 on master sha b695450:

001: 10/10 checkboxes PASS.
- pr-review Step 4.0: exit 1, stderr contains 'ast-grep/sg not in PATH'
- code-review Step 4.0: exit 1, parity stderr
- runner Step 0: exit 1, JSON: yamls_run=0, findings_by_owner={}, errors[0].kind='missing-tool'
- preflight wall: 10 ms (≤ 1s target)
- host shell PATH: unaffected after subshells

003: 5/5 checkboxes PASS.
- 100-file fixture: 100 .go files committed
- funnel wall: 64 ms (≤ 30000 ms target; 469× margin)
- distinct owners: 2 (≤ 30 target; 15× margin: go-architecture-assistant + go-time-assistant)
- findings_count: 115 (> 0 target; negative control passes)
- distinct rule_ids: 4 (≥ 3 target): constructor-returns-interface, no-globals-or-singletons, no-raw-go-func, no-time-now-direct
- citation discipline: every surfaced rule_id present in rules/index.json

After this commit all 3 scenarios on master are 'active'.

ben-s-pull-request-reviewer

PR Review: `chore/scenarios-walkable` → `master`

Scope: 2 files changed — both are scenario walkthrough documents (.md).

Must Fix (Critical)

None — no production code, no security-sensitive changes.

Should Fix (Important)

scenarios/003-scaling-funnel-100-files.md:7: Companion scenario 004 is referenced as if written, but scenarios/004-funnel-decoupling-doubled-fixture.md does not exist in the repo. The scenario description calls it out as the Phase-10 LLM-call measurement, but this is a forward reference with no implementation. Future readers will follow the link and find nothing.
scenarios/001-toolchain-preflight.md:48: Wall-clock assertion tightened from ≤30 s to ≤1 s. The plan raises CI environment variability as a concern for this budget. A `command -v` check that also spawns a subshell is unlikely to be the bottleneck, but the rationale for the 30× reduction should be documented inline (e.g., `# 1s is generous: the block is a single command -v check + exit; ast-grep itself never runs`).

Nice to Have (Optional)

scenarios/003-scaling-funnel-100-files.md — The Expected section tests distinct Owner count via a Python one-liner that reads rules/index.json. If the index is stale (not regenerated after a new rule is added), the distinct-Owner intersection may silently miss newly added rules. Consider adding a note that make build-index must be run before executing the scenario.
scenarios/003-scaling-funnel-100-files.md — The fixture's four violation layers map cleanly to four rule_ids (go-architecture/constructor-returns-interface, go-concurrency/no-raw-go-func, go-architecture/no-globals-or-singletons, go-time/no-time-now-direct). This is correct and all four are in rules/index.json. No action needed.
scenarios/001-toolchain-preflight.md — The setup step extracts bash blocks from commands/pr-review.md, commands/code-review.md, and agents/ast-grep-runner.md by searching for section headers. If any of those section headers shift (e.g., a later refactor renumbers steps), the extraction will silently pick up nothing or the wrong block. This is an accepted fragility of text-based extraction but worth noting.
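One way to turn the silent-failure mode of heading-anchored extraction into a loud one is sketched below. This is not the scenario's actual extraction (the follow-up commit describes that as awk-based); the function name `extract_block` and the sample heading titles are hypothetical. The sketch returns the first fenced block under a heading and raises if the heading is missing or no fence follows it before the next heading.

```python
FENCE = "`" * 3  # built programmatically so the example nests cleanly in markdown


def extract_block(markdown, heading_prefix):
    """Return the body of the first fenced block under the given heading."""
    lines = markdown.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.startswith(heading_prefix)),
        None,
    )
    if start is None:
        raise LookupError(f"heading not found: {heading_prefix!r}")
    body, in_fence = [], False
    for line in lines[start + 1:]:
        is_fence = line.strip().startswith(FENCE)
        if not in_fence:
            if line.startswith("#"):
                break  # reached the next heading without seeing a fence
            in_fence = is_fence
        elif is_fence:
            return "\n".join(body)
        else:
            body.append(line)
    raise LookupError(f"no fenced block under heading: {heading_prefix!r}")


# Hypothetical document; only the '#### 4.0:' anchor text comes from the review.
SAMPLE = "\n".join([
    "#### 4.0: Toolchain preflight",
    "",
    FENCE + "bash",
    "echo preflight",
    FENCE,
    "",
    "#### 4.1: Next step",
])
print(extract_block(SAMPLE, "#### 4.0:"))
```

A renamed heading then fails the setup step immediately instead of feeding the wrong block into the assertions.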

Concerns from Plan

| Concern | Status |
| --- | --- |
| Scenario 001 bash block drift from source docs | Partially addressed: all three source blocks verified to exist at the expected section headers; risk of drift remains (see Nice to Have #5) |
| Scenario 003 fixture / rule_ids mismatch | Addressed: all four rule_ids verified present in `rules/index.json` |
| Wall-clock ≤1 s achievability in CI | Open concern: tightened assertion is aggressive; no evidence it fails; see Should Fix #2 |
| Companion scenario 004 not written | Open concern: forward reference in scenario 003 is unresolved; see Should Fix #1 |

Next Steps

/coding:write-test not applicable — no production code changed.
Resolve Should Fix #1 by either writing scenarios/004-funnel-decoupling-doubled-fixture.md or removing the forward reference from scenario 003.
Resolve Should Fix #2 by adding an inline comment in scenario 001 justifying the 1 s wall-clock budget.
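To make the wall-clock discussion concrete, here is a small sketch of timing a preflight-style check against a budget. The helper name `timed_run` and the tool name `definitely-not-installed-tool` are made up for illustration; the real scenario's timing method is not shown in this PR.

```python
import subprocess
import time


def timed_run(script, budget_s):
    """Run a shell script and report (exit code, elapsed seconds, within budget)."""
    start = time.perf_counter()
    proc = subprocess.run(["/bin/sh", "-c", script], capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return proc.returncode, elapsed, elapsed <= budget_s


# A single command -v lookup followed by exit, similar in shape to a preflight.
code, elapsed, ok = timed_run(
    "command -v definitely-not-installed-tool >/dev/null 2>&1 || exit 1",
    budget_s=1.0,
)
print(f"exit={code} wall={elapsed * 1000:.0f} ms within_budget={ok}")
```

The measured time is dominated by process spawn, not by the check itself, which is the core of the argument for keeping the budget at 1 s.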

{
  "verdict": "request-changes",
  "summary": "Both changed files are scenario walkthroughs with no production code. Two should-fix issues were found: scenario 003 references a non-existent companion scenario 004, and scenario 001's 1-second wall-clock assertion is aggressive without inline justification. The mechanical-layer rule-id mapping in scenario 003 is verified correct against rules/index.json.",
  "comments": [
    {
      "file": "scenarios/003-scaling-funnel-100-files.md",
      "line": 7,
      "severity": "major",
      "message": "Companion scenario 004 (004-funnel-decoupling-doubled-fixture.md) is referenced but does not exist. The scenario description calls it the Phase-10 LLM-call measurement but it has not been written. Either write the companion or remove the forward reference."
    },
    {
      "file": "scenarios/001-toolchain-preflight.md",
      "line": 48,
      "severity": "major",
      "message": "Wall-clock assertion was tightened from ≤30s to ≤1s (30× reduction) without inline justification. Add a comment explaining that the block is a single command-v check that exits immediately — making 1s a generous budget even in CI environments with slower disk I/O."
    },
    {
      "file": "scenarios/003-scaling-funnel-100-files.md",
      "line": 52,
      "severity": "nit",
      "message": "The distinct-Owner count test reads rules/index.json directly. If the index is stale (not regenerated after a new rule is added), the Python intersection may silently miss new rules. Consider noting that make build-index should be run before executing this scenario."
    },
    {
      "file": "scenarios/001-toolchain-preflight.md",
      "line": 19,
      "severity": "nit",
      "message": "Bash block extraction relies on searching for section header text (e.g., '#### 4.0:'). If commands/pr-review.md or agents/ast-grep-runner.md is refactored and section headers shift, the extraction will silently pick up the wrong block. Consider noting this as an accepted fragility."
    }
  ],
  "concerns_addressed": [
    "correctness: scenario 001 bash block drift — blocks verified to exist at expected section headers; risk remains (see nit comment)",
    "correctness: scenario 003 rule_ids — all four fixture violation types map to rule_ids present in rules/index.json",
    "tests: wall-clock ≤1s — open concern, no evidence of failure in CI; needs inline justification (see major comment)",
    "correctness: companion scenario 004 not written — not addressed; forward reference remains (see major comment)"
  ]
}

…nale, NITs

MAJORs:
- 003 description: removed broken markdown link to non-existent 004-funnel-decoupling-doubled-fixture.md. Reworded as 'deferred to a future scenario 004, not yet written' so the forward reference no longer pretends to link to a file.
- 001 Expected: added inline rationale for the 1s wall-clock budget. Each block is a single command-v exit; 1s is generous on slow CI I/O; the regression risk is the 30-min loop on coding#34, so any threshold under 30 min proves the contract and 1s catches it with 1800x margin.

NITs:
- 001 Setup: noted the awk extraction is anchored on literal section headers (#### 4.0:, ### 0. Preflight:). If a refactor renames the heading, the extraction silently picks up the wrong block. Future edits should re-anchor.
- 003 Setup: added 'make build-index before scanning' precondition. The Owner-count assertion reads rules/index.json directly; a stale index would silently miss new rule_ids during the intersection.

superseded by new automated review

ben-s-pull-request-reviewer

{
  "verdict": "approve",
  "summary": "PR is a documentation-only refactor promoting two scenario markdown files from draft to active status. Changes include extracting inline bash blocks for preflight testing (replacing full slash-command invocations), streamlining the 100-file fixture generation, and shifting measurement from LLM-call counting to structural Owner-count ceiling analysis. No code files were modified; all ast-grep rules target Go code and would produce no findings against markdown content.",
  "comments": [
    {
      "file": "scenarios/001-toolchain-preflight.md",
      "line": 19,
      "severity": "nit",
      "message": "Fragility note acknowledged by author: extraction relies on literal heading `#### 4.0:` — future renumbering will silently pick up wrong block. Consider adding a comment in commands/pr-review.md near that block with a stable anchor marker (e.g., `<!-- SCEN001_ANCHOR -->`) that the scenario can reference instead of the heading text."
    },
    {
      "file": "scenarios/001-toolchain-preflight.md",
      "line": 46,
      "severity": "nit",
      "message": "Cleanup uses brace expansion `/tmp/scen001-*.{stderr,exit,stdout}` — not all shells (e.g., older /bin/sh) support this. Could use explicit glob or separate rm calls for maximum portability, though this is a minor portability concern for a scenario intended for Claude Code environments."
    },
    {
      "file": "scenarios/003-scaling-funnel-100-files.md",
      "line": 55,
      "severity": "nit",
      "message": "Hardcoded `cd ~/Documents/workspaces/coding` — if operator's coding repo root differs, the ast-grep scan will run against the wrong tree. Consider using `${CODING_ROOT:-$PWD}` or requiring the operator to export `CODING_ROOT` in setup."
    },
    {
      "file": "scenarios/003-scaling-funnel-100-files.md",
      "line": 55,
      "severity": "nit",
      "message": "ast-grep presence is validated at line 50 (`ast-grep --version` resolves on host) but the mechanical funnel step at line 55 has no explicit error if ast-grep is present but broken. A broken ast-grep binary could produce a 0-owner result that passes the ≤30 check. Consider adding an explicit pre-validation that ast-grep scan returns exit 0 against the fixture."
    }
  ],
  "concerns_addressed": [
    "correctness: heading anchor brittleness in 001 — documented as fragility note, not fixed",
    "correctness: brace expansion portability in 001 cleanup — not addressed, minor",
    "correctness: hardcoded ~/Documents/workspaces/coding in 003 Step 2 — documented as issue, not fixed",
    "tests: owner-count structural correctness in 003 — addressed by new ≤30 structural ceiling approach instead of LLM-call counting",
    "correctness: ast-grep missing validation in 003 — acknowledged but not structurally fixed"
  ]
}
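The last nit in the review above describes a vacuous pass: a broken scanner yields zero Owners, and zero satisfies "≤30". A minimal sketch of a guard follows, assuming the scanner's exit code and the finding count are already available. The function name `check_funnel` and its parameters are hypothetical, and the expected exit code of 0 follows the reviewer's suggestion rather than documented scanner behavior.

```python
def check_funnel(exit_code, owners, findings_count, max_owners=30, expected_exit=0):
    """Return a list of failure messages; an empty list means the walk passes."""
    failures = []
    if exit_code != expected_exit:
        failures.append(f"scanner exited {exit_code}, expected {expected_exit}")
    if findings_count <= 0:
        # Negative control: zero findings means the ceiling check proves nothing.
        failures.append("no findings; owner ceiling would pass vacuously")
    distinct = len(set(owners))
    if distinct > max_owners:
        failures.append(f"{distinct} distinct owners exceeds ceiling {max_owners}")
    return failures


# Values taken from the 003 walk results.
print(check_funnel(0, ["go-architecture-assistant", "go-time-assistant"], 115))
# A broken scanner: non-zero exit, nothing found.
print(check_funnel(2, [], 0))
```

Checking the lower bound together with the ceiling keeps a silent tool failure from being read as a pass.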

ben-s-pull-request-reviewer

{
  "verdict": "approve",
  "summary": "Both scenario rewrites are well-structured contract documents. Scenario 001 correctly anchors extraction on literal section headers (`#### 4.0:` and `### 0. Preflight:`) that exist in the source files, and its JSON output schema fields (`stats.yamls_run`, `findings_by_owner`, `errors[].kind`) match the actual runner output documented in `agents/ast-grep-runner.md`. Scenario 003's structural claim (≤30 distinct Owners bounding Step 4b invocations) is verified: the index has 17 distinct owners, well under the ceiling. No issues found.",
  "comments": [],
  "concerns_addressed": [
    "correctness: Scenario 001 extraction anchors verified — `#### 4.0:` exists in pr-review.md:96 and code-review.md:77; `### 0. Preflight:` exists in ast-grep-runner.md:57",
    "correctness: Scenario 001 JSON contract fields match runner schema — `stats.yamls_run`, `findings_by_owner`, `errors[].kind == \"missing-tool\"` all present in runner output at ast-grep-runner.md:75",
    "correctness: Scenario 003 structural equivalence holds — 17 distinct owners in rules/index.json (≤30 ceiling confirmed)",
    "tests: Scenario 001 error message `ast-grep/sg not in PATH` matches verbatim between pr-review.md:102 and code-review.md:83"
  ]
}

…bot review

Bot correctly flagged that scenarios/004 was not referenced in README.md or llms.txt; broader audit found scenarios/001-003 were also never indexed (oversight in PRs #37, #38, #40). Closing the gap retroactively for all 4 active scenarios.

README.md: new 'Acceptance Scenarios' section above Contributing with a 4-row table linking each scenario file + one-line 'Validates...' description.

llms.txt: parallel section at the bottom so AI agents relying on llms.txt for discovery can find the scenarios alongside guides.

/coding:commit's pipeline-only detection lumped scenarios/ with prompts/ and specs/ — meaning a PR adding ONLY scenario files would route to Workflow E (commit + push, no changelog entry). That's wrong for repos where scenarios are shipped acceptance contracts that users invoke via /dark-factory:run-scenario; in that role they're release-relevant artifacts on the same footing as docs/ and rules/. Without this fix, today's PRs #37 (3 scenarios drafted), #40 (scenarios rewritten + promoted to active), and #41 (4th scenario added) would each have shipped with no changelog record. The v0.15.0 release notes that this PR drafts had to retroactively list scenarios as a feature. Updated commands/commit.md:139-153 — drop scenarios/ from the pipeline-only list and the corresponding rationale paragraph. Prompts and specs remain pipeline-only (dark-factory daemon runtime state, not shipped artifacts).

ben-s-pull-request-reviewer Bot previously requested changes Jun 3, 2026

ben-s-pull-request-reviewer Bot approved these changes Jun 3, 2026

bborbe merged commit 3e0d052 into master Jun 3, 2026
1 check passed

bborbe deleted the chore/scenarios-walkable branch June 3, 2026 10:50

bborbe merged 2 commits into master from chore/scenarios-walkable
