chore(scenarios): rewrite 001 + 003 to walkable contracts; promote both to active #40
…th to active

Original contracts required infrastructure (a fresh PATH-masked claude subprocess for 001; an LLM-call counter shim wrapping claude for 003) that is impractical to orchestrate from a single Claude Code session. Rewrote both to test the CODE UNDER TEST at a tighter scope:

- 001: extracts the Step 4.0 bash block verbatim from pr-review.md and code-review.md, plus the Step 0 block from ast-grep-runner.md, runs each under a masked PATH, and asserts exit 1 + documented stderr + JSON contract. That IS the dispatcher's preflight; the slash command just invokes this bash. No fresh-claude subprocess needed.
- 003: replaces 'LLM-call count ≤30' (which required an LLM-shim + real /coding:pr-review run) with 'distinct Owners with findings ≤30' (structurally equivalent because Step 4b invokes one Task per Owner with findings; the LLM ceiling is the Owner count plus a small fixed overhead). Measurable directly from the mechanical funnel output. The full Phase-10 LLM-count measurement is deferred to companion scenario 004 (not yet written).

First successful walk 2026-06-03 on master sha b695450:

001: 10/10 checkboxes PASS.
- pr-review Step 4.0: exit 1, stderr contains 'ast-grep/sg not in PATH'
- code-review Step 4.0: exit 1, parity stderr
- runner Step 0: exit 1, JSON: yamls_run=0, findings_by_owner={}, errors[0].kind='missing-tool'
- preflight wall: 10 ms (≤ 1s target)
- host shell PATH: unaffected after subshells

003: 5/5 checkboxes PASS.
- 100-file fixture: 100 .go files committed
- funnel wall: 64 ms (≤ 30000 ms target; 469× margin)
- distinct owners: 2 (≤ 30 target; 15× margin: go-architecture-assistant + go-time-assistant)
- findings_count: 115 (> 0 target; negative control passes)
- distinct rule_ids: 4 (≥ 3 target): constructor-returns-interface, no-globals-or-singletons, no-raw-go-func, no-time-now-direct
- citation discipline: every surfaced rule_id present in rules/index.json

After this commit all 3 scenarios on master are 'active'.
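The Owner-count assertion for 003 can be sketched roughly as below. This is a minimal illustration and not the scenario's actual one-liner: the shape of the funnel output (a list of findings that each carry a `rule_id`) and of the index (a flat `rule_id` to Owner mapping) are assumptions, and `check_funnel` is a hypothetical helper name.

```python
from collections import Counter

OWNER_CEILING = 30  # from the scenario: distinct Owners with findings <= 30


def owners_with_findings(findings, rule_owner):
    """Count findings per Owner; fail on any rule_id missing from the index.

    ASSUMPTION: `findings` is a list of dicts with a "rule_id" key and
    `rule_owner` maps rule_id -> Owner name. The real schemas may differ.
    """
    counts = Counter()
    for finding in findings:
        rule_id = finding["rule_id"]
        if rule_id not in rule_owner:
            # Citation discipline: every surfaced rule_id must be indexed.
            raise KeyError(f"rule_id not in index: {rule_id}")
        counts[rule_owner[rule_id]] += 1
    return counts


def check_funnel(findings, rule_owner, ceiling=OWNER_CEILING):
    """Apply the negative control and the Owner ceiling; return the counts."""
    counts = owners_with_findings(findings, rule_owner)
    # Negative control: with zero findings, 0 Owners would trivially
    # satisfy the ceiling, so an empty result must fail instead.
    if sum(counts.values()) == 0:
        raise AssertionError("no findings: negative control failed")
    if len(counts) > ceiling:
        raise AssertionError(f"{len(counts)} Owners exceeds ceiling {ceiling}")
    return counts
```

The negative control matters because the ceiling is an upper bound only: a scan that silently produces nothing would otherwise pass.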
PR Review: chore/scenarios-walkable → master
Scope: 2 files changed — both are scenario walkthrough documents (.md).
Must Fix (Critical)
None — no production code, no security-sensitive changes.
Should Fix (Important)
- `scenarios/003-scaling-funnel-100-files.md:7`: Companion scenario 004 is referenced as if written, but `scenarios/004-funnel-decoupling-doubled-fixture.md` does not exist in the repo. The scenario description calls it out as the Phase-10 LLM-call measurement, but this is a forward reference with no implementation. Future readers will follow the link and find nothing.
- `scenarios/001-toolchain-preflight.md:48`: Wall-clock assertion tightened from ≤30 s to ≤1 s. The concern in the plan notes CI environment variability. This is a tight budget for a `command -v` check that also spawns a subshell. The check is unlikely to be the bottleneck, but the rationale for the 30× reduction should be documented inline (e.g., `# 1s is generous: the block is a single command-v check + exit; ast-grep itself never runs`).
Nice to Have (Optional)
- `scenarios/003-scaling-funnel-100-files.md`: The `Expected` section tests distinct Owner count via a Python one-liner that reads `rules/index.json`. If the index is stale (not regenerated after a new rule is added), the distinct-Owner intersection may silently miss newly added rules. Consider adding a note that `make build-index` must be run before executing the scenario.
- `scenarios/003-scaling-funnel-100-files.md`: The fixture's four violation layers map cleanly to four rule_ids (`go-architecture/constructor-returns-interface`, `go-concurrency/no-raw-go-func`, `go-architecture/no-globals-or-singletons`, `go-time/no-time-now-direct`). This is correct and all four are in `rules/index.json`. No action needed.
- `scenarios/001-toolchain-preflight.md`: The setup step extracts bash blocks from `commands/pr-review.md`, `commands/code-review.md`, and `agents/ast-grep-runner.md` by searching for section headers. If any of those section headers shift (e.g., a later refactor renumbers steps), the extraction will silently pick up nothing or the wrong block. This is an accepted fragility of text-based extraction but worth noting.
Concerns from Plan
| Concern | Status |
|---|---|
| Scenario 001 bash block drift from source docs | Partially addressed — all three source blocks verified to exist at the expected section headers; risk of drift remains (see Nice to Have #5) |
| Scenario 003 fixture / rule_ids mismatch | Addressed — all four rule_ids verified present in rules/index.json |
| Wall-clock ≤1 s achievability in CI | Open concern — tightened assertion is aggressive; no evidence it fails; see Should Fix #2 |
| Companion scenario 004 not written | Open concern — forward reference in scenario 003 is unresolved; see Should Fix #1 |
Next Steps
- `/coding:write-test` not applicable: no production code changed.
- Resolve Should Fix #1 by either writing `scenarios/004-funnel-decoupling-doubled-fixture.md` or removing the forward reference from scenario 003.
- Resolve Should Fix #2 by adding an inline comment in scenario 001 justifying the 1 s wall-clock budget.
{
"verdict": "request-changes",
"summary": "Both changed files are scenario walkthroughs with no production code. Two should-fix issues were found: scenario 003 references a non-existent companion scenario 004, and scenario 001's 1-second wall-clock assertion is aggressive without inline justification. The mechanical-layer rule-id mapping in scenario 003 is verified correct against rules/index.json.",
"comments": [
{
"file": "scenarios/003-scaling-funnel-100-files.md",
"line": 7,
"severity": "major",
"message": "Companion scenario 004 (004-funnel-decoupling-doubled-fixture.md) is referenced but does not exist. The scenario description calls it the Phase-10 LLM-call measurement but it has not been written. Either write the companion or remove the forward reference."
},
{
"file": "scenarios/001-toolchain-preflight.md",
"line": 48,
"severity": "major",
"message": "Wall-clock assertion was tightened from ≤30s to ≤1s (30× reduction) without inline justification. Add a comment explaining that the block is a single command-v check that exits immediately — making 1s a generous budget even in CI environments with slower disk I/O."
},
{
"file": "scenarios/003-scaling-funnel-100-files.md",
"line": 52,
"severity": "nit",
"message": "The distinct-Owner count test reads rules/index.json directly. If the index is stale (not regenerated after a new rule is added), the Python intersection may silently miss new rules. Consider noting that make build-index should be run before executing this scenario."
},
{
"file": "scenarios/001-toolchain-preflight.md",
"line": 19,
"severity": "nit",
"message": "Bash block extraction relies on searching for section header text (e.g., '#### 4.0:'). If commands/pr-review.md or agents/ast-grep-runner.md is refactored and section headers shift, the extraction will silently pick up the wrong block. Consider noting this as an accepted fragility."
}
],
"concerns_addressed": [
"correctness: scenario 001 bash block drift — blocks verified to exist at expected section headers; risk remains (see nit comment)",
"correctness: scenario 003 rule_ids — all four fixture violation types map to rule_ids present in rules/index.json",
"tests: wall-clock ≤1s — open concern, no evidence of failure in CI; needs inline justification (see major comment)",
"correctness: companion scenario 004 not written — not addressed; forward reference remains (see major comment)"
]
}
…nale, NITs

MAJORs:
- 003 description: removed broken markdown link to non-existent 004-funnel-decoupling-doubled-fixture.md. Reworded as 'deferred to a future scenario 004, not yet written' so the forward reference no longer pretends to link to a file.
- 001 Expected: added inline rationale for the 1s wall-clock budget. Each block is a single command-v exit; 1s is generous on slow CI I/O; the regression risk is the 30-min loop on coding#34, so any threshold under 30 min proves the contract and 1s catches it with 1800x margin.

NITs:
- 001 Setup: noted the awk extraction is anchored on literal section headers (#### 4.0:, ### 0. Preflight:). If a refactor renames the heading, the extraction silently picks up the wrong block. Future edits should re-anchor.
- 003 Setup: added 'make build-index before scanning' precondition. The Owner-count assertion reads rules/index.json directly; a stale index would silently miss new rule_ids during the intersection.
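The margins quoted in this thread are plain budget-to-observation ratios. A small worked check, assuming only the figures already stated (30-min regression loop vs. 1 s budget, 10 ms preflight wall, 64 ms funnel wall, 2 distinct Owners); `margin` is a hypothetical helper:

```python
def margin(budget, observed):
    """How many times the observed value fits inside its budget."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    return budget / observed


# Figures quoted in the thread, each pair normalised to a common unit.
checks = {
    "regression loop vs 1 s budget (s)": (30 * 60, 1),
    "preflight wall vs 1 s budget (ms)": (1_000, 10),
    "funnel wall vs 30000 ms budget (ms)": (30_000, 64),
    "distinct Owners vs ceiling": (30, 2),
}

for name, (budget, observed) in checks.items():
    print(f"{name}: {margin(budget, observed):.0f}x")
```

The funnel ratio is 468.75, which is where the rounded 469× comes from.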
superseded by new automated review
{
"verdict": "approve",
"summary": "PR is a documentation-only refactor promoting two scenario markdown files from draft to active status. Changes include extracting inline bash blocks for preflight testing (replacing full slash-command invocations), streamlining the 100-file fixture generation, and shifting measurement from LLM-call counting to structural Owner-count ceiling analysis. No code files were modified; all ast-grep rules target Go code and would produce no findings against markdown content.",
"comments": [
{
"file": "scenarios/001-toolchain-preflight.md",
"line": 19,
"severity": "nit",
"message": "Fragility note acknowledged by author: extraction relies on literal heading `#### 4.0:` — future renumbering will silently pick up wrong block. Consider adding a comment in commands/pr-review.md near that block with a stable anchor marker (e.g., `<!-- SCEN001_ANCHOR -->`) that the scenario can reference instead of the heading text."
},
{
"file": "scenarios/001-toolchain-preflight.md",
"line": 46,
"severity": "nit",
"message": "Cleanup uses brace expansion `/tmp/scen001-*.{stderr,exit,stdout}` — not all shells (e.g., older /bin/sh) support this. Could use explicit glob or separate rm calls for maximum portability, though this is a minor portability concern for a scenario intended for Claude Code environments."
},
{
"file": "scenarios/003-scaling-funnel-100-files.md",
"line": 55,
"severity": "nit",
"message": "Hardcoded `cd ~/Documents/workspaces/coding` — if operator's coding repo root differs, the ast-grep scan will run against the wrong tree. Consider using `${CODING_ROOT:-$PWD}` or requiring the operator to export `CODING_ROOT` in setup."
},
{
"file": "scenarios/003-scaling-funnel-100-files.md",
"line": 55,
"severity": "nit",
"message": "ast-grep presence is validated at line 50 (`ast-grep --version` resolves on host) but the mechanical funnel step at line 55 has no explicit error if ast-grep is present but broken. A broken ast-grep binary could produce a 0-owner result that passes the ≤30 check. Consider adding an explicit pre-validation that ast-grep scan returns exit 0 against the fixture."
}
],
"concerns_addressed": [
"correctness: heading anchor brittleness in 001 — documented as fragility note, not fixed",
"correctness: brace expansion portability in 001 cleanup — not addressed, minor",
"correctness: hardcoded ~/Documents/workspaces/coding in 003 Step 2 — documented as issue, not fixed",
"tests: owner-count structural correctness in 003 — addressed by new ≤30 structural ceiling approach instead of LLM-call counting",
"correctness: ast-grep missing validation in 003 — acknowledged but not structurally fixed"
]
}
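The `${CODING_ROOT:-$PWD}` suggestion in the review above amounts to "use the variable unless it is unset or empty, then fail early if the tree looks wrong". A rough Python rendering of that idea, where `resolve_coding_root` and `require_index` are hypothetical names and the check for `rules/index.json` is an assumed sanity test rather than anything the scenario currently does:

```python
import os
from pathlib import Path


def resolve_coding_root(environ=None, cwd=None):
    """Mirror ${CODING_ROOT:-$PWD}: fall back when unset OR empty."""
    env = os.environ if environ is None else environ
    fallback = os.getcwd() if cwd is None else cwd
    return Path(env.get("CODING_ROOT") or fallback).expanduser()


def require_index(root, relative="rules/index.json"):
    """Fail loudly if the resolved root does not contain the rule index."""
    index = Path(root) / relative
    if not index.is_file():
        raise FileNotFoundError(f"{index} not found; is CODING_ROOT correct?")
    return index
```

Checking for the index before scanning keeps a wrong working directory from producing a plausible-looking but meaningless result.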
{
"verdict": "approve",
"summary": "Both scenario rewrites are well-structured contract documents. Scenario 001 correctly anchors extraction on literal section headers (`#### 4.0:` and `### 0. Preflight:`) that exist in the source files, and its JSON output schema fields (`stats.yamls_run`, `findings_by_owner`, `errors[].kind`) match the actual runner output documented in `agents/ast-grep-runner.md`. Scenario 003's structural claim (≤30 distinct Owners bounding Step 4b invocations) is verified: the index has 17 distinct owners, well under the ceiling. No issues found.",
"comments": [],
"concerns_addressed": [
"correctness: Scenario 001 extraction anchors verified — `#### 4.0:` exists in pr-review.md:96 and code-review.md:77; `### 0. Preflight:` exists in ast-grep-runner.md:57",
"correctness: Scenario 001 JSON contract fields match runner schema — `stats.yamls_run`, `findings_by_owner`, `errors[].kind == \"missing-tool\"` all present in runner output at ast-grep-runner.md:75",
"correctness: Scenario 003 structural equivalence holds — 17 distinct owners in rules/index.json (≤30 ceiling confirmed)",
"tests: Scenario 001 error message `ast-grep/sg not in PATH` matches verbatim between pr-review.md:102 and code-review.md:83"
]
}
…bot review

Bot correctly flagged that scenarios/004 was not referenced in README.md or llms.txt; a broader audit found scenarios/001-003 were also never indexed (oversight in PRs #37, #38, #40). Closing the gap retroactively for all 4 active scenarios.

- README.md: new 'Acceptance Scenarios' section above Contributing with a 4-row table linking each scenario file + one-line 'Validates...' description.
- llms.txt: parallel section at the bottom so AI agents relying on llms.txt for discovery can find the scenarios alongside guides.
/coding:commit's pipeline-only detection lumped scenarios/ with prompts/ and specs/ — meaning a PR adding ONLY scenario files would route to Workflow E (commit + push, no changelog entry). That's wrong for repos where scenarios are shipped acceptance contracts that users invoke via /dark-factory:run-scenario; in that role they're release-relevant artifacts on the same footing as docs/ and rules/. Without this fix, today's PRs #37 (3 scenarios drafted), #40 (scenarios rewritten + promoted to active), and #41 (4th scenario added) would each have shipped with no changelog record. The v0.15.0 release notes that this PR drafts had to retroactively list scenarios as a feature. Updated commands/commit.md:139-153 — drop scenarios/ from the pipeline-only list and the corresponding rationale paragraph. Prompts and specs remain pipeline-only (dark-factory daemon runtime state, not shipped artifacts).
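The routing change described above can be pictured as a prefix test over the changed paths. This is a sketch under assumptions, not the logic in commands/commit.md: the constant and function names are hypothetical, and treating "pipeline-only" as "every changed path sits under a pipeline-only directory" is inferred from the commit message.

```python
# After the fix: scenarios/ is no longer listed, so scenario-only PRs
# fall through to the normal changelog-producing workflow.
PIPELINE_ONLY_PREFIXES = ("prompts/", "specs/")


def is_pipeline_only(changed_paths):
    """True when every changed path is under a pipeline-only directory."""
    paths = list(changed_paths)
    # An empty change set is not treated as pipeline-only.
    return bool(paths) and all(p.startswith(PIPELINE_ONLY_PREFIXES) for p in paths)
```

Under this sketch, a PR touching only `scenarios/001-toolchain-preflight.md` is no longer pipeline-only, while one touching only `prompts/` and `specs/` files still is.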
Summary
Original scenarios 001 + 003 required infrastructure that was impractical to orchestrate from a single Claude Code session:
- 001: a fresh PATH-masked `claude` subprocess
- 003: an LLM-call counter shim wrapping `claude`
Rewrote both to test the code under test at a tighter scope, so the contract is satisfiable from any operator's terminal without bespoke harnesses. Then walked both end-to-end on master sha `b695450` and promoted to `status: active`.
Scope changes
The full Phase-10 LLM-count measurement is deferred to a companion scenario 004 — not written yet because it still needs the LLM-shim infrastructure; 003 captures the load-bearing decoupling proof via Owner count.
Walk results
001 — 10/10 PASS
003 — 5/5 PASS
Aftermath
All 3 scenarios on master will be `active` after merge.
Test plan