Skip to content

feat(scenarios): add 3 acceptance scenarios for pr-review and code-review dispatcher#37

Merged
bborbe merged 1 commit into
masterfrom
feat/add-scenarios
Jun 3, 2026
Merged

feat(scenarios): add 3 acceptance scenarios for pr-review and code-review dispatcher#37
bborbe merged 1 commit into
masterfrom
feat/add-scenarios

Conversation

@bborbe

@bborbe bborbe commented Jun 3, 2026

Copy link
Copy Markdown
Owner

Summary

Adds three end-to-end acceptance scenarios under `scenarios/` following the dark-factory scenario-writing guide. Each scenario was drafted, audited by `dark-factory:scenario-auditor` twice, and tightened against the audit findings before commit.

# Title Validates Regression risk locked
001 toolchain-preflight Dispatcher Step 4.0 (PR #36) aborts when ast-grep is absent coding#34 (silent loop, 30-min activeDeadlineSeconds kill instead of immediate fail)
002 clean-pr-zero-findings Zero-violation diff in standard mode produces empty findings, no hallucination LLM tier invents findings to fill an empty mechanical surface
003 scaling-funnel-100-files 100-file synthetic PR completes in ≤30 LLM calls and ≤30 minutes (Phase 10) coding#27 (30-min activeDeadlineSeconds ceiling)

Four-criteria gate

Each scenario passes the writing-guide's strict bar for adding an E2E scenario:

  • Unit/integration cannot reach (real PATH resolution, real slash-command invocation, real Claude CLI counting)
  • Load-bearing user journey
  • No existing scenario covers it (numbered 001/002/003)
  • Concrete regression risk tied to a prior incident

Out of scope

The other 5 items in the task-page acceptance suite (mode coverage, per-language routing, context loading, citation discipline, broken-YAML isolation) are better as unit/integration tests on the specific components — they don't clear the four-criteria gate for an E2E scenario.

Test plan

  • `make precommit` clean (131 rules, 28 mechanical YAMLs, no drift)
  • Each scenario audited by `dark-factory:scenario-auditor` (twice)
  • All audit-flagged issues resolved (placeholders → concrete commands, multi-mode → split, internal-string greps → contract-level assertions, ambiguous counter → PATH-shim with positive control)
  • Bot review
  • Walk scenario 001 manually to validate the dispatcher fail-fast (after merge)

…view dispatcher

Scenarios live in scenarios/ following the dark-factory scenario
writing guide (~/Documents/workspaces/dark-factory/docs/rules/scenario-writing.md).
All three drafted, audited by dark-factory:scenario-auditor (twice
per file), and tightened against the audit findings.

  001-toolchain-preflight: the dispatcher fail-fasts when ast-grep
  is absent from PATH. Validates PR #36's preflight wiring. The
  regression risk being locked down is the 30-min activeDeadlineSeconds
  kill observed on coding#34 instead of immediate-fail. PATH-masking
  technique is host-safe.

  002-clean-pr-zero-findings: a README-only diff flows through
  standard mode end-to-end and surfaces no findings. Locks down
  the regression risk that the LLM tier hallucinates findings to
  fill the void when the mechanical layer surfaces none. Standard
  mode only — short-mode and full-mode variants are separate
  journeys (split per the scenario-writing guide).

  003-scaling-funnel-100-files: a synthetic 100-file PR with known
  mechanical violations completes /coding:pr-review master standard
  in ≤30 LLM calls and ≤30 minutes. Phase 10 of the doc-driven
  pipeline. LLM-call counter implemented as a PATH-shim wrapper
  around the claude CLI. Companion scenario 004 (rule-count doubling
  decoupling) deferred — split per the one-journey-per-file rule.

Each scenario meets the four-criteria gate from the writing guide:
unit/integration cannot reach the behavior, load-bearing user
journey, no existing scenario, nameable regression risk tied to a
concrete prior incident (coding#34 / coding#27 / Phase 10).

@ben-s-pull-request-reviewer ben-s-pull-request-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed by ben-s-pull-request-reviewer[bot] — no concerns flagged.

@bborbe bborbe merged commit 3b6adf2 into master Jun 3, 2026
1 check passed
@bborbe bborbe deleted the feat/add-scenarios branch June 3, 2026 07:42

@ben-s-pull-request-reviewer ben-s-pull-request-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
  "verdict": "approve",
  "summary": "PR #37 has already been merged into master (commit 3b6adf2). The diff against master is empty — no reviewable changes remain. Manual review of the three new scenario files (001-toolchain-preflight.md, 002-clean-pr-zero-findings.md, 003-scaling-funnel-100-files.md) found two minor observations: Scenario 003 lacks a `trap` to guarantee `$WORK` cleanup on early-exit failure paths (the Cleanup step won't run if the scenario errors out midway), and all three scenarios remain `status: draft` with no documented promotion path to `status: accepted` if that's a project convention requirement.",
  "comments": [
    {
      "file": "scenarios/003-scaling-funnel-100-files.md",
      "line": 52,
      "severity": "minor",
      "message": "Should Fix (Important): The Cleanup step (line 69) runs only on success. If the scenario exits early due to an error or unmet assertion, `rm -rf \"$WORK\"` never executes and the mktemp directory leaks. Adding `trap 'rm -rf \"$WORK\"' EXIT` near the top of the Action block would guarantee cleanup regardless of exit path."
    },
    {
      "file": "scenarios/001-toolchain-preflight.md",
      "line": 13,
      "severity": "nit",
      "message": "Nice to Have (Optional): The `command -v ast-grep` check (line 17-18) validates PATH resolution but doesn't confirm the binary is actually executable. Consider adding `[ -x \"$(command -v ast-grep)\" ]` to verify not just visibility but executability in environments where PATH contains non-executable scripts with the same name."
    }
  ],
  "concerns_addressed": [
    "performance: Scenario 003 shim wrapper is reliable for session-scoped Claude CLI invocations — bypass is a known limitation documented in the scenario",
    "correctness: Scenario 001 dispatcher PATH preflight is validated — the expected `ast-grep/sg not in PATH` error message is specified and verifiable",
    "correctness: Scenario 002 report format uses literal `None.` strings — fragile by design (explicit trade-off documented), verified by specific awk extraction commands",
    "security: Scenario 003 cleanup runs on success but not on early-exit failure — see finding above",
    "tests: All scenarios have status: draft — no promotion criteria documented; may be intentional for test-contract documentation"
  ]
}

bborbe added a commit that referenced this pull request Jun 3, 2026
…bot review

Bot correctly flagged that scenarios/004 was not referenced in
README.md or llms.txt; broader audit found scenarios/001-003 were
also never indexed (oversight in PRs #37, #38, #40). Closing the
gap retroactively for all 4 active scenarios.

README.md: new 'Acceptance Scenarios' section above Contributing
with a 4-row table linking each scenario file + one-line
'Validates...' description.

llms.txt: parallel section at the bottom so AI agents relying on
llms.txt for discovery can find the scenarios alongside guides.
bborbe added a commit that referenced this pull request Jun 3, 2026
/coding:commit's pipeline-only detection lumped scenarios/ with
prompts/ and specs/ — meaning a PR adding ONLY scenario files
would route to Workflow E (commit + push, no changelog entry).
That's wrong for repos where scenarios are shipped acceptance
contracts that users invoke via /dark-factory:run-scenario; in
that role they're release-relevant artifacts on the same footing
as docs/ and rules/.

Without this fix, today's PRs #37 (3 scenarios drafted), #40
(scenarios rewritten + promoted to active), and #41 (4th scenario
added) would each have shipped with no changelog record. The
v0.15.0 release notes that this PR drafts had to retroactively
list scenarios as a feature.

Updated commands/commit.md:139-153 — drop scenarios/ from the
pipeline-only list and the corresponding rationale paragraph.
Prompts and specs remain pipeline-only (dark-factory daemon
runtime state, not shipped artifacts).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant