fix(cli): remove eval benchmark json flag by christso · Pull Request #1348 · EntityProcess/agentv

christso · 2026-06-10T10:01:07Z

Summary

agentv eval no longer accepts the deprecated --benchmark-json <path> escape hatch. Users now have a single source of truth for run artifacts: --output <dir>, with the canonical run summary at <dir>/benchmark.json. Compatibility reshaping should live in a wrapper that reads that run folder, rather than having the CLI write a second benchmark file.
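
For consumers that relied on the removed flag, a wrapper along these lines could take over the reshaping. This is only a sketch: the function names (readRunSummary, pickKeys, convertRunSummary) are hypothetical, and because this PR does not describe the schema of benchmark.json, the code treats it as an opaque JSON object and leaves the choice of keys to the caller.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Assumption: benchmark.json holds a single JSON object. Its actual fields
// are not specified in this PR, so nothing here depends on specific keys.
type JsonObject = Record<string, unknown>;

/** Read the canonical run summary from an `--output` run directory. */
export function readRunSummary(runDir: string): JsonObject {
  const summaryPath = join(runDir, "benchmark.json");
  const parsed: unknown = JSON.parse(readFileSync(summaryPath, "utf8"));
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`${summaryPath} does not contain a JSON object`);
  }
  return parsed as JsonObject;
}

/** Keep only the requested top-level keys; keys absent from the summary are skipped. */
export function pickKeys(summary: JsonObject, keys: readonly string[]): JsonObject {
  const narrowed: JsonObject = {};
  for (const key of keys) {
    if (Object.prototype.hasOwnProperty.call(summary, key)) {
      narrowed[key] = summary[key];
    }
  }
  return narrowed;
}

/** Read <runDir>/benchmark.json and write the narrowed shape to destPath. */
export function convertRunSummary(
  runDir: string,
  destPath: string,
  keys: readonly string[],
): JsonObject {
  const narrowed = pickKeys(readRunSummary(runDir), keys);
  writeFileSync(destPath, `${JSON.stringify(narrowed, null, 2)}\n`);
  return narrowed;
}
```

A downstream tool would run agentv eval with --output <dir> first, then call convertRunSummary on that directory with whichever keys it needs. Keeping the conversion outside the CLI means the run folder remains the only artifact the CLI itself has to guarantee.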

Consumer audit

Repo audit: rg -- '--benchmark-json|benchmarkJson|benchmark-writer|writeBenchmarkJson|buildBenchmarkJson' found only the eval CLI flag/writer, its dedicated tests, and prior plan notes. No examples or active docs instructed users to pass --benchmark-json.
Local workspace audit: searched /home/entity/workspaces and /home/entity/.agent-orchestrator/projects for --benchmark-json, benchmarkJson, and benchmark_json; hits were only Beads metadata, AgentV worktrees/code-review snapshots, and this task/session context.
GitHub audit: gh search code '"--benchmark-json"' --repo EntityProcess/agentv, gh search code '"--benchmark-json" "agentv"', and gh search code '"benchmark.json" "AgentV" "agent skills"' returned no active code-search results.
Agent Skills compatibility docs now point users at <output>/benchmark.json and call out wrapper conversion for tools that need a narrower shape.

Red/green CLI evidence

Scenario	Before	After
Help	`agentv eval run --help` listed `--benchmark-json <str>` as deprecated.	`agentv eval run --help` has no `--benchmark-json`; `--output` still documents that it writes `benchmark.json`.
Removed flag invocation	`bun apps/cli/src/cli.ts eval ... --output /tmp/av-33j-red-run --benchmark-json /tmp/av-33j-red-benchmark.json` warned and wrote a second file: `Benchmark written to: /tmp/av-33j-red-benchmark.json`; run directory also had `/tmp/av-33j-red-run/benchmark.json`.	The same invocation fails at parse time with `Unknown arguments` and does not create the extra file.
Canonical output	n/a	`bun apps/cli/src/cli.ts eval ... --output /tmp/av-33j-green-run-ok` still writes `/tmp/av-33j-green-run-ok/benchmark.json`.

Evidence files were captured locally under /tmp/av-33j-evidence/ during UAT.
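
The file-level part of the evidence above (the canonical benchmark.json is present, and no second benchmark file was written) could be re-checked with a small script such as the following. It is a sketch with hypothetical names (ArtifactCheck, checkRunArtifacts); it only inspects the filesystem and does not invoke the CLI.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

export interface ArtifactCheck {
  canonicalPresent: boolean; // <runDir>/benchmark.json exists
  extraFilePresent: boolean; // the path once passed to --benchmark-json exists
  ok: boolean; // canonical summary present and no extra file
}

/**
 * Inspect a run directory and the path that used to be given to the removed
 * flag. Both arguments are caller-supplied paths.
 */
export function checkRunArtifacts(runDir: string, removedFlagTarget: string): ArtifactCheck {
  const canonicalPresent = existsSync(join(runDir, "benchmark.json"));
  const extraFilePresent = existsSync(removedFlagTarget);
  return {
    canonicalPresent,
    extraFilePresent,
    ok: canonicalPresent && !extraFilePresent,
  };
}
```

With the paths from the table, checkRunArtifacts("/tmp/av-33j-green-run-ok", "/tmp/av-33j-red-benchmark.json") would be expected to report ok only if the green run wrote its summary and any file left over from the red run has been removed.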

Verification

bun install (to restore local dependencies in this worktree)
bun run build
bun run typecheck
bun run lint
bun test apps/cli/test/eval.integration.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/eval/aggregate.test.ts
bun run test
git diff --check
Manual local review: no findings. Dedicated review subagents were not used because the available delegation tool is restricted to explicit user requests.

Post-Deploy Monitoring & Validation

No additional production monitoring is required: this is a local CLI surface cleanup. Validate release behavior by watching issue, Discord, and GitHub reports for --benchmark-json, Unknown arguments, or a missing benchmark.json. The healthy signal is that users continue to find benchmark.json in their --output run directories. If active consumers surface, the mitigation is to document a wrapper conversion path that reads <output>/benchmark.json.

cloudflare-workers-and-pages · 2026-06-10T10:01:16Z

Deploying agentv with Cloudflare Pages

Latest commit:	`fd15a5f`
Status:	✅ Deploy successful!
Preview URL:	https://7718e6ef.agentv.pages.dev
Branch Preview URL:	https://fix-av-33j-remove-benchmark.agentv.pages.dev

fix(cli): remove eval benchmark json flag

fd15a5f

christso merged commit da45738 into main Jun 10, 2026
8 checks passed

christso deleted the fix/av-33j-remove-benchmark-json branch June 10, 2026 11:56

christso merged 1 commit into main from fix/av-33j-remove-benchmark-json
