fix(cli): remove eval benchmark json flag #1348

Merged
christso merged 1 commit into main from fix/av-33j-remove-benchmark-json on Jun 10, 2026
@christso

Collaborator

Summary

agentv eval no longer accepts the deprecated --benchmark-json <path> escape hatch. Users now have a single source of truth for run artifacts: --output <dir>, with the canonical run summary at <dir>/benchmark.json. Compatibility reshaping should live in a wrapper that reads that run folder, rather than asking the CLI to write a second benchmark file.

Consumer audit

  • Repo audit: rg -- '--benchmark-json|benchmarkJson|benchmark-writer|writeBenchmarkJson|buildBenchmarkJson' found only the eval CLI flag/writer, its dedicated tests, and prior plan notes. No examples or active docs instructed users to pass --benchmark-json.
  • Local workspace audit: searched /home/entity/workspaces and /home/entity/.agent-orchestrator/projects for --benchmark-json, benchmarkJson, and benchmark_json; hits were only Beads metadata, AgentV worktrees/code-review snapshots, and this task/session context.
  • GitHub audit: gh search code '"--benchmark-json"' --repo EntityProcess/agentv, gh search code '"--benchmark-json" "agentv"', and gh search code '"benchmark.json" "AgentV" "agent skills"' returned no active code-search results.
  • Agent Skills compatibility docs now point users at <output>/benchmark.json and call out wrapper conversion for tools that need a narrower shape.
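
For tools that need a narrower shape, the wrapper conversion mentioned above could look roughly like the sketch below. It is illustrative only: `narrowBenchmarkJson` is a hypothetical helper, not part of AgentV, and it assumes that the contents of <output>/benchmark.json parse to a JSON object at the top level. The key names to keep are supplied by the caller, because this PR does not describe the file's schema.

```typescript
// Hypothetical helper, not part of AgentV.
// Takes the text of <output>/benchmark.json and returns JSON text that keeps
// only the requested top-level keys. Assumes the file holds a JSON object.
export function narrowBenchmarkJson(text: string, keep: readonly string[]): string {
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("expected benchmark.json to contain a JSON object");
  }
  const source = parsed as Record<string, unknown>;
  const narrowed: Record<string, unknown> = {};
  for (const key of keep) {
    // Keys missing from the summary are skipped rather than emitted as null.
    if (key in source) {
      narrowed[key] = source[key];
    }
  }
  return JSON.stringify(narrowed, null, 2);
}
```

A wrapper would read <output>/benchmark.json after the run finishes, pass its text through a function like this, and write the result wherever the downstream tool expects it. The CLI itself stays responsible for only the one canonical file.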

Red/green CLI evidence

| Scenario | Before | After |
| --- | --- | --- |
| Help | agentv eval run --help listed --benchmark-json <str> as deprecated. | agentv eval run --help has no --benchmark-json; --output still documents that it writes benchmark.json. |
| Removed flag invocation | bun apps/cli/src/cli.ts eval ... --output /tmp/av-33j-red-run --benchmark-json /tmp/av-33j-red-benchmark.json warned and wrote a second file: Benchmark written to: /tmp/av-33j-red-benchmark.json; run directory also had /tmp/av-33j-red-run/benchmark.json. | The same invocation fails at parse time with Unknown arguments and does not create the extra file. |
| Canonical output | n/a | bun apps/cli/src/cli.ts eval ... --output /tmp/av-33j-green-run-ok still writes /tmp/av-33j-green-run-ok/benchmark.json. |

Evidence files were captured locally under /tmp/av-33j-evidence/ during UAT.
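
Because the removed flag now fails at parse time, a script that still passes it needs to drop the flag before calling the CLI. The sketch below shows one way a wrapper might do that. `stripBenchmarkJsonFlag` is a hypothetical name, and handling of the `--benchmark-json=<path>` spelling is an assumption: this PR only shows the space-separated form.

```typescript
// Hypothetical helper, not part of AgentV.
// Removes the retired --benchmark-json flag from an argument list and reports
// the path it pointed at, so a wrapper can copy the canonical file there itself.
export interface MigratedArgs {
  args: string[];
  legacyPath: string | undefined;
}

export function stripBenchmarkJsonFlag(argv: readonly string[]): MigratedArgs {
  const args: string[] = [];
  let legacyPath: string | undefined = undefined;
  let expectPath = false; // true right after a bare --benchmark-json
  for (const arg of argv) {
    if (expectPath) {
      legacyPath = arg;
      expectPath = false;
    } else if (arg === "--benchmark-json") {
      expectPath = true;
    } else if (arg.startsWith("--benchmark-json=")) {
      // Assumed spelling; not shown in the PR.
      legacyPath = arg.slice("--benchmark-json=".length);
    } else {
      args.push(arg);
    }
  }
  return { args, legacyPath };
}
```

After running the CLI with the returned `args`, a wrapper could copy <output>/benchmark.json to `legacyPath` when one was given, which keeps older callers working without the CLI writing a second file.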

Verification

  • bun install (to restore local dependencies in this worktree)
  • bun run build
  • bun run typecheck
  • bun run lint
  • bun test apps/cli/test/eval.integration.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/eval/aggregate.test.ts
  • bun run test
  • git diff --check
  • Manual local review: no findings. Dedicated review subagents were not used because the available delegation tool is restricted to explicit user requests.

Post-Deploy Monitoring & Validation

No additional production monitoring required: this is a local CLI surface cleanup. Validate release behavior by checking issue/Discord/GitHub reports for --benchmark-json, Unknown arguments, or missing benchmark.json; healthy signal is users continuing to find benchmark.json in --output run directories. If active consumers surface, mitigation is to document a wrapper conversion path that reads <output>/benchmark.json.



@cloudflare-workers-and-pages


Deploying agentv with Cloudflare Pages

Latest commit: fd15a5f
Status: ✅  Deploy successful!
Preview URL: https://7718e6ef.agentv.pages.dev
Branch Preview URL: https://fix-av-33j-remove-benchmark.agentv.pages.dev

@christso merged commit da45738 into main on Jun 10, 2026
8 checks passed
@christso deleted the fix/av-33j-remove-benchmark-json branch on June 10, 2026 11:56