fix(cli): remove eval benchmark json flag #1348
Merged
Deploying agentv with
| Latest commit: | fd15a5f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://7718e6ef.agentv.pages.dev |
| Branch Preview URL: | https://fix-av-33j-remove-benchmark.agentv.pages.dev |
Summary
`agentv eval` no longer accepts the deprecated `--benchmark-json <path>` escape hatch. Users now have one run-artifact source of truth: `--output <dir>`, with the canonical run summary at `<dir>/benchmark.json`. Compatibility reshaping should live in a wrapper that reads that run folder instead of asking the CLI to write a second benchmark file.

Consumer audit
- `rg -- '--benchmark-json|benchmarkJson|benchmark-writer|writeBenchmarkJson|buildBenchmarkJson'` found only the eval CLI flag/writer, its dedicated tests, and prior plan notes. No examples or active docs instructed users to pass `--benchmark-json`.
- Searched `/home/entity/workspaces` and `/home/entity/.agent-orchestrator/projects` for `--benchmark-json`, `benchmarkJson`, and `benchmark_json`; hits were only Beads metadata, AgentV worktrees/code-review snapshots, and this task/session context.
- `gh search code '"--benchmark-json"' --repo EntityProcess/agentv`, `gh search code '"--benchmark-json" "agentv"'`, and `gh search code '"benchmark.json" "AgentV" "agent skills"'` returned no active code-search results.
- `<output>/benchmark.json` and call out wrapper conversion for tools that need a narrower shape.

Red/green CLI evidence
- `agentv eval run --help` listed `--benchmark-json <str>` as deprecated.
- `agentv eval run --help` has no `--benchmark-json`; `--output` still documents that it writes `benchmark.json`.
- `bun apps/cli/src/cli.ts eval ... --output /tmp/av-33j-red-run --benchmark-json /tmp/av-33j-red-benchmark.json` warned and wrote a second file: `Benchmark written to: /tmp/av-33j-red-benchmark.json`; run directory also had `/tmp/av-33j-red-run/benchmark.json`.
- `Unknown arguments` and does not create the extra file.
- `bun apps/cli/src/cli.ts eval ... --output /tmp/av-33j-green-run-ok` still writes `/tmp/av-33j-green-run-ok/benchmark.json`.

Evidence files were captured locally under `/tmp/av-33j-evidence/` during UAT.

Verification
- `bun install` (to restore local dependencies in this worktree)
- `bun run build`
- `bun run typecheck`
- `bun run lint`
- `bun test apps/cli/test/eval.integration.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/eval/aggregate.test.ts`
- `bun run test`
- `git diff --check`

Post-Deploy Monitoring & Validation
No additional production monitoring required: this is a local CLI surface cleanup. Validate release behavior by checking issue/Discord/GitHub reports for `--benchmark-json`, `Unknown arguments`, or missing `benchmark.json`; the healthy signal is users continuing to find `benchmark.json` in `--output` run directories. If active consumers surface, the mitigation is to document a wrapper conversion path that reads `<output>/benchmark.json`.
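
For anyone who needs the wrapper conversion path, a minimal sketch might look like the following. It is not part of this PR and assumes nothing about the real `benchmark.json` schema: the helper names (`readRunSummary`, `pickKeys`, `convertRun`) and the demo fields (`passed`, `failed`, `notes`) are hypothetical, chosen only to illustrate reading the run folder and writing a narrower file alongside it.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical wrapper: field names used here are illustrative, not AgentV's schema.
type JsonObject = Record<string, unknown>;

/** Read the run summary that `--output <dir>` writes at <dir>/benchmark.json. */
function readRunSummary(runDir: string): JsonObject {
  const raw = readFileSync(join(runDir, "benchmark.json"), "utf8");
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`benchmark.json in ${runDir} is not a JSON object`);
  }
  return parsed as JsonObject;
}

/** Keep only the top-level keys a downstream tool asks for (the "narrower shape"). */
function pickKeys(summary: JsonObject, keys: readonly string[]): JsonObject {
  const narrowed: JsonObject = {};
  for (const key of keys) {
    if (key in summary) narrowed[key] = summary[key];
  }
  return narrowed;
}

/** Write a separate, narrower file; the original benchmark.json is left untouched. */
function convertRun(runDir: string, outFile: string, keys: readonly string[]): JsonObject {
  const narrowed = pickKeys(readRunSummary(runDir), keys);
  writeFileSync(outFile, `${JSON.stringify(narrowed, null, 2)}\n`);
  return narrowed;
}

// Usage demo with a made-up summary; a real caller would pass the `--output` directory.
const demoDir = mkdtempSync(join(tmpdir(), "agentv-run-"));
writeFileSync(
  join(demoDir, "benchmark.json"),
  JSON.stringify({ passed: 3, failed: 1, notes: "x" }),
);
const narrowed = convertRun(demoDir, join(demoDir, "narrow.json"), ["passed", "failed"]);
console.log(narrowed);
```

Keeping the conversion outside the CLI means the run folder stays the single source of truth, and each consumer can own whatever shape it needs.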