Problem
The benchmark script (scripts/ci/benchmark-performance.ts) hardcodes 5 iterations per metric (line 18). With only 5 samples, P95, P99, and max all resolve to the same sample, so the percentile calculations carry no statistical meaning:
// P95 with n=5: Math.floor(5 * 0.95) = 4, which is just the max value
p95: sorted[Math.min(Math.floor(n * 0.95), n - 1)]
Additionally, the iterations input exists in the workflow dispatch but the script ignores it — it's always 5.
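The collapse is pure index arithmetic. A minimal sketch, where percentileIndex is a hypothetical helper mirroring the script's expression:

```typescript
// Percentile index as used by the benchmark script:
// clamp floor(n * q) into the valid range [0, n - 1].
function percentileIndex(n: number, q: number): number {
  return Math.min(Math.floor(n * q), n - 1);
}

// With n = 5, both P95 and P99 land on index 4 — the max sample.
console.log(percentileIndex(5, 0.95)); // 4
console.log(percentileIndex(5, 0.99)); // 4

// With n = 30, P95 lands on index 28, the 2nd-worst run.
console.log(percentileIndex(30, 0.95)); // 28
```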
Changes
- Bump default iterations from 5 to 30 in scripts/ci/benchmark-performance.ts line 18
- Wire up the workflow dispatch input so iterations is passed to the script (e.g., via env var or CLI arg)
- Increase workflow timeout from 30 to 60 minutes to accommodate more iterations (30 iterations × ~30s per warm run ≈ 15 minutes for warm alone)
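One way the script could read the count, sketched under the assumption of a BENCHMARK_ITERATIONS env var (the name is an assumption, not the script's current API):

```typescript
// Hypothetical: resolve iteration count from the environment, defaulting to 30.
const DEFAULT_ITERATIONS = 30;

function resolveIterations(env: Record<string, string | undefined>): number {
  const raw = env["BENCHMARK_ITERATIONS"];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  // Guard against non-numeric or non-positive workflow input values.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_ITERATIONS;
}

// Usage: const iterations = resolveIterations(process.env);
```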
With 30 iterations, P95 = index 28 (the 2nd-worst run), giving meaningful tail-latency data.
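On the workflow side, the dispatch input could be forwarded roughly like this sketch of .github/workflows/performance-monitor.yml; job, step, and env-var names here are assumptions — only the iterations input itself comes from the existing workflow:

```yaml
on:
  workflow_dispatch:
    inputs:
      iterations:
        description: "Benchmark iterations per metric"
        default: "30"
        required: false

jobs:
  benchmark:                 # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 60      # was 30; headroom for 30 iterations
    steps:
      - run: npx tsx scripts/ci/benchmark-performance.ts   # invocation is an assumption
        env:
          # Hypothetical env-var name; the script would read it and default to 30.
          BENCHMARK_ITERATIONS: ${{ inputs.iterations }}
```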
Files to Modify
- scripts/ci/benchmark-performance.ts — read iteration count from env var, default to 30
- .github/workflows/performance-monitor.yml — pass input to script, increase timeout

Related Issues
Epic: [Long-term] Add performance benchmarking suite #240
Historical storage: [plan] Add historical benchmark storage and relative regression detection #1760
Daily schedule: [aw] Secret Digger (Copilot) failed #1853
Startup analysis: Container startup benchmarks fail: 15+ seconds of idle wait in warm startup path #1850