fix: increase DEFAULT_CHUNK_TIMEOUT from 2min to 5min #844
altimate-harness-bot[bot] wants to merge 1 commit into
The 120s SSE chunk timeout was too aggressive for slow LLM providers, causing spurious "SSE read timed out" aborts during long-running multi-tool sessions. Increase to 300s to reduce false positives.
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
  // altimate_change end

- const DEFAULT_CHUNK_TIMEOUT = 120_000
+ const DEFAULT_CHUNK_TIMEOUT = 300_000
Please set change markers on any lines we change from upstream.
dev-punia-altimate left a comment
Multi-Persona Review — Verdict: block
The PR increases the SSE chunk timeout from 2 to 5 minutes, which improves user experience during slow LLM responses but introduces critical risks: it may mask genuine service outages (product-gap, high severity) and increase exposure to DoS via connection exhaustion (security, medium severity). Multiple personas and code review independently flag this as a dangerous trade-off without mitigations like circuit-breakers or configurability. The product manager explicitly requests changes due to the user experience gap, and security raises a valid availability concern. Without mitigation, this change is unsafe to ship.
14/14 agents completed · 115s · 2 findings (0 critical, 1 high, 1 medium)
High
- [product-manager, code-reviewer, tech-lead, cto, devops] Increasing the global SSE chunk timeout to 5 minutes may mask genuine service hangs, delaying failure detection and degrading user experience during outages, without a circuit-breaker or provider-specific timeout to distinguish slow responses from actual failures. → packages/opencode/src/provider/provider.ts:54
  💡 Implement a circuit-breaker mechanism or provider-configurable timeout to differentiate between slow responses and genuine failures.
Medium
- [security] Increasing SSE chunk timeout from 2min to 5min may increase exposure to resource exhaustion attacks by allowing malicious clients to hold connections open longer, potentially leading to connection pool exhaustion or DoS under high load. → packages/opencode/src/provider/provider.ts:54
  💡 Implement connection limits per client/IP, add rate limiting on SSE stream initiation, or introduce a maximum concurrent stream limit.
Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·
@@ -54,7 +54,7 @@ import { VALID_ACCOUNT_RE } from "../altimate/plugin/snowflake"
 import { isValidDatabricksHost } from "../altimate/plugin/databricks"
[HIGH · product-manager, code-reviewer, tech-lead, cto, devops] Increasing the global SSE chunk timeout to 5 minutes may mask genuine service hangs, delaying failure detection and degrading user experience during outages, without a circuit-breaker or provider-specific timeout to distinguish slow responses from actual failures.
💡 Suggestion: Implement a circuit-breaker mechanism or provider-configurable timeout to differentiate between slow responses and genuine failures.
Confidence: 95/100
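One way to read the "provider-configurable timeout" suggestion is sketched below. This is only an illustration under assumptions: the `ProviderOptions` shape, its `chunkTimeout` field, and the `resolveChunkTimeout` helper are hypothetical names and are not taken from provider.ts. The idea is that a provider may carry its own timeout, and the global default applies only when it does not.

```typescript
// Hypothetical option shape: the PR itself only changes the global constant.
interface ProviderOptions {
  chunkTimeout?: number // per-provider override, in milliseconds (assumed field)
}

// Global fallback, using the value proposed in this PR.
const DEFAULT_CHUNK_TIMEOUT = 300_000

// Pick the provider's own timeout when it is a positive finite number,
// otherwise fall back to the global default.
function resolveChunkTimeout(options: ProviderOptions | undefined): number {
  const value = options?.chunkTimeout
  if (typeof value === "number" && Number.isFinite(value) && value > 0) {
    return value
  }
  return DEFAULT_CHUNK_TIMEOUT
}
```

With something like this, a fast provider could keep a short window such as 120_000 while slow reasoning models get the longer one, so a hang on a fast provider would still be detected early.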
@@ -54,7 +54,7 @@ import { VALID_ACCOUNT_RE } from "../altimate/plugin/snowflake"
 import { isValidDatabricksHost } from "../altimate/plugin/databricks"
[MEDIUM · security] Increasing SSE chunk timeout from 2min to 5min may increase exposure to resource exhaustion attacks by allowing malicious clients to hold connections open longer, potentially leading to connection pool exhaustion or DoS under high load.
💡 Suggestion: Implement connection limits per client/IP, add rate limiting on SSE stream initiation, or introduce a maximum concurrent stream limit.
Confidence: 85/100
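For reference, a "maximum concurrent stream limit" could look roughly like the sketch below. `StreamLimiter` is a hypothetical helper written for this comment; nothing in the PR or in provider.ts is known to implement it, and where such a limit would actually be enforced is left open.

```typescript
// Hypothetical helper: caps how many streams may be open at the same time.
class StreamLimiter {
  private active = 0
  private readonly maxConcurrent: number

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent
  }

  // Returns a release function, or undefined when the limit is already reached.
  tryAcquire(): (() => void) | undefined {
    if (this.active >= this.maxConcurrent) return undefined
    this.active++
    let released = false
    return () => {
      if (released) return // ignore a second release of the same slot
      released = true
      this.active--
    }
  }

  // Number of slots currently held.
  get activeCount(): number {
    return this.active
  }
}
```

A caller would acquire a slot before opening a stream and call the returned function when the stream closes, errors, or times out, so a longer chunk timeout cannot grow the number of held connections past the cap.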
Problem
The wrapSSE() function in provider.ts enforces a per-chunk timeout: if no SSE chunk arrives within the window, it aborts the stream with Error("SSE read timed out"). The previous 120s window was too tight for slow LLM providers (e.g. large reasoning models, cold starts) during multi-turn tool-call sessions.
When this fires, the error propagates through AbortSignal.any() in the custom fetch wrapper and surfaces as UnknownError: SSE read timed out on the assistant message; the chat freezes with no retry path.
Change
packages/opencode/src/provider/provider.ts
5 minutes gives adequate headroom for slow providers without masking genuine hangs.
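To make the per-chunk timeout described under Problem concrete, here is a minimal sketch of the technique. It is not the actual wrapSSE() implementation, which is not shown in this PR; `withChunkTimeout` is a hypothetical name, and the sketch assumes the response body is a standard `ReadableStream<Uint8Array>`. Each read races against a timer, and the stream errors with the same message if the timer wins.

```typescript
// Hypothetical sketch: NOT the actual wrapSSE() from provider.ts.
const DEFAULT_CHUNK_TIMEOUT = 300_000 // 5 minutes, the value proposed in this PR

// Wrap a byte stream so that every chunk must arrive within `timeoutMs`.
function withChunkTimeout(
  source: ReadableStream<Uint8Array>,
  timeoutMs: number = DEFAULT_CHUNK_TIMEOUT,
): ReadableStream<Uint8Array> {
  const reader = source.getReader()
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      let timer: ReturnType<typeof setTimeout> | undefined
      // Rejects if the window elapses before the next chunk arrives.
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("SSE read timed out")), timeoutMs)
      })
      try {
        const result = await Promise.race([reader.read(), timeout])
        if (result.done) controller.close()
        else controller.enqueue(result.value)
      } catch (err) {
        // Release the upstream source, then surface the error to the consumer.
        reader.cancel(err).catch(() => {})
        controller.error(err)
      } finally {
        // The timer is per chunk, so it is always cleared before the next read.
        clearTimeout(timer)
      }
    },
    cancel(reason) {
      return reader.cancel(reason)
    },
  })
}
```

Because the timer restarts on every chunk, the window bounds the gap between chunks and not the total stream duration: a long session that keeps producing output is never cut off, while a provider that goes silent is cut off after `timeoutMs`.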
Companion PR
AltimateAI/vscode-altimate-mcp-server#343 adds a retry button for MessageAbortedError in the chat UI, covering the session-restart abort path.
Requested by @saravmajestic via harness
Summary by cubic
Increase SSE per-chunk timeout from 2 minutes to 5 minutes by raising DEFAULT_CHUNK_TIMEOUT to 300_000, giving slow LLM streams more headroom during long tool-call sessions. This reduces false “SSE read timed out” aborts and prevents chat freezes.
Written for commit 5f6e687. Summary will update on new commits.