fix(tracing+ollama): split streaming span test into fast mock + slow e2e; add 300s default timeout by planetf1 · Pull Request #1302 · generative-computing/mellea

planetf1 · 2026-06-19T09:10:02Z

The test was hanging because two things combined badly: `OllamaModelBackend` had no httpx timeout at all, so a stalled stream would park the event loop forever; and the test wasn't marked `slow`, so it ran on every push on a CPU-only runner where after ~30 min the model gets evicted and takes minutes to reload — longer than any safe scalar timeout.

The main fix is moving the test out of standard CI. A new `integration`-marked test does the same structural check against a mocked backend (no Ollama needed, ~200 ms). The original real-model test stays but picks up `@pytest.mark.slow`, so `pyproject.toml`'s `addopts = "-m not slow"` keeps it off standard CI pushes. The 300 s default timeout is a secondary safety net for when the slow test does run.

Test	Markers	What it verifies
`test_streaming_span_creates_and_closes_span` (new)	`integration`	TracingPlugin opens a `chat` span, keeps it open for the full stream, and closes it when done — mocked `AsyncClient.chat` returning a fake async-generator with artificial per-chunk delays
`test_streaming_span_duration` (existing)	`e2e + ollama + slow`	Same invariant against a real Ollama model — belongs on the nightly GPU runner

Why not `httpx.Timeout(connect=30, read=600, ...)`? A structured timeout would let us tune connection setup and read time independently. The trade-off is an explicit `import httpx` in `ollama.py` and a wider type annotation. The scalar is fine for now; a follow-up can add the structured form if we need finer control.

Breaking change: default timeout

`OllamaModelBackend` now defaults to a 300 s timeout (previously unlimited). Users running large models on CPU, very long contexts, or multi-minute generations may hit `httpx.ReadTimeout` where they previously saw the call complete eventually. Pass `timeout=None` to restore the previous unlimited behaviour:

```python
backend = OllamaModelBackend("granite3.1-dense:8b", timeout=None)
```

Tested

Mocked integration test — no Ollama needed, runs in standard CI:

```
uv run pytest test/telemetry/test_streaming_tracing_integration.py -v
```

Passed locally in ~4.75 s. CI: ✅ passes on 3.11, 3.12, 3.13.

Slow e2e test — real Ollama with `granite3.1-dense:8b`, excluded from standard CI via `-m not slow`:

```
uv run pytest test/telemetry/test_tracing_backend.py::test_streaming_span_duration -v -m slow
```

Passed locally in ~6 s against a local Ollama instance. This test is excluded from standard CI pushes and will run in overnight/nightly runs where a GPU-backed Ollama server is available.

Closes #1272

test_streaming_span_duration was hanging for the full 900 s pytest-timeout budget on CPU-only CI runners (generative-computing#1272). The root cause: OllamaModelBackend defaulted timeout=None, so the httpx client had no read deadline. When Ollama stalled mid-stream on a loaded runner, the event loop parked on selector.poll(-1) with nothing to wake it — no timers, no ready callbacks — until SIGALRM arrived 15 minutes later. Primary fix: default timeout changed from None to 300 s. A stalled stream now raises httpx.ReadTimeout promptly (the existing error-forwarding path in send_to_queue catches it and puts it on the async queue, where astream() picks it up and raises). Callers can still pass timeout=None to opt out. 300 s is the right balance: generous enough for slow-but-healthy CPU-only runners (non-streaming generation and cold model loads can take 60–120 s), while still bounding genuine stalls to 1/3 of the original 900 s hang budget. 60 s was too tight — test_span_not_closed_prematurely ("Count to 5") and test_multiple_generations_separate_spans were hitting ReadTimeout on Python 3.12/3.13 CI runners after model reload. Closes generative-computing#1272 Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code

Locks the documented escape hatch: OllamaModelBackend(timeout=None) must omit the key from client_kwargs so callers who want unbounded waits keep the upstream SDK default (no timeout). Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code

planetf1 · 2026-06-19T11:19:35Z

Closing while we work out the right approach — see the updated diagnosis in #1272.

…; mark real e2e as slow test_streaming_span_duration was marked e2e+ollama but NOT slow, so it ran on every standard CI push against a CPU-only runner. After ~30 min of test suite the granite4.1:3b model gets evicted by OLLAMA_KEEP_ALIVE=5m; reloading it on a saturated runner routinely exceeds any reasonable scalar timeout, making the test either hang (timeout=None) or flake (timeout=300s). The root fix is to split the coverage in two: 1. test_streaming_span_creates_and_closes_span (new, integration, no Ollama marker) — patches ollama.AsyncClient.chat to return a fake async-generator with artificial per-chunk delays, then asserts that the TracingPlugin opens a "chat" span, keeps it open for the full stream, and closes it only once all chunks are consumed. Runs on every CI push, takes ~200 ms. 2. test_streaming_span_duration (existing) — kept to exercise a real Ollama model end-to-end. Adding @pytest.mark.slow excludes it from the default pyproject.toml addopts ("-m not slow") so it no longer runs on standard CPU-only CI pushes. It belongs on the nightly GPU runner. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

With the new 300 s default, a stalled model pull now triggers httpx.ReadTimeout instead of waiting forever. _pull_ollama_model was catching only ollama.ResponseError, so the httpx exception propagated uncaught from __init__, bypassing the friendly error path. Broaden the except to also handle httpx.TimeoutException and httpx.ConnectError so the caller still gets a clean False return and the "could not be pulled" message fires as expected. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

akihikokuroda

LGTM

planetf1 · 2026-06-19T15:15:37Z

thanks @akihikokuroda . I'll wait for @ajbozarth to give it a once-over given interest in otel -- Alex feel free to merge if you are happy with it

ajbozarth

I'll be cleaning up streaming spans in #1051 as soon as #1289 is merged, but I have one ask for a file rename in addition to Claudes findings below.

ajbozarth

Strategy makes sense — the mocked integration test gives us the structural check in CI without the CPU-only flakiness, and gating the real test behind slow is the right call. A few things to consider inline.

…tion test - Widen _check_ollama_server except clause to catch httpx.TimeoutException and httpx.ConnectError in addition to ConnectionError; with the new 300s default a stalled ps() call raised httpx.TimeoutException, bypassing the friendly "ollama server not running" message - Tighten streaming span-duration assertion from >= 0.05 s to >= 0.1 s; the fake stream is 3 chunks x 50 ms ~= 150 ms, so the looser threshold passed even if the span closed after the first chunk Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

planetf1 · 2026-06-19T16:30:21Z

Thanks @ajbozarth — addressed in the latest commit:

Widened _check_ollama_server to also catch httpx.TimeoutException and httpx.ConnectError (matching the pattern already used in _pull_ollama_model)
Tightened the span-duration assertion to >= 0.1 s with a comment tying the threshold to the 3 × 50 ms fake stream
Updated the PR body to call out the default timeout change and the timeout=None escape hatch for the auto-generated release notes

Happy to leave the file rename to your cleanup in #1051.

github-actions Bot added the bug Something isn't working label Jun 19, 2026

planetf1 force-pushed the worktree-issue-1272 branch 2 times, most recently from 7c17c35 to 91fe247 Compare June 19, 2026 09:31

planetf1 force-pushed the worktree-issue-1272 branch from 91fe247 to 94002c9 Compare June 19, 2026 10:02

planetf1 changed the title ~~fix(ollama): add 120 s default read timeout to prevent mid-stream hangs~~ fix(ollama): add 300 s default HTTP timeout to prevent mid-stream hangs Jun 19, 2026

planetf1 mentioned this pull request Jun 19, 2026

test: test_streaming_span_duration hangs >900s on slow CPU-only Ollama runner #1272

Closed

planetf1 closed this Jun 19, 2026

planetf1 reopened this Jun 19, 2026

planetf1 changed the title ~~fix(ollama): add 300 s default HTTP timeout to prevent mid-stream hangs~~ fix(tracing+ollama): split streaming span test into fast mock + slow e2e; add 300s default timeout Jun 19, 2026

planetf1 marked this pull request as ready for review June 19, 2026 12:59

planetf1 requested a review from a team as a code owner June 19, 2026 12:59

planetf1 requested review from AngeloDanducci, ajbozarth and akihikokuroda June 19, 2026 12:59

planetf1 enabled auto-merge June 19, 2026 12:59

akihikokuroda approved these changes Jun 19, 2026

View reviewed changes

planetf1 added this pull request to the merge queue Jun 19, 2026

planetf1 removed this pull request from the merge queue due to a manual request Jun 19, 2026

ajbozarth reviewed Jun 19, 2026

View reviewed changes

Comment thread test/telemetry/test_streaming_tracing_integration.py

ajbozarth reviewed Jun 19, 2026

View reviewed changes

Comment thread mellea/backends/ollama.py

Comment thread mellea/backends/ollama.py

Comment thread test/telemetry/test_streaming_tracing_integration.py Outdated

This was referenced Jun 19, 2026

feat: streaming span events #1051

Open

feat(telemetry): migrate stream_with_chunking orchestration span to plugin hooks #1290

Open

ajbozarth added this pull request to the merge queue Jun 19, 2026

Merged via the queue into generative-computing:main with commit 31a3479 Jun 19, 2026
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tracing+ollama): split streaming span test into fast mock + slow e2e; add 300s default timeout#1302

fix(tracing+ollama): split streaming span test into fast mock + slow e2e; add 300s default timeout#1302
ajbozarth merged 5 commits into
generative-computing:mainfrom
planetf1:worktree-issue-1272

planetf1 commented Jun 19, 2026 •

edited

Loading

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

akihikokuroda left a comment

Uh oh!

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

ajbozarth left a comment

Uh oh!

Uh oh!

ajbozarth left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

planetf1 commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Breaking change: default timeout

Tested

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

akihikokuroda left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

ajbozarth left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ajbozarth left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

planetf1 commented Jun 19, 2026 •

edited

Loading