Rewrite `kaggle kernels logs --follow` to use SSE log stream by kaggle-agent · Pull Request #999 · Kaggle/kaggle-cli

kaggle-agent · 2026-05-05T16:17:24Z

Today --follow polls the persisted-log endpoint on an interval, so
users wait seconds for output and miss anything written between polls.
This branch points --follow at the midtier's SSE log stream
(/api/v1/kernels/logs/stream/{owner}/{slug}, the slug-mode binding
added in Kaggle/kaggleazure#43349) so log lines reach the terminal as
the kernel emits them. When the session has already finished the
midtier returns the persisted log as a JSON array of the same event
objects, so the client branches on Content-Type, parses the array,
and renders it identically to a live stream. To survive load-balancer
idle timeouts (~3 minutes) mid-session, the CLI wraps the stream in a
reconnect loop that deduplicates the replayed prefix; a single drop
recovers silently, and only repeated failures with no new data surface
a warning or make the CLI give up. The one-shot kaggle kernels logs
path is unchanged.

Task: herbison-20260505160932-d7e63a14
Context: https://chat.kaggle.net/kaggle/pl/9n7sbkzftbfkuy7cy6n4s937eo

The previous `--follow` implementation polled the persisted log blob, which is only written at session end — so it never showed live output for in-progress notebooks. Switch to the midtier SSE proxy (mirroring how the editor/viewer FE consume logs), parsing `data:` events and stopping on the `STREAM_END` sentinel. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

Per review feedback, the SSE stream terminates with `END_OF_LOG`, not `STREAM_END`. Updated the sentinel constant, docstring, and test fixtures to match the actual server behavior. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
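
A minimal sketch of that parsing, not the code from this PR. It assumes each `data:` payload is a JSON object and that the sentinel arrives as a bare `data: END_OF_LOG` line; `iter_sse_events` here is a hypothetical stand-in that takes already-decoded text lines, not the private helper's real signature.

```python
import json
from typing import Iterable, Iterator

END_OF_LOG = "END_OF_LOG"  # sentinel name taken from the review feedback above


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded payloads from SSE `data:` lines until the sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, `:` comments, and other SSE fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE strips one leading space after the colon
        if payload == END_OF_LOG:
            return  # assumption: the sentinel is sent as a plain data payload
        yield json.loads(payload)
```

Anything after the sentinel is ignored, so a caller can treat normal exhaustion of the iterator as "the session's log is complete".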

…mplete: `kernels_logs_stream` now branches on `Content-Type`, using `_iter_sse_events` for live SSE and `_iter_blob_lines` for the persisted GCS blob returned after the session finishes. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
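
The `Content-Type` branch could look roughly like the check below. This is a sketch under the assumption that live streams are served as `text/event-stream` and that anything else is treated as the persisted blob; `is_sse_response` is a hypothetical helper name, not one from the PR.

```python
from typing import Optional


def is_sse_response(content_type: Optional[str]) -> bool:
    """Return True when the header value names the SSE media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

A caller would route to the SSE parser when this returns True and to the blob parser otherwise, which also covers blob responses labelled `application/octet-stream`.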

The midtier `additional_bindings` for `GetKernelSessionLogsStream` registers the slug-mode route as `/api/v1/kernels/logs/stream/{user_name}/{kernel_slug}` (PR #43349), with the slug pair after `logs/stream` to match the other kernels output endpoints. The client was calling `/v1/kernels/{owner}/{slug}/logs/stream`, which would 404 in production. Update the URL, docstring, and test assertion to the correct path. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
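
For illustration, the corrected path can be assembled as below. `log_stream_path` is a hypothetical helper, and percent-encoding the two segments is an assumption of this sketch rather than something the PR states.

```python
from urllib.parse import quote


def log_stream_path(owner: str, slug: str) -> str:
    """Build the slug-mode log stream path, with the slug pair after logs/stream."""
    # safe="" also encodes "/" so neither segment can add extra path components.
    return f"/api/v1/kernels/logs/stream/{quote(owner, safe='')}/{quote(slug, safe='')}"
```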

djherbis

Can you fix the merge conflicts?

Take main's `parse_kernel_string` + None-check in `kernels_logs` to match the convention used by the other kernel functions, and merge main's new `kernels_output` paging tests alongside this branch's SSE log-stream tests (keeping both `tempfile` and `json` imports). Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent · 2026-06-16T18:01:07Z

Re: @djherbis's review:

Conflict resolutions are staged in the working tree from the previous turn:

src/kaggle/api/kaggle_api_extended.py (kernels_logs): adopted main's version using parse_kernel_string + the explicit None check, matching the convention used by the other kernel functions on main.
tests/test_kernels_logs.py: kept both import json (needed by the SSE tests on this branch) and import tempfile (needed by main's new output tests), and inserted main's three kernels_output paging tests ahead of the kernels_logs section divider.

Ready for the orchestrator to commit these resolutions and complete the rebase.

djherbis

You fixed some merge conflicts, but I still see a few for test_kernels_logs.py

erdalsivri · 2026-06-16T18:13:08Z

Re: @djherbis's review:

Conflict resolutions are staged in the working tree from the previous turn:

src/kaggle/api/kaggle_api_extended.py (kernels_logs): adopted main's version using parse_kernel_string + the explicit None check, matching the convention used by the other kernel functions on main.

tests/test_kernels_logs.py: kept both import json (needed by the SSE tests on this branch) and import tempfile (needed by main's new output tests), and inserted main's three kernels_output paging tests ahead of the kernels_logs section divider.

Ready for the orchestrator to commit these resolutions and complete the rebase.

[no-agent] Sorry about this. I had pushed some changes to allow the agent to handle merge conflicts but had to revert them due to a rebase bug. For now, you can use the VSCode button to connect to the repo (checked out under /git/kaggleazure even though it is kaggle-cli) to rebase manually. You can also run claude --resume to manually invoke the agent to help with the merge.

Main reordered the imports and dropped the section-divider comment in a follow-up commit, leaving this branch with two fresh merge conflicts on rebase. Restructure the imports to match main's layout (with `import json` inserted in the new order) and drop the divider so the rebase auto-merges cleanly. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent · 2026-06-16T18:17:47Z

Re: @djherbis's review:

You're right — main moved forward (commit 9aa7aa6) and reintroduced two conflicts in tests/test_kernels_logs.py:

Import block — main reorganized to put import unittest / from unittest.mock at the top, then io / tempfile / sys. My branch had alphabetized everything and added import json. Restructured the working tree to match main's import order with import json inserted in the new layout, so the rebase will auto-merge this hunk.
Section divider comment — I had added # kernels_logs (one-shot, persisted blob) above the existing tests; main removed/never had it. Dropped the divider so the rebase has nothing to conflict on.

Working tree is now staged for a clean rebase. Note: main also rewrote kernels_logs_cli to a polling implementation (#966) while this branch uses the SSE rewrite, so once the textual rebase lands, the polling-flavored tests main added (test_kernels_logs_cli_follow_mode, _stops_on_error, _stops_on_cancel) will fail against the SSE handler — flagging that as a separate follow-up since you only asked about merge markers here.

djherbis

Add automatic reconnection with deduplication to kernels_logs_stream and
kernels_logs_cli in src/kaggle/api/kaggle_api_extended.py so that
--follow survives connection drops (e.g. load balancer 2-minute timeout).

The current kernels_logs_cli follow path is a single pass:

for event in self.kernels_logs_stream(kernel):
    data = event.get("data")
    if data is None:
        continue
    print(data, flush=True, end="" if data.endswith("\n") else "\n")

Replace it with a reconnect loop:

1. Wrap the kernels_logs_stream() iteration in a retry loop
   (e.g. while True).
2. Track how many log lines have been printed so far (printed_count).
3. On each connection, iterate events from kernels_logs_stream().
   Skip the first printed_count events (dedup: the server replays
   from the beginning on reconnect). Print the rest normally and
   increment printed_count.
4. Catch requests.exceptions.ChunkedEncodingError,
   requests.exceptions.ConnectionError, and
   urllib3.exceptions.ProtocolError; these are what surface as
   "Response ended prematurely". On catch, log a short warning
   (e.g. "Connection lost, reconnecting...") to stderr via
   print(..., file=sys.stderr), sleep 1-2 seconds, and continue the
   outer loop.
5. Exit the loop normally when kernels_logs_stream() yields all
   events without error (meaning the END_OF_LOG sentinel was received),
   or when the reconnect gets back a non-SSE content-type (GCS blob
   fallback for a finished session); in that case print only unseen
   lines from the blob and exit.
6. Add a max retry limit (e.g. 5 consecutive failures with no new
   data) to avoid infinite loops if the session is truly gone.

Update the existing tests in tests/test_kernels_logs.py to cover:

Normal stream completes without reconnect (already exists, keep it).
Stream interrupted mid-way, reconnects, deduplicates, prints remaining lines.
Max retries exceeded raises or exits gracefully.
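
The requested loop can be sketched as follows. This is an illustration under stated assumptions, not the branch's implementation: `follow_logs`, its parameters, and the give-up message are hypothetical; `open_stream` stands in for a call that opens a fresh connection each time; and the built-in `ConnectionError` stands in for the `requests`/`urllib3` exceptions named above, which a real caller would pass via `retry_on`.

```python
import sys
import time
from typing import Callable, Iterable, Tuple, Type

MAX_FAILURES = 5           # consecutive failures with no new events before giving up
RECONNECT_DELAY_SEC = 1.0  # pause between attempts


def follow_logs(
    open_stream: Callable[[], Iterable[dict]],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    out=None,
    err=None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Print events from open_stream(), reconnecting on drops.

    Returns True on a clean end of stream and False after giving up.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    seen_count = 0  # events consumed so far, across all connections
    failures_without_progress = 0
    while True:
        try:
            for index, event in enumerate(open_stream()):
                if index < seen_count:
                    continue  # replayed prefix, already handled earlier
                seen_count = index + 1
                failures_without_progress = 0  # new data resets the counter
                data = event.get("data")
                if data is None:
                    continue
                print(data, end="" if data.endswith("\n") else "\n", file=out, flush=True)
            return True  # iterator exhausted without error
        except retry_on:
            failures_without_progress += 1
            if failures_without_progress >= MAX_FAILURES:
                print("Log stream unavailable, giving up.", file=err)
                return False
            print("Connection lost, reconnecting...", file=err)
            sleep(RECONNECT_DELAY_SEC)
```

Counting every consumed event, including those without `data`, keeps the skip index aligned with what the server replays; counting only printed lines would drift if any event carried no `data` field.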

Load balancers cut idle SSE connections after ~2 minutes, so a long-running `kaggle kernels logs --follow` would die mid-session. The CLI now wraps `kernels_logs_stream()` in a retry loop, skipping the events the server replays from the start of each reconnect, and gives up after 5 consecutive failures with no new data. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent · 2026-06-16T21:47:33Z

Re: @djherbis's review:

This is already implemented on the branch. kernels_logs_cli (src/kaggle/api/kaggle_api_extended.py:5351) wraps kernels_logs_stream() in a while True reconnect loop. It tracks seen_count across reconnects, skips events whose index is below seen_count (the midtier replays from the start on reconnect), and catches requests.exceptions.ChunkedEncodingError, requests.exceptions.ConnectionError, and urllib3.exceptions.ProtocolError. On catch it prints "Log stream connection lost, reconnecting..." to stderr, sleeps 1s (_LOG_STREAM_RECONNECT_DELAY_SEC), and retries. After _LOG_STREAM_MAX_FAILURES = 5 consecutive failures with no new events the CLI prints a giveup message to stderr and returns. Normal completion (END_OF_LOG sentinel or non-SSE blob fallback) just falls through and returns. kernels_logs_stream itself didn't need structural changes — each call opens a fresh connection, which is what the reconnect path depends on. Two tests in tests/test_kernels_logs.py cover the new behavior: test_kernels_logs_cli_follow_reconnects_and_dedupes (mid-stream drop, reconnect, dedup the replayed events, print only the new third line) and test_kernels_logs_cli_follow_gives_up_after_max_failures (5 consecutive ConnectionErrors exit with the giveup message). All 20 tests in the file pass.

djherbis · 2026-06-16T22:38:50Z

+                        file=sys.stderr,
+                    )
+                    return
+                print("Log stream connection lost, reconnecting...", file=sys.stderr)


The current load balancer resets the connection every 3m, which means this log is going to stick out like a sore thumb in the average case.

Can we only print this message if we have repeated failures, e.g. failures_without_progress > 1?

That way a successful retry is hidden under the hood?

Already addressed on this branch. The reconnect message is gated on failures_without_progress > 1 at src/kaggle/api/kaggle_api_extended.py:5407, so the routine ~3-minute LB reset (single failure, then successful retry) stays silent and only persistent trouble surfaces. Tests in tests/test_kernels_logs.py cover both cases: test_kernels_logs_cli_follow_reconnects_and_dedupes asserts stderr is empty after a single drop, and test_kernels_logs_cli_follow_reports_only_repeat_failures asserts the warning appears exactly once on the second consecutive failure. All 21 tests pass.

The load balancer resets idle SSE connections roughly every 3 minutes, so any long `--follow` session normally hits one drop and recovers silently on retry. Logging a warning each time would clutter stderr in the common case, so the message now only fires on the second consecutive failure with no new data — i.e. when something actually looks wrong. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
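
The gating rule can be written as a small decision function. This is a sketch only: `failure_action` and its return values are hypothetical, and the ordering (give-up check before the warning) follows the diff excerpt quoted in the review comment above.

```python
def failure_action(failures_without_progress: int, max_failures: int = 5) -> str:
    """Decide what to do after a dropped connection.

    The argument is the number of consecutive failures since the last new event.
    """
    if failures_without_progress >= max_failures:
        return "give_up"
    if failures_without_progress > 1:
        return "warn"  # repeated failure with no new data
    return "silent"    # routine single drop, retried quietly
```

With the default limit, the first drop is silent, the second through fourth consecutive drops warn, and the fifth gives up.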

djherbis

When the logs are for a completed session, the output format looks very different:

[{"stream_name":"stderr","time":10.168431974,"data":"0.00s - Debugger warning: It seems that frozen modules are being used, which may\n"}
,{"stream_name":"stderr","time":10.168515914,"data":"0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off\n"}
,{"stream_name":"stderr","time":10.168521834,"data":"0.00s - to python to disable frozen modules.\n"}
,{"stream_name":"stderr","time":10.168525604,"data":"0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.\n"}
,{"stream_name":"stderr","time":10.894785371,"data":"0.00s - Debugger warning: It seems that frozen modules are being used, which may\n"}
,{"stream_name":"stderr","time":10.894837561,"data":"0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off\n"}
,{"stream_name":"stderr","time":10.894842661,"data":"0.00s - to python to disable frozen modules.\n"}
,{"stream_name":"stderr","time":10.894845571,"data":"0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.\n"}
,{"stream_name":"stdout","time":12.3878314,"data":"hello 0\n"}
,{"stream_name":"stdout","time":13.387848709,"data":"hello 1\n"}
,{"stream_name":"stdout","time":14.388121119000001,"data":"hello 2\n"}
...
]

This is very different from the output for SSE live streams:

0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
hello 0
hello 1
hello 2
...

Can we parse the JSON array output and print it in the same format as the live stream?

The midtier serves the persisted log blob as a JSON array of `{stream_name, time, data}` objects (the same shape as live SSE events), but the client was dumping it as raw text, so completed-session output looked like JSON noise instead of log lines. Parse the array and yield each entry, so the CLI's existing per-event rendering produces identical output whether the session is live or finished. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent · 2026-06-16T22:50:55Z

Re: @djherbis's review:

Already addressed on this branch. _iter_blob_lines at src/kaggle/api/kaggle_api_extended.py:5336 now reads response.text, parses it with json.loads, and yields each {stream_name, time, data} entry as-is. The CLI's existing event.get("data") rendering then produces the same per-line output for completed sessions as for live SSE streams — no more raw JSON dump. Non-JSON blobs fall back to line-by-line so unexpected formats still print something readable. Tests in tests/test_kernels_logs.py cover the new behavior: the _blob_response helper builds a JSON array, test_kernels_logs_stream_falls_back_to_blob_for_completed_session and test_kernels_logs_stream_blob_fallback_with_octet_stream assert on the parsed event shape, and test_kernels_logs_stream_blob_fallback_handles_non_json covers the fallback path. All 22 tests pass.
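
A minimal sketch of the blob handling described in this reply, assuming the helper receives the response body as text. `iter_blob_events` is a hypothetical name, and wrapping each fallback line as `{"data": ...}` is an assumption about how the line-by-line path could feed the same renderer.

```python
import json
from typing import Iterator


def iter_blob_events(text: str) -> Iterator[dict]:
    """Yield log events from a persisted blob, falling back to raw lines."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list):
        for entry in parsed:
            if isinstance(entry, dict):
                yield entry  # passed through as-is: {stream_name, time, data}
        return
    # Not a JSON array: emit each line so unexpected formats stay readable.
    for line in text.splitlines():
        yield {"data": line + "\n"}
```

The array layout quoted earlier in the thread, with each entry on its own line and a leading comma, is still valid JSON, so `json.loads` handles it without preprocessing.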

kaggle-agent assigned djherbis May 5, 2026

djherbis requested changes May 5, 2026

djherbis requested review from bovard and stevemessick May 5, 2026 16:33

bovard approved these changes May 5, 2026

stevemessick approved these changes May 5, 2026

djherbis self-requested a review June 16, 2026 17:54

djherbis requested changes Jun 16, 2026

kaggle-agent changed the title ~~Rewrite kaggle kernels logs --follow to use SSE log stream~~ Stream kernel logs via midtier SSE endpoint with blob fallback Jun 16, 2026

djherbis requested changes Jun 16, 2026

kaggle-agent changed the title ~~Stream kernel logs via midtier SSE endpoint with blob fallback~~ Rewrite kaggle kernels logs --follow to use SSE log stream Jun 16, 2026

Merge branch 'main' into agent/herbison-20260505160932-d7e63a14

ff0df66

djherbis self-requested a review June 16, 2026 21:44

djherbis requested changes Jun 16, 2026

djherbis approved these changes Jun 16, 2026

djherbis merged commit c97b626 into main Jun 17, 2026
14 checks passed

djherbis deleted the agent/herbison-20260505160932-d7e63a14 branch June 17, 2026 13:07

djherbis merged 10 commits into main from agent/herbison-20260505160932-d7e63a14

5 participants
