Support tool callbacks in MCP sampling#2998
Conversation
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Two medium-severity findings in the newly-added sampling-with-tools code. The core stream-aggregation logic, capability handshake, content building, and limits enforcement are all well-structured and correctly tested.
|
Sorry @EronWright, our project requires all the commit to be signed and verified. Could you sign them? Thanks! |
|
I moved it back to draft @EronWright the time you rewrite the history with signed commits. Sorry for the issue |
b1fad0b to
89a93e6
Compare
|
@aheritier sorry for delay, all commits are now signed/verified. |
89a93e6 to
8e150a2
Compare
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Two medium-severity issues found in the new sampling-with-tools path. Both relate to edge cases in the streaming and reconnect logic.
| if choice.FinishReason != "" { | ||
| providerFinishReason = choice.FinishReason | ||
| } | ||
| if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength { |
There was a problem hiding this comment.
[MEDIUM] Early break on FinishReasonStop may drop tool-call chunks from later SSE events
drainSamplingStreamWithTools breaks out of the stream-read loop as soon as it sees FinishReasonStop or FinishReasonLength:
if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength {
break
}Tool-call deltas that appear in the same chunk as the finish-reason are captured correctly (they're processed in the for _, delta := range choice.Delta.ToolCalls loop a few lines earlier). However, if a provider splits its response so that tool-call argument chunks arrive in subsequent SSE frames after the finish-reason frame, those deltas are silently discarded.
The analogous handleStream in streaming.go guards against this with a continue that skips the early-exit when the current chunk contains only tool-call deltas (no finish reason yet). That pattern is missing here.
Impact: Incomplete or missing tool calls are returned to the MCP server as a ToolUseContent block with truncated arguments, causing the server's tool-execution loop to fail or produce wrong results. The failure is silent — no error is returned.
Suggested fix: Mirror the streaming.go guard: after accumulating tool-call deltas, continue to the next chunk if choice.FinishReason == "", and only break on FinishReasonStop/FinishReasonLength once all pending deltas have been flushed.
There was a problem hiding this comment.
Fixed in e4ff6d496 by mirroring the streaming.go continue-on-tool-call-only-chunk guard — any chunk with tool-call deltas and no finish reason now short-circuits to the next frame instead of falling into the FinishReason capture/break path. Functionally a no-op for the current code, but pre-empts the regression you described if anyone inserts processing between the tool-call loop and the break. New subtest TestDrainSamplingStreamWithTools/tool-call-only chunks continue without falling into finish logic exercises a 4-chunk argument-fragment stream.
| PromptListChangedHandler: promptChanged, | ||
| } | ||
| switch { | ||
| case c.samplingWithToolsHandler != nil: |
There was a problem hiding this comment.
[MEDIUM] Reconnect race: sampling handler may not be registered with the SDK if configureToolsetHandlers hasn't run yet
Initialize (called on both the initial connect and every supervisor-driven reconnect) reads c.samplingWithToolsHandler and c.samplingHandler at call time to decide which SDK-level callback to register:
switch {
case c.samplingWithToolsHandler != nil:
opts.CreateMessageWithToolsHandler = c.handleSamplingWithToolsRequest
case c.samplingHandler != nil:
opts.CreateMessageHandler = c.handleSamplingRequest
}
// if both are nil → no sampling callback registered with the SDK clientThe SDK client is constructed with opts and never updated afterwards. If a reconnect happens before the runtime's configureToolsetHandlers has called SetSamplingWithToolsHandler / SetSamplingHandler, both fields are nil at Initialize time. The resulting SDK session has no CreateMessage* handler at all, so any sampling/createMessage request from the server is rejected with an error for the lifetime of that session.
This is a regression from the previous unconditional registration pattern, where the basic sampling callback was always wired. The comment "callers must invoke … before any reconnect" documents the contract, but it is not enforced by any ordering guarantee in the reconnect path today.
Impact: An MCP server that triggers sampling during or shortly after a reconnect (e.g., a long-running agentic loop that reconnects due to a network hiccup) will receive an error from the sampling handler instead of a model response, breaking its tool-execution loop.
Suggested mitigation: Either (a) guarantee in the supervisor/reconnect path that configureToolsetHandlers is always called before Initialize completes, and add a test that verifies this ordering; or (b) fall back to registering a best-effort handler at Initialize time that reads the handler fields lazily at request time (similar to how handleSamplingWithToolsRequest already reads c.samplingWithToolsHandler under the mutex), so a late SetSamplingWithToolsHandler call is still effective even after reconnect.
There was a problem hiding this comment.
Fixed in e4ff6d496 by extracting the registration logic into sessionClient.applySamplingHandlerOpts (used by both stdio.Initialize and remote.Initialize). The helper now wires CreateMessageWithToolsHandler eagerly even when both handler fields are still nil at Initialize time; CreateMessageHandler is only registered when the caller has explicitly opted into basic-only sampling. The callback itself already reads samplingWithToolsHandler under c.mu at request time (session_client.go:handleSamplingWithToolsRequest), so a late SetSamplingWithToolsHandler — including on a supervisor-driven reconnect that races configureToolsetHandlers — takes effect without re-init.
Coverage: TestApplySamplingHandlerOpts_RegistrationMatrix pins the choice across all four handler-field combinations (and asserts we never register both, which would panic the SDK), and TestHandleSamplingWithToolsRequest_LateSetterTakesEffect walks the helper → nil-error → late SetSamplingWithToolsHandler → invoked sequence end-to-end.
Adds a parallel SamplingWithToolsHandler alongside the existing SamplingHandler so MCP servers can include a tools array in sampling/createMessage requests. The host drives its model with those tools and returns any tool_use blocks as ToolUseContent; the server remains responsible for executing the tool and continuing the loop in a follow-up sampling request. The initialize handshake now advertises sampling.tools capability, and the MCP toolset selects the appropriate go-sdk handler (basic vs. with-tools) based on which handler is registered. Signed-off-by: Eron Wright <eronwright@gmail.com>
Mounts an in-process gomcp.NewServer on an httptest server via StreamableHTTPHandler. Its one tool, ask_with_calculator, runs a sampling loop: sends sampling/createMessage with a calculator tool, gets a tool_use back from the host LLM, "executes" the calculator, sends a follow-up sampling request carrying the tool_result, and returns the final text. The Gemini side is recorded once and replayed on subsequent runs, so the test runs offline in CI. Signed-off-by: Eron Wright <eronwright@gmail.com>
- Reject tool_use blocks under a non-assistant role explicitly rather than silently dropping the message. The MCP spec places tool_use on assistant turns, but a malformed server would previously have its message disappear from the converted chat history with no error. - Document the ordering dependency between SetSampling*Handler and Initialize in both stdio and remote MCP client setups, so future maintainers don't reorder them and silently lose the sampling handler on reconnect. Signed-off-by: Eron Wright <eronwright@gmail.com>
Use errors.New for the static loop-exhausted message and strconv.FormatInt for the calculator result instead of fmt.Errorf / fmt.Sprintf %d. Signed-off-by: Eron Wright <eronwright@gmail.com>
- pkg/runtime/sampling.go: mirror the streaming.go short-circuit in
drainSamplingStreamWithTools so a chunk that carries only tool-call
deltas continues to the next frame instead of falling through into
the FinishReason capture and break. The existing flow happens to be
equivalent, but the explicit guard pre-empts regression if anyone
inserts new processing between the tool-call loop and the break.
- pkg/tools/mcp/{stdio,remote,session_client}.go: collapse the inline
CreateMessage* registration switch into a shared
sessionClient.applySamplingHandlerOpts helper. The helper always
wires the with-tools callback unless the caller has explicitly opted
into the basic-only path, so a SetSamplingWithToolsHandler call
landing after Initialize — including on supervisor-driven reconnect
that races configureToolsetHandlers — still takes effect via the
existing lazy read in handleSamplingWithToolsRequest. Closes the
silent no-handler window the reviewer flagged on stdio.go:53 and
remote.go:104.
Tests:
- TestDrainSamplingStreamWithTools/tool-call-only chunks continue ...
- TestApplySamplingHandlerOpts_RegistrationMatrix
- TestHandleSamplingWithToolsRequest_LateSetterTakesEffect
Signed-off-by: Eron Wright <eronwright@gmail.com>
8e150a2 to
e4ff6d4
Compare
Summary
Closes the tool callbacks functional gap in MCP sampling support — a follow-up to #2815, addressing one of the remaining items from #2809.
When an MCP server includes a
toolsarray in asampling/createMessagerequest, the host now drives its model with those tools and returns anytool_useblocks back to the server asToolUseContent. The server remains responsible for executing the tool and continuing the loop in a follow-up sampling request.sequenceDiagram participant H as cagent participant S as MCP Server participant L as LLM activate H H->>+S: tools/call {name, arguments} note over S: needs LLM inference S->>+H: sampling/createMessage<br/>{messages, tools: [...]} H->>+L: chat completion L-->>-H: ToolUseContent<br/>stopReason: "toolUse" H-->>-S: CreateMessageResult<br/>{tool_use, stopReason: "toolUse"} note over S: executes tool locally S->>+H: sampling/createMessage<br/>{messages + tool_use + tool_result, tools: [...]} H->>+L: chat completion L-->>-H: TextContent<br/>stopReason: "endTurn" H-->>-S: CreateMessageResult<br/>{text, stopReason: "endTurn"} S-->>-H: tool result deactivate HWhat's new
SamplingWithToolsHandlertype andSampleableWithToolsinterface — additive, parallel to the existingSamplingHandler/Sampleable. No breaking changes to the basic sampling path merged in feat(mcp): add sampling/createMessage support #2815.Initialize, exactly one of the SDK's mutually exclusiveClientOptions.CreateMessage*fields is populated — prefer with-tools when registered, fall back to basic.sampling.toolsso servers know the host can receive tool-enabled requests.pkg/runtime/sampling.go):text,image/audio,tool_use→ assistantToolCalls,tool_result→MessageRoleToolrows (parallel tool_results expand to multiple chat.Message rows).[]*mcp.Tool→[]tools.Toolwith a no-op handler (the server, not the host, executes).model.CreateChatCompletionStream, aggregates streamed tool calls.ContentwithTextContent+ToolUseContentblocks;stopReason: "toolUse"when tool calls are present.maxSamplingTools=64,maxSamplingToolCalls=32.e2e/sampling_test.go): mounts an in-processgomcp.NewServeron anhttptestserver viaStreamableHTTPHandler. The server exposes one tool (ask_with_calculator) whose handler drives a real sampling-with-tools loop against the connecting cagent. The Gemini side is recorded once and replayed on subsequent runs, so the test runs offline in CI.Out of scope (separate gaps from #2809)
Test plan