OAuth client: harden SEP-2352/SEP-2350 edge cases; fix conformance comment#2936
Conversation
When --spec-version is omitted, the conformance harness picks the version per-scenario (LATEST_SPEC_VERSION for active scenarios, DRAFT_PROTOCOL_VERSION for draft-only ones), not a flat LATEST_SPEC_VERSION. The variable is still log-only today, but the stateless 2026 client path will branch on it.
Three follow-up fixes to the SEP-2352 / SEP-2350 work: - Backfill an omitted token-response `scope` with the requested scope (RFC 6749 §5.1/§6) before persisting, so the SEP-2350 step-up union still recovers prior grants after a process restart when the authorization server omits `scope` because granted == requested. Applied to both the authorization-code and refresh response handlers. - Clear cached `oauth_metadata` when the SEP-2352 issuer-mismatch block discards bound credentials, so a subsequent ASM-discovery failure for the new authorization server cannot leave the old server's registration/token endpoints in place for Step 4. - Re-evaluate the SEP-2352 issuer-binding check against `oauth_metadata.issuer` after ASM discovery on the legacy no-PRM path (PRM discovery failed, AS found via root well-known fallback), mirroring the existing stamping-side fallback so stale credentials are still discarded when the binding can only be learned post-ASM.
When ASM discovery fails, dynamic client registration falls back to the resource-server origin's `/register`. Recording that registration as bound to a PRM-advertised authorization server we never actually reached would persist a mis-bound record that the SEP-2352 binding check then accepts on every later flow, wedging the client on the wrong `client_id`. Leave the issuer unset in that case so a later 401 with working discovery re-evaluates cleanly.
`registration_endpoint` is optional in RFC 8414 metadata, and DCR falls back to the resource-server origin's `/register` when it is absent. Recording that fallback registration as bound to the discovered issuer asserts a binding that was never established, so only stamp the issuer in the DCR branch when the discovered metadata actually carried a `registration_endpoint`. The CIMD branch is unaffected since CIMD records are portable across authorization servers.
| # Only record the issuer when the registration above actually targeted | ||
| # the discovered AS's registration_endpoint. With no metadata, or | ||
| # metadata that omits registration_endpoint, DCR fell back to the | ||
| # resource-server origin's /register — recording that as bound to a | ||
| # PRM-advertised AS would persist a binding that was never established. | ||
| if ( | ||
| self.context.oauth_metadata is not None | ||
| and self.context.oauth_metadata.registration_endpoint is not None | ||
| ): | ||
| client_information.issuer = discovered_issuer |
There was a problem hiding this comment.
🟡 The new registration_endpoint is not None gate also drops the issuer stamp in the legacy same-origin case (PRM 404 → root ASM discovery succeeds with issuer == resource origin, metadata omits the optional registration_endpoint, DCR falls back to that same origin's /register) — there the registration genuinely targets the discovered issuer's host, and pre-#2933-PR the record was correctly bound, so a later AS migration triggered discard + re-registration. Since credentials_match_issuer returns True for issuer is None, the unstamped record is now silently reused after a migration, regressing SEP-2352 auto-recovery for that deployment shape; consider also stamping when the fallback /register origin matches the discovered issuer's origin (the new parametrized test only covers the cross-origin variant).
Extended reasoning...
The over-suppression. The new gate at oauth2.py:678-682 only stamps client_information.issuer when self.context.oauth_metadata.registration_endpoint is not None. The comment's rationale is to avoid recording "a binding that was never established" when DCR fell back to the resource-server origin's /register instead of the discovered AS's registration_endpoint. But the gate is keyed on registration_endpoint presence, not on whether the fallback URL actually belongs to a different server than the discovered issuer — and that over-suppresses in the legacy case where the AS is the resource origin. There, the fallback /register is on the discovered issuer's own host, so the binding was established; only the path was guessed.
Concrete walkthrough (legacy same-origin embedded AS, no PRM):
- Server at
https://api.example.comruns its own embedded AS. PRM discovery 404s, soauth_server_urlstaysNone; root ASM discovery athttps://api.example.com/.well-known/oauth-authorization-serversucceeds withissuer = https://api.example.com, but the metadata omits the optionalregistration_endpoint(it is optional per RFC 8414 /OAuthMetadatainsrc/mcp/shared/auth.py). - Step 4 DCR:
create_client_registration_request(src/mcp/client/auth/utils.py:284-287) falls back tourljoin(resource-origin, '/register')=https://api.example.com/register— the same server as the discovered issuer. - Pre-PR (commit 4472428 / Bind client credentials to their authorization server (SEP-2352) #2933),
bound_issuerfell back tooauth_metadata.issuer, so the persisted record was correctly bound tohttps://api.example.com. Post-PR, the gate seesregistration_endpoint is Noneand leavesissuerunset. - Later the resource migrates to an external AS: PRM now advertises
https://new-as.example.com. The post-PRM binding check (oauth2.py:584-596) callscredentials_match_issuer, which returns True wheneverclient_info.issueris None (utils.py:343-344). The unstamped record is kept, and the SDK presents the old embedded AS'sclient_idto the new AS — authorization fails with no automatic re-registration. Pre-PR, the same sequence detected the issuer mismatch, discarded the record, and re-registered cleanly. That auto-recovery is exactly what SEP-2352 binding (Bind client credentials to their authorization server (SEP-2352) #2933) added, and this gate silently removes it for AS deployments whose metadata omitsregistration_endpointwhile still serving the legacy/registerpath.
Why nothing else catches it. The new post-ASM re-check earlier in the flow only helps when a binding exists to compare; once records are persisted unstamped, every later 401 reuses them regardless of which AS the resource now advertises. The PR's new parametrized test (asm-metadata-without-registration-endpoint) only exercises the cross-origin variant (issuer https://new-as.example.com vs resource api.example.com), where leaving the issuer unset is the right call — the same-origin variant, where the old stamp was correct, is untested and regresses unnoticed.
Why this is distinct from the existing review comments. The two prior inline comments argued the stamp was wrong when the fallback /register belongs to a different server than the advertised AS (cross-origin case), and this PR implemented that suggestion. This finding is the flip side: the chosen gate (registration_endpoint presence) is too coarse and also drops the stamp when the fallback /register is on the same origin as the discovered issuer, where the binding was both correct and useful.
Scope and severity. The trigger is narrow: an AS that publishes RFC 8414 metadata without registration_endpoint yet serves DCR at the legacy /register path, followed by a later AS migration. The failure mode degrades to the documented "unbound credentials are reused" semantics rather than persisting a wrong binding, so this is non-blocking.
Suggested fix. Instead of gating purely on registration_endpoint presence, also stamp when the discovered issuer's origin matches get_authorization_base_url(server_url) (i.e. the fallback /register URL is on the issuer's own host) — or compute the actual registration URL once and stamp iff it was derived from / belongs to the discovered issuer. Either keeps the cross-origin suppression this PR adds while preserving #2933's auto-recovery for same-origin embedded-AS deployments.
| # RFC 6749 §6: an omitted scope on refresh means the scope is unchanged from | ||
| # the prior access token. Carry it forward so the persisted token stays | ||
| # self-describing for the SEP-2350 step-up union after a restart. | ||
| if token_response.scope is None and self.context.current_tokens is not None: | ||
| token_response.scope = self.context.current_tokens.scope | ||
|
|
There was a problem hiding this comment.
🟣 Pre-existing issue in the block this PR touches: _handle_refresh_response now carries an omitted scope forward per RFC 6749 §6, but an omitted refresh_token (also allowed by §6 and common with non-rotating ASes) is not — the parsed token has refresh_token=None and unconditionally overwrites both context.current_tokens and persistent storage, so the next expiry forces a full interactive re-authorization (or fails for headless clients). The fix is the same one-line pattern this PR adds for scope: backfill token_response.refresh_token from self.context.current_tokens.refresh_token when the response omits it.
Extended reasoning...
The bug. _handle_refresh_response (src/mcp/client/auth/oauth2.py:469-495) parses the refresh response into a fresh OAuthToken and then unconditionally does self.context.current_tokens = token_response followed by await self.context.storage.set_tokens(token_response). OAuthToken.refresh_token is Optional with a default of None (src/mcp/shared/auth.py), so when the authorization server omits refresh_token from the refresh response, the still-valid stored refresh token is overwritten with None both in memory and in persistent storage. RFC 6749 §6 explicitly allows this server behavior — the AS MAY issue a new refresh token; when it does not, the client is expected to keep using the previously issued one. Non-rotating refresh tokens are common in practice (Google and many enterprise ASes omit refresh_token from refresh responses).
Code path. async_auth_flow → can_refresh_token() true → _refresh_token() builds the request → _handle_refresh_response handles the 200. The handler validates the JSON, applies the new scope carry-forward this PR adds (lines 480-484), and then replaces the stored token wholesale. Nothing anywhere in src/mcp/client/auth preserves the prior refresh_token — the only other uses are can_refresh_token() (which requires it) and building the refresh request itself. The only carry-forward logic in the module is the scope backfill introduced two lines above the overwrite.
Step-by-step proof.
- Initial authorization grants
{access_token: A1, refresh_token: R1, expires_in: 3600}; both are persisted. - An hour later the access token expires.
async_auth_flowseescan_refresh_token()is true (R1 is stored) and issues the refresh. - The AS responds
200 {"access_token": "A2", "token_type": "Bearer", "expires_in": 3600}— norefresh_token, which §6 permits and non-rotating ASes routinely do. OAuthToken.model_validate_jsonproduces a token withrefresh_token=None. The handler stores it incontext.current_tokensand callsstorage.set_tokens, wiping R1 from memory and from persistent storage.- At the next expiry,
can_refresh_token()(oauth2.py:142-144) returnsFalsebecausecurrent_tokens.refresh_tokenisNone. The client silently falls back to a full interactive 401 re-authorization (redirect + callback) — an unnecessary user interaction for interactive clients, and an outright failure for headless or long-running ones. After a process restart the persisted record has also lost the refresh token, so the degradation survives restarts.
Why nothing prevents it. The existing tests always include refresh_token in mocked refresh responses, so the omitted case is uncovered, and there is no other code path that carries the prior value forward.
Severity / scope. This is pre-existing — the overwrite line predates this PR and the PR does not change runtime behavior here. It is flagged because the PR edits this exact handler and implements the identical RFC 6749 §6 carry-forward principle for the sibling scope field two lines above, so this is the natural place to fix it.
Fix. Mirror the scope backfill the PR adds:
if token_response.refresh_token is None and self.context.current_tokens is not None:
token_response.refresh_token = self.context.current_tokens.refresh_tokenplaced alongside the new scope carry-forward, before current_tokens is replaced and the token is persisted. A test analogous to test_handle_refresh_response_carries_prior_scope_when_response_omits_it (asserting the stored token keeps the old refresh_token) would cover it.
Three small follow-ups to #2933 / #2931 plus a comment correction from #2911.
Motivation and Context
Review of the merged SEP-2352 and SEP-2350 implementations turned up a few narrow paths where the new guarantees don't hold:
auth_server_url. When PRM discovery fails and the AS is found via the root/.well-known/oauth-authorization-serverfallback, the recorded issuer is never compared, so credentials bound to a previous AS are reused. The stamping side already falls back tooauth_metadata.issuerin this case; this adds the matching comparison.oauth_metadatafrom the old AS is left in place. If discovery for the new AS then fails, dynamic client registration runs against the old server'sregistration_endpointwhile stamping the new issuer, persisting a mis-bound record.scopefrom the token response when granted scope equals requested scope. After a process restart the reloaded token then hasscope=Noneandclient_metadata.scopehas reverted to its constructor value, so the step-up union collapses to just the challenged scope. Backfilling the requested scope before persisting (and carrying the prior scope forward on refresh, per §6) keeps the stored token self-describing.The conformance-client comment correction is doc-only: the harness picks
LATEST_SPEC_VERSIONvsDRAFT_PROTOCOL_VERSIONper-scenario when--spec-versionis omitted, not a flatLATEST_SPEC_VERSION. The value is still log-only today.How Has This Been Tested?
Three new tests in
tests/client/test_auth.py:test_handle_token_response_backfills_omitted_scope_from_requesttest_handle_refresh_response_carries_prior_scope_when_response_omits_ittest_issuer_binding_re_evaluated_after_asm_when_prm_discovery_failed— drives the auth-flow generator through PRM 404×2 → root ASM 200 with a mismatched issuer and asserts the next yield is a DCR request, not the stale client's authorize redirectThe existing
tests/interaction/auth/test_lifecycle.pySEP-2352 migration test exercises theoauth_metadata = Noneclear on the PRM-success path.Breaking Changes
None.
Types of changes
Checklist
Additional context
The 403 step-up path also skips PRM/ASM discovery entirely, so after a restart it can fall back to
/authorizeand/tokenon the resource server's origin rather than the AS. That fix needs the discovery sequence factored into a sub-generator both the 401 and 403 branches can drive — out of scope for this PR.AI Disclaimer