
fix: [ENG-2484] validate openai-compatible URL and defer provider activation #549

Open
RyanNg1403 wants to merge 2 commits into main from fix/ENG-2484

Conversation

@RyanNg1403 (Collaborator)

Summary

  • Problem: First-time onboarding for the OpenAI-compatible provider silently set llama3 as the active model — a placeholder that doesn't exist on most users' Ollama/LM Studio setups. A bad base URL was accepted as "connected" with no validation, and the resulting half-configured state could hang the REPL after a disconnect/reconnect cycle.
  • Why it matters: Linear ENG-2484 reports the visible symptom (agent uses a non-existent model on first connect). The follow-on issues — TUI hang, welcome-page-mid-flow detour — are deterministic regressions from the same root design (writing a placeholder + flipping activeProvider before the user has a usable model).
  • What changed:
    • Removed defaultModel: 'llama3' from the openai-compatible registry entry (no sensible canonical default exists).
    • Added server-side /models validation in setupConnect for openai-compatible — bad URL → {success: false} with a friendly message; no keychain or config write.
    • Added setAsActive?: boolean option to connectProvider. setupConnect now skips withActiveProvider when no activeModel will be set, preventing the welcome-page-mid-flow detour that unmounted the slash command's ProviderFlow and hung the REPL (see the sketch after this list).
    • Exposed activeModel on ProviderDTO so the TUI can route a connected-but-no-model provider to the model picker directly (only when the provider is current — non-current half-configured providers still reach the actions menu so Disconnect stays accessible).
    • TUI's handleApiKeySuccess now checks response.success and surfaces the server's error inline at the provider list (previously swallowed silently).
  • What did NOT change (scope boundary): Other providers' connect/disconnect/setActive flows. ByteRover's special-cased flows. The ApiKeyDialog validation contract for non-openai-compatible providers. The picker UI itself. The model:setActive handler.
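A minimal sketch of the new contract, assuming the method names used in this description; option fields other than setAsActive are illustrative:

interface ConnectProviderOptions {
  activeModel?: string
  baseUrl?: string
  setAsActive?: boolean // defaults to true, so callers that omit it keep the old behavior
}

class ProviderConfigStoreSketch {
  async connectProvider(providerId: string, options?: ConnectProviderOptions): Promise<void> {
    await this.withProviderConnected(providerId, options)
    // setupConnect passes setAsActive: false for openai-compatible when no
    // activeModel will end up set: the provider connects without activating.
    if (options?.setAsActive ?? true) {
      await this.withActiveProvider(providerId)
    }
  }

  private async withProviderConnected(_providerId: string, _options?: ConnectProviderOptions): Promise<void> {}

  private async withActiveProvider(_providerId: string): Promise<void> {}
}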

Type of change

  • Bug fix
  • New feature
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

  • Closes ENG-2484
  • Related N/A

Root cause (bug fixes only, otherwise write N/A)

  • Root cause: setupConnect always wrote activeModel: provider?.defaultModel and called withActiveProvider unconditionally. For openai-compatible, defaultModel: 'llama3' was a placeholder that almost never matched the user's actual server. Combined with no URL validation, this meant any connect attempt — successful or not — committed {activeProvider: 'openai-compatible', activeModel: 'llama3'} to disk. The picker step in the TUI either fetched empty models from a wrong URL or showed a model the user hadn't picked. Removing the placeholder without also gating withActiveProvider introduced a different bug: the half-configured state (activeProvider set, activeModel undefined) was interpreted by useAppViewMode as 'config-provider', causing the welcome page to mount mid-flow and unmount the slash command's ProviderFlow before its onComplete could fire — leaving the REPL stuck.
  • Why this was not caught earlier: The original placeholder masked the "active provider with no active model" state from useAppViewMode, so the welcome-page-mid-flow detour was only possible after the placeholder was removed. The first iteration of this fix removed the placeholder without addressing the activation contract, exposing the second bug deterministically. Tests now pin both behaviors. (The view-mode decision involved is sketched below.)
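A hypothetical reconstruction of the view-mode decision described above, to make the failure mode concrete (the real useAppViewMode may differ in names and shape):

type AppViewMode = 'repl' | 'config-provider'

function resolveViewMode(activeProvider?: string, activeModel?: string): AppViewMode {
  // With the llama3 placeholder, activeModel was always set, masking this
  // branch. Once the placeholder was removed, a connected-but-unpicked
  // openai-compatible provider landed here, mounting the welcome page
  // mid-flow and unmounting the slash command's ProviderFlow.
  if (activeProvider !== undefined && activeModel === undefined) {
    return 'config-provider'
  }
  return 'repl'
}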

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s):
    • test/unit/core/domain/entities/provider-registry.test.ts — pins openai-compatible.defaultModel === undefined.
    • test/unit/infra/transport/handlers/provider-handler.test.ts — 8 new tests covering URL validation, no-pre-write of activeModel, setAsActive decisions across openai-compatible (no-existing-model and existing-model paths) and non-openai-compatible providers.
  • Key scenario(s) covered:
    • Bad/unreachable openai-compatible URL → {success: false}, no keychain write, no config write, no broadcast (sketched as a test after this section).
    • Good URL → {success: true}, no activeModel placeholder, setAsActive reflects whether a model will end up set.
    • Reconfigure preserves existing activeModel → activation re-enabled.
    • Non-openai-compatible providers untouched (still activate via defaultModel).
  • Manual verification:
    • CLI repro of the deterministic hang scenario (disconnect openai-compatible from non-current state, reconnect to mock URL): activeProvider correctly remained at the prior provider; openai-compatible was connected without activeModel; no welcome-page detour.
    • TUI verified end-to-end by the requester: disconnect/reconnect on openai-compatible no longer hangs; bad URL surfaces inline at the provider list; picker shows real models from the user's Ollama; non-current half-configured providers reach the actions menu (Disconnect accessible).
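As a sketch of the first key scenario (runner, import path, and constructor shape are assumptions; the real tests live in provider-handler.test.ts):

import {expect, it, vi} from 'vitest'
import {ProviderHandler} from '../../src/server/infra/transport/handlers/provider-handler' // path assumed

it('bad openai-compatible URL: {success: false} and nothing persisted', async () => {
  const connectProvider = vi.fn()
  const handler = new ProviderHandler({
    providerConfigStore: {connectProvider},
    validateOpenAICompatibleEndpoint: vi.fn().mockResolvedValue({ok: false, error: 'API error: fetch failed'}),
  } as never)

  const result = await handler.setupConnect('openai-compatible', {baseUrl: 'http://localhost:1/nope'})

  expect(result.success).toBe(false)
  expect(connectProvider).not.toHaveBeenCalled() // no config write on failure
})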

User-visible changes

  • Bad base URL on openai-compatible now produces an inline error at the provider list (Could not reach OpenAI-compatible endpoint at <URL>: <details>) instead of silently committing a broken config.
  • A successful first-time connect to openai-compatible no longer pre-selects a model. The picker is rendered with the user's actual /models listing, and the user must select one for the provider to become active.
  • Disconnect/reconnect of openai-compatible from the home page no longer hangs the REPL.
  • For openai-compatible specifically: if the provider is connected but no model has been picked yet, picking it from the welcome page jumps straight to the model picker; from /providers (non-current case) the actions menu is shown with Disconnect reachable.

Evidence

  • 6881 unit tests pass (npm test); typecheck clean; build succeeds.
  • CLI verification: brv providers connect openai-compatible --base-url http://localhost:1/nope returns {"success":false,"error":"Could not reach OpenAI-compatible endpoint at http://localhost:1/nope: API error: "} with no state pollution. Good-URL connect produces config without activeModel.
  • TUI deterministic repro (disconnect openai-compatible while another provider is current → reconnect → pick model): activeProvider preserved during connect, slash command flow stays mounted, no hang.

Checklist

  • Tests added or updated and passing (npm test) — 6881 passing
  • Lint passes (npm run lint) — full lint blocked by pre-existing submodule config issue (@workspace/typescript-config/react-library.json not found in packages/byterover-packages/ui/); confirmed by stashing changes. Targeted lint on changed files is clean.
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (if applicable) — N/A
  • No breaking changes
  • Branch is up to date with main

Risks and mitigations

  • Risk: Other consumers of connectProvider (currently only the OAuth callback handler) might rely on the implicit withActiveProvider. Mitigation: setAsActive defaults to true, so callers that don't pass it preserve existing behavior. The OAuth handler always supplies a defaultModel, so its setAsActive would be true regardless. Verified: existing OAuth tests still pass.
  • Risk: Users with stale post-original-fix state (activeProvider: 'openai-compatible' with no activeModel) land on the welcome page on first launch. Mitigation: the TUI auto-route in handleSelect jumps to the model picker when the user picks the current half-configured provider, so they can finish setup in one screen.
  • Risk: If the openai-compatible endpoint responds 200 OK to /models but isn't actually OpenAI-compatible (returns garbage), validation passes and the picker may show empty/garbage. Mitigation: explicitly disclosed as a narrow edge case; user can recover via Disconnect or brv providers disconnect. Out of scope for this ticket.

…ivation

Removes the `llama3` placeholder default that silently corrupted config when
users couldn't reach the model picker, validates the user-provided base URL
against `/models` before persisting any state, and stops flipping the active
provider during connect when no active model would be set — that intermediate
state was unmounting the slash command's flow via the welcome view, hanging
the REPL after the user picked a model.

github-actions Bot commented Apr 25, 2026

Claude finished @RyanNg1403's task in 3m 51s


PR Review: fix/ENG-2484 — validate openai-compatible URL and defer provider activation

  • Gather context and read changed files
  • Review provider-registry.ts — defaultModel removal
  • Review i-provider-config-store.ts — interface changes
  • Review file-provider-config-store.ts — store implementation
  • Review provider-handler.ts — URL validation & setAsActive logic
  • Review dto.ts — ProviderDTO changes
  • Review provider-flow.tsx — TUI routing changes
  • Review test files — coverage & correctness
  • Post inline comments & final review summary

Overall Assessment: 4 / 5

The root cause analysis is precise and the fix is well-scoped. The three-part solution (remove placeholder, validate before persist, defer activation) correctly addresses all reported symptoms without touching unrelated flows. Test coverage is solid for the new invariants. A few minor issues noted below.


What's well done

Validation-before-persist ordering (provider-handler.ts:263–274) is the right design: the URL check happens before any keychain or config write, so a failed connect cannot leave a half-written state. This was the most important correctness property to get right.
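Distilled to a sketch (helper names are assumptions; the error text matches the PR description):

type Validation = {ok: true} | {ok: false; error: string}

async function setupConnectSketch(
  baseUrl: string,
  validate: (url: string) => Promise<Validation>,
  persist: (url: string) => Promise<void>,
): Promise<{success: boolean; error?: string}> {
  const check = await validate(baseUrl)
  if (!check.ok) {
    // Fail before any keychain or config write: a bad URL leaves no state behind.
    return {
      success: false,
      error: `Could not reach OpenAI-compatible endpoint at ${baseUrl}: ${check.error}`,
    }
  }
  await persist(baseUrl) // keychain + config writes happen only here
  return {success: true}
}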

setAsActive === false as an explicit opt-out (file-provider-config-store.ts:72) is the correct convention — callers that don't pass the flag get the safe default of "also activate". No existing caller is silently broken.

ensureSuccess wrapper (provider-flow.tsx:50–53) is a clean way to promote the transport's success: false response into a thrown error so the existing try/catch picks it up. Previously this path was silently swallowed, which was the UX bug.
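A plausible shape for that wrapper (the real one is at provider-flow.tsx:50–53; the response type is an assumption):

function ensureSuccess<T extends {success: boolean; error?: string}>(response: T): T {
  if (!response.success) {
    // Promote the transport's soft failure into a thrown error so the
    // existing try/catch renders it inline at the provider list.
    throw new Error(response.error ?? 'Request failed')
  }
  return response
}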

setActiveProvider is correctly deferred to model-handler.ts:181 — when the user picks a model via model:setActive, the handler calls both setActiveProvider and setActiveModel. This is the right moment to flip the active provider, and the chain is already in place.

Injected validator (ProviderHandlerDeps.validateOpenAICompatibleEndpoint) makes the network call fully testable without actual HTTP, following the project's DI pattern. The real fetcher is used as the default, which is also the right approach.
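A sketch of that seam, assuming a fetch-based default (only the ProviderHandlerDeps name comes from the PR; the rest is illustrative):

type EndpointValidator = (baseUrl: string, apiKey?: string) => Promise<{ok: boolean; error?: string}>

const defaultValidator: EndpointValidator = async (baseUrl, apiKey) => {
  try {
    const response = await fetch(new URL('/models', baseUrl), {
      headers: apiKey ? {Authorization: `Bearer ${apiKey}`} : undefined,
    })
    return response.ok ? {ok: true} : {ok: false, error: `API error: ${response.statusText}`}
  } catch (error) {
    return {ok: false, error: error instanceof Error ? error.message : String(error)}
  }
}

// Tests inject a stub validator; production falls back to the real fetcher.
function resolveValidator(injected?: EndpointValidator): EndpointValidator {
  return injected ?? defaultValidator
}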


Issues found

Redundant getActiveModel I/O for all providers (provider-handler.ts:287–288). For providers that have a defaultModel (e.g., openrouter), provider?.defaultModel is already truthy and existingActiveModel is never needed. The current code always incurs a file read. See inline comment for a short-circuit fix.
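The suggested short-circuit, roughly (names illustrative; the follow-up commit adopts this form, per the second review below):

async function willHaveActiveModel(
  defaultModel: string | undefined,
  readStoredActiveModel: () => Promise<string | undefined>,
): Promise<boolean> {
  // || short-circuits, so the file read only happens for providers without
  // a registry defaultModel (currently just openai-compatible).
  return Boolean(defaultModel) || Boolean(await readStoredActiveModel())
}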

activeModel: provider?.defaultModel passes undefined explicitly (provider-handler.ts:290). For openai-compatible, defaultModel is now undefined, so the key activeModel: undefined is passed to withProviderConnected. Depending on how that method spreads options, it could overwrite a previously stored activeModel with undefined. The value should be omitted entirely when absent. See inline comment.

Missing CLI guard for absent baseUrl (provider-handler.ts:265). The validation block is gated on baseUrl being truthy, which is correct for the TUI (always routed through BaseUrlDialog). But a direct brv providers connect openai-compatible CLI call with no --base-url will succeed and store a config with an empty URL. This is currently a silent footgun for CLI users. See inline comment.
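One possible guard shape (the follow-up commit adds one at provider-handler.ts:271–276; the error text here is illustrative):

function resolveBaseUrl(
  requestBaseUrl?: string,
  storedBaseUrl?: string,
): {ok: true; baseUrl: string} | {ok: false; error: string} {
  const effective = requestBaseUrl ?? storedBaseUrl
  if (!effective) {
    // Without this, a CLI connect with no --base-url would persist a config
    // entry with an empty URL.
    return {ok: false, error: 'openai-compatible requires a base URL; pass --base-url or configure one first.'}
  }
  return {ok: true, baseUrl: effective}
}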

needsModelPick uses provider.id !== 'byterover' (provider-flow.tsx:186). The intent is clearly openai-compatible-specific (it's the only provider that can be connected without an activeModel). Using !== 'byterover' leaves the condition open to any future provider that also lacks a defaultModel. Being explicit about openai-compatible would be safer. See inline comment.

Missing test: model:setActive → setActiveProvider chain (provider-handler.test.ts:315). The PR's behavioral guarantee rests on the claim that model-handler activates the provider when a model is picked. This is true today (model-handler.ts:181), but no test enforces it. A future refactor that removes that setActiveProvider call would silently break the openai-compatible flow. See inline comment.
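The missing pin might look like this (runner, import path, and handler shape are assumptions; the follow-up commit adds a real version in model-handler.test.ts):

import {expect, it, vi} from 'vitest'
import {ModelHandler} from '../../src/server/infra/transport/handlers/model-handler' // path assumed

it('model:setActive activates the provider as well as the model', async () => {
  const setActiveProvider = vi.fn()
  const setActiveModel = vi.fn()
  const handler = new ModelHandler({providerConfigStore: {setActiveProvider, setActiveModel}} as never)

  await handler.setActive('openai-compatible', 'some-local-model')

  // The deferred-activation design depends on this chain firing on pick.
  expect(setActiveProvider).toHaveBeenCalledWith('openai-compatible')
  expect(setActiveModel).toHaveBeenCalledWith('openai-compatible', 'some-local-model')
})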

selectedProvider not cleared on connect error (provider-flow.tsx:353). Minor — handleSelect always overwrites it, so there's no user-visible bug, but it's inconsistent with the other error paths that clear selectedProvider. See inline comment.


Security

No concerns. The error message includes the user-supplied baseUrl inline, but this is a CLI tool where the same user who provided the URL sees the error — no injection surface exists. API keys are only persisted after successful validation.

Inline comment threads: src/server/infra/transport/handlers/provider-handler.ts (3, of which 2 outdated), src/tui/features/provider/components/provider-flow.tsx (2, of which 1 outdated), test/unit/infra/transport/handlers/provider-handler.test.ts, src/server/infra/storage/file-provider-config-store.ts.

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d311e45c0d


Inline comment thread: src/server/infra/transport/handlers/provider-handler.ts (outdated).
…le setup

Falls back to stored baseUrl/apiKey on partial reconfigure so an endpoint
that requires auth no longer 401s when the user keeps the same key. Rejects
openai-compatible connects that have no effective baseUrl (request + stored)
instead of silently storing a useless config entry. Short-circuits the
existingActiveModel lookup when the registry already supplies a defaultModel.
Tightens the TUI auto-route to openai-compatible specifically, clears
selectedProvider on connect failure for symmetry with sibling error paths,
and adds a model:setActive test pinning the activate-on-pick chain that the
deferred activation design depends on.

github-actions Bot commented Apr 25, 2026

Claude finished @RyanNg1403's task in 2m 44s


PR Review: fix/ENG-2484 — validate openai-compatible URL and defer provider activation

  • Gather context and read changed files
  • Review latest commit changes (refactor addressing prior feedback)
  • Review provider-handler.ts — URL validation & setAsActive logic
  • Review file-provider-config-store.ts — store implementation
  • Review provider-flow.tsx — TUI routing changes
  • Review test files — coverage & correctness
  • Post final review summary

Overall Assessment: 4.5 / 5

The refactor commit (03d9d9d) addresses every issue raised in the prior review and adds the stored-credential fallback (Codex P2). The implementation is correct, well-scoped, and properly tested. One minor residual nit remains and one slight concern about sequential reads — both documented below.


What the latest refactor fixed ✅

All six prior-review issues were resolved:

  1. Redundant getActiveModel I/O — short-circuited correctly: Boolean(provider?.defaultModel) || Boolean(await ...) only incurs the file read for openai-compatible (the only provider without a defaultModel).

  2. CLI guard for absent baseUrl: provider-handler.ts:271–276 correctly returns a friendly error when neither the request nor the stored config supplies a baseUrl. Validated by the new "should reject with friendly error when no baseUrl is provided and none is stored" test.

  3. needsModelPick scoped to openai-compatible explicitly: provider-flow.tsx:188–189 now checks provider.id === 'openai-compatible' instead of the broader !== 'byterover' form.

  4. selectedProvider cleared on connect error: provider-flow.tsx:358 now calls setSelectedProvider(null) for symmetry with other error paths.

  5. model:setActive → setActiveProvider chain test — pinned in model-handler.test.ts. This test is a correctness regression guard, not just documentation — well done.

  6. Stored API key fallback on reconfigure: provider-handler.ts:278 falls back to the keychain key when the request omits it, so re-configuring only the URL no longer 401s on auth-required endpoints (sketched below).
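A sketch of the item-6 fallback with assumed names: prefer request-supplied credentials, fall back to stored ones, so re-entering only the URL keeps a working key.

async function resolveCredentials(
  request: {baseUrl?: string; apiKey?: string},
  stored: {baseUrl?: string; readApiKey: () => Promise<string | undefined>},
): Promise<{baseUrl?: string; apiKey?: string}> {
  return {
    baseUrl: request.baseUrl ?? stored.baseUrl,
    // ?? short-circuits, so the keychain is only read when the request omits the key.
    apiKey: request.apiKey ?? (await stored.readApiKey()),
  }
}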


Remaining nit: activeModel: undefined still passed explicitly

provider-handler.ts:306

await this.providerConfigStore.connectProvider(providerId, {
  activeModel: provider?.defaultModel,   // undefined for openai-compatible
  ...
})

For openai-compatible, provider?.defaultModel is undefined, so activeModel: undefined is explicitly included in the options object. This is harmless because withProviderConnected uses:

activeModel: options?.activeModel ?? existingConfig?.activeModel,

The ?? operator treats undefined as "fallthrough", so the stored model is correctly preserved. However, the explicit undefined key is misleading — a future reader might not check withProviderConnected and assume the field is being cleared. Cleaner form:

await this.providerConfigStore.connectProvider(providerId, {
  ...(provider?.defaultModel !== undefined && {activeModel: provider.defaultModel}),
  authMethod: apiKey ? 'api-key' : undefined,
  baseUrl,
  setAsActive: willHaveActiveModel,
})



Minor observation: two sequential read() calls in the openai-compatible block

provider-handler.ts:269 and 304

The validation block calls this.providerConfigStore.read() to get existingBaseUrl at line 269. Then at line 304, getActiveModel(providerId) internally calls read() again. Since FileProviderConfigStore.getActiveModel is:

public async getActiveModel(providerId: string): Promise<string | undefined> {
  const config = await this.read()
  return config.getActiveModel(providerId)
}

For openai-compatible connects (the only path that reaches both), this is two file reads when one would do. This is a performance micro-nit rather than a correctness issue. An alternative that folds both lookups into a single read() would be slightly cleaner, but the current form is correct and the overhead is negligible for an interactive setup flow.


Security

No concerns. Error messages include the user-supplied effectiveBaseUrl but are shown only to the same user who supplied it — no injection surface.


Test coverage

The 10 new tests in provider-handler.test.ts cover all key invariants (validate-before-persist, no-config-on-failure, setAsActive routing, stored-credential fallback). The new model-handler.test.ts test pins the deferred-activation chain that the whole design depends on. Coverage is solid.


