feat(webapp): add RUNTIME_API_ORIGIN to decouple runner traffic from external origin #3686

Open
ThullyoCunha wants to merge 1 commit into triggerdotdev:main from ThullyoCunha:feat/runtime-api-origin

Conversation

@ThullyoCunha ThullyoCunha commented May 21, 2026

Closes #2821

✅ Checklist

  • I have followed every step in the contributing guide
  • The PR title follows the convention.
  • I ran and tested the code works

Summary

The webapp publishes API_ORIGIN to runner pods as TRIGGER_API_URL, so runner-to-webapp traffic flows back through whatever URL is configured for external clients. Self-hosting behind a tracing-enabled gateway (Envoy, Istio, kgateway, ...) breaks the parent->child run link in trigger.dev's run-detail tree because the gateway's W3C traceparent rewrite on egress overwrites the SDK's triggerAndWait() span id. The webapp then writes that gateway-generated span id as the child run's parentSpanId, which never reaches the trigger event store, so the child renders as an orphan in the UI.
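To make the failure mode concrete, here is a minimal sketch of what a span-creating gateway hop does to the header. It assumes the W3C `traceparent` layout (`version-traceid-parentid-flags`); `parseTraceparent` and `simulateGatewayHop` are hypothetical helpers written for illustration, not functions from the webapp or the SDK, and the trace id is a placeholder value.

```typescript
// W3C traceparent layout: version-traceid-parentid-flags, lowercase hex.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string; // the span id the receiver will treat as the parent
  flags: string;
}

// Hypothetical helper: returns undefined for anything that is not a
// well-formed traceparent value.
function parseTraceparent(header: string): TraceParent | undefined {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim()
  );
  if (!match) return undefined;
  const [, version, traceId, parentId, flags] = match;
  return { version, traceId, parentId, flags };
}

// Hypothetical stand-in for a tracing gateway that starts its own span:
// the trace id survives, but the parent id is replaced by the gateway's span id.
function simulateGatewayHop(header: string, gatewaySpanId: string): string {
  const parsed = parseTraceparent(header);
  if (!parsed) return header;
  return `${parsed.version}-${parsed.traceId}-${gatewaySpanId}-${parsed.flags}`;
}

// Span ids taken from the "Before" measurements in this PR; the trace id is a placeholder.
const sdkHeader = "00-0af7651916cd43dd8448eb211c80319c-070bcfdd63b42d2a-01";
const receivedHeader = simulateGatewayHop(sdkHeader, "b8298ebb884ade7e");
console.log(parseTraceparent(sdkHeader)?.parentId, parseTraceparent(receivedHeader)?.parentId);
```

The webapp only sees the second value, so it has no way to recover the span id the SDK originally injected.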

We hit this on our self-hosted v4 cluster running kgateway with tracing enabled (spawnUpstreamSpan: true). We reproduced it with three rounds of SDK debug instrumentation (capturing the active span at propagation.inject, the wire-level undici:request:headers payload, and the value the webapp receives at the route level), then bypassed each hop with a direct curl until we isolated kgateway as the rewriter. Details of the investigation are in #2821, where several self-hosted users report the same symptom.

This PR splits the two concerns without sacrificing external auth/callbacks/UI flows that rely on the public API_ORIGIN:

  • Set RUNTIME_API_ORIGIN=http://<service>.<namespace>:<port> (k8s) or http://webapp:3000 (docker) to keep runner->webapp traffic on a cluster-internal hop that bypasses the gateway.
  • Leave API_ORIGIN on the public URL so the dashboard, magic-link emails, waitpoint callbacks, and API apiUrl responses keep working for external clients.

The new env is optional and falls back to API_ORIGIN/APP_ORIGIN, so existing deployments are unaffected.
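The fallback described above can be sketched as follows. This is an illustration only: `OriginEnv` and `resolveRunnerApiUrl` are hypothetical names, the ordering `API_ORIGIN` before `APP_ORIGIN` is an assumption read from the "API_ORIGIN/APP_ORIGIN" wording, and `https://trigger.example.com` is a placeholder public URL.

```typescript
// Hypothetical shape of the relevant env values after schema parsing.
interface OriginEnv {
  RUNTIME_API_ORIGIN?: string; // new, optional, runner-only origin
  API_ORIGIN?: string; // public API origin for external clients
  APP_ORIGIN: string; // assumed to always be set
}

// Prefer the runner-only origin when present, otherwise keep the existing chain.
function resolveRunnerApiUrl(env: OriginEnv): string {
  return env.RUNTIME_API_ORIGIN ?? env.API_ORIGIN ?? env.APP_ORIGIN;
}

// Existing deployment: nothing new is set, so the behavior is unchanged.
console.log(resolveRunnerApiUrl({ APP_ORIGIN: "https://trigger.example.com" }));

// Opt-in deployment: runners get the in-cluster origin, external clients keep API_ORIGIN.
console.log(
  resolveRunnerApiUrl({
    RUNTIME_API_ORIGIN: "http://webapp:3000",
    API_ORIGIN: "https://trigger.example.com",
    APP_ORIGIN: "https://trigger.example.com",
  })
);
```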

Changes

  • apps/webapp/app/env.server.ts: new optional RUNTIME_API_ORIGIN env.
  • apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts: prefer RUNTIME_API_ORIGIN when resolving TRIGGER_API_URL/TRIGGER_STREAM_URL for managed (deployed) runner pods, falling back to the existing chain. Dev CLI runs keep the existing API_ORIGIN/APP_ORIGIN chain.
  • hosting/k8s/helm/values.yaml + templates/webapp.yaml: expose webapp.runtimeApiOrigin (defaults to empty -> existing behavior).
  • hosting/docker/webapp/docker-compose.yml + hosting/docker/.env.example: same opt-in for docker self-host.

Testing

Validated on our self-hosted v4 staging cluster (kgateway + Envoy tracing).

Before (runners going through public URL, gateway rewrites traceparent):

  • Parent runner SDK triggerAndWait() wrapper spanId: 070bcfdd63b42d2a (confirmed via wire-level undici:request:headers debug)
  • Webapp receives traceparent with spanId: b8298ebb884ade7e (gateway-rewritten)
  • Child TaskRun.parentSpanId = b8298ebb... (orphan: never in event store)

After (with RUNTIME_API_ORIGIN pointed at the in-cluster service):

  • Parent SDK injects spanId 66fa71fda94ccdb9
  • Webapp receives spanId 66fa71fda94ccdb9 unchanged
  • Child TaskRun.parentSpanId = 66fa71fda94ccdb9, which matches the parent's triggerAndWait() event in the TaskEvent table
  • Run-detail tree renders the child nested under the parent, matching SaaS behavior
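
The before/after comparison boils down to one check: does the child's parentSpanId exist among the span ids recorded for the parent? A rough sketch of that check, using the span ids measured above; `ChildRun` and `findOrphans` are hypothetical names, the run ids are placeholders, and the real event store lookup is replaced by an in-memory set.

```typescript
// Hypothetical, simplified view of a child run row.
interface ChildRun {
  id: string;
  parentSpanId: string;
}

// A child is an orphan when its parentSpanId was never recorded as a span
// in the event store (modeled here as a plain set of span ids).
function findOrphans(children: ChildRun[], recordedSpanIds: Set<string>): ChildRun[] {
  return children.filter((child) => !recordedSpanIds.has(child.parentSpanId));
}

// Before: the SDK recorded 070bcfdd63b42d2a, but the child carries the gateway's span id.
const before = findOrphans(
  [{ id: "child-before", parentSpanId: "b8298ebb884ade7e" }],
  new Set(["070bcfdd63b42d2a"])
);

// After: the span id arrives unchanged, so the child links to its parent.
const after = findOrphans(
  [{ id: "child-after", parentSpanId: "66fa71fda94ccdb9" }],
  new Set(["66fa71fda94ccdb9"])
);

console.log(before.length, after.length);
```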

End-to-end test task at https://github.com/meistrari/trigger-self-tests/blob/main/src/tasks/link-test.ts (parent does linkTestChild.triggerAndWait(...)).


Changelog

  • webapp: add optional RUNTIME_API_ORIGIN env to advertise a runner-only API origin separate from API_ORIGIN. Lets self-hosted operators route runner-to-webapp traffic cluster-internally, bypassing tracing-enabled gateways that rewrite the W3C traceparent header on egress and break parent-to-child run linkage in the trace tree. Optional and backward-compatible (falls back to existing API_ORIGIN/APP_ORIGIN).
  • hosting/k8s/helm + hosting/docker: expose the new env via webapp.runtimeApiOrigin (Helm) and RUNTIME_API_ORIGIN (docker-compose).

Screenshots

N/A — no UI change. The visible effect is the run-detail tree correctly showing child task runs nested under their parent's triggerAndWait() span (matching trigger.dev SaaS behavior).

changeset-bot Bot commented May 21, 2026

⚠️ No Changeset found

Latest commit: da151f6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR adds a new optional RUNTIME_API_ORIGIN environment variable to support self-hosted deployments where traffic routes through tracing-aware gateways. The variable is defined in the environment schema with normalization for empty strings, documented in Docker Compose configuration, and wired into Helm values and the webapp container template via conditional rendering. The value allows operators to specify an in-cluster origin that takes precedence in runner URL resolution.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly describes the primary change: adding RUNTIME_API_ORIGIN to separate runner traffic from the external origin. |
| Description check | ✅ Passed | Description comprehensively covers the problem, solution, testing results, and changes. All required template sections are addressed with detailed explanations. |
| Linked Issues check | ✅ Passed | Changes fully address issue #2821 by implementing RUNTIME_API_ORIGIN to allow runners to bypass tracing-enabled gateways and preserve W3C traceparent span linkage. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing RUNTIME_API_ORIGIN configuration across webapp, Helm, and Docker deployment artifacts per the objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00%, which is sufficient. The required threshold is 80.00%. |

@ThullyoCunha ThullyoCunha marked this pull request as ready for review May 21, 2026 13:46
@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 RUNTIME_API_ORIGIN not passed through in docker-compose.yml, making the feature inoperative for Docker self-hosting (hosting/docker/webapp/docker-compose.yml:45)

The env.server.ts comment at lines 136-137 explicitly references ${RUNTIME_API_ORIGIN:-} passthroughs in docker-compose.yml, and the .env.example documents RUNTIME_API_ORIGIN as a user-configurable option (hosting/docker/.env.example:57). However, hosting/docker/webapp/docker-compose.yml does not include RUNTIME_API_ORIGIN in its environment: section (lines 42-85). Docker Compose only passes env vars to a container when they are explicitly listed in the environment: block; values from .env are used for variable substitution in the compose file and are not automatically forwarded as container env vars. As a result, Docker self-hosting users who set RUNTIME_API_ORIGIN in their .env file will find it has no effect: the webapp container never receives the value, and runners continue using API_ORIGIN/APP_ORIGIN. The Helm chart (hosting/k8s/helm/templates/webapp.yaml:189-192) handles this correctly, but the Docker path is broken.

View 4 additional findings in Devin Review.

@ThullyoCunha ThullyoCunha force-pushed the feat/runtime-api-origin branch from ebca943 to eb32a7f Compare May 21, 2026 15:54
…external origin

The webapp publishes `API_ORIGIN` to runner pods as `TRIGGER_API_URL`, so
runner-to-webapp traffic flows back through whatever URL is configured for
external clients. Self-hosting behind a tracing-enabled gateway (Envoy,
Istio, kgateway, ...) breaks the parent->child run link in trigger.dev's
run-detail tree because the gateway's W3C `traceparent` rewrite on egress
overwrites the SDK's `triggerAndWait()` span id. The webapp then writes
that gateway-generated span id as the child run's `parentSpanId`, which
never reaches the trigger event store, so the child renders as an orphan
in the UI.

Operators can split the two concerns without sacrificing external auth/
callbacks/UI flows that rely on the public `API_ORIGIN`:

- Set `RUNTIME_API_ORIGIN=http://<service>.<namespace>:<port>` (k8s) or
  `http://webapp:3000` (docker) to keep runner->webapp traffic on a
  cluster-internal hop that bypasses the gateway.
- Leave `API_ORIGIN` on the public URL so the dashboard, magic-link
  emails, waitpoint callbacks, and API `apiUrl` responses keep working
  for external clients.

Scope is intentionally limited to MANAGED (deployed) runs. Dev CLI runs
keep the original `API_ORIGIN`/`APP_ORIGIN` chain so a developer running
`trigger.dev dev` from outside the cluster does not lose connectivity.
`STREAM_ORIGIN` is still honored as a dedicated stream endpoint when set;
`RUNTIME_API_ORIGIN` takes precedence over it for `TRIGGER_STREAM_URL`
so the bypass keeps streams on the same internal hop by default.
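
A rough sketch of the precedence described above, under stated assumptions: `RuntimeKind`, `ScopeEnv`, and `resolveRunnerUrls` are hypothetical names, and the dev-run stream chain (`STREAM_ORIGIN`, then the public origin) is an assumption about the pre-existing behavior rather than something this message spells out.

```typescript
// Hypothetical labels for the two kinds of runs discussed above.
type RuntimeKind = "MANAGED" | "DEV";

interface ScopeEnv {
  RUNTIME_API_ORIGIN?: string;
  STREAM_ORIGIN?: string;
  API_ORIGIN?: string;
  APP_ORIGIN: string;
}

interface RunnerUrls {
  TRIGGER_API_URL: string;
  TRIGGER_STREAM_URL: string;
}

function resolveRunnerUrls(env: ScopeEnv, kind: RuntimeKind): RunnerUrls {
  const publicOrigin = env.API_ORIGIN ?? env.APP_ORIGIN;
  // Only managed (deployed) runs may use the internal origin; dev CLI runs
  // stay on the public chain so they remain reachable from outside the cluster.
  const internalOrigin = kind === "MANAGED" ? env.RUNTIME_API_ORIGIN : undefined;
  return {
    TRIGGER_API_URL: internalOrigin ?? publicOrigin,
    // RUNTIME_API_ORIGIN wins over STREAM_ORIGIN so streams share the internal hop.
    TRIGGER_STREAM_URL: internalOrigin ?? env.STREAM_ORIGIN ?? publicOrigin,
  };
}
```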

The new env is optional and falls back to `API_ORIGIN`/`APP_ORIGIN`, so
existing deployments are unaffected. An empty string is normalized to
`undefined` in the zod schema so blank `${RUNTIME_API_ORIGIN:-}`
passthroughs from caller environments do not short-circuit the fallback
chain. Helm chart and Docker Compose are wired to forward the value to
the webapp container.
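
Why the empty-string normalization matters, as a plain TypeScript sketch: the nullish-coalescing operator only skips `null` and `undefined`, so a blank passthrough would otherwise win the fallback. `normalizeOptionalOrigin` and `pickOrigin` are hypothetical helpers standing in for the schema-level handling and do not use zod.

```typescript
// Hypothetical stand-in for the schema normalization: "" becomes undefined.
function normalizeOptionalOrigin(raw: string | undefined): string | undefined {
  return raw === undefined || raw === "" ? undefined : raw;
}

// Hypothetical fallback step: `??` keeps any non-nullish value, including "".
function pickOrigin(runtimeOrigin: string | undefined, fallback: string): string {
  return runtimeOrigin ?? fallback;
}

// Without normalization, a blank ${RUNTIME_API_ORIGIN:-} passthrough yields an empty origin.
const unnormalized = pickOrigin("", "http://webapp:3000");

// With normalization, the blank value is dropped and the fallback applies.
const normalized = pickOrigin(normalizeOptionalOrigin(""), "http://webapp:3000");

console.log(JSON.stringify(unnormalized), JSON.stringify(normalized));
```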

Refs: triggerdotdev#2821
@ThullyoCunha ThullyoCunha force-pushed the feat/runtime-api-origin branch from eb32a7f to da151f6 Compare May 21, 2026 16:13
Development

Successfully merging this pull request may close these issues.

bug: OpenTelemetry Spans Not Displaying in Self-Hosted Instance Timeline