feat(supervisor): publish client-side dequeue API latency as a Prometheus histogram by myftija · Pull Request #3887 · triggerdotdev/trigger.dev

myftija · 2026-06-10T11:56:27Z

The supervisor's dequeue round-trip time (POST /engine/v1/worker-actions/dequeue) was measured but only flowed into wide events and OTel span attributes — there was no Prometheus series, so latency percentiles and error rates weren't queryable. This adds queue_consumer_pool_dequeue_duration_seconds (histogram, label outcome=success|empty|error) to the existing consumer-pool metrics, scraped automatically by the existing ServiceMonitors on queue-raider/schedule-raider/supervisor.

- Records every dequeue call, including failed ones, which previously emitted no timing at all
- The pool's shared ConsumerPoolMetrics instance is injected into each consumer (mirrors the BackpressureMetrics → BackpressureMonitor wiring)
- Buckets extend to 30s because wrapZodFetch retries internally (5 attempts, ≥7.5s backoff before a retryable error surfaces)
- Existing dequeueResponseMs wide-event/span behavior unchanged
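
The timing behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `observeDequeueLatency` and the `success`/`empty`/`error` outcomes are named in the PR, but the method signature, the response shape, and the names `DequeueLatencyRecorder`, `DequeueResult`, `classifyOutcome`, and `startDequeueTimer` are assumptions made for the example.

```typescript
type DequeueOutcome = "success" | "empty" | "error";

// Assumed signature: the PR names the method but does not show its parameters.
interface DequeueLatencyRecorder {
  observeDequeueLatency(outcome: DequeueOutcome, seconds: number): void;
}

// Hypothetical response envelope for a client that reports failures instead of throwing.
interface DequeueResult {
  success: boolean;
  data?: unknown[];
}

// Map a response to the outcome label used on the histogram.
function classifyOutcome(result: DequeueResult): DequeueOutcome {
  if (!result.success || !result.data) return "error";
  return result.data.length > 0 ? "success" : "empty";
}

// Start timing one logical dequeue call. The returned function records exactly one
// observation, so failed calls are timed the same way as successful ones.
function startDequeueTimer(
  metrics: DequeueLatencyRecorder,
  now: () => number = Date.now
): (outcome: DequeueOutcome) => void {
  const startedAt = now();
  let done = false;
  return (outcome) => {
    if (done) return; // ignore repeated calls so a call is never counted twice
    done = true;
    metrics.observeDequeueLatency(outcome, (now() - startedAt) / 1000);
  };
}
```

A consumer would call `startDequeueTimer` just before issuing the request and invoke the returned function with `classifyOutcome(response)` once the response (or an error) arrives, so the single observation covers the whole call.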

…heus histogram

The dequeue round-trip time was only visible in wide events and span attributes, so there was no way to query latency percentiles or error rates. Record it as queue_consumer_pool_dequeue_duration_seconds with an outcome label (success/empty/error), covering failed and timed-out calls that previously emitted no timing at all. The pool's shared ConsumerPoolMetrics instance is injected into each consumer, mirroring how BackpressureMetrics is wired into BackpressureMonitor.
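
The injection described in this commit might look roughly like the sketch below. Only the `ConsumerPoolMetrics` name comes from the PR; the `Consumer`, `ConsumerPool`, `ConsumerOptions`, and `createConsumer` names are hypothetical, the metrics class is reduced to an in-memory list, and preferring a caller-supplied instance over the shared one is an assumption about the intended precedence.

```typescript
type DequeueOutcome = "success" | "empty" | "error";

// In-memory stand-in for the real metrics class, which wraps a Prometheus histogram.
class ConsumerPoolMetrics {
  readonly observations: Array<{ outcome: DequeueOutcome; seconds: number }> = [];

  observeDequeueLatency(outcome: DequeueOutcome, seconds: number): void {
    this.observations.push({ outcome, seconds });
  }
}

// Hypothetical consumer options: metrics may or may not be supplied by the caller.
interface ConsumerOptions {
  metrics?: ConsumerPoolMetrics;
}

class Consumer {
  readonly metrics: ConsumerPoolMetrics;

  constructor(options: { metrics: ConsumerPoolMetrics }) {
    this.metrics = options.metrics;
  }
}

class ConsumerPool {
  // One shared instance, so every consumer feeds the same series.
  private readonly metrics = new ConsumerPoolMetrics();

  createConsumer(options: ConsumerOptions = {}): Consumer {
    // Assumed precedence: keep a caller-supplied instance, otherwise inject the shared one.
    return new Consumer({ metrics: options.metrics ?? this.metrics });
  }
}
```

Sharing one instance keeps a single histogram per pool rather than one per consumer, which is what lets the pool-level series aggregate every dequeue call.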

…review fixes

The HTTP client retries internally (5 attempts, >=7.5s of backoff before a retryable error surfaces), so the 5s bucket ceiling would have pushed nearly every retried error into +Inf. Extend buckets to 30s and state in the help text that one observation spans the whole logical call including retries. Also: stop clobbering a caller-supplied consumer metrics instance, correct the catch-branch comment (defensive only - wrapZodFetch never throws), and cover the pool-to-consumer metrics injection with tests.
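
The bucket-ceiling problem can be illustrated with a small sketch. The bucket layouts below are hypothetical apart from the 5s and 30s ceilings mentioned in the commit message, and `smallestBucket` is a made-up helper rather than part of any metrics library.

```typescript
// Returns the smallest bucket upper bound that contains the observation,
// or Infinity when only the implicit +Inf bucket does.
function smallestBucket(seconds: number, upperBounds: number[]): number {
  const sorted = upperBounds.slice().sort((a, b) => a - b);
  for (const le of sorted) {
    if (seconds <= le) return le;
  }
  return Infinity;
}

// Hypothetical layouts: only the 5s and 30s ceilings come from the commit message.
const before = [0.05, 0.1, 0.5, 1, 2.5, 5];
const after = before.concat([10, 30]);

// A retried error surfaces after at least 7.5s of backoff.
const retriedError = 7.5;

console.log(smallestBucket(retriedError, before)); // Infinity
console.log(smallestBucket(retriedError, after)); // 10
```

With the 5s ceiling, the retried error is only counted in +Inf, so no finite bucket can say how slow it was; with the extended layout it falls in a finite bucket.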

changeset-bot · 2026-06-10T11:56:32Z

🦋 Changeset detected

Latest commit: 6fe5dda

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages

Name	Type
@trigger.dev/core	Patch
@trigger.dev/build	Patch
trigger.dev	Patch
@trigger.dev/plugins	Patch
@trigger.dev/python	Patch
@trigger.dev/redis-worker	Patch
@trigger.dev/schema-to-json	Patch
@trigger.dev/sdk	Patch
@internal/cache	Patch
@internal/clickhouse	Patch
@internal/llm-model-catalog	Patch
@trigger.dev/rbac	Patch
@internal/redis	Patch
@internal/replication	Patch
@internal/run-engine	Patch
@internal/schedule-engine	Patch
@internal/testcontainers	Patch
@internal/tracing	Patch
@internal/tsql	Patch
@internal/zod-worker	Patch
@internal/sdk-compat-tests	Patch
@trigger.dev/react-hooks	Patch
@trigger.dev/rsc	Patch
@trigger.dev/database	Patch
@trigger.dev/otlp-importer	Patch

coderabbitai · 2026-06-10T11:56:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5741ce28-0b93-4d91-b855-f9d236c3039b

📥 Commits

Reviewing files that changed from the base of the PR and between 16b693c and 6fe5dda.

📒 Files selected for processing (1)

packages/core/src/v3/runEngineWorker/supervisor/consumerPoolMetrics.ts

🚧 Files skipped from review as they are similar to previous changes (1)

packages/core/src/v3/runEngineWorker/supervisor/consumerPoolMetrics.ts

Walkthrough

This PR adds a Prometheus histogram for client-side dequeue round-trip latency (labelled by DequeueOutcome: "success" | "empty" | "error"), exposes observeDequeueLatency on ConsumerPoolMetrics, wires the pool’s shared metrics instance into created consumers (with a caller-metrics fallback), measures and records latency in RunQueueConsumer.dequeue() for success/empty/error paths, and adds tests verifying metrics wiring and correct outcome-labeled observations.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The description provides comprehensive context: what changed (Prometheus histogram for dequeue latency), why (enable queryable latency metrics), technical details (buckets, outcome labels, injection pattern), and impact (all dequeue calls now measured). However, it does not follow the provided template structure with explicit checklist items or testing/changelog sections.	Consider restructuring the description to match the repository template: add the checklist with confirmations, separate Testing and Changelog sections, and clarify test coverage for the new metrics.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding a Prometheus histogram metric to publish dequeue API latency in the supervisor, which matches the core objective of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

devin-ai-integration

Devin Review found 0 potential issues.

View 3 additional findings in Devin Review.

devin-ai-integration

Devin Review found 0 new potential issues.

View 5 additional findings in Devin Review.

…g-poll boundary

The server parks empty dequeues on a ~10s blocking pop, so nearly all observations land just above 10s. With only a 10s and a 30s bucket, histogram_quantile interpolated p95/p99 to ~28-30s while the true latency was ~10-11s. Add 11/12.5/15/20s buckets so quantiles read accurately where the distribution actually sits.
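
The interpolation effect can be reproduced with a simplified re-implementation of the linear interpolation that `histogram_quantile` applies to classic bucketed histograms. The observation counts (2 fast calls, 98 calls at about 10.5s) and the bounds below 10s are made up for illustration; only the 10s, 11/12.5/15/20s, and 30s bounds come from the commit message.

```typescript
interface Bucket {
  le: number; // upper bound in seconds (Infinity for +Inf)
  count: number; // cumulative count of observations <= le
}

// Simplified sketch: find the bucket holding the target rank, then interpolate
// linearly between that bucket's lower and upper bound.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  const rank = q * total;
  let i = 0;
  while (i < sorted.length - 1 && sorted[i].count < rank) i++;
  // A rank that falls in +Inf is reported as the highest finite bound.
  if (sorted[i].le === Infinity) return sorted[i - 1].le;
  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  return lower + ((sorted[i].le - lower) * (rank - below)) / (sorted[i].count - below);
}

// Build cumulative buckets from raw observations.
function toBuckets(bounds: number[], values: number[]): Bucket[] {
  return bounds.concat([Infinity]).map((le) => ({
    le,
    count: values.filter((v) => v <= le).length,
  }));
}

// Made-up sample: 2 quick dequeues, 98 that waited out the ~10s blocking pop.
const observations: number[] = [];
for (let n = 0; n < 2; n++) observations.push(0.2);
for (let n = 0; n < 98; n++) observations.push(10.5);

const coarse = toBuckets([1, 5, 10, 30], observations);
const fine = toBuckets([1, 5, 10, 11, 12.5, 15, 20, 30], observations);

console.log(histogramQuantile(0.95, coarse).toFixed(2)); // 28.98
console.log(histogramQuantile(0.95, fine).toFixed(2)); // 10.95
```

With the coarse layout every slow observation sits in the (10s, 30s] bucket, so the estimate is stretched across the full 20s width; the 11s bound narrows that bucket to 1s and the estimate stays close to the real value.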

…austed error envelope

pkg-pr-new · 2026-06-10T14:03:06Z

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@6fe5dda

trigger.dev

npm i https://pkg.pr.new/trigger.dev@6fe5dda

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@6fe5dda

@trigger.dev/plugins

npm i https://pkg.pr.new/@trigger.dev/plugins@6fe5dda

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@6fe5dda

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@6fe5dda

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@6fe5dda

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@6fe5dda

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@6fe5dda

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@6fe5dda

commit: 6fe5dda

devin-ai-integration

Devin Review found 0 new potential issues.

View 6 additional findings in Devin Review.

## Summary

7 improvements, 1 bug fix.

## Improvements

- `trigger init` now sets up your AI coding assistant as part of project setup: pick the MCP server, the agent skills, or both, then scaffold with the CLI or hand off to your assistant. Adds a new `getting-started` agent skill that teaches assistants how to bootstrap Trigger.dev (install the SDK, write `trigger.config.ts`, create a first task, run `trigger dev`), so the AI-driven setup path works end to end. It ships in the CLI alongside the existing skills, version-matched to your SDK. ([#3872](#3872))
- `dev` and `deploy` now fail with a clear error when two tasks are defined with the same id, including across different task types (e.g. a scheduled task and a regular task sharing an id). Previously the second definition silently overwrote the first, so one of the tasks would vanish with no warning. Task ids are detected as duplicates during indexing (naming each offending id and the files it was found in), and the same rule is enforced server-side when the background worker is registered. ([#3865](#3865))
- `trigger skills` installs Trigger.dev agent skills into your coding agent so it knows how to write tasks, schedules, realtime, and chat.agent code. The skills ship with the CLI and are copied into each tool's native skills directory (Claude Code, Cursor, GitHub Copilot, and Codex / AGENTS.md), and `trigger dev` offers to install them on first run. ([#3868](#3868))
- Reliability fixes for `chat.agent`. A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn), input appends now carry an idempotency key so a retried send can't duplicate a message, stopping a generation clears the streaming state so a page reload doesn't replay the stopped turn, and runs can now carry the full set of dashboard tags instead of being silently truncated. `onTurnComplete` now fires on errored turns (with the thrown error attached) and the failed turn's user message is persisted so it isn't lost on the next run. Custom agents and manual `chat.writeTurnComplete` callers now trim the output stream, sending a custom action no longer leaves a second stream reader running, and a long-lived `watch` subscription no longer grows its dedupe set without bound. ([#3891](#3891))
- Continuation chat boots no longer stall for around 10 seconds before the first turn. The `session.in` resume cursor is now found with a non-blocking records read instead of draining an SSE long-poll (which always waited out its full 5 second inactivity window, twice per boot), the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. ([#3907](#3907))
- Record client-side dequeue API latency in the supervisor consumer pool as a Prometheus histogram (`queue_consumer_pool_dequeue_duration_seconds`, labelled by `outcome`: success/empty/error). ([#3887](#3887))
- Add `GetProjectEnvironmentsResponseBody` and `ProjectEnvironment` schemas for the new `GET /api/v1/projects/{projectRef}/environments` endpoint, which lists the parent environments (dev, staging, preview, prod) a personal access token can access for a project. Dev is scoped to the token owner and branch (preview child) environments are excluded. ([#3880](#3880))

## Bug fixes

- Fix two `chat.createSession()` bugs: stopping a generation no longer wedges the run (the turn loop raced a `totalUsage` promise that never settles after a stop-abort), and continuation runs now wait for the next message instead of invoking the model with an empty prompt. ([#3920](#3920))

<details> <summary>Raw changeset output</summary> ⚠️

⚠️

⚠️ `main` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `main`. ⚠️

⚠️

⚠️

# Releases

## @trigger.dev/build@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

## trigger.dev@4.5.0-rc.6

### Patch Changes

- `trigger init` now sets up your AI coding assistant as part of project setup: pick the MCP server, the agent skills, or both, then scaffold with the CLI or hand off to your assistant. Adds a new `getting-started` agent skill that teaches assistants how to bootstrap Trigger.dev (install the SDK, write `trigger.config.ts`, create a first task, run `trigger dev`), so the AI-driven setup path works end to end. It ships in the CLI alongside the existing skills, version-matched to your SDK. ([#3872](#3872))
- `dev` and `deploy` now fail with a clear error when two tasks are defined with the same id, including across different task types (e.g. a scheduled task and a regular task sharing an id). Previously the second definition silently overwrote the first, so one of the tasks would vanish with no warning. Task ids are detected as duplicates during indexing (naming each offending id and the files it was found in), and the same rule is enforced server-side when the background worker is registered. ([#3865](#3865))
- `trigger skills` installs Trigger.dev agent skills into your coding agent so it knows how to write tasks, schedules, realtime, and chat.agent code. The skills ship with the CLI and are copied into each tool's native skills directory (Claude Code, Cursor, GitHub Copilot, and Codex / AGENTS.md), and `trigger dev` offers to install them on first run. ([#3868](#3868))

  ```bash
  trigger skills --target claude-code
  ```

  Replaces the previous `install-rules` command, which stays as an alias.

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`
  - `@trigger.dev/build@4.5.0-rc.6`
  - `@trigger.dev/schema-to-json@4.5.0-rc.6`

## @trigger.dev/core@4.5.0-rc.6

### Patch Changes

- Reliability fixes for `chat.agent`. A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn), input appends now carry an idempotency key so a retried send can't duplicate a message, stopping a generation clears the streaming state so a page reload doesn't replay the stopped turn, and runs can now carry the full set of dashboard tags instead of being silently truncated. `onTurnComplete` now fires on errored turns (with the thrown error attached) and the failed turn's user message is persisted so it isn't lost on the next run. Custom agents and manual `chat.writeTurnComplete` callers now trim the output stream, sending a custom action no longer leaves a second stream reader running, and a long-lived `watch` subscription no longer grows its dedupe set without bound. ([#3891](#3891))
- Continuation chat boots no longer stall for around 10 seconds before the first turn. The `session.in` resume cursor is now found with a non-blocking records read instead of draining an SSE long-poll (which always waited out its full 5 second inactivity window, twice per boot), the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. ([#3907](#3907))
- Record client-side dequeue API latency in the supervisor consumer pool as a Prometheus histogram (`queue_consumer_pool_dequeue_duration_seconds`, labelled by `outcome`: success/empty/error). ([#3887](#3887))
- `dev` and `deploy` now fail with a clear error when two tasks are defined with the same id, including across different task types (e.g. a scheduled task and a regular task sharing an id). Previously the second definition silently overwrote the first, so one of the tasks would vanish with no warning. Task ids are detected as duplicates during indexing (naming each offending id and the files it was found in), and the same rule is enforced server-side when the background worker is registered. ([#3865](#3865))
- Add `GetProjectEnvironmentsResponseBody` and `ProjectEnvironment` schemas for the new `GET /api/v1/projects/{projectRef}/environments` endpoint, which lists the parent environments (dev, staging, preview, prod) a personal access token can access for a project. Dev is scoped to the token owner and branch (preview child) environments are excluded. ([#3880](#3880))

## @trigger.dev/python@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/sdk@4.5.0-rc.6`
  - `@trigger.dev/core@4.5.0-rc.6`
  - `@trigger.dev/build@4.5.0-rc.6`

## @trigger.dev/react-hooks@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/redis-worker@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/rsc@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/schema-to-json@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/sdk@4.5.0-rc.6

### Patch Changes

- Reliability fixes for `chat.agent`. A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn), input appends now carry an idempotency key so a retried send can't duplicate a message, stopping a generation clears the streaming state so a page reload doesn't replay the stopped turn, and runs can now carry the full set of dashboard tags instead of being silently truncated. `onTurnComplete` now fires on errored turns (with the thrown error attached) and the failed turn's user message is persisted so it isn't lost on the next run. Custom agents and manual `chat.writeTurnComplete` callers now trim the output stream, sending a custom action no longer leaves a second stream reader running, and a long-lived `watch` subscription no longer grows its dedupe set without bound. ([#3891](#3891))
- Continuation chat boots no longer stall for around 10 seconds before the first turn. The `session.in` resume cursor is now found with a non-blocking records read instead of draining an SSE long-poll (which always waited out its full 5 second inactivity window, twice per boot), the boot reads run concurrently, and chat snapshots carry the cursor so subsequent boots skip the scan entirely. ([#3907](#3907))
- Fix `chat.headStart` when `hydrateMessages` is registered. The warm route's step-1 partial now reaches the agent's accumulator on the hydrate path, so `onTurnComplete` carries the full first turn (the head-start user message included), tool-call handovers resume from step 2 instead of re-running step 1, and the assistant `messageId` stays stable across the handover. ([#3907](#3907))
- Preserve reasoning parts across the `chat.headStart` handover. Extended-thinking models' step-1 reasoning now lands in the durable session history (and `onTurnComplete`) under the same assistant `messageId`, with provider metadata intact so Anthropic thinking signatures survive replays. ([#3907](#3907))
- Fix two `chat.createSession()` bugs: stopping a generation no longer wedges the run (the turn loop raced a `totalUsage` promise that never settles after a stop-abort), and continuation runs now wait for the next message instead of invoking the model with an empty prompt. ([#3920](#3920))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

## @trigger.dev/plugins@4.5.0-rc.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.6`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

myftija added 2 commits June 10, 2026 13:40

devin-ai-integration Bot reviewed Jun 10, 2026

chore: add changeset for dequeue latency histogram

e2e9ee0

devin-ai-integration Bot reviewed Jun 10, 2026

myftija added 2 commits June 10, 2026 15:57

feat(supervisor): 60s dequeue latency bucket to bracket the retry-exh…

6fe5dda

…austed error envelope

devin-ai-integration Bot reviewed Jun 10, 2026

nicktrn approved these changes Jun 10, 2026

myftija merged commit 081b6ba into main Jun 10, 2026
55 checks passed

myftija deleted the supervisor-dequeue-latency-metric branch June 10, 2026 14:35

github-actions Bot mentioned this pull request Jun 10, 2026

chore: release v4.5.0-rc.6 #3870

Merged

myftija merged 5 commits into main from supervisor-dequeue-latency-metric

2 participants
