
refactor(telemetry): migrate application spans to plugin hooks #1289

Open
ajbozarth wants to merge 1 commit into generative-computing:main from ajbozarth:feat/1048-session-span-plugin


@ajbozarth ajbozarth commented Jun 18, 2026

Contributor

Pull Request

Issue

Fixes #1048. Phase 2 of #444.

Description

Action span emission moves from inline trace_application calls in stdlib/functional.py to a new ComponentTracingPlugin subscribed to component_* hooks. Session and start_session spans stay in stdlib but route through clean typed helpers in tracing.py instead of the deprecated trace_application surface. Hooks fit spans whose lifecycle lives inside one async function (one Task); session lifecycle crosses multiple _run_async_in_thread calls and OTel Token attach/detach is task-affine, so direct emission is the right shape there.
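The task-affinity constraint mentioned above can be illustrated with the standard library alone. This is a minimal sketch, not Mellea or OTel code: `current_span` and `attach` are hypothetical stand-ins, and the only assumption about OTel is that its default Python runtime context is built on `contextvars`. A `Token` produced by `ContextVar.set` in one `Context` cannot be passed to `reset` in a different `Context`.

```python
import contextvars

# Hypothetical stand-in for the "active span" slot; not an OTel API.
current_span = contextvars.ContextVar("current_span", default=None)


def attach(name):
    """Set the active span name and return the Token needed to undo it."""
    return current_span.set(name)


# Two independent Contexts, standing in for two separate asyncio Tasks.
ctx_a = contextvars.copy_context()
ctx_b = contextvars.copy_context()

# "Attach" inside the first Context.
token = ctx_a.run(attach, "session")

# Trying to "detach" from the second Context fails with ValueError,
# because the Token is bound to the Context that created it.
try:
    ctx_b.run(current_span.reset, token)
    cross_context_reset_ok = True
except ValueError:
    cross_context_reset_ok = False

# Detaching inside the originating Context works and restores the default.
ctx_a.run(current_span.reset, token)
same_context_value = ctx_a.run(current_span.get)
```

This is why a span whose start and end run in the same Task fits a hook pair, while a span opened in one Task and closed in another does not.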

Reviewer call-outs:

  1. _run_async_in_thread now propagates the calling thread's contextvars into the new Task. One-way, no copy-back. Without it, asyncio.run_coroutine_threadsafe starts the new Task with empty contextvars, so contextvar-backed state on the user thread (OTel active span, Mellea's session_id / request_id / model_id / sampling_iteration, log context) is invisible inside Mellea's async work. That includes the session > action > chat hierarchy that the docs have promised since #355 (feat: instrument telemetry). With this PR, that hierarchy actually forms in real usage for the first time. ~6 lines in event_loop_helper.py; aligns Mellea with how asyncio.create_task already inherits Context within a thread.

  2. Span renames. session_context → session; aact → action. session_context's suffix was redundant, and aact named a Mellea internal function rather than the operation. Trace-tooling consumers filtering on the old names need to update.

  3. Scope-narrowing change vs. the original issue. Issue #1048 (feat: session and act spans via plugin hooks) asked for session and act spans via plugin hooks. Action lands as planned (ComponentTracingPlugin); session does not. Session lifecycle crosses multiple _run_async_in_thread calls (sync __enter__, sync m.act(...), sync __exit__/cleanup()), each scheduled as a separate Task with its own contextvars.Context. OTel Token.reset is bound to the originating Context, so a span attached in one hook's Task can't be detached in another. Session spans now emit directly from MelleaSession.__enter__/__exit__ and start_session() via typed helpers in tracing.py. The SESSION_* hooks still fire as observation points for non-tracing plugins.

  4. Component payload component_id. Adds a UUID correlation field on ComponentPreExecutePayload / ComponentPostSuccessPayload / ComponentPostErrorPayload, matching the existing generation_id pattern on generation payloads. Plugin authors subscribing to component_* hooks gain a correlation key.

  5. Bug fix from the #1181 review (refactor(telemetry)!: move backend tracing onto plugin/hook pattern). mellea.response is now gated on MELLEA_TRACES_CONTENT / OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT. Before this PR it was captured unconditionally; this is the first real caller of is_content_tracing_enabled() (per @planetf1's review on #1181).

  6. Streaming out of scope. mellea/stdlib/streaming.py was added in PR #1095 (feat(stdlib): add streaming event types, events() iterator, and OTEL bridge (#902)) mid-epic, after the original Phase 2 plan was set, and it imports the deprecated trace_application surface. Migrating it is tracked as a follow-up Phase 2 sub-issue, #1290 (feat(telemetry): migrate stream_with_chunking orchestration span to plugin hooks). The four deprecated public helpers (trace_application, set_span_attribute, set_span_error, set_span_status_error) stay in tracing.py until streaming migrates off them.

  7. Docs touched only where existing claims went stale. Span-hierarchy diagrams updated for the renames; one stale Phase-1 entry corrected (mellea.backend is a string identifier, not a class name). Broader docs cleanup is tracked in #945 (docs: add Telemtry example for getting full traces).

  8. Bug fix: mellea.sampling_success now reflects the actual flag. Before this PR the attribute was bool(sampling_result.result), i.e. the truthiness of the chosen ModelOutputThunk, which is True whenever sampling produced any non-None output even if no requirement passed. After this PR it reads sampling_result.success, the flag the field name has always promised. Traces where sampling exhausted retries without satisfying requirements will now correctly show mellea.sampling_success=false.
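The difference described in call-out 8 can be sketched with a stand-in object. `FakeSamplingResult` is hypothetical and models only the two fields named above (`result` and `success`); a plain string stands in for the real ModelOutputThunk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSamplingResult:
    """Hypothetical stand-in with only the two fields the PR text names."""

    result: Optional[str]  # the chosen output (a ModelOutputThunk in Mellea)
    success: bool          # whether the requirements were actually satisfied


# Sampling exhausted its retries: an output exists, but no requirement passed.
exhausted = FakeSamplingResult(result="best-effort output", success=False)

# Old attribute value: truthiness of the output, True for any non-empty output.
old_attribute = bool(exhausted.result)

# New attribute value: the explicit success flag.
new_attribute = exhausted.success
```

For this case the old expression reports success while the new one reports failure, which matches the behavior change described for exhausted-retry traces.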

BREAKING CHANGE: Span names renamed:

  • session_context → session
  • aact → action

Trace-tooling consumers filtering on the old names need to update.
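For consumers that filter on span names, one way to handle traces from both before and after the rename is a small normalization table. This is a sketch only: `SPAN_RENAMES` and `normalize_span_name` are hypothetical helpers, not part of Mellea, and the only names taken from this PR are the two renames plus the `chat` span from the documented hierarchy.

```python
# Old span name -> new span name, as listed in the BREAKING CHANGE note.
SPAN_RENAMES = {
    "session_context": "session",
    "aact": "action",
}


def normalize_span_name(name: str) -> str:
    """Map a pre-rename span name to its new name; leave other names alone."""
    return SPAN_RENAMES.get(name, name)


# Mixed input: two old names and one name that was not renamed.
names = [normalize_span_name(n) for n in ["session_context", "aact", "chat"]]
```

Applying the mapping at query time lets a dashboard keep one filter while old and new traces coexist.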

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Action span emission moves to ComponentTracingPlugin on component_*
hooks. Session and start_session spans emit directly from stdlib via
typed helpers in tracing.py — session lifecycle crosses _run_async_in_thread
Tasks, which OTel Token pairing can't span across hooks.

Side fix: _run_async_in_thread now propagates calling-thread contextvars
into the new Task (approved with @jakelorocco), so children inside async
work nest under user-thread spans. Delivers the docs-promised
session > action > chat trace tree for the first time.
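The propagation described above can be sketched with the standard library. This is not the code in event_loop_helper.py: `run_with_caller_context`, `start_loop_thread`, and `request_id` are hypothetical names, and the sketch assumes one possible approach, which is to snapshot the caller's Context explicitly and hand it to `loop.call_soon_threadsafe(..., context=...)`. It shows both properties named in the call-outs: the value is visible inside the Task, and changes made inside the Task are not copied back.

```python
import asyncio
import concurrent.futures
import contextvars
import threading

# Hypothetical contextvar standing in for state such as a request id.
request_id = contextvars.ContextVar("request_id", default=None)


def start_loop_thread() -> asyncio.AbstractEventLoop:
    """Run an event loop on a background daemon thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def run_with_caller_context(loop, coro_fn):
    """Run coro_fn() on `loop` inside a copy of the caller's Context."""
    ctx = contextvars.copy_context()  # snapshot taken on the calling thread
    result = concurrent.futures.Future()

    def _transfer(task):
        # Forward the Task outcome to the thread-safe Future.
        if task.cancelled():
            result.cancel()
        elif task.exception() is not None:
            result.set_exception(task.exception())
        else:
            result.set_result(task.result())

    def _schedule():
        # Runs on the loop thread inside `ctx`; the new Task copies that
        # context, so it starts with the caller's values.
        task = loop.create_task(coro_fn())
        task.add_done_callback(_transfer)

    loop.call_soon_threadsafe(_schedule, context=ctx)
    return result.result(timeout=5)


async def read_then_overwrite():
    seen = request_id.get()
    request_id.set("changed-inside-task")  # stays inside the Task's copy
    return seen


loop = start_loop_thread()
request_id.set("req-1")
seen_inside = run_with_caller_context(loop, read_then_overwrite)
seen_after = request_id.get()  # the caller's value is unchanged
loop.call_soon_threadsafe(loop.stop)
```

Because the Task works on a copy, the flow is one-way: the caller's `request_id` is unchanged after the call returns.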

Closes generative-computing#1048.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

feat: session and act spans via plugin hooks

1 participant