chore: instrument controlplane and cas with OpenTelemetry tracing #3081
Merged
javirln merged 9 commits into chainloop-dev:main on May 4, 2026
Add distributed tracing across both services using OpenTelemetry SDK v1.43.0. TracerProvider is wired via Wire and configured through proto-based config.
- Add pkg/otelx helper with LayeredTracer for automatic chainloop.layer tagging
- Add TracerProvider with OTLP gRPC exporter to both controlplane and CAS
- Add otelgrpc stats handlers on both gRPC servers (server + client side)
- Instrument SQL queries via XSAM/otelsql in the controlplane data layer
- Add spans to all biz, data, middleware, and CAS service layer methods
- Extend Observability proto with Tracing config in both services
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
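The automatic `chainloop.layer` tagging in the first bullet can be pictured with a small, SDK-free model. This is a sketch under assumptions: `Span`, the `LayeredTracer` fields and the `Start` signature are hypothetical simplifications (the real helper in `pkg/otelx` wraps OpenTelemetry tracers), and the span name used in `main` is made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, SDK-free stand-in for an OpenTelemetry span: it
// only keeps a name and string attributes.
type Span struct {
	Name  string
	Attrs map[string]string
}

// LayeredTracer (hypothetical shape) tags every span it starts with the
// architectural layer it belongs to, e.g. "biz" or "data".
type LayeredTracer struct {
	Layer    string
	Recorded []*Span // spans started through this tracer, kept for inspection
}

// Start copies the caller's attributes and then sets "chainloop.layer" last,
// so the layer tag is always present and cannot be overridden by call sites.
func (t *LayeredTracer) Start(name string, attrs map[string]string) *Span {
	s := &Span{Name: name, Attrs: make(map[string]string, len(attrs)+1)}
	for k, v := range attrs {
		s.Attrs[k] = v
	}
	s.Attrs["chainloop.layer"] = t.Layer
	t.Recorded = append(t.Recorded, s)
	return s
}

func main() {
	biz := &LayeredTracer{Layer: "biz"}
	// "ExampleUseCase.Create" is a made-up span name for illustration.
	span := biz.Start("ExampleUseCase.Create", map[string]string{"org.id": "123"})

	keys := make([]string, 0, len(span.Attrs))
	for k := range span.Attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, span.Attrs[k])
	}
}
```

Call sites only pass a span name and call-specific attributes; the layer tag comes from whichever tracer instance the layer holds.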
…hing
- Replace raw t.Fatal/t.Errorf with testify assert/require in otelx tests
- Fix staticcheck QF1008: remove redundant embedded field selector
- Update integration test mock expectations from exact ctx to mock.Anything
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
…tests
Update dispatcher_test.go, integration_test.go, and organization_integration_test.go to use mock.Anything for context parameters in Register, Attach, and SaveCredentials mock expectations.
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
…s disabled
Change NewTracerProvider to return trace.TracerProvider interface with noop.NewTracerProvider() when disabled, eliminating nil checks in consumers. The data layer now always uses otelsql since the provider is always valid.
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
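The "never nil" contract described in this commit can be modeled without the SDK. In the actual change the disabled branch returns `noop.NewTracerProvider()`; here `Tracer`, `TracerProvider`, `TracingConfig` and the two provider types are hypothetical, pared-down stand-ins used only to show why consumers no longer need nil checks.

```go
package main

import "fmt"

// Tracer and TracerProvider are hypothetical, pared-down stand-ins for the
// trace.Tracer and trace.TracerProvider interfaces.
type Tracer interface{ Start(name string) string }
type TracerProvider interface{ Tracer(scope string) Tracer }

// noopTracer records nothing; it exists so callers can use it unconditionally.
type noopTracer struct{}

func (noopTracer) Start(string) string { return "" }

type noopProvider struct{}

func (noopProvider) Tracer(string) Tracer { return noopTracer{} }

// exportingTracer stands in for a real tracer; it returns a span label.
type exportingTracer struct{ scope string }

func (t exportingTracer) Start(name string) string { return t.scope + "/" + name }

type exportingProvider struct{}

func (exportingProvider) Tracer(scope string) Tracer { return exportingTracer{scope: scope} }

// TracingConfig is hypothetical; only the Enabled switch matters here.
type TracingConfig struct{ Enabled bool }

// NewTracerProvider never returns nil: when tracing is disabled it hands back
// a no-op provider, so consumers (e.g. the SQL instrumentation) can always use it.
func NewTracerProvider(cfg TracingConfig) TracerProvider {
	if !cfg.Enabled {
		return noopProvider{}
	}
	return exportingProvider{}
}

func main() {
	for _, enabled := range []bool{false, true} {
		tp := NewTracerProvider(TracingConfig{Enabled: enabled})
		// No nil check needed in the consumer.
		fmt.Printf("enabled=%v span=%q\n", enabled, tp.Tracer("data").Start("query"))
	}
}
```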
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
5 issues found across 98 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. cubic prioritises the most important files to review.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/controlplane/pkg/biz/casbackend_checker.go">
<violation number="1" location="app/controlplane/pkg/biz/casbackend_checker.go:67">
P2: The new span wraps the entire checker lifecycle instead of each backend-check run, creating a long-lived parent span and coupling sampling across all periodic executions.</violation>
</file>
<file name="app/controlplane/internal/server/otel.go">
<violation number="1" location="app/controlplane/internal/server/otel.go:78">
P1: `sampling_ratio=0` is handled as `AlwaysSample`, so a config intended to disable tracing instead samples every trace.</violation>
</file>
<file name="app/controlplane/internal/conf/controlplane/config/v1/conf.proto">
<violation number="1" location="app/controlplane/internal/conf/controlplane/config/v1/conf.proto:66">
P2: `sampling_ratio` should use explicit presence (`optional`) to distinguish unset from an explicit `0.0` value.</violation>
</file>
<file name="app/artifact-cas/internal/server/otel.go">
<violation number="1" location="app/artifact-cas/internal/server/otel.go:77">
P2: Out-of-range `sampling_ratio` values are silently mapped to `AlwaysSample`, which can unexpectedly enable 100% tracing under misconfiguration.</violation>
</file>
<file name="app/controlplane/pkg/data/data.go">
<violation number="1" location="app/controlplane/pkg/data/data.go:129">
P2: `min_open_conns` is being mapped to `SetMaxIdleConns`, which does not enforce a minimum open pool size and changes config semantics.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
jiparis reviewed Apr 30, 2026
Adopt the platform's LayeredTracer design: lazy tracer resolution via the global provider, per-layer disable support via SetDisabledLayers, and TraceCarrier for async context propagation. The LayeredTracer no longer embeds trace.Tracer and instead resolves it at span creation time.
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
2 issues found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="pkg/otelx/otelx_test.go">
<violation number="1" location="pkg/otelx/otelx_test.go:67">
P2: Do not reset the global OpenTelemetry tracer provider to nil in test cleanup; it can cause panics in subsequent tracer creation.</violation>
</file>
<file name="pkg/otelx/otelx.go">
<violation number="1" location="pkg/otelx/otelx.go:102">
P1: Returning `trace.SpanFromContext(ctx)` for disabled layers can end the parent span when callers `defer span.End()`. Return an explicit noop span instead.</violation>
</file>
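The second finding is easiest to see with the common `defer span.End()` pattern. In the OTel Go API, `trace.SpanFromContext(ctx)` returns the span already stored in the context, i.e. the caller's span; the merged fix switches to a pre-allocated noop span. The model below is SDK-free and all names (`Span`, `startSpanBuggy`, `startSpanFixed`, `dataLayerCall`) are hypothetical.

```go
package main

import "fmt"

// Span is a hypothetical, minimal span interface: End marks it finished.
type Span interface{ End() }

// recordingSpan remembers its parent and whether End was called.
type recordingSpan struct {
	name   string
	parent Span
	ended  bool
}

func (s *recordingSpan) End() { s.ended = true }

// noopSpan ignores End, so a deferred End on it is always harmless.
type noopSpan struct{}

func (noopSpan) End() {}

// startFunc is the shape shared by both variants below.
type startFunc func(parent Span, layer string, disabled map[string]bool) Span

// startSpanBuggy returns the parent span for a disabled layer: the callee's
// deferred End then closes a span it does not own.
func startSpanBuggy(parent Span, layer string, disabled map[string]bool) Span {
	if disabled[layer] {
		return parent
	}
	return &recordingSpan{name: layer, parent: parent}
}

// startSpanFixed returns a no-op span for a disabled layer instead.
func startSpanFixed(parent Span, layer string, disabled map[string]bool) Span {
	if disabled[layer] {
		return noopSpan{}
	}
	return &recordingSpan{name: layer, parent: parent}
}

// dataLayerCall is a callee using the usual `defer span.End()` pattern.
func dataLayerCall(parent Span, start startFunc, disabled map[string]bool) {
	span := start(parent, "data", disabled)
	defer span.End()
	// ... data-layer work would happen here ...
}

func main() {
	disabled := map[string]bool{"data": true}
	variants := []struct {
		name  string
		start startFunc
	}{{"buggy", startSpanBuggy}, {"fixed", startSpanFixed}}

	for _, v := range variants {
		parent := &recordingSpan{name: "biz"}
		dataLayerCall(parent, v.start, disabled)
		fmt.Printf("%s: parent ended early = %v\n", v.name, parent.ended)
	}
}
```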
- Fix sampling_ratio=0 being treated as AlwaysSample by using optional proto field and explicit 0/1 boundary handling with NeverSample
- Fix disabled layer returning parent span from context (could end parent on defer); use pre-allocated noop span instead
- Move CASBackendChecker span from Start (long-lived loop) to checkBackends (per-iteration) for correct span lifecycle
- Fix test cleanup setting global TracerProvider to nil
- Clarify min_open_conns mapping to MaxIdleConns in log message
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
In client-streaming gRPC, Send can return EOF when the server has already terminated the stream with an error. Ignore Send errors in error-path subtests since the assertion happens on CloseAndRecv.
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
migmartri approved these changes May 4, 2026
This pull request adds OpenTelemetry tracing support to the artifact-cas service, making tracing configurable via the application config and wiring it into the application's startup. The configuration schema, protobuf definitions, and generated code are updated to support tracing options. The gRPC server is now instrumented for tracing, and the configuration file includes a section for observability tracing settings.
Observability and Tracing Integration:
- Added an `observability.tracing` section to the config schema (`conf.proto` and `config.devel.yaml`), allowing tracing to be enabled or disabled and configuring the endpoint, insecure mode, and sampling ratio. [1] [2]
- Updated the generated code (`Bootstrap_Observability_Tracing` message and related getters/setters in `conf.pb.go`). [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]

Application Wiring and Startup:
- Updated the wiring (`wire.go`, `wire_gen.go`, and `main.go`) to pass the full bootstrap config and initialize a `TracerProvider` if tracing is enabled. The tracer provider is now passed to the app and properly cleaned up on shutdown. [1] [2] [3] [4] [5] [6]

gRPC Server Instrumentation:
- Instrumented the gRPC server with the `otelgrpc` stats handler, enabling distributed tracing for all gRPC requests. [1] [2]

Other:
- `wire.go`.

These changes make the service ready for distributed tracing with OpenTelemetry, improving observability and making it easier to diagnose issues in production environments.