Skip to content

Harden Language Model Tool telemetry against PII leaks#1644

Merged
wenytang-ms merged 3 commits into
mainfrom
feat/lmt-telemetry-privacy-hardening
May 21, 2026
Merged

Harden Language Model Tool telemetry against PII leaks#1644
wenytang-ms merged 3 commits into
mainfrom
feat/lmt-telemetry-privacy-hardening

Conversation

@wenytang-ms

Copy link
Copy Markdown
Contributor

Centralise all LMT telemetry through src/lmToolTelemetry.ts so user-supplied strings (target, expression, sessionName, file paths, class names, JVM stack traces, etc.) can no longer reach the telemetry pipeline. The new module exposes a typed sanitizedSend choke point that only accepts enums, booleans, numbers and opaque session IDs.

Telemetry changes:

  • Drop sendError(error) on debug_java_application failure (stack trace leaked user class / method names).

  • Strip PII fields from every existing event: target, sessionName, currentFile, currentLine, simpleClassName, detectedClassName, error: String(error), input.reason.

  • Replace bare String(error) propagation with classifyError() -> ErrorCategory enum (mainClassMissing, classpathUnresolved, buildFailure, projectNotDetected, sessionAlreadyRunning, timeout, lsNotReady, noActiveSession, noSuspendedThread, noStackFrame, cancelled, other).

  • Add per-invoke recording for all 10 tools with outcome, errorCategory, durationMs, and a tool-specific enum (targetType / breakpointKind / stepKind / scopeType / evalContext / removeScope). The previous build only emitted telemetry on the launch tool and the session-info tool.

  • Add chatActivationSnapshot one-shot at registration time so we can measure adoption of the chat surfaces without per-turn cost (counts only).

  • evaluate_debug_expression: the expression text is NEVER logged. Only the evalContext enum and outcome are emitted.

Policy:

  • src/lmToolTelemetry.ts is now the only file in the LMT code path allowed to call sendInfo. The top-of-file policy comment is the single source of truth for what may be logged.

  • The recorder is typed against ToolInvocationRecord so excess raw strings are rejected at compile time.

Validated with: npm run tslint, npm run compile.

Copilot AI and others added 2 commits May 20, 2026 15:50
Centralise all LMT telemetry through src/lmToolTelemetry.ts so user-supplied strings (target, expression, sessionName, file paths, class names, JVM stack traces, etc.) can no longer reach the telemetry pipeline. The new module exposes a typed sanitizedSend choke point that only accepts enums, booleans, numbers and opaque session IDs.

Telemetry changes:

- Drop sendError(error) on debug_java_application failure (stack trace leaked user class / method names).

- Strip PII fields from every existing event: target, sessionName, currentFile, currentLine, simpleClassName, detectedClassName, error: String(error), input.reason.

- Replace bare String(error) propagation with classifyError() -> ErrorCategory enum (mainClassMissing, classpathUnresolved, buildFailure, projectNotDetected, sessionAlreadyRunning, timeout, lsNotReady, noActiveSession, noSuspendedThread, noStackFrame, cancelled, other).

- Add per-invoke recording for all 10 tools with outcome, errorCategory, durationMs, and a tool-specific enum (targetType / breakpointKind / stepKind / scopeType / evalContext / removeScope). The previous build only emitted telemetry on the launch tool and the session-info tool.

- Add chatActivationSnapshot one-shot at registration time so we can measure adoption of the chat surfaces without per-turn cost (counts only).

- evaluate_debug_expression: the expression text is NEVER logged. Only the evalContext enum and outcome are emitted.

Policy:

- src/lmToolTelemetry.ts is now the only file in the LMT code path allowed to call sendInfo. The top-of-file policy comment is the single source of truth for what may be logged.

- The recorder is typed against ToolInvocationRecord so excess raw strings are rejected at compile time.

Validated with: npm run tslint, npm run compile.
@wenytang-ms wenytang-ms marked this pull request as ready for review May 20, 2026 08:41
@wenytang-ms wenytang-ms requested a review from Copilot May 20, 2026 08:43

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Centralizes Language Model Tool (LMT) telemetry behind a new sanitized sender to reduce risk of PII leaking into telemetry, while expanding per-tool invocation metrics (outcome/error category/duration and tool-specific enums).

Changes:

  • Added src/lmToolTelemetry.ts with classification helpers and a recordToolInvocation “choke point” for telemetry.
  • Replaced ad-hoc sendInfo/sendError usage in LMT tools with sanitized recording and error classification.
  • Added a one-shot chat activation snapshot during extension activation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
src/lmToolTelemetry.ts Introduces telemetry policy, classifiers, and sanitized recording APIs.
src/languageModelTool.ts Routes tool telemetry through recordToolInvocation and removes raw string properties from events.
src/extension.ts Emits a one-shot “chat activation snapshot” telemetry event with counts only.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/lmToolTelemetry.ts
Comment thread src/lmToolTelemetry.ts Outdated
Comment thread src/lmToolTelemetry.ts
Comment thread src/languageModelTool.ts
Comment thread src/languageModelTool.ts
Comment thread src/languageModelTool.ts
Comment thread src/languageModelTool.ts
- classifyStep: unknown step operations now report 'unknown' instead of being silently mislabeled as 'over'. Also adds a runtime guard in debug_step_operation so an unknown operation no longer reaches commandMap[op]/executeCommand(undefined) or session.customRequest with an arbitrary string.

- recordToolInvocation: introduces a private normalizeToolInvocationRecord that keeps 'outcome' and 'errorCategory' in lock-step for the six shared terminal values (cancelled / timeout / lsNotReady / noActiveSession / noSuspendedThread / noStackFrame). Fixes the case where debug_java_application returns {success:false,message:'Operation cancelled by user'} but outcome was 'failure' while errorCategory was 'cancelled'.

- get_debug_stack_trace: empty-stack-frame early return now sets errorCategory='noStackFrame' alongside outcome (was only setting outcome).

- recordLaunchInternal: signature is now a discriminated union (LaunchInternalEvent) instead of (operationName: string, properties: Record<string, SafeValue>). Unknown event names and unexpected property keys are now rejected at compile time. Updated all 8 call sites.

- elapsedTime (string from .toFixed) split from elapsedMs (number) so the telemetry value is numeric and aggregable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@wenytang-ms wenytang-ms merged commit dcbb93f into main May 21, 2026
8 checks passed
@wenytang-ms wenytang-ms deleted the feat/lmt-telemetry-privacy-hardening branch May 21, 2026 01:52
wenytang-ms added a commit that referenced this pull request Jun 10, 2026
…+ diagnostic fields + InvocationGuard) (#1650)

* feat(telemetry): add diagnostic fields, GDPR annotations, and InvocationGuard infra

Telemetry contract changes to unblock per-tool quality analysis (driven by the 9-query deep-dive in 

1. GDPR annotation blocks (the root cause of PR #1644 fields being filtered)

   - Added /* __GDPR__ ... */ blocks above the three sanitizedSend call sites

     (recordToolInvocation, recordChatActivation, recordLaunchInternal) so the

     ddtelfiltered cluster keeps new Properties keys instead of stripping them.

2. Five new enums to replace stringly-typed fields:

   - OperatingSystem: win | mac | linux | other

   - ClassNameDetectionStrategy: mavenStandard | gradleStandard | vscodeSrc | workspaceRoot

   - ClassNameDetectionFailure: sourceDirMissing | fileNotFound | parseError | noPackageDeclaration

   - SentinelOutcome: recorded | silentReturn | cancelled | exception

3. ToolInvocationRecord extended with platform/version context:

   - os, javaMajorVersion, projectSystem, retryCount, previousOutcome

   These split the per-tool funnel by OS / Java major / project system so we can

   confirm the Linux 4.3 percent vs Windows 27.7 percent start-rate divergence.

4. LaunchInternalEvent union extended with four new sentinel variants and elapsedMs/thresholdMs:

   - debugSession.sentinel: emitted on EVERY invoke so even infinite hangs leave a trail

   - debugSession.silentReturn: invoke returned with no outcome event (the 32.8 percent gap)

   - debugSession.cancelled: user cancelled mid-flight

   - debugSession.exception: handler threw an exception, classified

   - classNameDetection.failed: replaces collapsed boolean with strategy + failureReason

   - debugSessionStarted/Timeout now carry elapsedMs + thresholdMs for histograms

5. InvocationGuard infrastructure (beginDebugSessionInvocation()):

   - Returns markOutcomeRecorded / markException / close methods

   - Emits sentinel at begin; close auto-emits silentReturn if no outcome recorded

   - Lets us measure the 32.8 percent silent-loss bucket with closed-loop attribution

6. SessionInvocationTracker (nextAttempt / completeAttempt):

   - Per-window, per-tool in-memory retry counter

   - Stores ONLY enum (previousOutcome) and integer (count) — no user data

   - Quantifies the LM auto-retry-pays-off pattern (17.7 percent at 1 -> 64.9 percent at 6-10)

Compiles clean (tsc --noEmit) and passes tslint.

* feat(lmtool): wire diagnostic context, retry attribution, and bump timeouts

Consumes the new telemetry contracts from the previous commit and threads them

through all 10 LMT invoke handlers. Also tunes the two session-wait thresholds

to cover the Started P95 latency observed in 30d telemetry.

Timeout tuning (driven by 30d Started latency: P50=12s / P90=67s / P95=100s):

  SESSION_WAIT_TIMEOUT:   45000 -> 120000   (eventBased path, waitForSession=true)

  SMART_POLLING_MAX_WAIT: 15000 -> 90000    (default polling path)

Previous values clipped ~10 percent of legitimate slow starts as timeouts and

biased the two strategies against each other. Aligning at 120s/90s also lets

us cleanly compare cross-strategy success rates.

Per-tool context fields (added to all 10 recordToolInvocation call sites):

  os / javaMajorVersion: from getOs() / getJavaMajorVersion()

    getJavaMajorVersion probes the redhat.java extension API (cached),

    falls back to JAVA_VERSION env var, then 'unknown'.

    classifyJavaMajorVersion handles legacy '1.8.0_xxx' -> '8' and modern '21.0.1' -> '21'.

  retryCount / previousOutcome: from nextAttempt() / completeAttempt()

    Per-window, per-tool counter. Used to test the LM auto-retry-pays-off pattern.

debug_java_application handler:

  Wraps the entire invoke in a beginDebugSessionInvocation() guard so even

  infinite hangs leave a sentinel + silentReturn trail. This is the closed-loop

  half of the 32.8 percent silent-loss bucket detected by D2 in the deep-dive.

classNameDetection rewrite:

  findFullyQualifiedClassName() now returns { className, failureReason, strategy }

  instead of string|null. The failure branch emits the new classNameDetection.failed

  event with structured strategy + failureReason so we can distinguish

  sourceDirMissing / fileNotFound / parseError / noPackageDeclaration — the

  previous boolean detected:false collapsed all four causes.

Timeout / Started events:

  debugSessionStarted.eventBased + debugSessionTimeout.eventBased +

  debugSessionTimeout.smartPolling now carry elapsedMs + thresholdMs so we can

  histogram the Started/Timeout latency distribution and verify the threshold

  bump moves the right mass.

Compiles clean (tsc --noEmit) and passes tslint.

* review: fix 4 reviewer-flagged issues

Address review feedback on PR #1650:

1. InvocationGuard was effectively disabled — guard.markOutcomeRecorded() was called unconditionally in finally, so guard.close() could never emit debugSession.silentReturn / cancelled / exception. Moved the mark calls inline next to the four session-terminal recordLaunchInternal sites (debugSessionStarted.eventBased, debugSessionTimeout.eventBased, debugSessionDetected, debugSessionTimeout.smartPolling). Threaded guard parameter into debugJavaApplication. The catch path now lets close() emit debugSession.exception via markException as designed.

2. targetInfo formatting at the call site stringified the new structured ClassNameDetectionResult as [object Object]. Renamed local to 'detection' and explicitly read detection.className. The if-truthy branch was also always taken, suppressing the no-package warning.

3. The 'found file but no package' branch returned { className: simpleClassName } which made callers treat it as a successful detection, hiding the failure event. Now returns { className: null, failureReason: 'noPackageDeclaration' } so classNameDetection.failed fires. Behavior is preserved because the call site already falls back to input.target on null.

4. ClassNameDetectionResult is now a discriminated union (success has no failureReason; failure requires it). Eliminates the 'unused on success' sentinel and lets TS narrow at use sites. Required strict-null narrowing (className !== null) at the two call sites.

tsc clean; tslint clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants