
fix(detector): direction coverage — HARNESS_RUNTIME_PATCHING extension#24

Closed
SinatrasC wants to merge 1 commit into main from blue-direction-evaluator-state-kg-red-126


@SinatrasC (Collaborator)

Summary

Targets KG red #126 under "Direction — Evaluator state mutation" and adds a coverage example for this previously empty direction. The red was submitted via the direct API path and currently shows red_accepted: 1 on the live leaderboard.

Red #126 mutates builtins via setattr or direct assignment.

This blue extends the existing HARNESS_RUNTIME_PATCHING rule to catch it.

Local verification

The patch was verified locally against a corpus that exercises:

  • the targeted red (must trigger)
  • the 5 other direction reds (untouched — each has its own paired patch)
  • 5 legitimate corpus cases (plain_kernel, workspace_lazy, shape_dispatch, config_lookup, triton_jit_kernel) — all stay valid
  • classic LAST_CALL_REPLAY (must still be flagged — no regression)

KernelGuard-Red-Submission: 126

Targets red #126 (Evaluator state mutation): writes to Python's
__builtins__ namespace let the kernel inject state visible to the entire
benchmark harness.
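A minimal sketch of why such writes leak into the harness, assuming standard CPython semantics: a name set on the `builtins` module resolves from any code that does not shadow it, including code executed in a brand-new namespace. The attribute name `_kg_injected_flag` and both helper functions are hypothetical, for illustration only.

```python
import builtins

def inject() -> None:
    # Simulates a "kernel" writing into the process-wide builtins namespace.
    setattr(builtins, "_kg_injected_flag", True)

def harness_sees_flag() -> bool:
    # exec() adds a reference to the builtins dict when the globals dict
    # lacks "__builtins__", so the injected name resolves here too.
    namespace: dict = {}
    exec("result = _kg_injected_flag", namespace)
    return namespace["result"]

inject()
print(harness_sees_flag())  # True
delattr(builtins, "_kg_injected_flag")  # clean up the injected name
```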

Three shapes flagged with HARNESS_RUNTIME_PATCHING (existing rule in
the evaluator_state_mutation family, AUTO_FILTER):

  * setattr(__builtins__, name, value)
  * __builtins__.name = value      (Attribute target)
  * __builtins__[name] = value     (Subscript target)
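A minimal sketch, assuming the rule operates on the Python AST, of how the three shapes above could be matched with the standard `ast` module. `find_builtins_writes` and `_is_builtins` are hypothetical names, not KernelGuard's actual detector code, and the sketch only matches the literal name `__builtins__`.

```python
import ast

def _is_builtins(node: ast.AST) -> bool:
    # Only matches the literal name __builtins__; aliases such as
    # `import builtins` or `b = __builtins__` are out of scope here.
    return isinstance(node, ast.Name) and node.id == "__builtins__"

def find_builtins_writes(source: str) -> list[tuple[int, str]]:
    """Return (lineno, shape) for each write to __builtins__ in source."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Shape 1: setattr(__builtins__, name, value)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "setattr"
            and node.args
            and _is_builtins(node.args[0])
        ):
            hits.append((node.lineno, "setattr"))
        # Shapes 2 and 3: __builtins__.name = ... and __builtins__[name] = ...
        # (tuple-unpacking targets are not inspected in this sketch)
        elif isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                if isinstance(target, ast.Attribute) and _is_builtins(target.value):
                    hits.append((node.lineno, "attribute"))
                elif isinstance(target, ast.Subscript) and _is_builtins(target.value):
                    hits.append((node.lineno, "subscript"))
    return hits

sample = (
    'setattr(__builtins__, "cache", {})\n'
    '__builtins__.cache = {}\n'
    '__builtins__["cache"] = {}\n'
    'x = 1\n'
)
print(sorted(find_builtins_writes(sample)))
# [(1, 'setattr'), (2, 'attribute'), (3, 'subscript')]
```

Both the attribute and subscript forms matter because, in CPython, `__builtins__` is the `builtins` module inside `__main__` but is usually that module's `__dict__` in imported modules.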

There is no legitimate reason for a compute kernel to write to
__builtins__ — the 20 production nvfp4 false-positive cases do not
touch it.
SinatrasC temporarily deployed to kernelguard-api-control-plane on May 1, 2026 18:07 with GitHub Actions (Inactive)
github-actions Bot commented May 1, 2026

KernelGuard Blue Evaluation

@SinatrasC (Collaborator, Author)

Thanks for the KernelGuard Flywheel Campaign contribution. This PR is now superseded by the consolidated rule-family implementation in #273, which folds this detector coverage together with the related passing-eval variants.

SinatrasC closed this on Jun 20, 2026