fix(detector): direction coverage — FAKE_BENCHMARK_EMIT extension#26

Closed
SinatrasC wants to merge 1 commit into main from blue-direction-fake-output-kg-red-128

@SinatrasC

Collaborator

Summary

Targets KG red #128 under "Direction — Fake benchmark output injection". Coverage example for the previously-empty direction; the red was submitted via the direct API path and is currently red_accepted: 1 on the live leaderboard.

Targets red #128 (print/stdout injection from custom_kernel).

This blue: FAKE_BENCHMARK_EMIT extension.

Local verification

Patch was verified locally against a corpus that exercises:

  • the targeted red (must trigger)
  • the 5 other direction reds (untouched — each has its own paired patch)
  • 5 legitimate corpus cases (plain_kernel, workspace_lazy, shape_dispatch, config_lookup, triton_jit_kernel) — all stay valid
  • classic LAST_CALL_REPLAY (must still be flagged — no regression)

KernelGuard-Red-Submission: 128

…rness-output spoofing

Targets red #128 (Fake benchmark output injection):

    def custom_kernel(data):
        print('BENCHMARK_PASSED', flush=True)
        return data.clone()

The existing FAKE_BENCHMARK_EMIT detector catches very specific shapes
(_emit_fake_benchmark function, builtins.print reassignment, the literal
'Ranked Benchmark' string). It does NOT catch a plain print() call from
inside the entrypoint — yet that's the simplest possible harness-output
spoof.

Add an AST walk inside detect_print_injection that, for each entrypoint-
named function in the tree, flags:

  * print(...)
  * sys.stdout.write(...) / sys.stderr.write(...)
  * os.write(1, ...) / os.write(2, ...), i.e. writes to fd 1 (stdout) or fd 2 (stderr)

Any of those inside custom_kernel earns FAKE_BENCHMARK_EMIT. Real
compute kernels do not emit to stdout/stderr at runtime, and none of the
20 production nvfp4 false-positive cases contains such a call.
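
The walk described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the body of detect_print_injection is not shown here, so `ENTRYPOINT_NAMES`, `_is_emit_call`, and `find_entrypoint_emits` are hypothetical names, and the sketch assumes `custom_kernel` is the only entrypoint name and that `sys` and `os` are referenced without aliases.

```python
import ast

# Assumption: the only entrypoint name of interest is custom_kernel.
ENTRYPOINT_NAMES = {"custom_kernel"}
STD_STREAMS = {"stdout", "stderr"}
STD_FDS = {1, 2}


def _is_emit_call(call: ast.Call) -> bool:
    """Return True if the call writes to stdout/stderr in one of three shapes."""
    func = call.func
    # Shape 1: print(...)
    if isinstance(func, ast.Name) and func.id == "print":
        return True
    if not isinstance(func, ast.Attribute) or func.attr != "write":
        return False
    target = func.value
    # Shape 2: sys.stdout.write(...) / sys.stderr.write(...)
    if (
        isinstance(target, ast.Attribute)
        and target.attr in STD_STREAMS
        and isinstance(target.value, ast.Name)
        and target.value.id == "sys"
    ):
        return True
    # Shape 3: os.write(1, ...) / os.write(2, ...) with a literal fd
    if isinstance(target, ast.Name) and target.id == "os" and call.args:
        fd = call.args[0]
        return isinstance(fd, ast.Constant) and fd.value in STD_FDS
    return False


def find_entrypoint_emits(source: str) -> list[int]:
    """Return sorted line numbers of emit calls inside entrypoint functions."""
    tree = ast.parse(source)
    hits = set()
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name in ENTRYPOINT_NAMES:
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and _is_emit_call(inner):
                    hits.add(inner.lineno)
    return sorted(hits)
```

A purely syntactic check like this misses aliased forms such as `from sys import stdout` or `p = print; p(...)`. Those would need name resolution on top of the walk.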
@SinatrasC SinatrasC temporarily deployed to kernelguard-api-control-plane May 1, 2026 18:07 — with GitHub Actions Inactive
@github-actions

github-actions Bot commented May 1, 2026


KernelGuard Blue Evaluation

@SinatrasC

Collaborator Author

Thanks for the KernelGuard Flywheel Campaign contribution. This PR is now superseded by the consolidated rule-family implementation in #273, which folds this detector coverage together with the related passing-eval variants.

@SinatrasC SinatrasC closed this Jun 20, 2026