chore(evals): Update model evaluations 2026-05-26 by rhacs-bot · Pull Request #135 · stackrox/stackrox-mcp

rhacs-bot · 2026-05-26T07:38:30Z

Automated weekly model evaluation update.

Models evaluated: gpt-5-mini
Date: 2026-05-26

This PR was automatically generated by the Model Evaluation workflow.

codecov-commenter · 2026-05-26T07:41:20Z

❌ 2 Tests Failed:

Tests completed	Failed	Passed	Skipped
380	2	378	12

2 ❄️ flaky test(s):

::policy 1
Flake rate in main: 100.00% (Passed 0 times, Failed 44 times)
Stack Traces | 0s run time
- test violation 1
- test violation 2
- test violation 3

::policy 4
Flake rate in main: 100.00% (Passed 0 times, Failed 44 times)
Stack Traces | 0s run time
- testing multiple alert violation messages 1
- testing multiple alert violation messages 2
- testing multiple alert violation messages 3


github-actions · 2026-05-26T07:49:19Z

E2E Test Results

Commit: fa76bca

=== Evaluation Summary ===

  ✓ cve-clusters-general (assertions: 3/3)
  ✓ list-clusters (assertions: 3/3)
  ✓ cve-detected-workloads (assertions: 3/3)
  ✓ cve-cluster-does-exist (assertions: 3/3)
  ✓ cve-cluster-does-not-exist (assertions: 3/3)
  ✓ cve-detected-clusters (assertions: 3/3)
  ✓ cve-log4shell (assertions: 3/3)
  ~ rhsa-not-supported (assertions: 1/2)
      - MaxToolCalls: Too many tool calls: expected <= 4, got 7
  ✓ cve-multiple (assertions: 3/3)
  ✓ cve-nonexistent (assertions: 3/3)
  ✓ cve-cluster-list (assertions: 3/3)

Tasks:      11/11 passed (100.00%)
Assertions: 31/32 passed (96.88%)
Tokens:     ~67974 (estimate - excludes system prompt & cache)
MCP schemas: ~12562 (included in token total)
Agent used tokens:
  Input:  16455 tokens
  Output: 27327 tokens
Judge used tokens:
  Input:  60248 tokens
  Output: 47068 tokens
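The assertion line of the summary can be recomputed from the per-task counts listed above. This is a minimal sketch, not the evaluation harness's code: `results` is transcribed from the summary, and `check_max_tool_calls` is a hypothetical stand-in that only mirrors the wording of the MaxToolCalls failure. It does not try to derive the "Tasks" line, since rhsa-not-supported is counted as passed despite a failed assertion, so the harness's task-level pass rule is evidently not "all assertions passed".

```python
from typing import Optional

# Per-task (passed, total) assertion counts transcribed from the summary above.
results = {
    "cve-clusters-general": (3, 3),
    "list-clusters": (3, 3),
    "cve-detected-workloads": (3, 3),
    "cve-cluster-does-exist": (3, 3),
    "cve-cluster-does-not-exist": (3, 3),
    "cve-detected-clusters": (3, 3),
    "cve-log4shell": (3, 3),
    "rhsa-not-supported": (1, 2),  # one assertion (MaxToolCalls) failed
    "cve-multiple": (3, 3),
    "cve-nonexistent": (3, 3),
    "cve-cluster-list": (3, 3),
}


def check_max_tool_calls(calls: int, limit: int) -> Optional[str]:
    """Hypothetical MaxToolCalls-style check: failure message, or None if within limit."""
    if calls > limit:
        return f"Too many tool calls: expected <= {limit}, got {calls}"
    return None


passed = sum(p for p, _ in results.values())
total = sum(t for _, t in results.values())
pct = 100.0 * passed / total
print(f"Assertions: {passed}/{total} passed ({pct:.2f}%)")
print(check_max_tool_calls(calls=7, limit=4))
```

Running it prints the same 31/32 (96.88%) as the summary and the rhsa-not-supported failure message.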

Commit fa76bca: Update model evaluations 2026-05-26

rhacs-bot requested a review from janisz as a code owner May 26, 2026 07:38

janisz approved these changes May 26, 2026


janisz merged commit 81ce9af into main May 26, 2026
10 checks passed

janisz deleted the chore/update-model-evaluation-2026-05-26 branch May 26, 2026 09:28

janisz merged 1 commit into main from chore/update-model-evaluation-2026-05-26