
[Experiment] code-review: BCQuality integration arm (live skills) #715

Draft
gggdttt wants to merge 18 commits into main from experiment/code-review/bcquality-integration

gggdttt (Collaborator) commented Jun 29, 2026

Experiment Description

Enable the live BCQuality integration arm for the code-review category — the "after" side of the BCApps #8700 change. Instead of static in-repo checklists, the agent consumes microsoft/BCQuality at runtime: clone (pinned SHA) → filter → route through skills/entry.md, then emit the BC-Bench review.json schema.
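The pinned-SHA clone step can be pictured roughly as follows. This is a minimal sketch, not the PR's `clone_bcquality`: the helper name `pinned_shallow_clone_commands`, the repository URL, and the placeholder SHA are all assumptions made for illustration. It only builds the git command lines, using the common pattern of fetching a single commit at depth 1 and checking it out detached so that the working tree cannot drift with a branch.

```python
from pathlib import Path


def pinned_shallow_clone_commands(repo_url: str, sha: str, dest: Path) -> list[list[str]]:
    """Return git commands that materialize `sha` in `dest` at depth 1.

    Hypothetical helper for illustration; the real module may work differently.
    """
    git = ["git", "-C", str(dest)]
    return [
        ["git", "init", "--quiet", str(dest)],
        git + ["remote", "add", "origin", repo_url],
        # Fetch only the pinned commit, without any history.
        git + ["fetch", "--depth", "1", "origin", sha],
        # Detached checkout, so a moving branch cannot change the content.
        git + ["checkout", "--detach", "FETCH_HEAD"],
    ]


commands = pinned_shallow_clone_commands(
    "https://github.com/microsoft/BCQuality.git",  # assumed URL form
    "0" * 40,  # placeholder, not the SHA pinned in config.yaml
    Path("bcquality-clone"),
)
```

Each entry could be passed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes the pinning logic easy to unit test without network access.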

This is the counterpart to the inline-knowledge (pre-#8700) arm (experiment/code-review/inline-knowledge). Together the two arms give a three-way comparison against the vanilla baseline, with the hypothesized ordering vanilla < inline knowledge < live BCQuality.

Configuration Changes

  • Custom instructions (instructions.enabled: true)
  • Skills (skills.enabled: true)
  • Custom agents
  • MCP servers
  • Other: bcquality.enabled: true, a switch that applies only to the code-review category. The filtered BCQuality clone becomes the Copilot working directory (so the knowledge is read before the diff), the repo under review is made accessible via --add-dir, and static instruction injection is skipped. It has no effect on the bug-fix or test-generation categories.
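The working-directory arrangement described in the last item can be sketched as below. Only the `--add-dir` flag and the choice of the clone as CWD come from this description; the helper name `build_copilot_launch`, the executable name, the returned dict layout, and the example paths are assumptions, and the real wiring in copilot/agent.py passes more options than shown here.

```python
from pathlib import Path


def build_copilot_launch(clone_dir: Path, repo_dir: Path, executable: str = "copilot") -> dict:
    """Describe a launch in which the knowledge clone is the working directory.

    Hypothetical helper: `executable` and the dict layout are illustrative only.
    """
    return {
        # Extra directory the agent may access: the repository under review.
        "args": [executable, "--add-dir", str(repo_dir)],
        # The filtered BCQuality clone is the process working directory.
        "cwd": str(clone_dir),
    }


launch = build_copilot_launch(Path("work/bcquality"), Path("work/repo-under-review"))
```

A caller could then run something like `subprocess.run(launch["args"], cwd=launch["cwd"])`, with the prompt and any other flags appended to `args`.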

Key pieces:

  • config.yaml: bcquality: section (repo + pinned ref SHA, enabled-layers, disabled-skills, knowledge allow/deny globs, task-context dimensions). enabled: true on this branch.
  • agent/shared/codereview_bcquality.py: clone_bcquality (pinned SHA, shallow), filter_clone (mirrors Invoke-BCQualityFilter.ps1, writes _filter-report.json), task-context writer, bootstrap prompt routing through skills/entry.md.
  • copilot/agent.py: live branch wiring (clone as CWD, --add-dir, hooks into the clone).
  • types.py: ExperimentConfiguration.bcquality flag → routes results to the Experiment Leaderboard.
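To make the filtering step concrete, here is a minimal sketch of allow/deny glob filtering that writes a `_filter-report.json`. It is not the PR's `filter_clone` and does not claim to mirror Invoke-BCQualityFilter.ps1: the function name `filter_tree`, the report keys `kept` and `removed`, the glob semantics (Python's `fnmatch`, where `*` also matches `/`), and the example file names are all assumptions.

```python
import fnmatch
import json
import tempfile
from pathlib import Path


def filter_tree(root: Path, allow: list[str], deny: list[str]) -> dict:
    """Delete files that miss every allow glob or hit any deny glob, then write a report."""
    kept, removed = [], []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        allowed = any(fnmatch.fnmatchcase(rel, glob) for glob in allow)
        denied = any(fnmatch.fnmatchcase(rel, glob) for glob in deny)
        if allowed and not denied:
            kept.append(rel)
        else:
            path.unlink()
            removed.append(rel)
    report = {"kept": kept, "removed": removed}  # assumed report layout
    (root / "_filter-report.json").write_text(json.dumps(report, indent=2))
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical knowledge files, created only for this demonstration.
    for rel in ("knowledge/security/auth.md", "knowledge/style/naming.md", "knowledge/style/draft.md"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# placeholder\n")
    report = filter_tree(root, allow=["knowledge/*.md"], deny=["*/draft.md"])
    report_written = (root / "_filter-report.json").is_file()
```

Giving deny precedence over allow keeps the rule easy to reason about: a file survives only if it is explicitly allowed and not explicitly excluded. Writing the report after the pass means the report file itself is never subject to filtering.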

Agent & Model

  • Agent: GitHub Copilot CLI
  • Model: (default)
  • Category: code-review

Hypothesis / Expected Outcome

Consuming BCQuality's live knowledge base should match or exceed the pre-#8700 inline checklists on finding quality (precision/recall/F1 vs gold), since it carries the same domain knowledge plus ongoing BCQuality updates and explicit knowledge-backed routing. Expected ordering: vanilla < inline knowledge < live BCQuality.
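For reference, the finding-quality metrics named above can be computed as in the sketch below. The matching rule is an assumption: a predicted finding counts as correct only if its `(file, line)` pair appears exactly in the gold set, which may differ from how BC-Bench actually matches findings. The file names and line numbers are made up for the example.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for predicted findings against gold findings."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical findings keyed by (file, line); exact-match keys are an assumption.
gold = {("src/Sales.al", 10), ("src/Sales.al", 42), ("src/Post.al", 7)}
predicted = {("src/Sales.al", 10), ("src/Post.al", 7), ("src/Post.al", 99), ("src/Ledger.al", 3)}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

In this example two of four predictions are correct and two of three gold findings are recovered, so precision is 1/2, recall is 2/3, and F1 is 4/7.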

Notes

wenjiefan and others added 18 commits June 25, 2026 14:36
Adds a bcquality config section (default disabled) and a Python module that clones BCQuality at a pinned SHA, filters it per enabled-layers/knowledge globs, and builds the task-context and a skills/entry.md bootstrap prompt -- replicating how microsoft/BCApps consumes microsoft/BCQuality today. Not yet wired into the agent; no effect on existing categories.
- ExperimentConfiguration: add bcquality flag
- copilot agent: live BCQuality branch (clone CWD, --add-dir repo, skip static injection)
- add 23 unit tests for codereview_bcquality module
…line arm

- Extract the 6 faithful domain checklists (accessibility/performance/privacy/
  security/style/upgrade) verbatim from BCApps 30e2b18ca3^ (the version BCApps
  shipped before adopting BCQuality), NOT the benchmark-tuned experiment snapshot
- AGENTS.md: add review section routing /review through the 6 domain checklists
- Enables a faithful before/after comparison: vanilla < old inline < live BCQuality
- Inert by default (instructions.enabled=false); arm activated via config toggle
…nistic severity mapping, relocate bcquality module to agent/shared)