
[Experiment] code-review: inline-knowledge arm (pre-#8700 al-code-review skill) #714

Draft: gggdttt wants to merge 1 commit into main from experiment/code-review/inline-knowledge


gggdttt commented Jun 29, 2026

Collaborator

Experiment Description

Replicate how BCApps production ran Copilot PR review before microsoft/BCApps#8700 (the "inline knowledge" arm) for the code-review category.

Before #8700, the reviewer lived in-repo under tools/Code Review/: an al-code-review orchestrating super-skill that dispatched the 6 domain checklists (security / performance / style / accessibility / upgrade / privacy). #8700 later replaced this with a runtime clone+filter of microsoft/BCQuality ("live skills"). This branch reconstructs the pre-#8700 mechanism faithfully so we can measure it as a treatment arm.

The 6 domain checklists already landed on main via #707. This PR adds the orchestrating skill and wires the config.

Configuration Changes

  • Custom instructions (instructions.enabled: true) — superset; copies the whole microsoft-BCApps/ folder (the al-code-review skill + the 6 instructions/*.md checklists) into <repo>/.github/
  • Skills (skills.enabled: true) — not needed (covered by instructions.enabled)
  • Custom agents (agents.enabled: true, name: ___)
  • MCP servers
  • Other:
    • Add instructions/microsoft-BCApps/skills/al-code-review/SKILL.md (super-skill; references ../../instructions/<domain>.md)
    • config.yaml code-review-template now invokes the al-code-review skill (full-domain, no domain arg) instead of /review. The review.json output schema is unchanged (current evaluator contract preserved)
    • Remove test-generation confounders (agents/ALTest.agent.md, skills/al-test-generation/) to keep the experiment variable clean
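
The copy step behind `instructions.enabled: true` can be pictured roughly as below. This is a minimal sketch, assuming the harness simply mirrors the per-repo folder into `.github/`; the helper name `copy_instructions`, its parameters, and the directory layout used here are hypothetical and not taken from the harness code.

```python
import shutil
from pathlib import Path


def copy_instructions(instructions_root: Path, repo_root: Path,
                      repo_key: str = "microsoft-BCApps") -> Path:
    """Mirror instructions/<repo_key>/ into <repo>/.github/ (hypothetical helper)."""
    source = instructions_root / repo_key
    target = repo_root / ".github"
    # dirs_exist_ok=True merges into an existing .github/ instead of raising
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Under this assumption the skill would land at `.github/skills/al-code-review/SKILL.md`, so its relative references to `../../instructions/<domain>.md` would still resolve inside `.github/`.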

Agent & Model

  • Agent: GitHub Copilot CLI
  • Model: (default)
  • Category: code-review

Hypothesis / Expected Outcome

Injecting the pre-#8700 inline review knowledge (al-code-review skill + 6 domain checklists) should improve code-review quality (precision/recall/F1 of findings against gold) over the vanilla /review baseline, since the agent reviews against explicit domain rules instead of generic judgment. Expected ordering: vanilla < inline knowledge (this arm) < live BCQuality.
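
For reference, the precision/recall/F1 comparison named in the hypothesis could be computed along these lines. This is a sketch only: it assumes each finding can be reduced to a hashable key such as `(file, line)` and that a predicted finding counts as correct only on an exact key match with gold. The real evaluator's matching rule is not described here, and `score_findings` is a hypothetical name.

```python
def score_findings(predicted: set, gold: set) -> dict:
    """Precision, recall and F1 of predicted findings against gold findings.

    Assumes exact-match keys (for example (file, line) tuples); the actual
    evaluator may match findings differently.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With three predicted findings of which two match a gold set of four, this gives precision 2/3, recall 1/2, and F1 4/7.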

Notes

  • Draft only — not meant to merge; serves as the entry point describing exactly what is evaluated.
  • All 81 codereview.jsonl entries target microsoft/BCApps, so only the microsoft-BCApps/ instruction tree matters here.
  • The al-code-review skill content is taken verbatim from the prior proven BC-Bench inline-knowledge run (commit 6c2437b).

Replicate how BCApps prod ran PR review before microsoft/BCApps#8700: the al-code-review orchestrating skill dispatches the 6 domain checklists (security/performance/style/accessibility/upgrade/privacy) already on main.

- Add skills/al-code-review/SKILL.md (super-skill referencing ../../instructions/<domain>.md)

- config.yaml: code-review-template invokes the al-code-review skill (full-domain); instructions.enabled=true so the whole microsoft-BCApps folder (skill + 6 instructions) is copied into .github/

- Remove test-generation confounders (agents/ALTest.agent.md, skills/al-test-generation) to keep the experiment variable clean

- review.json output schema unchanged (current evaluator contract preserved)