[Experiment] code-review: inline-knowledge arm (pre-#8700 al-code-review skill) #714
Draft
gggdttt wants to merge 1 commit into
Replicate how BCApps prod ran PR review before microsoft/BCApps#8700: the al-code-review orchestrating skill dispatches the 6 domain checklists (security/performance/style/accessibility/upgrade/privacy) already on main.

- Add skills/al-code-review/SKILL.md (super-skill referencing ../../instructions/<domain>.md)
- config.yaml: code-review-template invokes the al-code-review skill (full-domain); instructions.enabled=true so the whole microsoft-BCApps folder (skill + 6 instructions) is copied into .github/
- Remove test-generation confounders (agents/ALTest.agent.md, skills/al-test-generation) to keep the experiment variable clean
- review.json output schema unchanged (current evaluator contract preserved)
Experiment Description
Replicate how BCApps prod ran Copilot PR review before microsoft/BCApps#8700 (the "inline knowledge" arm) for the code-review category.

Before #8700, the reviewer lived in-repo under tools/Code Review/: an al-code-review orchestrating super-skill that dispatched the 6 domain checklists (security / performance / style / accessibility / upgrade / privacy). #8700 later replaced this with a runtime clone+filter of microsoft/BCQuality ("live skills"). This branch reconstructs the pre-#8700 mechanism faithfully so we can measure it as a treatment arm.

The 6 domain checklists already landed on main via #707. This PR adds the orchestrating skill and wires the config.

Configuration Changes
- (instructions.enabled: true): superset; copies the whole microsoft-BCApps/ folder (the al-code-review skill + the 6 instructions/*.md checklists) into <repo>/.github/
- (skills.enabled: true): not needed (covered by instructions.enabled)
- (agents.enabled: true, name: ___)
- instructions/microsoft-BCApps/skills/al-code-review/SKILL.md (super-skill; references ../../instructions/<domain>.md)
- config.yaml: code-review-template now invokes the al-code-review skill (full-domain, no domain arg) instead of /review. The review.json output schema is unchanged (current evaluator contract preserved)
- Remove test-generation confounders (agents/ALTest.agent.md, skills/al-test-generation/) to keep the experiment variable clean

Agent & Model
Hypothesis / Expected Outcome
Injecting the pre-#8700 inline review knowledge (al-code-review skill + 6 domain checklists) should improve code-review quality (precision/recall/F1 of findings against gold) over the vanilla /review baseline, since the agent reviews against explicit domain rules instead of generic judgment. Expected ordering: vanilla < inline knowledge (this arm) < live BCQuality.

Notes
- codereview.jsonl entries target microsoft/BCApps, so only the microsoft-BCApps/ instruction tree matters here.
- al-code-review skill content is taken verbatim from the prior proven BC-Bench inline-knowledge run (commit 6c2437b).
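
For reference, the precision/recall/F1 comparison named in the hypothesis could be computed roughly as below. This is a minimal sketch only: the `Finding` type, the `score_findings` helper, and the exact-match rule on (file, line) are assumptions made for illustration, not the actual evaluator contract or the review.json schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding key; the real review.json schema may differ."""
    file: str
    line: int


def score_findings(predicted, gold):
    """Return (precision, recall, f1), assuming exact match on (file, line)."""
    pred_set = set(predicted)
    gold_set = set(gold)
    true_pos = len(pred_set & gold_set)

    # Guard against empty sets so an arm with no findings scores 0, not an error.
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [Finding("a.al", 10), Finding("b.al", 5)]
    predicted = [Finding("a.al", 10), Finding("c.al", 1)]
    print(score_findings(predicted, gold))
```

Scoring each arm (vanilla, inline knowledge, live BCQuality) with the same function against the same gold set would make the expected ordering directly checkable.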