## Background
The group has discussed whether models reliably discover and follow skills delivered via MCP compared to file-based approaches (e.g., agent pointed to a skills folder). While general model reliability is the responsibility of model providers and client harnesses, this group should verify that MCP-based skill delivery works at least as well as file-based delivery — and identify any gaps.
Cross-provider model-level reliability testing is already underway externally in skilljack-evals.
## Scope
Compare skill delivery mechanisms to ensure MCP-based delivery is competitive with file-based approaches:
- File-based (baseline): Agent pointed to a folder containing SKILL.md files — the default approach for most clients today
- MCP-based (under test): Skills delivered as MCP resources, with variations:
  - Resource-only (skill available as a resource for the agent to read)
  - Resource + server instructions (server instruction pointing the agent to the skill resource)
  - Tool-description-embedded (skilljack pattern: skill content in tool descriptions)
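The three MCP variants differ in where the skill surfaces on the wire. A rough sketch of each, grouped by variant (the field shapes follow the MCP specification; the skill name, URI, and all text content are hypothetical, illustrative values):

```json
{
  "resource_only": {
    "resources_list_result": {
      "resources": [
        {
          "uri": "skill://pdf-form-filling",
          "name": "pdf-form-filling",
          "description": "Step-by-step skill for filling PDF forms",
          "mimeType": "text/markdown"
        }
      ]
    }
  },
  "resource_plus_server_instructions": {
    "initialize_result_excerpt": {
      "instructions": "Before any PDF task, read the skill resource at skill://pdf-form-filling and follow it."
    }
  },
  "tool_description_embedded": {
    "tools_list_result_excerpt": {
      "tools": [
        {
          "name": "fill_pdf_form",
          "description": "Fill a PDF form. SKILL: first inspect the form fields, then write values, then verify the saved output.",
          "inputSchema": {
            "type": "object",
            "properties": { "path": { "type": "string" } }
          }
        }
      ]
    }
  }
}
```

In the first variant the agent must notice and read the resource on its own; the second adds a server-level nudge via the `instructions` field of the initialize result; the third inlines the skill into a tool description, so it reaches the model whenever tools are listed.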
## Key questions
- Does MCP-based delivery achieve skill discovery and adherence comparable to file-based delivery?
- Which MCP delivery pattern (resources, server instructions, tool descriptions) is most reliable?
- Are there gaps in MCP that make skill delivery less effective than file-based?
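Each of these questions ultimately reduces to measuring discovery and adherence rates per delivery mode across many trials. A minimal aggregation sketch, assuming hypothetical mode names and hand-written trial records (a real harness would produce such records from agent transcripts):

```python
from collections import defaultdict

def summarize(trials):
    """Aggregate per-mode discovery and adherence rates.

    Each trial record is a dict:
      {"mode": str, "discovered": bool, "adhered": bool}
    """
    stats = defaultdict(lambda: {"n": 0, "discovered": 0, "adhered": 0})
    for t in trials:
        s = stats[t["mode"]]
        s["n"] += 1
        s["discovered"] += t["discovered"]  # True counts as 1
        s["adhered"] += t["adhered"]
    return {
        mode: {
            "discovery_rate": s["discovered"] / s["n"],
            "adherence_rate": s["adhered"] / s["n"],
        }
        for mode, s in stats.items()
    }

# Illustrative records only; mode names mirror the variants above.
trials = [
    {"mode": "file_based", "discovered": True, "adhered": True},
    {"mode": "file_based", "discovered": True, "adhered": False},
    {"mode": "resource_only", "discovered": False, "adhered": False},
    {"mode": "resource_only", "discovered": True, "adhered": True},
]

for mode, rates in summarize(trials).items():
    print(f"{mode}: discovery={rates['discovery_rate']:.0%} "
          f"adherence={rates['adherence_rate']:.0%}")
```

Comparing the resulting rate table across modes, with the file-based column as the baseline, directly answers the first two questions; persistent per-mode shortfalls point at the gaps asked about in the third.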
## Acceptance Criteria
## References