Compare skill delivery mechanisms: file-based vs MCP-based #37

@olaservo

Background

The group has discussed whether models discover and follow skills delivered via MCP as reliably as skills delivered through file-based approaches (e.g., an agent pointed to a skills folder). General model reliability is the responsibility of model providers and client harnesses, but this group should verify that MCP-based skill delivery works at least as well as file-based delivery, and identify any gaps.

Cross-provider model-level reliability testing is already underway externally in skilljack-evals.

Scope

Compare skill delivery mechanisms to ensure MCP-based delivery is competitive with file-based approaches:

  1. File-based (baseline): Agent pointed to a folder containing SKILL.md files — the default approach for most clients today
  2. MCP-based (under test): Skills delivered as MCP resources, with variations:
    • Resource-only (skill available as a resource for the agent to read)
    • Resource + server instructions (server instruction pointing agent to the skill resource)
    • Tool-description-embedded (skilljack pattern — skill content in tool descriptions)
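As a rough sketch of how the conditions above could be enumerated in a test harness, the four delivery mechanisms can be written down as data, with the file-based approach flagged as the baseline. The names `Delivery`, `Condition`, and `build_conditions` are hypothetical and used only for illustration; nothing here calls an MCP SDK or reflects an existing harness.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    """Delivery mechanisms listed in the Scope section (identifiers are illustrative)."""

    FILE_BASED = "file-based"
    MCP_RESOURCE_ONLY = "mcp-resource-only"
    MCP_RESOURCE_WITH_INSTRUCTIONS = "mcp-resource+server-instructions"
    MCP_TOOL_DESCRIPTION = "mcp-tool-description-embedded"


@dataclass(frozen=True)
class Condition:
    """One experimental condition: a delivery mechanism plus a baseline flag."""

    delivery: Delivery
    is_baseline: bool


def build_conditions() -> list[Condition]:
    """Return one condition per delivery mechanism; only file-based is the baseline."""
    return [Condition(d, d is Delivery.FILE_BASED) for d in Delivery]
```

Keeping the conditions in one place like this would make it easier to run the same skill and prompt set under every mechanism, so that only the delivery path varies between runs.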

Key questions

  • Does MCP-based delivery result in comparable skill discovery and adherence vs file-based?
  • Which MCP delivery pattern (resources, server instructions, tool descriptions) is most reliable?
  • Are there gaps in MCP that make skill delivery less effective than file-based?
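One possible way to make these questions measurable, assuming each trial records whether the agent discovered the skill and whether it then followed it, is to compute per-condition discovery and adherence rates and report each MCP pattern's difference from the file-based baseline. The `Trial` record, the condition labels, and both function names are assumptions made for this sketch, not an agreed methodology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Outcome of one run (hypothetical record shape)."""

    condition: str    # e.g. "file-based" or "mcp-resource-only"
    discovered: bool  # did the agent find and read the skill?
    adhered: bool     # did the agent follow the skill's instructions?


def summarize(trials: list[Trial]) -> dict[str, dict[str, float]]:
    """Compute discovery and adherence rates per condition."""
    counts = defaultdict(lambda: {"n": 0, "discovered": 0, "adhered": 0})
    for t in trials:
        c = counts[t.condition]
        c["n"] += 1
        c["discovered"] += int(t.discovered)
        # Adherence only counts when the skill was actually discovered.
        c["adhered"] += int(t.discovered and t.adhered)
    return {
        cond: {
            "discovery_rate": c["discovered"] / c["n"],
            "adherence_rate": c["adhered"] / c["n"],
        }
        for cond, c in counts.items()
    }


def gaps_vs_baseline(
    summary: dict[str, dict[str, float]], baseline: str = "file-based"
) -> dict[str, dict[str, float]]:
    """Return each non-baseline condition's rate minus the baseline rate."""
    base = summary[baseline]
    return {
        cond: {metric: rates[metric] - base[metric] for metric in rates}
        for cond, rates in summary.items()
        if cond != baseline
    }
```

A negative value from `gaps_vs_baseline` would indicate a condition that underperforms the file-based baseline on that metric. Whether a given difference is meaningful depends on the number of trials per condition, which this sketch does not address.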

Acceptance Criteria

  • Documented test methodology comparing file-based vs MCP-based delivery
  • Results for at least 2 MCP delivery patterns vs file-based baseline
  • Identified gaps (if any) where MCP delivery is less reliable
  • Recommendations for how MCP should deliver skills to align with existing patterns
  • Findings added to experimental-findings.md

References

Metadata

    Labels

    research (Investigation or analysis needed)
    skill-reliability (Relates to model adherence or behavioral testing)

    Projects

    Status

    Backlog
