Skip to content

Skills feedback fixes#3

Merged
brando-dill merged 59 commits into
mainfrom
skills-feedback-fixes
Jun 9, 2026
Merged

Skills feedback fixes#3
brando-dill merged 59 commits into
mainfrom
skills-feedback-fixes

Conversation

@george-bafaloukas-forgerock

@george-bafaloukas-forgerock george-bafaloukas-forgerock commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Key Changes
1. Skill Definition Updates (v1.0.0)

Version Bumps: All six primary skills (ping-app-integration, ping-foundation, ping-identity-for-ai, ping-orchestration, ping-quickstart, ping-universal-services) have been bumped to version 1.0.0.

Prompt Routing Refinements: Skill descriptions (SKILL.md) were heavily rewritten to explicitly define boundaries and prevent model confusion. For example, the descriptions now clearly delineate that integrating an SDK goes to ping-app-integration, flow design goes to ping-orchestration, and policy/device configuration goes to ping-universal-services.

2. Eval & Benchmarking Additions

Cross-Model Benchmarking: The README.md now includes a detailed routing evaluation across three model tiers (Haiku 4.5, Sonnet 4.6, and Opus 4.7). It documents specific weak spots for each model (e.g., Opus struggles with the ping-quickstart front door, while Haiku struggles with the dense ping-universal-services description).

Feedback-Driven Prompts: Several new test prompts (labeled "Feedback use cases (June 2026)") were added to the YAML eval files. These test the agent's ability to correctly route requests regarding WS-Federation, Platform SSO, passkeys, authenticator app enrollment, transaction approvals, and MFA device management.

3. Comprehensive MFA Documentation Updates

New MFA Configuration Anchor: Added a brand-new, detailed file (mfa-configuration.md) under ping-universal-services. It covers PingOne MFA policy configuration, supported AMR codes, device management, pairing keys, and headless MFA endpoints.

Skill Disambiguation: Added strict rules across the documentation to clarify that capturing IDs or liveness checks belongs to Verify, MFA factor enrollment inside a flow belongs to Orchestration, and configuring MFA policies or managing devices outside of a flow belongs to Universal Services.

App Integration Enhancements: Added documentation in mobile-integration-basics.md regarding the PingOne MFA SDK, specifically covering push MFA, pairing key prerequisites, and custom AMR strings (face, pin, ftp).

4. Foundation & Quickstart Expansion

Admin & Tenant Setup: Expanded pingone-mt reference files with comprehensive details on the Administrator account registration flow (including email verification and invite expiry), environment URLs, and the isolated "Administrators environment."

Directory Management: Added a breakdown of user ingestion methods (Self-registration, Bulk import, SCIM, etc.) and user detail panel tabs.

Getting Started Sequence: Updated the getting-started-overview.md to reflect the official 8-step sequence for PingOne MT, including mandatory admin MFA enrollment and session timeout behaviors.

5. Automated Manifest Updates

All generated top-*.json reference files were updated with a new timestamp (2026-06-04T13:06:47Z) and adjusted to include the newly added documentation.

…mpts

ping-foundation/SKILL.md:
- description: 'Use this skill' -> 'Use this'; added WS-Federation and
  SSO/Platform SSO to trigger phrases
- When to use: added WS-Federation app registration; added SSO /
  Platform SSO / workforce SSO as explicit trigger
- eval prompts T-56/57/58: WS-Fed, Platform SSO, workforce SSO

ping-orchestration/SKILL.md:
- When to use: added passwordless authentication (passkeys/FIDO2/magic
  links), authenticator app login / TOTP enrollment, transaction
  approvals via email or push (CIBA / out-of-band step-up)
- eval prompts T-55/56/57/58: passwordless, TOTP enrollment, email
  transaction approval, push MFA approval

ping-universal-services/SKILL.md:
- When to use: added PingOne MFA (device management, policy config,
  enrollment API)
- Routing table: added MFA branch pointing to choosing-the-right-service.md
  for MFA vs flow-level routing split
- When NOT to use: clarified MFA node/connector wiring in flows belongs
  in ping-orchestration not here
- Added PingOne Recognize note (not yet GA)
- eval prompts T-52/53: PingOne MFA managed service, enrollment API

All SKILL.md files under 120 lines. Validator and mock L1 eval: all PASS.
Rewrote all 6 descriptions to: trigger instruction first, dense keyword
lists, no prose preamble, explicit NOT trigger guards for ambiguous
overlap cases.

ping-quickstart: explicit trigger phrases, catch-all rule, test/validate
  use case keywords, no router-style padding
ping-foundation: keyword-first list of all setup/admin triggers; added
  WS-Federation, SSO, Platform SSO, MFA policy; "Use this skill whenever"
  retained (assertive language for agent triggering)
ping-orchestration: added passwordless, authenticator app / TOTP, CIBA /
  transaction approvals; restored clarifying-question cue for platform
  comparison prompts
ping-universal-services: PingOne MFA added as named service; explicit NOT
  trigger for flow-level MFA node wiring
ping-app-integration: SDK/code trigger-first; explicit NOT-for guard
  listing Verify, Protect, Authorize at policy level
ping-identity-for-ai: AI/LLM trigger-first; explicit NOT trigger guard
  for automated process / scheduled job without AI context

Validator fixes (from parallel run):
- admin-roles-and-access.md: 2 UI navigation phrases replaced with
  field-table language
- directory-and-populations.md: 1 UI navigation phrase replaced

Content improvements (from parallel run):
- pingone-mt/tenant-and-environment-setup.md: added Environment URLs
  section + Administrators environment section; last_updated 2026-06-04
- getting-started-overview.md: revised PingOne MT setup sequence to
  match official 8-task onboarding; last_updated 2026-06-04

Live Layer 1 eval (Bedrock): 6/6 PASS
  ping-app-integration  100% / 100% / 100%
  ping-foundation        95% / 100% / 100%
  ping-identity-for-ai  100% / 100% / 100%
  ping-orchestration    100% / 100% / 100%
  ping-quickstart        92% / 100% / 100%
  ping-universal-services 100% / 100% / 100%
…ile SDK AMR

New anchor — ping-universal-services/references/curated/mfa-configuration.md:
- PING_ONE_MFA BOM prerequisite and license check
- Workforce vs Customer environment config split
- Device Authentication Policies (= 'MFA Policies' in UI) — field table
- 7 AMR method codes with descriptions (EMAIL, SMS, TEL, OTP, MCA, USER, SWK)
- USER vs MCA distinction (interactive vs silent push)
- Pairing keys — binding mechanism for push MFA; API endpoint
- /deviceAuthentications headless endpoint for non-authorize MFA flows
- Per-user device management, bypass MFA, MyAccount self-service
- PingID-specific config (Workforce only: Offline MFA, Windows login)
- Routing split table (config vs flow vs SDK)
- Common gotchas including BOM missing, pairing key missing, policy not assigned

choosing-the-right-service.md:
- Added PingOne MFA to intent-to-service mapping table
- Updated Verify vs MFA disambiguation rule to clarify three-way split:
  flow-level MFA → orchestration; service config/device mgmt → universal-services

mobile-integration-basics.md:
- Added PingOne MFA SDK section: push MFA, custom AMR strings (face/pin/ftp
  via approve()), pairing key prerequisite, cross-reference to mfa-configuration.md

ping-universal-services/SKILL.md:
- Added mfa-configuration.md to Step 2 routing table

index.json: added mfa-configuration.md path
Manifests: rebuilt (63 curated anchors, up from 62)

Description consistency: fixed ping-identity-for-ai and ping-quickstart
descriptions to match "Use this skill whenever the task involves [gerund]" pattern.
…dex fixes

- Cross-model Layer 1 eval ran against Claude Haiku 4.5, Sonnet 4.6,
  Opus 4.7 on Bedrock. Results, per-tier weak spots, and tuning
  targets documented in README § Eval status.
- Normalised SKILL.md version fields to "1.0.0" (was a mix of
  0.2.0 / 1.0 / 1.0.0).
- Added node-fundamentals.md to references/index.json — was referenced
  from ping-orchestration/SKILL.md but missing from the index.
- Added ## Invocation section to ping-foundation/SKILL.md for
  consistency with the other 5 skills; trimmed redundant blockquote
  + separator to stay within the 120-line budget.
Three targeted description tweaks based on per-model failure analysis:

- ping-universal-services: add explicit "service-in-flow rule" — when
  a Protect/Verify/IGA/Authorize node sits inside a DaVinci flow or
  AIC journey, configuring it belongs here, not in ping-orchestration.
  Targets Haiku 4.5's 69% trigger rate (5/5 misroutes were this).
- ping-orchestration: make the clarifying-question cue imperative
  ("you MUST ask one clarifying question before recommending"). Targets
  Sonnet 4.6's 67% ambiguous score on A-02 (AIC vs DaVinci?).
- ping-quickstart: add "BEFORE any more specialised skill" priority cue
  for orientation framing ("where do we start", "evaluating",
  "migrating"). Targets Opus 4.7's 85% trigger rate where it routes
  orientation prompts to specialised skills.

Mock eval still 100% across all 6 skills.
- Sonnet 4.6: 5/6 → 6/6 (perfect score)
- Opus 4.7: 4/6 → 5/6 (ping-quickstart fixed)
- Haiku 4.5: 5/6 → 4/6 (orchestration regression — tier limit, not
  a description bug; Haiku over-asks for clarification when platform
  is stated explicitly)

Documents pre/post movement, the trade-off keeping the imperative
clarifying-question cue, and known eval prompt weak spots.
Adds an OpenAI adapter mirroring the Claude one — same routing
system prompt, same JSON schema, drop-in via --adapter openai.
Wired into run_eval.py choices and requirements.txt.

Cross-vendor Layer 1 results documented in README:
- gpt-5.5: 2/6 pass
- gpt-5.4-mini: 0/6 pass
- gpt-5.4-nano: 1/6 pass

Headline finding: trigger discipline transfers to OpenAI (94-100%),
but ambiguous-prompt clarifying-question behaviour is structurally
different — GPT-5.x prefers to route confidently even when the
description says "you MUST ask one clarifying question". Routing
decisions are vendor-portable; clarifying-question cues need
vendor-specific tuning.

Note: GPT-5.x rejects max_tokens; the adapter uses
max_completion_tokens (=512 to leave headroom for JSON output).
Adds an at-a-glance comparison putting all 6 models side by side:
Haiku 4.5, Sonnet 4.6, Opus 4.7, gpt-5.4-nano, gpt-5.4-mini, gpt-5.5.

Plus an aggregate-metrics table ranking models by skills-passing
across both vendors. Sonnet 4.6 leads at 6/6 / 100%; Opus 4.7 and
Haiku 4.5 follow; GPT-5.x family clusters at 0-2/6 due to a
structural ambiguous-prompt gap (56% avg vs 98% Anthropic).

Per-vendor detail tables retained below the headline table.
Pass 1 — eval prompt fix (Opus T-09):
- ping-foundation T-09: rewrite from "MFA policies" to "sign-on policy"
  to remove genuine overlap with ping-universal-services.

Pass 2 — GPT Option A: format-constrained clarifying-question cue:
- Added to ping-foundation, ping-orchestration, ping-universal-services,
  ping-app-integration, ping-quickstart: "your reply MUST be a single
  clarifying question ending with '?'" — explicit format target that
  GPT-5.x respects vs the weaker imperative it ignored.

Pass 3 — GPT Option B: verb/noun NOT-trigger guards:
- ping-app-integration: verb/noun rule — design/build/configure/invoke
  does not trigger this skill; only integrate/embed/wire does.
- ping-universal-services: SDK/integrate/embed keywords → redirect to
  ping-app-integration regardless of service name in the prompt.
… fix

The three-pass description tuning in d30e07d caused regressions across
all Claude tiers (Sonnet dropped from 6/6 to 4/6) — stricter format
constraints ("reply MUST be a single clarifying question ending with
'?'") caused models to over-ask on well-specified prompts.

Root cause: the ambiguous-prompt gap in GPT-5.x is a vendor-behavioural
trait (GPT defaults to confident routing; Claude defaults to caution).
It cannot be closed by tuning shared descriptions without breaking
Claude. Requires a vendor-specific adapter-level instruction (Phase 4).

Changes kept from d30e07d:
- evals/prompts/ping-foundation.yaml T-09: "sign-on policy" rewrite
  (removes genuine overlap with ping-universal-services)
- evals/harness/adapters/openai.py: max_completion_tokens 512 → 1024
  (GPT-5.5 was truncating on longer system prompts)

SKILL.md files reverted to commit 5530876 (last-known-good, Sonnet 6/6).
README updated: gpt-5.5 ping-foundation corrected to ✅ after T-09 fix;
aggregate metrics updated; vendor-gap finding documented.
…tent

Removes: phase delivery table, internal commit references, tuning
post-mortems, backlog notes, and internal narrative about what failed
and why. Keeps: install instructions, skill table, how-it-works,
repo layout, cross-model eval results, and authoring guide.
Old prompt tested end-user authentication for an LLM-fronted portal,
which belongs in ping-foundation or ping-app-integration. The skill
covers securing agents with Identity for AI offerings — not general
end-user auth in apps that happen to contain an LLM.

New prompt: AI agent with scoped identity acting on behalf of
employees across multiple APIs — correct Identity for AI scenario
(least-privilege token, audit trail, agent-security-patterns).
…rification rule

Implements evals/scorecards/gpt-5x-improvement-plan.md Step 1.

Two adapter-level additions to openai.py system prompt only
(no SKILL.md changes, zero risk to Claude scores):

1. Routing tie-breaker: when a prompt contains both a service/product
   name and an SDK/integration signal, the integration verb wins.
   Fixes the noun-overrides-intent misroutes (N-04/N-05 pattern).

2. Clarification rule: prompts ≤10 words with no named platform, tech
   stack, or service must clarify before routing. Targets the 13
   ambiguous-prompt failures where GPT routed instead of asking.

Results (post-fix vs pre-fix):
  gpt-5.4-nano: 1/6 → 3/6  (+2, non-trigger 90%→97%, ambiguous 56%→83%)
  gpt-5.4-mini: 0/6 → 1/6  (+1, non-trigger 91%→97%, ambiguous 56%→67%)
  gpt-5.5:      3/6 → 3/6  (same count, different skills; non-trigger 95%→97%)

Sonnet 4.6 confirmed 6/6 — no regression.
Opus 4.7 improved: ping-app-integration now passes (ping-identity-for-ai
T-09 rewrite also contributed).
Problem: "Add a user to Ping" was being answered directly by models
routing to ping-foundation, assuming PingOne MT. Ping has separate
user populations in PingOne MT, AIC, PingFederate, and PingDirectory
— the platform must be established before answering.

Changes:
- ping-quickstart description: explicit trigger for bare user-management
  commands ("Add a user to Ping", "Create a user in Ping") with a
  description-level clarify action directive (body-level instructions
  are not visible at Layer 1 eval)
- ping-quickstart SKILL.md body: mandatory clarification note naming
  the four products with separate user populations
- ping-foundation description: prerequisite — bare user commands
  without a platform belong in ping-quickstart first
- evals/prompts/ping-foundation.yaml: A-03 "Add a user to Ping" moved
  from ambiguous to non-trigger (N-06, expected ping-quickstart);
  A-03 slot removed from ambiguous set
- evals/prompts/ping-quickstart.yaml: A-04 added for "Add a user to
  Ping" as an ambiguous prompt with platform-clarification keywords

Result: both Sonnet and Opus now correctly route + clarify on A-04.
---
name: ping-foundation
description: Platform setup, administration, and core configuration for PingOne MT, PingOne ST (AIC), and on-premises Ping software. Use this skill whenever a user asks ANY question about setting up environments, registering OIDC/SAML apps, managing directories and user populations, configuring authentication policies, branding, or administering PingFederate/PingAccess/PingDirectory/PingID — including advisory, planning, and "how should I..." questions, not just execution tasks. Also invoke with /ping-foundation.
description: Use this skill whenever the task involves setting up, configuring, or administering any Ping Identity platform — PingOne MT, PingOne ST (AIC), PingFederate, PingAccess, PingDirectory, or PingID. Triggers: create or manage environments, tenants, realms; register OIDC, SAML, WS-Federation, or OAuth 2.0 apps; configure SSO, Platform SSO, or workforce single sign-on; manage directories, LDAP, user populations, or schema; configure sign-on policies, authentication policies, or step-up MFA policy settings at the platform level; configure MFA methods or PingID in PingFederate; branding, custom domains, or notification templates; administer on-premises Ping software; advisory questions like "how should I structure my tenant" or "what grant type should I use". Prerequisite — a specific platform must be named or clearly implied; "add a user to Ping" or "create a user in Ping" without a named platform belongs in ping-quickstart first. Also invoke with /ping-foundation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PingOne MT / PingOne ST read as internal shorthand rather than customer-facing product names, and that leaked into model output during testing. I’d suggest using customer-facing names as the primary labels here, e.g. PingOne (multi-tenant cloud) and PingOne Advanced Identity Cloud (AIC), and only keeping MT/ST secondarily/in parentheses if they still help internally.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — agreed, we'll update to customer-facing names as primary (PingOne and PingOne Advanced Identity Cloud / AIC), with MT/ST kept only as secondary shorthand where it genuinely helps with disambiguation. Will apply this across all six skill descriptions.

- Configure SSO, Platform SSO, or workforce single sign-on
- Manage directories, identity stores, or user populations
- Configure authentication policies, sign-on policies, or branding
- Administer PingFederate, PingAccess, PingDirectory, or PingID

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PingID feels too ambiguous as the only MFA-ish term in scope here. It can refer to legacy PingID admin portal tasks, Workforce environments using the PingID service, or specific methods like PingID mobile/desktop. As written, it makes the skill read as if MFA admin = PingID, and it hides the current PingOne / PingOne MFA framing, which varies by geography and environment type. We're moving away from referring to a PingID environment in favor of Workforce/ Customer.
I suggest explicitly mentioning PingOne / PingOne MFA in scope text, and using PingID service only where that is actually the right term for non-Singapore Workforce / PingID-service scenarios.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really helpful context, thank you. A couple of questions before we update the skill to make sure we get the terminology right:

  1. Is "PingID" being retired as a term entirely in customer-facing content, or is it staying in use for specific scenarios (e.g. the PingID service in Workforce environments outside Singapore)? Understanding whether this is a full sunset or a repositioning will help us decide how much to reference it in the skill.

  2. For environments that currently use the PingID service (non-Singapore Workforce), what's the right terminology to use in the skill description — "PingOne MFA" as the umbrella, with "PingID service" called out as the path for those environments? Or is there a preferred phrasing we should align with?

Once we have that clarity we can update the description to lead with PingOne / PingOne MFA and reference the PingID service only where it's the actual applicable path.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@george-bafaloukas-forgerock it's a moving landscape, and the current state has PingID used in three different contexts:

  • Legacy v1: PingID admin portal
  • hybrid v1 out of v2 (PIngID out of PingOne):
    All geographies except Singapore require PingID service for a Workforce use case.
    Singapore: This is the only native v2 environment. In this geography currently the admin chooses PingOne MFA service and then receives either a customer or workforce environment based on their environment license.

Other contexts: Authentication methods: PingID mobile app, PingID desktop app.
PingID device trust.

However the PM confirmed with me this afternoon that is there's a requirement to move away from the name PingID as a service and leave it only for the mobile/desktop apps.

So the answer to 1 going forward is actually yes. They are currently in discussion with the licensing team to change the name of the offering for Workforce use cases to PingOne MFA for WF

"PingOne MFA for WF" (hybrid) (for hybrid v1 out of v2) vs.
"PingOne MFA for WF"(for the Singapore /V2 native service).

He suggested that customers should understand that PingID is the legacy name of the service and we are moving away from it, but currently it is potentially confusing.

@@ -1,24 +1,27 @@
---
name: ping-foundation

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perhaps this skill needs an explicit Singapore guardrail against legacy PingID assumptions. In testing, the model jumped too quickly to legacy PingID admin-portal / linking flows.
The docs make Singapore a distinct branch: admins select PingOne MFA there, Customer vs Workforce depends on license, and the legacy PingID policy-usage differences are not relevant because Singapore does not rely on the PingID service. By contrast, other geographies still use the PingID service for Workforce.

At present this distinction is only relevant for Singapore (as the only V2-native environment). However Canada will join this in Q3 and other geographies later in the year. 

I’d recommend adding an explicit guardrail so the model does not default to non-Singapore PingID-service assumptions (and legacy pingid admin portal steps) unless the geography / service model / migration state actually requires that path.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really valuable observation — this is exactly the kind of thing that bites us in practice. Before we add a guardrail we want to make sure we implement it correctly rather than guessing at the branching logic:

  1. From a user's perspective, what's the most natural signal that indicates Singapore (V2-native) vs. other geographies still on the PingID service? Is it something the admin would typically know and state (e.g. "we're in Singapore", or a specific admin console URL / environment indicator), or is it something we'd need to ask for explicitly?

  2. Is there a doc or internal reference we can point the skill to for the V2 vs. PingID-service distinction? Ideally something we can anchor the guardrail against so the model has accurate context rather than just a rule.

  3. You mentioned Canada joins Singapore in Q3 — should the guardrail be written in a geography-agnostic way ("ask if the environment is V2-native") rather than listing specific countries, so it doesn't need updating each time a new geography rolls over?

Happy to implement once we have those answers.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. They should know their PingOne environment geography. They might not know that other geographies work a different way. V2 is an internal term, so they wouldn't be aware of that.
    Environmental indicators might be best indicator - if the environment is apps.pingone.com, or apps.pingone.sg etc.
  2. I think this is the best overview of the differences between customer and workforce environments that highlights the difference between singapore (V2) and other geographies.

https://docs.pingidentity.com/pingone/strong_authentication_mfa/p1_pid_what_is_the_difference.md

It's a changing landscape, as currently for example, some integrations are not available in singapore/native v2 environments. In Q3, a new generation of those integrations will start being introduced to V2 native environments. The docs in the https://docs.pingidentity.com/pingone/strong_authentication_mfa/p1_strong_authentication_start.md section will be updated to reflect the latest offering.
3. I don't think the admin will know that. Would it work to ask them what geography they are in, and have a list of v2 native geographies that we keep updated in one of the reference files?

@@ -1,24 +1,27 @@
---
name: ping-foundation

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In testing, the response generated from this skill blurred together three separate decisions that should stay separate in the skill logic:

  • environment type (Customer vs Workforce)
  • service model for the geography (PingOne MFA in Singapore vs PingID service for Workforce in other geographies)
  • authentication methods enabled in policy

The docs also distinguish between methods available in both environment types and methods that are Workforce-only or Customer-only. I’d suggest making the skill handle those as separate steps in that order, so it doesn’t fall into incorrect either/or framing around PingID push versus other MFA methods.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the current skill conflates those three decisions rather than treating them as a sequential gate. The right structure is: environment type first (Customer vs Workforce), then service model for the geography (PingOne MFA natively vs PingID service), then method selection within policy.

Before we restructure the routing logic here, we have a related question from your previous comment: once we know the right terminology for the V2-native vs PingID-service split, should this three-step decision tree live in ping-foundation (as a platform-level setup concern) or in ping-universal-services (as an MFA policy configuration concern)? Our instinct is that environment type and service model detection belong in foundation, with the method-level policy details handed off to universal-services — but if testing showed the blurring happened at the foundation level, you may have a stronger view on where the guardrail needs to be.

…entity-for-ai

ping-orchestration T-52: add 'PingOne ST' to passkey journey prompt —
removes ambiguity that was causing models to route to ping-quickstart.

ping-orchestration T-57: add 'In our AIC journey' and 'which nodes'
to transaction approval prompt — removes service-vs-flow ambiguity
that caused intermittent Sonnet empty routes.

ping-identity-for-ai description: extend the clarifying-question guard
to cover bare 'agent' or 'authenticate an agent' without AI/LLM/
agentic context — 'agent' is ambiguous across Ping products. Fixes
Opus A-01 which was routing to orchestration/app-integration instead
of asking.

Results:
- Sonnet 4.6: ping-orchestration 100%, ping-foundation 100% — 5/6
- Opus 4.7: ping-identity-for-ai 100%, ping-orchestration 100% — 5/6
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ests

Also fixes prompt-set schema to accept cross_skill_prompts (M-/H- ids,
singular and plural secondary-skill variants) so pre-commit validation
passes for ping-foundation and ping-app-integration prompt sets.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…yer 2 scaffolding

- Replace PingOne MT / PingOne ST with PingOne and PingOne Advanced Identity Cloud (AIC)
  as primary labels across all 6 SKILL.md descriptions and body text (PR feedback from @rochlev)
- Rename forgerock-to-ping-migration-paths.md → migration-overview.md; strip SDK API
  implementation tables (Kotlin Coroutines, Swift typed properties, package names) which
  belong in ping-app-integration, keeping only orientation-level content; update index.json
- Add cross-skill medium/hard prompt cases to ping-foundation.yaml (M-01, H-01, H-02)
  and ping-app-integration.yaml (M-03) for boundary testing
- Promote shared/evals/routing-eval.md to evals/routing-eval.md as canonical; add Layer 2
  anchor-selection section documenting relationship to automated harness and extension path

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add claude-mcp.json and cursor-mcp.json for MCP routing
- Add .cursor-plugin config for Cursor IDE support
- Update marketplace.json, plugin.json, and SKILL.md files with light revisions
- Remove empty generated/runtime template placeholders across all ping-identity skills

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
george-bafaloukas-forgerock and others added 21 commits June 8, 2026 16:05
Add scripts/update_readme_eval_table.py + tests; insert HTML comment markers
in README.md around the Layer 1 eval block and add a Layer 3 placeholder block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five advisory/prose eval tasks covering orientation and routing scenarios
for the ping-quickstart skill: new CIAM project start, ForgeRock migration
entry point, DaVinci vs Journey decision, PingOne MT vs ST trade-offs, and
mobile app SDK selection. Each task has deterministic grep/regex checks plus
an LLM judge rubric.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five deterministic eval tasks covering: AIC email-verification registration
node sequence, DaVinci risk-triggered step-up MFA, AIC scripted decision
node API, DaVinci-vs-Journey selection rationale (LLM-judged), and AIC
passkey/FIDO2 authentication tree node names and config.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five deterministic eval tasks covering PingOne MT app registration (confidential
client grant types + auth method), AIC realm OAuth2 service config (correct API
paths), PingOne directory custom attribute schema (STRING type), PingFederate
OAuth client PKCE config (requireProofKeyForCodeExchange), and a judge-rubric
branding/i18n advisory task (PingOne MT vs AIC). Validator passes OK for all 5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five deterministic eval tasks covering PingOne Protect risk policy config,
PingOne Verify KYC transaction lifecycle, PingOne Credentials DaVinci issuance,
PingOne Authorize PDP decision endpoint, and PingOne MFA device policy —
targeting Ping-specific field names and enum values a model without the skill
would guess wrong.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Tasks 01-05 ping-app-integration: add explicit 'Write to disk' instruction
  to prevent model from describing code in chat instead of writing files
- Task 02: loosen grep patterns to regex, drop variable-name dependency
- Task 05: loosen callback-name check from exact 'callback.name' to '.name ='
- claude_code_cli.py: read error from result.result field not subtype
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…atterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. grading.py: judge_task os.environ[] -> .get() + RuntimeError (was
   KeyError: 'OPENAI_API_KEY' killing all judge tasks silently)

2. ping-app-integration 02-04: fix grep-with-regex-pattern bugs — type
   was 'grep' (substring) not 'regex', so backslash patterns like
   'Task\s*{' searched literally; change to type: regex throughout

3. ping-identity-for-ai 01/02/04/05: add write-to-file instruction
   (model was answering in chat; 0 files matched grading globs)
…CP commit

The MCP config/Cursor plugin commit pushed SKILL.md files over the 120-line
limit and used YAML block scalars (>-) for description fields which the
validator expects as plain strings. Also introduced a missing doc_type value
and referenced generated JSON files that didn't exist.

Fixes:
- Convert description: >- block scalars back to inline strings in all 6 SKILL.md files
- Extract ## MCP config preflight sections to references/runtime/mcp-preflight.md
  per skill; replace with single ## MCP execution pointer line
- Create generated JSON stubs for ping-foundation and ping-orchestration branches
  (pingone-mt, pingone-st, ping-software, cross-platform)
- Fix doc_type: use-case → guide in mfa-method-selection-registration.md

Also commit MFA region and service model work:
- New references/curated/mfa-region-and-service-model.md in ping-universal-services
- MFA region guardrail added to ping-universal-services description
- Cross-reference note added to mfa-configuration.md
- index.json updated with new curated anchor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

**Admin surface:** PingOne admin console → Authentication → MFA → Device Authentication Policies

**Policy scope:** Per-environment. A policy is then referenced by sign-on policies or DaVinci/AIC flow connectors.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the terms used in the reference files (Sign-on-policies.md/mfa-configuration.md) are not outputting the terms as they are used in the UI/documentation.
For example Sign-on policy - should now be authentication policy
"Create or update the MFA device policy" - should just be MFA policy.

I wonder whether the current references or routing might be blending API-oriented terminology with UI/documentation terminology?

- Run 60-task Layer 3 eval (haiku, all 6 skills): 33% with_skill vs 20%
  without, −62% tokens aggregate
- Add flush=True to run_layer3.py print statements for live progress
- Expand Layer 3 README table with absolute pass rates, Δ pass, and % token
  savings columns
- Add "Why absolute pass rates look modest" explanation covering binary
  all-or-nothing scoring, Haiku model-tier floor, write-to-disk compliance,
  and ping-identity-for-ai inversion
- Extend "What the results show" with Layer 3 bullet points
- Add Layer 3 CLI invocation examples to "Run the eval yourself" section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tement

Eval results are being reviewed before publishing. Replaced all per-skill
score tables and analysis with a brief statement noting consistent token
savings across all six skills. Full results to follow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rewrite README with nav bar, features section, MCP server links, and 15 example prompts
- Add Install Manually section with npx skills CLI commands
- Expand skills table descriptions and rename columns
- Promote Repo layout to top-level section
- Move eval status into callout block
- Fix typo and heading levels throughout
- Add banner image and update plugin README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@brando-dill brando-dill left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good so far, I am going to merge this so we can see a clean main and make any final tweaks in another PR.

@brando-dill brando-dill merged commit 7590e53 into main Jun 9, 2026
@brando-dill brando-dill deleted the skills-feedback-fixes branch June 9, 2026 15:54
george-bafaloukas-forgerock added a commit that referenced this pull request Jun 9, 2026
Complete the customer-facing product rename and repair drift left by the PR #3
merge and the MCP-config cleanup commit:

- .well-known/agent-skills/index.json: PingOne MT/ST → PingOne / PingOne Advanced
  Identity Cloud (AIC) in ping-foundation and ping-orchestration descriptions
  (this is the index marketplaces and agents consume)
- Finish MT/ST → customer-facing rename in SKILL.md bodies/descriptions for
  ping-orchestration, ping-quickstart, ping-foundation; and in ping-marketplace.json
  display descriptions + .cursor-plugin description. Machine routing slugs
  (product_family/products arrays) intentionally unchanged.
- Repair 5 dead links to deleted references/runtime/docs-mcp-routing.md: inline the
  docs-MCP fallback guidance in 4 SKILL.md files; point app-integration-overview.md
  to the existing mcp-preflight.md
- Standardize contact email to developer-experience@pingidentity.com across all 4
  plugin manifests (was split with devex@)
- Fix 8 generated shortlist stubs to conform to reference-manifest-schema.json
  ({skill, branch, generated_at, max_docs, docs}) instead of malformed placeholder

Validator clean; Layer 1 + Layer 2 mock evals 100%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants