Skills feedback fixes by george-bafaloukas-forgerock · Pull Request #3 · pingidentity/agent-plugins

george-bafaloukas-forgerock · 2026-06-04T15:17:07Z

Key Changes
1. Skill Definition Updates (v1.0.0)

Version Bumps: All six primary skills (ping-app-integration, ping-foundation, ping-identity-for-ai, ping-orchestration, ping-quickstart, ping-universal-services) have been bumped to version 1.0.0.

Prompt Routing Refinements: Skill descriptions (SKILL.md) were heavily rewritten to explicitly define boundaries and prevent model confusion. For example, the descriptions now clearly delineate that integrating an SDK goes to ping-app-integration, flow design goes to ping-orchestration, and policy/device configuration goes to ping-universal-services.

2. Eval & Benchmarking Additions

Cross-Model Benchmarking: The README.md now includes a detailed routing evaluation across three model tiers (Haiku 4.5, Sonnet 4.6, and Opus 4.7). It documents specific weak spots for each model (e.g., Opus struggles with the ping-quickstart front door, while Haiku struggles with the dense ping-universal-services description).

Feedback-Driven Prompts: Several new test prompts (labeled "Feedback use cases (June 2026)") were added to the YAML eval files. These test the agent's ability to correctly route requests regarding WS-Federation, Platform SSO, passkeys, authenticator app enrollment, transaction approvals, and MFA device management.

3. Comprehensive MFA Documentation Updates

New MFA Configuration Anchor: Added a brand-new, detailed file (mfa-configuration.md) under ping-universal-services. It covers PingOne MFA policy configuration, supported AMR codes, device management, pairing keys, and headless MFA endpoints.

Skill Disambiguation: Added strict rules across the documentation to clarify that capturing IDs or liveness checks belongs to Verify, MFA factor enrollment inside a flow belongs to Orchestration, and configuring MFA policies or managing devices outside of a flow belongs to Universal Services.

App Integration Enhancements: Added documentation in mobile-integration-basics.md regarding the PingOne MFA SDK, specifically covering push MFA, pairing key prerequisites, and custom AMR strings (face, pin, ftp).

4. Foundation & Quickstart Expansion

Admin & Tenant Setup: Expanded pingone-mt reference files with comprehensive details on the Administrator account registration flow (including email verification and invite expiry), environment URLs, and the isolated "Administrators environment."

Directory Management: Added a breakdown of user ingestion methods (Self-registration, Bulk import, SCIM, etc.) and user detail panel tabs.

Getting Started Sequence: Updated the getting-started-overview.md to reflect the official 8-step sequence for PingOne MT, including mandatory admin MFA enrollment and session timeout behaviors.

5. Automated Manifest Updates

All generated top-*.json reference files were updated with a new timestamp (2026-06-04T13:06:47Z) and adjusted to include the newly added documentation.

…mpts
ping-foundation/SKILL.md:
- description: 'Use this skill' -> 'Use this'; added WS-Federation and SSO/Platform SSO to trigger phrases
- When to use: added WS-Federation app registration; added SSO / Platform SSO / workforce SSO as explicit trigger
- eval prompts T-56/57/58: WS-Fed, Platform SSO, workforce SSO
ping-orchestration/SKILL.md:
- When to use: added passwordless authentication (passkeys/FIDO2/magic links), authenticator app login / TOTP enrollment, transaction approvals via email or push (CIBA / out-of-band step-up)
- eval prompts T-55/56/57/58: passwordless, TOTP enrollment, email transaction approval, push MFA approval
ping-universal-services/SKILL.md:
- When to use: added PingOne MFA (device management, policy config, enrollment API)
- Routing table: added MFA branch pointing to choosing-the-right-service.md for MFA vs flow-level routing split
- When NOT to use: clarified MFA node/connector wiring in flows belongs in ping-orchestration not here
- Added PingOne Recognize note (not yet GA)
- eval prompts T-52/53: PingOne MFA managed service, enrollment API
All SKILL.md files under 120 lines. Validator and mock L1 eval: all PASS.

Rewrote all 6 descriptions to: trigger instruction first, dense keyword lists, no prose preamble, explicit NOT trigger guards for ambiguous overlap cases.
- ping-quickstart: explicit trigger phrases, catch-all rule, test/validate use case keywords, no router-style padding
- ping-foundation: keyword-first list of all setup/admin triggers; added WS-Federation, SSO, Platform SSO, MFA policy; "Use this skill whenever" retained (assertive language for agent triggering)
- ping-orchestration: added passwordless, authenticator app / TOTP, CIBA / transaction approvals; restored clarifying-question cue for platform comparison prompts
- ping-universal-services: PingOne MFA added as named service; explicit NOT trigger for flow-level MFA node wiring
- ping-app-integration: SDK/code trigger-first; explicit NOT-for guard listing Verify, Protect, Authorize at policy level
- ping-identity-for-ai: AI/LLM trigger-first; explicit NOT trigger guard for automated process / scheduled job without AI context
Validator fixes (from parallel run):
- admin-roles-and-access.md: 2 UI navigation phrases replaced with field-table language
- directory-and-populations.md: 1 UI navigation phrase replaced
Content improvements (from parallel run):
- pingone-mt/tenant-and-environment-setup.md: added Environment URLs section + Administrators environment section; last_updated 2026-06-04
- getting-started-overview.md: revised PingOne MT setup sequence to match official 8-task onboarding; last_updated 2026-06-04
Live Layer 1 eval (Bedrock): 6/6 PASS
- ping-app-integration 100% / 100% / 100%
- ping-foundation 95% / 100% / 100%
- ping-identity-for-ai 100% / 100% / 100%
- ping-orchestration 100% / 100% / 100%
- ping-quickstart 92% / 100% / 100%
- ping-universal-services 100% / 100% / 100%

…ile SDK AMR
New anchor, ping-universal-services/references/curated/mfa-configuration.md:
- PING_ONE_MFA BOM prerequisite and license check
- Workforce vs Customer environment config split
- Device Authentication Policies (= 'MFA Policies' in UI): field table
- 7 AMR method codes with descriptions (EMAIL, SMS, TEL, OTP, MCA, USER, SWK)
- USER vs MCA distinction (interactive vs silent push)
- Pairing keys: binding mechanism for push MFA; API endpoint
- /deviceAuthentications headless endpoint for non-authorize MFA flows
- Per-user device management, bypass MFA, MyAccount self-service
- PingID-specific config (Workforce only: Offline MFA, Windows login)
- Routing split table (config vs flow vs SDK)
- Common gotchas including BOM missing, pairing key missing, policy not assigned
choosing-the-right-service.md:
- Added PingOne MFA to intent-to-service mapping table
- Updated Verify vs MFA disambiguation rule to clarify three-way split: flow-level MFA → orchestration; service config/device mgmt → universal-services
mobile-integration-basics.md:
- Added PingOne MFA SDK section: push MFA, custom AMR strings (face/pin/ftp via approve()), pairing key prerequisite, cross-reference to mfa-configuration.md
ping-universal-services/SKILL.md:
- Added mfa-configuration.md to Step 2 routing table
index.json: added mfa-configuration.md path
Manifests: rebuilt (63 curated anchors, up from 62)
Description consistency: fixed ping-identity-for-ai and ping-quickstart descriptions to match "Use this skill whenever the task involves [gerund]" pattern.

…dex fixes
- Cross-model Layer 1 eval ran against Claude Haiku 4.5, Sonnet 4.6, Opus 4.7 on Bedrock. Results, per-tier weak spots, and tuning targets documented in README § Eval status.
- Normalised SKILL.md version fields to "1.0.0" (was a mix of 0.2.0 / 1.0 / 1.0.0).
- Added node-fundamentals.md to references/index.json; it was referenced from ping-orchestration/SKILL.md but missing from the index.
- Added ## Invocation section to ping-foundation/SKILL.md for consistency with the other 5 skills; trimmed redundant blockquote + separator to stay within the 120-line budget.

Three targeted description tweaks based on per-model failure analysis:
- ping-universal-services: add explicit "service-in-flow rule": when a Protect/Verify/IGA/Authorize node sits inside a DaVinci flow or AIC journey, configuring it belongs here, not in ping-orchestration. Targets Haiku 4.5's 69% trigger rate (5/5 misroutes were this).
- ping-orchestration: make the clarifying-question cue imperative ("you MUST ask one clarifying question before recommending"). Targets Sonnet 4.6's 67% ambiguous score on A-02 (AIC vs DaVinci?).
- ping-quickstart: add "BEFORE any more specialised skill" priority cue for orientation framing ("where do we start", "evaluating", "migrating"). Targets Opus 4.7's 85% trigger rate where it routes orientation prompts to specialised skills.
Mock eval still 100% across all 6 skills.

- Sonnet 4.6: 5/6 → 6/6 (perfect score)
- Opus 4.7: 4/6 → 5/6 (ping-quickstart fixed)
- Haiku 4.5: 5/6 → 4/6 (orchestration regression: tier limit, not a description bug; Haiku over-asks for clarification when platform is stated explicitly)
Documents pre/post movement, the trade-off keeping the imperative clarifying-question cue, and known eval prompt weak spots.

Adds an OpenAI adapter mirroring the Claude one: same routing system prompt, same JSON schema, drop-in via --adapter openai. Wired into run_eval.py choices and requirements.txt.
Cross-vendor Layer 1 results documented in README:
- gpt-5.5: 2/6 pass
- gpt-5.4-mini: 0/6 pass
- gpt-5.4-nano: 1/6 pass
Headline finding: trigger discipline transfers to OpenAI (94-100%), but ambiguous-prompt clarifying-question behaviour is structurally different: GPT-5.x prefers to route confidently even when the description says "you MUST ask one clarifying question". Routing decisions are vendor-portable; clarifying-question cues need vendor-specific tuning.
Note: GPT-5.x rejects max_tokens; the adapter uses max_completion_tokens (=512 to leave headroom for JSON output).
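
The max_tokens note above could look roughly like the following in an adapter. This is a minimal sketch, not the code in openai.py: the helper name `build_chat_kwargs` and the "model id starts with gpt-5" test are assumptions made for illustration; only the `max_tokens` / `max_completion_tokens` parameter names come from the commit message.

```python
from typing import Any


def build_chat_kwargs(
    model: str,
    messages: list[dict[str, str]],
    limit: int = 512,
) -> dict[str, Any]:
    """Build keyword arguments for a chat-completions request (hypothetical helper)."""
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    # Assumption: every model id beginning with "gpt-5" rejects max_tokens,
    # so the output budget goes into max_completion_tokens instead.
    if model.startswith("gpt-5"):
        kwargs["max_completion_tokens"] = limit
    else:
        kwargs["max_tokens"] = limit
    return kwargs
```

Keeping the switch in one helper means a later budget change (such as 512 → 1024) touches a single default rather than every call site.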

Adds an at-a-glance comparison putting all 6 models side by side: Haiku 4.5, Sonnet 4.6, Opus 4.7, gpt-5.4-nano, gpt-5.4-mini, gpt-5.5. Plus an aggregate-metrics table ranking models by skills-passing across both vendors. Sonnet 4.6 leads at 6/6 / 100%; Opus 4.7 and Haiku 4.5 follow; GPT-5.x family clusters at 0-2/6 due to a structural ambiguous-prompt gap (56% avg vs 98% Anthropic). Per-vendor detail tables retained below the headline table.

Pass 1: eval prompt fix (Opus T-09):
- ping-foundation T-09: rewrite from "MFA policies" to "sign-on policy" to remove genuine overlap with ping-universal-services.
Pass 2: GPT Option A, format-constrained clarifying-question cue:
- Added to ping-foundation, ping-orchestration, ping-universal-services, ping-app-integration, ping-quickstart: "your reply MUST be a single clarifying question ending with '?'", an explicit format target that GPT-5.x respects vs the weaker imperative it ignored.
Pass 3: GPT Option B, verb/noun NOT-trigger guards:
- ping-app-integration: verb/noun rule: design/build/configure/invoke does not trigger this skill; only integrate/embed/wire does.
- ping-universal-services: SDK/integrate/embed keywords → redirect to ping-app-integration regardless of service name in the prompt.

… fix
The three-pass description tuning in d30e07d caused regressions across all Claude tiers (Sonnet dropped from 6/6 to 4/6): stricter format constraints ("reply MUST be a single clarifying question ending with '?'") caused models to over-ask on well-specified prompts.
Root cause: the ambiguous-prompt gap in GPT-5.x is a vendor-behavioural trait (GPT defaults to confident routing; Claude defaults to caution). It cannot be closed by tuning shared descriptions without breaking Claude. Requires a vendor-specific adapter-level instruction (Phase 4).
Changes kept from d30e07d:
- evals/prompts/ping-foundation.yaml T-09: "sign-on policy" rewrite (removes genuine overlap with ping-universal-services)
- evals/harness/adapters/openai.py: max_completion_tokens 512 → 1024 (GPT-5.5 was truncating on longer system prompts)
SKILL.md files reverted to commit 5530876 (last-known-good, Sonnet 6/6).
README updated: gpt-5.5 ping-foundation corrected to ✅ after T-09 fix; aggregate metrics updated; vendor-gap finding documented.

…tent Removes: phase delivery table, internal commit references, tuning post-mortems, backlog notes, and internal narrative about what failed and why. Keeps: install instructions, skill table, how-it-works, repo layout, cross-model eval results, and authoring guide.

…rigger and ambiguous failures

Old prompt tested end-user authentication for an LLM-fronted portal, which belongs in ping-foundation or ping-app-integration. The skill covers securing agents with Identity for AI offerings — not general end-user auth in apps that happen to contain an LLM. New prompt: AI agent with scoped identity acting on behalf of employees across multiple APIs — correct Identity for AI scenario (least-privilege token, audit trail, agent-security-patterns).

…rification rule
Implements evals/scorecards/gpt-5x-improvement-plan.md Step 1. Two adapter-level additions to openai.py system prompt only (no SKILL.md changes, zero risk to Claude scores):
1. Routing tie-breaker: when a prompt contains both a service/product name and an SDK/integration signal, the integration verb wins. Fixes the noun-overrides-intent misroutes (N-04/N-05 pattern).
2. Clarification rule: prompts ≤10 words with no named platform, tech stack, or service must clarify before routing. Targets the 13 ambiguous-prompt failures where GPT routed instead of asking.
Results (post-fix vs pre-fix):
- gpt-5.4-nano: 1/6 → 3/6 (+2, non-trigger 90%→97%, ambiguous 56%→83%)
- gpt-5.4-mini: 0/6 → 1/6 (+1, non-trigger 91%→97%, ambiguous 56%→67%)
- gpt-5.5: 3/6 → 3/6 (same count, different skills; non-trigger 95%→97%)
Sonnet 4.6 confirmed 6/6, no regression. Opus 4.7 improved: ping-app-integration now passes (ping-identity-for-ai T-09 rewrite also contributed).
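
In the PR both rules are natural-language instructions in the adapter's system prompt, not code. Purely to make their logic concrete, here is a sketch of the same two rules as Python predicates. The keyword sets are hypothetical stand-ins (only integrate/embed/wire and the ≤10-word threshold come from the commit messages), and the word count is approximate.

```python
import re

# Hypothetical keyword lists for illustration; the real signals are described
# in prose inside the adapter's system prompt.
NAMED_CONTEXT = {"pingone", "aic", "davinci", "pingfederate", "pingdirectory",
                 "android", "ios", "react"}
INTEGRATION_VERBS = {"integrate", "embed", "wire"}


def tokenize(prompt: str) -> list[str]:
    """Lower-case the prompt and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", prompt.lower())


def must_clarify(prompt: str) -> bool:
    """Clarification rule: short prompt with no named platform, stack, or service."""
    tokens = tokenize(prompt)
    return len(tokens) <= 10 and not (set(tokens) & NAMED_CONTEXT)


def integration_verb_wins(prompt: str) -> bool:
    """Tie-breaker: an integration verb outranks any service name in the prompt."""
    return bool(set(tokenize(prompt)) & INTEGRATION_VERBS)
```

Under these assumed keyword lists, "Add a user to Ping" trips the clarification rule, while "Embed PingOne Protect in our iOS app" is treated as an integration request despite naming a service.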

Problem: "Add a user to Ping" was being answered directly by models routing to ping-foundation, assuming PingOne MT. Ping has separate user populations in PingOne MT, AIC, PingFederate, and PingDirectory; the platform must be established before answering.
Changes:
- ping-quickstart description: explicit trigger for bare user-management commands ("Add a user to Ping", "Create a user in Ping") with a description-level clarify action directive (body-level instructions are not visible at Layer 1 eval)
- ping-quickstart SKILL.md body: mandatory clarification note naming the four products with separate user populations
- ping-foundation description: prerequisite: bare user commands without a platform belong in ping-quickstart first
- evals/prompts/ping-foundation.yaml: A-03 "Add a user to Ping" moved from ambiguous to non-trigger (N-06, expected ping-quickstart); A-03 slot removed from ambiguous set
- evals/prompts/ping-quickstart.yaml: A-04 added for "Add a user to Ping" as an ambiguous prompt with platform-clarification keywords
Result: both Sonnet and Opus now correctly route + clarify on A-04.

rochlev · 2026-06-04T19:12:12Z

 ---
 name: ping-foundation
-description: Platform setup, administration, and core configuration for PingOne MT, PingOne ST (AIC), and on-premises Ping software. Use this skill whenever a user asks ANY question about setting up environments, registering OIDC/SAML apps, managing directories and user populations, configuring authentication policies, branding, or administering PingFederate/PingAccess/PingDirectory/PingID — including advisory, planning, and "how should I..." questions, not just execution tasks. Also invoke with /ping-foundation.
+description: Use this skill whenever the task involves setting up, configuring, or administering any Ping Identity platform — PingOne MT, PingOne ST (AIC), PingFederate, PingAccess, PingDirectory, or PingID. Triggers: create or manage environments, tenants, realms; register OIDC, SAML, WS-Federation, or OAuth 2.0 apps; configure SSO, Platform SSO, or workforce single sign-on; manage directories, LDAP, user populations, or schema; configure sign-on policies, authentication policies, or step-up MFA policy settings at the platform level; configure MFA methods or PingID in PingFederate; branding, custom domains, or notification templates; administer on-premises Ping software; advisory questions like "how should I structure my tenant" or "what grant type should I use". Prerequisite — a specific platform must be named or clearly implied; "add a user to Ping" or "create a user in Ping" without a named platform belongs in ping-quickstart first. Also invoke with /ping-foundation.


PingOne MT / PingOne ST read as internal shorthand rather than customer-facing product names, and that leaked into model output during testing. I’d suggest using customer-facing names as the primary labels here, e.g. PingOne (multi-tenant cloud) and PingOne Advanced Identity Cloud (AIC), and only keeping MT/ST secondarily/in parentheses if they still help internally.

Good catch — agreed, we'll update to customer-facing names as primary (PingOne and PingOne Advanced Identity Cloud / AIC), with MT/ST kept only as secondary shorthand where it genuinely helps with disambiguation. Will apply this across all six skill descriptions.

rochlev · 2026-06-04T19:17:08Z

+- Configure SSO, Platform SSO, or workforce single sign-on
 - Manage directories, identity stores, or user populations
 - Configure authentication policies, sign-on policies, or branding
 - Administer PingFederate, PingAccess, PingDirectory, or PingID


PingID feels too ambiguous as the only MFA-ish term in scope here. It can refer to legacy PingID admin portal tasks, Workforce environments using the PingID service, or specific methods like PingID mobile/desktop. As written, it makes the skill read as if MFA admin = PingID, and it hides the current PingOne / PingOne MFA framing, which varies by geography and environment type. We're moving away from referring to a PingID environment in favor of Workforce/Customer.
I suggest explicitly mentioning PingOne / PingOne MFA in scope text, and using PingID service only where that is actually the right term for non-Singapore Workforce / PingID-service scenarios.

This is really helpful context, thank you. A couple of questions before we update the skill to make sure we get the terminology right:

Is "PingID" being retired as a term entirely in customer-facing content, or is it staying in use for specific scenarios (e.g. the PingID service in Workforce environments outside Singapore)? Understanding whether this is a full sunset or a repositioning will help us decide how much to reference it in the skill.

For environments that currently use the PingID service (non-Singapore Workforce), what's the right terminology to use in the skill description — "PingOne MFA" as the umbrella, with "PingID service" called out as the path for those environments? Or is there a preferred phrasing we should align with?

Once we have that clarity we can update the description to lead with PingOne / PingOne MFA and reference the PingID service only where it's the actual applicable path.

@george-bafaloukas-forgerock it's a moving landscape, and the current state has PingID used in three different contexts:

Legacy v1: PingID admin portal

Hybrid v1 out of v2 (PingID out of PingOne):
All geographies except Singapore require PingID service for a Workforce use case.
Singapore: This is the only native v2 environment. In this geography currently the admin chooses PingOne MFA service and then receives either a customer or workforce environment based on their environment license.

Other contexts: Authentication methods: PingID mobile app, PingID desktop app.
PingID device trust.

However, the PM confirmed with me this afternoon that there's a requirement to move away from the name PingID as a service and leave it only for the mobile/desktop apps.

So the answer to 1 going forward is actually yes. They are currently in discussion with the licensing team to change the name of the offering for Workforce use cases to PingOne MFA for WF:

"PingOne MFA for WF" (hybrid) (for hybrid v1 out of v2) vs.
"PingOne MFA for WF" (for the Singapore / V2-native service).

He suggested that customers should understand that PingID is the legacy name of the service and we are moving away from it, but currently it is potentially confusing.

rochlev · 2026-06-04T19:29:09Z

@@ -1,24 +1,27 @@
 ---
 name: ping-foundation


I think perhaps this skill needs an explicit Singapore guardrail against legacy PingID assumptions. In testing, the model jumped too quickly to legacy PingID admin-portal / linking flows.
The docs make Singapore a distinct branch: admins select PingOne MFA there, Customer vs Workforce depends on license, and the legacy PingID policy-usage differences are not relevant because Singapore does not rely on the PingID service. By contrast, other geographies still use the PingID service for Workforce.

At present this distinction is only relevant for Singapore (as the only V2-native environment). However Canada will join this in Q3 and other geographies later in the year.

I’d recommend adding an explicit guardrail so the model does not default to non-Singapore PingID-service assumptions (and legacy PingID admin portal steps) unless the geography / service model / migration state actually requires that path.

Really valuable observation — this is exactly the kind of thing that bites us in practice. Before we add a guardrail we want to make sure we implement it correctly rather than guessing at the branching logic:

From a user's perspective, what's the most natural signal that indicates Singapore (V2-native) vs. other geographies still on the PingID service? Is it something the admin would typically know and state (e.g. "we're in Singapore", or a specific admin console URL / environment indicator), or is it something we'd need to ask for explicitly?

Is there a doc or internal reference we can point the skill to for the V2 vs. PingID-service distinction? Ideally something we can anchor the guardrail against so the model has accurate context rather than just a rule.

You mentioned Canada joins Singapore in Q3 — should the guardrail be written in a geography-agnostic way ("ask if the environment is V2-native") rather than listing specific countries, so it doesn't need updating each time a new geography rolls over?

Happy to implement once we have those answers.

They should know their PingOne environment geography. They might not know that other geographies work a different way. V2 is an internal term, so they wouldn't be aware of that.
Environment indicators might be the best signal, e.g. whether the environment is at apps.pingone.com, apps.pingone.sg, etc.

I think this is the best overview of the differences between customer and workforce environments that highlights the difference between Singapore (V2) and other geographies.

https://docs.pingidentity.com/pingone/strong_authentication_mfa/p1_pid_what_is_the_difference.md

It's a changing landscape: currently, for example, some integrations are not available in Singapore/native V2 environments. In Q3, a new generation of those integrations will start being introduced to V2-native environments. The docs in the https://docs.pingidentity.com/pingone/strong_authentication_mfa/p1_strong_authentication_start.md section will be updated to reflect the latest offering.
3. I don't think the admin will know that. Would it work to ask them what geography they are in, and have a list of v2 native geographies that we keep updated in one of the reference files?
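
The "ask for the geography, keep a list of V2-native geographies" idea could be sketched as below. This is an illustration under stated assumptions, not an agreed design: treating the last DNS label of the console host as the geography is a guess based on the apps.pingone.com / apps.pingone.sg examples in this thread, and the set contents reflect only what is said here (Singapore is currently the only V2-native geography).

```python
from urllib.parse import urlparse

# Assumption: hand-maintained list, e.g. loaded from a reference file.
# Per this thread only Singapore is V2-native today; Canada is expected in Q3.
V2_NATIVE_GEOGRAPHIES = {"sg"}


def geography_from_console_url(url: str) -> str:
    """Return the last DNS label of the console host, e.g. 'sg' or 'com'."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"no hostname found in {url!r}")
    return host.rsplit(".", 1)[-1].lower()


def is_v2_native(url: str) -> bool:
    """True when the console URL points at a geography on the V2-native list."""
    return geography_from_console_url(url) in V2_NATIVE_GEOGRAPHIES
```

Because "V2" is an internal term, a skill built this way would ask the admin only for their geography or console URL and derive the service model itself.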

rochlev · 2026-06-04T19:35:16Z

@@ -1,24 +1,27 @@
 ---
 name: ping-foundation


In testing, the response generated from this skill blurred together three separate decisions that should stay separate in the skill logic:

environment type (Customer vs Workforce)

service model for the geography (PingOne MFA in Singapore vs PingID service for Workforce in other geographies)

authentication methods enabled in policy

The docs also distinguish between methods available in both environment types and methods that are Workforce-only or Customer-only. I’d suggest making the skill handle those as separate steps in that order, so it doesn’t fall into incorrect either/or framing around PingID push versus other MFA methods.

Agreed — the current skill conflates those three decisions rather than treating them as a sequential gate. The right structure is: environment type first (Customer vs Workforce), then service model for the geography (PingOne MFA natively vs PingID service), then method selection within policy.

Before we restructure the routing logic here, we have a related question from your previous comment: once we know the right terminology for the V2-native vs PingID-service split, should this three-step decision tree live in ping-foundation (as a platform-level setup concern) or in ping-universal-services (as an MFA policy configuration concern)? Our instinct is that environment type and service model detection belong in foundation, with the method-level policy details handed off to universal-services — but if testing showed the blurring happened at the foundation level, you may have a stronger view on where the guardrail needs to be.
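
To make the sequential gate concrete, here is a minimal sketch of the three decisions kept as separate steps. All type and function names are hypothetical, the rule that Customer environments always use PingOne MFA is an assumption read from this thread rather than a confirmed product rule, and the method-availability mapping is deliberately supplied by the caller because the thread does not list which methods are Workforce-only or Customer-only.

```python
from dataclasses import dataclass
from enum import Enum


class EnvironmentType(Enum):
    CUSTOMER = "Customer"
    WORKFORCE = "Workforce"


class ServiceModel(Enum):
    PINGONE_MFA = "PingOne MFA"
    PINGID_SERVICE = "PingID service"


@dataclass(frozen=True)
class MfaContext:
    environment_type: EnvironmentType
    service_model: ServiceModel


def resolve_context(env_type: EnvironmentType, v2_native: bool) -> MfaContext:
    """Steps 1 and 2: environment type first, then service model for the geography."""
    # Per the thread: Workforce outside V2-native geographies needs the PingID service.
    if env_type is EnvironmentType.WORKFORCE and not v2_native:
        return MfaContext(env_type, ServiceModel.PINGID_SERVICE)
    # Assumption: every other combination uses PingOne MFA.
    return MfaContext(env_type, ServiceModel.PINGONE_MFA)


def filter_methods(
    ctx: MfaContext,
    requested: list[str],
    availability: dict[str, set[EnvironmentType]],
) -> list[str]:
    """Step 3: keep only methods available for the resolved environment type."""
    return [m for m in requested if ctx.environment_type in availability.get(m, set())]
```

Resolving the context before looking at methods is what prevents the either/or framing described above: method selection never changes the environment type or service model.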

…entity-for-ai
- ping-orchestration T-52: add 'PingOne ST' to passkey journey prompt; removes ambiguity that was causing models to route to ping-quickstart.
- ping-orchestration T-57: add 'In our AIC journey' and 'which nodes' to transaction approval prompt; removes service-vs-flow ambiguity that caused intermittent Sonnet empty routes.
- ping-identity-for-ai description: extend the clarifying-question guard to cover bare 'agent' or 'authenticate an agent' without AI/LLM/agentic context; 'agent' is ambiguous across Ping products. Fixes Opus A-01 which was routing to orchestration/app-integration instead of asking.
Results:
- Sonnet 4.6: ping-orchestration 100%, ping-foundation 100%, 5/6
- Opus 4.7: ping-identity-for-ai 100%, ping-orchestration 100%, 5/6

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ests Also fixes prompt-set schema to accept cross_skill_prompts (M-/H- ids, singular and plural secondary-skill variants) so pre-commit validation passes for ping-foundation and ping-app-integration prompt sets. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@rochlev

…yer 2 scaffolding
- Replace PingOne MT / PingOne ST with PingOne and PingOne Advanced Identity Cloud (AIC) as primary labels across all 6 SKILL.md descriptions and body text (PR feedback from @rochlev)
- Rename forgerock-to-ping-migration-paths.md → migration-overview.md; strip SDK API implementation tables (Kotlin Coroutines, Swift typed properties, package names) which belong in ping-app-integration, keeping only orientation-level content; update index.json
- Add cross-skill medium/hard prompt cases to ping-foundation.yaml (M-01, H-01, H-02) and ping-app-integration.yaml (M-03) for boundary testing
- Promote shared/evals/routing-eval.md to evals/routing-eval.md as canonical; add Layer 2 anchor-selection section documenting relationship to automated harness and extension path
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add claude-mcp.json and cursor-mcp.json for MCP routing
- Add .cursor-plugin config for Cursor IDE support
- Update marketplace.json, plugin.json, and SKILL.md files with light revisions
- Remove empty generated/runtime template placeholders across all ping-identity skills
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add scripts/update_readme_eval_table.py + tests; insert HTML comment markers in README.md around the Layer 1 eval block and add a Layer 3 placeholder block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
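
A marker-based README update of this kind can be sketched as follows. This is not the contents of scripts/update_readme_eval_table.py; the marker strings and the function name are hypothetical, and the sketch only shows the general technique of replacing whatever sits between two HTML comment markers.

```python
import re

# Hypothetical marker text; the actual markers in README.md may differ.
BEGIN = "<!-- LAYER1-EVAL:BEGIN -->"
END = "<!-- LAYER1-EVAL:END -->"


def replace_marked_block(readme: str, new_body: str) -> str:
    """Replace the text between BEGIN and END, keeping the markers themselves."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("eval markers not found in README text")
    replacement = f"{BEGIN}\n{new_body.strip()}\n{END}"
    # A callable replacement avoids backslash-escape processing of the new body.
    return pattern.sub(lambda _match: replacement, readme, count=1)
```

Failing loudly when the markers are missing keeps a script like this from silently appending a second table to the file.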

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Five advisory/prose eval tasks covering orientation and routing scenarios for the ping-quickstart skill: new CIAM project start, ForgeRock migration entry point, DaVinci vs Journey decision, PingOne MT vs ST trade-offs, and mobile app SDK selection. Each task has deterministic grep/regex checks plus an LLM judge rubric. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Five deterministic eval tasks covering: AIC email-verification registration node sequence, DaVinci risk-triggered step-up MFA, AIC scripted decision node API, DaVinci-vs-Journey selection rationale (LLM-judged), and AIC passkey/FIDO2 authentication tree node names and config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Five deterministic eval tasks covering PingOne MT app registration (confidential client grant types + auth method), AIC realm OAuth2 service config (correct API paths), PingOne directory custom attribute schema (STRING type), PingFederate OAuth client PKCE config (requireProofKeyForCodeExchange), and a judge-rubric branding/i18n advisory task (PingOne MT vs AIC). Validator passes OK for all 5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Five deterministic eval tasks covering PingOne Protect risk policy config, PingOne Verify KYC transaction lifecycle, PingOne Credentials DaVinci issuance, PingOne Authorize PDP decision endpoint, and PingOne MFA device policy — targeting Ping-specific field names and enum values a model without the skill would guess wrong. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…essage

…c models

- Tasks 01-05 ping-app-integration: add explicit 'Write to disk' instruction to prevent model from describing code in chat instead of writing files
- Task 02: loosen grep patterns to regex, drop variable-name dependency
- Task 05: loosen callback-name check from exact 'callback.name' to '.name ='
- claude_code_cli.py: read error from result.result field not subtype

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…atterns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. grading.py: judge_task os.environ[] -> .get() + RuntimeError (was KeyError: 'OPENAI_API_KEY' killing all judge tasks silently)
2. ping-app-integration 02-04: fix grep-with-regex-pattern bugs: type was 'grep' (substring) not 'regex', so backslash patterns like 'Task\s*{' searched literally; change to type: regex throughout
3. ping-identity-for-ai 01/02/04/05: add write-to-file instruction (model was answering in chat; 0 files matched grading globs)
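
The first two fixes can be illustrated with a short sketch. It is not the code in grading.py: the function names and the `pattern` key are assumptions, while the `type` values 'grep' (substring) and 'regex' are taken from the commit message.

```python
import os
import re


def require_env(name: str) -> str:
    """Read a required environment variable, failing with a descriptive error."""
    value = os.environ.get(name)
    if not value:
        # An explicit RuntimeError is easier to surface than a bare KeyError.
        raise RuntimeError(f"{name} is not set; judge tasks cannot run")
    return value


def run_check(check: dict[str, str], text: str) -> bool:
    """Apply one deterministic check to a model's output text."""
    kind = check["type"]
    if kind == "grep":
        # Substring match: regex metacharacters are compared literally.
        return check["pattern"] in text
    if kind == "regex":
        return re.search(check["pattern"], text) is not None
    raise ValueError(f"unknown check type: {kind!r}")
```

With a pattern such as `Task\s*\{`, the 'grep' variant looks for the backslash sequence literally and misses `Task {`, which is the failure mode the second fix describes.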

…CP commit
The MCP config/Cursor plugin commit pushed SKILL.md files over the 120-line limit and used YAML block scalars (>-) for description fields which the validator expects as plain strings. Also introduced a missing doc_type value and referenced generated JSON files that didn't exist.
Fixes:
- Convert description: >- block scalars back to inline strings in all 6 SKILL.md files
- Extract ## MCP config preflight sections to references/runtime/mcp-preflight.md per skill; replace with single ## MCP execution pointer line
- Create generated JSON stubs for ping-foundation and ping-orchestration branches (pingone-mt, pingone-st, ping-software, cross-platform)
- Fix doc_type: use-case → guide in mfa-method-selection-registration.md
Also commit MFA region and service model work:
- New references/curated/mfa-region-and-service-model.md in ping-universal-services
- MFA region guardrail added to ping-universal-services description
- Cross-reference note added to mfa-configuration.md
- index.json updated with new curated anchor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
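
The two validator rules named here (line budget and inline description strings) could be checked along these lines. This is a sketch only, not the repo's validator: the function name is hypothetical, and treating "more than 120 lines" as the failure condition is an assumption about how the limit is enforced.

```python
import re

MAX_LINES = 120  # line budget mentioned in the PR; exact comparison is an assumption


def validate_skill_md(text: str) -> list[str]:
    """Return a list of problems found in one SKILL.md file's text."""
    errors: list[str] = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        errors.append(f"file has {len(lines)} lines; limit is {MAX_LINES}")

    stripped = [line.strip() for line in lines]
    if stripped and stripped[0] == "---":
        # Front matter runs from the first '---' to the next one.
        end = stripped.index("---", 1) if "---" in stripped[1:] else len(lines)
        for line in lines[1:end]:
            match = re.match(r"description:\s*(.*)$", line)
            # A value of only '>' or '|' plus indicators is a block scalar header.
            if match and re.fullmatch(r"[>|][+\-\d]*", match.group(1).strip()):
                errors.append("description uses a YAML block scalar; expected an inline string")
    return errors
```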

rochlev · 2026-06-08T15:40:46Z

+
+**Admin surface:** PingOne admin console → Authentication → MFA → Device Authentication Policies
+
+**Policy scope:** Per-environment. A policy is then referenced by sign-on policies or DaVinci/AIC flow connectors.


Some of the terms in the reference files (Sign-on-policies.md / mfa-configuration.md) do not match the terms used in the UI/documentation.
For example, "Sign-on policy" should now be "authentication policy", and
"Create or update the MFA device policy" should just say "MFA policy".

I wonder whether the current references or routing might be blending API-oriented terminology with UI/documentation terminology?

- Run 60-task Layer 3 eval (haiku, all 6 skills): 33% with_skill vs 20% without, −62% tokens aggregate
- Add flush=True to run_layer3.py print statements for live progress
- Expand Layer 3 README table with absolute pass rates, Δ pass, and % token savings columns
- Add "Why absolute pass rates look modest" explanation covering binary all-or-nothing scoring, Haiku model-tier floor, write-to-disk compliance, and ping-identity-for-ai inversion
- Extend "What the results show" with Layer 3 bullet points
- Add Layer 3 CLI invocation examples to "Run the eval yourself" section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
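To see why binary all-or-nothing scoring keeps absolute pass rates modest, consider a minimal sketch. The function names and the sample check results are invented for illustration and are not data from the eval run.

```python
def task_passes(check_results: list[bool]) -> bool:
    """All-or-nothing: a task passes only if every one of its checks passes."""
    return bool(check_results) and all(check_results)


def pass_rate(tasks: list[list[bool]]) -> float:
    """Fraction of tasks that pass under all-or-nothing scoring."""
    return sum(task_passes(checks) for checks in tasks) / len(tasks)


def check_level_rate(tasks: list[list[bool]]) -> float:
    """Fraction of individual checks that pass, for comparison."""
    total = sum(len(checks) for checks in tasks)
    return sum(sum(checks) for checks in tasks) / total


# Hypothetical results: three tasks with three checks each.
tasks = [
    [True, True, True],    # passes
    [True, True, False],   # one failed check sinks the whole task
    [True, False, False],
]

binary_rate = pass_rate(tasks)          # 1 of 3 tasks
partial_rate = check_level_rate(tasks)  # 6 of 9 checks
```

In this made-up case two thirds of the checks succeed, yet only one third of the tasks count as passed.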

…tement Eval results are being reviewed before publishing. Replaced all per-skill score tables and analysis with a brief statement noting consistent token savings across all six skills. Full results to follow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Rewrite README with nav bar, features section, MCP server links, and 15 example prompts
- Add Install Manually section with npx skills CLI commands
- Expand skills table descriptions and rename columns
- Promote Repo layout to top-level section
- Move eval status into callout block
- Fix typo and heading levels throughout
- Add banner image and update plugin README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

brando-dill

This looks good so far. I am going to merge this so we can see a clean main and make any final tweaks in another PR.

Complete the customer-facing product rename and repair drift left by the PR #3 merge and the MCP-config cleanup commit:

- .well-known/agent-skills/index.json: PingOne MT/ST → PingOne / PingOne Advanced Identity Cloud (AIC) in ping-foundation and ping-orchestration descriptions (this is the index marketplaces and agents consume)
- Finish MT/ST → customer-facing rename in SKILL.md bodies/descriptions for ping-orchestration, ping-quickstart, ping-foundation; and in ping-marketplace.json display descriptions + .cursor-plugin description. Machine routing slugs (product_family/products arrays) intentionally unchanged.
- Repair 5 dead links to deleted references/runtime/docs-mcp-routing.md: inline the docs-MCP fallback guidance in 4 SKILL.md files; point app-integration-overview.md to the existing mcp-preflight.md
- Standardize contact email to developer-experience@pingidentity.com across all 4 plugin manifests (was split with devex@)
- Fix 8 generated shortlist stubs to conform to reference-manifest-schema.json ({skill, branch, generated_at, max_docs, docs}) instead of malformed placeholder

Validator clean; Layer 1 + Layer 2 mock evals 100%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
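A shortlist stub with the five keys listed above might be built and checked as follows. This is only a sketch: the key names come from the commit message, but the value types, the `max_docs` default, and the helper names are assumptions, since reference-manifest-schema.json itself is not shown in this PR.

```python
from datetime import datetime, timezone

# Key names taken from the commit message; everything else is assumed.
REQUIRED_KEYS = {"skill", "branch", "generated_at", "max_docs", "docs"}


def make_stub(skill: str, branch: str, max_docs: int = 10) -> dict:
    """Build an empty shortlist stub (hypothetical value types and default)."""
    return {
        "skill": skill,
        "branch": branch,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "max_docs": max_docs,
        "docs": [],
    }


def missing_keys(stub: dict) -> set[str]:
    """Return the required keys that a stub lacks."""
    return REQUIRED_KEYS - set(stub)


stub = make_stub("ping-foundation", "pingone-mt")
placeholder = {"todo": "generate me"}  # stand-in for a malformed placeholder

stub_missing = missing_keys(stub)                # empty set
placeholder_missing = missing_keys(placeholder)  # all five required keys
```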

george-bafaloukas-forgerock added 6 commits June 4, 2026 11:46

revert: restore 'Use this skill' in ping-foundation description

74ba4e7

george-bafaloukas-forgerock requested a review from brando-dill June 4, 2026 15:17

george-bafaloukas-forgerock added 13 commits June 4, 2026 16:37

docs(evals): GPT-5.x improvement plan — adapter-level fixes for non-t…

ec54453

…rigger and ambiguous failures

docs(evals): remaining issues plan with input needed markers

ce672ad

feat(evals): add Layer 3 task schema and first task

34bacf0

fix(evals): align task.schema.json prompt.minLength with plan (20)

3e546c5

rochlev reviewed Jun 4, 2026

View reviewed changes

george-bafaloukas-forgerock and others added 9 commits June 4, 2026 21:08

refactor(evals): dedupe validate_prompts helpers and clean up tests

c4b33a0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(evals): deterministic check evaluator for Layer 3

5df5735

fix(evals): json_path distinguishes null values from missing keys

5b86498

feat(evals): claude-code CLI runner for Layer 3

2590717

feat(evals): OpenAI runner for Layer 3 task execution

b32779a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

george-bafaloukas-forgerock and others added 21 commits June 8, 2026 16:05

feat(evals): README updater for Layer 3 eval table

3069b42

Add scripts/update_readme_eval_table.py + tests; insert HTML comment markers in README.md around the Layer 1 eval block and add a Layer 3 placeholder block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(evals): Layer 3 tasks 02-05 for ping-app-integration

8c67e6b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(evals): Layer 3 tasks for ping-identity-for-ai

455360f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

build: Makefile with eval-layer3 targets

8867379

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: workflow_dispatch for Layer 3 eval

125873d

fix(evals): treat eu./us./ap. Bedrock model prefixes as Anthropic

50cb5e3

fix(evals): parse is_error result text instead of subtype for error m…

6164dd0

…essage

fix(evals): accept claude CLI aliases (haiku/sonnet/opus) as Anthropi…

55555c4

…c models

chore(evals): update README Layer 3 table with haiku smoke results

ad63d79

chore(evals): update README Layer 3 table — haiku full run

5ad3737

fix(evals): add write-to-disk instructions to prose/config tasks

69a99d4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix(evals): enable judge_rubric on quickstart; fix foundation check p…

7cadecf

…atterns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(evals): update README Layer 3 table — run 2

52001f6

chore(evals): update README Layer 3 table — run 3 with judge

5574ed7

rochlev reviewed Jun 8, 2026

View reviewed changes

george-bafaloukas-forgerock and others added 3 commits June 8, 2026 16:56

brando-dill approved these changes Jun 9, 2026

View reviewed changes

brando-dill merged commit 7590e53 into main Jun 9, 2026

brando-dill deleted the skills-feedback-fixes branch June 9, 2026 15:54

george-bafaloukas-forgerock mentioned this pull request Jun 22, 2026

Address dev-experience feedback: tone, durability, AIC naming, skill purpose #8

Open

brando-dill merged 59 commits into main from skills-feedback-fixes


3 participants

