
Commit 377d62d (authored by lpcox and Copilot)

refactor: use gh aw logs for token analysis workflows (#1884)

Replace manual artifact downloading (`gh run download --name firewall-audit-logs`) with `gh aw logs --json` in all 4 token analysis workflows. This aligns with the approach used by gh-aw's own token-audit and token-optimizer workflows.

Benefits:
- Eliminates the dependency on knowing the correct artifact name (the root cause of the bug fixed in PR #1883)
- Gets pre-aggregated structured data (token counts, costs, turns) instead of parsing raw token-usage.jsonl
- Handles artifact naming backward/forward compatibility automatically
- Pre-downloads data in the `steps:` block (faster, more reliable)

Changes:
- copilot-token-usage-analyzer.md: replaced manual run discovery and artifact download with a `gh aw logs --engine copilot --json` pre-step
- claude-token-usage-analyzer.md: the same for `--engine claude`
- copilot-token-optimizer.md: added a `gh aw logs --engine copilot` pre-step for 7-day data and removed the manual artifact download
- claude-token-optimizer.md: the same for `--engine claude`
- shared/mcp/gh-aw.md: new shared component that installs the gh-aw CLI (mirroring upstream gh-aw's shared/mcp/gh-aw.md)

Refs: #1883, github/gh-aw#25683

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent 59d8840 · commit 377d62d

9 files changed: 494 additions, 397 deletions

.github/workflows/claude-token-optimizer.lock.yml

26 additions, 16 deletions (generated lock file; diff not rendered by default)

.github/workflows/claude-token-optimizer.md

62 additions, 56 deletions
```diff
@@ -14,6 +14,8 @@ permissions:
   actions: read
   issues: read
   pull-requests: read
+imports:
+  - uses: shared/mcp/gh-aw.md
 network:
   allowed:
     - github
```
```diff
@@ -23,11 +25,39 @@ tools:
   bash: true
 safe-outputs:
   create-issue:
-    title-prefix: " Claude Token Optimization"
+    title-prefix: "⚡ Claude Token Optimization"
     labels: [claude-token-optimization]
     close-older-issues: true
 timeout-minutes: 10
 strict: true
+steps:
+  - name: Download recent Claude workflow logs
+    env:
+      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    run: |
+      set -euo pipefail
+      mkdir -p /tmp/gh-aw/token-audit
+
+      echo "📥 Downloading Claude workflow logs (last 7 days)..."
+
+      LOGS_EXIT=0
+      gh aw logs \
+        --engine claude \
+        --start-date -7d \
+        --json \
+        -c 50 \
+        > /tmp/gh-aw/token-audit/claude-logs.json || LOGS_EXIT=$?
+
+      if [ -s /tmp/gh-aw/token-audit/claude-logs.json ]; then
+        TOTAL=$(jq '.runs | length' /tmp/gh-aw/token-audit/claude-logs.json)
+        echo "✅ Downloaded $TOTAL Claude workflow runs (last 7 days)"
+        if [ "$LOGS_EXIT" -ne 0 ]; then
+          echo "⚠️ gh aw logs exited with code $LOGS_EXIT (partial results)"
+        fi
+      else
+        echo "❌ No log data downloaded (exit code $LOGS_EXIT)"
+        echo '{"runs":[],"summary":{}}' > /tmp/gh-aw/token-audit/claude-logs.json
+      fi
 ---

 # Daily Claude Token Optimization Advisor
```
```diff
@@ -57,7 +87,7 @@ From the report's **Workflow Summary** table, identify the workflow with:

 Extract these key metrics for the target workflow:
 - Total tokens per run
-- Cache hit rate (read and write separately Anthropic exposes both)
+- Cache hit rate (read and write separately — Anthropic exposes both)
 - Cache write rate
 - Input/output ratio
 - Number of LLM turns (request count)
```
````diff
@@ -81,53 +111,32 @@ cat "$WORKFLOW_FILE"
 ```

 Analyze:
-- **Tools loaded** List all tools in the `tools:` section. Flag any that may not be needed.
-- **Network groups** List network groups in `network.allowed:`. Flag unused ones.
-- **Prompt length** Estimate the markdown body size. Is it verbose?
-- **Pre-agent steps** Does it use `steps:` to pre-compute deterministic work?
-- **Post-agent steps** Does it use `post-steps:` for validation?
+- **Tools loaded** — List all tools in the `tools:` section. Flag any that may not be needed.
+- **Network groups** — List network groups in `network.allowed:`. Flag unused ones.
+- **Prompt length** — Estimate the markdown body size. Is it verbose?
+- **Pre-agent steps** — Does it use `steps:` to pre-compute deterministic work?
+- **Post-agent steps** — Does it use `post-steps:` for validation?

-## Step 4: Analyze Recent Run Artifacts
+## Step 4: Analyze Recent Run Data

-Download the most recent successful run's artifacts to understand actual tool usage:
+The pre-agent step downloaded the last 7 days of Claude workflow logs to `/tmp/gh-aw/token-audit/claude-logs.json`. Filter this data for the target workflow:

 ```bash
-# Find the latest successful run using the resolved workflow file
-LOCK_FILE="$(basename "$WORKFLOW_FILE" .md).lock.yml"
-RUN_ID=$(gh run list --repo "$GITHUB_REPOSITORY" \
-  --workflow "$LOCK_FILE" \
-  --status success --limit 1 \
-  --json databaseId --jq '.[0].databaseId')
-
-if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
-  echo "No successful runs found for $LOCK_FILE — skipping artifact analysis"
-else
-  # Download artifacts
-  TMPDIR=$(mktemp -d)
-  gh run download "$RUN_ID" --repo "$GITHUB_REPOSITORY" \
-    --name agent-artifacts --dir "$TMPDIR" 2>/dev/null || \
-  gh run download "$RUN_ID" --repo "$GITHUB_REPOSITORY" \
-    --name agent --dir "$TMPDIR" 2>/dev/null
-
-  # Check token usage
-  find "$TMPDIR" -name "token-usage.jsonl" -exec cat {} \;
-
-  # Check agent stdio log for tool calls (|| true to handle no matches)
-  find "$TMPDIR" -name "agent-stdio.log" -exec grep -h "^●" {} \; || true
-
-  # Check prompt size
-  find "$TMPDIR" -name "prompt.txt" -exec wc -c {} \;
-fi
+# Extract runs for the target workflow
+cat /tmp/gh-aw/token-audit/claude-logs.json | \
+  jq --arg name "$WORKFLOW_NAME" '[.runs[] | select(.workflow_name == $name)]'
 ```

-From the artifacts, determine:
-- **Which tools were actually called** vs which are loaded
-- **How many LLM turns** were used
-- **Per-turn token breakdown** (first turn is usually the most expensive)
-- **Cache write vs cache read ratio** — Anthropic charges 12.5x more for cache writes than reads
-- **Whether cache writes are amortized** — Are they reused across subsequent turns?
+From the run data, determine:
+- **Per-run token breakdown** (token_usage, estimated_cost per run)
+- **Average turns** per run
+- **Error/warning patterns**
+- **Token usage summary** (per-model breakdown from `token_usage_summary` if available)
+- **Cache write vs read ratio** — Anthropic charges 12.5x more for cache writes than reads
+
+Also check the `tool_usage` and `mcp_tool_usage` fields in the JSON to identify which tools are actually being used vs loaded.

-Clean up: `rm -rf "$TMPDIR"`
+Clean up is not needed — data is pre-downloaded to /tmp.

 ## Step 5: Generate Optimization Recommendations

````

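The jq selection and per-run aggregation in the hunk above can be mirrored offline for a quick sanity check. Below is a minimal Python sketch of the same logic; the `runs`, `workflow_name`, `token_usage`, `estimated_cost`, and `turns` field names follow the prompt's description of the `gh aw logs --json` output, and the sample data is invented for illustration:

```python
# Invented sample mirroring the shape the prompt describes for
# /tmp/gh-aw/token-audit/claude-logs.json (field names are assumptions).
sample = {
    "runs": [
        {"workflow_name": "daily-plan", "token_usage": 120000, "estimated_cost": 0.42, "turns": 6},
        {"workflow_name": "daily-plan", "token_usage": 80000, "estimated_cost": 0.30, "turns": 4},
        {"workflow_name": "ci-doctor", "token_usage": 50000, "estimated_cost": 0.15, "turns": 3},
    ],
    "summary": {},
}

def filter_runs(data, name):
    # Same selection as:
    # jq --arg name "$WORKFLOW_NAME" '[.runs[] | select(.workflow_name == $name)]'
    return [r for r in data["runs"] if r["workflow_name"] == name]

runs = filter_runs(sample, "daily-plan")
total_tokens = sum(r["token_usage"] for r in runs)
avg_turns = sum(r["turns"] for r in runs) / len(runs)
print(len(runs), total_tokens, avg_turns)  # 2 200000 5.0
```

The same totals and averages feed directly into the "Per-run token breakdown" and "Average turns" bullets above.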
```diff
@@ -137,18 +146,16 @@ Produce **specific, implementable recommendations** based on these patterns:
 If many tools are loaded but few are used:
 - List which tools to remove from `tools:` in the workflow `.md`
 - Estimate token savings (each tool schema is ~500-700 tokens)
-- Example: "Remove `agentic-workflows:`, `web-fetch:` saves ~15K tokens/turn"
+- Example: "Remove `agentic-workflows:`, `web-fetch:` — saves ~15K tokens/turn"

 ### Pre-Agent Steps
 If the workflow does deterministic work (API calls, file creation, data fetching) inside the agent:
 - Identify which operations could move to `steps:` (pre-agent)
 - Show example `steps:` configuration
-- Example: "Move `gh pr list` to a pre-step, inject results via `${{ steps.X.outputs.Y }}`"

 ### Prompt Optimization
 If the prompt is verbose or contains data the agent doesn't need:
 - Suggest specific cuts or rewrites
-- Example: "Replace 15-line test instructions with 3-line summary referencing pre-computed results"

 ### GitHub Toolset Restriction
 If `github:` tools are loaded without `toolsets:` restriction:
```
```diff
@@ -163,8 +170,7 @@ If unused network groups are configured (e.g., `node`, `playwright`):
 Anthropic charges significantly more for cache writes than reads:
 - Cache write: $3.75/M tokens (Sonnet), cache read: $0.30/M tokens
 - If cache writes are high but not reused across turns, the caching cost may exceed the benefit
-- Check if `cache_write_tokens` from Turn 1 are reflected as `cache_read_tokens` in Turn 2+
-- If prompts change substantially between turns, caching provides no benefit
+- Check if prompts change substantially between turns

 ### Cache Read Optimization
 If cache hit rate is low (<50%):
```
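The write/read price gap in the hunk above can be made concrete with a little arithmetic. A hedged sketch using the Sonnet prices quoted in the diff ($3.75/M cache write, $0.30/M cache read); the $3.00/M regular-input price used for comparison is an assumption, not something stated here:

```python
WRITE_PER_M = 3.75  # cache write, $ per 1M tokens (Sonnet, quoted in the diff)
READ_PER_M = 0.30   # cache read, $ per 1M tokens (quoted in the diff)
INPUT_PER_M = 3.00  # regular input, $ per 1M tokens (assumed, for comparison only)

def cached_cost(millions, reuses):
    # Write the prompt prefix to cache once, then read it back `reuses` times.
    return millions * WRITE_PER_M + reuses * millions * READ_PER_M

def uncached_cost(millions, reuses):
    # Resend the same prefix as plain input on every turn instead.
    return (1 + reuses) * millions * INPUT_PER_M

# A 50K-token cached prefix reused on 3 later turns:
print(round(cached_cost(0.05, 3), 4))    # 0.2325
print(round(uncached_cost(0.05, 3), 4))  # 0.6
print(round(WRITE_PER_M / READ_PER_M, 2))  # 12.5
```

Under these assumed numbers caching pays for itself after a single reuse, while a prefix that changes every turn (zero reuses) costs more to write than to resend, which is exactly why the checklist asks whether prompts change substantially between turns.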
```diff
@@ -174,7 +180,7 @@ If cache hit rate is low (<50%):

 ## Step 6: Create the Optimization Issue

-Create an issue with title: `YYYY-MM-DD <workflow-name>`
+Create an issue with title: `YYYY-MM-DD — <workflow-name>`

 Body structure:

```
```diff
@@ -248,11 +254,11 @@ Body structure:

 ## Important Guidelines

-- **Be concrete** Every recommendation must include specific file changes, not just "reduce tools"
-- **Estimate savings** Quantify each recommendation in tokens and percentage
-- **Prioritize by impact** Order recommendations from highest to lowest token savings
-- **Include implementation steps** Someone should be able to follow your recommendations without additional research
-- **Reference the report** Link back to the source token usage report issue
-- **One workflow per issue** Focus on the single most expensive workflow
-- **Anthropic-specific insights** Leverage cache write data that Copilot workflows don't expose
-- **Clean up** temporary files after analysis
+- **Be concrete** — Every recommendation must include specific file changes, not just "reduce tools"
+- **Estimate savings** — Quantify each recommendation in tokens and percentage
+- **Prioritize by impact** — Order recommendations from highest to lowest token savings
+- **Include implementation steps** — Someone should be able to follow your recommendations without additional research
+- **Reference the report** — Link back to the source token usage report issue
+- **One workflow per issue** — Focus on the single most expensive workflow
+- **Anthropic-specific insights** — Leverage cache write data that Copilot workflows don't expose
+- **Use pre-downloaded data** — All run data is at `/tmp/gh-aw/token-audit/claude-logs.json`. Do not download artifacts manually.
```
