fix: deterministic audit metrics via run_summary.json cache and workflow-logs/ exclusion by Copilot · Pull Request #26148 · github/gh-aw

Copilot · 2026-04-14T03:46:25Z

audit returned inconsistent token_usage and turns on repeated calls for the same run (observed: 9 turns / 381k tokens vs 22 turns / 4.7M tokens) due to two compounding issues.

Root Cause

The inconsistency had two causes:

Re-processing on every call: AuditWorkflowRun always re-processed local log files even when a valid run_summary.json was on disk.
Double-counting via workflow-logs/: The .log file walk in extractLogMetrics was not excluding the workflow-logs/ directory. When audit is called with a full URL (owner+repo), downloadWorkflowRunLogs successfully downloads GitHub Actions step-output into workflow-logs/. These step captures include the agent's full stdout, so the same token-usage JSON blocks already present in agent-stdio.log appeared again (~12 times) and were summed into metrics.TokenUsage — causing the ~12× inflation.

Changes

Cache-hit early path in AuditWorkflowRun: before any API calls or log processing, check loadRunSummary. On a hit, reconstruct processedRun from the cached summary and return immediately — no re-download, no re-parse, no API calls.
renderAuditReport helper: extracted the shared render + finalization logic (comparison snapshot, JSON/console output, --parse handling, success message) into a new function called by both the cache-hit path and the fresh-download path.
Save summary before rendering: in the fresh-download path, saveRunSummary is now called before renderAuditReport so the cache is persisted as early as possible.
Skip workflow-logs/ in log walk: extractLogMetrics now returns filepath.SkipDir when the walk visits a directory named workflow-logs, preventing GitHub Actions runner captures from being counted alongside the dedicated agent artifact logs.

Cache invalidation is unchanged — loadRunSummary already rejects summaries whose CLIVersion doesn't match the running binary, so improvements in log parsing are picked up automatically after upgrades.

// New early-return in AuditWorkflowRun (cache hit path)
if summary, ok := loadRunSummary(runOutputDir, verbose); ok {
    processedRun := ProcessedRun{
        Run: summary.Run, /* ... all fields from summary */
    }
    return renderAuditReport(processedRun, summary.Metrics, summary.MCPToolUsage, ...)
}
// Fresh-download path: process → saveRunSummary → renderAuditReport

// workflow-logs/ is now skipped in the log walk
if info.IsDir() {
    if info.Name() == "workflow-logs" {
        return filepath.SkipDir
    }
    return nil
}

Testing

Added TestExtractLogMetricsExcludesWorkflowLogsDir as a regression test: creates a run directory with agent.log (1000 tokens) plus copies of the same data under workflow-logs/, and asserts the total remains 1000 — confirming workflow-logs/ files are ignored.

When AuditWorkflowRun is called for a run that already has a valid run_summary.json on disk, return the cached report immediately without re-processing local log files. Re-processing the same files can produce different results (e.g. when the GitHub API aggregates token usage from a different source than the locally-stored firewall proxy logs), causing the token_usage and turns metrics to differ across calls for the same run. The fix: - Add a cache-check early path in AuditWorkflowRun: if loadRunSummary succeeds, reconstruct processedRun from the summary and render without touching any API or log files. - Extract the shared render + finalization logic into renderAuditReport, called by both the cache-hit path and the fresh-download path. - The fresh-download path now saves the summary BEFORE calling renderAuditReport, so the cache is written as early as possible. - The run_summary.json is not overwritten on cached calls, ensuring the first-processed (firewall-based) metrics are preserved. Cache invalidation is already handled: loadRunSummary rejects summaries whose CLIVersion differs from the current binary version, so bug fixes in log parsing are automatically picked up after a tool upgrade. Adds two new unit tests: - TestAuditUsesRunSummaryCache: verifies that cached token_usage and turns are preserved and run_summary.json is not modified. - TestRenderAuditReportUsesProvidedMetrics: exercises the new helper. Fixes: Audit shows inconsistent metrics on repeated calls for same run Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f8e72352-eed9-4bb9-bc4b-7c856bcb8bbf Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan · 2026-04-14T04:18:07Z

@copilot review why different code paths return different number, refactor and fix, add tests

…okens The log-file walk in extractLogMetrics was not excluding the workflow-logs/ directory, which is populated by downloadWorkflowRunLogs with GitHub Actions step-output files. These files capture the combined stdout/stderr of each step including the agent's own log lines, so the same token-usage JSON blocks that already appear in the agent artifact (agent-stdio.log / agent.log) were being found again and their token counts were summed a second (or nth) time. Call 1 of audit (run ID only): downloadWorkflowRunLogs may fail silently because owner/repo is resolved from the local git remote (may not match), so workflow-logs/ is absent → only agent.log is counted → 381k tokens (correct). Call 2 of audit (full URL with owner/repo): downloadWorkflowRunLogs succeeds, so workflow-logs/ is populated with step logs → the same agent data is encountered ~12 more times → 4.7M tokens (12× inflated). Fix: return filepath.SkipDir when the walk visits a directory named "workflow-logs", so only the agent artifact files contribute to the metrics. Adds TestExtractLogMetricsExcludesWorkflowLogsDir as a regression test that explicitly verifies the fix: agent.log is counted once (1000 tokens), and copies inside workflow-logs/ are ignored even when they pass the filename filter. Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c72016d-4028-4648-88f8-6ab62aa01277 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c72016d-4028-4648-88f8-6ab62aa01277 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-14T04:39:40Z

@copilot review why different code paths return different number, refactor and fix, add tests

Root cause identified and fixed in commit d63e3be:

Why the two paths returned different numbers:

extractLogMetrics has two code paths:

events.jsonl path – single authoritative source (Copilot session events) → deterministic
.log file walk fallback – sums TokenUsage across every matched .log/*log*.txt file

The inconsistency came from workflow-logs/: when audit was called with a full URL (owner+repo), downloadWorkflowRunLogs successfully downloaded GitHub Actions step-output into workflow-logs/. These step captures include the agent's full stdout, so the same token-usage JSON blocks already present in agent-stdio.log appeared again (and again) in the step files — each copy was summed, inflating token_usage ~12×.

Call 1 (run ID only) → downloadWorkflowRunLogs may fail/skip when owner/repo isn't in the URL → workflow-logs/ absent → only agent.log counted → 381k tokens ✅
Call 2 (full URL) → downloadWorkflowRunLogs succeeds → workflow-logs/ populated → same data counted ~12 more times → 4.7M tokens ❌

Fixes:

logs_metrics.go: return filepath.SkipDir when the walk visits workflow-logs/, since those are GHA runner logs that duplicate agent artifact content
audit.go: serve repeated audit calls from run_summary.json cache (PR commit 1) so even if the walk were non-deterministic, the first correctly-computed result is always reused
copilot_metrics_fix_test.go: regression test TestExtractLogMetricsExcludesWorkflowLogsDir that asserts workflow-logs/ copies are ignored

Initial plan

e0e6d66

Copilot AI assigned Copilot and pelikhan Apr 14, 2026

Copilot started work on behalf of pelikhan April 14, 2026 03:46 View session

Copilot AI linked an issue Apr 14, 2026 that may be closed by this pull request

[cli-tools-test] Daily CLI Tools Test: Audit shows inconsistent metrics on repeated calls for same run #25985

Open

Copilot AI changed the title ~~[WIP] Fix inconsistent metrics in audit tool for same run~~ fix: use cached run_summary.json to ensure deterministic audit metrics Apr 14, 2026

Copilot AI requested a review from pelikhan April 14, 2026 04:15

Copilot finished work on behalf of pelikhan April 14, 2026 04:15

Copilot started work on behalf of pelikhan April 14, 2026 04:25 View session

Copilot AI and others added 2 commits April 14, 2026 04:37

fix: address code review feedback (comment clarity + file permissions)

44bd32e

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c72016d-4028-4648-88f8-6ab62aa01277 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~fix: use cached run_summary.json to ensure deterministic audit metrics~~ fix: deterministic audit metrics via run_summary.json cache and workflow-logs/ exclusion Apr 14, 2026

Copilot finished work on behalf of pelikhan April 14, 2026 04:40

github-actions bot mentioned this pull request Apr 14, 2026

[PR Triage Report] PR Triage Report - 2026-04-14 #26170

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: deterministic audit metrics via run_summary.json cache and workflow-logs/ exclusion#26148

fix: deterministic audit metrics via run_summary.json cache and workflow-logs/ exclusion#26148
Copilot wants to merge 4 commits intomainfrom
copilot/fix-inconsistent-audit-metrics

Copilot AI commented Apr 14, 2026 •

edited

Loading

Uh oh!

pelikhan commented Apr 14, 2026

Uh oh!

Copilot AI commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Copilot AI commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Root Cause

Changes

Testing

Uh oh!

pelikhan commented Apr 14, 2026

Uh oh!

Copilot AI commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Apr 14, 2026 •

edited

Loading