fix: deterministic audit metrics via run_summary.json cache and workflow-logs/ exclusion #26148
When AuditWorkflowRun is called for a run that already has a valid run_summary.json on disk, return the cached report immediately without re-processing local log files. Re-processing the same files can produce different results (e.g. when the GitHub API aggregates token usage from a different source than the locally-stored firewall proxy logs), causing the token_usage and turns metrics to differ across calls for the same run.

The fix:
- Add a cache-check early path in AuditWorkflowRun: if loadRunSummary succeeds, reconstruct processedRun from the summary and render without touching any API or log files.
- Extract the shared render + finalization logic into renderAuditReport, called by both the cache-hit path and the fresh-download path.
- The fresh-download path now saves the summary BEFORE calling renderAuditReport, so the cache is written as early as possible.
- The run_summary.json is not overwritten on cached calls, ensuring the first-processed (firewall-based) metrics are preserved.

Cache invalidation is already handled: loadRunSummary rejects summaries whose CLIVersion differs from the current binary version, so bug fixes in log parsing are automatically picked up after a tool upgrade.

Adds two new unit tests:
- TestAuditUsesRunSummaryCache: verifies that cached token_usage and turns are preserved and run_summary.json is not modified.
- TestRenderAuditReportUsesProvidedMetrics: exercises the new helper.

Fixes: Audit shows inconsistent metrics on repeated calls for same run

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f8e72352-eed9-4bb9-bc4b-7c856bcb8bbf
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot review why different code paths return different number, refactor and fix, add tests
…okens

The log-file walk in extractLogMetrics was not excluding the workflow-logs/ directory, which is populated by downloadWorkflowRunLogs with GitHub Actions step-output files. These files capture the combined stdout/stderr of each step including the agent's own log lines, so the same token-usage JSON blocks that already appear in the agent artifact (agent-stdio.log / agent.log) were being found again and their token counts were summed a second (or nth) time.

Call 1 of audit (run ID only): downloadWorkflowRunLogs may fail silently because owner/repo is resolved from the local git remote (may not match), so workflow-logs/ is absent → only agent.log is counted → 381k tokens (correct).

Call 2 of audit (full URL with owner/repo): downloadWorkflowRunLogs succeeds, so workflow-logs/ is populated with step logs → the same agent data is encountered ~12 more times → 4.7M tokens (12× inflated).

Fix: return filepath.SkipDir when the walk visits a directory named "workflow-logs", so only the agent artifact files contribute to the metrics.

Adds TestExtractLogMetricsExcludesWorkflowLogsDir as a regression test that explicitly verifies the fix: agent.log is counted once (1000 tokens), and copies inside workflow-logs/ are ignored even when they pass the filename filter.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c72016d-4028-4648-88f8-6ab62aa01277
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Root cause identified and fixed in commit Why the two paths returned different numbers:
The inconsistency came from Call 1 (run ID only) → Fixes:
audit returned inconsistent token_usage and turns on repeated calls for the same run (observed: 9 turns / 381k tokens vs 22 turns / 4.7M tokens) due to two compounding issues.

Root Cause

The inconsistency had two causes:

- Re-processing on every call: AuditWorkflowRun always re-processed local log files even when a valid run_summary.json was on disk.
- Double-counting via workflow-logs/: The .log file walk in extractLogMetrics was not excluding the workflow-logs/ directory. When audit is called with a full URL (owner + repo), downloadWorkflowRunLogs successfully downloads GitHub Actions step-output into workflow-logs/. These step captures include the agent's full stdout, so the same token-usage JSON blocks already present in agent-stdio.log appeared again (~12 times) and were summed into metrics.TokenUsage, causing the ~12× inflation.

Changes

- Cache-hit early path in AuditWorkflowRun: before any API calls or log processing, check loadRunSummary. On a hit, reconstruct processedRun from the cached summary and return immediately: no re-download, no re-parse, no API calls.
- renderAuditReport helper: extracted the shared render + finalization logic (comparison snapshot, JSON/console output, --parse handling, success message) into a new function called by both the cache-hit path and the fresh-download path.
- Save summary before rendering: in the fresh-download path, saveRunSummary is now called before renderAuditReport so the cache is persisted as early as possible.
- Skip workflow-logs/ in log walk: extractLogMetrics now returns filepath.SkipDir when the walk visits a directory named workflow-logs, preventing GitHub Actions runner captures from being counted alongside the dedicated agent artifact logs.

Cache invalidation is unchanged: loadRunSummary already rejects summaries whose CLIVersion doesn't match the running binary, so improvements in log parsing are picked up automatically after upgrades.

Testing

Added TestExtractLogMetricsExcludesWorkflowLogsDir as a regression test: creates a run directory with agent.log (1000 tokens) plus copies of the same data under workflow-logs/, and asserts the total remains 1000, confirming workflow-logs/ files are ignored.