3 changes: 3 additions & 0 deletions .gitignore
@@ -5,6 +5,9 @@ __pycache__/
.mypy_cache/
.ruff_cache/
.coverage*
node_modules/
playwright-report/
test-results/
build/
dist/
*.egg-info/
6 changes: 4 additions & 2 deletions docs/architecture.md
@@ -13,7 +13,8 @@ Codex Usage Tracker is a local sidecar app. It reads aggregate token counters fr
- `costing.py`, `pricing_config.py`, `pricing_openai.py`, `pricing_estimates.py`, and `allowance.py` own cost, credit, rate-card, and allowance annotation. Keep estimate confidence and source metadata attached to rows.
- `projects.py`, `threads.py`, and `recommendations.py` annotate aggregate rows with project identity, thread relationships, and actionable signals. Project privacy redaction also belongs in `projects.py` so CLI, MCP, dashboard, CSV, and support-bundle surfaces share the same behavior.
- `dashboard.py` builds aggregate-only static dashboard payloads and writes HTML/assets. `server.py` adds localhost refresh, the compatibility `/api/usage` endpoint, SQL-backed live API slices, and explicit lazy context loading.
- `plugin_data/dashboard/dashboard_format.js` owns dashboard formatting primitives. `dashboard_data.js` owns row payload and thread relationship helpers. `dashboard_analysis.js` owns scoring, sorting, recommendation, and thread grouping logic. `dashboard_cells.js` owns reusable table/cell HTML helpers. `dashboard_details.js` owns sidebar detail and thread narrative rendering. `dashboard_insights.js` owns insight cards and investigation preset UI. `dashboard_tables.js` owns Calls, Threads, and expanded thread-call table rendering. `dashboard_diagnostics.js` owns the Diagnostics tab that consumes `/api/diagnostics/*` aggregate payloads. `dashboard_filters.js` owns date range parsing and row date matching. `dashboard_state.js` owns URL, CSV, and download state utilities. `dashboard_i18n.js`, `dashboard_payload_cache.js`, and `dashboard_tooltips.js` own localization, session aggregate cache, and fast tooltip helpers. `dashboard_call_investigator.js` owns the dedicated call drilldown surface. `dashboard.js` owns top-level DOM rendering, event handling, and API refresh orchestration.
- `diagnostic_snapshots.py` owns persisted diagnostic snapshot refresh/load orchestration. `diagnostic_snapshot_analysis.py`, `diagnostic_snapshot_events.py`, `diagnostic_snapshot_rows.py`, and `diagnostic_snapshot_concentration.py` own source-log aggregation, safe event parsing, row shaping, and concentration math. `diagnostic_snapshot_report.py` owns CLI rendering. Keep these modules synthetic-testable and aggregate-only.
- `plugin_data/dashboard/dashboard_format.js` owns dashboard formatting primitives. `dashboard_data.js` owns row payload and thread relationship helpers. `dashboard_analysis.js` owns scoring, sorting, recommendation, and thread grouping logic. `dashboard_cells.js` owns reusable table/cell HTML helpers. `dashboard_details.js` owns sidebar detail and thread narrative rendering. `dashboard_insights.js` owns insight cards and investigation preset UI. `dashboard_tables.js` owns Calls, Threads, and expanded thread-call table rendering. `dashboard_diagnostics.js` coordinates the Diagnostics tab data flow and events, `dashboard_diagnostics_snapshots.js` renders on-demand snapshot panels, and `dashboard_diagnostics_facts.js` renders the fact tables and drilldowns. `dashboard_filters.js` owns date range parsing and row date matching. `dashboard_state.js` owns URL, CSV, and download state utilities. `dashboard_i18n.js`, `dashboard_payload_cache.js`, and `dashboard_tooltips.js` own localization, session aggregate cache, and fast tooltip helpers. `dashboard_call_investigator.js` owns the dedicated call drilldown surface. `dashboard.js` owns top-level DOM rendering, event handling, and API refresh orchestration.
- `context.py` is the only normal path that reads raw log context, and it does so only for one selected record on demand with redaction and size limits. Its default quick mode omits tool output and serialized groups; full serialized JSONL group analysis is explicit.
- `plugin_installer.py`, `.mcp.json`, `skills/`, and `scripts/check_release.py` own install and packaging behavior.
- `scripts/benchmark_synthetic_history.py` owns generated large-history query timing and threshold enforcement for 10k, 100k, and 500k aggregate-row fixtures. Its optional `--with-source-logs` mode writes synthetic JSONL source logs to time explicit context loading and to guard normal dashboard payload assembly against source-log reads. It must stay synthetic-only and must not read real Codex logs.
@@ -26,10 +27,11 @@ Codex Usage Tracker is a local sidecar app. It reads aggregate token counters fr
1. Add new persisted usage-event metrics through `UsageEvent`, `schema.py`, migrations, store queries, dashboard payload tests, and CSV/export checks. Add auxiliary aggregate tables such as `thread_summaries` or `source_files` through `store.py` migrations plus focused migration/privacy tests.
2. Add new report views through `reports.py` first, then wire CLI and MCP wrappers to that shared service.
3. Add new machine-readable outputs through `api_payloads.py` or report payload methods with a `schema` value, a `json_contracts.py` entry, and focused tests.
4. Add dashboard-only interactions in `plugin_data/dashboard/dashboard.js` and keep URL state in `dashboard_state.js`.
4. Add dashboard-only interactions in the narrowest dashboard module and keep URL state in `dashboard_state.js`. Diagnostics snapshot panels should stay in `dashboard_diagnostics_snapshots.js`; fact tables should stay in `dashboard_diagnostics_facts.js`.
5. Keep all examples, screenshots, mocks, and tests synthetic. Never derive fixtures from real logs.
6. When editing skill instructions, update both the source `skills/...` file and the bundled `src/codex_usage_tracker/plugin_data/skills/...` copy. `scripts/check_release.py` verifies that installable plugin assets stay complete and synced.
7. When adding fields derived from `cwd`, Git metadata, source paths, or log-event metadata, decide how they behave in `normal`, `redacted`, and `strict` privacy modes before exposing them in dashboard, JSON, CSV, MCP, or support-bundle output.
8. Diagnostic snapshot refresh must remain explicit and on demand. Normal usage refresh paths may load stored snapshots, but they must not rescan source logs for diagnostic sections unless the user calls a diagnostics `--refresh` command or a `/api/diagnostics/<section>/refresh` endpoint.

## Validation

259 changes: 259 additions & 0 deletions docs/cli-json-schemas.md
@@ -47,6 +47,12 @@ Tracked schema ids:
| `codex-usage-tracker-query-v1` | CLI `query`, MCP `usage_query(...)` |
| `codex-usage-tracker-recommendations-v1` | CLI `recommendations --json`, MCP `usage_recommendations(response_format="json")` |
| `codex-usage-tracker-diagnostics-v1` | CLI `diagnostics ... --json`, dashboard server `/api/diagnostics/*` |
| `codex-usage-tracker-diagnostic-overview-v1` | CLI `diagnostics overview --json`, dashboard server `/api/diagnostics/overview` |
| `codex-usage-tracker-diagnostic-tool-output-v1` | CLI `diagnostics tool-output --json`, dashboard server `/api/diagnostics/tool-output` |
| `codex-usage-tracker-diagnostic-commands-v1` | CLI `diagnostics commands --json`, dashboard server `/api/diagnostics/commands` |
| `codex-usage-tracker-diagnostic-file-reads-v1` | CLI `diagnostics file-reads --json`, dashboard server `/api/diagnostics/file-reads` |
| `codex-usage-tracker-diagnostic-read-productivity-v1` | CLI `diagnostics read-productivity --json`, dashboard server `/api/diagnostics/read-productivity` |
| `codex-usage-tracker-diagnostic-concentration-v1` | CLI `diagnostics concentration --json`, dashboard server `/api/diagnostics/concentration` |
| `codex-usage-tracker-session-v1` | CLI `session --json`, MCP `session_usage(response_format="json")` |
| `codex-usage-tracker-context-v1` | CLI `context`, MCP `usage_call_context` when raw context is explicitly enabled |
| `codex-usage-tracker-context-disabled-v1` | MCP `usage_call_context` when raw context is disabled |
@@ -281,6 +287,259 @@ Schema: `codex-usage-tracker-diagnostics-v1`

Diagnostics payloads report aggregate structured facts such as compaction, tool/function/MCP activity, command families, structured skill labels, search/read loops, and outcome events. They do not include prompts, assistant messages, tool arguments, tool output, patch text, raw commands, command arguments, file contents, or JSONL fragments. Token totals are associated with facts observed before a token-count row; they are not causal allocations.
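
The "associated, not causal" wording matters most when one model call carries several diagnostic facts, because each fact is then associated with the same call total. A minimal sketch with made-up numbers (the call ids, fact names, and token counts are hypothetical and not taken from any real payload):

```python
from collections import defaultdict

# Hypothetical aggregate rows: total tokens per model call.
call_tokens = {"call-1": 1000, "call-2": 400}

# Hypothetical diagnostic facts observed before each call's token-count row.
facts = [
    ("call-1", "compaction"),
    ("call-1", "tool"),
    ("call-2", "tool"),
]

# Each fact is associated with the full total of the call it precedes.
associated = defaultdict(int)
for call_id, fact_type in facts:
    associated[fact_type] += call_tokens[call_id]

naive_sum = sum(associated.values())      # counts call-1 twice
actual_total = sum(call_tokens.values())  # each call counted once
```

Here `naive_sum` exceeds `actual_total` because `call-1` contributes to two facts, which is why associated totals should be compared per fact rather than summed.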

Diagnostic snapshots use separate section endpoints instead of one large read payload. `GET` returns the latest stored section snapshot or `status: "missing"`; `POST /api/diagnostics/<section>/refresh` recomputes and replaces only that section. The dashboard button calls `POST /api/diagnostics/refresh`, which returns a small wrapper with `sections` and recomputes source-log-derived sections with one shared analyzer pass. This keeps ordinary dashboard refresh fast and prevents source-log rescans unless a diagnostics refresh is explicit.
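
The read/refresh split described above can be sketched as follows. This is an illustration only: the in-memory dictionary, `get_section`, and `refresh_section` are hypothetical stand-ins for the persisted store and the server handlers, not the project's actual code.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-in for the persisted snapshot storage.
SnapshotStore = Dict[str, dict]


def get_section(store: SnapshotStore, section: str) -> dict:
    """GET semantics: return the stored snapshot and never recompute."""
    stored = store.get(section)
    if stored is None:
        return {"section": section, "status": "missing", "refreshed": False}
    return {**stored, "status": "ready", "refreshed": False}


def refresh_section(
    store: SnapshotStore, section: str, analyze: Callable[[], dict]
) -> dict:
    """POST .../refresh semantics: recompute and replace only this section."""
    payload = {"section": section, **analyze()}
    store[section] = payload
    return {**payload, "status": "ready", "refreshed": True}


store: SnapshotStore = {}
before = get_section(store, "overview")
after = refresh_section(store, "overview", lambda: {"overview": {"usage_rows": 10}})
later = get_section(store, "overview")
```

In this sketch only `refresh_section` ever invokes the analyzer, so a plain read cannot trigger a source-log scan, and refreshing `overview` leaves every other section untouched.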

## Diagnostic Overview Snapshot

Commands:

```bash
codex-usage-tracker diagnostics overview --json
codex-usage-tracker diagnostics overview --refresh --json
```

Dashboard server API:

- `GET /api/diagnostics/overview`
- `POST /api/diagnostics/overview/refresh`

Schema: `codex-usage-tracker-diagnostic-overview-v1`

```json
{
  "schema": "codex-usage-tracker-diagnostic-overview-v1",
  "section": "overview",
  "status": "ready",
  "refreshed": false,
  "raw_context_included": false,
  "snapshot": {
    "computed_at": "2026-06-20T18:00:00+00:00",
    "history_scope": "active",
    "source_logs_scanned": 3,
    "usage_rows_scanned": 10,
    "raw_content_included": false
  },
  "overview": {
    "usage_rows": 10,
    "total_tokens": 12345,
    "cached_input_tokens": 9000,
    "uncached_input_tokens": 2000,
    "cache_ratio": 0.75
  },
  "notes": []
}
```

The overview snapshot is recomputed only when explicitly refreshed. Ordinary dashboard usage refreshes do not update diagnostic snapshots.
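
A consumer of this payload would typically check `status` and `schema` before reading the numbers. The helper below is a hedged sketch: `read_overview` is a hypothetical name, and it assumes that a `missing` response may omit the `schema` field, which the text above does not specify.

```python
import json
from typing import Optional

OVERVIEW_SCHEMA = "codex-usage-tracker-diagnostic-overview-v1"


def read_overview(text: str) -> Optional[dict]:
    """Return the `overview` object, or None when no snapshot is stored yet."""
    payload = json.loads(text)
    if payload.get("status") == "missing":
        return None  # nothing stored; an explicit refresh would create it
    if payload.get("schema") != OVERVIEW_SCHEMA:
        raise ValueError(f"unexpected schema: {payload.get('schema')!r}")
    if payload.get("raw_context_included"):
        raise ValueError("expected an aggregate-only payload")
    return payload["overview"]


sample = json.dumps({
    "schema": OVERVIEW_SCHEMA,
    "section": "overview",
    "status": "ready",
    "refreshed": False,
    "raw_context_included": False,
    "overview": {"usage_rows": 10, "total_tokens": 12345},
})
overview = read_overview(sample)
missing = read_overview(json.dumps({"section": "overview", "status": "missing"}))
```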

## Diagnostic Tool Output Snapshot

Commands:

```bash
codex-usage-tracker diagnostics tool-output --json
codex-usage-tracker diagnostics tool-output --refresh --json
```

Dashboard server API:

- `GET /api/diagnostics/tool-output`
- `POST /api/diagnostics/tool-output/refresh`

Schema: `codex-usage-tracker-diagnostic-tool-output-v1`

```json
{
  "schema": "codex-usage-tracker-diagnostic-tool-output-v1",
  "section": "tool-output",
  "status": "ready",
  "refreshed": false,
  "raw_context_included": false,
  "snapshot": {},
  "summary": {
    "function_calls": 1,
    "function_outputs": 1,
    "outputs_with_original_token_count": 1,
    "outputs_missing_original_token_count": 0,
    "original_token_sum": 42
  },
  "functions": [],
  "command_roots": [],
  "missing_reasons": [],
  "notes": []
}
```

The tool-output snapshot stores function names, conservative command roots, numeric counts, and terminal `Original token count` totals. It does not store raw tool output or command text.
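
The `summary` counters are enough to derive coverage figures on the client side. The sketch below assumes that the "with" and "missing" counters together cover every function output, which holds for the example payload but is not stated as a contract; both helper names are hypothetical.

```python
from typing import Optional


def token_count_coverage(summary: dict) -> Optional[float]:
    """Share of function outputs that reported an original token count."""
    counted = summary["outputs_with_original_token_count"]
    missing = summary["outputs_missing_original_token_count"]
    total = counted + missing
    return counted / total if total else None


def mean_original_tokens(summary: dict) -> Optional[float]:
    """Average original token count over outputs that reported one."""
    counted = summary["outputs_with_original_token_count"]
    return summary["original_token_sum"] / counted if counted else None


summary = {
    "function_calls": 1,
    "function_outputs": 1,
    "outputs_with_original_token_count": 1,
    "outputs_missing_original_token_count": 0,
    "original_token_sum": 42,
}
coverage = token_count_coverage(summary)
mean_tokens = mean_original_tokens(summary)
```

Returning `None` for an empty snapshot keeps "no data" distinct from "zero coverage".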

## Diagnostic Commands Snapshot

Commands:

```bash
codex-usage-tracker diagnostics commands --json
codex-usage-tracker diagnostics commands --refresh --json
```

Dashboard server API:

- `GET /api/diagnostics/commands`
- `POST /api/diagnostics/commands/refresh`

Schema: `codex-usage-tracker-diagnostic-commands-v1`

```json
{
  "schema": "codex-usage-tracker-diagnostic-commands-v1",
  "section": "commands",
  "status": "ready",
  "refreshed": false,
  "raw_context_included": false,
  "snapshot": {},
  "summary": {
    "shell_function_calls": 1,
    "command_root_count": 1,
    "missing_command": 0
  },
  "commands": [
    {
      "root": "git",
      "total": 1,
      "children": [{"child": "status", "count": 1}]
    }
  ],
  "notes": []
}
```

The commands snapshot keeps only command roots and a bounded list of safe one-level child labels such as `status`, `diff`, or `-m:pytest`.
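
One way to keep child labels safe is an explicit allowlist, so that anything unrecognized is dropped instead of stored. The sketch below illustrates that idea only; the `SAFE_CHILDREN` table, the `other` fallback root, and `command_label` are assumptions for illustration and do not describe the tracker's actual parsing rules.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical allowlist: roots that may be reported, with their safe children.
SAFE_CHILDREN = {
    "git": {"status", "diff", "log"},
    "python": {"-m:pytest"},
}


def command_label(argv: Sequence[str]) -> Tuple[Optional[str], Optional[str]]:
    """Reduce an argv list to (root, child) without keeping any arguments."""
    if not argv:
        return None, None
    root = argv[0].rsplit("/", 1)[-1]  # drop any directory prefix
    allowed = SAFE_CHILDREN.get(root)
    if allowed is None:
        return "other", None  # unknown roots collapse into one bucket
    if len(argv) >= 3 and argv[1] == "-m":
        candidate = "-m:" + argv[2]
    elif len(argv) >= 2:
        candidate = argv[1]
    else:
        candidate = None
    return root, candidate if candidate in allowed else None


labels = [
    command_label(["git", "status", "--short"]),
    command_label(["/usr/bin/python", "-m", "pytest", "tests/"]),
    command_label(["git", "commit", "-m", "private message"]),
    command_label(["./deploy-prod.sh", "token"]),
]
```

Only the first two inputs produce a child label; the commit message and the script argument never leave the function.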

## Diagnostic File Reads Snapshot

Commands:

```bash
codex-usage-tracker diagnostics file-reads --json
codex-usage-tracker diagnostics file-reads --refresh --json
```

Dashboard server API:

- `GET /api/diagnostics/file-reads`
- `POST /api/diagnostics/file-reads/refresh`

Schema: `codex-usage-tracker-diagnostic-file-reads-v1`

```json
{
  "schema": "codex-usage-tracker-diagnostic-file-reads-v1",
  "section": "file-reads",
  "status": "ready",
  "refreshed": false,
  "raw_context_included": false,
  "snapshot": {},
  "summary": {
    "read_commands": 1,
    "read_events": 1,
    "unique_paths_read": 1,
    "read_events_with_output_count": 1,
    "read_events_missing_output_count": 0,
    "allocated_output_token_sum": 42
  },
  "by_reader": [],
  "top_paths": [],
  "largest_read_commands": [],
  "path_privacy": {},
  "notes": []
}
```

The file-reads snapshot classifies common shell readers such as `cat`, `sed`, `nl`, `rg`, and `find`. Path labels are basename-only with a short irreversible hash; raw commands, command arguments, absolute paths, file contents, and tool output are not stored.
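
A basename-plus-hash label can be built roughly as shown below. The hash function (SHA-256), the eight-character truncation, the optional salt, and the `name#hash` layout are all assumptions made for this sketch; the text above only says the hash is short and irreversible.

```python
import hashlib
from pathlib import PurePosixPath


def path_label(path: str, salt: bytes = b"", digest_chars: int = 8) -> str:
    """Return `basename#shorthash`; the directory part never appears."""
    digest = hashlib.sha256(salt + path.encode("utf-8")).hexdigest()
    return f"{PurePosixPath(path).name}#{digest[:digest_chars]}"


label_a = path_label("/home/alice/project-x/README.md")
label_b = path_label("/home/alice/project-y/README.md")
label_a_again = path_label("/home/alice/project-x/README.md")
```

Hashing the full path keeps two files with the same basename distinguishable, while the same path always maps to the same label so that counts can be grouped.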

## Diagnostic Read Productivity Snapshot

Commands:

```bash
codex-usage-tracker diagnostics read-productivity --json
codex-usage-tracker diagnostics read-productivity --refresh --json
```

Dashboard server API:

- `GET /api/diagnostics/read-productivity`
- `POST /api/diagnostics/read-productivity/refresh`

Schema: `codex-usage-tracker-diagnostic-read-productivity-v1`

```json
{
  "schema": "codex-usage-tracker-diagnostic-read-productivity-v1",
  "section": "read-productivity",
  "status": "ready",
  "refreshed": false,
  "raw_context_included": false,
  "snapshot": {},
  "summary": {
    "read_events": 1,
    "read_events_modified_later": 1,
    "read_events_modified_later_pct": 1.0,
    "unique_paths_read": 1,
    "unique_paths_modified_later": 1,
    "unique_path_modified_later_pct": 1.0,
    "correlation_note": "Read-to-modify counts are temporal correlations."
  },
  "by_reader": [],
  "top_modified_paths": [],
  "path_privacy": {},
  "notes": []
}
```

Read productivity is a temporal correlation, not causation. A read is counted as modified later only when the same privacy-preserving path key appears in a later structured patch event in the same source log.
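
The counting rule can be illustrated with a small sketch. The `Event` shape and the per-log sequence number are hypothetical; the sketch also assumes that unique paths are counted by path key across logs, which the text above does not spell out.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Event(NamedTuple):
    source_log: str  # which source log the event came from
    seq: int         # position of the event within that log
    kind: str        # "read" or "patch"
    path_key: str    # privacy-preserving path key, never a raw path


def read_productivity(events: Iterable[Event]) -> dict:
    events = list(events)
    # Latest patch position for each (source log, path key) pair.
    last_patch: Dict[Tuple[str, str], int] = {}
    for event in events:
        if event.kind == "patch":
            key = (event.source_log, event.path_key)
            last_patch[key] = max(last_patch.get(key, event.seq), event.seq)

    reads = [e for e in events if e.kind == "read"]
    # A read counts only if a patch for the same key comes later in the same log.
    modified = [
        e for e in reads
        if last_patch.get((e.source_log, e.path_key), -1) > e.seq
    ]
    paths_read = {e.path_key for e in reads}
    paths_modified = {e.path_key for e in modified}
    return {
        "read_events": len(reads),
        "read_events_modified_later": len(modified),
        "read_events_modified_later_pct": len(modified) / len(reads) if reads else 0.0,
        "unique_paths_read": len(paths_read),
        "unique_paths_modified_later": len(paths_modified),
        "unique_path_modified_later_pct": (
            len(paths_modified) / len(paths_read) if paths_read else 0.0
        ),
    }


summary = read_productivity([
    Event("log-a", 1, "read", "k1"),
    Event("log-a", 2, "patch", "k1"),
    Event("log-a", 3, "read", "k1"),   # read after the patch: not counted
    Event("log-a", 4, "read", "k2"),   # never patched
    Event("log-b", 1, "read", "k3"),
    Event("log-a", 5, "patch", "k3"),  # patch in a different log: not counted
])
```

Only the first read qualifies, so the sketch reports one of four read events and one of three unique paths as modified later.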

## Diagnostic Concentration Snapshot

Commands:

```bash
codex-usage-tracker diagnostics concentration --json
codex-usage-tracker diagnostics concentration --refresh --json
```

Dashboard server API:

- `GET /api/diagnostics/concentration`
- `POST /api/diagnostics/concentration/refresh`

Schema: `codex-usage-tracker-diagnostic-concentration-v1`

```json
{
  "schema": "codex-usage-tracker-diagnostic-concentration-v1",
  "section": "concentration",
  "status": "ready",
  "refreshed": false,
  "raw_context_included": false,
  "snapshot": {},
  "summary": {
    "usage_rows": 4,
    "total_tokens": 100,
    "dimension_count": 3,
    "history_scope": "active"
  },
  "metrics": [
    {"metric": "top_1_source_log_share", "dimension": "source_log", "top_n": 1, "share": 0.5}
  ],
  "dimensions": [],
  "largest_impact_rows": [],
  "privacy": {},
  "notes": []
}
```

The concentration snapshot computes top-1/top-3/top-5 share and effective group count by source log/session, cwd/project label, and day. Metric ids such as `top_1_source_log_share` are stable JSON contract fields; dashboard views should render them as reader-facing labels. Source log labels use session-id prefixes or source hashes, cwd labels use basename-only labels, and raw source paths/cwd paths are not included.
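
Top-N share is the fraction of all tokens held by the N largest groups in a dimension. For "effective group count" the sketch below uses the inverse Herfindahl index, 1 / Σ shareᵢ², which is a common definition but an assumption here; the document does not say which formula `diagnostic_snapshot_concentration.py` uses. The group labels and token counts are made up.

```python
from typing import Dict, Sequence


def concentration(
    tokens_by_group: Dict[str, int], top_ns: Sequence[int] = (1, 3, 5)
) -> dict:
    total = sum(tokens_by_group.values())
    if total <= 0:
        return {"total_tokens": 0, "effective_group_count": 0.0}
    # Shares sorted from largest to smallest group.
    shares = sorted((t / total for t in tokens_by_group.values()), reverse=True)
    result = {"total_tokens": total}
    for n in top_ns:
        result[f"top_{n}_share"] = sum(shares[:n])
    # Inverse Herfindahl index: equals the group count when all shares are equal.
    result["effective_group_count"] = 1.0 / sum(s * s for s in shares)
    return result


by_source_log = concentration({"log-a": 50, "log-b": 25, "log-c": 25})
```

With three groups holding 50, 25, and 25 tokens, the top-1 share is 0.5 and the effective group count falls below 3, reflecting that one group dominates.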

## Pricing Coverage

Command:
12 changes: 11 additions & 1 deletion docs/cli-reference.md
@@ -120,12 +120,22 @@ codex-usage-tracker diagnostics summary
codex-usage-tracker diagnostics facts --sort uncached
codex-usage-tracker diagnostics compactions
codex-usage-tracker diagnostics tools
codex-usage-tracker diagnostics overview --refresh
codex-usage-tracker diagnostics tool-output --refresh
codex-usage-tracker diagnostics commands --refresh
codex-usage-tracker diagnostics file-reads --refresh
codex-usage-tracker diagnostics read-productivity --refresh
codex-usage-tracker diagnostics concentration --refresh
codex-usage-tracker diagnostics fact-calls --fact-type compaction --fact-name post_compaction
```

Diagnostics expose structured event patterns and their associated token totals. They can show compactions, tool/function/MCP activity, safe command families, structured skill labels, patch outcomes, task completion, search/read loops, and aborted or rolled-back turns. Associated totals are not causal allocations and are not additive when one model call has multiple diagnostic facts.

Diagnostic payloads are aggregate-only. They do not include prompts, assistant text, tool arguments, tool output, patch text, raw commands, command arguments, file contents, or JSONL fragments.
Snapshot diagnostics are persisted aggregate reports. Without `--refresh`, snapshot commands return the latest stored payload or a `missing` status. With `--refresh`, they recompute from indexed source logs and replace the stored section snapshot. Ordinary `refresh`, `open-dashboard`, and dashboard `Refresh` update usage rows only; they do not recompute diagnostic snapshots.

The snapshot sections answer different questions: `overview` summarizes usage rows and aggregate token totals, `tool-output` counts functions and terminal `Original token count` coverage, `commands` keeps command roots plus bounded safe child labels, `file-reads` counts reader/path activity and allocated read-output tokens, `read-productivity` reports later-edit correlations for matching path keys, and `concentration` shows top-N token share by source/session, cwd/project, and day.

Diagnostic payloads are aggregate-only. They do not include prompts, assistant text, tool arguments, tool output, patch text, raw commands, command arguments, file contents, raw absolute paths, or JSONL fragments. File-read diagnostics use basename-only path labels plus short irreversible hashes, read-productivity percentages are temporal correlations rather than proof that a read caused a later edit, and concentration reports use safe source/session, cwd, and day labels only.

## JSON Queries
