Merged
128 changes: 128 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,128 @@
# Changelog

All notable changes to repowise are documented here.
This project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0] — 2026-04-07

A large overhaul: faster indexing, smarter doc generation, transactional storage,
new analysis capabilities, and a completely revamped web UI that surfaces every
new signal, all without changing the eight-tool MCP surface.

### Added

#### Pipeline & ingestion
- **Parallel indexing.** AST parsing now runs across all CPU cores via
`ProcessPoolExecutor`. Graph construction and git history indexing run
concurrently with `asyncio.gather`. Per-file git history is fetched through a
thread executor, with concurrency bounded by a semaphore.
- **RAG-aware doc generation.** Pages are generated in topological order; each
generation prompt now includes summaries of the file's direct dependencies,
pulled from the vector store of already-generated pages.
- **Atomic three-store coordinator.** New `AtomicStorageCoordinator` buffers
writes across SQL, the in-memory dependency graph, and the vector store, then
flushes them as a single transaction. Failure in any store rolls back all three.
- **Dynamic import hint extractors.** The dependency graph now captures edges
that pure AST parsing misses: Django `INSTALLED_APPS` / `ROOT_URLCONF` /
`MIDDLEWARE`, pytest `conftest.py` fixture wiring, and Node/TS path aliases
from `tsconfig.json` and `package.json` `exports`.
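
The dependencies-first ordering described under "RAG-aware doc generation" can be sketched with the standard library's `graphlib`. This is an illustration only, not repowise's implementation: the file names, the `generation_order` and `build_prompts` helpers, and the plain strings standing in for LLM output and vector-store lookups are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each file maps to the files it imports directly.
DEPS = {
    "app.py": {"db.py", "utils.py"},
    "db.py": {"utils.py"},
    "utils.py": set(),
}


def generation_order(graph: dict[str, set[str]]) -> list[str]:
    """Return files so that every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())


def build_prompts(graph: dict[str, set[str]]) -> dict[str, str]:
    """Build one prompt per file, including summaries of its direct dependencies."""
    summaries: dict[str, str] = {}  # stand-in for the vector store of generated pages
    prompts: dict[str, str] = {}
    for path in generation_order(graph):
        # Dependencies were processed earlier, so their summaries already exist.
        context = "; ".join(summaries[dep] for dep in sorted(graph[path])) or "none"
        prompts[path] = f"Document {path}. Dependency context: {context}"
        summaries[path] = f"summary of {path}"  # stand-in for the LLM-generated page
    return prompts


if __name__ == "__main__":
    for path, prompt in build_prompts(DEPS).items():
        print(path, "->", prompt)
```

Because `static_order()` yields predecessors first, a file's prompt is never built before the summaries it needs are available.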

#### Analysis
- **Temporal hotspot decay.** New `temporal_hotspot_score` column on
`git_metadata`, computed as `Σ exp(-ln2 · age_days / 180) · min(lines/100, 3)`
per commit. Hotspot ranking now uses this score; commits from a year ago
contribute ~25% as much as commits from today.
- **Percentile ranks via SQL window function.** `recompute_git_percentiles()`
is now a single `PERCENT_RANK() OVER (PARTITION BY repo ORDER BY ...)` UPDATE
instead of an in-Python sort. Faster and correct on large repos.
- **PR blast radius analyzer.** New `PRBlastRadiusAnalyzer` returns direct
risks, transitive affected files, co-change warnings, recommended reviewers,
test gaps, and an overall 0–10 risk score. Surfaced via `get_risk(changed_files=...)`
and a new web page.
- **Security pattern scanner.** Indexing now runs `SecurityScanner` over each
file. Findings (eval/exec, weak crypto, raw SQL string construction,
hardcoded secrets, `pickle.loads`, etc.) are stored in a new
`security_findings` table.
- **Knowledge map.** Top owners, "bus factor 1" knowledge silos (>80% single
owner), and high-centrality "onboarding targets" with thin documentation —
surfaced in `get_overview` and the web overview page.
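
The temporal hotspot formula above can be checked with a few lines of Python. This is a sketch that follows the formula as written in this changelog; the function names and the `(age_days, lines_changed)` input shape are assumptions made for the example, not the project's API.

```python
import math

HALF_LIFE_DAYS = 180  # from the formula: exp(-ln2 * age_days / 180)
MAX_SIZE_WEIGHT = 3   # from the formula: min(lines / 100, 3)


def commit_weight(age_days: float, lines_changed: int) -> float:
    """Contribution of one commit: exponential age decay times a capped size weight."""
    decay = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
    size = min(lines_changed / 100, MAX_SIZE_WEIGHT)
    return decay * size


def temporal_hotspot_score(commits: list[tuple[float, int]]) -> float:
    """Sum the per-commit contributions; `commits` holds (age_days, lines_changed) pairs."""
    return sum(commit_weight(age, lines) for age, lines in commits)


if __name__ == "__main__":
    # A 100-line commit from today versus the same commit one year ago.
    today = commit_weight(0, 100)
    last_year = commit_weight(365, 100)
    print(f"ratio: {last_year / today:.3f}")
```

With a 180-day half-life, a commit that is 365 days old keeps a factor of 2^(-365/180), roughly 0.245, which matches the "~25%" figure quoted above. The size term saturates at 300 changed lines.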

#### LLM cost tracking
- New `llm_costs` table records every LLM call (model, tokens, USD cost).
- `CostTracker` aggregates session totals; pricing covers Claude 4.6 family,
GPT-4.1 family, and Gemini.
- New `repowise costs` CLI: `--since`, `--by operation|model|day`.
- Indexing progress bar shows a live `Cost: $X.XXXX` counter.

#### MCP tool enhancements (still 8 tools — strictly more capable)
- `get_risk(targets, changed_files=None)` — when `changed_files` is provided,
returns the full PR blast-radius report (transitive affected, co-change
warnings, recommended reviewers, test gaps, overall 0–10 score). Per-file
responses now include `test_gap: bool` and `security_signals: list`.
- `get_overview()` — now includes a `knowledge_map` block (top owners, silos,
onboarding targets).
- `get_dead_code(min_confidence?, include_internals?, include_zombie_packages?)` —
sensitivity controls for false positives in framework-heavy code.

#### REST endpoints (new)
- `GET /api/repos/{id}/costs` and `/costs/summary` — grouped LLM spend.
- `GET /api/repos/{id}/security` — security findings, filterable by file/severity.
- `POST /api/repos/{id}/blast-radius` — PR impact analysis.
- `GET /api/repos/{id}/knowledge-map` — owners / silos / onboarding targets.
- `GET /api/repos/{id}/health/coordinator` — three-store drift status.
- `GET /api/repos/{id}/hotspots` now returns `temporal_hotspot_score` and is
ordered by it.
- `GET /api/repos/{id}/git-metadata` now returns `test_gap`.
- Job SSE stream now emits `actual_cost_usd` (running cost since job start).

#### Web UI (new pages and components)
- **Costs page** — daily bar chart, grouped tables by operation/model/day.
- **Blast Radius page** — paste files (or click hotspot suggestion chips) to
see risk gauge, transitive impact, co-change warnings, reviewers, test gaps.
- **Knowledge Map card** on the overview dashboard.
- **Trend column** on the hotspots table with flame indicator (default sort).
- **Security Panel** in the wiki page right sidebar.
- **"No tests" badge** on wiki pages with no detected test file.
- **System Health card** on the settings page (SQL / Vector / Graph counts +
drift % + status).
- **Live cost indicator** on the generation progress bar.

#### CLI
- `repowise costs [--since DATE] [--by operation|model|day]` — new command.
- `repowise dead-code` — new flags `--min-confidence`, `--include-internals`,
`--include-zombie-packages`, `--no-unreachable`, `--no-unused-exports`.
- `repowise doctor` — new Check #10 reports coordinator drift across all
three stores. `--repair` deletes orphaned vectors and rebuilds missing graph
nodes from SQL.
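
One way to picture the drift check behind `repowise doctor` is as a comparison of page IDs across the three stores. The sketch below is hypothetical: the `coordinator_drift` function, its return shape, and the definition of drift percentage (mismatched IDs over all known IDs) are assumptions, since this changelog does not specify how drift is computed.

```python
def coordinator_drift(sql_ids, vector_ids, graph_ids) -> dict:
    """Compare ID sets from the SQL, vector, and graph stores (SQL treated as source of truth)."""
    sql_ids, vector_ids, graph_ids = set(sql_ids), set(vector_ids), set(graph_ids)

    orphaned_vectors = vector_ids - sql_ids      # candidates for deletion on repair
    missing_vectors = sql_ids - vector_ids       # pages with no embedding
    missing_graph_nodes = sql_ids - graph_ids    # candidates for rebuilding from SQL

    mismatched = orphaned_vectors | missing_vectors | missing_graph_nodes
    all_ids = sql_ids | vector_ids | graph_ids
    # Assumed definition: share of known IDs that are out of sync in at least one store.
    drift_pct = 100.0 * len(mismatched) / len(all_ids) if all_ids else 0.0

    return {
        "orphaned_vectors": sorted(orphaned_vectors),
        "missing_vectors": sorted(missing_vectors),
        "missing_graph_nodes": sorted(missing_graph_nodes),
        "drift_pct": drift_pct,
    }


if __name__ == "__main__":
    report = coordinator_drift(
        sql_ids={"a", "b", "c"},
        vector_ids={"a", "b", "x"},
        graph_ids={"a", "c"},
    )
    print(report)
```

Under this assumed definition, an empty repository reports 0% drift rather than dividing by zero.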

### Fixed
- C++ dependency resolution edge cases.
- Decision extraction timeout on very large histories.
- Resume / progress bar visibility for oversized files.
- Coordinator `health_check` falsely reporting 100% drift on LanceDB / Pg
vector stores (was returning -1 for the count). Now uses `list_page_ids()`.
- Coordinator `health_check` returning `null` graph node count when no
in-memory `GraphBuilder` is supplied. Now falls back to SQL `COUNT(*)`.

### Internal
- Three new Alembic migrations: `0009_llm_costs`, `0010_temporal_hotspot_score`,
`0011_security_findings`.
- New module: `packages/core/.../persistence/coordinator.py`
- New module: `packages/core/.../ingestion/dynamic_hints/` (5 files)
- New module: `packages/core/.../analysis/pr_blast.py`
- New module: `packages/core/.../analysis/security_scan.py`
- New module: `packages/core/.../generation/cost_tracker.py`
- New module: `packages/server/.../services/knowledge_map.py`

### Compatibility
- Existing repositories must run migrations: `repowise doctor` will detect
the missing tables and prompt; alternatively re-run `repowise init` to
rebuild from scratch.
- The eight MCP tool names and signatures are backwards compatible — new
parameters are all optional.

---

## [0.1.31] — earlier

See git history for releases prior to 0.2.0.
28 changes: 21 additions & 7 deletions README.md
@@ -94,11 +94,11 @@ Most tools are designed around data entities — one module, one file, one symbo
|---|---|---|
| `get_overview()` | Architecture summary, module map, entry points | First call on any unfamiliar codebase |
| `get_context(targets, include?)` | Docs, ownership, decisions, freshness for any targets — files, modules, or symbols | Before reading or modifying code. Pass all relevant targets in one call. |
-| `get_risk(targets)` | Hotspot scores, dependents, co-change partners, plain-English risk summary | Before modifying files — understand what could break |
+| `get_risk(targets?, changed_files?)` | Hotspot scores, dependents, co-change partners, blast radius, recommended reviewers, test gaps, security signals, 0–10 risk score | Before modifying files — understand what could break |
| `get_why(query?)` | Three modes: NL search over decisions · path-based decisions for a file · no-arg health dashboard | Before architectural changes — understand existing intent |
| `search_codebase(query)` | Semantic search over the full wiki. Natural language. | When you don't know where something lives |
| `get_dependency_path(from, to)` | Connection path between two files, modules, or symbols | When tracing how two things are connected |
-| `get_dead_code()` | Unreachable code sorted by confidence and cleanup impact | Cleanup tasks |
+| `get_dead_code(min_confidence?, include_internals?, include_zombie_packages?)` | Unreachable code sorted by confidence and cleanup impact | Cleanup tasks |
| `get_architecture_diagram(module?)` | Mermaid diagram for the repo or a specific module | Documentation and presentation |

### Tool call comparison — a real task
@@ -172,9 +172,13 @@ This is what happens when an AI agent has real codebase intelligence.
| **Symbols** | Searchable index of every function, class, and method |
| **Coverage** | Doc freshness per file with one-click regeneration |
| **Ownership** | Contributor attribution and bus factor risk |
-| **Hotspots** | Ranked high-churn files with commit history |
+| **Hotspots** | Ranked by trend-weighted score (180-day decay) and churn |
| **Dead Code** | Unused code with confidence scores and bulk actions |
| **Decisions** | Architectural decisions with staleness monitoring |
+| **Costs** | LLM spend by day, model, or operation, with running session totals |
+| **Blast Radius** | Paste a PR file list, see transitive impact, reviewers, and test gaps |
+| **Knowledge Map** | Top owners, bus-factor silos, and onboarding targets on the dashboard |
+| **System Health** | SQL/vector/graph drift status from the atomic store coordinator |

---

@@ -333,9 +337,18 @@ repowise search "<query>" # semantic search over the wiki
repowise status # coverage, freshness, dead code summary

# Dead code
-repowise dead-code # full report
-repowise dead-code --safe-only # only safe-to-delete findings
-repowise dead-code resolve <id> # mark resolved / false positive
+repowise dead-code # full report
+repowise dead-code --safe-only # only safe-to-delete findings
+repowise dead-code --min-confidence 0.8 # raise the confidence threshold
+repowise dead-code --include-internals # include private/underscore symbols
+repowise dead-code --include-zombie-packages # include unused declared packages
+repowise dead-code resolve <id> # mark resolved / false positive

+# Cost tracking
+repowise costs # total LLM spend to date
+repowise costs --by operation # grouped by operation type
+repowise costs --by model # grouped by model
+repowise costs --by day # grouped by day

# Decisions
repowise decision add # record a decision (interactive)
@@ -348,7 +361,8 @@ repowise generate-claude-md # regenerate CLAUDE.md

# Utilities
repowise export [PATH] # export wiki as markdown files
-repowise doctor # check setup, API keys, connectivity
+repowise doctor # check setup, API keys, store drift
+repowise doctor --repair # check and fix detected store mismatches
repowise reindex # rebuild vector store (no LLM calls)
```

2 changes: 1 addition & 1 deletion packages/cli/src/repowise/cli/__init__.py
@@ -6,4 +6,4 @@
AI-generated documentation.
"""

-__version__ = "0.1.31"
+__version__ = "0.2.0"
157 changes: 157 additions & 0 deletions packages/cli/src/repowise/cli/commands/costs_cmd.py
@@ -0,0 +1,157 @@
"""``repowise costs`` — display LLM cost history from the cost ledger."""

from __future__ import annotations

from datetime import datetime
from pathlib import Path
from typing import Any

import click
from rich.table import Table

from repowise.cli.helpers import (
    console,
    get_db_url_for_repo,
    resolve_repo_path,
    run_async,
)


def _parse_date(value: str | None) -> datetime | None:
    """Parse an ISO date string into a datetime, or return None."""
    if value is None:
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        try:
            from dateutil.parser import parse as _parse  # type: ignore[import-untyped]

            return _parse(value)
        except Exception as exc:
            raise click.BadParameter(f"Cannot parse date '{value}': {exc}") from exc


@click.command("costs")
@click.argument("path", required=False, default=None)
@click.option(
    "--since",
    default=None,
    metavar="DATE",
    help="Only show costs since this date (ISO format, e.g. 2026-01-01).",
)
@click.option(
    "--by",
    "group_by",
    type=click.Choice(["operation", "model", "day"]),
    default="operation",
    show_default=True,
    help="Group costs by operation, model, or day.",
)
@click.option(
    "--repo-path",
    "repo_path_flag",
    default=None,
    metavar="PATH",
    help="Repository path (defaults to current directory).",
)
def costs_command(
    path: str | None,
    since: str | None,
    group_by: str,
    repo_path_flag: str | None,
) -> None:
    """Show LLM cost history for a repository.

    PATH (or --repo-path) defaults to the current directory.
    """
    # Support both positional PATH and --repo-path flag
    raw_path = path or repo_path_flag
    repo_path = resolve_repo_path(raw_path)

    repowise_dir = repo_path / ".repowise"
    if not repowise_dir.exists():
        console.print("[yellow]No .repowise/ directory found. Run 'repowise init' first.[/yellow]")
        return

    since_dt = _parse_date(since)

    rows = run_async(_query_costs(repo_path, since=since_dt, group_by=group_by))

    if not rows:
        msg = "No cost records found"
        if since_dt:
            msg += f" since {since_dt.date()}"
        msg += ". Run 'repowise init' with an LLM provider to generate costs."
        console.print(f"[yellow]{msg}[/yellow]")
        return

    # Build table
    group_label = group_by.capitalize()
    table = Table(
        title=f"LLM Costs — grouped by {group_by}",
        border_style="dim",
        show_footer=True,
    )
    table.add_column(group_label, style="cyan", footer="[bold]TOTAL[/bold]")
    table.add_column("Calls", justify="right", footer=str(sum(r["calls"] for r in rows)))
    table.add_column(
        "Input Tokens",
        justify="right",
        footer=f"{sum(r['input_tokens'] for r in rows):,}",
    )
    table.add_column(
        "Output Tokens",
        justify="right",
        footer=f"{sum(r['output_tokens'] for r in rows):,}",
    )
    table.add_column(
        "Cost USD",
        justify="right",
        footer=f"[bold green]${sum(r['cost_usd'] for r in rows):.4f}[/bold green]",
    )

    for row in rows:
        table.add_row(
            str(row["group"] or "—"),
            str(row["calls"]),
            f"{row['input_tokens']:,}",
            f"{row['output_tokens']:,}",
            f"[green]${row['cost_usd']:.4f}[/green]",
        )

    console.print()
    console.print(table)
    console.print()


async def _query_costs(
    repo_path: Path,
    since: datetime | None,
    group_by: str,
) -> list[dict[str, Any]]:
    """Open the DB, look up the repo, and return aggregated cost rows."""
    from repowise.core.generation.cost_tracker import CostTracker
    from repowise.core.persistence import (
        create_engine,
        create_session_factory,
        get_session,
        init_db,
    )
    from repowise.core.persistence.crud import get_repository_by_path

    url = get_db_url_for_repo(repo_path)
    engine = create_engine(url)
    await init_db(engine)
    sf = create_session_factory(engine)

    try:
        async with get_session(sf) as session:
            repo = await get_repository_by_path(session, str(repo_path))
            if repo is None:
                return []

            tracker = CostTracker(session_factory=sf, repo_id=repo.id)
            return await tracker.totals(since=since, group_by=group_by)
    finally:
        await engine.dispose()