Fix/minor by swati510 · Pull Request #60 · repowise-dev/repowise

swati510 · 2026-04-09T17:39:17Z

Summary

Related Issues

Test Plan

Tests pass (pytest)
Lint passes (ruff check .)
Web build passes (npm run build) (if frontend changes)

Checklist

My code follows the project's code style
I have added tests for new functionality
All existing tests still pass
I have updated documentation if needed

…res, cost tracking, PR blast radius

Adds 11 capabilities across the indexing pipeline, persistence layer, MCP tools, and CLI. MCP tool count is unchanged; new functionality is folded into existing tools (get_risk, get_overview, get_dead_code).

Pipeline & generation
- ProcessPool-based parsing with sequential fallback; ingestion and git stages now run concurrently via asyncio.gather
- RAG-aware doc generation: dependency summaries are pre-fetched from the vector store and injected into the file_page prompt; pages generated in topological order so leaves are summarized before their dependents
- Dynamic import hint extractors (Django INSTALLED_APPS/ROOT_URLCONF/MIDDLEWARE/url include, pytest conftest fixtures, Node package.json exports + tsconfig path aliases) wired into GraphBuilder.add_dynamic_edges

Persistence
- AtomicStorageCoordinator with async transaction() context manager and health_check() spanning SQL, in-memory graph, and vector store
- recompute_git_percentiles now uses a single SQL PERCENT_RANK() window function instead of in-memory Python ranking
- New temporal_hotspot_score column on git_metadata, computed via exp decay (180-day half-life) and used as the primary percentile sort key
- New llm_costs and security_findings tables; matching ORM models
- vector_store.get_page_summary_by_path() on all three backends

Cost tracking
- CostTracker with per-call recording, persisted to llm_costs; pricing table covers Claude 4.6 family, GPT-4o, and Gemini 1.5/2.5/3.x variants
- Wired into Anthropic, Gemini, OpenAI, and LiteLLM providers
- Live USD column on the indexing progress bar
- New `repowise costs` CLI grouping by operation/model/day

Analysis
- PRBlastRadiusAnalyzer: transitive ancestor BFS over graph_edges, co-change warnings, recommended reviewers by temporal ownership, test gaps, 0–10 overall risk score
- SecurityScanner: pattern-based scan for eval/exec/pickle/raw SQL/hardcoded secrets/weak hashes; persisted at index time

MCP tool extensions
- get_risk(changed_files=[...]) returns blast radius; per-file payload now includes test_gap and security_signals
- get_overview returns knowledge_map with top owners, knowledge silos (>80% ownership), and onboarding targets
- get_dead_code accepts min_confidence, include_internals, include_zombie_packages, no_unreachable, no_unused_exports

CLI
- `repowise dead-code` exposes the same sensitivity flags
- `repowise doctor` adds a coordinator drift health check (Check #10)
- `repowise costs` command registered

Tests
- test_models.py: expected table set updated to include llm_costs and security_findings; full suite green (757 passed, 9 skipped)
- End-to-end validated against test-repos/microdot: 164 files ingested, 83 pages generated, 132 git_metadata rows with temporal hotspot score, 83 cost rows totaling $0.0258, 2 security findings, drift = 0
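
The temporal_hotspot_score entry above mentions exponential decay with a 180-day half-life. A minimal sketch of what such a weighting could look like, assuming each commit contributes a weight that halves every 180 days and that the score is the sum of those weights. The function names and the summing step are illustrative assumptions, not the repository's actual implementation:

```python
import math
from collections.abc import Iterable

HALF_LIFE_DAYS = 180.0  # half-life stated in the commit message


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Weight of an event `age_days` old; it halves once per half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)


def temporal_hotspot_score(commit_ages_days: Iterable[float]) -> float:
    """Hypothetical aggregate: sum of decayed weights over a file's commits."""
    return sum(decay_weight(age) for age in commit_ages_days)
```

Under these assumptions a commit made today counts fully, one from 180 days ago counts half, and one from 360 days ago counts a quarter, so recent churn dominates the percentile sort key.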

…pact context default

Adds two new MCP tools, two supporting alembic migrations, and a set of ingestion / generation improvements that make the wiki layer usable for single-call agent workflows. All existing tools continue to work unchanged. Bumps the public tool count from 8 to 10.

New MCP tools
-------------
- mcp_server/tool_answer.py (new): get_answer(question, scope?, repo?) is a one-call RAG endpoint over the wiki layer. It runs an FTS pass with a coverage re-ranker, splits relational questions on connectives and boosts pages at the intersection of both halves, gates synthesis on a top/second dominance ratio (>= 1.2x), and only invokes the LLM when retrieval is clearly dominant. High-confidence responses include a note explaining the consumer can cite directly without verification reads. Ambiguous retrievals return ranked excerpts so the agent grounds in source instead of anchoring on a wrong frame. Synthesised answers are persisted to AnswerCache by question hash so repeat questions return at zero LLM cost. Degrades cleanly to retrieval-only mode when no provider is configured.
- mcp_server/tool_symbol.py (new): get_symbol(symbol_id) resolves a qualified id of the form "path/to/file.py::Class::method" (also accepts the dot separator) to its source body, signature, file location, line range, and docstring. Recovers the rich on-disk signature so base classes, decorators, and full type annotations reach the LLM (the stripped DB form would lose these). Handles duplicate-row resolution by canonical pick rather than raising MultipleResultsFound.
- mcp_server/_meta.py (new): shared _meta envelope and per-tool hint builders used by tool_answer / tool_context / tool_symbol so all three return a consistent metadata block (timing, hint, page counts).
- mcp_server/__init__.py: re-exports the new tools, updates the module docstring to "10 tools".
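
The dominance gate described for get_answer can be sketched as follows. This is a minimal illustration, assuming retrieval yields one numeric relevance score per page; the function name `should_synthesize` and the handling of empty, single-result, and non-positive scores are assumptions made for the example, not behaviour confirmed by the commit:

```python
def should_synthesize(scores: list[float], min_ratio: float = 1.2) -> bool:
    """Return True when the top hit beats the runner-up by at least `min_ratio`.

    `min_ratio` defaults to the 1.2x threshold quoted in the commit message.
    """
    ranked = sorted(scores, reverse=True)
    if not ranked:
        return False  # nothing retrieved, so nothing to synthesise from
    if len(ranked) == 1 or ranked[1] <= 0:
        # Assumption: a lone positive hit counts as dominant.
        return ranked[0] > 0
    return ranked[0] / ranked[1] >= min_ratio
```

With scores of 1.07 and 1.0 the ratio is 1.07, below 1.2, so this sketch would fall back to returning ranked excerpts rather than calling the LLM.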

Schema migrations
-----------------
- alembic/versions/0012_page_summary.py (new): adds wiki_pages.summary TEXT NOT NULL DEFAULT "". Stores a 1–3 sentence purpose blurb per page so get_context can return narrative file-level descriptions without shipping content_md on every turn. Server default backfills existing rows on upgrade. Reversible downgrade defined.
- alembic/versions/0013_answer_cache.py (new): creates the answer_cache table with (id, repository_id, question_hash, question, payload_json, provider_name, model_name, created_at), a unique constraint on (repository_id, question_hash), an index on repository_id, and a CASCADE foreign key to repositories so dropping a repo cleans up its cache automatically. Pure CREATE TABLE; no impact on existing data. Reversible downgrade defined.
- core/persistence/models.py: adds the Page.summary column and the AnswerCache ORM model matching the migrations above.
- core/persistence/crud.py: helpers for upserting page summaries and reading/writing AnswerCache rows.

Existing MCP tools
------------------
- mcp_server/tool_context.py: get_context now defaults to compact=True. Compact mode drops the structure block, the imported_by list, and per-symbol docstring/end_line fields, keeping responses under ~10K characters on dense files. Pass compact=False to get the full payload on demand. Docstring trimmed to clean tool documentation. Internal Fallback labels relabeled in plain English.
- mcp_server/tool_search.py: docstring expanded into clean tool documentation; behaviour unchanged.
- mcp_server/tool_risk.py: cleanup pass; behaviour unchanged.
- server/chat_tools.py and docstring counts: updated to 10 tools.

Ingestion / generation
----------------------
- core/generation/page_generator.py: _is_significant_file() now treats any file tagged is_test=True (with at least one extracted symbol) as significant, regardless of PageRank. Test files have near-zero centrality because nothing imports them back, but they answer "what test exercises X" / "where is Y verified" questions and the doc layer is the right place to surface those. Filtering remains available via --skip-tests.
- core/ingestion/traverser.py: removes the workaround that excluded tests/, test/, spec/, specs/, __tests__ from the traversal. The underlying pagerank-inflation bug it guarded against is fixed in graph.py via the deterministic stem-priority disambiguation (_stem_priority / _build_stem_map), so test files can now be indexed safely while still being tagged is_test=True for downstream filtering.
- core/ingestion/graph.py: prose cleanup in the stem-priority docstring and _build_stem_map; explains the test-fixture-named-like-the-package failure mode in neutral terms. Framework-aware synthetic-edge code (_add_conftest_edges, _add_django_edges, _add_fastapi_edges, _add_flask_edges, dispatched by add_framework_edges(tech_stack)) is unchanged.
- core/ingestion/parser.py, core/generation/models.py: small cleanups feeding the new wiki_pages.summary field through the generation pipeline.

CLI
---
- cli/main.py: minor wiring for the new tools and the compact default.

Tests
-----
- tests/unit/server/test_tool_symbol.py (new): unit tests for _resolve_symbol covering separator-style mismatches between Class.method and Class::method and MultipleResultsFound handling on duplicate lookup keys.
- tests/unit/server/test_mcp.py: counter and fixture updates for the 10-tool surface.
- tests/unit/ingestion/test_graph.py: fixture updates around the stem-priority cleanup.

Docs
----
- README.md: bumps "Eight MCP tools" → "Ten MCP tools" in the headline, abstract, comparison table, and competitor matrix; adds get_answer, get_symbol, and compact-default rows to the tool table; documents the test-files-in-wiki and single-call-answer additions in the "What's new" section.
- docs/ARCHITECTURE.md: schema table now lists the summary column on wiki_pages and the new answer_cache table; the page-generator section documents the test-file inclusion rule; references to "8 tools" updated to 10.
- docs/CHANGELOG.md: Unreleased Added entries for get_answer, get_symbol, the two migrations, and test-file indexing; Changed entry for the get_context compact default.
- docs/USER_GUIDE.md: tool table updated to 10 entries.
- docs/architecture-guide.md, docs/CHAT.md: tool counts updated.
- packages/server/README.md, plugins/claude-code/DEVELOPER.md, website/index.md, website/concepts.md, website/mcp-server.md, website/claude-md-generator.md: tool counts updated; mcp-server.md gains full sections (parameters, returns, examples) for get_answer and get_symbol and documents the new compact parameter on get_context.

Verified
--------
Ran `repowise init --index-only` end-to-end against pallets/flask: 125 files, 1,624 symbols, 125 nodes, 241 edges (191 imports + 28 framework + 22 dynamic), 8 languages, 14 hotspots, 13 dead-code findings. SQL audit confirmed both new migrations applied (answer_cache table present; wiki_pages.summary column present), test files contributed 920 symbols, and conftest framework edges fired.

Live MCP-tool checks against the full-mode wiki:
- get_symbol resolved src/flask/app.py::Flask to its source body and signature across lines 109–508
- get_context returned the LLM summary without the structure / imported_by blocks (compact default)
- get_answer ran retrieval, hit the dominance gate at 1.07× < 1.2×, and correctly returned ranked excerpts instead of synthesising a wrong frame
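
The qualified ids passed to get_symbol, such as src/flask/app.py::Flask in the check above, could be split along these lines. This is only a sketch under stated assumptions: `parse_symbol_id` is a hypothetical helper, and it assumes the first `::` always separates the file path from the symbol chain, with either `::` or `.` allowed between symbol parts. The real _resolve_symbol may work differently:

```python
import re


def parse_symbol_id(symbol_id: str) -> tuple[str, list[str]]:
    """Split 'path/to/file.py::Class::method' into (file path, symbol parts)."""
    # The first '::' ends the path, so dots inside the file name are left alone.
    path, sep, qualified = symbol_id.partition("::")
    if not sep or not path or not qualified:
        raise ValueError(f"expected 'path::Symbol', got {symbol_id!r}")
    # Accept both 'Class::method' and 'Class.method' after the path.
    parts = [part for part in re.split(r"::|\.", qualified) if part]
    if not parts:
        raise ValueError(f"no symbol name in {symbol_id!r}")
    return path, parts
```

Normalising both separator styles to the same list of parts is one way to avoid the Class.method versus Class::method mismatch that the new unit tests cover.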

RaghavChamadiya and others added 5 commits April 7, 2026 15:33

Merge branch 'main' into feat/new-tools

8b2dfb7

test fix

fdb6ecd

benchmarks

55b27c6

swati510 requested a review from RaghavChamadiya as a code owner April 9, 2026 17:39

Merge branch 'main' into fix/minor

39e9ecb

RaghavChamadiya approved these changes Apr 9, 2026


swati510 merged commit c239c15 into main Apr 9, 2026
5 checks passed

swati510 deleted the fix/minor branch April 9, 2026 17:43

swati510 merged 6 commits into main from fix/minor
