Feat/new tools by swati510 · Pull Request #59 · repowise-dev/repowise

swati510 · 2026-04-09T13:18:04Z

Summary

Adds two new MCP tools, two supporting alembic migrations, and a set of
ingestion / generation improvements that make the wiki layer usable for
single-call agent workflows. All existing tools continue to work
unchanged. Bumps the public tool count from 8 to 10.

New MCP tools

- mcp_server/tool_answer.py (new): get_answer(question, scope?, repo?)
  is a one-call RAG endpoint over the wiki layer. It runs an FTS pass
  with a coverage re-ranker, splits relational questions on connectives
  and boosts pages at the intersection of both halves, gates synthesis
  on a top/second dominance ratio (>= 1.2x), and only invokes the LLM
  when retrieval is clearly dominant. High-confidence responses include
  a note explaining the consumer can cite directly without verification
  reads. Ambiguous retrievals return ranked excerpts so the agent
  grounds in source instead of anchoring on a wrong frame. Synthesised
  answers are persisted to AnswerCache by question hash so repeat
  questions return at zero LLM cost. Degrades cleanly to retrieval-only
  mode when no provider is configured.
- mcp_server/tool_symbol.py (new): get_symbol(symbol_id) resolves a
  qualified id of the form "path/to/file.py::Class::method" (also
  accepts the dot separator) to its source body, signature, file
  location, line range, and docstring. Recovers the rich on-disk
  signature so base classes, decorators, and full type annotations
  reach the LLM (the stripped DB form would lose these). Handles
  duplicate-row resolution by canonical pick rather than raising
  MultipleResultsFound.
- mcp_server/_meta.py (new): shared _meta envelope and per-tool hint
  builders used by tool_answer / tool_context / tool_symbol so all
  three return a consistent metadata block (timing, hint, page counts).
- mcp_server/__init__.py: re-exports the new tools, updates the
  module docstring to "10 tools".
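
The dominance gate described for get_answer can be illustrated with a minimal sketch. Only the 1.2x threshold comes from the description above; the `Hit` type, the `should_synthesise` name, and the handling of a single hit or a zero runner-up score are assumptions made for illustration, not the code in tool_answer.py.

```python
from __future__ import annotations

from dataclasses import dataclass

DOMINANCE_RATIO = 1.2  # threshold quoted in the PR description


@dataclass
class Hit:
    """Hypothetical retrieval result: a wiki page and its ranking score."""
    page: str
    score: float


def should_synthesise(hits: list[Hit], ratio: float = DOMINANCE_RATIO) -> bool:
    """Return True when the top hit clearly dominates the runner-up."""
    if not hits:
        return False
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    top = ranked[0].score
    if len(ranked) == 1:
        # Assumption: a lone positive hit counts as dominant.
        return top > 0
    second = ranked[1].score
    if second <= 0:
        # Assumption: avoid dividing by zero; any positive top score wins.
        return top > 0
    return top / second >= ratio
```

When the gate returns False, the caller would return the ranked excerpts instead of calling the LLM, which matches the ambiguous-retrieval behaviour described above.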

Schema migrations

- alembic/versions/0012_page_summary.py (new): adds wiki_pages.summary
  TEXT NOT NULL DEFAULT "". Stores a 1–3 sentence purpose blurb per
  page so get_context can return narrative file-level descriptions
  without shipping content_md on every turn. Server default backfills
  existing rows on upgrade. Reversible downgrade defined.
- alembic/versions/0013_answer_cache.py (new): creates the answer_cache
  table with (id, repository_id, question_hash, question, payload_json,
  provider_name, model_name, created_at), a unique constraint on
  (repository_id, question_hash), an index on repository_id, and a
  CASCADE foreign key to repositories so dropping a repo cleans up its
  cache automatically. Pure CREATE TABLE, so no impact on existing data.
  Reversible downgrade defined.
- core/persistence/models.py: adds the Page.summary column and the
  AnswerCache ORM model matching the migrations above.
- core/persistence/crud.py: helpers for upserting page summaries and
  reading/writing AnswerCache rows.
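
To make the answer_cache behaviour concrete, here is a rough stand-alone sketch using the standard-library sqlite3 module rather than the project's alembic/ORM code. The column names, the unique constraint, the index, and the CASCADE foreign key follow the description above; the column types, the SHA-256 hashing with whitespace/case normalisation, and the helper names (`question_hash`, `open_db`, `put_answer`, `get_cached_answer`) are assumptions.

```python
from __future__ import annotations

import hashlib
import json
import sqlite3


def question_hash(question: str) -> str:
    # Assumption: collapse whitespace and lowercase before hashing.
    normalised = " ".join(question.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces CASCADE with this on
    conn.executescript(
        """
        CREATE TABLE repositories (id TEXT PRIMARY KEY);
        CREATE TABLE answer_cache (
            id INTEGER PRIMARY KEY,
            repository_id TEXT NOT NULL
                REFERENCES repositories(id) ON DELETE CASCADE,
            question_hash TEXT NOT NULL,
            question TEXT NOT NULL,
            payload_json TEXT NOT NULL,
            provider_name TEXT,
            model_name TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (repository_id, question_hash)
        );
        CREATE INDEX ix_answer_cache_repository_id
            ON answer_cache (repository_id);
        """
    )
    return conn


def put_answer(conn: sqlite3.Connection, repo_id: str, question: str, payload: dict) -> None:
    # INSERT OR REPLACE keeps one row per (repository_id, question_hash).
    conn.execute(
        "INSERT OR REPLACE INTO answer_cache "
        "(repository_id, question_hash, question, payload_json) "
        "VALUES (?, ?, ?, ?)",
        (repo_id, question_hash(question), question, json.dumps(payload)),
    )


def get_cached_answer(conn: sqlite3.Connection, repo_id: str, question: str) -> dict | None:
    row = conn.execute(
        "SELECT payload_json FROM answer_cache "
        "WHERE repository_id = ? AND question_hash = ?",
        (repo_id, question_hash(question)),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

A repeat question hits the unique (repository_id, question_hash) key and is served from the table; deleting the repository row removes its cached answers through the foreign key.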

Existing MCP tools

- mcp_server/tool_context.py: get_context now defaults to compact=True.
  Compact mode drops the structure block, the imported_by list, and
  per-symbol docstring/end_line fields, keeping responses under ~10K
  characters on dense files. Pass compact=False to get the full payload
  on demand. Docstring trimmed to clean tool documentation. Internal
  Fallback labels relabeled in plain English.
- mcp_server/tool_search.py: docstring expanded into clean tool
  documentation; behaviour unchanged.
- mcp_server/tool_risk.py: cleanup pass; behaviour unchanged.
- server/chat_tools.py and docstring counts: updated to 10 tools.
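
As an illustration of what compact mode removes, the following sketch strips the fields named above from a response dictionary. The field names structure, imported_by, docstring, and end_line come from the description; the overall payload shape, the `symbols` key, and the `compact_context` function are hypothetical and may differ from what tool_context.py actually returns.

```python
from copy import deepcopy
from typing import Any


def compact_context(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a slimmed copy of a get_context-style payload."""
    slim = deepcopy(payload)  # leave the caller's full payload untouched
    slim.pop("structure", None)
    slim.pop("imported_by", None)
    # Assumption: symbols live in a list of dicts under "symbols".
    for symbol in slim.get("symbols", []):
        symbol.pop("docstring", None)
        symbol.pop("end_line", None)
    return slim
```

Working on a copy means a caller that passes compact=False can still be served the original, untrimmed payload.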

Ingestion / generation

- core/generation/page_generator.py: _is_significant_file() now treats
  any file tagged is_test=True (with at least one extracted symbol) as
  significant, regardless of PageRank. Test files have near-zero
  centrality because nothing imports them back, but they answer
  "what test exercises X" / "where is Y verified" questions and the
  doc layer is the right place to surface those. Filtering remains
  available via --skip-tests.
- core/ingestion/traverser.py: removes the workaround that excluded
  tests/, test/, spec/, specs/, __tests__ from the traversal. The
  underlying pagerank-inflation bug it guarded against is fixed in
  graph.py via the deterministic stem-priority disambiguation
  (_stem_priority / _build_stem_map), so test files can now be
  indexed safely while still being tagged is_test=True for downstream
  filtering.
- core/ingestion/graph.py: prose cleanup in the stem-priority docstring
  and _build_stem_map; explains the test-fixture-named-like-the-package
  failure mode in neutral terms. Framework-aware synthetic-edge code
  (_add_conftest_edges, _add_django_edges, _add_fastapi_edges,
  _add_flask_edges, dispatched by add_framework_edges(tech_stack))
  is unchanged.
- core/ingestion/parser.py, core/generation/models.py: small cleanups
  feeding the new wiki_pages.summary field through the generation
  pipeline.
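
The test-file inclusion rule can be sketched as a small predicate. The rule itself (is_test plus at least one extracted symbol, regardless of PageRank, with --skip-tests as the opt-out) is taken from the description above; the `FileInfo` type, the `min_pagerank` cut-off for non-test files, and its default value are hypothetical stand-ins for whatever page_generator.py actually uses.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical per-file record produced by ingestion."""
    path: str
    is_test: bool
    symbol_count: int
    pagerank: float


def is_significant_file(
    info: FileInfo,
    *,
    skip_tests: bool = False,
    min_pagerank: float = 0.01,  # assumed threshold for non-test files
) -> bool:
    if info.is_test:
        if skip_tests:
            return False  # mirrors the --skip-tests opt-out
        # Test files qualify on symbols alone; PageRank is ignored.
        return info.symbol_count >= 1
    return info.pagerank >= min_pagerank
```

Under these assumptions, a test file with PageRank 0.0 still gets a wiki page as long as at least one symbol was extracted from it.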

CLI

- cli/main.py: minor wiring for the new tools and the compact default.

Tests

- tests/unit/server/test_tool_symbol.py (new): unit tests for
  _resolve_symbol covering separator-style mismatches between
  Class.method and Class::method and MultipleResultsFound handling
  on duplicate lookup keys.
- tests/unit/server/test_mcp.py: counter and fixture updates for the
  10-tool surface.
- tests/unit/ingestion/test_graph.py: fixture updates around the
  stem-priority cleanup.
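
The two cases these tests cover, separator mismatches and duplicate rows, can be illustrated with a short sketch. It assumes that the file path is always separated from the qualified name by the first "::" and that the remaining segments may use either "::" or "."; the function names and the tie-break rule in `pick_canonical` (lowest start line, then lowest id) are assumptions, since the PR only says a canonical pick is made instead of raising MultipleResultsFound.

```python
from __future__ import annotations


def normalise_symbol_id(symbol_id: str) -> tuple[str, tuple[str, ...]]:
    """Split 'path/to/file.py::Class::method' (or '...::Class.method')."""
    path, sep, qualname = symbol_id.partition("::")
    if not sep or not path or not qualname:
        raise ValueError(f"expected 'path::Name[::member]', got {symbol_id!r}")
    # Treat '::' and '.' as equivalent separators inside the qualified name.
    parts = tuple(p for p in qualname.replace("::", ".").split(".") if p)
    if not parts:
        raise ValueError(f"no symbol name in {symbol_id!r}")
    return path, parts


def pick_canonical(rows: list[dict]) -> dict | None:
    """Pick one row deterministically instead of failing on duplicates."""
    if not rows:
        return None
    # Assumed rule: earliest definition wins, ties broken by row id.
    return min(rows, key=lambda r: (r["start_line"], r["id"]))
```

Normalising both separator styles to the same tuple means a lookup keyed on the tuple cannot miss because the caller wrote Class.method while the stored id uses Class::method.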

Docs

- README.md: bumps "Eight MCP tools" → "Ten MCP tools" in the headline,
  abstract, comparison table, and competitor matrix; adds get_answer,
  get_symbol, and compact-default rows to the tool table; documents
  the test-files-in-wiki and single-call-answer additions in the
  "What's new" section.
- docs/ARCHITECTURE.md: schema table now lists the summary column on
  wiki_pages and the new answer_cache table; the page-generator
  section documents the test-file inclusion rule; references to "8
  tools" updated to 10.
- docs/CHANGELOG.md: Unreleased Added entries for get_answer,
  get_symbol, the two migrations, and test-file indexing; Changed
  entry for the get_context compact default.
- docs/USER_GUIDE.md: tool table updated to 10 entries.
- docs/architecture-guide.md, docs/CHAT.md: tool counts updated.
- packages/server/README.md, plugins/claude-code/DEVELOPER.md,
  website/index.md, website/concepts.md, website/mcp-server.md,
  website/claude-md-generator.md: tool counts updated; mcp-server.md
  gains full sections (parameters, returns, examples) for get_answer
  and get_symbol and documents the new compact parameter on
  get_context.

…res, cost tracking, PR blast radius

Adds 11 capabilities across the indexing pipeline, persistence layer, MCP tools, and CLI. MCP tool count is unchanged; new functionality is folded into existing tools (get_risk, get_overview, get_dead_code).

Pipeline & generation

- ProcessPool-based parsing with sequential fallback; ingestion and git stages now run concurrently via asyncio.gather
- RAG-aware doc generation: dependency summaries are pre-fetched from the vector store and injected into the file_page prompt; pages generated in topological order so leaves are summarized before their dependents
- Dynamic import hint extractors (Django INSTALLED_APPS/ROOT_URLCONF/MIDDLEWARE/url include, pytest conftest fixtures, Node package.json exports + tsconfig path aliases) wired into GraphBuilder.add_dynamic_edges

Persistence

- AtomicStorageCoordinator with async transaction() context manager and health_check() spanning SQL, in-memory graph, and vector store
- recompute_git_percentiles now uses a single SQL PERCENT_RANK() window function instead of in-memory Python ranking
- New temporal_hotspot_score column on git_metadata, computed via exp decay (180-day half-life) and used as the primary percentile sort key
- New llm_costs and security_findings tables; matching ORM models
- vector_store.get_page_summary_by_path() on all three backends

Cost tracking

- CostTracker with per-call recording, persisted to llm_costs; pricing table covers Claude 4.6 family, GPT-4o, and Gemini 1.5/2.5/3.x variants
- Wired into Anthropic, Gemini, OpenAI, and LiteLLM providers
- Live USD column on the indexing progress bar
- New `repowise costs` CLI grouping by operation/model/day

Analysis

- PRBlastRadiusAnalyzer: transitive ancestor BFS over graph_edges, co-change warnings, recommended reviewers by temporal ownership, test gaps, 0–10 overall risk score
- SecurityScanner: pattern-based scan for eval/exec/pickle/raw SQL/hardcoded secrets/weak hashes; persisted at index time

MCP tool extensions

- get_risk(changed_files=[...]) returns blast radius; per-file payload now includes test_gap and security_signals
- get_overview returns knowledge_map with top owners, knowledge silos (>80% ownership), and onboarding targets
- get_dead_code accepts min_confidence, include_internals, include_zombie_packages, no_unreachable, no_unused_exports

CLI

- `repowise dead-code` exposes the same sensitivity flags
- `repowise doctor` adds a coordinator drift health check (Check #10)
- `repowise costs` command registered

Tests

- test_models.py: expected table set updated to include llm_costs and security_findings; full suite green (757 passed, 9 skipped)
- End-to-end validated against test-repos/microdot: 164 files ingested, 83 pages generated, 132 git_metadata rows with temporal hotspot score, 83 cost rows totaling $0.0258, 2 security findings, drift = 0
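
The exponential decay behind temporal_hotspot_score, as mentioned in the commit message above, can be sketched as follows. Only the 180-day half-life is stated in the commit message; summing one decayed weight per commit, and the function names, are assumptions about how the score might be aggregated.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 180.0  # half-life quoted in the commit message


def decay_weight(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Weight halves every `half_life` days; future dates clamp to age 0."""
    return 0.5 ** (max(age_days, 0.0) / half_life)


def temporal_hotspot_score(commit_times: list[datetime], now: datetime) -> float:
    # Assumption: the score is the sum of per-commit decayed weights.
    return sum(
        decay_weight((now - t).total_seconds() / 86400.0) for t in commit_times
    )


if __name__ == "__main__":
    now = datetime(2026, 4, 9, tzinfo=timezone.utc)
    commits = [now, now - timedelta(days=180), now - timedelta(days=360)]
    print(temporal_hotspot_score(commits, now))  # 1.0 + 0.5 + 0.25 = 1.75
```

With this weighting a commit from today counts fully, one from 180 days ago counts half, and one from 360 days ago counts a quarter, so recently active files rank higher than files with the same number of older commits.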

…pact context default

Verified

Ran `repowise init --index-only` end-to-end against pallets/flask: 125 files, 1,624 symbols, 125 nodes, 241 edges (191 imports + 28 framework + 22 dynamic), 8 languages, 14 hotspots, 13 dead-code findings. SQL audit confirmed both new migrations applied (answer_cache table present; wiki_pages.summary column present), test files contributed 920 symbols, and conftest framework edges fired.

Live MCP-tool checks against the full-mode wiki: get_symbol resolved src/flask/app.py::Flask to its source body and signature across lines 109–508; get_context returned the LLM summary without the structure / imported_by blocks (compact default); get_answer ran retrieval, hit the dominance gate at 1.07× < 1.2×, and correctly returned ranked excerpts instead of synthesising a wrong frame.

RaghavChamadiya and others added 2 commits April 7, 2026 15:33

swati510 requested a review from RaghavChamadiya as a code owner April 9, 2026 13:18

swati510 and others added 2 commits April 9, 2026 18:51

Merge branch 'main' into feat/new-tools

8b2dfb7

test fix

fdb6ecd

RaghavChamadiya approved these changes Apr 9, 2026

swati510 merged commit f43d0cf into main Apr 9, 2026
5 checks passed

swati510 deleted the feat/new-tools branch April 9, 2026 13:33

swati510 merged 4 commits into main from feat/new-tools
