Fix/minor #60

Merged
swati510 merged 6 commits into main from fix/minor on Apr 9, 2026
swati510 (Collaborator) commented Apr 9, 2026

Summary

Related Issues

Test Plan

  • Tests pass (pytest)
  • Lint passes (ruff check .)
  • Web build passes (npm run build) (if frontend changes)

Checklist

  • My code follows the project's code style
  • I have added tests for new functionality
  • All existing tests still pass
  • I have updated documentation if needed

RaghavChamadiya and others added 5 commits April 7, 2026 15:33
…res, cost tracking, PR blast radius

Adds 11 capabilities across the indexing pipeline, persistence layer, MCP
tools, and CLI. MCP tool count is unchanged; new functionality is folded
into existing tools (get_risk, get_overview, get_dead_code).

Pipeline & generation
- ProcessPool-based parsing with sequential fallback; ingestion and git
  stages now run concurrently via asyncio.gather
- RAG-aware doc generation: dependency summaries are pre-fetched from the
  vector store and injected into the file_page prompt; pages generated in
  topological order so leaves are summarized before their dependents
- Dynamic import hint extractors (Django INSTALLED_APPS/ROOT_URLCONF/
  MIDDLEWARE/url include, pytest conftest fixtures, Node package.json
  exports + tsconfig path aliases) wired into GraphBuilder.add_dynamic_edges
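
A minimal sketch of the leaves-first ordering described above, using the standard library's `graphlib`. The file names and the `generation_order` helper are hypothetical, and the sketch assumes an acyclic import graph (`TopologicalSorter` raises `CycleError` otherwise); the actual pipeline may order pages differently.

```python
from graphlib import TopologicalSorter


def generation_order(imports: dict[str, set[str]]) -> list[str]:
    """Return files so that each file appears after everything it imports.

    `imports` maps a file to the set of files it depends on (hypothetical shape).
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # dependencies (leaves) are emitted before the files that import them.
    return list(TopologicalSorter(imports).static_order())


# Hypothetical three-file project: app imports db and utils, db imports utils.
deps = {
    "app.py": {"db.py", "utils.py"},
    "db.py": {"utils.py"},
    "utils.py": set(),
}
order = generation_order(deps)
```

With this ordering, the summary for `utils.py` exists before `db.py` is generated, so it can be injected into the prompt for its dependents.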

Persistence
- AtomicStorageCoordinator with async transaction() context manager and
  health_check() spanning SQL, in-memory graph, and vector store
- recompute_git_percentiles now uses a single SQL PERCENT_RANK() window
  function instead of in-memory Python ranking
- New temporal_hotspot_score column on git_metadata, computed via
  exponential decay (180-day half-life) and used as the primary percentile
  sort key
- New llm_costs and security_findings tables; matching ORM models
- vector_store.get_page_summary_by_path() on all three backends
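
One way the 180-day half-life decay could look, as a sketch only. The commit message states the half-life but not the exact formula, so summing one decayed weight per commit is an assumption, as are the function names.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 180.0  # from the commit message


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Weight of an event that happened `age_days` ago; halves every half-life."""
    return 0.5 ** (age_days / half_life_days)


def temporal_hotspot_score(commit_times: list[datetime], now: datetime) -> float:
    """Assumed aggregation: sum of decayed weights, one per commit."""
    return sum(
        decay_weight((now - t).total_seconds() / 86400.0) for t in commit_times
    )


now = datetime(2026, 4, 7, tzinfo=timezone.utc)
# One commit today and one exactly 180 days ago.
score = temporal_hotspot_score([now, now - timedelta(days=180)], now)
```

Under this assumption a commit from today counts as 1.0 and a commit from one half-life ago counts as 0.5, so recently active files sort ahead of files with the same number of older commits.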

Cost tracking
- CostTracker with per-call recording, persisted to llm_costs; pricing
  table covers Claude 4.6 family, GPT-4o, and Gemini 1.5/2.5/3.x variants
- Wired into Anthropic, Gemini, OpenAI, and LiteLLM providers
- Live USD column on the indexing progress bar
- New `repowise costs` CLI grouping by operation/model/day
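
A rough sketch of per-call cost recording and grouping. The class below is illustrative and is not the project's `CostTracker`; the model name and prices are placeholders, not real pricing.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Placeholder USD prices per million tokens as (input, output); not real pricing.
PRICES = {"example-model": (1.00, 2.00)}


@dataclass
class SketchCostTracker:
    """Illustrative tracker: records one row per LLM call."""

    calls: list[tuple[str, str, float]] = field(default_factory=list)

    def record(self, operation: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.calls.append((operation, model, cost))
        return cost

    def totals_by_operation(self) -> dict[str, float]:
        totals: dict[str, float] = defaultdict(float)
        for operation, _model, cost in self.calls:
            totals[operation] += cost
        return dict(totals)


tracker = SketchCostTracker()
first = tracker.record("file_page", "example-model", 1000, 500)
tracker.record("file_page", "example-model", 1000, 500)
totals = tracker.totals_by_operation()
```

Grouping by model or by day would follow the same pattern with a different key.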

Analysis
- PRBlastRadiusAnalyzer: transitive ancestor BFS over graph_edges,
  co-change warnings, recommended reviewers by temporal ownership,
  test gaps, 0–10 overall risk score
- SecurityScanner: pattern-based scan for eval/exec/pickle/raw SQL/
  hardcoded secrets/weak hashes; persisted at index time
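
The transitive ancestor BFS can be sketched as below. The edge shape `(importer, imported)` and the function name are assumptions for illustration; the real analyzer reads graph_edges from persistence and also produces warnings, reviewers, and a risk score that this sketch omits.

```python
from collections import defaultdict, deque


def blast_radius(edges: list[tuple[str, str]], changed_files: list[str]) -> set[str]:
    """Return every file that transitively depends on any changed file.

    `edges` holds (importer, imported) pairs (assumed shape).
    """
    # Reverse adjacency: for each file, which files import it directly.
    importers: dict[str, set[str]] = defaultdict(set)
    for importer, imported in edges:
        importers[imported].add(importer)

    seen = set(changed_files)
    queue = deque(changed_files)
    while queue:
        node = queue.popleft()
        for parent in importers.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Report only the affected ancestors, not the changed files themselves.
    return seen - set(changed_files)


edges = [
    ("app.py", "db.py"),
    ("db.py", "utils.py"),
    ("cli.py", "app.py"),
    ("docs.py", "readme.py"),
]
affected = blast_radius(edges, ["utils.py"])
```

The `seen` set keeps the traversal linear in the number of edges and makes it safe on cyclic import graphs.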

MCP tool extensions
- get_risk(changed_files=[...]) returns blast radius; per-file payload
  now includes test_gap and security_signals
- get_overview returns knowledge_map with top owners, knowledge silos
  (>80% ownership), and onboarding targets
- get_dead_code accepts min_confidence, include_internals,
  include_zombie_packages, no_unreachable, no_unused_exports

CLI
- `repowise dead-code` exposes the same sensitivity flags
- `repowise doctor` adds a coordinator drift health check (Check #10)
- `repowise costs` command registered

Tests
- test_models.py: expected table set updated to include llm_costs and
  security_findings; full suite green (757 passed, 9 skipped)
- End-to-end validated against test-repos/microdot: 164 files ingested,
  83 pages generated, 132 git_metadata rows with temporal hotspot score,
  83 cost rows totaling $0.0258, 2 security findings, drift = 0
…pact context default

  Adds two new MCP tools, two supporting alembic migrations, and a set of
  ingestion / generation improvements that make the wiki layer usable for
  single-call agent workflows. All existing tools continue to work
  unchanged. Bumps the public tool count from 8 to 10.

  New MCP tools
  -------------
  - mcp_server/tool_answer.py (new): get_answer(question, scope?, repo?)
    is a one-call RAG endpoint over the wiki layer. It runs an FTS pass
    with a coverage re-ranker, splits relational questions on connectives
    and boosts pages at the intersection of both halves, gates synthesis
    on a top/second dominance ratio (>= 1.2x), and only invokes the LLM
    when retrieval is clearly dominant. High-confidence responses include
    a note explaining the consumer can cite directly without verification
    reads. Ambiguous retrievals return ranked excerpts so the agent
    grounds in source instead of anchoring on a wrong frame. Synthesised
    answers are persisted to AnswerCache by question hash so repeat
    questions return at zero LLM cost. Degrades cleanly to retrieval-only
    mode when no provider is configured.

  - mcp_server/tool_symbol.py (new): get_symbol(symbol_id) resolves a
    qualified id of the form "path/to/file.py::Class::method" (also
    accepts the dot separator) to its source body, signature, file
    location, line range, and docstring. Recovers the rich on-disk
    signature so base classes, decorators, and full type annotations
    reach the LLM (the stripped DB form would lose these). Handles
    duplicate-row resolution by canonical pick rather than raising
    MultipleResultsFound.

  - mcp_server/_meta.py (new): shared _meta envelope and per-tool hint
    builders used by tool_answer / tool_context / tool_symbol so all
    three return a consistent metadata block (timing, hint, page counts).

  - mcp_server/__init__.py: re-exports the new tools, updates the
    module docstring to "10 tools".
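
As an illustration of the qualified-id format that get_symbol accepts, here is a small parsing sketch. `parse_symbol_id` is a hypothetical helper, not the project's `_resolve_symbol`, and it assumes the file path is always separated from the symbol by the first `::`.

```python
def parse_symbol_id(symbol_id: str) -> tuple[str, list[str]]:
    """Split "path/to/file.py::Class::method" into (path, ["Class", "method"]).

    The symbol part may also use a dot separator ("Class.method").
    """
    path, sep, qualified = symbol_id.partition("::")
    if not sep or not path:
        raise ValueError(f"expected 'path::Symbol', got {symbol_id!r}")
    # Normalise both separator styles to one list of name parts.
    parts = [p for p in qualified.replace("::", ".").split(".") if p]
    if not parts:
        raise ValueError(f"missing symbol name in {symbol_id!r}")
    return path, parts


colon_form = parse_symbol_id("path/to/file.py::Class::method")
dot_form = parse_symbol_id("path/to/file.py::Class.method")
```

Normalising to a list of parts before lookup is one way to avoid mismatches between `Class.method` and `Class::method` keys.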

  Schema migrations
  -----------------
  - alembic/versions/0012_page_summary.py (new): adds wiki_pages.summary
    TEXT NOT NULL DEFAULT "". Stores a 1–3 sentence purpose blurb per
    page so get_context can return narrative file-level descriptions
    without shipping content_md on every turn. Server default backfills
    existing rows on upgrade. Reversible downgrade defined.

  - alembic/versions/0013_answer_cache.py (new): creates the answer_cache
    table with (id, repository_id, question_hash, question, payload_json,
    provider_name, model_name, created_at), a unique constraint on
    (repository_id, question_hash), an index on repository_id, and a
    CASCADE foreign key to repositories so dropping a repo cleans up its
    cache automatically. Pure CREATE TABLE — no impact on existing data.
    Reversible downgrade defined.

  - core/persistence/models.py: adds the Page.summary column and the
    AnswerCache ORM model matching the migrations above.

  - core/persistence/crud.py: helpers for upserting page summaries and
    reading/writing AnswerCache rows.
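
The answer_cache constraints can be tried out with an in-memory SQLite database. This is a sketch, not the alembic migration: it uses a subset of the columns, the column types are assumed, and the lower-casing and whitespace normalisation before hashing is an assumption (the commit message only says answers are keyed by question hash).

```python
import hashlib
import sqlite3


def question_hash(question: str) -> str:
    """Hash a question after assumed normalisation (case and whitespace)."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for CASCADE
conn.executescript("""
CREATE TABLE repositories (id INTEGER PRIMARY KEY);
CREATE TABLE answer_cache (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL
        REFERENCES repositories(id) ON DELETE CASCADE,
    question_hash TEXT NOT NULL,
    question TEXT NOT NULL,
    payload_json TEXT NOT NULL,
    UNIQUE (repository_id, question_hash)
);
CREATE INDEX ix_answer_cache_repository_id ON answer_cache (repository_id);
""")

insert = ("INSERT INTO answer_cache "
          "(repository_id, question_hash, question, payload_json) "
          "VALUES (?, ?, ?, ?)")
q = "How does routing work?"
conn.execute("INSERT INTO repositories (id) VALUES (1)")
conn.execute(insert, (1, question_hash(q), q, "{}"))

# A repeat of the same question for the same repo violates the unique key.
try:
    conn.execute(insert, (1, question_hash(q), q, "{}"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Dropping the repository removes its cached answers via CASCADE.
conn.execute("DELETE FROM repositories WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM answer_cache").fetchone()[0]
```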

  Existing MCP tools
  ------------------
  - mcp_server/tool_context.py: get_context now defaults to compact=True.
    Compact mode drops the structure block, the imported_by list, and
    per-symbol docstring/end_line fields, keeping responses under ~10K
    characters on dense files. Pass compact=False to get the full payload
    on demand. Docstring trimmed to clean tool documentation. Internal
    fallback labels reworded in plain English.

  - mcp_server/tool_search.py: docstring expanded into clean tool
    documentation; behaviour unchanged.

  - mcp_server/tool_risk.py: cleanup pass; behaviour unchanged.

  - server/chat_tools.py and docstring counts: updated to 10 tools.
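
To make the compact default concrete, here is a sketch of the field dropping it describes. The payload shape (a dict with a `symbols` list) and the helper name are assumptions; only the dropped field names come from the commit message.

```python
DROPPED_TOP_LEVEL = {"structure", "imported_by"}
DROPPED_PER_SYMBOL = {"docstring", "end_line"}


def compact_payload(payload: dict) -> dict:
    """Return a copy of `payload` without the fields compact mode omits."""
    result = {k: v for k, v in payload.items() if k not in DROPPED_TOP_LEVEL}
    if "symbols" in payload:
        result["symbols"] = [
            {k: v for k, v in symbol.items() if k not in DROPPED_PER_SYMBOL}
            for symbol in payload["symbols"]
        ]
    return result


# Hypothetical full payload for one file.
full = {
    "summary": "Application object and request dispatch.",
    "structure": {"classes": 1},
    "imported_by": ["a.py", "b.py"],
    "symbols": [
        {"name": "Flask", "start_line": 109, "end_line": 508, "docstring": "..."},
    ],
}
compact = compact_payload(full)
```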

  Ingestion / generation
  ----------------------
  - core/generation/page_generator.py: _is_significant_file() now treats
    any file tagged is_test=True (with at least one extracted symbol) as
    significant, regardless of PageRank. Test files have near-zero
    centrality because nothing imports them back, but they answer
    "what test exercises X" / "where is Y verified" questions and the
    doc layer is the right place to surface those. Filtering remains
    available via --skip-tests.

  - core/ingestion/traverser.py: removes the workaround that excluded
    tests/, test/, spec/, specs/, __tests__ from the traversal. The
    underlying pagerank-inflation bug it guarded against is fixed in
    graph.py via the deterministic stem-priority disambiguation
    (_stem_priority / _build_stem_map), so test files can now be
    indexed safely while still being tagged is_test=True for downstream
    filtering.

  - core/ingestion/graph.py: prose cleanup in the stem-priority docstring
    and _build_stem_map; explains the test-fixture-named-like-the-package
    failure mode in neutral terms. Framework-aware synthetic-edge code
    (_add_conftest_edges, _add_django_edges, _add_fastapi_edges,
    _add_flask_edges, dispatched by add_framework_edges(tech_stack))
    is unchanged.

  - core/ingestion/parser.py, core/generation/models.py: small cleanups
    feeding the new wiki_pages.summary field through the generation
    pipeline.
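
The test-file inclusion rule can be sketched as a predicate. The `FileInfo` fields, the PageRank threshold value, and the non-test branch are assumptions; the commit message only specifies that a test file with at least one extracted symbol is significant regardless of PageRank unless tests are skipped.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical per-file record used only for this sketch."""
    path: str
    is_test: bool
    symbol_count: int
    pagerank: float


def is_significant_file(info: FileInfo, pagerank_threshold: float = 0.01,
                        skip_tests: bool = False) -> bool:
    if info.is_test:
        # Test files bypass the centrality check but need at least one symbol.
        return not skip_tests and info.symbol_count >= 1
    # Assumed rule for non-test files: significance by centrality.
    return info.pagerank >= pagerank_threshold


test_file = FileInfo("tests/test_app.py", is_test=True, symbol_count=12, pagerank=0.0)
empty_test = FileInfo("tests/__init__.py", is_test=True, symbol_count=0, pagerank=0.0)
```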

  CLI
  ---
  - cli/main.py: minor wiring for the new tools and the compact default.

  Tests
  -----
  - tests/unit/server/test_tool_symbol.py (new): unit tests for
    _resolve_symbol covering separator-style mismatches between
    Class.method and Class::method and MultipleResultsFound handling
    on duplicate lookup keys.
  - tests/unit/server/test_mcp.py: counter and fixture updates for the
    10-tool surface.
  - tests/unit/ingestion/test_graph.py: fixture updates around the
    stem-priority cleanup.

  Docs
  ----
  - README.md: bumps "Eight MCP tools" → "Ten MCP tools" in the headline,
    abstract, comparison table, and competitor matrix; adds get_answer,
    get_symbol, and compact-default rows to the tool table; documents
    the test-files-in-wiki and single-call-answer additions in the
    "What's new" section.
  - docs/ARCHITECTURE.md: schema table now lists the summary column on
    wiki_pages and the new answer_cache table; the page-generator
    section documents the test-file inclusion rule; references to "8
    tools" updated to 10.
  - docs/CHANGELOG.md: Unreleased Added entries for get_answer,
    get_symbol, the two migrations, and test-file indexing; Changed
    entry for the get_context compact default.
  - docs/USER_GUIDE.md: tool table updated to 10 entries.
  - docs/architecture-guide.md, docs/CHAT.md: tool counts updated.
  - packages/server/README.md, plugins/claude-code/DEVELOPER.md,
    website/index.md, website/concepts.md, website/mcp-server.md,
    website/claude-md-generator.md: tool counts updated; mcp-server.md
    gains full sections (parameters, returns, examples) for get_answer
    and get_symbol and documents the new compact parameter on
    get_context.

  Verified
  --------
  Ran `repowise init --index-only` end-to-end against pallets/flask:
  125 files, 1,624 symbols, 125 nodes, 241 edges (191 imports + 28
  framework + 22 dynamic), 8 languages, 14 hotspots, 13 dead-code
  findings. SQL audit confirmed both new migrations applied
  (answer_cache table present; wiki_pages.summary column present),
  test files contributed 920 symbols, and conftest framework edges
  fired. Live MCP-tool checks against the full-mode wiki: get_symbol
  resolved src/flask/app.py::Flask to its source body and signature
  across lines 109–508; get_context returned the LLM summary without
  the structure / imported_by blocks (compact default); get_answer
  ran retrieval, hit the dominance gate at 1.07× < 1.2×, and correctly
  returned ranked excerpts instead of synthesising a wrong frame.
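
The dominance gate in that last check can be sketched as follows. The 1.2x threshold comes from the commit message; the function name and the handling of empty or single-result retrievals are assumptions.

```python
def should_synthesize(scores: list[float], min_ratio: float = 1.2) -> bool:
    """Return True when the top retrieval score clearly dominates the second."""
    ranked = sorted(scores, reverse=True)
    if not ranked or ranked[0] <= 0:
        return False  # nothing usable was retrieved
    if len(ranked) == 1 or ranked[1] <= 0:
        return True  # assumed: a lone positive hit counts as dominant
    return ranked[0] / ranked[1] >= min_ratio


# A 1.07x lead is below the 1.2x gate, so ranked excerpts would be returned.
ambiguous = should_synthesize([1.07, 1.0, 0.4])
# A 1.5x lead clears the gate, so synthesis would be allowed.
dominant = should_synthesize([1.5, 1.0, 0.4])
```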
swati510 merged commit c239c15 into main on Apr 9, 2026
5 checks passed
swati510 deleted the fix/minor branch on April 9, 2026 17:43
2 participants