Deep Research Agent for Technical & Engineering Intelligence

Potency AI is a multi-source AI research agent that retrieves, analyzes, and synthesizes technical information from documentation, academic papers, blog posts, and code repositories in parallel. It produces structured engineering reports complete with architecture diagrams, source credibility scoring, and actionable follow-up questions -- all streamed to the browser in real time via Server-Sent Events.
The system maintains a persistent knowledge graph (SQLite-backed) that grows across research sessions, linking technologies, concepts, patterns, and organizations discovered during each investigation. Every entity is deduplicated, scored for relevance strength, and exportable in JSON, CSV, or Obsidian markdown format.
Potency AI supports five LLM providers (Gemini, OpenAI, Groq, HuggingFace, Ollama) with automatic fallback and hybrid routing based on real-time connectivity monitoring. When the network degrades, classification and extraction tasks shift to a local model while synthesis stays in the cloud. When fully offline, the agent operates entirely on local LLMs with cached sources -- no API keys required.
- Multi-source parallel retrieval -- documentation, academic papers, blog posts, and code repositories searched concurrently with semantic reranking (BAAI/bge embeddings)
- Real-time streaming pipeline -- SSE-powered Agent Brain feed showing intent classification, planning, retrieval, reasoning, and synthesis stages with live progress
- 8 Mermaid diagram types -- architecture, sequence, flowchart, class, ER, mindmap, timeline, and C4 context diagrams, auto-detected per query with Mermaid v11 compatibility
- Persistent knowledge graph -- SQLite-backed with UUID entities, fuzzy deduplication, session tracking, entity merge, backlinks, and Cytoscape.js visualization
- Multi-LLM support -- Gemini, OpenAI, Groq, HuggingFace, and Ollama with automatic provider detection and rate-limit fallback chains
- Hybrid routing -- cloud, hybrid, or offline mode determined by real-time connectivity monitoring; tasks are routed to cloud or local models based on latency
- Web page fetching and analysis -- fetch any URL, extract structured content with trafilatura, analyze key facts/entities/sentiment, and crawl with configurable depth
- Redis caching -- query results and source content cached with configurable TTL; automatic file-based fallback when Redis is unavailable
- Kafka event streaming -- optional integration for publishing research pipeline events to a Kafka topic
- Model comparison mode -- run the same query against two LLMs simultaneously with side-by-side streaming output
- User preference memory -- detects "remember I prefer X" in queries and applies saved preferences to future reports automatically
- Four reasoning modules -- architecture analysis, tradeoff comparison, performance evaluation, and code quality review, selected automatically by query intent
- Export -- Markdown copy, print-to-PDF, and PNG export from the browser

```
                                 +---------------------+
                                 |    Browser (SPA)    |
                                 | Tailwind + Mermaid  |
                                 |   + Cytoscape.js    |
                                 +----------+----------+
                                            | SSE / REST
                                            v
                                 +----------+----------+
                                 |       FastAPI       |
                                 |  Middleware (Auth,  |
                                 |   Rate Limiting)    |
                                 +----------+----------+
                                            |
                 +--------------------------+----------------------+
                 |                          |                      |
         +-------v-------+         +--------v--------+     +-------v-------+
         |   Research    |         |    Diagrams     |     |   Knowledge   |
         |   Pipeline    |         |     Engine      |     |   Graph API   |
         +-------+-------+         +--------+--------+     +-------+-------+
                 |                          |                      |
                 |                          |              +-------v-------+
     +-----------+-----------+              |              |    SQLite     |
     |           |           |              |              |  (aiosqlite)  |
+----v---+ +-----v----+ +----v----+         |              +---------------+
| Intent | | Planning | | Reason- |         |
| Class. | |          | |   ing   |         |
+--------+ +----------+ +---------+         |
                 |                          |
         +-------v-------+                  |
         |   Retrieval   |                  |
         |  Aggregator   |                  |
         +-------+-------+                  |
                 |                          |
 +-------+-------+-------+-------+          |
 |       |       |       |       |          |
Docs   Papers  Blogs   Code     Web         |
                                            |
         +-------v-------+                  |
         |   Synthesis   +<-----------------+
         |    Engine     |
         +-------+-------+
                 |
         +-------v-------+
         |  LLM Router   |
         |(Hybrid Cloud/ |
         |    Local)     |
         +-------+-------+
                 |
 +-------+-------+-------+-------+
 |       |       |       |       |
Gemini OpenAI  Groq     HF    Ollama
```

- Python 3.11 or higher
- At least one of: an LLM API key (Gemini, OpenAI, Groq, HuggingFace) or Ollama installed locally
- (Optional) Redis for caching, ChromaDB for vector storage

```bash
# Clone the repository
git clone https://github.com/your-org/potency-ai.git
cd potency-ai

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy the example environment file
cp .env.example .env
# Edit .env and add at least one LLM API key
# (or leave all empty to use Ollama locally)

# Start the server
uvicorn app.main:app --reload
# Open http://localhost:8000
```

```bash
# Start the full stack (app + Redis + ChromaDB)
docker compose up --build
# Access at http://localhost:8000
```
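
Once the server is running, you can exercise the pipeline directly from the command line. The request body below is illustrative -- the exact field names live in the auto-generated OpenAPI docs (served at /docs by default in FastAPI):

```bash
# Run a first research query (field names are assumptions; see /docs for the real schema)
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "Compare FastAPI and Flask for async APIs", "mode": "quick"}'
```
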
All settings are controlled via environment variables or a .env file. The table below lists the key variables; see .env.example for a fully annotated template.

| Variable | Description | Default |
| --- | --- | --- |
| GEMINI_API_KEY | Google Gemini API key (get one) | "" |
| OPENAI_API_KEY | OpenAI API key | "" |
| GROQ_API_KEY | Groq API key (get one) | "" |
| HUGGINGFACE_API_KEY | HuggingFace API key (get one) | "" |
| ANTHROPIC_API_KEY | Anthropic API key | "" |
| OLLAMA_BASE_URL | Ollama server URL | http://localhost:11434 |
| OLLAMA_MODEL | Default Ollama model | llama3.1:8b |
| DEFAULT_FAST_MODEL | Override model for classification/extraction | auto-detected |
| DEFAULT_REASONING_MODEL | Override model for reasoning tasks | auto-detected |
| DEFAULT_SYNTHESIS_MODEL | Override model for report synthesis | auto-detected |
| LOCAL_LLM_BACKEND | Local LLM backend (ollama, llamacpp, lmstudio) | ollama |
| TAVILY_API_KEY | Tavily search API key (get one) | "" |
| GITHUB_TOKEN | GitHub personal access token for code retrieval | "" |
| SEMANTIC_SCHOLAR_API_KEY | Semantic Scholar API key | "" |
| REDIS_URL | Redis connection URL | redis://localhost:6379/0 |
| KAFKA_ENABLED | Enable Kafka event streaming | false |
| KAFKA_BOOTSTRAP_SERVERS | Kafka broker addresses | localhost:9092 |
| API_KEY_SECRET | Optional API key for endpoint authentication | "" |
| LOG_LEVEL | Logging level | INFO |
| ENVIRONMENT | Runtime environment (development, production, testing) | development |
| MAX_CONCURRENT_RETRIEVALS | Max parallel retrieval tasks | 5 |
| QUICK_MODE_TIMEOUT_SECONDS | Timeout for quick research mode | 120 |
| DEEP_MODE_TIMEOUT_SECONDS | Timeout for deep research mode | 600 |
| RATE_LIMIT_REQUESTS_PER_MINUTE | API rate limit per client | 30 |
| CONNECTIVITY_CHECK_INTERVAL_SECONDS | Interval between connectivity probes | 60 |
| SOURCE_CACHE_TTL_DAYS | Days before cached sources expire | 7 |
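
As a concrete example, a minimal .env for a cloud-first setup only needs a couple of the variables above (values are placeholders); anything omitted falls back to the defaults in the table:

```bash
# Minimal .env sketch -- replace the placeholder values with real keys
GEMINI_API_KEY=your-gemini-key          # at least one LLM key (or leave empty to use Ollama)
TAVILY_API_KEY=your-tavily-key          # enables documentation and web retrieval
OLLAMA_BASE_URL=http://localhost:11434  # local fallback for hybrid/offline mode
LOG_LEVEL=INFO
```
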
| Method | Research endpoint | Description |
| --- | --- | --- |
| POST | /research | Execute a research query and return a structured report |
| POST | /research/stream | Execute research with real-time SSE streaming progress |
| POST | /research/clarify | Check if a query needs clarification before starting |
| POST | /research/compare | Run a query against two LLMs with side-by-side streaming |
| POST | /research/summarize-source | Summarize source text using a local BART model |
| GET | /research/knowledge-graph | Return the current knowledge graph for visualization |
| GET | /research/provider | Return the active LLM provider name |
| GET | /research/models | List available HuggingFace models and their status |
| GET | /research/demo/list | List available pre-seeded demo queries |
| GET | /research/demo/{id} | Load a pre-seeded demo result instantly |
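
For example, the streaming endpoint can be watched from the terminal with curl's -N flag, which prints SSE events as each pipeline stage completes. The JSON body is illustrative; consult /docs for the exact request schema.

```bash
# Stream a research run as Server-Sent Events (request fields are illustrative)
curl -N -X POST http://localhost:8000/research/stream \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"query": "How does Raft leader election work?"}'

# List the pre-seeded demo queries, then load one by id via /research/demo/{id}
curl http://localhost:8000/research/demo/list
```
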
| Method | Diagram endpoint | Description |
| --- | --- | --- |
| GET | /diagrams/types | List all 8 supported diagram types with metadata |
| POST | /diagrams/detect-type | Auto-detect the best diagram type for a query |
| POST | /diagrams/generate | Generate a single Mermaid diagram of a given type |
| POST | /diagrams/generate-all | Auto-detect types and generate multiple diagrams |
| POST | /diagrams/regenerate | Regenerate a diagram with user feedback incorporated |
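
A typical flow is to let the engine pick the diagram type for a query and then generate it. The body fields here are assumptions; verify them against /docs.

```bash
# Auto-detect a suitable diagram type, then generate it (field names are assumptions)
curl -X POST http://localhost:8000/diagrams/detect-type \
  -H "Content-Type: application/json" \
  -d '{"query": "microservice checkout flow"}'

curl -X POST http://localhost:8000/diagrams/generate \
  -H "Content-Type: application/json" \
  -d '{"query": "microservice checkout flow", "diagram_type": "sequence"}'
```
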
| Method | Knowledge graph endpoint | Description |
| --- | --- | --- |
| GET | /knowledge-graph | Full graph with optional category and strength filters |
| GET | /knowledge-graph/search | Search entities by name (case-insensitive substring) |
| GET | /knowledge-graph/timeline | Entities ordered by discovery date |
| GET | /knowledge-graph/entity/{id} | Single entity with backlinks |
| PUT | /knowledge-graph/entity/{id} | Update entity notes and tags |
| DELETE | /knowledge-graph/entity/{id} | Delete an entity and all its relationships |
| GET | /knowledge-graph/backlinks/{id} | Entities that link to a given entity |
| POST | /knowledge-graph/entity/merge | Merge two entities into one |
| POST | /knowledge-graph/export | Export graph as JSON, CSV, or Obsidian markdown |
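
For instance, the graph can be searched and exported straight from the shell. The query parameter and export body shown here are assumptions; check /docs for the exact names.

```bash
# Search entities by name (query parameter name is an assumption)
curl "http://localhost:8000/knowledge-graph/search?q=fastapi"

# Export the graph as Obsidian markdown (the "format" field is an assumption)
curl -X POST http://localhost:8000/knowledge-graph/export \
  -H "Content-Type: application/json" \
  -d '{"format": "obsidian"}'
```
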
| Method | LLM routing endpoint | Description |
| --- | --- | --- |
| GET | /llm/status | Current connectivity quality, active provider, routing mode |
| GET | /llm/status/stream | SSE stream of real-time connectivity changes |
| GET | /llm/local/models | List locally available models (Ollama, llama.cpp, LM Studio) |
| POST | /llm/local/switch | Switch the active local model at runtime |
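
For example, you can inspect the current routing mode and swap the local model without restarting the server; the switch payload is illustrative.

```bash
# Show connectivity quality, active provider, and routing mode
curl http://localhost:8000/llm/status

# Switch the active local model (the "model" field name is an assumption)
curl -X POST http://localhost:8000/llm/local/switch \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b"}'
```
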
| Method | Web fetch endpoint | Description |
| --- | --- | --- |
| POST | /fetch/url | Fetch a URL and analyze its content (facts, entities, sentiment) |
| POST | /fetch/crawl | Crawl from a URL at a specified depth |
| GET | /fetch/monitors | List all monitored URLs |
| POST | /fetch/monitors | Add a URL to the change watchlist |
| DELETE | /fetch/monitors/{id} | Remove a URL from the watchlist |
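
Fetching and analyzing a single page, or putting it on the change watchlist, looks roughly like this (the "url" field name is an assumption):

```bash
# Fetch a URL and analyze its content (facts, entities, sentiment)
curl -X POST http://localhost:8000/fetch/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://fastapi.tiangolo.com/"}'

# Add the same page to the change watchlist
curl -X POST http://localhost:8000/fetch/monitors \
  -H "Content-Type: application/json" \
  -d '{"url": "https://fastapi.tiangolo.com/"}'
```
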
| Method | Memory & cache endpoint | Description |
| --- | --- | --- |
| GET | /memory/history/{user_id} | Retrieve research session history for a user |
| GET | /memory/knowledge/stats | Knowledge graph entity and relationship counts |
| GET | /cache/health | Redis cache health and connectivity status |
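
Both the memory and cache endpoints are plain GETs, for example:

```bash
# Knowledge graph entity/relationship counts and Redis cache health
curl http://localhost:8000/memory/knowledge/stats
curl http://localhost:8000/cache/health
```
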
| Method | Health endpoint | Description |
| --- | --- | --- |
| GET | /health | Basic health check |
| GET | /health/ollama | Ollama connectivity check and available models |
| GET | /health/detailed | Detailed health check with all dependency statuses |
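
The health endpoints work well as container or uptime probes:

```bash
# Liveness probe and full dependency status
curl http://localhost:8000/health
curl http://localhost:8000/health/detailed
```
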
```
potency-ai/
|-- app/
| |-- main.py # FastAPI application entry point and lifespan
| |-- config.py # Pydantic settings, .env loading
| |-- cli.py # CLI entry point (typer)
| |-- api/
| | |-- middleware/
| | | |-- auth.py # API key authentication middleware
| | | |-- rate_limit.py # Per-client rate limiting
| | |-- routes/
| | |-- research.py # Research pipeline endpoints
| | |-- diagrams.py # Diagram generation endpoints
| | |-- knowledge_graph.py # Knowledge graph CRUD and export
| | |-- llm.py # LLM status, model switching, connectivity SSE
| | |-- fetch.py # Web fetcher, crawler, and page monitors
| | |-- memory.py # Session history and knowledge stats
| | |-- cache.py # Cache health endpoint
| | |-- health.py # Health check endpoints
| |-- core/
| | |-- orchestrator.py # Main 9-stage research pipeline controller
| | |-- events.py # SSE event emitter and pipeline stage enum
| | |-- intent.py # LLM-based query intent classification
| | |-- planner.py # Research plan decomposition into sub-tasks
| | |-- pipeline.py # Data models (ResearchReport, ResearchMode, etc.)
| |-- llm/
| | |-- router.py # Hybrid LLM router (cloud / local / offline)
| | |-- providers.py # LiteLLM wrapper with multi-provider fallback
| | |-- connectivity.py # Real-time connectivity monitor with adaptive polling
| | |-- local_adapter.py # Ollama, llama.cpp, and LM Studio adapter
| | |-- hf_models.py # HuggingFace model catalog and local inference
| | |-- prompts.py # All prompt templates (10+)
| |-- retrieval/
| | |-- aggregator.py # Multi-source parallel retrieval orchestrator
| | |-- documentation.py # Documentation retriever (Tavily)
| | |-- papers.py # Academic paper retriever (Semantic Scholar)
| | |-- blogs.py # Blog and article retriever
| | |-- code.py # Code repository retriever (GitHub)
| | |-- web.py # Web search retriever
| | |-- web_fetcher.py # URL fetcher and multi-page crawler
| | |-- reranker.py # Semantic reranking (sentence-transformers, BAAI/bge)
| | |-- cache.py # Source caching for offline use
| | |-- monitor.py # URL change detection and monitoring
| |-- reasoning/
| | |-- architecture.py # Architecture pattern analysis module
| | |-- tradeoff.py # Technology tradeoff comparison module
| | |-- performance.py # Performance and benchmark evaluation module
| | |-- code_quality.py # Code quality review module
| | |-- base.py # Base reasoning module interface
| |-- synthesis/
| | |-- engine.py # Report generation (streaming and batch)
| | |-- templates.py # Report section templates
| | |-- export.py # Export utilities
| |-- diagrams/
| | |-- engine.py # Mermaid generation, validation, auto-fix, retry
| | |-- types.py # 8 diagram type specs and auto-detection logic
| |-- memory/
| | |-- knowledge_graph.py # SQLite-backed knowledge graph with dedup
| | |-- manager.py # Memory manager (KG + sessions + context)
| | |-- session.py # Session history tracking
| | |-- user_prefs.py # User preference storage
| |-- analysis/
| | |-- page_analyzer.py # Web page content analysis (facts, entities)
| |-- cache/
| | |-- redis_client.py # Redis cache client with file-based fallback
| |-- events/
| | |-- kafka_producer.py # Kafka event producer (optional)
| |-- utils/
| |-- errors.py # Custom exception hierarchy
| |-- logging.py # Structured logging (structlog)
| |-- tokens.py # Token usage tracking and cost calculation
|-- static/
| |-- index.html # Single-page application shell
| |-- css/style.css # Tailwind-based dark glass UI
| |-- js/
| |-- app.js # Main app logic, SSE handling, demo mode
| |-- research.js # Pipeline visualization and source cards
| |-- knowledge.js # Cytoscape.js knowledge graph visualization
| |-- diagrams.js # Mermaid diagram rendering and export
| |-- compare.js # Side-by-side model comparison UI
| |-- charts.js # Chart utilities
|-- tests/
| |-- unit/ # Unit tests (20+ test files)
| |-- integration/ # Integration tests
| |-- conftest.py # Shared test fixtures
|-- data/
| |-- knowledge_graph.db # SQLite knowledge graph database
| |-- demo/ # Pre-seeded demo query results
|-- scripts/
| |-- seed_data.py # Database seeding script
| |-- setup_db.py # Database setup script
|-- .env.example # Annotated environment configuration template
|-- requirements.txt # Python dependencies
|-- pyproject.toml # Project metadata, tool config, CLI entry point
|-- Dockerfile # Container image (Python 3.11-slim)
|-- docker-compose.yml # Full stack: app + Redis + ChromaDB
```

| Layer | Technology | Purpose |
| --- | --- | --- |
| Backend | FastAPI + Uvicorn | Async web framework with auto-generated OpenAPI docs |
| LLM Routing | LiteLLM | Unified interface to Gemini, OpenAI, Groq, HuggingFace, Ollama |
| Streaming | Server-Sent Events (SSE) | Real-time pipeline progress and token streaming |
| Knowledge Graph | SQLite via aiosqlite | Persistent entity/relationship storage with session tracking |
| Vector Search | ChromaDB | Embedding-based retrieval (optional) |
| Cache | Redis (hiredis) | Query and source caching with configurable TTL |
| Event Bus | Apache Kafka via aiokafka | Optional pipeline event streaming |
| Semantic Reranking | sentence-transformers (BAAI/bge) | Local cross-encoder reranking of retrieved sources |
| Web Retrieval | Tavily, Semantic Scholar, GitHub API | Multi-source parallel document search |
| Content Extraction | trafilatura, BeautifulSoup4 | Clean text extraction from web pages |
| Diagrams | Mermaid.js v11 | 8 diagram types rendered client-side |
| Graph Visualization | Cytoscape.js | Interactive knowledge graph in the browser |
| Frontend | Vanilla JS + Tailwind CSS | Single-page application with no build step |
| Validation | Pydantic v2 + pydantic-settings | Request/response validation and .env configuration |
| Logging | structlog | Structured JSON logging |
| Metrics | prometheus-client | Prometheus-compatible metrics export |
| Testing | pytest + pytest-asyncio | Async-first test suite with respx for HTTP mocking |
| Linting | Ruff | Fast Python linter and formatter |
| Type Checking | mypy (strict mode) | Static type analysis |
| Containerization | Docker + Docker Compose | Reproducible multi-service deployments |

This project is licensed under the MIT License. See pyproject.toml for details.