Version: 4.0.0 | Total LOC: ~5,500+ | Modules: 14 | Test Files: 32
contribai/
├── core/ # Foundational abstractions (1,100 LOC)
├── llm/ # Multi-provider LLM routing (900 LOC)
├── github/ # GitHub API + discovery (550 LOC)
├── analysis/ # 7 analyzers + 17 skills (700 LOC)
├── generator/ # Code fix generation (300 LOC)
├── orchestrator/ # Pipeline + hunt + memory (500 LOC)
├── pr/ # PR lifecycle management (542 LOC)
├── issues/ # Issue-driven contributions (339 LOC)
├── agents/ # Sub-agent registry (158 LOC)
├── tools/ # Tool protocol (59 LOC)
├── mcp_server.py # MCP stdio server (180 LOC)
├── web/ # FastAPI dashboard (324 LOC)
├── cli/ # Click CLI + TUI (150+ LOC)
├── scheduler/ # APScheduler wrapper (100 LOC)
├── plugins/ # Plugin system (161 LOC)
├── templates/ # Contribution templates (96 LOC)
├── notifications/ # Slack/Discord/Telegram (248 LOC)
├── sandbox/ # Docker code validation (244 LOC)
└── mcp/ # MCP client (180 LOC)
| Module | Purpose | Key Classes/Functions | LOC |
|---|---|---|---|
| core | Config, models, middleware, events, exceptions, utilities | Config, Middleware, EventBus, Repository, Finding, Contribution |
1,100 |
| llm | Multi-provider routing, token budgeting, formatting | LLMProvider, TaskRouter, ContextManager, Formatter |
900 |
| github | Async GitHub client, repo discovery, guidelines parsing | GitHubClient, RepoDiscovery, GuidelineParser |
550 |
| analysis | Multi-strategy code analysis, skill loading | CodeAnalyzer, SkillLoader, SecurityStrategy, CodeQualityStrategy |
700 |
| generator | LLM-powered fix generation, self-review, quality scoring | ContributionGenerator, QualityScorer |
300 |
| orchestrator | Pipeline coordination, hunt mode, outcome memory | Pipeline, HuntMode, Memory |
500 |
| pr | PR creation, patrol monitoring, CLA/DCO handling | PRManager, PRPatrol, CLAHandler |
542 |
| issues | Issue discovery and solving | IssueSolver |
339 |
| agents | Sub-agent registry with parallel execution | SubAgentRegistry, AnalyzerAgent, GeneratorAgent |
158 |
| tools | Tool protocol (MCP-inspired) | Tool, ToolResult, GitHubTool, LLMTool |
59 |
| mcp_server | MCP stdio server (14 tools for Claude) | MCPServer |
180 |
| web | FastAPI REST API, webhooks, dashboard | app, api_routes, webhook_handler |
324 |
| cli | Click-based CLI, Rich TUI | main, tui |
150+ |
| scheduler | APScheduler wrapper for cron automation | Scheduler |
100 |
| plugins | Entry-point plugin system | AnalyzerPlugin, GeneratorPlugin |
161 |
| templates | YAML-based contribution templates | TemplateRegistry |
96 |
| notifications | Slack/Discord/Telegram integrations | Notifier |
248 |
| sandbox | Docker-based code validation | Sandbox |
244 |
| mcp | MCP JSON-RPC client | MCPClient |
180 |
┌──────────────────┐
│ CLI / Web │
└────────┬─────────┘
│
┌──────────┴──────────┐
▼ ▼
┌─────────────┐ ┌──────────────┐
│ Orchestrator│ │ Scheduler │
│ + Pipeline │ │ │
└─────┬───────┘ └──────────────┘
│
┌─────────┼─────────┬──────────┐
▼ ▼ ▼ ▼
┌────────┐┌────────┐┌────────┐┌────────┐
│Analysis││Generator││ PR ││ Issues │
│ ││ ││Manager ││ Solver │
└────┬───┘└────┬───┘└────┬───┘└────┬───┘
│ │ │ │
└─────────┼─────────┴────────┘
│
┌──────────┴──────────┐
▼ ▼
┌─────────┐ ┌──────────┐
│ LLM │ │ GitHub │
│ Routing │ │ Client │
└────┬────┘ └────┬─────┘
│ │
└───────┬───────────┘
▼
┌──────────────┐
│ CORE │
│ (Models, │
│ Config, │
│ Middleware) │
└──────────────┘
Dependency Flow: core ← github/llm ← analysis/generator ← orchestrator ← cli/web
- File:
contribai/cli/main.py - Function:
cli()(Click group) - Main Commands:
hunt— Autonomous multi-round huntingrun— Single full pipeline runtarget— Analyze specific reposolve— Solve issues in a reposerve— Start web dashboardschedule— Start scheduler
- File:
contribai/web/server.py - Class:
app(FastAPI application) - Key Routes:
GET /api/stats— Overall statisticsGET /api/repos— Analyzed repos listPOST /api/run— Trigger pipelineGET /dashboard— Web UIPOST /webhooks/github— GitHub webhook receiver
- File:
contribai/orchestrator/pipeline.py - Class:
Pipeline - Key Methods:
run()— Execute pipeline on single repohunt()— Execute hunt modeprocess_repo()— Core analysis → generation → PR flow
- File:
contribai/mcp_server.py - Class:
MCPServer - Protocol: stdio JSON-RPC
- Exposed Tools: 14 (GitHub read/write, safety, maintenance)
| Class | Purpose | Key Fields |
|---|---|---|
Repository |
GitHub repo metadata | owner, name, url, stars, language, last_commit |
Finding |
Detected issue | type, file, line, description, severity, context |
Contribution |
Proposed fix | finding_id, code_change, explanation, confidence_score |
PRResult |
PR outcome | pr_number, url, status, feedback, time_to_merge |
Config |
Application config | github, llm, discovery, analysis, pipeline |
| Table | Purpose | Key Columns |
|---|---|---|
analyzed_repos |
Track analyzed repos | repo_id, timestamp, status |
submitted_prs |
All created PRs | repo_id, pr_number, url, status |
findings_cache |
Cached analysis results | repo_id, findings_json, timestamp |
run_log |
Pipeline execution history | timestamp, status, repo_count, pr_count |
pr_outcomes |
PR merge/close outcomes | pr_number, outcome, feedback, time_to_close |
repo_preferences |
Learned repo patterns | repo_id, preferred_types, rejected_types |
RepositoryDiscovered | RepositoryAnalyzed | FindingDetected | ContributionGenerated |
PRCreated | PRMerged | PRClosed | PRPatrolStarted | ReviewFound | CodeChangeGenerated |
ConfigLoaded | PipelineStarted | PipelineCompleted | ErrorOccurred | RateLimitExceeded| Category | Technologies |
|---|---|
| Language | Python 3.11+ |
| Web | FastAPI, Uvicorn, Jinja2 |
| GitHub | GitPython, httpx (async) |
| LLM | google-genai, openai, anthropic |
| Data | Pydantic, aiosqlite, SQLite |
| CLI | Click, Rich |
| Scheduling | APScheduler |
| Task Runtime | asyncio |
| Code Validation | Docker (optional), ast.parse (fallback) |
| Testing | pytest, pytest-asyncio, pytest-cov |
| Linting | ruff |
| Type Checking | pyright (implicit) |
Each module follows this pattern:
module/
├── __init__.py # Public API exports
├── main_class.py # Primary class (e.g., analyzer.py)
├── sub_component.py # Supporting components
└── exceptions.py # Module-specific exceptions (optional)
- Source:
contribai/core/config.py(Pydantic model) - File:
config.yaml(YAML with schema validation) - Overrides: Environment variables (prefix:
CONTRIBAI_) - Presets: Named profiles (YAML files in
contribai/core/profiles.py)
- Location:
~/.contribai/memory.db(auto-initialized) - Type: SQLite 3.x
- Async: aiosqlite
- Migrations: Embedded in
Memory.init()method
- Emit Location: Throughout codebase
- Handling:
core/events.py(EventBus) - Logging:
~/.contribai/events.jsonl(append-only) - Consumption: Notifications, webhooks, monitoring
- All I/O: async (GitHub API, LLM API, file operations)
- Concurrency Control:
asyncio.gather()withSemaphore(max 3 concurrent repos) - Database Access: aiosqlite (async SQLite)
- Rate Limiting: Middleware intercepts, enforces delays
# Pipeline: async repo processing
for repo in repos:
task = pipeline.process_repo(repo)
tasks.append(task)
await asyncio.gather(*tasks, return_exceptions=True)
# LLM: async with retry
async with retry_handler(max_retries=3, backoff=exponential):
response = await llm_provider.complete(prompt)
# Database: async transaction
async with memory.transaction():
await memory.add_finding(finding)
await memory.increment_stats()- Defaults — Hardcoded in
Configclass - File —
config.yaml(overrides defaults) - Profiles — Named presets (override file)
- Environment — Env vars
CONTRIBAI_*(override profiles) - CLI Flags — Command-line args (override all)
Load Order: CLI flags → Env vars → Profile → YAML → Defaults
tests/
├── unit/ # Isolated module tests
│ ├── test_analyzer.py
│ ├── test_generator.py
│ ├── test_github_client.py
│ ├── test_llm_provider.py
│ ├── test_memory.py
│ ├── test_middleware.py
│ ├── test_pr_manager.py
│ └── ... (25+ test files)
├── integration/ # End-to-end pipeline tests
│ └── test_pipeline.py
└── conftest.py # Shared fixtures
Test Coverage: ~53% (threshold: 50%)
Test Tools:
pytest— Test frameworkpytest-asyncio— Async test supportpytest-cov— Coverage reportingunittest.mock— Mocking
async with GitHubClient(token) as client:
repos = await client.search_repos(language="python")class Config(BaseModel):
github: GitHubConfig
llm: LLMConfig
discovery: DiscoveryConfig
model_config = ConfigDict(validate_assignment=True)@middleware("rate_limit")
@middleware("validation")
@middleware("retry")
async def process_repo(repo: Repository) -> PipelineResult:
...def create_llm_provider(config: LLMConfig) -> LLMProvider:
if config.provider == "gemini":
return GeminiProvider(config)
elif config.provider == "openai":
return OpenAIProvider(config)event_bus.emit(RepositoryAnalyzed(
repo=repo,
findings_count=len(findings),
timestamp=datetime.now()
))from .analyzer import CodeAnalyzer
from .skills import load_skillsfrom contribai.core.models import Repository, Finding
from contribai.llm.provider import create_llm_provider
from contribai.analysis.analyzer import CodeAnalyzerUse TYPE_CHECKING for type hints:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from contribai.core.models import Repository| Component | Optimization | Details |
|---|---|---|
| LLM Context | Token budgeting | Max 30k tokens per analysis |
| GitHub API | Rate limit respect | Check limits before burst requests |
| Database | Batch inserts | Insert 100+ findings in one transaction |
| Analysis | Progressive skills | Load only needed skills (by language/framework) |
| Concurrency | Semaphore(3) | Max 3 repos processed simultaneously |
| Caching | 72h TTL | Cache analysis results per repo |
- Never log API keys, tokens, or credentials
- Use env vars for sensitive config
- Validate LLM output before code execution
- Sandbox execution (Docker or ast.parse)
- Syntax validation before commit
- Balanced bracket check before parsing
- ast.parse fallback if Docker unavailable
- Manual review gate (optional) for high-risk changes
- GitHub token required (PAT scope:
repo,workflow) - API key auth on dashboard endpoints
- HMAC validation on webhooks
- No direct shell exec (use GitPython, httpx)
- Created: 2026-03-28
- Last Updated: 2026-03-28
- Owner: Technical Writer / Documentation Team
- References: README.md, docs/ARCHITECTURE.md, docs/system-architecture.md