AI-powered research workflow: paper discovery → LLM analysis → scholar tracking → Paper2Code → multi-agent studio
Getting Started · Features · Roadmap · Architecture · Contributing
"Oh, God! My idea comes true." is an end-to-end research assistant that automates the paper discovery → analysis → reproduction pipeline. It combines multi-source search, LLM-powered evaluation, scholar tracking, and code generation into a unified workflow with Web, CLI, and API interfaces.
Backend Python + FastAPI (SSE streaming) · Frontend Next.js + Ink CLI · Sources arXiv / Semantic Scholar / OpenAlex / HuggingFace Daily Papers / papers.cool
Web Dashboard
Current dashboard layout focused on the active research question, the workflow console, and decision-critical alerts.
| Research Workspace | AgentSwarm Studio |
|---|---|
| ![]() | ![]() |

| LLM-as-Judge Radar | Email Push |
|---|---|
| ![]() | ![]() |
- Multi-source search — Aggregate arXiv, Semantic Scholar, OpenAlex, HF Daily Papers, papers.cool with cross-query dedup and scoring
- DailyPaper — Automated daily report generation with SSE streaming, LLM enrichment (summary / trends / insight), and multi-channel push (Email / Slack / DingTalk / Telegram / Discord / WeCom / Feishu)
- LLM-as-Judge — 5-dimensional scoring (Relevance / Novelty / Rigor / Impact / Clarity) with multi-round calibration, automatic filtering of low-quality papers
- Deadline Radar — Conference deadline tracking with CCF ranking and research track matching
- Paper Library — Save, organize, and export papers (BibTeX / RIS / Markdown / CSL-JSON / Zotero sync)
- Structured Cards — LLM-extracted method / dataset / conclusion / limitations with DB caching
- Related Work — Draft generation from saved papers with [AuthorYear] citation format
- Memory System — Research memory with FTS5 + BM25 search, context engine for personalized recommendations
- MemoryBench Suite — Retrieval / context / isolation / injection / performance / ROI / effectiveness benchmarks for the memory and Paper2Code stack
- Paper2Code — Paper → code skeleton (Planning → Analysis → Generation → Verification) with self-healing debugging
- AgentSwarm — Multi-agent orchestration platform with Claude Code integration, Runbook file management, Diff/Snapshot, and sandbox execution (Docker / E2B)
- Scholar Tracking — Multi-agent monitoring with PIS influence scoring (citation velocity, trend momentum)
- Deep Review — Simulated peer review (screening → critique → decision)
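The Memory System above is described as using FTS5 + BM25 search. As a rough illustration of how BM25 ranks memories against a query, here is a minimal pure-Python Okapi BM25 scorer; this is a generic sketch for intuition, not the project's actual FTS5-backed implementation, and the function name is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    q_terms = query.lower().split()
    # document frequency per query term
    df = {t: sum(1 for doc in tokenized if t in doc) for t in q_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

In SQLite's FTS5, the equivalent ranking is exposed through the built-in `bm25()` auxiliary function rather than computed by hand.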
```bash
# Use python3 for macOS/Linux
python -m venv .venv && source .venv/bin/activate
pip install -e .
cp env.example .env
# Set at least one LLM key: OPENAI_API_KEY=sk-...
```

LLM routing configuration
Multiple LLM backends supported via ModelRouter:
| Task Type | Route | Example Models |
|---|---|---|
| default / extraction / summary | default | gpt-4o-mini / MiniMax M2.1 |
| analysis / reasoning / judge | reasoning | DeepSeek R1 / GLM 4.7 |
| code | code | gpt-4o |
Push notification configuration
DailyPaper supports Email / Slack / DingTalk / Telegram / Discord / WeCom / Feishu push.
Web UI — Configure in the Topic Workflow settings panel (recommended).
Environment variables:
```bash
PAPERBOT_NOTIFY_ENABLED=true
PAPERBOT_NOTIFY_CHANNELS=email,slack
PAPERBOT_NOTIFY_SMTP_HOST=smtp.qq.com
PAPERBOT_NOTIFY_SMTP_PORT=587
PAPERBOT_NOTIFY_SMTP_USERNAME=your@qq.com
PAPERBOT_NOTIFY_SMTP_PASSWORD=your-auth-code
PAPERBOT_NOTIFY_EMAIL_FROM=your@qq.com
PAPERBOT_NOTIFY_EMAIL_TO=recipient@example.com
```

```bash
# Database migration (first time)
alembic upgrade head

# API server
# Use python3 for macOS/Linux
python -m uvicorn src.paperbot.api.main:app --reload --port 8000

# Web dashboard (separate terminal)
cd web && npm install && npm run dev

# Background jobs (optional)
arq paperbot.infrastructure.queue.arq_worker.WorkerSettings
```

```bash
# Daily paper with LLM + Judge + push
python -m paperbot.presentation.cli.main daily-paper \
  -q "LLM reasoning" -q "code generation" \
  --with-llm --with-judge --save --notify

# Topic search
python -m paperbot.presentation.cli.main topic-search \
  -q "ICL compression" --source arxiv_api --source hf_daily

# Scholar tracking
python main.py track --summary

# Paper2Code
python main.py gen-code --title "..." --abstract "..." --output-dir ./output

# Deep review
python main.py review --title "..." --abstract "..."
```

Editable source: Excalidraw · draw.io
Full maturity matrix and progress: Roadmap #232
| Status | Modules |
|---|---|
| Production | Topic Search · DailyPaper · LLM-as-Judge · Push/Notify · Model Provider · Deadline Radar · Paper Library |
| Usable | Scholar Tracking · Deep Review · Paper2Code · Memory · Context Engine · Discovery · AgentSwarm · Harvest · Import/Sync |
| Planned | DB Modernization #231 · Obsidian Integration #159 |
Aligned with LongMemEval (ICLR 2025), LoCoMo (ACL 2024), Mem0, Letta. Full methodology: evals/memory/README.md · Epic #283
Retrieval Quality — 40 queries, 45 memories, 2 users (FTS5 + BM25)
| Metric | Target | Result | Status |
|---|---|---|---|
| Recall@5 | ≥ 0.80 | 0.873 | ✅ |
| MRR@10 | ≥ 0.65 | 0.731 | ✅ |
| nDCG@10 | ≥ 0.70 | 0.747 | ✅ |
| Hit@10 | — | 1.000 | — |
Breakdown by LoCoMo question type:
| Type | Recall@5 | MRR@10 |
|---|---|---|
| single-hop (24) | 0.931 | 0.770 |
| multi-hop (6) | 0.708 | 0.583 |
| temporal (2) | 1.000 | 0.417 |
| acronym (4) | 0.708 | 0.875 |
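Recall@k and MRR@k are standard definitions, so the numbers above can be checked against any ranked retrieval output with a few lines. This is a generic sketch, not the project's benchmark harness:

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mrr_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top-k (0 if none)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging these per-query values over the 40 benchmark queries yields the aggregate table above.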
Scope Isolation + CRUD — zero-leak enforcement, Mem0 lifecycle
| Check | Result |
|---|---|
| Cross-user leak rate | 0 (zero tolerance) |
| Cross-scope leak rate | 0 (zero tolerance) |
| CRUD Update (old content gone) | PASS |
| CRUD Delete (soft-delete enforced) | PASS |
| CRUD Dedup (exact duplicate skipped) | PASS |
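The zero-leak property boils down to a read path that filters on both user and scope and excludes soft-deleted rows. A minimal sketch of that invariant (the `Memory` shape and `fetch` name are hypothetical, not the project's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Memory:
    user_id: str
    scope: str
    content: str
    deleted: bool = False  # soft-delete flag; rows are never physically removed

def fetch(store: list[Memory], user_id: str, scope: str) -> list[Memory]:
    """Zero-leak read path: both user and scope must match,
    and soft-deleted memories are never returned."""
    return [m for m in store
            if m.user_id == user_id and m.scope == scope and not m.deleted]
```

The isolation checks above amount to asserting that no query can ever return a row from another user or scope through this path.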
Context Extraction — L0-L3 layer assembly, Letta alignment
| Test | Result |
|---|---|
| Layer completeness (L0 profile → L3 paper) | 8/8 PASS |
| Graceful degradation (missing paper / empty user) | 3/3 PASS |
| Context precision (query → relevant memories) | 100% (3/3) |
| Token budget guard (300 token cap) | 215 tokens |
| TrackRouter accuracy (query → correct track) | 100% (5/5) |
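The 300-token budget guard can be pictured as a greedy cutoff over priority-ordered context snippets. A sketch under the simplifying assumption that tokens are approximated by whitespace-separated words (the real system presumably uses a proper tokenizer):

```python
def apply_token_budget(snippets: list[str], budget: int = 300) -> list[str]:
    """Greedily keep snippets in priority order while the running
    token count stays within the budget; stop at the first overflow."""
    kept, used = [], 0
    for s in snippets:
        cost = len(s.split())  # crude word-count proxy for tokens
        if used + cost > budget:
            break
        kept.append(s)
        used += cost
    return kept
```

Because snippets arrive in priority order (L0 profile first, L3 paper details last), truncation always drops the least important layers first.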
Injection Robustness — offline pattern detection
| Metric | Target | Result |
|---|---|---|
| Pollution rate (missed malicious) | ≤ 2% | 0.0% (6/6 caught) |
| False positive rate (benign flagged) | — | 0.0% (0/6 flagged) |
Covers: instruction override, tag escape, special token injection, role hijack, Unicode bypass, privilege escalation.
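Offline pattern detection for several of these attack classes can be sketched with a handful of regexes. The patterns below are illustrative examples only, not the project's actual rule set, and a real detector would also normalize Unicode to cover bypass attempts:

```python
import re

# Illustrative patterns for some of the attack classes listed above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),  # instruction override
    re.compile(r"</?(system|assistant)>", re.I),                       # tag escape
    re.compile(r"<\|im_(start|end)\|>"),                               # special token injection
    re.compile(r"you are now (the )?(system|admin|root)", re.I),       # role hijack / privilege escalation
]

def is_suspicious(text: str) -> bool:
    """Flag text that matches any known injection pattern."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```

A memory write that trips the detector is rejected before it can pollute later retrievals, which is what the 0.0% pollution rate above measures.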
```bash
# Run full MemoryBench suite (~6s, fully offline, no API keys needed)
PYTHONPATH=src pytest -q evals/memory/test_retrieval_bench.py \
  evals/memory/test_scope_isolation.py \
  evals/memory/test_context_extraction.py \
  evals/memory/test_injection_robustness.py -s
```

Roadmap #232 — Living roadmap organized by functional area, with checkbox tracking and Epic links.
Active Epics:
| Epic | Area | Status |
|---|---|---|
| #197 | AgentSwarm Studio | Foundation |
| #231 | DB Infrastructure | Planning |
| #153 | Memory & Context | P0-P1 done |
| #154 | Agentic Research | Design done |
| #179 | Daily Push | Complete |
| #283 | MemoryBench | Complete |
| #159 | Obsidian CLI | Not started |
- Pick an unchecked item from the Roadmap
- Check the linked Epic for detailed requirements
- Open a PR targeting the `dev` branch
- Follow the Conventional Commits format
```bash
# Run tests
pytest -q

# Format
python -m black . && python -m isort .
```

| Doc | Description |
|---|---|
| Roadmap #232 | Living project roadmap |
| docs/PLAN.md | Architecture assessment |
| docs/PAPERSCOOL_WORKFLOW.md | Topic Workflow guide |
| docs/p2c/ | Paper2Context design docs |
| docs/benchmark/MEMORYBENCH_EPIC_283_COMPLETION.md | MemoryBench Epic completion report |
| docs/benchmark/MEMORYBENCH_RUNTIME_REPORT_2026-03-07.md | Live ROI + 1M memory runtime report |
| docs/search_eval.md | Retrieval benchmark guide |
| docs/document_evidence_eval.md | Document evidence retrieval benchmark guide |
| docs/context_engine_eval.md | Context extraction benchmark guide |
| docs/memory_performance_eval.md | Memory performance benchmark guide |
| docs/p2c/P2C_ROI_BENCHMARK.md | ROI benchmark guide |
| docs/memory_effectiveness_eval.md | Multi-session memory effectiveness benchmark guide |
| docs/memory_system.md | Memory system design |
| docs/anchor_system.md | Anchor author system |
| docs/AGENTIC_RESEARCH_EVOLUTION.md | Agentic Research evolution plan |
- Qc-TX — Crawler contributions
- BettaFish — Multi-agent collaboration reference
- OpenClaw — Memory architecture reference
MIT






