AI-powered research workflow: paper discovery → LLM analysis → scholar tracking → Paper2Code → multi-agent studio
Getting Started · Features · Roadmap · Architecture · Contributing
"Oh, God! My idea comes true." is an end-to-end research assistant that automates the paper discovery → analysis → reproduction pipeline. It combines multi-source search, LLM-powered evaluation, scholar tracking, and code generation into a unified workflow with Web, CLI, and API interfaces.
Backend Python + FastAPI (SSE streaming) · Frontend Next.js + Ink CLI · Sources arXiv / Semantic Scholar / OpenAlex / HuggingFace Daily Papers / papers.cool
Web Dashboard
Current dashboard layout focused on the active research question, the workflow console, and decision-critical alerts.
| Research Workspace | AgentSwarm Studio |
|---|---|
| ![]() | ![]() |

| LLM-as-Judge Radar | Email Push |
|---|---|
| ![]() | ![]() |
- Multi-source search — Aggregate arXiv, Semantic Scholar, OpenAlex, HF Daily Papers, papers.cool with cross-query dedup and scoring
- DailyPaper — Automated daily report generation with SSE streaming, LLM enrichment (summary / trends / insight), and multi-channel push (Email / Slack / DingTalk / Telegram / Discord / WeCom / Feishu)
- LLM-as-Judge — 5-dimensional scoring (Relevance / Novelty / Rigor / Impact / Clarity) with multi-round calibration, automatic filtering of low-quality papers
- Deadline Radar — Conference deadline tracking with CCF ranking and research track matching
- Paper Library — Save, organize, and export papers (BibTeX / RIS / Markdown / CSL-JSON / Zotero sync)
- Structured Cards — LLM-extracted method / dataset / conclusion / limitations with DB caching
- Related Work — Draft generation from saved papers with [AuthorYear] citation format
- Memory System — Research memory with FTS5 + BM25 search, context engine for personalized recommendations
- MemoryBench Suite — Retrieval / context / isolation / injection / performance / ROI / effectiveness benchmarks for the memory and Paper2Code stack
- Paper2Code — Paper → code skeleton (Planning → Analysis → Generation → Verification) with self-healing debugging
- AgentSwarm — Multi-agent orchestration platform with Claude Code integration, Runbook file management, Diff/Snapshot, and sandbox execution (Docker / E2B)
- Scholar Tracking — Multi-agent monitoring with PIS influence scoring (citation velocity, trend momentum)
- Deep Review — Simulated peer review (screening → critique → decision)
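The Memory System above is described as using FTS5 + BM25 search. As a rough illustration of how BM25 ranks memories against a query, here is a minimal pure-Python Okapi BM25 scorer; this is a generic sketch for intuition, not the project's actual FTS5-backed implementation, and the function name is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    q_terms = query.lower().split()
    # document frequency per query term
    df = {t: sum(1 for doc in tokenized if t in doc) for t in q_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

In SQLite's FTS5, the equivalent ranking is exposed through the built-in `bm25()` auxiliary function rather than computed by hand.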
```bash
# Use python3 for macOS/Linux
python -m venv .venv && source .venv/bin/activate
pip install -e .
cp env.example .env
# Set at least one LLM key: OPENAI_API_KEY=sk-...
```

LLM routing configuration
Multiple LLM backends supported via ModelRouter:
| Task Type | Route | Example Models |
|---|---|---|
| default / extraction / summary | default | gpt-4o-mini / MiniMax M2.1 |
| analysis / reasoning / judge | reasoning | DeepSeek R1 / GLM 4.7 |
| code | code | gpt-4o |
Push notification configuration
DailyPaper supports Email / Slack / DingTalk / Telegram / Discord / WeCom / Feishu push.
Web UI — Configure in the Topic Workflow settings panel (recommended).
Environment variables:
```bash
PAPERBOT_NOTIFY_ENABLED=true
PAPERBOT_NOTIFY_CHANNELS=email,slack
PAPERBOT_NOTIFY_SMTP_HOST=smtp.qq.com
PAPERBOT_NOTIFY_SMTP_PORT=587
PAPERBOT_NOTIFY_SMTP_USERNAME=your@qq.com
PAPERBOT_NOTIFY_SMTP_PASSWORD=your-auth-code
PAPERBOT_NOTIFY_EMAIL_FROM=your@qq.com
PAPERBOT_NOTIFY_EMAIL_TO=recipient@example.com
```

```bash
# Database migration (first time)
alembic upgrade head

# API server
# Use python3 for macOS/Linux
python -m uvicorn src.paperbot.api.main:app --reload --port 8000

# Web dashboard (separate terminal)
cd web && npm install && npm run dev

# Background jobs (optional)
arq paperbot.infrastructure.queue.arq_worker.WorkerSettings
```

```bash
# Daily paper with LLM + Judge + push
python -m paperbot.presentation.cli.main daily-paper \
  -q "LLM reasoning" -q "code generation" \
  --with-llm --with-judge --save --notify

# Topic search
python -m paperbot.presentation.cli.main topic-search \
  -q "ICL compression" --source arxiv_api --source hf_daily

# Scholar tracking
python main.py track --summary

# Paper2Code
python main.py gen-code --title "..." --abstract "..." --output-dir ./output

# Deep review
python main.py review --title "..." --abstract "..."
```

Editable source: Excalidraw · draw.io
Full maturity matrix and progress: Roadmap #232
| Status | Modules |
|---|---|
| Production | Topic Search · DailyPaper · LLM-as-Judge · Push/Notify · Model Provider · Deadline Radar · Paper Library |
| Usable | Scholar Tracking · Deep Review · Paper2Code · Memory · Context Engine · Discovery · AgentSwarm · Harvest · Import/Sync |
| Planned | DB Modernization #231 · Obsidian Integration #159 |
Aligned with LongMemEval (ICLR 2025), LoCoMo (ACL 2024), Mem0, Letta. Full methodology: evals/memory/README.md · Epic #283
Retrieval Quality — 40 queries, 45 memories, 2 users (FTS5 + BM25)
| Metric | Target | Result | Status |
|---|---|---|---|
| Recall@5 | ≥ 0.80 | 0.873 | ✅ |
| MRR@10 | ≥ 0.65 | 0.731 | ✅ |
| nDCG@10 | ≥ 0.70 | 0.747 | ✅ |
| Hit@10 | — | 1.000 | — |
Breakdown by LoCoMo question type:
| Type | Recall@5 | MRR@10 |
|---|---|---|
| single-hop (24) | 0.931 | 0.770 |
| multi-hop (6) | 0.708 | 0.583 |
| temporal (2) | 1.000 | 0.417 |
| acronym (4) | 0.708 | 0.875 |
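Recall@k and MRR@k are standard definitions, so the numbers above can be checked against any ranked retrieval output with a few lines. This is a generic sketch, not the project's benchmark harness:

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mrr_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top-k (0 if none)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging these per-query values over the 40 benchmark queries yields the aggregate table above.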
Scope Isolation + CRUD — zero-leak enforcement, Mem0 lifecycle
| Check | Result |
|---|---|
| Cross-user leak rate | 0 (zero tolerance) |
| Cross-scope leak rate | 0 (zero tolerance) |
| CRUD Update (old content gone) | PASS |
| CRUD Delete (soft-delete enforced) | PASS |
| CRUD Dedup (exact duplicate skipped) | PASS |
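The zero-leak property boils down to a read path that filters on both user and scope and excludes soft-deleted rows. A minimal sketch of that invariant (the `Memory` shape and `fetch` name are hypothetical, not the project's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Memory:
    user_id: str
    scope: str
    content: str
    deleted: bool = False  # soft-delete flag; rows are never physically removed

def fetch(store: list[Memory], user_id: str, scope: str) -> list[Memory]:
    """Zero-leak read path: both user and scope must match,
    and soft-deleted memories are never returned."""
    return [m for m in store
            if m.user_id == user_id and m.scope == scope and not m.deleted]
```

The isolation checks above amount to asserting that no query can ever return a row from another user or scope through this path.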
Context Extraction — L0-L3 layer assembly, Letta alignment
| Test | Result |
|---|---|
| Layer completeness (L0 profile → L3 paper) | 8/8 PASS |
| Graceful degradation (missing paper / empty user) | 3/3 PASS |
| Context precision (query → relevant memories) | 100% (3/3) |
| Token budget guard (300 token cap) | 215 tokens |
| TrackRouter accuracy (query → correct track) | 100% (5/5) |
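The 300-token budget guard can be pictured as a greedy cutoff over priority-ordered context snippets. A sketch under the simplifying assumption that tokens are approximated by whitespace-separated words (the real system presumably uses a proper tokenizer):

```python
def apply_token_budget(snippets: list[str], budget: int = 300) -> list[str]:
    """Greedily keep snippets in priority order while the running
    token count stays within the budget; stop at the first overflow."""
    kept, used = [], 0
    for s in snippets:
        cost = len(s.split())  # crude word-count proxy for tokens
        if used + cost > budget:
            break
        kept.append(s)
        used += cost
    return kept
```

Because snippets arrive in priority order (L0 profile first, L3 paper details last), truncation always drops the least important layers first.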
Injection Robustness — offline pattern detection
| Metric | Target | Result |
|---|---|---|
| Pollution rate (missed malicious) | ≤ 2% | 0.0% (6/6 caught) |
| False positive rate (benign flagged) | — | 0.0% (0/6 flagged) |
Covers: instruction override, tag escape, special token injection, role hijack, Unicode bypass, privilege escalation.
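Offline pattern detection for several of these attack classes can be sketched with a handful of regexes. The patterns below are illustrative examples only, not the project's actual rule set, and a real detector would also normalize Unicode to cover bypass attempts:

```python
import re

# Illustrative patterns for some of the attack classes listed above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),  # instruction override
    re.compile(r"</?(system|assistant)>", re.I),                       # tag escape
    re.compile(r"<\|im_(start|end)\|>"),                               # special token injection
    re.compile(r"you are now (the )?(system|admin|root)", re.I),       # role hijack / privilege escalation
]

def is_suspicious(text: str) -> bool:
    """Flag text that matches any known injection pattern."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```

A memory write that trips the detector is rejected before it can pollute later retrievals, which is what the 0.0% pollution rate above measures.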
```bash
# Run full MemoryBench suite (~6s, fully offline, no API keys needed)
PYTHONPATH=src pytest -q evals/memory/test_retrieval_bench.py \
  evals/memory/test_scope_isolation.py \
  evals/memory/test_context_extraction.py \
  evals/memory/test_injection_robustness.py -s
```

Roadmap #232 — Living roadmap organized by functional area, with checkbox tracking and Epic links.
Active Epics:
| Epic | Area | Status |
|---|---|---|
| #197 | AgentSwarm Studio | Foundation |
| #231 | DB Infrastructure | Planning |
| #153 | Memory & Context | P0-P1 done |
| #154 | Agentic Research | Design done |
| #179 | Daily Push | Complete |
| #283 | MemoryBench | Complete |
| #159 | Obsidian CLI | Not started |
- Pick an unchecked item from the Roadmap
- Check the linked Epic for detailed requirements
- Open a PR targeting the `dev` branch
- Follow the Conventional Commits format
```bash
# Run tests
pytest -q

# Format
python -m black . && python -m isort .
```

| Doc | Description |
|---|---|
| Roadmap #232 | Living project roadmap |
| docs/PLAN.md | Architecture assessment |
| docs/PAPERSCOOL_WORKFLOW.md | Topic Workflow guide |
| docs/p2c/ | Paper2Context design docs |
| docs/benchmark/MEMORYBENCH_EPIC_283_COMPLETION.md | MemoryBench Epic completion report |
| docs/benchmark/MEMORYBENCH_RUNTIME_REPORT_2026-03-07.md | Live ROI + 1M memory runtime report |
| docs/search_eval.md | Retrieval benchmark guide |
| docs/document_evidence_eval.md | Document evidence retrieval benchmark guide |
| docs/context_engine_eval.md | Context extraction benchmark guide |
| docs/memory_performance_eval.md | Memory performance benchmark guide |
| docs/p2c/P2C_ROI_BENCHMARK.md | ROI benchmark guide |
| docs/memory_effectiveness_eval.md | Multi-session memory effectiveness benchmark guide |
| docs/memory_system.md | Memory system design |
| docs/anchor_system.md | Anchor author system |
| docs/AGENTIC_RESEARCH_EVOLUTION.md | Agentic Research evolution plan |
- Qc-TX — Crawler contributions
- BettaFish — Multi-agent collaboration reference
- OpenClaw — Memory architecture reference
MIT






