
Oh, God! My idea comes true.

AI-powered research workflow: paper discovery → LLM analysis → scholar tracking → Paper2Code → multi-agent studio

Version 0.1.0 · MIT License · Python · Next.js

Getting Started · Features · Roadmap · Architecture · Contributing


About

"Oh, God! My idea comes true." is an end-to-end research assistant that automates the paper discovery → analysis → reproduction pipeline. It combines multi-source search, LLM-powered evaluation, scholar tracking, and code generation into a unified workflow with Web, CLI, and API interfaces.

Backend: Python + FastAPI (SSE streaming) · Frontend: Next.js + Ink CLI · Sources: arXiv / Semantic Scholar / OpenAlex / HuggingFace Daily Papers / papers.cool

Screenshots

Web Dashboard — current dashboard layout focused on the active research question, the workflow console, and decision-critical alerts.

Research Workspace · AgentSwarm Studio

LLM-as-Judge · Radar · Email Push

Terminal UI (Ink)

Features

Discovery & Analysis

  • Multi-source search — Aggregate arXiv, Semantic Scholar, OpenAlex, HF Daily Papers, papers.cool with cross-query dedup and scoring
  • DailyPaper — Automated daily report generation with SSE streaming, LLM enrichment (summary / trends / insight), and multi-channel push (Email / Slack / DingTalk / Telegram / Discord / WeCom / Feishu)
  • LLM-as-Judge — 5-dimensional scoring (Relevance / Novelty / Rigor / Impact / Clarity) with multi-round calibration, automatic filtering of low-quality papers
  • Deadline Radar — Conference deadline tracking with CCF ranking and research track matching
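As a rough sketch of how the five judge dimensions could be aggregated into a filtering decision (the unweighted mean and the 6.0 cutoff on a 1-10 scale are illustrative assumptions, not the project's actual calibration):

```python
from dataclasses import dataclass

DIMENSIONS = ("relevance", "novelty", "rigor", "impact", "clarity")

@dataclass
class JudgeScore:
    relevance: float
    novelty: float
    rigor: float
    impact: float
    clarity: float

    def overall(self) -> float:
        # Unweighted mean over the five dimensions; real weights may differ.
        return sum(getattr(self, d) for d in DIMENSIONS) / len(DIMENSIONS)

def keep_paper(score: JudgeScore, threshold: float = 6.0) -> bool:
    """Filter out low-quality papers whose overall score falls below the cutoff."""
    return score.overall() >= threshold
```

Multi-round calibration would then re-run the judge and average or reconcile these scores before filtering.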

Knowledge Management

  • Paper Library — Save, organize, and export papers (BibTeX / RIS / Markdown / CSL-JSON / Zotero sync)
  • Structured Cards — LLM-extracted method / dataset / conclusion / limitations with DB caching
  • Related Work — Draft generation from saved papers with [AuthorYear] citation format
  • Memory System — Research memory with FTS5 + BM25 search, context engine for personalized recommendations
  • MemoryBench Suite — Retrieval / context / isolation / injection / performance / ROI / effectiveness benchmarks for the memory and Paper2Code stack
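The FTS5 + BM25 retrieval at the core of the memory system can be sketched with SQLite's built-in ranking (the schema and function names here are illustrative; the real memory store is richer):

```python
import sqlite3

def build_index(rows):
    """Create an in-memory FTS5 index over memory snippets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
    conn.executemany("INSERT INTO memories(content) VALUES (?)",
                     [(r,) for r in rows])
    return conn

def search(conn, query, k=5):
    """Rank matches by BM25 (lower score = more relevant in SQLite)."""
    cur = conn.execute(
        "SELECT content, bm25(memories) AS score "
        "FROM memories WHERE memories MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, k),
    )
    return cur.fetchall()
```

Scoped retrieval (per user, per track) would add filter columns on top of this basic pattern.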

Reproduction & Studio

  • Paper2Code — Paper → code skeleton (Planning → Analysis → Generation → Verification) with self-healing debugging
  • AgentSwarm — Multi-agent orchestration platform with Claude Code integration, Runbook file management, Diff/Snapshot, and sandbox execution (Docker / E2B)
  • Scholar Tracking — Multi-agent monitoring with PIS influence scoring (citation velocity, trend momentum)
  • Deep Review — Simulated peer review (screening → critique → decision)
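The Paper2Code flow above (Planning → Analysis → Generation → Verification with self-healing) can be sketched as a staged pipeline plus a repair loop; all names here are hypothetical, not the project's actual API:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signature: take the working artifact, return the next one.
Stage = Callable[[Dict], Dict]

def paper2code(paper: Dict,
               stages: List[Stage],
               verify: Callable[[Dict], Tuple[bool, str]],
               repair: Callable[[Dict, str], Dict],
               max_rounds: int = 3) -> Dict:
    """Run the generation stages, then verify with self-healing retries."""
    artifact = paper
    for stage in stages:            # e.g. Planning, Analysis, Generation
        artifact = stage(artifact)
    for _ in range(max_rounds):     # Verification + self-healing debug loop
        ok, error = verify(artifact)
        if ok:
            return artifact
        artifact = repair(artifact, error)
    raise RuntimeError("verification failed after self-healing retries")
```

In the real system, `verify` would execute the generated skeleton in a sandbox and `repair` would feed the error back to the LLM.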

Getting Started

Install

```bash
# Use python3 for macOS/Linux
python -m venv .venv && source .venv/bin/activate
pip install -e .
```

Configure

```bash
cp env.example .env
# Set at least one LLM key: OPENAI_API_KEY=sk-...
```
LLM routing configuration

Multiple LLM backends supported via ModelRouter:

| Task Type | Route | Example Models |
| --- | --- | --- |
| default / extraction / summary | default | gpt-4o-mini / MiniMax M2.1 |
| analysis / reasoning / judge | reasoning | DeepSeek R1 / GLM 4.7 |
| code | code | gpt-4o |
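A minimal sketch of the routing table above as a task-type → model lookup (the model identifiers and fallback behavior are assumptions; ModelRouter's real interface may differ):

```python
# Hypothetical route table; real routes come from .env / ModelRouter config.
ROUTES = {
    "default": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "summary": "gpt-4o-mini",
    "analysis": "deepseek-r1",
    "reasoning": "deepseek-r1",
    "judge": "deepseek-r1",
    "code": "gpt-4o",
}

def resolve_model(task_type: str) -> str:
    """Map a task type to a backend model, falling back to the default route."""
    return ROUTES.get(task_type, ROUTES["default"])
```
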
Push notification configuration

DailyPaper supports Email / Slack / DingTalk / Telegram / Discord / WeCom / Feishu push.

Web UI — Configure in the Topic Workflow settings panel (recommended).

Environment variables:

```bash
PAPERBOT_NOTIFY_ENABLED=true
PAPERBOT_NOTIFY_CHANNELS=email,slack
PAPERBOT_NOTIFY_SMTP_HOST=smtp.qq.com
PAPERBOT_NOTIFY_SMTP_PORT=587
PAPERBOT_NOTIFY_SMTP_USERNAME=your@qq.com
PAPERBOT_NOTIFY_SMTP_PASSWORD=your-auth-code
PAPERBOT_NOTIFY_EMAIL_FROM=your@qq.com
PAPERBOT_NOTIFY_EMAIL_TO=recipient@example.com
```
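Assuming the `PAPERBOT_NOTIFY_*` SMTP variables above, an email push of this kind could be assembled with the standard library; this is a sketch, not the project's actual notifier:

```python
import os
from email.message import EmailMessage

def build_daily_email(subject: str, body: str) -> EmailMessage:
    """Assemble the report email from the PAPERBOT_NOTIFY_* variables."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = os.environ["PAPERBOT_NOTIFY_EMAIL_FROM"]
    msg["To"] = os.environ["PAPERBOT_NOTIFY_EMAIL_TO"]
    msg.set_content(body)
    return msg

def send_daily_email(msg: EmailMessage) -> None:
    """Deliver via SMTP; port 587 implies STARTTLS before login."""
    import smtplib
    host = os.environ["PAPERBOT_NOTIFY_SMTP_HOST"]
    port = int(os.environ.get("PAPERBOT_NOTIFY_SMTP_PORT", "587"))
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(os.environ["PAPERBOT_NOTIFY_SMTP_USERNAME"],
                   os.environ["PAPERBOT_NOTIFY_SMTP_PASSWORD"])
        smtp.send_message(msg)
```
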

Run

```bash
# Database migration (first time)
alembic upgrade head

# API server (use python3 for macOS/Linux)
python -m uvicorn src.paperbot.api.main:app --reload --port 8000

# Web dashboard (separate terminal)
cd web && npm install && npm run dev

# Background jobs (optional)
arq paperbot.infrastructure.queue.arq_worker.WorkerSettings
```

CLI Usage

```bash
# Daily paper with LLM + Judge + push
python -m paperbot.presentation.cli.main daily-paper \
  -q "LLM reasoning" -q "code generation" \
  --with-llm --with-judge --save --notify

# Topic search
python -m paperbot.presentation.cli.main topic-search \
  -q "ICL compression" --source arxiv_api --source hf_daily

# Scholar tracking
python main.py track --summary

# Paper2Code
python main.py gen-code --title "..." --abstract "..." --output-dir ./output

# Deep review
python main.py review --title "..." --abstract "..."
```

Architecture

Architecture diagram — editable source: Excalidraw · draw.io

Module Status

Full maturity matrix and progress: Roadmap #232

| Status | Modules |
| --- | --- |
| Production | Topic Search · DailyPaper · LLM-as-Judge · Push/Notify · Model Provider · Deadline Radar · Paper Library |
| Usable | Scholar Tracking · Deep Review · Paper2Code · Memory · Context Engine · Discovery · AgentSwarm · Harvest · Import/Sync |
| Planned | DB Modernization #231 · Obsidian Integration #159 |

MemoryBench Evaluation

Aligned with LongMemEval (ICLR 2025), LoCoMo (ACL 2024), Mem0, Letta. Full methodology: evals/memory/README.md · Epic #283

Retrieval Quality — 40 queries, 45 memories, 2 users (FTS5 + BM25)

| Metric | Target | Result |
| --- | --- | --- |
| Recall@5 | ≥ 0.80 | 0.873 |
| MRR@10 | ≥ 0.65 | 0.731 |
| nDCG@10 | ≥ 0.70 | 0.747 |
| Hit@10 |  | 1.000 |

Breakdown by LoCoMo question type:

| Type | Recall@5 | MRR@10 |
| --- | --- | --- |
| single-hop (24) | 0.931 | 0.770 |
| multi-hop (6) | 0.708 | 0.583 |
| temporal (2) | 1.000 | 0.417 |
| acronym (4) | 0.708 | 0.875 |
Scope Isolation + CRUD — zero-leak enforcement, Mem0 lifecycle

| Check | Result |
| --- | --- |
| Cross-user leak rate | 0 (zero tolerance) |
| Cross-scope leak rate | 0 (zero tolerance) |
| CRUD Update (old content gone) | PASS |
| CRUD Delete (soft-delete enforced) | PASS |
| CRUD Dedup (exact duplicate skipped) | PASS |
Context Extraction — L0-L3 layer assembly, Letta alignment

| Test | Result |
| --- | --- |
| Layer completeness (L0 profile → L3 paper) | 8/8 PASS |
| Graceful degradation (missing paper / empty user) | 3/3 PASS |
| Context precision (query → relevant memories) | 100% (3/3) |
| Token budget guard (300 token cap) | 215 tokens |
| TrackRouter accuracy (query → correct track) | 100% (5/5) |
Injection Robustness — offline pattern detection

| Metric | Target | Result |
| --- | --- | --- |
| Pollution rate (missed malicious) | ≤ 2% | 0.0% (6/6 caught) |
| False positive rate (benign flagged) |  | 0.0% (0/6 flagged) |

Covers: instruction override, tag escape, special token injection, role hijack, Unicode bypass, privilege escalation.
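Offline pattern detection of this kind can be approximated with a small regex deny-list; the patterns below are illustrative examples for a few of the covered attack classes, not the benchmark's actual rules:

```python
import re

# Illustrative deny-list covering a subset of the attack classes above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),  # instruction override
    re.compile(r"</?(system|assistant)>", re.I),                       # tag escape
    re.compile(r"<\|im_start\|>|<\|endoftext\|>"),                     # special token injection
    re.compile(r"you are now (the )?(admin|root|system)", re.I),       # role hijack / privilege escalation
]

def is_malicious(memory_text: str) -> bool:
    """Flag a candidate memory if any injection pattern matches."""
    return any(p.search(memory_text) for p in INJECTION_PATTERNS)
```

A production detector would also normalize Unicode before matching to catch homoglyph bypasses.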

```bash
# Run full MemoryBench suite (~6s, fully offline, no API keys needed)
PYTHONPATH=src pytest -q evals/memory/test_retrieval_bench.py \
  evals/memory/test_scope_isolation.py \
  evals/memory/test_context_extraction.py \
  evals/memory/test_injection_robustness.py -s
```

Roadmap

Roadmap #232 — Living roadmap organized by functional area, with checkbox tracking and Epic links.

Active Epics:

| Epic | Area | Status |
| --- | --- | --- |
| #197 | AgentSwarm Studio | Foundation |
| #231 | DB Infrastructure | Planning |
| #153 | Memory & Context | P0-P1 done |
| #154 | Agentic Research | Design done |
| #179 | Daily Push | Complete |
| #283 | MemoryBench | Complete |
| #159 | Obsidian CLI | Not started |

Contributing

  1. Pick an unchecked item from the Roadmap
  2. Check the linked Epic for detailed requirements
  3. Open a PR targeting the `dev` branch
  4. Follow the Conventional Commits format

```bash
# Run tests
pytest -q

# Format
python -m black . && python -m isort .
```

Documentation

| Doc | Description |
| --- | --- |
| Roadmap #232 | Living project roadmap |
| docs/PLAN.md | Architecture assessment |
| docs/PAPERSCOOL_WORKFLOW.md | Topic Workflow guide |
| docs/p2c/ | Paper2Context design docs |
| docs/benchmark/MEMORYBENCH_EPIC_283_COMPLETION.md | MemoryBench Epic completion report |
| docs/benchmark/MEMORYBENCH_RUNTIME_REPORT_2026-03-07.md | Live ROI + 1M memory runtime report |
| docs/search_eval.md | Retrieval benchmark guide |
| docs/document_evidence_eval.md | Document evidence retrieval benchmark guide |
| docs/context_engine_eval.md | Context extraction benchmark guide |
| docs/memory_performance_eval.md | Memory performance benchmark guide |
| docs/p2c/P2C_ROI_BENCHMARK.md | ROI benchmark guide |
| docs/memory_effectiveness_eval.md | Multi-session memory effectiveness benchmark guide |
| docs/memory_system.md | Memory system design |
| docs/anchor_system.md | Anchor author system |
| docs/AGENTIC_RESEARCH_EVOLUTION.md | Agentic Research evolution plan |

Acknowledgements

  • Qc-TX — Crawler contributions
  • BettaFish — Multi-agent collaboration reference
  • OpenClaw — Memory architecture reference

License

MIT