Academic Paper Analysis — Agent-Agnostic Pipeline
Paper Reader reads, analyzes, and archives academic papers with intelligent domain detection and structured Obsidian vault integration.
Give it a URL, a PDF path, or a batch of 10+ papers — it handles content acquisition, domain classification, deep analysis, and archiving automatically.
Designed as a pipeline that works with any AI agent (Hermes, Claude Code, Codex, OpenCode, etc.). Adapter files included in adapters/.
5-stage pipeline with 3-tier content acquisition. Each stage is modular and independently configurable.
┌──────────────────────────────────────────────────────────────────┐
│ Paper Reader Pipeline │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────┐ │
│ │ Stage 1 │──▶│ Stage 2 │──▶│ Stage 3 │──▶│ Stage 4│──▶│Stage5│ │
│ │ Fetch │ │ Detect │ │ Select │ │Execute │ │Output│ │
│ └─────────┘ └──────────┘ └──────────┘ └────────┘ └──────┘ │
│ │ │ │
│ ┌────┴──────────────────────────┐ ┌───────┴───────┐ │
│ │ 3-Tier Content Acquisition │ │ Domain │ │
│ │ ① Jina Reader (1-2s) │ │ Checklists │ │
│ │ ② Scrapling (5-15s) │ │ · MD / Med │ │
│ │ ③ web_search (2-5s) │ │ · AI / Bio │ │
│ └──────────────────────────────┘ │ · Programming│ │
│ └───────────────┘ │
└──────────────────────────────────────────────────────────────────┘
| Priority | Tool | Speed | Output | Covers |
|---|---|---|---|---|
| Tier 1 | Jina Reader (r.jina.ai) | 1-2s | Clean Markdown | arXiv, bioRxiv, open-access, most Nature articles |
| Tier 2 | Scrapling + Camoufox | 5-15s | Raw HTML text | Nature/Elsevier when Tier 1 is partial |
| Tier 3 | web_search | 2-5s | Metadata only | Hard paywalls (Cell, NEJM, Lancet) |
| Local | MinerU | ~2min/40p | Markdown + images | Local PDF files, full extraction |
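
The priority order in the table amounts to a fallback chain: try the fastest tier first and move down only when the result is missing or too thin. The following is a minimal sketch of that idea, not the contents of `scripts/fetch_paper.py`. The function name `acquire`, the injected fetcher callables, and the `MIN_FULLTEXT_CHARS` threshold are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical threshold: below this many characters a result is treated as partial.
MIN_FULLTEXT_CHARS = 2000

# A fetcher takes a URL and returns extracted text, or None when it has nothing.
Fetcher = Callable[[str], Optional[str]]


def acquire(url: str, tiers: list[tuple[str, Fetcher]],
            min_chars: int = MIN_FULLTEXT_CHARS) -> tuple[str, str]:
    """Try each tier in priority order and return (tier_name, text).

    The first tier that yields at least `min_chars` characters wins.
    If none does, the longest partial result is returned, so a
    metadata-only answer is still usable downstream.
    """
    best_name, best_text = "none", ""
    for name, fetch in tiers:
        try:
            text = fetch(url) or ""
        except Exception:
            # A failing tier must not abort the pipeline; fall through to the next.
            continue
        if len(text) >= min_chars:
            return name, text
        if len(text) > len(best_text):
            best_name, best_text = name, text
    return best_name, best_text
```

Passing the fetchers in as callables keeps the chain testable without network access; real Jina Reader, Scrapling, and web_search wrappers would be supplied by the caller.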
| Domain | ID | Example Keywords |
|---|---|---|
| Molecular Dynamics | md | force field, AMBER, GROMACS, RMSD, free energy, simulation |
| Medicine | med | clinical trial, RCT, cohort, hazard ratio, prognosis |
| AI / ML | ai | transformer, deep learning, benchmark, SOTA, training |
| Bioinformatics | bio | RNA-seq, GWAS, genome, differential expression, enrichment |
| Programming | prog | compiler, algorithm, system design, database, runtime |
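
One simple way to turn the keyword table into a classifier is to count keyword hits per domain and pick the highest score. The sketch below assumes exactly that; the actual detection logic lives in SKILL.md and may weigh signals differently. The names `DOMAIN_KEYWORDS`, `score_domains`, and `detect_domain` are hypothetical, and the keyword lists contain only the examples from the table.

```python
import re
from typing import Optional

# Example keywords copied from the table; a real list would be longer.
DOMAIN_KEYWORDS = {
    "md": ["force field", "amber", "gromacs", "rmsd", "free energy", "simulation"],
    "med": ["clinical trial", "rct", "cohort", "hazard ratio", "prognosis"],
    "ai": ["transformer", "deep learning", "benchmark", "sota", "training"],
    "bio": ["rna-seq", "gwas", "genome", "differential expression", "enrichment"],
    "prog": ["compiler", "algorithm", "system design", "database", "runtime"],
}


def score_domains(text: str) -> dict[str, int]:
    """Count whole-word keyword occurrences for each domain ID."""
    lowered = text.lower()
    return {
        domain: sum(
            len(re.findall(r"\b" + re.escape(keyword) + r"\b", lowered))
            for keyword in keywords
        )
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }


def detect_domain(text: str) -> Optional[str]:
    """Return the best-scoring domain ID, or None when no keyword matches."""
    scores = score_domains(text)
    best = max(scores, key=scores.get)  # ties resolve to the first domain listed
    return best if scores[best] > 0 else None
```

Whole-word matching avoids false hits such as `rct` inside an unrelated word. Returning `None` instead of a default lets the caller decide how to handle unclassified papers.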
| Mode | Purpose | Output |
|---|---|---|
| 🔍 Quick Scan | 3-min screening: worth reading? | Conversation only |
| 📖 Deep Read | Full structured analysis | Obsidian archive note |
| 💬 Q&A | Interactive paper Q&A | Conversation + optional log |
| 📦 Batch | Process N papers in parallel | Archive notes + summary table |
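
Batch mode processes papers in parallel, and since acquisition is mostly network-bound, a thread pool is one plausible way to do that. The sketch below is an illustration under that assumption. `run_batch` and the `process` callable are hypothetical names, and the real batch behaviour is defined by the mode instructions in `references/`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_batch(urls: list[str], process: Callable[[str], Any],
              max_workers: int = 4) -> list[dict]:
    """Run process(url) for every URL concurrently, preserving input order.

    Each paper gets its own result record, so one failure does not
    discard the rest of the batch.
    """
    def safe(url: str) -> dict:
        try:
            return {"url": url, "ok": True, "result": process(url)}
        except Exception as exc:
            return {"url": url, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(safe, urls))
```

The returned records map directly onto a summary table: one row per paper, with either its result or the reason it failed.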
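Archives notes to ~/obsidian/papers/{domain}/ with structured YAML frontmatter and these sections: 基本信息 (Basic Info) · 研究问题 (Research Question) · 方法 (Methods) · 核心结果 (Key Results) · 局限性 (Limitations) · 研究启示 (Implications) · 引用网络 (Citation Network)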
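
As a rough illustration of the archive layout, the sketch below builds a note path under the `{domain}` folder and renders a frontmatter block. The file-naming scheme (`slugify`) is an assumption, since the README only specifies the directory. Values are written with `json.dumps`, whose output for strings, numbers, and lists is also valid YAML flow syntax.

```python
import json
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Hypothetical file-name scheme: lowercase words joined by hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def archive_path(domain: str, title: str, base: str = "~/obsidian/papers") -> Path:
    """Return <base>/<domain>/<slug>.md with '~' expanded."""
    return Path(base).expanduser() / domain / f"{slugify(title)}.md"


def render_frontmatter(meta: dict) -> str:
    """Render a YAML frontmatter block from a flat metadata dict."""
    lines = ["---"]
    for key, value in meta.items():
        # JSON scalars and arrays are valid YAML, so no YAML library is needed.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines)
```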
A real Paper Alert processed end-to-end. Honest breakdown.
| # | Paper | Source | Method | Time | Content Quality |
|---|---|---|---|---|---|
| 1 | Allosteric Switches (Baker) | Nature NBT | MinerU (PDF) | 93s | ★★★★★ Full text, 494 lines, 36 figures |
| 2 | ConforNets (AlQuraishi) | arXiv PDF | MinerU (PDF) | 118s | ★★★★★ Full text, 646 lines, 75 figures |
| 3 | Closing the Loop | ScienceDirect | web_search | ~5s | ★★☆☆☆ Metadata + abstract only |
| 4 | trRosettaRNA2 | Nature NMI | web_search | ~5s | ★★☆☆☆ Metadata + abstract only |
| 5 | Target ID in AI Era | Nature NRDD | web_search | ~5s | ★★★☆☆ Rich metadata (review article) |
| 6 | ERAST | Nature NBT | web_search | ~5s | ★★☆☆☆ Metadata only |
| 7 | AlphaFast | bioRxiv | web_search | ~5s | ★★☆☆☆ Metadata only |
| 8 | Flow Matching | Nature NMI | web_search | ~5s | ★★☆☆☆ Metadata only |
| 9 | lightning-boltz | GitHub | README scan | ~3s | ★★☆☆☆ Repo info only |
Total time: ~6 minutes
| Paper | Previous | With Jina Reader | Improvement |
|---|---|---|---|
| Allosteric Switches | MinerU 93s | 1.0s, 117K chars | 93× faster |
| Target ID (NRDD) | web_search metadata | 1.3s, 149K chars | Metadata → full text |
| ERAST (NBT) | web_search metadata | 1.0s, full text | Metadata → full text |
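
Jina Reader is used by prefixing the target URL with the reader endpoint, which returns the page as Markdown. The sketch below shows that call with the standard library only. The helper names, the `User-Agent` string, and the timeout are assumptions, and it makes no attempt to reproduce the timings in the table.

```python
import urllib.request

JINA_READER = "https://r.jina.ai"


def jina_url(target: str, endpoint: str = JINA_READER) -> str:
    """Build the reader URL by prefixing the target with the endpoint."""
    return f"{endpoint.rstrip('/')}/{target}"


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    """Fetch the target page as Markdown via Jina Reader (network call)."""
    request = urllib.request.Request(
        jina_url(target),
        headers={"User-Agent": "paper-reader-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `fetch_markdown("https://arxiv.org/abs/2604.18559")` would request `https://r.jina.ai/https://arxiv.org/abs/2604.18559`.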
✅ Worked well: Full-text papers → 100+ line archive notes with quantitative results. 9 papers in 6 min. All domains correctly classified.
❌ Remaining gap: ScienceDirect/Elsevier papers still fall through to Tier 3 (metadata only). arXiv abstract URLs (/abs/) must be rewritten to /pdf/ to get full text.
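
The arXiv gap can be closed before fetching by rewriting abstract URLs to their PDF counterparts. A minimal sketch follows; the function name `arxiv_fulltext_url` is hypothetical, and non-arXiv URLs are returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def arxiv_fulltext_url(url: str) -> str:
    """Rewrite https://arxiv.org/abs/<id> to https://arxiv.org/pdf/<id>."""
    parts = urlsplit(url)
    is_arxiv = parts.netloc.lower() in ("arxiv.org", "www.arxiv.org")
    if is_arxiv and parts.path.startswith("/abs/"):
        new_path = "/pdf/" + parts.path[len("/abs/"):]
        return urlunsplit(
            (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
        )
    return url
```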
| Scenario | Reality | Workaround |
|---|---|---|
| Hard paywalls (Cell, NEJM, Lancet, JAMA) | Require institutional login. | Use VPN, download PDF manually. |
| Authenticated access (SSO, Shibboleth) | Outside scope. | Download PDF through your institution. |
| Freshly published | Days/weeks before indexed. | Wait for preprint. |
| Non-English | MinerU supports Chinese (-l ch) etc. | Use local PDF + MinerU. |
- Tier 3 papers — Archive notes lack methods detail and quantitative results.
- Figure analysis — Depends on model vision capability. Falls back to captions.
- Archive ≠ reading the paper — Structured summaries, not replacements.
We fetch what's publicly visible in a browser. If you need to log in to see it — you should log in yourself. No bypassing authentication.
npx skills add nowa277/paper-reader -g -y
One command. Works with Claude Code, Hermes, Cursor, and any agent that supports the skills convention.
npm install paper-reader
git clone https://github.com/nowa277/paper-reader.git
cp -r paper-reader ~/.hermes/skills/

| Dependency | Required | Install |
|---|---|---|
| MinerU | ✅ | pip install mineru |
| Jina Reader | Built-in | r.jina.ai API, no install needed |
| Scrapling | Recommended | pip install scrapling camoufox && python -m camoufox fetch |
| Obsidian | Optional | For archive notes |
# Single paper
read this paper https://arxiv.org/abs/2604.18559
# Batch
Paper Alert:
1. ConforNets https://arxiv.org/abs/2604.18559
2. Allosteric Switches https://www.nature.com/articles/s41587-026-03081-9
3. Target ID https://doi.org/10.1038/s41573-026-01412-8
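
In practice the agent reads the Paper Alert message directly, but the numbered `title URL` format is regular enough to parse mechanically. The sketch below assumes one paper per line in exactly that shape; `parse_alert` and the `ENTRY` pattern are hypothetical and not part of the skill.

```python
import re

# "<n>. <title> <url>": the title is everything between the number and the URL.
ENTRY = re.compile(r"^\s*\d+\.\s+(?P<title>.+?)\s+(?P<url>https?://\S+)\s*$")


def parse_alert(text: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from a numbered Paper Alert list."""
    papers = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:  # header lines such as "Paper Alert:" are skipped
            papers.append((match.group("title"), match.group("url")))
    return papers
```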
paper-reader/
├── SKILL.md # Main skill definition (5-stage pipeline)
├── README.md # This file
├── LICENSE # MIT
├── adapters/ # Agent-specific config files
├── scripts/
│ ├── extract.sh # MinerU extraction wrapper
│ └── fetch_paper.py # Unified 3-tier content acquisition
├── references/
│ ├── archive-template.md # Obsidian note YAML + section template
│ ├── domain-{ai,md,med,bio,prog}.md # Domain analysis checklists
│ ├── mode-{scan,deep,qa,batch}.md # Mode execution instructions
│ └── mineru-quirks.md # MinerU known issues
└── docs/ # 6-language READMEs
---
title: "Artificial allosteric protein switches with ML-designed receptors"
authors: ["Zhong Guo", "David Baker"]
year: 2026
journal: "Nature Biotechnology"
doi: "10.1038/s41587-026-03081-9"
domain: "ai"
tags: [paper/ai, allosteric-switch, biosensor, protein-design]
date_read: "2026-05-04"
rating: "5"
---
# Artificial allosteric protein switches with ML-designed receptors
## 基本信息
## 研究问题与动机
## 方法 # Core architecture, ML design, experimental validation
## 核心结果 # Kd = 0.9 μM, 400-fold dynamic range, etc.
## 关键创新
## 局限性 # Author-stated + independent assessment
## 研究启示
## 引用网络

| Variable | Default | Description |
|---|---|---|
| MINERU | MinerU binary path | MinerU executable |
| WORK_BASE | /tmp/paper-reader | Temporary working directory |
| ARCHIVE_BASE | ~/obsidian/papers | Obsidian vault archive root |
| JINA_READER | https://r.jina.ai | Jina Reader API endpoint |
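
A script that honours these variables might resolve them as in the sketch below. This is an assumption-laden illustration: `Settings` and `load_settings` are hypothetical names, and because the default for MINERU is only described as the binary path, the fallback `"mineru"` (resolved via `PATH`) is a guess rather than documented behaviour.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mineru: str
    work_base: Path
    archive_base: Path
    jina_reader: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the four variables from `env` (default: os.environ) with fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        mineru=env.get("MINERU", "mineru"),  # assumed fallback, see lead-in
        work_base=Path(env.get("WORK_BASE", "/tmp/paper-reader")),
        archive_base=Path(env.get("ARCHIVE_BASE", "~/obsidian/papers")).expanduser(),
        jina_reader=env.get("JINA_READER", "https://r.jina.ai").rstrip("/"),
    )
```

Accepting an explicit mapping makes the function easy to exercise without touching the real environment, for example `load_settings({"ARCHIVE_BASE": "/vault"})`.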
PRs welcome — especially new agent adapters (Cursor, Aider, Continue, etc.).
MIT License — see LICENSE.
- MinerU — PDF extraction
- Jina Reader — URL-to-Markdown
- Scrapling — Stealth web fetching
- Every researcher who has 50 tabs of unread papers open right now