📄 Paper Reader

Academic Paper Analysis — Agent-Agnostic Pipeline

English · 简体中文 · 繁體中文 · 日本語 · Español · Русский

✨ Overview

Paper Reader reads, analyzes, and archives academic papers with intelligent domain detection and structured Obsidian vault integration.

Give it a URL, a PDF path, or a batch of 10+ papers — it handles content acquisition, domain classification, deep analysis, and archiving automatically.

Designed as a pipeline that works with any AI agent (Hermes, Claude Code, Codex, OpenCode, etc.). Adapter files included in adapters/.

🏗️ Architecture

5-stage pipeline with 3-tier content acquisition. Each stage is modular and independently configurable.

┌──────────────────────────────────────────────────────────────────┐
│                        Paper Reader Pipeline                      │
│                                                                    │
│  ┌─────────┐   ┌──────────┐   ┌──────────┐   ┌────────┐   ┌──────┐  │
│  │ Stage 1 │──▶│ Stage 2  │──▶│ Stage 3  │──▶│ Stage 4│──▶│Stage5│  │
│  │  Fetch  │   │  Detect  │   │  Select  │   │Execute │   │Output│  │
│  └─────────┘   └──────────┘   └──────────┘   └────────┘   └──────┘  │
│       │                                              │              │
│  ┌────┴──────────────────────────┐          ┌───────┴───────┐      │
│  │ 3-Tier Content Acquisition   │          │  Domain       │      │
│  │ ① Jina Reader  (1-2s)       │          │  Checklists   │      │
│  │ ② Scrapling     (5-15s)     │          │  · MD / Med   │      │
│  │ ③ web_search    (2-5s)      │          │  · AI / Bio   │      │
│  └──────────────────────────────┘          │  · Programming│      │
│                                             └───────────────┘      │
└──────────────────────────────────────────────────────────────────┘

Stage 1: Input Parse & Fetch

Priority	Tool	Speed	Output	Covers
Tier 1	Jina Reader (`r.jina.ai`)	1-2s	Clean Markdown	arXiv, bioRxiv, open-access, most Nature articles
Tier 2	Scrapling + Camoufox	5-15s	Raw HTML text	Nature/Elsevier when Tier 1 is partial
Tier 3	web_search	2-5s	Metadata only	Hard paywalls (Cell, NEJM, Lancet)
Local	MinerU	~2min/40p	Markdown + images	Local PDF files, full extraction

Stage 2: Domain Detection

Domain	ID	Example Keywords
Molecular Dynamics	`md`	force field, AMBER, GROMACS, RMSD, free energy, simulation
Medicine	`med`	clinical trial, RCT, cohort, hazard ratio, prognosis
AI / ML	`ai`	transformer, deep learning, benchmark, SOTA, training
Bioinformatics	`bio`	RNA-seq, GWAS, genome, differential expression, enrichment
Programming	`prog`	compiler, algorithm, system design, database, runtime

Stage 3: Mode Selection

Mode	Purpose	Output
🔍 Quick Scan	3-min screening: worth reading?	Conversation only
📖 Deep Read	Full structured analysis	Obsidian archive note
💬 Q&A	Interactive paper Q&A	Conversation + optional log
📦 Batch	Process N papers in parallel	Archive notes + summary table

Stage 5: Output & Archive

Archives to ~/obsidian/papers/{domain}/ with structured YAML frontmatter and sections: 基本信息 · 研究问题 · 方法 · 核心结果 · 局限性 · 研究启示 · 引用网络

📊 Case Study: 9-Paper Batch (May 2026)

A real Paper Alert processed end-to-end. Honest breakdown.

Acquisition Results

#	Paper	Source	Method	Time	Content Quality
1	Allosteric Switches (Baker)	Nature NBT	MinerU (PDF)	93s	★★★★★ Full text, 494 lines, 36 figures
2	ConforNets (AlQuraishi)	arXiv PDF	MinerU (PDF)	118s	★★★★★ Full text, 646 lines, 75 figures
3	Closing the Loop	ScienceDirect	web_search	~5s	★★☆☆☆ Metadata + abstract only
4	trRosettaRNA2	Nature NMI	web_search	~5s	★★☆☆☆ Metadata + abstract only
5	Target ID in AI Era	Nature NRDD	web_search	~5s	★★★☆☆ Rich metadata (review article)
6	ERAST	Nature NBT	web_search	~5s	★★☆☆☆ Metadata only
7	AlphaFast	bioRxiv	web_search	~5s	★★☆☆☆ Metadata only
8	Flow Matching	Nature NMI	web_search	~5s	★★☆☆☆ Metadata only
9	lightning-boltz	GitHub	README scan	~3s	★★☆☆☆ Repo info only

Total time: ~6 minutes

Post-Hoc: Jina Reader + Scrapling Integration

Paper	Previous	With Jina Reader	Improvement
Allosteric Switches	MinerU 93s	1.0s, 117K chars	93× faster
Target ID (NRDD)	web_search metadata	1.3s, 149K chars	Metadata → full text
ERAST (NBT)	web_search metadata	1.0s, full text	Metadata → full text

Objective Assessment

✅ Worked well: Full-text papers → 100+ line archive notes with quantitative results. 9 papers in 6 min. All domains correctly classified.

⚠️ Suboptimal: Paywalled papers (pre-Jina) → 34-53 line metadata-only notes. MinerU serial bottleneck.

❌ Remaining gap: ScienceDirect/Elsevier still Tier 3. arXiv abstract URLs need pdf/ for full text.

⚠️ Honest Limitations

What We Cannot Break

Scenario	Reality	Workaround
Hard paywalls (Cell, NEJM, Lancet, JAMA)	Require institutional login.	Use VPN, download PDF manually.
Authenticated access (SSO, Shibboleth)	Outside scope.	Download PDF through your institution.
Freshly published	Days/weeks before indexed.	Wait for preprint.
Non-English	MinerU supports Chinese (`-l ch`) etc.	Use local PDF + MinerU.

What You Should Know

Tier 3 papers — Archive notes lack methods detail and quantitative results.
Figure analysis — Depends on model vision capability. Falls back to captions.
Archive ≠ reading the paper — Structured summaries, not replacements.

Legal & Ethical Stance

We fetch what's publicly visible in a browser. If you need to log in to see it — you should log in yourself. No bypassing authentication.

🚀 Installation

Option 1: npx (Recommended)

npx skills add nowa277/paper-reader -g -y

One command. Works with Claude Code, Hermes, Cursor, and any agent that supports the skills convention.

Option 2: npm

npm install paper-reader

Option 3: Git Clone

git clone https://github.com/nowa277/paper-reader.git
cp -r paper-reader ~/.hermes/skills/

Prerequisites

Dependency	Required	Install
MinerU	✅	`pip install mineru`
Jina Reader	Built-in	`r.jina.ai` API, no install needed
Scrapling	Recommended	`pip install scrapling camoufox && python -m camoufox fetch`
Obsidian	Optional	For archive notes

📋 Quick Start

# Single paper
read this paper https://arxiv.org/abs/2604.18559

# Batch
Paper Alert:
1. ConforNets https://arxiv.org/abs/2604.18559
2. Allosteric Switches https://www.nature.com/articles/s41587-026-03081-9
3. Target ID https://doi.org/10.1038/s41573-026-01412-8

📁 Project Structure

paper-reader/
├── SKILL.md                          # Main skill definition (5-stage pipeline)
├── README.md                         # This file
├── LICENSE                           # MIT
├── adapters/                         # Agent-specific config files
├── scripts/
│   ├── extract.sh                    # MinerU extraction wrapper
│   └── fetch_paper.py                # Unified 3-tier content acquisition
├── references/
│   ├── archive-template.md           # Obsidian note YAML + section template
│   ├── domain-{ai,md,med,bio,prog}.md  # Domain analysis checklists
│   ├── mode-{scan,deep,qa,batch}.md  # Mode execution instructions
│   └── mineru-quirks.md              # MinerU known issues
└── docs/                             # 6-language READMEs

📝 Archive Output Example

---
title: "Artificial allosteric protein switches with ML-designed receptors"
authors: ["Zhong Guo", "David Baker"]
year: 2026
journal: "Nature Biotechnology"
doi: "10.1038/s41587-026-03081-9"
domain: "ai"
tags: [paper/ai, allosteric-switch, biosensor, protein-design]
date_read: "2026-05-04"
rating: "5"
---

# Artificial allosteric protein switches with ML-designed receptors

## 基本信息
## 研究问题与动机
## 方法                    # Core architecture, ML design, experimental validation
## 核心结果                # Kd = 0.9 μM, 400-fold dynamic range, etc.
## 关键创新
## 局限性                  # Author-stated + independent assessment
## 研究启示
## 引用网络

⚙️ Configuration

Variable	Default	Description
`MINERU`	MinerU binary path	MinerU executable
`WORK_BASE`	`/tmp/paper-reader`	Temporary working directory
`ARCHIVE_BASE`	`~/obsidian/papers`	Obsidian vault archive root
`JINA_READER`	`https://r.jina.ai`	Jina Reader API endpoint

🤝 Contributing

PRs welcome — especially new agent adapters (Cursor, Aider, Continue, etc.).

📄 License

MIT License — see LICENSE.

🙏 Acknowledgments

MinerU — PDF extraction
Jina Reader — URL-to-Markdown
Scrapling — Stealth web fetching
Every researcher who has 50 tabs of unread papers open right now

Made with ❤️ for researchers who read too many papers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📄 Paper Reader

✨ Overview

🏗️ Architecture

Stage 1: Input Parse & Fetch

Stage 2: Domain Detection

Stage 3: Mode Selection

Stage 5: Output & Archive

📊 Case Study: 9-Paper Batch (May 2026)

Acquisition Results

Post-Hoc: Jina Reader + Scrapling Integration

Objective Assessment

⚠️ Honest Limitations

What We Cannot Break

What You Should Know

Legal & Ethical Stance

🚀 Installation

Option 1: npx (Recommended)

Option 2: npm

Option 3: Git Clone

Prerequisites

📋 Quick Start

📁 Project Structure

📝 Archive Output Example

⚙️ Configuration

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
adapters		adapters
docs		docs
references		references
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md
package.json		package.json

Folders and files

Latest commit

History

Repository files navigation

📄 Paper Reader

✨ Overview

🏗️ Architecture

Stage 1: Input Parse & Fetch

Stage 2: Domain Detection

Stage 3: Mode Selection

Stage 5: Output & Archive

📊 Case Study: 9-Paper Batch (May 2026)

Acquisition Results

Post-Hoc: Jina Reader + Scrapling Integration

Objective Assessment

⚠️ Honest Limitations

What We Cannot Break

What You Should Know

Legal & Ethical Stance

🚀 Installation

Option 1: npx (Recommended)

Option 2: npm

Option 3: Git Clone

Prerequisites

📋 Quick Start

📁 Project Structure

📝 Archive Output Example

⚙️ Configuration

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages