Zotero PDF Integrator

Matches local PDF files to existing Zotero library records and uploads them as stored copies. Designed for researchers who have migrated references from EndNote (or similar) and need to attach their annotated PDFs to the corresponding Zotero records.

Why

Zotero cannot natively scan local folders and match PDFs to existing records. This tool automates the matching using metadata extracted from the PDFs via GROBID, then uploads matched files via the Zotero API. The process is split into two phases with a manual review step in between, so nothing touches your Zotero library without your explicit approval.

How it works

Phase 1: Build mapping CSV (`phase1_build_csv.py`)

- Scans configured PDF folders recursively
- Detects annotated/original pairs (files ending in `- annotated.pdf`)
- Extracts metadata from each PDF using GROBID (title, authors, DOI, journal, volume, issue, pages, year)
- Fetches all records from your Zotero library (cached locally in SQLite)
- Matches PDFs to Zotero records using a tiered strategy:
  - Tier 1: DOI exact match (confidence 100)
  - Tier 2: Title + Year fuzzy match (confidence ~90-98)
  - Tier 3: Author + Year + Title fragment (confidence ~80-90)
  - Tier 4: Author + Year only (confidence ~60-75)
  - Tier 5: General fuzzy fallback (confidence varies)
- Deduplicates when the same paper exists in multiple folders (keeps annotated version)
- Outputs a CSV for manual review
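
The tiered strategy can be illustrated with a minimal sketch of the first two tiers. This is not the code from `phase1_build_csv.py`: the function names, the record layout (`key`, `doi`, `title`, `year`) and the threshold are assumptions, and the standard library's `difflib` stands in for the rapidfuzz scorer so the example runs without extra dependencies. The returned confidence is the raw similarity score, not necessarily the scale the tool reports.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hurt the score."""
    return " ".join((text or "").lower().split())


def title_score(a, b):
    """Similarity on a 0-100 scale (difflib used here as a stand-in for rapidfuzz)."""
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_pdf(pdf_meta, zotero_records, title_threshold=90.0):
    """Return (zotero_key, tier, confidence), or (None, None, 0.0) if nothing matches."""
    # Tier 1: exact DOI match, compared case-insensitively.
    doi = normalize(pdf_meta.get("doi"))
    if doi:
        for rec in zotero_records:
            if normalize(rec.get("doi")) == doi:
                return rec["key"], 1, 100.0

    # Tier 2: require the same year, then keep the best fuzzy title score.
    best_key, best_score = None, 0.0
    for rec in zotero_records:
        if not pdf_meta.get("year") or rec.get("year") != pdf_meta.get("year"):
            continue
        score = title_score(pdf_meta.get("title"), rec.get("title"))
        if score > best_score:
            best_key, best_score = rec["key"], score
    if best_key is not None and best_score >= title_threshold:
        return best_key, 2, round(best_score, 1)

    # Tiers 3-5 would follow the same pattern with progressively weaker evidence.
    return None, None, 0.0
```

Trying the strongest evidence first means a DOI hit never has to compete with a fuzzy title score, and anything that falls through every tier is left unmatched rather than guessed.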

Manual review

Open the CSV and edit the `action` column:

- `UPLOAD`: approve for upload
- `SKIP`: ignore this file
- Leave `REVIEW_*` as-is if unsure (will not be uploaded)
- Correct `zotero_key` if the match is wrong
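
As a rough illustration of how the reviewed file gates the upload, the sketch below keeps only rows marked `UPLOAD`. The `action` and `zotero_key` columns come from the description above; the `pdf_path` column, the `REVIEW_LOW_CONFIDENCE` value and the function name are hypothetical, and exact, case-sensitive matching is an assumption.

```python
import csv
import io


def approved_rows(csv_file):
    """Yield rows whose action column is exactly UPLOAD (after trimming whitespace)."""
    for row in csv.DictReader(csv_file):
        if (row.get("action") or "").strip() == "UPLOAD":
            yield row


# Tiny in-memory stand-in for a reviewed mapping CSV (column names are assumed).
sample = io.StringIO(
    "pdf_path,zotero_key,action\n"
    "a.pdf,ABCD1234,UPLOAD\n"
    "b.pdf,EFGH5678,SKIP\n"
    "c.pdf,IJKL9012,REVIEW_LOW_CONFIDENCE\n"
)
for row in approved_rows(sample):
    print(row["pdf_path"], "->", row["zotero_key"])  # prints: a.pdf -> ABCD1234
```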

Phase 2: Upload to Zotero (`phase2_upload.py`)

- Reads the reviewed CSV
- Checks which Zotero records already have PDF attachments (skips those)
- Uploads approved PDFs as stored copies
- Creates symlinks in `unmapped_pdfs/` for files that were not uploaded, making manual handling easier
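
A sketch of the attachment check, upload and symlink steps, under stated assumptions: `zot` is taken to be a pyzotero client (for example `zotero.Zotero(library_id, "user", api_key)`), whose `children()` and `attachment_simple()` methods are used here. The helper names are hypothetical, and whether `phase2_upload.py` is structured this way is not confirmed by this README.

```python
from pathlib import Path


def has_pdf_attachment(zot, item_key):
    """True if any child of the item is an attachment with a PDF content type."""
    for child in zot.children(item_key):
        data = child.get("data", {})
        if (data.get("itemType") == "attachment"
                and data.get("contentType") == "application/pdf"):
            return True
    return False


def upload_if_missing(zot, item_key, pdf_path):
    """Upload pdf_path as a stored child attachment unless a PDF is already there."""
    if has_pdf_attachment(zot, item_key):
        return False
    zot.attachment_simple([str(pdf_path)], item_key)
    return True


def link_unmapped(pdf_paths, dest_dir="unmapped_pdfs"):
    """Create one symlink per file in dest_dir, skipping names that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for pdf in map(Path, pdf_paths):
        link = dest / pdf.name
        if not link.exists() and not link.is_symlink():
            link.symlink_to(pdf.resolve())
```

Because the client is passed in as a parameter, the same functions can be exercised against a small fake object before pointing them at a real library.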

Prerequisites

- Python 3.10+
- GROBID running at `localhost:8070` (via Docker)
- Zotero API key with read/write access to your personal library

Setup

Clone this repo and install dependencies:

pip install pyzotero rapidfuzz python-dotenv requests

Start GROBID (add to your docker-compose.yaml or run directly):

docker run -d --name grobid -p 8070:8070 lfoppiano/grobid:0.8.2
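
Before running Phase 1 it can help to confirm that the container is actually answering. The sketch below polls GROBID's `/api/isalive` health endpoint using only the standard library; the function name and the timeout are illustrative choices, not part of this repo.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if GROBID's /api/isalive endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `grobid_is_alive()` shortly after `docker run` may return `False` for a while, since the service can take some time to finish starting.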

Copy .env.example to .env and fill in your credentials:
```
cp .env.example .env
```
Edit PDF_SUBFOLDERS in phase1_build_csv.py to list the folders you want to scan (relative to PDF_BASE_FOLDER from .env).
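
To make the relationship between the two settings concrete, here is a hedged sketch of how subfolders might be resolved against the base folder and scanned. The folder names, the helper functions and the `limit` parameter are hypothetical; the real script presumably loads `.env` with python-dotenv, whereas this example reads the environment directly to stay dependency-free.

```python
import os
from pathlib import Path

# Hypothetical values; the real list lives in phase1_build_csv.py.
PDF_SUBFOLDERS = ["Articles", "Books/Chapters"]
# Assumed to be provided by .env; falls back to the current directory here.
PDF_BASE_FOLDER = os.environ.get("PDF_BASE_FOLDER", ".")


def find_pdfs(base_folder, subfolders, limit=None):
    """Recursively collect PDF paths under each subfolder, optionally capped."""
    found = []
    for sub in subfolders:
        root = Path(base_folder) / sub
        if root.is_dir():  # silently skip subfolders that do not exist
            found.extend(sorted(root.rglob("*.pdf")))
    return found if limit is None else found[:limit]


def split_annotated(paths, suffix="- annotated.pdf"):
    """Separate annotated copies from the other PDFs based on the filename suffix."""
    annotated = [p for p in paths if p.name.endswith(suffix)]
    others = [p for p in paths if not p.name.endswith(suffix)]
    return annotated, others
```

Note that `rglob("*.pdf")` is case-sensitive on most Linux filesystems, so files named `.PDF` would need a separate pattern.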

Usage

First run (test with a small batch)

Set TEST_LIMIT = 10 in phase1_build_csv.py, then:

python phase1_build_csv.py
# Review the generated CSV, set action=UPLOAD for approved rows
python phase2_upload.py zotero_mapping_XXXXXXXX.csv

Production run

Set TEST_LIMIT = None, then:

python phase1_build_csv.py
# Review CSV
python phase2_upload.py zotero_mapping_XXXXXXXX.csv

Batch re-runs

When running in batches, use --check-uploaded to mark previously uploaded records in the CSV so you can focus on what's new:

python phase1_build_csv.py --check-uploaded

Records that already have PDFs in Zotero will be marked ALREADY_UPLOADED and sorted to the bottom of the CSV.
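
The marking and sorting behaviour can be sketched as follows. This assumes rows are dictionaries with `zotero_key` and `action` fields and that the set of keys already holding a PDF has been collected beforehand; the function name is hypothetical.

```python
def mark_and_sort(rows, keys_with_pdf):
    """Flag rows whose Zotero record already has a PDF and move them to the end."""
    for row in rows:
        if row.get("zotero_key") in keys_with_pdf:
            row["action"] = "ALREADY_UPLOADED"
    # sorted() is stable, so the remaining rows keep their original order.
    return sorted(rows, key=lambda r: r.get("action") == "ALREADY_UPLOADED")
```

Sorting on a boolean key places `False` (still to review) before `True` (already uploaded), which is what pushes finished records to the bottom.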

Files

| File | Purpose |
| --- | --- |
| `phase1_build_csv.py` | Scan, extract metadata, match, produce CSV |
| `phase2_upload.py` | Upload approved PDFs, create symlinks for unmapped |
| `.env` | Your Zotero credentials and PDF base path (not committed) |
| `.env.example` | Template for `.env` |
| `zotero_cache.db` | SQLite cache of Zotero records (auto-generated) |
| `unmapped_pdfs/` | Symlinks to PDFs that were not uploaded (for manual review) |
