A no-LLM bibliography verification tool for .bib files.
Ref. Checker parses BibTeX entries, classifies references, validates them against configurable sources, detects duplicates, and provides a web UI for review, correction, and export. It is designed for research workflows that need deterministic, rule-based, and auditable reference checking without relying on AI generation.
- Pure rule-based reference verification
- Upload and parse `.bib` files in the browser
- Automatic classification into:
  - `paper`
  - `GitHub`
  - `blog`
- Configurable validation sources:
  - `arxiv`
  - `crossref`
  - `openalex`
  - `serpapi`
  - `scholar-html`
- Duplicate detection by DOI / arXiv ID / normalized title + year
- Review mismatches and apply corrected BibTeX back into a working `.bib`
- Export:
  - Excel report
  - JSON report
  - corrected `.bib`
- Chinese / English UI toggle
Ref. Checker follows a simple pipeline:
- Parse `.bib` entries into structured metadata
- Classify each entry as a paper, GitHub repo, or blog/web reference
- Validate papers against selected scholarly sources
- Validate GitHub/blog references through URL reachability
- Compare returned metadata with the original BibTeX entry
- Mark each result as:
  - `FOUND_MATCH`
  - `FOUND_MISMATCH`
  - `NOT_FOUND`
- Let users inspect, edit, apply, and export corrections
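
The classification step of this pipeline could look roughly like the sketch below. It is an illustration only, not the project's actual logic: the function name `classify_entry`, the field names (`doi`, `eprint`, `journal`, `booktitle`, `url`), and the heuristics themselves are assumptions.

```python
from urllib.parse import urlparse


def classify_entry(fields: dict) -> str:
    """Classify a parsed BibTeX entry as 'paper', 'GitHub', or 'blog'.

    Hypothetical heuristic: scholarly fields win first, then the URL host decides.
    """
    url = (fields.get("url") or "").strip()
    host = urlparse(url).netloc.lower() if url else ""

    # Any scholarly identifier or venue field marks the entry as a paper.
    if any(fields.get(key) for key in ("doi", "eprint", "journal", "booktitle")):
        return "paper"
    # Repository links are treated as GitHub references.
    if host == "github.com" or host.endswith(".github.com"):
        return "GitHub"
    # arXiv links without an explicit eprint field are still papers.
    if host == "arxiv.org" or host.endswith(".arxiv.org"):
        return "paper"
    # Any other URL is treated as a blog / general web reference.
    if url:
        return "blog"
    # With no URL and no identifiers, fall back to treating it as a paper.
    return "paper"
```

Under these assumptions, an entry whose only distinguishing field is `url = https://github.com/Zxy-MLlab/Ref-checker` would be routed to URL reachability checks rather than to scholarly sources.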
Recommended for the fastest deployment:

    cp .env.example .env
    docker compose up --build

Then open:

http://127.0.0.1:8000

To run with Conda instead:

    conda env create -f environment.yml
    conda activate ref-check
    uvicorn backend_api:app --host 0.0.0.0 --port 8000 --reload

To run with pip instead:

    python3 -m pip install -r requirements.txt
    uvicorn backend_api:app --host 0.0.0.0 --port 8000 --reload

The web UI is built around a lightweight review workflow:
- Upload a `.bib` file
- Parse entries and inspect duplicate references
- Configure validation sources
- Run verification
- Filter results by:
  - verdict
  - reference type
  - search query
- Open mismatch suggestions in a modal
- Edit the recommended `.bib`
- Apply the correction into the working copy
- Export the final report or corrected bibliography
The frontend exposes the following sources:
- `arxiv`
- `crossref`
- `openalex`
- `serpapi`
- `scholar-html`
Default UI behavior:
- enabled by default: `arxiv`, `crossref`, `openalex`, `serpapi`
- disabled by default: `scholar-html`
If a user enables only a subset of the sources (even a single one), the backend respects that configuration and validates papers against the selected sources only.
Ref. Checker currently routes validation based on detected reference category:
Papers are validated against configured scholarly sources and matched by rule-based comparison of:
- title similarity
- author overlap
- year
- venue
- DOI
- arXiv ID
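
A rule-based comparison of this kind can be sketched with the standard library alone. The sketch below covers only title similarity and year; the function names, the `0.9` threshold, and the way the three verdicts are assigned are assumptions for illustration, not the project's actual matching rules.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so cosmetic differences are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def verdict(original: dict, candidate: Optional[dict], threshold: float = 0.9) -> str:
    """Return FOUND_MATCH, FOUND_MISMATCH, or NOT_FOUND (hypothetical rule set)."""
    # No record came back from the source at all.
    if candidate is None:
        return "NOT_FOUND"
    same_title = title_similarity(
        original.get("title", ""), candidate.get("title", "")
    ) >= threshold
    same_year = str(original.get("year", "")) == str(candidate.get("year", ""))
    # A record was found; it matches only if every compared field agrees.
    return "FOUND_MATCH" if same_title and same_year else "FOUND_MISMATCH"
```

Because every decision is a plain comparison against a fixed threshold, the same input always yields the same verdict, which is what makes this style of check auditable. Author overlap, venue, DOI, and arXiv ID checks would slot in as further boolean conditions.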
GitHub references are validated through URL reachability.
Blog / general web references are also validated through URL reachability.
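
A URL reachability check can be approximated with `urllib` from the standard library. This is a minimal sketch under assumptions: the function name `is_reachable`, the use of a `HEAD` request, the timeout, and the rule that any final 2xx/3xx status counts as reachable are all illustrative choices rather than the project's documented behavior.

```python
import http.client
import urllib.request


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status (hypothetical rule)."""
    try:
        request = urllib.request.Request(
            url,
            method="HEAD",  # avoid downloading the body
            headers={"User-Agent": "ref-check-sketch/0.1"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (4xx, 5xx), DNS failures, and timeouts;
        # ValueError covers malformed URLs.
        return False
```

Some servers reject `HEAD` requests, so a more forgiving variant might retry with `GET` before reporting a reference as unreachable.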
Duplicate reference groups are detected using strong, deterministic signals:
- same DOI
- same arXiv ID
- same normalized title + year
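
These three signals lend themselves to simple key-based grouping. The sketch below is one possible approach, not the project's implementation: the entry field names (`id`, `doi`, `arxiv_id`, `title`, `year`) and the normalization rule are assumptions.

```python
import re
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def duplicate_keys(entry: dict) -> list:
    """Build the deterministic keys under which an entry can collide with another."""
    keys = []
    if entry.get("doi"):
        keys.append(("doi", entry["doi"].strip().lower()))
    if entry.get("arxiv_id"):
        keys.append(("arxiv", entry["arxiv_id"].strip().lower()))
    if entry.get("title") and entry.get("year"):
        keys.append(("title_year", normalize_title(entry["title"]), str(entry["year"])))
    return keys


def find_duplicate_groups(entries: list) -> list:
    """Return sorted groups of entry ids that share at least one key."""
    buckets = defaultdict(list)
    for entry in entries:
        for key in duplicate_keys(entry):
            buckets[key].append(entry["id"])
    # Keep only keys shared by two or more distinct entries; drop repeated groups.
    groups = {tuple(sorted(set(ids))) for ids in buckets.values() if len(set(ids)) > 1}
    return sorted(list(group) for group in groups)
```

This sketch reports each shared key as its own group and does not merge groups transitively (for example, A and B sharing a DOI while B and C share a title and year).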
After verification, the project supports:
- `report.json`
- `report.xlsx`
- corrected `modified.bib`
The Excel report is simplified for manual review and grouped into three sheets:
- `完全匹配` (exact match)
- `部分匹配` (partial match)
- `没有找到` (not found)
Copy .env.example to .env before deployment.
| Variable | Default | Description |
|---|---|---|
| `APP_PORT` | `8000` | Host port for Docker Compose |
| `REFCHECK_MAX_UPLOAD_BYTES` | `10485760` | Maximum upload size in bytes (10 MiB) |
| `SERPAPI_API_KEY` | empty | Optional SerpApi key |
| `HTTP_PROXY` | empty | Optional outbound proxy |
| `HTTPS_PROXY` | empty | Optional outbound proxy |
The service also supports:
- `REFCHECK_JOB_STORE_DIR`
- `REFCHECK_CACHE_PATH`
These are already configured in docker-compose.yml.
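
For readers wiring these variables into their own tooling, the sketch below shows one way such settings could be read in Python. The `Settings` class and `load_settings` function are hypothetical and do not describe how the backend actually loads its configuration; only the variable names and the upload-size default come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    max_upload_bytes: int
    serpapi_api_key: Optional[str]
    job_store_dir: Optional[str]
    cache_path: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, treating empty values as unset."""
    env = os.environ if env is None else env
    return Settings(
        # Falls back to the documented default of 10485760 bytes.
        max_upload_bytes=int(env.get("REFCHECK_MAX_UPLOAD_BYTES") or "10485760"),
        serpapi_api_key=env.get("SERPAPI_API_KEY") or None,
        job_store_dir=env.get("REFCHECK_JOB_STORE_DIR") or None,
        cache_path=env.get("REFCHECK_CACHE_PATH") or None,
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test.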
Main endpoints:
- `GET /health`
- `GET /api/v1/sources`
- `POST /api/v1/bib/parse`
- `POST /api/v1/jobs`
- `GET /api/v1/jobs`
- `GET /api/v1/jobs/{job_id}`
- `POST /api/v1/jobs/{job_id}/apply-correction`
- `GET /api/v1/jobs/{job_id}/report.json`
- `GET /api/v1/jobs/{job_id}/report.csv`
- `GET /api/v1/jobs/{job_id}/report.xlsx`
- `GET /api/v1/jobs/{job_id}/modified.bib`
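
A small client-side helper for these routes might look like the sketch below. The paths are taken from the endpoint list; everything else is an assumption, including the helper names, the local base URL from the quick-start instructions, and the expectation that `/health` returns JSON. Request and response bodies are not documented here, so the sketch only builds URLs and performs simple `GET` requests.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:8000"  # assumed local deployment


def job_url(job_id: str, suffix: str = "", base_url: str = BASE_URL) -> str:
    """Build the URL for a job resource, e.g. its report.json or modified.bib."""
    path = "/api/v1/jobs/" + quote(job_id, safe="")
    if suffix:
        path += "/" + suffix.lstrip("/")
    return base_url.rstrip("/") + path


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumes the endpoint returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def download(url: str, destination: str, timeout: float = 30.0) -> None:
    """GET a URL and write the raw bytes to a local file."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(destination, "wb") as handle:
            handle.write(response.read())


if __name__ == "__main__":
    # Requires a running server; the job id below is a placeholder.
    print(get_json(BASE_URL + "/health"))
    download(job_url("your-job-id", "modified.bib"), "modified.bib")
```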
.
├── backend_api.py # FastAPI entrypoint
├── bib_ref_checker.py # BibTeX parsing, search adapters, rule-based matching
├── reference_backend.py # Job store, async analysis, exports, correction workflow
├── frontend/
│ ├── index.html
│ ├── app.js
│ ├── styles.css
│ └── site.css
├── requirements.txt # Python runtime dependencies
├── environment.yml # Conda environment
├── Dockerfile # Container image
├── docker-compose.yml # One-command deployment
├── Makefile # Convenience commands
└── .env.example # Environment template
If you prefer short commands:

    make install
    make dev
    make docker-up
    make docker-down

- `.job_store/` and cache files should not be committed
- Docker Compose persists runtime data in the `refcheck_data` volume
- The repo is intended to stay clean: keep generated reports and local caches out of version control
Ref. Checker is built for practical BibTeX QA workflows. It focuses on:
- deterministic parsing
- configurable source-based verification
- auditability
- interactive correction
It does not use LLMs or AI generation for validation.
If you find Ref. Checker useful in your work, please cite:
@misc{zhou2026refchecker,
author = {Xueyang Zhou},
title = {Ref. Checker: A No-LLM Bibliography Verification Tool for BibTeX Files},
year = {2026},
howpublished = {\url{https://github.com/Zxy-MLlab/Ref-checker}},
note = {GitHub repository}
}