2ndDocOpinion

A locally-anonymizing tool that prepares a structured second opinion on your own MRI / CT exam, so you can have a better-informed conversation with your physician.

Reads a radiology report (PNG / JPG, or a PDF exported as an image) and, optionally, the DICOM image export, anonymizes both on your machine before any LLM call, then runs a multi-role review and image-vs-report cross-checks, tagging every claim with a confidence value.

Read this first → Disclaimer and Privacy architecture.

🇩🇪 German version: README.de.md

Disclaimer (read this first)

This is NOT a medical device, NOT clinical decision support, NOT a diagnosis.

For personal use only — to help you prepare your own conversation with your physician about your own scan.
Not for third parties, not for treaters, not for other patients.
LLMs hallucinate. Every claim ships with a confidence value and a reason — read both critically. None of it replaces a qualified medical opinion.
No therapy recommendations, no health claims. Output is conversation material for your physician / physiotherapist / sports-medicine consultation.
For acute symptoms, sudden pain, loss of function → see a physician directly, not this tool.
GDPR Art. 9 (special-category health data): the tool is designed for individual self-use on a local machine. Institutional or commercial use requires separate legal review (MDR / IVDR, advertising-of-health-products law, data-processing agreements).

By using the tool you accept these conditions.

What the tool does

Inputs:

A radiology report as image (PNG / JPG). PDF support is on the roadmap.
Optional: a DICOM export (folder of .dcm files or a DICOMDIR tree). With DICOM → image-cross-check mode. Without DICOM → report-only mode.
Optional: a context.yml with brief clinical context (region, symptoms, trauma history, prior surgeries, your own questions).

Local processing pipeline:

Anonymization — DICOM tags stripped via allowlist (so unknown / private tags fail safe), UIDs rehashed with a project-local salt that never leaves your machine, report image redacted (top header zone + PHI-pattern matches).
Fail-closed audit gate — no external LLM call happens unless audit-report.json is green.
Slice extraction — DICOM volumes → PNG slices, windowed and resized to 1024 px.
Report OCR — Tesseract, German by default, structured by report sections.
Vision pass — Claude vision on anonymized slices, with multi-sample consensus (introduced in V2; enable it with --n-samples 3, the default is a single sample).
Self-critique pass — the same model re-reads its own observations adversarially, looking for anatomical-location, sequence-signal-behavior, and classification-stage errors (the three most common radiology-hallucination axes).
Cross-check — every claim from the original report is matched against the model's own image observations. Disagreements are documented, not iterated away.
Multi-role review — 3-5 specialist personas (radiologist, orthopedist, physiotherapist, sports medicine, pain therapy, nutrition-and-sports, region specialists) iterate the case until a score threshold (Full ≥ 17 / 20, Report-only ≥ 18 / 20) or a plateau is reached.
Synthesis — final report in Markdown with confidence bands per claim, a "what the tool cannot judge" section, escalation triggers, and concrete next-step suggestions for the conversation with your physician.
PDF output — with the disclaimer footer on every page.
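The windowing step in slice extraction maps raw DICOM pixel values onto 8-bit gray levels before the PNGs are written. A minimal sketch of a linear window follows. It is simplified (the DICOM standard's linear VOI LUT uses slightly different offsets, center - 0.5 and width - 1), it works on plain Python lists where a real implementation would use NumPy arrays, and the function name is hypothetical rather than taken from the codebase. Resizing to 1024 px is a separate step not shown here.

```python
def apply_window(pixels: list[list[float]], center: float, width: float) -> list[list[int]]:
    """Map raw pixel values to 0..255 with a simplified linear window (hypothetical helper)."""
    if width <= 0:
        raise ValueError("window width must be positive")
    lower = center - width / 2
    upper = center + width / 2
    out: list[list[int]] = []
    for row in pixels:
        out_row = []
        for value in row:
            if value <= lower:
                out_row.append(0)  # below the window: black
            elif value >= upper:
                out_row.append(255)  # above the window: white
            else:
                # linear ramp inside the window
                out_row.append(round((value - lower) / width * 255))
        out.append(out_row)
    return out
```

For example, `apply_window([[-10, 50, 300]], center=100, width=200)` clips the outer values and maps 50 to 64. Sensible center/width values depend on sequence and body region; many DICOM files carry WindowCenter and WindowWidth attributes that can serve as a starting point.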

Outputs: data/output/<case>/zweitmeinung.{md,pdf}. The Markdown is your working copy; the PDF is shareable with the physician who will give you the actual second opinion.

Privacy architecture

The single most important property of this tool: no patient-identifying data leaves your machine.

DICOM anonymization is allowlist-based (a small set of clinically necessary tags is kept; everything else is dropped). This is more robust than a blocklist because new or vendor-private tags fall out by default.
UID rehashing uses HMAC-SHA256 with a salt stored in data/.salt. The salt never leaves your machine. Same study → same anonymized UIDs across re-runs; different studies → different UIDs.
Report image redaction masks the header zone (where most clinics put patient name / date of birth / case number) and any text matching configurable PHI patterns.
Audit gate runs after anonymization. It re-reads every file in data/anon/<case>/ and checks for residual identifiers. If anything looks wrong, status is red and no LLM call happens.
data/ is gitignored (and contains a .gitignore of its own as belt-and-suspenders).
What goes to the cloud: anonymized DICOM slices (PNG) and the OCR text of the anonymized report. No patient name, no date of birth, no case ID, no hospital identifier, no original DICOM headers.
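The two DICOM properties above (allowlist filtering and salted UID rehashing) can be illustrated without a DICOM library. A minimal sketch, assuming the header has already been read into a plain dict keyed by DICOM keyword (in practice pydicom would supply it); the allowlist contents and all function names are hypothetical, not the project's configuration. Encoding 128 bits of the HMAC as a decimal under the `2.25.` root follows the DICOM convention for UUID-derived UIDs and keeps the result well under the 64-character UID limit.

```python
import hashlib
import hmac
import secrets
from pathlib import Path

# Hypothetical allowlist: only these keywords survive; everything else,
# including unknown and vendor-private tags, is dropped by default.
ALLOWED_KEYWORDS = {
    "Modality", "BodyPartExamined", "SeriesDescription", "SliceThickness",
    "PixelSpacing", "WindowCenter", "WindowWidth",
    "StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID",
}
UID_KEYWORDS = {"StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID"}


def load_or_create_salt(path: Path) -> bytes:
    """Read the project-local salt (e.g. data/.salt), creating a random one on first use."""
    if path.exists():
        return path.read_bytes()
    path.parent.mkdir(parents=True, exist_ok=True)
    salt = secrets.token_bytes(32)
    path.write_bytes(salt)
    return salt


def rehash_uid(uid: str, salt: bytes) -> str:
    """Deterministically map a UID to a new one via HMAC-SHA256."""
    digest = hmac.new(salt, uid.encode("ascii"), hashlib.sha256).digest()
    # 128 bits under the 2.25 root -> at most 5 + 39 = 44 characters.
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def anonymize_header(header: dict[str, str], salt: bytes) -> dict[str, str]:
    """Keep only allowlisted keywords and rehash the UIDs among them."""
    result: dict[str, str] = {}
    for keyword, value in header.items():
        if keyword not in ALLOWED_KEYWORDS:
            continue  # fail safe: anything not explicitly allowed is dropped
        result[keyword] = rehash_uid(value, salt) if keyword in UID_KEYWORDS else value
    return result
```

Running `anonymize_header` twice with the same salt yields identical UIDs, so re-runs stay consistent, while a different salt yields unrelated ones. Anything not on the allowlist, including `PatientName` and vendor-private tags, never reaches the output.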

If you want zero cloud at all, wait for the roadmap item V8 (fully-local pipeline) or open a pull request — see Roadmap.

Three modes

Inputs	Mode	What it does
Report + DICOM	Full	Image-aware second opinion with cross-check between original report and model's own image findings
Report only	Report-only	Re-reads and re-interprets the report, internal consistency check
DICOM only	Image-only	Pure image-based second opinion without an original report; auto-detected when `dicom/` exists but neither `befund.png` nor `befund.jpg` does. Phases 2 (claim extraction) and 5 (cross-check) are skipped; phases 3 (vision), 3.5 (self-critique), 6/7 (multi-role), 8/9 (synthesis + PDF) run as usual.
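Mode selection follows directly from which inputs exist in the case folder. A minimal sketch of that rule, assuming the file names from the Usage section (`befund.png` / `befund.jpg`, `dicom/`); the function name and the error raised for an empty case are illustrative, not taken from the codebase, and this sketch additionally requires at least one file somewhere under `dicom/`.

```python
from pathlib import Path


def detect_mode(case_dir: Path) -> str:
    """Return 'full', 'report-only' or 'image-only' for an input case folder (hypothetical helper)."""
    has_report = any((case_dir / name).is_file() for name in ("befund.png", "befund.jpg"))
    dicom_dir = case_dir / "dicom"
    # DICOM files may sit at any depth below dicom/ (plain folders or a DICOMDIR tree).
    has_dicom = dicom_dir.is_dir() and any(p.is_file() for p in dicom_dir.rglob("*"))
    if has_report and has_dicom:
        return "full"
    if has_report:
        return "report-only"
    if has_dicom:
        return "image-only"
    raise FileNotFoundError(f"no report image and no DICOM files in {case_dir}")
```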

Prerequisites

Python 3.13+ (tested on 3.14)
Tesseract 5.x with the language pack(s) for your reports:
- macOS: brew install tesseract tesseract-lang
- Linux: apt install tesseract-ocr tesseract-ocr-deu (replace deu with the language code you need)
Anthropic Claude access. Two options:
- Anthropic Pro / Max subscription via the Claude Code CLI (recommended — no API key in this repo)
- or your own ANTHROPIC_API_KEY (pay-per-call billing)
~15 GB of disk for data/anon/<case>/ per typical MRI study (slices + intermediate outputs)

Setup

git clone https://github.com/josudia/2nddocopinion.git
cd 2nddocopinion
python3 -m venv venv
./venv/bin/pip install -e .
cp .env.example .env       # optional, only if you need to set API key / OCR language

A first sanity check:

./venv/bin/python -m pytest tests/ -q

You should see all tests pass.

Usage

Drop a case under input/<case-slug>/, then run the pipeline.

input/2026-05-08-shoulder-right-mri/
├── befund.png        # or befund.jpg — the radiology report as image
├── dicom/            # optional — folder of DICOM files (any depth)
└── context.yml       # optional — clinical context (otherwise interactive)

Slug convention: YYYY-MM-DD-region-side-modality (sortable, descriptive, unique).
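The slug convention can be checked mechanically. A minimal sketch, assuming region, side, and modality are lowercase ASCII words separated by hyphens and that at least three such words follow the date; neither the regex nor the function exists in the repository.

```python
import re
from datetime import date

# ISO date, then at least three lowercase hyphen-separated words (region, side, modality).
SLUG_RE = re.compile(r"(\d{4}-\d{2}-\d{2})-([a-z0-9]+(?:-[a-z0-9]+){2,})")


def is_valid_slug(slug: str) -> bool:
    """Check a case slug against YYYY-MM-DD-region-side-modality (hypothetical helper)."""
    match = SLUG_RE.fullmatch(slug)
    if not match:
        return False
    try:
        date.fromisoformat(match.group(1))  # rejects impossible dates like 2026-02-30
    except ValueError:
        return False
    return True
```

Because the ISO date comes first, a plain `sorted()` over valid slugs orders cases chronologically.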

A guided wizard for context.yml:

./venv/bin/python scripts/new_case.py

Then the four pre-LLM steps (anonymize → audit → extract → OCR):

./venv/bin/python scripts/prep_case.py --case 2026-05-08-shoulder-right-mri run-all-pre-llm
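The fail-closed rule behind the audit check can be sketched as a small guard. This assumes audit-report.json is a JSON object with a top-level `status` field and lives in data/anon/<case>/; the field name, the location, and the function and exception names are assumptions for illustration. The important property is that a missing, unreadable, or malformed report counts as red, never as green.

```python
import json
from pathlib import Path


class AuditGateError(RuntimeError):
    """Raised when the anonymization audit is not explicitly green (hypothetical)."""


def require_green_audit(report_path: Path) -> None:
    """Allow external LLM calls only if the audit report exists and says 'green'."""
    try:
        report = json.loads(report_path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # Missing file, unreadable file or invalid JSON: fail closed.
        raise AuditGateError(f"audit report unusable: {exc}") from exc
    status = report.get("status") if isinstance(report, dict) else None
    if status != "green":
        raise AuditGateError(f"audit status is {status!r}, refusing external LLM calls")
```

A caller would invoke it before building any API request, for example `require_green_audit(Path("data/anon") / case / "audit-report.json")` under the path assumption above.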

If audit-report.json is green, run the LLM pipeline:

./venv/bin/python scripts/run_doc.py --case 2026-05-08-shoulder-right-mri --n-samples 3
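With --n-samples 3 the vision pass runs several times on the same slices. One simple way to turn those samples into a consensus, shown here as an assumption rather than the tool's actual rule, is a strict-majority vote over normalized observations: keep an observation only if more than half of the samples report it, and record the agreement ratio as a crude confidence signal. The observation strings and the function name are hypothetical.

```python
from collections import Counter


def consensus(samples: list[list[str]]) -> dict[str, float]:
    """Keep observations reported by a strict majority of samples.

    Returns a mapping observation -> fraction of samples that reported it.
    """
    n = len(samples)
    if n == 0:
        return {}
    counts: Counter = Counter()
    for sample in samples:
        # Normalize case and whitespace; count each observation once per sample.
        counts.update({" ".join(obs.lower().split()) for obs in sample})
    return {obs: c / n for obs, c in counts.items() if c * 2 > n}
```

Free-text observations from separate samples rarely match verbatim, so a real pipeline would need structured findings or a matching step; exact-string voting only illustrates the principle.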

Defaults: 2 review rounds, score threshold 17 / 20 in Full mode, n = 1 vision sample (use --n-samples 3 for V2-grade robustness).
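The stopping rule of the multi-role review (score threshold, plateau, or round limit) can be sketched as a small loop. `review_round` is a hypothetical callable standing in for one round of persona reviews that returns a score out of 20, and the plateau definition used here (an improvement of at most `min_gain` between consecutive rounds) is an assumption, since the exact rule is not documented.

```python
from typing import Callable


def run_review(
    review_round: Callable[[int], float],
    threshold: float = 17.0,  # Full mode; Report-only would use 18.0
    max_rounds: int = 2,
    min_gain: float = 0.5,  # hypothetical plateau tolerance
) -> tuple[list[float], str]:
    """Run review rounds until threshold, plateau or round limit.

    Returns the score history and the reason the loop stopped.
    """
    scores: list[float] = []
    for round_no in range(1, max_rounds + 1):
        score = review_round(round_no)
        scores.append(score)
        if score >= threshold:
            return scores, "threshold"
        if len(scores) >= 2 and scores[-1] - scores[-2] <= min_gain:
            return scores, "plateau"
    return scores, "max_rounds"
```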

Output lands in data/output/<case>/zweitmeinung.{md,pdf}.

Why not a medical / bio-trained LLM?

This is a question many users will reasonably ask, so here is the empirical answer, not the marketing one.

In May 2026 we ran a sanity test on 39 curated German radiology statements (synthetic, no PHI — see tests/fixtures/sanity_testset.json) against four locally-runnable medical and general models, plus Anthropic Claude:

Model	Accuracy	Sensitivity (catches errors)	Specificity (keeps correct claims)	Latency
Claude Opus 4.7	80 %	94 %	100 %	6 s
Gemma 3 27B (local)	69 %	88 %	81 %	12 s
Mistral 7B base (local)	72 %	62 %	94 %	4 s
Gemma 4 8B reasoning (local)	62 %	81 %	69 %	2.5 s
BioMistral 7B Q4 (local)	41 %	6 %	100 %*	3 s

* BioMistral hardly ever flags a statement as wrong, so its high specificity is an artefact.
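For reading the table: sensitivity is computed over the statements that contain an error (how many of them the model flags), specificity over the correct statements (how many of them it leaves unflagged). A minimal sketch of these definitions from raw counts; the class and function names are illustrative and no numbers from the study are used. It also shows why a model that almost never flags anything can reach perfect specificity while being useless.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    flagged_errors: int  # erroneous statements the model flagged (true positives)
    missed_errors: int  # erroneous statements it let through (false negatives)
    kept_correct: int  # correct statements it left alone (true negatives)
    false_alarms: int  # correct statements it flagged (false positives)


def sensitivity(c: Counts) -> float:
    return c.flagged_errors / (c.flagged_errors + c.missed_errors)


def specificity(c: Counts) -> float:
    return c.kept_correct / (c.kept_correct + c.false_alarms)


def binary_accuracy(c: Counts) -> float:
    total = c.flagged_errors + c.missed_errors + c.kept_correct + c.false_alarms
    return (c.flagged_errors + c.kept_correct) / total
```

For instance, a hypothetical model that flags 1 of 16 erroneous statements and none of 16 correct ones gets a sensitivity of 0.0625 but a specificity of 1.0. Binary accuracy is the prevalence-weighted average of sensitivity and specificity, so it always lies between them; the Accuracy column above falls outside that range for several models (Claude: 80 % against 94 % / 100 %), so it is presumably computed on a different basis that this README does not describe.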

Three findings:

BioMistral is unusable on German radiology (sensitivity 6 %). Its English medical fine-tune appears to actively harm German output. We did not find a single locally-runnable German medical generation LLM in May 2026; for language: de + medical, HuggingFace lists only encoder, NER, PII, and translation models.
No locally-runnable model we tested could serve as a useful cross-validator to Claude on German radiology. The best local model (Gemma 3 27B) introduces ~5–6 false alarms on correct statements for every Claude error it catches. Net cost-benefit is negative.
The biggest hallucination risk in this pipeline is the vision pass, not the text pass. A text-only cross-validator cannot see the image. The tool addresses image hallucinations with self-critique passes that reason in the same model's image context (see V2 architecture).

Implication for English-speaking users: the medical-LLM landscape is English-first. BioMistral-7B was trained on English biomedical text, and stronger options such as Med-PaLM 2, Meditron-70B, and Asclepius exist. If you want to add a local English cross-validator:

Install Tesseract's English language pack and run the tool with OCR_LANG=eng.
Stand up an Ollama server with the medical model of your choice (ollama pull hf.co/<repo>:Q4_K_M works for HuggingFace GGUFs).
Implement a provider for the LLMProvider interface planned for V8.1 (see Roadmap). Pull requests are very welcome.
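Because the LLMProvider interface does not exist yet, the following is only a sketch of what such an abstraction could look like; the class, method, and helper names are hypothetical. The Ollama variant is text-only and uses Ollama's `/api/generate` HTTP endpoint on its default port 11434 with streaming disabled, in which case the answer arrives in the `response` field of a single JSON object. The cross-validation prompt is a placeholder.

```python
import json
import urllib.request
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical provider interface: one text prompt in, one text answer out."""

    def complete(self, prompt: str) -> str: ...


class OllamaProvider:
    """Text-only provider backed by a local Ollama server."""

    def __init__(self, model: str, host: str = "http://localhost:11434") -> None:
        self.model = model  # e.g. a model pulled via `ollama pull hf.co/<repo>:Q4_K_M`
        self.host = host

    def complete(self, prompt: str) -> str:
        payload = json.dumps({"model": self.model, "prompt": prompt, "stream": False}).encode()
        request = urllib.request.Request(
            f"{self.host}/api/generate",
            data=payload,  # a body makes urllib send a POST
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=120) as response:
            return json.loads(response.read())["response"]


def cross_validate(statement: str, provider: LLMProvider) -> bool:
    """Ask a provider whether a radiology statement contains an error (placeholder prompt)."""
    answer = provider.complete(
        "Answer only YES or NO. Does this radiology statement contain an error?\n" + statement
    )
    return answer.strip().upper().startswith("YES")
```

Keeping the interface to a single `complete` method means a Claude-backed provider and a local one are interchangeable in `cross_validate`; image input would need a richer signature.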

Until V8.1 lands, the tool is a single-provider (Claude) pipeline. Trade-offs are explicit; if your threat model requires zero data to leave your machine, see V8.

Roadmap

Version	Status	What
V1	shipped	Full + Report-only mode, single-sample vision pass, German radiology
V2	shipped	Multi-sample consensus (n = 3), self-critique with anatomical / sequence / classification checks, clinical-context reach-through across all phases
V3	shipped	Image-only mode (no report required for DICOM-only cases), region-aware prompt header for multi-region support beyond shoulder, null-friendly clinical-context schema
V8.1	planned	`LLMProvider` abstraction (Claude / Ollama / MLX) — enables local-LLM cross-validation, English medical models
V8	planned	Fully-local pipeline option (vision + text + validation, no cloud)

V8 is the largest jump in scope, and output quality is expected to drop somewhat. The trade-off is that no patient-derived data ever leaves the machine.

Contributing

This is an open-source tool for the community. We do not sell it and do not monetize it.

Pull requests are especially welcome for:

Additional report languages (English, French, Spanish, …) — OCR config + prompt translations + a few example reports in tests/fixtures/.
Local-LLM provider implementations (V8.1 scope).
Additional medical specialist roles for the multi-role review (cardiology, neurology, …) — see prompts/role_*.md for examples.
Anonymization patterns for non-DICOM medical data formats (HL7, FHIR, plain PDFs without DICOM).
External validation of the sanity test set by a radiologist — see tests/fixtures/sanity_testset.json.

Hard rules for PRs:

No patient data in the codebase, PRs, issues, or any public surface, ever. Test fixtures are synthetic only.
New pipeline phases need tests.
The anonymization gate, the audit gate, and the disclaimers are not optional and not removable in production paths.

Code of conduct: be civil, criticize ideas, not people. The project is small and we'd like to keep it pleasant.

License

MIT — see LICENSE. Note the additional notice in LICENSE regarding medical use.

Acknowledgements

Anthropic for claude-agent-sdk and the Claude models that power the vision and text passes.
pydicom for the DICOM toolkit that makes safe anonymization possible.
Tesseract OCR for the offline-capable OCR backbone.
The early reviewers and case-providers who let us validate the pipeline on real-world MRIs — privacy-preserved, of course.
