A locally-anonymizing tool that prepares a structured second opinion on your own MRI / CT exam, so you can have a better-informed conversation with your physician.
Reads a radiology report (PNG / JPG / PDF-as-image) and the DICOM image export, anonymizes both on your machine before any LLM call, then runs multi-role review and image-vs-report cross-checks with confidence tags on every single claim.
Read this first: Disclaimer and Privacy architecture.
🇩🇪 German version: README.de.md
This is NOT a medical device, NOT clinical decision support, NOT a diagnosis.
- For personal use only: to help you prepare your own conversation with your physician about your own scan.
- Not for third parties, not for treaters, not for other patients.
- LLMs hallucinate. Every claim ships with a confidence value and a reason; read both critically. None of it replaces a qualified medical opinion.
- No therapy recommendations, no health claims. Output is conversation material for your physician / physiotherapist / sports-medicine consultation.
- For acute symptoms, sudden pain, or loss of function, see a physician directly instead of using this tool.
- GDPR Art. 9 (special-category health data): the tool is designed for individual self-use on a local machine. Institutional or commercial use requires separate legal review (MDR / IVDR, advertising-of-health-products law, data-processing agreements).
By using the tool you accept these conditions.
Inputs:
- A radiology report as image (PNG / JPG). PDF support is on the roadmap.
- Optional: a DICOM export (folder of `.dcm` files or a `DICOMDIR` tree). With DICOM → image-cross-check mode. Without DICOM → report-only mode.
- Optional: a `context.yml` with brief clinical context (region, symptoms, trauma history, prior surgeries, your own questions).
Local processing pipeline:
- Anonymization: DICOM tags stripped via allowlist (so unknown / private tags fail safe), UIDs rehashed with a project-local salt that never leaves your machine, report image redacted (top header zone + PHI-pattern matches).
- Fail-closed audit gate: no external LLM call happens unless `audit-report.json` is `green`.
- Slice extraction: DICOM volumes → PNG slices, windowed, at 1024 px.
- Report OCR: Tesseract, German by default, structured by report sections.
- Vision pass: Claude vision on anonymized slices, multi-sample consensus (n = 3 in V2).
- Self-critique pass: the same model re-reads its own observations adversarially, looking for anatomical-location, sequence-signal-behavior, and classification-stage errors (the three most common radiology-hallucination axes).
- Cross-check: every claim from the original report is matched against the model's own image observations. Disagreements are documented, not iterated away.
- Multi-role review: 3-5 specialist personas (radiologist, orthopedist, physiotherapist, sports medicine, pain therapy, nutrition-and-sports, region specialists) iterate the case until a score threshold (Full ≥ 17 / 20, Report-only ≥ 18 / 20) or a plateau is reached.
- Synthesis: final report in Markdown with confidence bands per claim, a "what the tool cannot judge" section, escalation triggers, and concrete next-step suggestions for the conversation with your physician.
- PDF output: with the disclaimer footer on every page.
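The fail-closed audit gate in the list above can be pictured as a check that treats anything other than an explicit green as a stop. This is a minimal sketch, not the tool's code: the function name `audit_is_green` and the top-level `status` key are assumptions about how `audit-report.json` might be laid out.

```python
import json
from pathlib import Path


def audit_is_green(report_path: Path) -> bool:
    """Return True only if the audit report exists, parses, and says green.

    Assumption: the report is a JSON object with a top-level "status" key.
    A missing file, a parse error, or any other value counts as not green.
    """
    try:
        report = json.loads(report_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # fail closed: an unreadable report blocks the LLM call
    return isinstance(report, dict) and report.get("status") == "green"
```

The point of the design is the default: the caller has to earn a `True`, so a crash or a half-written report can never be mistaken for permission to send data out.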
Outputs: data/output/<case>/zweitmeinung.{md,pdf}. The Markdown is your working copy; the PDF is shareable with the physician who will give you the actual second opinion.
The single most important property of this tool: no patient-identifying data leaves your machine.
- DICOM anonymization is allowlist-based (a small set of clinically necessary tags is kept; everything else is dropped). This is more robust than a blocklist because new or vendor-private tags fall out by default.
- UID rehashing uses HMAC-SHA256 with a salt stored in `data/.salt`. The salt never leaves your machine. Same study → same anonymized UIDs across re-runs; different studies → different UIDs.
- Report image redaction masks the header zone (where most clinics put patient name / date of birth / case number) and any text matching configurable PHI patterns.
- Audit gate runs after anonymization. It re-reads every file in `data/anon/<case>/` and checks for residual identifiers. If anything looks wrong, status is `red` and no LLM call happens.
- `data/` is gitignored (and contains a `.gitignore` of its own as belt-and-suspenders).
- What goes to the cloud: anonymized DICOM slices (PNG) and the OCR text of the anonymized report. No patient name, no date of birth, no case ID, no hospital identifier, no original DICOM headers.
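The UID rehashing described above (HMAC-SHA256 keyed with a local salt) could look roughly like the following. The function name `rehash_uid` and the output format are assumptions: the sketch emits a `2.25.`-rooted UID built from 128 bits of the MAC, which stays within DICOM's 64-character UID limit, but the tool may format its UIDs differently.

```python
import hashlib
import hmac


def rehash_uid(original_uid: str, salt: bytes) -> str:
    """Map a DICOM UID to a stable pseudonymous UID using HMAC-SHA256.

    Assumption: the result is "2.25." followed by a 128-bit integer in
    decimal (at most 44 characters in total).
    """
    digest = hmac.new(salt, original_uid.encode("ascii"), hashlib.sha256).digest()
    value = int.from_bytes(digest[:16], "big")  # keep 128 bits of the MAC
    return f"2.25.{value}"
```

Because the mapping is keyed, the same study yields the same UIDs on every re-run with the same salt, while someone without the salt cannot recompute the mapping from a known original UID. In this sketch the salt is passed in as bytes; reading it from `data/.salt` is left to the caller.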
If you want zero cloud at all, wait for the roadmap item V8 (fully-local pipeline) or open a pull request (see Roadmap).
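To illustrate why the allowlist approach described in this section fails safe, here is a small sketch that filters a header represented as a plain dictionary. The keywords in `ALLOWED_KEYWORDS` are real DICOM attribute names, but the selection is hypothetical and the tool's actual allowlist may differ; `filter_header` is an illustrative name, not part of the codebase.

```python
# Hypothetical allowlist; the tool's real set of retained tags may differ.
ALLOWED_KEYWORDS = frozenset({
    "Modality",
    "BodyPartExamined",
    "PixelSpacing",
    "SliceThickness",
    "Rows",
    "Columns",
})


def filter_header(header: dict[str, object]) -> dict[str, object]:
    """Keep only allowlisted elements; everything else is dropped by default."""
    return {key: value for key, value in header.items() if key in ALLOWED_KEYWORDS}


header = {
    "Modality": "MR",
    "SliceThickness": 3.0,
    "PatientName": "DOE^JANE",        # known identifier
    "VendorPrivateNote": "bed 12",    # tag nobody anticipated
}
print(filter_header(header))  # {'Modality': 'MR', 'SliceThickness': 3.0}
```

A blocklist would have needed an entry for `VendorPrivateNote` to catch it; the allowlist drops it without anyone having to know it exists.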
| Inputs | Mode | What it does |
|---|---|---|
| Report + DICOM | Full | Image-aware second opinion with cross-check between original report and model's own image findings |
| Report only | Report-only | Re-reads and re-interprets the report, internal consistency check |
| DICOM only | Image-only | Pure image-based second opinion, no original report. Auto-detected when `dicom/` exists but no `befund.png` / `befund.jpg`. Phases 2 (claim extraction) and 5 (cross-check) are skipped; phases 3 (vision), 3.5 (self-critique), 6/7 (multi-role), 8/9 (synthesis + PDF) run as usual. |
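The mode table above amounts to a small decision on which inputs are present. A minimal sketch, assuming the file names from this README; the function name `detect_mode` and the returned mode strings are illustrative, not the tool's API.

```python
from pathlib import Path


def detect_mode(case_dir: Path) -> str:
    """Pick the pipeline mode from the inputs found in a case folder."""
    has_report = any(
        (case_dir / name).is_file() for name in ("befund.png", "befund.jpg")
    )
    has_dicom = (case_dir / "dicom").is_dir()
    if has_report and has_dicom:
        return "full"
    if has_report:
        return "report-only"
    if has_dicom:
        return "image-only"
    raise FileNotFoundError(f"no report image or dicom/ folder in {case_dir}")
```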
- Python 3.13+ (tested on 3.14)
- Tesseract 5.x with the language pack(s) for your reports:
  - macOS: `brew install tesseract tesseract-lang`
  - Linux: `apt install tesseract-ocr tesseract-ocr-deu` (replace `deu` with the language code you need)
- Anthropic Claude access. Two options:
  - Anthropic Pro / Max subscription via the Claude Code CLI (recommended; no API key in this repo)
  - or your own `ANTHROPIC_API_KEY` (pay-per-call billing)
- ~15 GB of disk for `data/anon/<case>/` per typical MRI study (slices + intermediate outputs)
```bash
git clone https://github.com/josudia/2nddocopinion.git
cd 2nddocopinion
python3 -m venv venv
./venv/bin/pip install -e .
cp .env.example .env   # optional, only if you need to set API key / OCR language
```
A first sanity check:
```bash
./venv/bin/python -m pytest tests/ -q
```
You should see all tests pass.
Drop a case under input/<case-slug>/, then run the pipeline.
```
input/2026-05-08-shoulder-right-mri/
├── befund.png     # or befund.jpg: the radiology report as image
├── dicom/         # optional: folder of DICOM files (any depth)
└── context.yml    # optional: clinical context (otherwise interactive)
```
Slug convention: YYYY-MM-DD-region-side-modality (sortable, descriptive, unique).
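If you want to check a slug against this convention before creating the folder, a small parser is enough. This is a sketch under assumptions: `parse_slug` is a hypothetical helper, and it assumes region, side, and modality are each a single lowercase token without hyphens, which the README does not state explicitly.

```python
import re
from datetime import date

# Assumption: region, side, and modality contain no hyphens of their own.
SLUG_RE = re.compile(r"(\d{4}-\d{2}-\d{2})-([a-z0-9]+)-([a-z]+)-([a-z0-9]+)")


def parse_slug(slug: str) -> dict[str, str]:
    """Split a case slug into its parts, rejecting malformed input."""
    match = SLUG_RE.fullmatch(slug)
    if match is None:
        raise ValueError(f"slug does not match YYYY-MM-DD-region-side-modality: {slug!r}")
    day, region, side, modality = match.groups()
    date.fromisoformat(day)  # raises ValueError for impossible dates
    return {"date": day, "region": region, "side": side, "modality": modality}
```

For example, `parse_slug("2026-05-08-shoulder-right-mri")` returns the date `2026-05-08`, region `shoulder`, side `right`, and modality `mri`.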
A guided wizard for context.yml:
```bash
./venv/bin/python scripts/new_case.py
```
Then the four pre-LLM steps (anonymize → audit → extract → OCR):
```bash
./venv/bin/python scripts/prep_case.py --case 2026-05-08-shoulder-right-mri run-all-pre-llm
```
If `audit-report.json` is green, run the LLM pipeline:
```bash
./venv/bin/python scripts/run_doc.py --case 2026-05-08-shoulder-right-mri --n-samples 3
```
Defaults: 2 review rounds, score threshold 17 / 20 in Full mode, n = 1 vision sample (use `--n-samples 3` for V2-grade robustness).
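The stopping behaviour of the multi-role review (score threshold, plateau, round limit) can be sketched as a single predicate. Everything here is an assumption layered on the numbers in this README: `should_stop` is a hypothetical name, the default of 2 rounds is treated as an upper limit, and a plateau is taken to mean that the latest score did not improve on the previous one. No threshold is given for Image-only mode, so it is left out.

```python
# Thresholds out of 20, as stated in this README.
THRESHOLDS = {"full": 17, "report-only": 18}


def should_stop(scores: list[int], mode: str, max_rounds: int = 2) -> bool:
    """Decide whether the review loop should stop after the latest round.

    Assumptions: max_rounds is a hard cap, and a plateau means the newest
    score is no better than the one before it.
    """
    if not scores:
        return False  # nothing reviewed yet
    if scores[-1] >= THRESHOLDS[mode]:
        return True   # threshold reached
    if len(scores) >= max_rounds:
        return True   # round budget spent
    return len(scores) >= 2 and scores[-1] <= scores[-2]  # plateau
```

With the default cap of 2 the plateau check rarely matters; it becomes relevant only if the round limit is raised.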
Output lands in data/output/<case>/zweitmeinung.{md,pdf}.
This is a question many users will reasonably ask, so here is the empirical answer, not the marketing one.
In May 2026 we ran a sanity test on 39 curated German radiology statements (synthetic, no PHI; see `tests/fixtures/sanity_testset.json`) against four locally-runnable medical and general models, plus Anthropic Claude:
| Model | Accuracy | Sensitivity (catches errors) | Specificity (keeps correct claims) | Latency |
|---|---|---|---|---|
| Claude Opus 4.7 | 80 % | 94 % | 100 % | 6 s |
| Gemma 3 27B (local) | 69 % | 88 % | 81 % | 12 s |
| Mistral 7B base (local) | 72 % | 62 % | 94 % | 4 s |
| Gemma 4 8B reasoning (local) | 62 % | 81 % | 69 % | 2.5 s |
| BioMistral 7B Q4 (local) | 41 % | 6 % | 100 %* | 3 s |
* BioMistral hardly ever flags a statement as wrong, so its high specificity is an artefact.
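The footnote is easier to see with the definitions written out. The sketch below uses toy labels, not the real test set, and `sensitivity_specificity` is an illustrative helper: a model that never flags anything scores perfect specificity while catching nothing.

```python
def sensitivity_specificity(
    truth_is_wrong: list[bool], flagged: list[bool]
) -> tuple[float, float]:
    """Sensitivity: share of wrong statements that were flagged.
    Specificity: share of correct statements that were left alone.

    Assumes both classes are present; otherwise a division by zero occurs.
    """
    pairs = list(zip(truth_is_wrong, flagged))
    tp = sum(t and f for t, f in pairs)
    fn = sum(t and not f for t, f in pairs)
    tn = sum(not t and not f for t, f in pairs)
    fp = sum(not t and f for t, f in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 4 wrong statements followed by 4 correct ones.
truth = [True] * 4 + [False] * 4
never_flags = [False] * 8
print(sensitivity_specificity(truth, never_flags))  # (0.0, 1.0)
```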
Three findings:
- BioMistral specifically is unusable on German radiology: sensitivity 6 %. Its medical English fine-tune appears to actively harm German output. We did not find a single locally-runnable German medical generation LLM in May 2026 (only encoders / NER / PII / translation models exist for `language: de + medical` on HuggingFace).
- No locally-runnable model we tested could serve as a useful cross-validator to Claude on German radiology. The best local model (Gemma 3 27B) introduces ~5 to 6 false alarms on correct statements for every Claude error it catches. Net cost-benefit is negative.
- The biggest hallucination risk in this pipeline is the vision pass, not the text pass. A text-only cross-validator cannot see the image. The tool addresses image hallucinations with self-critique passes that reason in the same model's image context (see V2 architecture).
Implication for English-speaking users: the medical-LLM landscape is English-first. BioMistral-7B was trained on English clinical text, and stronger options like Med-PaLM-2, Meditron-70B, and Asclepius exist. If you want to add a local English cross-validator:
- Run the tool with `OCR_LANG=eng` and install Tesseract's English data pack.
- Stand up an Ollama server with the medical model of your choice (`ollama pull hf.co/<repo>:Q4_K_M` works for HuggingFace GGUFs).
- Implement the `LLMProvider` interface that ships with V8.1 (see Roadmap); pull requests very welcome.
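Since V8.1 has not shipped, the shape of `LLMProvider` is not defined yet. The following is only a guess at what such an abstraction could look like, to give contributors something concrete to react to: the `complete` method, the `OllamaProvider` class, and `cross_validate` are all hypothetical. The HTTP call itself uses Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
import urllib.request
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface; the real V8.1 abstraction may look different."""

    def complete(self, prompt: str) -> str: ...


class OllamaProvider:
    """Sends prompts to a local Ollama server via POST /api/generate."""

    def __init__(self, model: str, base_url: str = "http://localhost:11434") -> None:
        self.model = model
        self.base_url = base_url.rstrip("/")

    def complete(self, prompt: str) -> str:
        payload = json.dumps(
            {"model": self.model, "prompt": prompt, "stream": False}
        ).encode("utf-8")
        request = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=payload,  # presence of data makes this a POST
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=120) as response:
            return json.loads(response.read())["response"]


def cross_validate(provider: LLMProvider, statement: str) -> str:
    """Ask any provider for a verdict on one report statement."""
    prompt = f"Is this radiology statement plausible? Answer yes or no.\n{statement}"
    return provider.complete(prompt)
```

Using a structural `Protocol` means a Claude-backed class and an Ollama-backed class would not need a shared base class; anything with a matching `complete` method can be passed to `cross_validate`.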
Until V8.1 lands, the tool is a single-provider (Claude) pipeline. Trade-offs are explicit; if your threat model requires zero data to leave your machine, see V8.
| Version | Status | What |
|---|---|---|
| V1 | shipped | Full + Report-only mode, single-sample vision pass, German radiology |
| V2 | shipped | Multi-sample consensus (n = 3), self-critique with anatomical / sequence / classification checks, clinical-context reach-through across all phases |
| V3 | shipped | Image-only mode (skip report-PDF requirement for DICOM-only cases), region-aware prompt header for multi-region support beyond shoulder, null-friendly clinical-context schema |
| V8.1 | planned | `LLMProvider` abstraction (Claude / Ollama / MLX); enables local-LLM cross-validation and English medical models |
| V8 | planned | Fully-local pipeline option (vision + text + validation, no cloud) |
V8 is the largest jump in scope, and quality is expected to drop somewhat; the trade-off is that no patient-derived data ever leaves the machine.
This is an open-source tool for the community. We do not sell it and do not monetize it.
Pull requests are especially welcome for:
- Additional report languages (English, French, Spanish, …): OCR config + prompt translations + a few example reports in `tests/fixtures/`.
- Local-LLM provider implementations (V8.1 scope).
- Additional medical specialist roles for the multi-role review (cardiology, neurology, …); see `prompts/role_*.md` for examples.
- Anonymization patterns for non-DICOM medical data formats (HL7, FHIR, plain PDFs without DICOM).
- External validation of the sanity test set by a radiologist; see `tests/fixtures/sanity_testset.json`.
Hard rules for PRs:
- No patient data in the codebase, PRs, issues, or any public surface, ever. Test fixtures are synthetic only.
- New pipeline phases need tests.
- The anonymization gate, the audit gate, and the disclaimers are not optional and not removable in production paths.
Code of conduct: be civil, criticize ideas, not people. The project is small and we'd like to keep it pleasant.
MIT; see LICENSE. Note the additional notice in LICENSE regarding medical use.
- Anthropic for `claude-agent-sdk` and the Claude models that power the vision and text passes.
- pydicom for the DICOM toolkit that makes safe anonymization possible.
- Tesseract OCR for the offline-capable OCR backbone.
- The early reviewers and case-providers who let us validate the pipeline on real-world MRIs, privacy-preserved, of course.