# MedExplain

An explainable AI system for medical diagnosis. It analyses patient records and CT scans through a dual-model convergence loop that produces diagnoses with full step-by-step reasoning trails — making every conclusion traceable from raw data to diagnosis.
Built for Dublin Hack Europe 2026.
- The Problem
- Our Solution: Explainable AI via Dual-Model Convergence
- Tech Stack
- Project Structure
- Setup
- Environment Variables
- API Endpoints
- AI Engine — Model Details
- Database Schema
- Data
## The Problem

AI in medicine has a trust problem. Deep learning models can match or exceed human radiologists at detecting abnormalities — but they can't explain why they reached a conclusion. A clinician presented with "92% chance of malignancy" and no reasoning has no way to verify the finding, catch errors, or build trust in the system.
Regulatory bodies (FDA, MHRA, EU AI Act) increasingly require that high-risk AI systems provide meaningful explanations for their outputs. Black-box models don't meet this bar.
MedExplain addresses this directly: every diagnosis comes with a complete, traceable chain of logical reasoning that a clinician can audit step by step.
## Our Solution: Explainable AI via Dual-Model Convergence

A single model giving you a diagnosis is an opinion. Two models with fundamentally different reasoning strategies arriving at the same diagnosis is evidence.
MedExplain runs two AI models against the same patient data simultaneously:
| | Conclusions Model | Iterative Model |
|---|---|---|
| Strategy | "Shotgun" — find everything at once | "First principles" — one step at a time |
| Temperature | 0.7 (high entropy, creative) | 0.1 (low entropy, cautious) |
| Bias | Favours false positives over false negatives | Favours precision — won't conclude without evidence |
| Output | ALL possible findings immediately | Single next logical step only |
| When it can diagnose | Always — it guesses aggressively | Only when every prerequisite logical step exists |
The conclusions model catches things the iterative model hasn't reached yet. The iterative model prevents the conclusions model from hallucinating. Together they're stronger than either alone.
The two models don't just run once — they run in a convergence loop:
- Both models analyse the patient data (images + records + any reasoning built so far) in parallel.
- A comparison model checks whether their final conclusions match.
- If they match → the system has converged. We trust the result.
- If they don't match →
- The iterative model's latest logical step is added to a growing reasoning chain.
- The specific image regions where conclusions differ are flagged as "to be double-checked".
- Both models re-run with the updated reasoning chain, and the iterative model is explicitly instructed to examine the flagged regions.
- Repeat until convergence or a maximum of 5 cycles.
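The control flow above can be sketched in a few lines of Node.js. This is a simplified sketch, not the actual `orchestrator.js`: the model calls (`runConclusions`, `runIterative`, `compare`) are hypothetical stand-ins passed in as stubs, and the real system's image handling, database writes, and error paths are omitted.

```javascript
// Simplified sketch of the convergence loop. Model calls are injected so
// they can be stubbed; the real system calls Claude for each role.
const MAX_ITERATIONS = 5;

async function runConvergenceLoop(patientData, models) {
  const reasoningChain = []; // grows by one step per non-converged cycle
  let flaggedRegions = [];   // regions the models disagreed on last cycle

  for (let cycle = 1; cycle <= MAX_ITERATIONS; cycle++) {
    // Both models see the same data + chain; only the iterative model
    // receives the flagged regions.
    const [conclusions, step] = await Promise.all([
      models.runConclusions(patientData, reasoningChain),
      models.runIterative(patientData, reasoningChain, flaggedRegions),
    ]);

    const { match, disagreements } = await models.compare(conclusions, step);
    if (match) {
      return { status: "converged", totalCycles: cycle, reasoningChain, conclusions };
    }

    reasoningChain.push(step);      // keep the iterative model's new deduction
    flaggedRegions = disagreements; // re-examine these regions next cycle
  }

  // Never converged: hand off to a clinician with the disagreements visible.
  return {
    status: "human_review",
    totalCycles: MAX_ITERATIONS,
    reasoningChain,
    unresolved: flaggedRegions,
  };
}
```

Note that both exit paths return the accumulated reasoning chain, so even a `human_review` case carries a partial explanation.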
If 5 cycles pass without convergence, the case exits as human_review with the disagreement areas explicitly flagged. This isn't a failure — it's the system honestly saying "I'm not sure about these regions, a clinician should look here." Honest uncertainty is more valuable than false confidence.
The iterative model's reasoning chain IS the explanation. Each cycle, it outputs one logical step:
```
Cycle 1: "Patient has a 20 pack-year smoking history. Baseline risk for pulmonary malignancy is elevated."
Cycle 2: "Prior CT (12 Jan) showed a 2.4cm RUL nodule. Current scan shows 2.8cm — 4mm interval growth in 6 weeks."
Cycle 3: "Growth rate exceeds threshold for benign nodule. Spiculated margins visible. Both features raise suspicion for malignancy."
FINAL:   "Suspicious RUL mass with interval growth — recommend tissue sampling / PET-CT."
```
Every conclusion has a traceable path from raw data to diagnosis. A summary model then writes this chain as a plain-English narrative for clinicians. A programmatic CSV audit trail is also generated — one row per logical step — so the frontend can render it as a table and external systems can consume it directly.
```
                     INPUT
      Patient records (FHIR) + DICOM scans
                       │
                       ▼
┌──────────────────────────────────────────────┐
│  FEEDBACK LOOP (max 5 cycles)                │
│                                              │
│  ┌─────────────────┐    ┌─────────────────┐  │
│  │ CONCLUSIONS     │    │ ITERATIVE       │  │
│  │ MODEL (0.7)     │    │ MODEL (0.1)     │  │
│  │                 │    │                 │  │
│  │ All findings    │    │ One logical     │  │
│  │ at once         │    │ step at a time  │  │
│  └────────┬────────┘    └────────┬────────┘  │
│           └──────────┬───────────┘           │
│                      ▼                       │
│             ┌──────────────────┐             │
│             │ COMPARISON       │             │
│             │ MODEL (0.0)      │             │
│             └────────┬─────────┘             │
│                      │                       │
│            ┌─────────┴──────────┐            │
│            ▼                    ▼            │
│       ✗ MISMATCH            ✓ MATCH          │
│       Add step to           CONVERGED        │
│       reasoning chain       Exit loop ───────┼──► PHASE 3
│       Flag regions                           │
│       Loop again ────────────────────────────┘
│       (or exit as human_review after 5 cycles)
│
▼
PHASE 3: OUTPUT
  SUMMARY MODEL (0.3)  → plain-English narrative
  generateAuditCSV()   → programmatic CSV audit trail (no AI)
  API response         → findings + bounding boxes + reasoning + summary + csv_audit
```
- Honest uncertainty over false confidence. When models can't agree, the system says so explicitly and flags the specific image regions for a clinician — rather than picking one answer and hiding the disagreement.
- False positives are cheaper than false negatives. The conclusions model is deliberately biased towards flagging everything. A false positive caught by the iterative model costs nothing. A false negative costs a patient.
- Every exit path is useful. Converged case → full diagnosis. Max-iterations case → partial diagnosis with explicit disagreement areas. Error → whatever was collected so far is preserved in the database.
- The reasoning IS the product. The step-by-step logical chain is not a side effect — it's the core deliverable. The diagnoses are just conclusions drawn from it.
## Tech Stack

| Layer | Technology |
|---|---|
| Backend / AI engine | Node.js + Express |
| AI models | Claude Sonnet (claude-sonnet-4-5-20250514) via @anthropic-ai/sdk |
| Database | SQLite via sql.js (pure JavaScript, no native bindings) |
| Medical imaging | dicom-parser + sharp (pure JS DICOM → PNG, no Python required) |
| Patient records | FHIR R4 (Synthea synthetic data) |
| Frontend | React + TypeScript + Vite + Tailwind + Framer Motion |
## Project Structure

```
MedExplain/
│
├── server/                     # Node.js AI engine + Express backend
│   ├── index.js                # Express entry point — initialises DB then starts server
│   │
│   ├── engine/
│   │   ├── orchestrator.js     # Main feedback loop + generateAuditCSV()
│   │   ├── iterativeModel.js   # Step-by-step model (temp 0.1)
│   │   ├── conclusionsModel.js # Comprehensive model (temp 0.7)
│   │   ├── comparisonModel.js  # Deterministic comparator (temp 0.0)
│   │   └── summaryModel.js     # Plain-English narrative model (temp 0.3)
│   │
│   ├── db/
│   │   ├── schema.sql          # Full DB schema
│   │   ├── connection.js       # sql.js wrapper — loads/saves DB to disk
│   │   └── queries.js          # All DB query functions
│   │
│   ├── routes/
│   │   ├── patients.js         # GET /api/patients, /api/patients/:id/cases
│   │   ├── cases.js            # POST /run, GET /status, GET /results, GET /audit.csv
│   │   └── results.js          # GET /api/scans/:caseId + DICOM image serving
│   │
│   ├── prompts/
│   │   ├── iterative.txt       # System prompt — cautious step-by-step radiologist
│   │   ├── conclusions.txt     # System prompt — aggressive comprehensive reviewer
│   │   ├── comparison.txt      # System prompt — semantic logic comparator
│   │   └── summary.txt         # System prompt — plain-English report writer
│   │
│   └── utils/
│       ├── imageLoader.js      # File path → base64 (handles PNG + DCM, Windows paths)
│       ├── dicomConverter.js   # Pure-JS DICOM → PNG via dicom-parser + sharp
│       ├── knowledgeLoader.js  # Loads PDFs from data/knowledge/ once, caches in memory
│       └── logger.js           # Timestamped cycle-by-cycle logging
│
├── client/                     # React frontend (Vite + Tailwind + Framer Motion)
│   ├── src/
│   │   ├── components/         # MindMapCanvas, ImageViewer, BoundingBox, etc.
│   │   ├── hooks/              # useCases, usePatients, useResults, useScans
│   │   ├── api/                # Axios API client
│   │   └── types/              # TypeScript interfaces
│   ├── package.json
│   └── vite.config.ts
│
├── scripts/
│   └── seed.js                 # Seed DB from FHIR patient.json + DICOM scans
│
├── data/
│   ├── medical_demo.db         # SQLite database (auto-created on first run)
│   ├── fhir/                   # Synthetic FHIR R4 patient JSON files (Synthea)
│   ├── scans/                  # DICOM scan files (.dcm), organised by visit
│   │   ├── visit_1/
│   │   └── visit_2/
│   └── knowledge/              # Reference material for model context (radiology guidelines, etc.)
│
├── package.json                # Node.js dependencies
└── .env                        # API key + config (gitignored)
```
## Setup

Install dependencies:

```bash
npm install
```

Create a `.env` file in the project root:

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
DB_PATH=./data/medical_demo.db
MAX_ITERATIONS=5
PORT=3001
```
Seed the database — this loads the patient from `data/fhir/patient.json` and all DICOM scans from `data/scans/`:

```bash
node scripts/seed.js
```

This is idempotent — if the database already has data, it exits immediately without re-seeding.
Start the backend:

```bash
npm start
```

The server initialises the database schema on startup, then listens on `http://localhost:3001`.
Start the frontend:

```bash
cd client
npm install
npm run dev
```

The frontend runs at `http://localhost:5173` and proxies API requests to the backend at `http://localhost:3001`.
Frontend features:
- Select patient and case from dropdowns
- View CT scans with DICOM → PNG conversion
- Mind-map style display linking findings to conclusion nodes
- Animated region timeline showing changes over time
- Colour-coded severity: green (low), amber (moderate), red (critical), purple (human review)
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | — | Your Anthropic API key (required) |
| `DB_PATH` | `./data/medical_demo.db` | Path to the SQLite database |
| `MAX_ITERATIONS` | `5` | Maximum feedback-loop cycles before forcing a `human_review` exit |
| `PORT` | `3001` | Express server port |
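A minimal sketch of how these variables might be read with their defaults. The `loadConfig` helper is hypothetical, not the project's actual startup code:

```javascript
// Hypothetical config loader mirroring the defaults from the table above.
function loadConfig(env = process.env) {
  if (!env.ANTHROPIC_API_KEY) {
    throw new Error("ANTHROPIC_API_KEY is required"); // no sensible default for a secret
  }
  return {
    apiKey: env.ANTHROPIC_API_KEY,
    dbPath: env.DB_PATH || "./data/medical_demo.db",
    maxIterations: parseInt(env.MAX_ITERATIONS || "5", 10),
    port: parseInt(env.PORT || "3001", 10),
  };
}
```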
## API Endpoints

All endpoints are served by the Node.js backend on port 3001.
### Cases

| Method | Route | Description |
|---|---|---|
| `POST` | `/api/cases/:caseId/run` | Trigger the AI feedback loop. Returns immediately; analysis runs in the background. |
| `GET` | `/api/cases/:caseId/status` | Poll for current status and cycle count while running. |
| `GET` | `/api/cases/:caseId/results` | Full results once analysis is complete. |
| `GET` | `/api/cases/:caseId/audit.csv` | Download the full reasoning audit trail as a CSV file. |
`POST /api/cases/:caseId/run`

```json
{ "status": "running", "caseId": 1 }
```

`GET /api/cases/:caseId/status`

```json
{ "status": "running", "currentCycle": 2, "maxCycles": 5 }
```

Status values: `pending` | `running` | `converged` | `human_review` | `error`
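Since the run endpoint returns immediately, a client has to poll the status endpoint until a terminal state. A hypothetical polling helper — `fetchJson` is injected so the transport can be swapped or stubbed:

```javascript
// Hypothetical polling helper: calls GET /api/cases/:caseId/status until the
// case reaches a terminal state, waiting intervalMs between polls.
const TERMINAL = new Set(["converged", "human_review", "error"]);

async function pollUntilDone(caseId, fetchJson, { intervalMs = 2000, maxPolls = 150 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const status = await fetchJson(`/api/cases/${caseId}/status`);
    if (TERMINAL.has(status.status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`case ${caseId} did not finish within ${maxPolls} polls`);
}
```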
`GET /api/cases/:caseId/results`

```json
{
  "status": "converged",
  "totalCycles": 2,
  "findings": [
    {
      "id": "F1",
      "label": "Right upper lobe mass — suspected malignancy",
      "severity": "critical",
      "confidence": 0.94,
      "bbox": { "x_pct": 53, "y_pct": 27, "w_pct": 16, "h_pct": 14 },
      "anatomical_region": "Right upper lobe, anterior segment",
      "models_agreed": true,
      "converged_cycle": 2
    }
  ],
  "unresolved": [],
  "reasoning": [
    {
      "step": 1,
      "cycle": 1,
      "deduction": "Patient has 20 pack-year smoking history — elevated baseline risk",
      "reasoning": "Extracted from medical_history field"
    }
  ],
  "summary": "Review of the CT imaging for [patient] demonstrates a 2.8cm right upper lobe mass...",
  "csv_audit": "step_number,region_of_concern,bbox,deduction,reasoning,flagged_by_comparison,final_diagnosis,severity,confidence,models_agreed\n..."
}
```

Findings with `models_agreed: false` or in the `unresolved` array carry `severity: "human_review"` and should be highlighted for clinician attention.
`GET /api/cases/:caseId/audit.csv`

Downloads a structured CSV audit trail:

```
Content-Type: text/csv
Content-Disposition: attachment; filename="case_1_audit.csv"
```

Columns: `step_number`, `region_of_concern`, `bbox`, `deduction`, `reasoning`, `flagged_by_comparison`, `final_diagnosis`, `severity`, `confidence`, `models_agreed`
### Patients

| Method | Route | Description |
|---|---|---|
| `GET` | `/api/patients` | List all patients |
| `GET` | `/api/patients/:id/cases` | List all cases for a patient |
### Scans

| Method | Route | Description |
|---|---|---|
| `GET` | `/api/scans/:caseId` | List all scans for a case with public image URLs |
| `GET` | `/api/scans/image/:scanId` | Serve a DICOM scan converted to PNG |
| `GET` | `/scans/*` | Static image files served directly |
### Health

| Method | Route | Description |
|---|---|---|
| `GET` | `/api/health` | Returns `{ "status": "ok", "timestamp": "..." }` |
## AI Engine — Model Details

All four roles use `claude-sonnet-4-5-20250514` via the Anthropic SDK. They differ only in system prompt and temperature.
| Role | File | Temperature | Max tokens | Purpose |
|---|---|---|---|---|
| Iterative | `iterativeModel.js` | 0.1 | 2000 | One careful logical step per call. Receives patient data + all images + logical chain + flagged regions. Cannot declare a final conclusion without prerequisite steps. |
| Conclusions | `conclusionsModel.js` | 0.7 | 3000 | All possible findings at once. Receives patient data + all images + logical chain (as context only). Does NOT receive flagged regions — operates independently. |
| Comparison | `comparisonModel.js` | 0.0 | 1500 | Deterministic semantic matching. Text only — no images. Compares both model outputs and flags disagreement regions. |
| Summary | `summaryModel.js` | 0.3 | 2000 | Writes a plain-English narrative explaining how each diagnosis was reached. Text only — no images. |
The CSV audit trail is generated programmatically by `generateAuditCSV()` in `orchestrator.js` — it reads from the accumulated logical steps, not from an AI model.
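The idea behind a programmatic CSV trail can be sketched as follows. This is a minimal illustration with RFC 4180-style quoting, not the actual `generateAuditCSV()` implementation; the column list is taken from the audit endpoint above.

```javascript
// Minimal sketch: one CSV row per accumulated logical step, no AI involved.
const COLUMNS = [
  "step_number", "region_of_concern", "bbox", "deduction", "reasoning",
  "flagged_by_comparison", "final_diagnosis", "severity", "confidence", "models_agreed",
];

// Quote any field containing commas, quotes, or newlines (RFC 4180 style).
function csvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function stepsToCSV(steps) {
  const header = COLUMNS.join(",");
  const rows = steps.map((step) => COLUMNS.map((c) => csvField(step[c])).join(","));
  return [header, ...rows].join("\n");
}
```

Because the rows are derived directly from stored steps, the CSV is reproducible from the database alone.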
Both the iterative and conclusions models receive, every cycle, in this order:
- Knowledge base PDFs — all PDFs from `data/knowledge/`, sent as `document` content blocks (loaded once at startup, cached for the server lifetime)
- Prior scan images — base64 PNG, oldest first, each preceded by a label with scan type, date, and DICOM metadata
- Latest scan image — base64 PNG with label
- Text prompt — patient demographics, full medical history, imaging context, accumulated logical steps, and the instruction
Only the iterative model additionally receives flagged regions (image bounding boxes where the previous cycle's comparison found disagreement).
The comparison and summary models receive text only — no images, no PDFs.
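The per-cycle content ordering can be sketched as a pure assembly function. The base64 `document` and `image` block shapes follow the Anthropic Messages API; the `buildContent` helper itself and its label format are hypothetical:

```javascript
// Hypothetical assembly of the per-cycle message content, in the order
// described above: PDFs, prior scans (oldest first), latest scan, text prompt.
function buildContent({ pdfs, priorScans, latestScan, promptText }) {
  const content = [];

  for (const pdf of pdfs) {
    content.push({
      type: "document",
      source: { type: "base64", media_type: "application/pdf", data: pdf.base64 },
    });
  }

  for (const scan of priorScans) {
    // Each image is preceded by a text label carrying its metadata.
    content.push({ type: "text", text: `Prior scan: ${scan.scan_type}, ${scan.date}` });
    content.push({
      type: "image",
      source: { type: "base64", media_type: "image/png", data: scan.base64 },
    });
  }

  content.push({ type: "text", text: `Latest scan: ${latestScan.scan_type}, ${latestScan.date}` });
  content.push({
    type: "image",
    source: { type: "base64", media_type: "image/png", data: latestScan.base64 },
  });

  content.push({ type: "text", text: promptText });
  return content;
}
```

Keeping the assembly pure makes the ordering guarantee (PDFs before images, prompt last) trivial to unit-test.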
All models output findings with percentage-based bounding boxes:
```json
"bbox": { "x_pct": 53, "y_pct": 27, "w_pct": 16, "h_pct": 14 }
```

The frontend overlays these absolutely on the image:

```css
.finding-box {
  position: absolute;
  left: 53%;    /* x_pct */
  top: 27%;     /* y_pct */
  width: 16%;   /* w_pct */
  height: 14%;  /* h_pct */
}
```

| Severity | Colour (suggested) | Meaning |
|---|---|---|
| `critical` | Red | Immediate action required |
| `moderate` | Amber | Follow-up or monitoring required |
| `low` | Green | Incidental, stable, likely benign |
| `human_review` | Purple (dashed border) | Models disagreed — clinician must check |
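When percentage-based CSS positioning isn't an option (e.g. drawing on a canvas), the same bbox can be converted to pixel coordinates. A hypothetical helper mirroring the CSS overlay:

```javascript
// Convert a percentage-based bounding box to pixel coordinates for a
// rendered image of the given size.
function bboxToPixels(bbox, imageWidth, imageHeight) {
  return {
    left: Math.round((bbox.x_pct / 100) * imageWidth),
    top: Math.round((bbox.y_pct / 100) * imageHeight),
    width: Math.round((bbox.w_pct / 100) * imageWidth),
    height: Math.round((bbox.h_pct / 100) * imageHeight),
  };
}

// Example: the bbox above on a 512×512 render
// → { left: 271, top: 138, width: 82, height: 72 }
```

Percentage coordinates keep findings resolution-independent: the same bbox is valid whatever size the scan is rendered at.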
## Database Schema

Defined in `server/db/schema.sql`. Applied automatically on server startup via `initDb()`.
### Patient

| Column | Type | Description |
|---|---|---|
| `id` | `INTEGER PK` | Auto-increment |
| `name` | `TEXT NOT NULL` | Full name |
| `dob` | `TEXT NOT NULL` | Date of birth |
| `nhs_number` | `TEXT UNIQUE NOT NULL` | NHS identifier |
| `medical_history` | `TEXT` | JSON blob — conditions, medications, procedures, observations |
| `contact_details` | `TEXT` | JSON — phone, email, address |
### Cases

| Column | Type | Description |
|---|---|---|
| `id` | `INTEGER PK` | Auto-increment |
| `patient_id` | `INTEGER` | → `Patient.id` |
| `status` | `TEXT` | `pending` \| `running` \| `converged` \| `human_review` \| `error` |
| `created_at` | `TEXT` | Timestamp |
| `completed_at` | `TEXT` | Set on completion |
| `total_cycles` | `INTEGER` | Number of feedback-loop cycles run |
| `final_output` | `TEXT` | JSON — full results payload including findings, reasoning, summary, csv_audit, unresolved |
### Scans

| Column | Type | Description |
|---|---|---|
| `id` | `INTEGER PK` | Auto-increment |
| `case_id` | `INTEGER` | → `Cases.id` |
| `scan_type` | `TEXT NOT NULL` | e.g. CT, MRI, X-Ray |
| `date` | `TEXT NOT NULL` | Date of scan |
| `images_file` | `TEXT NOT NULL` | Path to image file (`.png` or `.dcm`) |
| `meta_data` | `TEXT` | JSON — DICOM headers (modality, slice thickness, window/level) |
| `is_latest` | `INTEGER` | 1 = the scan being analysed this case; 0 = prior context |
### Iteration_steps

One row per feedback-loop cycle. Builds the logical reasoning chain.

| Column | Type | Description |
|---|---|---|
| `id` | `INTEGER PK` | Auto-increment |
| `case_id` | `INTEGER` | → `Cases.id` |
| `step_number` | `INTEGER` | Cycle number |
| `new_deduction` | `TEXT` | The single new logical conclusion this cycle |
| `new_reasoning` | `TEXT` | JSON — reasoning behind this deduction |
| `cumulative_deductions` | `TEXT` | JSON array — all deductions up to this cycle |
| `cumulative_reasoning` | `TEXT` | JSON array — all reasoning up to this cycle |
| `flagged_regions` | `TEXT` | JSON — bounding boxes flagged for re-examination (written after comparison runs) |
| `created_at` | `TEXT` | Timestamp |
### Conclusion_runs

One row per feedback-loop cycle. Stores the conclusions model's full output.

| Column | Type | Description |
|---|---|---|
| `id` | `INTEGER PK` | Auto-increment |
| `case_id` | `INTEGER` | → `Cases.id` |
| `run_number` | `INTEGER` | Cycle number (matches `step_number`) |
| `triggered_by_iteration_id` | `INTEGER` | → `Iteration_steps.id` for the same cycle |
| `output_text` | `TEXT` | JSON — full conclusions array with severity, confidence, bounding boxes |
| `created_at` | `TEXT` | Timestamp |
```
Patient
  └──< Cases
        ├──< Scans
        ├──< Iteration_steps
        └──< Conclusion_runs
              └── triggered_by → Iteration_steps
```
## Data

### Patient records

- Synthetic patient records generated by Synthea
- Format: FHIR R4 JSON bundles
- Location: `data/fhir/patient.json`
- Contains: demographics, conditions, medications, procedures, observations, social history

### Scans

- Format: `.dcm` (DICOM)
- Location: `data/scans/visit_1/`, `data/scans/visit_2/`
- Dataset: RIDER Lung CT (two visits, ~5 weeks apart) — suitable for interval change analysis
- Conversion: `server/utils/dicomConverter.js` converts DICOM → PNG using `dicom-parser` + `sharp` (pure JavaScript, no Python required)
- The scan date, type, and DICOM metadata are extracted at seed time and stored in the `Scans` table, giving the AI models full imaging context alongside the visual data

### Knowledge base

- Location: `data/knowledge/`
- Current contents: `textbook1.pdf`, `textbook2.pdf` (medical reference textbooks)
- All PDFs in this folder are loaded on server startup by `server/utils/knowledgeLoader.js`, base64-encoded, and cached in memory for the lifetime of the process — no re-reading from disk on each API call
- Sent as `document` content blocks to the iterative and conclusions models on every call, placed before the scan images in the message content
- Both models are instructed to cross-reference their findings against the textbooks and cite specific guidelines in their reasoning, making deductions verifiable against source material
- To add more reference material: drop additional PDFs into `data/knowledge/` and restart the server
- If the combined PDF size exceeds 50MB, the server logs a warning at startup