BeaHack/MedExplain

MedExplain

An explainable AI system for medical diagnosis. It analyses patient records and CT scans through a dual-model convergence loop that produces diagnoses with full step-by-step reasoning trails — making every conclusion traceable from raw data to diagnosis.

Built for Dublin Hack Europe 2026.


The Problem

AI in medicine has a trust problem. Deep learning models can match or exceed human radiologists at detecting abnormalities — but they can't explain why they reached a conclusion. A clinician presented with "92% chance of malignancy" and no reasoning has no way to verify the finding, catch errors, or build trust in the system.

Regulatory bodies (FDA, MHRA, EU AI Act) increasingly require that high-risk AI systems provide meaningful explanations for their outputs. Black-box models don't meet this bar.

MedExplain addresses this directly: every diagnosis comes with a complete, traceable chain of logical reasoning that a clinician can audit step by step.


Our Solution: Explainable AI via Dual-Model Convergence

Why Two Models?

A single model giving you a diagnosis is an opinion. Two models with fundamentally different reasoning strategies arriving at the same diagnosis is evidence.

MedExplain runs two AI models against the same patient data simultaneously:

|                      | Conclusions Model                            | Iterative Model                                     |
|----------------------|----------------------------------------------|-----------------------------------------------------|
| Strategy             | "Shotgun" — find everything at once          | "First principles" — one step at a time             |
| Temperature          | 0.7 (high entropy, creative)                 | 0.1 (low entropy, cautious)                         |
| Bias                 | Favours false positives over false negatives | Favours precision — won't conclude without evidence |
| Output               | ALL possible findings immediately            | Single next logical step only                       |
| When it can diagnose | Always — it guesses aggressively             | Only when every prerequisite logical step exists    |

The conclusions model catches things the iterative model hasn't reached yet. The iterative model prevents the conclusions model from hallucinating. Together they're stronger than either alone.

The Feedback Loop

The two models don't just run once — they run in a convergence loop:

  1. Both models analyse the patient data (images + records + any reasoning built so far) in parallel.
  2. A comparison model checks whether their final conclusions match.
  3. If they match → the system has converged. We trust the result.
  4. If they don't match →
    • The iterative model's latest logical step is added to a growing reasoning chain.
    • The specific image regions where conclusions differ are flagged as "to be double-checked".
    • Both models re-run with the updated reasoning chain, and the iterative model is explicitly instructed to examine the flagged regions.
  5. Repeat until convergence or a maximum of 5 cycles.

If 5 cycles pass without convergence, the case exits as human_review with the disagreement areas explicitly flagged. This isn't a failure — it's the system honestly saying "I'm not sure about these regions, a clinician should look here." Honest uncertainty is more valuable than false confidence.
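The control flow above can be sketched as a small loop. This is a minimal illustration, not the real orchestrator: the model calls are injected as plain async functions, and names like `runConclusions` and `disputedRegions` are assumptions for readability — the actual code lives in `server/engine/orchestrator.js`.

```javascript
// Minimal sketch of the convergence loop. Model calls are injected so the
// control flow is visible without any API access; real signatures may differ.
async function runConvergenceLoop({ runConclusions, runIterative, compare }, maxCycles = 5) {
  const reasoningChain = []; // grows by one logical step per cycle
  let flaggedRegions = [];   // bounding boxes the comparison disputed last cycle

  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    // 1. Both models analyse the patient data in parallel.
    const [conclusions, step] = await Promise.all([
      runConclusions(reasoningChain),
      runIterative(reasoningChain, flaggedRegions),
    ]);

    // The iterative model's latest logical step extends the chain.
    reasoningChain.push(step);

    // 2. The comparison model checks whether the conclusions match.
    const { match, disputedRegions } = await compare(conclusions, step);

    // 3. Match → converged; trust the result.
    if (match) {
      return { status: 'converged', totalCycles: cycle, reasoningChain };
    }

    // 4. Mismatch → flag disputed regions and loop again.
    flaggedRegions = disputedRegions;
  }

  // 5. No convergence within maxCycles → hand off to a clinician.
  return { status: 'human_review', totalCycles: maxCycles, reasoningChain, flaggedRegions };
}
```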

Where the Explainability Comes From

The iterative model's reasoning chain IS the explanation. Each cycle, it outputs one logical step:

Cycle 1: "Patient has a 20 pack-year smoking history. Baseline risk for pulmonary malignancy is elevated."

Cycle 2: "Prior CT (12 Jan) showed a 2.4cm RUL nodule. Current scan shows 2.8cm — 4mm interval growth in 6 weeks."

Cycle 3: "Growth rate exceeds threshold for benign nodule. Spiculated margins visible. Both features raise suspicion for malignancy."

FINAL: "Suspicious RUL mass with interval growth — recommend tissue sampling / PET-CT."

Every conclusion has a traceable path from raw data to diagnosis. A summary model then writes this chain as a plain-English narrative for clinicians. A programmatic CSV audit trail is also generated — one row per logical step — so the frontend can render it as a table and external systems can consume it directly.
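Because the audit CSV is assembled programmatically (no model involved), its generation reduces to mapping accumulated steps onto the documented columns. A sketch, assuming step objects keyed by those column names — the real `generateAuditCSV()` in `orchestrator.js` may shape its input differently:

```javascript
// Columns follow the documented audit-trail schema; the per-step field
// names are an assumption for illustration.
const AUDIT_COLUMNS = [
  'step_number', 'region_of_concern', 'bbox', 'deduction', 'reasoning',
  'flagged_by_comparison', 'final_diagnosis', 'severity', 'confidence',
  'models_agreed',
];

// Quote a value so embedded commas, quotes, and newlines survive CSV parsing.
function csvEscape(value) {
  const s = value == null ? '' : String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// One header row, then one row per logical step.
function generateAuditCSV(steps) {
  const header = AUDIT_COLUMNS.join(',');
  const rows = steps.map((step) =>
    AUDIT_COLUMNS.map((col) => csvEscape(step[col])).join(',')
  );
  return [header, ...rows].join('\n');
}
```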

System Schematic

INPUT
  Patient records (FHIR) + DICOM scans
           │
           ▼
┌──────────────────────────────────────────────┐
│  FEEDBACK LOOP (max 5 cycles)                │
│                                              │
│  ┌─────────────────┐  ┌─────────────────┐   │
│  │ CONCLUSIONS     │  │ ITERATIVE       │   │
│  │ MODEL (0.7)     │  │ MODEL (0.1)     │   │
│  │                 │  │                 │   │
│  │ All findings    │  │ One logical     │   │
│  │ at once         │  │ step at a time  │   │
│  └────────┬────────┘  └────────┬────────┘   │
│           └──────────┬─────────┘            │
│                      ▼                       │
│           ┌──────────────────┐               │
│           │ COMPARISON       │               │
│           │ MODEL (0.0)      │               │
│           └────────┬─────────┘               │
│                    │                         │
│          ┌─────────┴──────────┐              │
│          ▼                    ▼              │
│    ✗ MISMATCH           ✓ MATCH             │
│    Add step to          CONVERGED           │
│    reasoning chain      Exit loop ──────────┼──► PHASE 3
│    Flag regions                              │
│    Loop again ──────────────────────────────┘
│    (or exit as human_review after 5 cycles)
│
▼
PHASE 3: OUTPUT
  SUMMARY MODEL (0.3)  → plain-English narrative
  generateAuditCSV()   → programmatic CSV audit trail (no AI)
  API response         → findings + bounding boxes + reasoning + summary + csv_audit

Design Principles

  • Honest uncertainty over false confidence. When models can't agree, the system says so explicitly and flags the specific image regions for a clinician — rather than picking one answer and hiding the disagreement.
  • False positives are cheaper than false negatives. The conclusions model is deliberately biased towards flagging everything. A false positive caught by the iterative model costs nothing. A false negative costs a patient.
  • Every exit path is useful. Converged case → full diagnosis. Max-iterations case → partial diagnosis with explicit disagreement areas. Error → whatever was collected so far is preserved in the database.
  • The reasoning IS the product. The step-by-step logical chain is not a side effect — it's the core deliverable. The diagnoses are just conclusions drawn from it.

Tech Stack

| Layer               | Technology                                                       |
|---------------------|------------------------------------------------------------------|
| Backend / AI engine | Node.js + Express                                                |
| AI models           | Claude Sonnet (claude-sonnet-4-5-20250514) via @anthropic-ai/sdk |
| Database            | SQLite via sql.js (pure JavaScript, no native bindings)          |
| Medical imaging     | dicom-parser + sharp (pure JS DICOM → PNG, no Python required)   |
| Patient records     | FHIR R4 (Synthea synthetic data)                                 |
| Frontend            | React + TypeScript + Vite + Tailwind + Framer Motion             |

Project Structure

MedExplain/
│
├── server/                        # Node.js AI engine + Express backend
│   ├── index.js                   # Express entry point — initialises DB then starts server
│   │
│   ├── engine/
│   │   ├── orchestrator.js        # Main feedback loop + generateAuditCSV()
│   │   ├── iterativeModel.js      # Step-by-step model (temp 0.1)
│   │   ├── conclusionsModel.js    # Comprehensive model (temp 0.7)
│   │   ├── comparisonModel.js     # Deterministic comparator (temp 0.0)
│   │   └── summaryModel.js        # Plain-English narrative model (temp 0.3)
│   │
│   ├── db/
│   │   ├── schema.sql             # Full DB schema
│   │   ├── connection.js          # sql.js wrapper — loads/saves DB to disk
│   │   └── queries.js             # All DB query functions
│   │
│   ├── routes/
│   │   ├── patients.js            # GET /api/patients, /api/patients/:id/cases
│   │   ├── cases.js               # POST /run, GET /status, GET /results, GET /audit.csv
│   │   └── results.js             # GET /api/scans/:caseId + DICOM image serving
│   │
│   ├── prompts/
│   │   ├── iterative.txt          # System prompt — cautious step-by-step radiologist
│   │   ├── conclusions.txt        # System prompt — aggressive comprehensive reviewer
│   │   ├── comparison.txt         # System prompt — semantic logic comparator
│   │   └── summary.txt            # System prompt — plain-English report writer
│   │
│   └── utils/
│       ├── imageLoader.js         # File path → base64 (handles PNG + DCM, Windows paths)
│       ├── dicomConverter.js      # Pure-JS DICOM → PNG via dicom-parser + sharp
│       ├── knowledgeLoader.js     # Loads PDFs from data/knowledge/ once, caches in memory
│       └── logger.js              # Timestamped cycle-by-cycle logging
│
├── client/                        # React frontend (Vite + Tailwind + Framer Motion)
│   ├── src/
│   │   ├── components/            # MindMapCanvas, ImageViewer, BoundingBox, etc.
│   │   ├── hooks/                 # useCases, usePatients, useResults, useScans
│   │   ├── api/                   # Axios API client
│   │   └── types/                 # TypeScript interfaces
│   ├── package.json
│   └── vite.config.ts
│
├── scripts/
│   └── seed.js                    # Seed DB from FHIR patient.json + DICOM scans
│
├── data/
│   ├── medical_demo.db            # SQLite database (auto-created on first run)
│   ├── fhir/                      # Synthetic FHIR R4 patient JSON files (Synthea)
│   ├── scans/                     # DICOM scan files (.dcm), organised by visit
│   │   ├── visit_1/
│   │   └── visit_2/
│   └── knowledge/                 # Reference material for model context (radiology guidelines, etc.)
│
├── package.json                   # Node.js dependencies
└── .env                           # API key + config (gitignored)

Setup

1 — Install dependencies

npm install

2 — Configure environment

Create a .env file in the project root:

ANTHROPIC_API_KEY=sk-ant-your-key-here
DB_PATH=./data/medical_demo.db
MAX_ITERATIONS=5
PORT=3001

3 — Seed the database (once only)

Loads the patient from data/fhir/patient.json and all DICOM scans from data/scans/:

node scripts/seed.js

This is idempotent — if the database already has data it exits immediately without re-seeding.

4 — Start the backend

npm start

The server initialises the database schema on startup, then listens on http://localhost:3001.

5 — Start the frontend

cd client
npm install
npm run dev

The frontend runs at http://localhost:5173 and proxies API requests to the backend at http://localhost:3001.

Frontend features:

  • Select patient and case from dropdowns
  • View CT scans with DICOM → PNG conversion
  • Mind-map style display linking findings to conclusion nodes
  • Animated region timeline showing changes over time
  • Colour-coded severity: green (low), amber (moderate), red (critical), purple (human review)

Environment Variables

| Variable          | Default                | Description                                                   |
|-------------------|------------------------|---------------------------------------------------------------|
| ANTHROPIC_API_KEY | (none)                 | Your Anthropic API key (required)                             |
| DB_PATH           | ./data/medical_demo.db | Path to the SQLite database                                   |
| MAX_ITERATIONS    | 5                      | Maximum feedback loop cycles before forcing human_review exit |
| PORT              | 3001                   | Express server port                                           |

API Endpoints

All endpoints are served by the Node.js backend on port 3001.

Cases

| Method | Route                        | Description                                                                     |
|--------|------------------------------|---------------------------------------------------------------------------------|
| POST   | /api/cases/:caseId/run       | Trigger the AI feedback loop. Returns immediately; analysis runs in background. |
| GET    | /api/cases/:caseId/status    | Poll for current status and cycle count while running.                          |
| GET    | /api/cases/:caseId/results   | Full results once analysis is complete.                                         |
| GET    | /api/cases/:caseId/audit.csv | Download the full reasoning audit trail as a CSV file.                          |

POST /api/cases/:caseId/run

{ "status": "running", "caseId": 1 }

GET /api/cases/:caseId/status

{ "status": "running", "currentCycle": 2, "maxCycles": 5 }

Status values: pending | running | converged | human_review | error
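Since `/run` returns immediately and analysis continues in the background, a client polls `/status` until a terminal state appears. A minimal polling sketch, with the status fetcher injected (e.g. a `fetch` wrapper around `GET /api/cases/:caseId/status`); the delay and attempt limits here are illustrative, not prescribed by the API:

```javascript
// Terminal statuses, per the documented status values.
const TERMINAL = new Set(['converged', 'human_review', 'error']);

// Poll the injected status fetcher until a terminal status or timeout.
async function pollUntilDone(getStatus, { delayMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (TERMINAL.has(status.status)) return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Polling timed out before the case reached a terminal status');
}
```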

GET /api/cases/:caseId/results

{
  "status": "converged",
  "totalCycles": 2,
  "findings": [
    {
      "id": "F1",
      "label": "Right upper lobe mass — suspected malignancy",
      "severity": "critical",
      "confidence": 0.94,
      "bbox": { "x_pct": 53, "y_pct": 27, "w_pct": 16, "h_pct": 14 },
      "anatomical_region": "Right upper lobe, anterior segment",
      "models_agreed": true,
      "converged_cycle": 2
    }
  ],
  "unresolved": [],
  "reasoning": [
    {
      "step": 1,
      "cycle": 1,
      "deduction": "Patient has 20 pack-year smoking history — elevated baseline risk",
      "reasoning": "Extracted from medical_history field"
    }
  ],
  "summary": "Review of the CT imaging for [patient] demonstrates a 2.8cm right upper lobe mass...",
  "csv_audit": "step_number,region_of_concern,bbox,deduction,reasoning,flagged_by_comparison,final_diagnosis,severity,confidence,models_agreed\n..."
}

Findings with models_agreed: false or in the unresolved array carry severity: "human_review" and should be highlighted for clinician attention.
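A client collecting everything that needs clinician attention would combine both of those signals. A small sketch against the `/results` payload shape shown above (the helper name is illustrative):

```javascript
// Gather findings the models disagreed on, plus anything left unresolved,
// so the frontend can highlight them for clinician review.
function findingsForReview(results) {
  const disagreed = (results.findings || []).filter((f) => f.models_agreed === false);
  return [...disagreed, ...(results.unresolved || [])];
}
```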

GET /api/cases/:caseId/audit.csv

Downloads a structured CSV audit trail:

Content-Type: text/csv
Content-Disposition: attachment; filename="case_1_audit.csv"

Columns: step_number, region_of_concern, bbox, deduction, reasoning, flagged_by_comparison, final_diagnosis, severity, confidence, models_agreed

Patients

| Method | Route                   | Description                  |
|--------|-------------------------|------------------------------|
| GET    | /api/patients           | List all patients            |
| GET    | /api/patients/:id/cases | List all cases for a patient |

Scans

| Method | Route                    | Description                                      |
|--------|--------------------------|--------------------------------------------------|
| GET    | /api/scans/:caseId       | List all scans for a case with public image URLs |
| GET    | /api/scans/image/:scanId | Serve a DICOM scan converted to PNG              |
| GET    | /scans/*                 | Static image files served directly               |

Health check

| Method | Route       | Description                                    |
|--------|-------------|------------------------------------------------|
| GET    | /api/health | Returns { "status": "ok", "timestamp": "..." } |

AI Engine — Model Details

All four roles use claude-sonnet-4-5-20250514 via the Anthropic SDK. They differ only in system prompt and temperature.

| Role        | File                | Temperature | Max tokens | Purpose |
|-------------|---------------------|-------------|------------|---------|
| Iterative   | iterativeModel.js   | 0.1 | 2000 | One careful logical step per call. Receives patient data + all images + logical chain + flagged regions. Cannot declare a final conclusion without prerequisite steps. |
| Conclusions | conclusionsModel.js | 0.7 | 3000 | All possible findings at once. Receives patient data + all images + logical chain (as context only). Does NOT receive flagged regions — operates independently. |
| Comparison  | comparisonModel.js  | 0.0 | 1500 | Deterministic semantic matching. Text only — no images. Compares both model outputs and flags disagreement regions. |
| Summary     | summaryModel.js     | 0.3 | 2000 | Writes a plain-English narrative explaining how each diagnosis was reached. Text only — no images. |

The CSV audit trail is generated programmatically by generateAuditCSV() in orchestrator.js — it reads from the accumulated logical steps, not from an AI model.

What each model receives

Every cycle, both the iterative and conclusions models receive the following, in this order:

  1. Knowledge base PDFs — all PDFs from data/knowledge/, sent as document content blocks (loaded once at startup, cached for the server lifetime)
  2. Prior scan images — base64 PNG, oldest first, each preceded by a label with scan type, date, and DICOM metadata
  3. Latest scan image — base64 PNG with label
  4. Text prompt — patient demographics, full medical history, imaging context, accumulated logical steps, and the instruction

Only the iterative model additionally receives flagged regions (image bounding boxes where the previous cycle's comparison found disagreement).

The comparison and summary models receive text only — no images, no PDFs.
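The ordering above maps directly onto the content-block array sent to the Anthropic Messages API (document blocks for PDFs, image blocks for scans, a text block for the prompt). A hedged sketch of how that array might be assembled — the helper names and scan-object fields here are assumptions, not the project's actual code:

```javascript
// Assemble message content in the documented order: knowledge PDFs first,
// then prior scans (oldest first) each with a text label, then the latest
// scan, then the text prompt. Block shapes follow the Anthropic Messages API.
function buildMessageContent({ pdfs, priorScans, latestScan, prompt }) {
  const pdfBlocks = pdfs.map((data) => ({
    type: 'document',
    source: { type: 'base64', media_type: 'application/pdf', data },
  }));

  const imageBlock = (scan) => ({
    type: 'image',
    source: { type: 'base64', media_type: 'image/png', data: scan.png },
  });
  const labelBlock = (scan) => ({
    type: 'text',
    text: `${scan.type} scan, ${scan.date} — ${scan.metadata}`,
  });

  return [
    ...pdfBlocks,
    ...priorScans.flatMap((scan) => [labelBlock(scan), imageBlock(scan)]),
    labelBlock(latestScan),
    imageBlock(latestScan),
    { type: 'text', text: prompt },
  ];
}
```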

Bounding box format

All models output findings with percentage-based bounding boxes:

"bbox": { "x_pct": 53, "y_pct": 27, "w_pct": 16, "h_pct": 14 }

The frontend overlays these absolutely on the image:

.finding-box {
  position: absolute;
  left: 53%;   /* x_pct */
  top: 27%;    /* y_pct */
  width: 16%;  /* w_pct */
  height: 14%; /* h_pct */
}
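For rendering contexts that need pixel coordinates rather than CSS percentages (e.g. drawing on a canvas), the same box converts with one multiplication per edge. A small client-side convenience sketch — the API itself only ever speaks in percentages:

```javascript
// Convert a percentage-based bounding box to pixel coordinates for a
// rendered image of the given dimensions.
function bboxToPixels(bbox, imageWidth, imageHeight) {
  return {
    x: Math.round((bbox.x_pct / 100) * imageWidth),
    y: Math.round((bbox.y_pct / 100) * imageHeight),
    width: Math.round((bbox.w_pct / 100) * imageWidth),
    height: Math.round((bbox.h_pct / 100) * imageHeight),
  };
}
```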

Severity levels

| Severity     | Colour (suggested)     | Meaning                                  |
|--------------|------------------------|------------------------------------------|
| critical     | Red                    | Immediate action required                |
| moderate     | Amber                  | Follow-up or monitoring required         |
| low          | Green                  | Incidental, stable, likely benign        |
| human_review | Purple (dashed border) | Models disagreed — clinician must check  |

Database Schema

Defined in server/db/schema.sql. Applied automatically on server startup via initDb().

Tables

Patient

| Column          | Type                 | Description |
|-----------------|----------------------|-------------|
| id              | INTEGER PK           | Auto-increment |
| name            | TEXT NOT NULL        | Full name |
| dob             | TEXT NOT NULL        | Date of birth |
| nhs_number      | TEXT UNIQUE NOT NULL | NHS identifier |
| medical_history | TEXT                 | JSON blob — conditions, medications, procedures, observations |
| contact_details | TEXT                 | JSON — phone, email, address |

Cases

| Column       | Type       | Description |
|--------------|------------|-------------|
| id           | INTEGER PK | Auto-increment |
| patient_id   | INTEGER    | Patient.id |
| status       | TEXT       | pending \| running \| converged \| human_review \| error |
| created_at   | TEXT       | Timestamp |
| completed_at | TEXT       | Set on completion |
| total_cycles | INTEGER    | Number of feedback loop cycles run |
| final_output | TEXT       | JSON — full results payload including findings, reasoning, summary, csv_audit, unresolved |

Scans

| Column      | Type          | Description |
|-------------|---------------|-------------|
| id          | INTEGER PK    | Auto-increment |
| case_id     | INTEGER       | Cases.id |
| scan_type   | TEXT NOT NULL | e.g. CT, MRI, X-Ray |
| date        | TEXT NOT NULL | Date of scan |
| images_file | TEXT NOT NULL | Path to image file (.png or .dcm) |
| meta_data   | TEXT          | JSON — DICOM headers (modality, slice thickness, window/level) |
| is_latest   | INTEGER       | 1 = the scan being analysed this case; 0 = prior context |

Iteration_steps

One row per feedback loop cycle. Builds the logical reasoning chain.

| Column                | Type       | Description |
|-----------------------|------------|-------------|
| id                    | INTEGER PK | Auto-increment |
| case_id               | INTEGER    | Cases.id |
| step_number           | INTEGER    | Cycle number |
| new_deduction         | TEXT       | The single new logical conclusion this cycle |
| new_reasoning         | TEXT       | JSON — reasoning behind this deduction |
| cumulative_deductions | TEXT       | JSON array — all deductions up to this cycle |
| cumulative_reasoning  | TEXT       | JSON array — all reasoning up to this cycle |
| flagged_regions       | TEXT       | JSON — bounding boxes flagged for re-examination (written after comparison runs) |
| created_at            | TEXT       | Timestamp |

Conclusion_runs

One row per feedback loop cycle. Stores the conclusions model's full output.

| Column                    | Type       | Description |
|---------------------------|------------|-------------|
| id                        | INTEGER PK | Auto-increment |
| case_id                   | INTEGER    | Cases.id |
| run_number                | INTEGER    | Cycle number (matches step_number) |
| triggered_by_iteration_id | INTEGER    | Iteration_steps.id for the same cycle |
| output_text               | TEXT       | JSON — full conclusions array with severity, confidence, bounding boxes |
| created_at                | TEXT       | Timestamp |

Relationships

Patient
  └──< Cases
         ├──< Scans
         ├──< Iteration_steps
         └──< Conclusion_runs
                    └── triggered_by → Iteration_steps

Data

FHIR Patient Records

  • Synthetic patient records generated by Synthea
  • Format: FHIR R4 JSON bundles
  • Location: data/fhir/patient.json
  • Contains: demographics, conditions, medications, procedures, observations, social history

DICOM Scans

  • Format: .dcm (DICOM)
  • Location: data/scans/visit_1/, data/scans/visit_2/
  • Dataset: RIDER Lung CT (two visits, ~5 weeks apart) — suitable for interval change analysis
  • Conversion: server/utils/dicomConverter.js converts DICOM → PNG using dicom-parser + sharp (pure JavaScript, no Python required)
  • The scan date, type, and DICOM metadata are extracted at seed time and stored in the Scans table, giving AI models full imaging context alongside the visual data

Knowledge Base

  • Location: data/knowledge/
  • Current contents: textbook1.pdf, textbook2.pdf (medical reference textbooks)
  • All PDFs in this folder are loaded on server startup by server/utils/knowledgeLoader.js, base64-encoded, and cached in memory for the lifetime of the process — no re-reading from disk on each API call
  • Sent as document content blocks to the iterative and conclusions models on every call, placed before the scan images in the message content
  • Both models are instructed to cross-reference their findings against the textbooks and cite specific guidelines in their reasoning, making deductions verifiable against source material
  • To add more reference material: drop additional PDFs into data/knowledge/ and restart the server
  • If combined PDF size exceeds 50MB, the server logs a warning at startup
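The load-once, cache-for-the-process-lifetime behaviour is a simple memoization pattern. A sketch with the expensive loader injected — the real `knowledgeLoader.js` reads and base64-encodes the PDFs; here the loader is a parameter so the pattern stands on its own:

```javascript
// Run the expensive loader at most once per process; every later call
// reuses the same promise (so concurrent first calls also share one load).
function memoizeOnce(load) {
  let pending; // the in-flight or settled promise, created on first call
  return () => {
    if (!pending) pending = load();
    return pending;
  };
}

// Hypothetical usage: const getKnowledge = memoizeOnce(loadAllPdfs);
```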
