BeaHack/MedExplain

MedExplain

An explainable AI system for medical diagnosis. It analyses patient records and CT scans through a dual-model convergence loop that produces diagnoses with full step-by-step reasoning trails — making every conclusion traceable from raw data to diagnosis.

Built for Dublin Hack Europe 2026.


The Problem

AI in medicine has a trust problem. Deep learning models can match or exceed human radiologists at detecting abnormalities — but they can't explain why they reached a conclusion. A clinician presented with "92% chance of malignancy" and no reasoning has no way to verify the finding, catch errors, or build trust in the system.

Regulatory bodies (FDA, MHRA, EU AI Act) increasingly require that high-risk AI systems provide meaningful explanations for their outputs. Black-box models don't meet this bar.

MedExplain addresses this directly: every diagnosis comes with a complete, traceable chain of logical reasoning that a clinician can audit step by step.


Our Solution: Explainable AI via Dual-Model Convergence

Why Two Models?

A single model giving you a diagnosis is an opinion. Two models with fundamentally different reasoning strategies arriving at the same diagnosis is evidence.

MedExplain runs two AI models against the same patient data simultaneously:

|                      | Conclusions Model                            | Iterative Model                                     |
|----------------------|----------------------------------------------|-----------------------------------------------------|
| Strategy             | "Shotgun" — find everything at once          | "First principles" — one step at a time             |
| Temperature          | 0.7 (high entropy, creative)                 | 0.1 (low entropy, cautious)                         |
| Bias                 | Favours false positives over false negatives | Favours precision — won't conclude without evidence |
| Output               | ALL possible findings immediately            | Single next logical step only                       |
| When it can diagnose | Always — it guesses aggressively             | Only when every prerequisite logical step exists    |

The conclusions model catches things the iterative model hasn't reached yet. The iterative model prevents the conclusions model from hallucinating. Together they're stronger than either alone.

The Feedback Loop

The two models don't just run once — they run in a convergence loop:

  1. Both models analyse the patient data (images + records + any reasoning built so far) in parallel.
  2. A comparison model checks whether their final conclusions match.
  3. If they match → the system has converged. We trust the result.
  4. If they don't match →
    • The iterative model's latest logical step is added to a growing reasoning chain.
    • The specific image regions where conclusions differ are flagged as "to be double-checked".
    • Both models re-run with the updated reasoning chain, and the iterative model is explicitly instructed to examine the flagged regions.
  5. Repeat until convergence or a maximum of 5 cycles.

If 5 cycles pass without convergence, the case exits as human_review with the disagreement areas explicitly flagged. This isn't a failure — it's the system honestly saying "I'm not sure about these regions, a clinician should look here." Honest uncertainty is more valuable than false confidence.
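The control flow above can be sketched as a small loop. This is a minimal illustration, not the real orchestrator: the model calls are injected as plain async functions, and names like `runConclusions` and `disputedRegions` are assumptions for readability — the actual code lives in `server/engine/orchestrator.js`.

```javascript
// Minimal sketch of the convergence loop. Model calls are injected so the
// control flow is visible without any API access; real signatures may differ.
async function runConvergenceLoop({ runConclusions, runIterative, compare }, maxCycles = 5) {
  const reasoningChain = []; // grows by one logical step per cycle
  let flaggedRegions = [];   // bounding boxes the comparison disputed last cycle

  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    // 1. Both models analyse the patient data in parallel.
    const [conclusions, step] = await Promise.all([
      runConclusions(reasoningChain),
      runIterative(reasoningChain, flaggedRegions),
    ]);

    // The iterative model's latest logical step extends the chain.
    reasoningChain.push(step);

    // 2. The comparison model checks whether the conclusions match.
    const { match, disputedRegions } = await compare(conclusions, step);

    // 3. Match → converged; trust the result.
    if (match) {
      return { status: 'converged', totalCycles: cycle, reasoningChain };
    }

    // 4. Mismatch → flag disputed regions and loop again.
    flaggedRegions = disputedRegions;
  }

  // 5. No convergence within maxCycles → hand off to a clinician.
  return { status: 'human_review', totalCycles: maxCycles, reasoningChain, flaggedRegions };
}
```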

Where the Explainability Comes From

The iterative model's reasoning chain IS the explanation. Each cycle, it outputs one logical step:

Cycle 1: "Patient has a 20 pack-year smoking history. Baseline risk for pulmonary malignancy is elevated."

Cycle 2: "Prior CT (12 Jan) showed a 2.4cm RUL nodule. Current scan shows 2.8cm — 4mm interval growth in 6 weeks."

Cycle 3: "Growth rate exceeds threshold for benign nodule. Spiculated margins visible. Both features raise suspicion for malignancy."

FINAL: "Suspicious RUL mass with interval growth — recommend tissue sampling / PET-CT."

Every conclusion has a traceable path from raw data to diagnosis. A summary model then writes this chain as a plain-English narrative for clinicians. A programmatic CSV audit trail is also generated — one row per logical step — so the frontend can render it as a table and external systems can consume it directly.
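Because the audit CSV is assembled programmatically (no model involved), its generation reduces to mapping accumulated steps onto the documented columns. A sketch, assuming step objects keyed by those column names — the real `generateAuditCSV()` in `orchestrator.js` may shape its input differently:

```javascript
// Columns follow the documented audit-trail schema; the per-step field
// names are an assumption for illustration.
const AUDIT_COLUMNS = [
  'step_number', 'region_of_concern', 'bbox', 'deduction', 'reasoning',
  'flagged_by_comparison', 'final_diagnosis', 'severity', 'confidence',
  'models_agreed',
];

// Quote a value so embedded commas, quotes, and newlines survive CSV parsing.
function csvEscape(value) {
  const s = value == null ? '' : String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// One header row, then one row per logical step.
function generateAuditCSV(steps) {
  const header = AUDIT_COLUMNS.join(',');
  const rows = steps.map((step) =>
    AUDIT_COLUMNS.map((col) => csvEscape(step[col])).join(',')
  );
  return [header, ...rows].join('\n');
}
```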

System Schematic

INPUT
  Patient records (FHIR) + DICOM scans
           │
           ▼
┌──────────────────────────────────────────────┐
│  FEEDBACK LOOP (max 5 cycles)                │
│                                              │
│  ┌─────────────────┐  ┌─────────────────┐   │
│  │ CONCLUSIONS     │  │ ITERATIVE       │   │
│  │ MODEL (0.7)     │  │ MODEL (0.1)     │   │
│  │                 │  │                 │   │
│  │ All findings    │  │ One logical     │   │
│  │ at once         │  │ step at a time  │   │
│  └────────┬────────┘  └────────┬────────┘   │
│           └──────────┬─────────┘            │
│                      ▼                       │
│           ┌──────────────────┐               │
│           │ COMPARISON       │               │
│           │ MODEL (0.0)      │               │
│           └────────┬─────────┘               │
│                    │                         │
│          ┌─────────┴──────────┐              │
│          ▼                    ▼              │
│    ✗ MISMATCH           ✓ MATCH             │
│    Add step to          CONVERGED           │
│    reasoning chain      Exit loop ──────────┼──► PHASE 3
│    Flag regions                              │
│    Loop again ──────────────────────────────┘
│    (or exit as human_review after 5 cycles)
│
▼
PHASE 3: OUTPUT
  SUMMARY MODEL (0.3)  → plain-English narrative
  generateAuditCSV()   → programmatic CSV audit trail (no AI)
  API response         → findings + bounding boxes + reasoning + summary + csv_audit

Design Principles

  • Honest uncertainty over false confidence. When models can't agree, the system says so explicitly and flags the specific image regions for a clinician — rather than picking one answer and hiding the disagreement.
  • False positives are cheaper than false negatives. The conclusions model is deliberately biased towards flagging everything. A false positive caught by the iterative model costs nothing. A false negative costs a patient.
  • Every exit path is useful. Converged case → full diagnosis. Max-iterations case → partial diagnosis with explicit disagreement areas. Error → whatever was collected so far is preserved in the database.
  • The reasoning IS the product. The step-by-step logical chain is not a side effect — it's the core deliverable. The diagnoses are just conclusions drawn from it.

Tech Stack

| Layer               | Technology                                                       |
|---------------------|------------------------------------------------------------------|
| Backend / AI engine | Node.js + Express                                                |
| AI models           | Claude Sonnet (claude-sonnet-4-5-20250514) via @anthropic-ai/sdk |
| Database            | SQLite via sql.js (pure JavaScript, no native bindings)          |
| Medical imaging     | dicom-parser + sharp (pure JS DICOM → PNG, no Python required)   |
| Patient records     | FHIR R4 (Synthea synthetic data)                                 |
| Frontend            | React + TypeScript + Vite + Tailwind + Framer Motion             |

Project Structure

MedExplain/
│
├── server/                        # Node.js AI engine + Express backend
│   ├── index.js                   # Express entry point — initialises DB then starts server
│   │
│   ├── engine/
│   │   ├── orchestrator.js        # Main feedback loop + generateAuditCSV()
│   │   ├── iterativeModel.js      # Step-by-step model (temp 0.1)
│   │   ├── conclusionsModel.js    # Comprehensive model (temp 0.7)
│   │   ├── comparisonModel.js     # Deterministic comparator (temp 0.0)
│   │   └── summaryModel.js        # Plain-English narrative model (temp 0.3)
│   │
│   ├── db/
│   │   ├── schema.sql             # Full DB schema
│   │   ├── connection.js          # sql.js wrapper — loads/saves DB to disk
│   │   └── queries.js             # All DB query functions
│   │
│   ├── routes/
│   │   ├── patients.js            # GET /api/patients, /api/patients/:id/cases
│   │   ├── cases.js               # POST /run, GET /status, GET /results, GET /audit.csv
│   │   └── results.js             # GET /api/scans/:caseId + DICOM image serving
│   │
│   ├── prompts/
│   │   ├── iterative.txt          # System prompt — cautious step-by-step radiologist
│   │   ├── conclusions.txt        # System prompt — aggressive comprehensive reviewer
│   │   ├── comparison.txt         # System prompt — semantic logic comparator
│   │   └── summary.txt            # System prompt — plain-English report writer
│   │
│   └── utils/
│       ├── imageLoader.js         # File path → base64 (handles PNG + DCM, Windows paths)
│       ├── dicomConverter.js      # Pure-JS DICOM → PNG via dicom-parser + sharp
│       ├── knowledgeLoader.js     # Loads PDFs from data/knowledge/ once, caches in memory
│       └── logger.js              # Timestamped cycle-by-cycle logging
│
├── client/                        # React frontend (Vite + Tailwind + Framer Motion)
│   ├── src/
│   │   ├── components/            # MindMapCanvas, ImageViewer, BoundingBox, etc.
│   │   ├── hooks/                 # useCases, usePatients, useResults, useScans
│   │   ├── api/                   # Axios API client
│   │   └── types/                 # TypeScript interfaces
│   ├── package.json
│   └── vite.config.ts
│
├── scripts/
│   └── seed.js                    # Seed DB from FHIR patient.json + DICOM scans
│
├── data/
│   ├── medical_demo.db            # SQLite database (auto-created on first run)
│   ├── fhir/                      # Synthetic FHIR R4 patient JSON files (Synthea)
│   ├── scans/                     # DICOM scan files (.dcm), organised by visit
│   │   ├── visit_1/
│   │   └── visit_2/
│   └── knowledge/                 # Reference material for model context (radiology guidelines, etc.)
│
├── package.json                   # Node.js dependencies
└── .env                           # API key + config (gitignored)

Setup

1 — Install dependencies

npm install

2 — Configure environment

Create a .env file in the project root:

ANTHROPIC_API_KEY=sk-ant-your-key-here
DB_PATH=./data/medical_demo.db
MAX_ITERATIONS=5
PORT=3001

3 — Seed the database (once only)

Loads the patient from data/fhir/patient.json and all DICOM scans from data/scans/:

node scripts/seed.js

This is idempotent — if the database already has data it exits immediately without re-seeding.

4 — Start the backend

npm start

The server initialises the database schema on startup, then listens on http://localhost:3001.

5 — Start the frontend

cd client
npm install
npm run dev

The frontend runs at http://localhost:5173 and proxies API requests to the backend at http://localhost:3001.

Frontend features:

  • Select patient and case from dropdowns
  • View CT scans with DICOM → PNG conversion
  • Mind-map style display linking findings to conclusion nodes
  • Animated region timeline showing changes over time
  • Colour-coded severity: green (low), amber (moderate), red (critical), purple (human review)

Environment Variables

| Variable          | Default                | Description                                                   |
|-------------------|------------------------|---------------------------------------------------------------|
| ANTHROPIC_API_KEY | (none)                 | Your Anthropic API key (required)                             |
| DB_PATH           | ./data/medical_demo.db | Path to the SQLite database                                   |
| MAX_ITERATIONS    | 5                      | Maximum feedback loop cycles before forcing human_review exit |
| PORT              | 3001                   | Express server port                                           |

API Endpoints

All endpoints are served by the Node.js backend on port 3001.

Cases

| Method | Route                        | Description                                                                     |
|--------|------------------------------|---------------------------------------------------------------------------------|
| POST   | /api/cases/:caseId/run       | Trigger the AI feedback loop. Returns immediately; analysis runs in background. |
| GET    | /api/cases/:caseId/status    | Poll for current status and cycle count while running.                          |
| GET    | /api/cases/:caseId/results   | Full results once analysis is complete.                                         |
| GET    | /api/cases/:caseId/audit.csv | Download the full reasoning audit trail as a CSV file.                          |

POST /api/cases/:caseId/run

{ "status": "running", "caseId": 1 }

GET /api/cases/:caseId/status

{ "status": "running", "currentCycle": 2, "maxCycles": 5 }

Status values: pending | running | converged | human_review | error
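Since `/run` returns immediately and analysis continues in the background, a client polls `/status` until a terminal state appears. A minimal polling sketch, with the status fetcher injected (e.g. a `fetch` wrapper around `GET /api/cases/:caseId/status`); the delay and attempt limits here are illustrative, not prescribed by the API:

```javascript
// Terminal statuses, per the documented status values.
const TERMINAL = new Set(['converged', 'human_review', 'error']);

// Poll the injected status fetcher until a terminal status or timeout.
async function pollUntilDone(getStatus, { delayMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (TERMINAL.has(status.status)) return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Polling timed out before the case reached a terminal status');
}
```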

GET /api/cases/:caseId/results

{
  "status": "converged",
  "totalCycles": 2,
  "findings": [
    {
      "id": "F1",
      "label": "Right upper lobe mass — suspected malignancy",
      "severity": "critical",
      "confidence": 0.94,
      "bbox": { "x_pct": 53, "y_pct": 27, "w_pct": 16, "h_pct": 14 },
      "anatomical_region": "Right upper lobe, anterior segment",
      "models_agreed": true,
      "converged_cycle": 2
    }
  ],
  "unresolved": [],
  "reasoning": [
    {
      "step": 1,
      "cycle": 1,
      "deduction": "Patient has 20 pack-year smoking history — elevated baseline risk",
      "reasoning": "Extracted from medical_history field"
    }
  ],
  "summary": "Review of the CT imaging for [patient] demonstrates a 2.8cm right upper lobe mass...",
  "csv_audit": "step_number,region_of_concern,bbox,deduction,reasoning,flagged_by_comparison,final_diagnosis,severity,confidence,models_agreed\n..."
}

Findings with models_agreed: false or in the unresolved array carry severity: "human_review" and should be highlighted for clinician attention.
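A client collecting everything that needs clinician attention would combine both of those signals. A small sketch against the `/results` payload shape shown above (the helper name is illustrative):

```javascript
// Gather findings the models disagreed on, plus anything left unresolved,
// so the frontend can highlight them for clinician review.
function findingsForReview(results) {
  const disagreed = (results.findings || []).filter((f) => f.models_agreed === false);
  return [...disagreed, ...(results.unresolved || [])];
}
```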

GET /api/cases/:caseId/audit.csv

Downloads a structured CSV audit trail:

Content-Type: text/csv
Content-Disposition: attachment; filename="case_1_audit.csv"

Columns: step_number, region_of_concern, bbox, deduction, reasoning, flagged_by_comparison, final_diagnosis, severity, confidence, models_agreed

Patients

| Method | Route                   | Description                  |
|--------|-------------------------|------------------------------|
| GET    | /api/patients           | List all patients            |
| GET    | /api/patients/:id/cases | List all cases for a patient |

Scans

| Method | Route                    | Description                                      |
|--------|--------------------------|--------------------------------------------------|
| GET    | /api/scans/:caseId       | List all scans for a case with public image URLs |
| GET    | /api/scans/image/:scanId | Serve a DICOM scan converted to PNG              |
| GET    | /scans/*                 | Static image files served directly               |

Health check

| Method | Route       | Description                                    |
|--------|-------------|------------------------------------------------|
| GET    | /api/health | Returns { "status": "ok", "timestamp": "..." } |

AI Engine — Model Details

All four roles use claude-sonnet-4-5-20250514 via the Anthropic SDK. They differ only in system prompt and temperature.

| Role        | File                | Temperature | Max tokens | Purpose |
|-------------|---------------------|-------------|------------|---------|
| Iterative   | iterativeModel.js   | 0.1 | 2000 | One careful logical step per call. Receives patient data + all images + logical chain + flagged regions. Cannot declare a final conclusion without prerequisite steps. |
| Conclusions | conclusionsModel.js | 0.7 | 3000 | All possible findings at once. Receives patient data + all images + logical chain (as context only). Does NOT receive flagged regions — operates independently. |
| Comparison  | comparisonModel.js  | 0.0 | 1500 | Deterministic semantic matching. Text only — no images. Compares both model outputs and flags disagreement regions. |
| Summary     | summaryModel.js     | 0.3 | 2000 | Writes a plain-English narrative explaining how each diagnosis was reached. Text only — no images. |

The CSV audit trail is generated programmatically by generateAuditCSV() in orchestrator.js — it reads from the accumulated logical steps, not from an AI model.

What each model receives

Every cycle, both the iterative and conclusions models receive the following, in this order:

  1. Knowledge base PDFs — all PDFs from data/knowledge/, sent as document content blocks (loaded once at startup, cached for the server lifetime)
  2. Prior scan images — base64 PNG, oldest first, each preceded by a label with scan type, date, and DICOM metadata
  3. Latest scan image — base64 PNG with label
  4. Text prompt — patient demographics, full medical history, imaging context, accumulated logical steps, and the instruction

Only the iterative model additionally receives flagged regions (image bounding boxes where the previous cycle's comparison found disagreement).

The comparison and summary models receive text only — no images, no PDFs.
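The ordering above maps directly onto the content-block array sent to the Anthropic Messages API (document blocks for PDFs, image blocks for scans, a text block for the prompt). A hedged sketch of how that array might be assembled — the helper names and scan-object fields here are assumptions, not the project's actual code:

```javascript
// Assemble message content in the documented order: knowledge PDFs first,
// then prior scans (oldest first) each with a text label, then the latest
// scan, then the text prompt. Block shapes follow the Anthropic Messages API.
function buildMessageContent({ pdfs, priorScans, latestScan, prompt }) {
  const pdfBlocks = pdfs.map((data) => ({
    type: 'document',
    source: { type: 'base64', media_type: 'application/pdf', data },
  }));

  const imageBlock = (scan) => ({
    type: 'image',
    source: { type: 'base64', media_type: 'image/png', data: scan.png },
  });
  const labelBlock = (scan) => ({
    type: 'text',
    text: `${scan.type} scan, ${scan.date} — ${scan.metadata}`,
  });

  return [
    ...pdfBlocks,
    ...priorScans.flatMap((scan) => [labelBlock(scan), imageBlock(scan)]),
    labelBlock(latestScan),
    imageBlock(latestScan),
    { type: 'text', text: prompt },
  ];
}
```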

Bounding box format

All models output findings with percentage-based bounding boxes:

"bbox": { "x_pct": 53, "y_pct": 27, "w_pct": 16, "h_pct": 14 }

The frontend overlays these absolutely on the image:

.finding-box {
  position: absolute;
  left: 53%;   /* x_pct */
  top: 27%;    /* y_pct */
  width: 16%;  /* w_pct */
  height: 14%; /* h_pct */
}
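For rendering contexts that need pixel coordinates rather than CSS percentages (e.g. drawing on a canvas), the same box converts with one multiplication per edge. A small client-side convenience sketch — the API itself only ever speaks in percentages:

```javascript
// Convert a percentage-based bounding box to pixel coordinates for a
// rendered image of the given dimensions.
function bboxToPixels(bbox, imageWidth, imageHeight) {
  return {
    x: Math.round((bbox.x_pct / 100) * imageWidth),
    y: Math.round((bbox.y_pct / 100) * imageHeight),
    width: Math.round((bbox.w_pct / 100) * imageWidth),
    height: Math.round((bbox.h_pct / 100) * imageHeight),
  };
}
```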

Severity levels

| Severity     | Colour (suggested)     | Meaning                                  |
|--------------|------------------------|------------------------------------------|
| critical     | Red                    | Immediate action required                |
| moderate     | Amber                  | Follow-up or monitoring required         |
| low          | Green                  | Incidental, stable, likely benign        |
| human_review | Purple (dashed border) | Models disagreed — clinician must check  |

Database Schema

Defined in server/db/schema.sql. Applied automatically on server startup via initDb().

Tables

Patient

| Column          | Type                 | Description |
|-----------------|----------------------|-------------|
| id              | INTEGER PK           | Auto-increment |
| name            | TEXT NOT NULL        | Full name |
| dob             | TEXT NOT NULL        | Date of birth |
| nhs_number      | TEXT UNIQUE NOT NULL | NHS identifier |
| medical_history | TEXT                 | JSON blob — conditions, medications, procedures, observations |
| contact_details | TEXT                 | JSON — phone, email, address |

Cases

| Column       | Type       | Description |
|--------------|------------|-------------|
| id           | INTEGER PK | Auto-increment |
| patient_id   | INTEGER    | Patient.id |
| status       | TEXT       | pending \| running \| converged \| human_review \| error |
| created_at   | TEXT       | Timestamp |
| completed_at | TEXT       | Set on completion |
| total_cycles | INTEGER    | Number of feedback loop cycles run |
| final_output | TEXT       | JSON — full results payload including findings, reasoning, summary, csv_audit, unresolved |

Scans

| Column      | Type          | Description |
|-------------|---------------|-------------|
| id          | INTEGER PK    | Auto-increment |
| case_id     | INTEGER       | Cases.id |
| scan_type   | TEXT NOT NULL | e.g. CT, MRI, X-Ray |
| date        | TEXT NOT NULL | Date of scan |
| images_file | TEXT NOT NULL | Path to image file (.png or .dcm) |
| meta_data   | TEXT          | JSON — DICOM headers (modality, slice thickness, window/level) |
| is_latest   | INTEGER       | 1 = the scan being analysed this case; 0 = prior context |

Iteration_steps

One row per feedback loop cycle. Builds the logical reasoning chain.

| Column                | Type       | Description |
|-----------------------|------------|-------------|
| id                    | INTEGER PK | Auto-increment |
| case_id               | INTEGER    | Cases.id |
| step_number           | INTEGER    | Cycle number |
| new_deduction         | TEXT       | The single new logical conclusion this cycle |
| new_reasoning         | TEXT       | JSON — reasoning behind this deduction |
| cumulative_deductions | TEXT       | JSON array — all deductions up to this cycle |
| cumulative_reasoning  | TEXT       | JSON array — all reasoning up to this cycle |
| flagged_regions       | TEXT       | JSON — bounding boxes flagged for re-examination (written after comparison runs) |
| created_at            | TEXT       | Timestamp |

Conclusion_runs

One row per feedback loop cycle. Stores the conclusions model's full output.

| Column                    | Type       | Description |
|---------------------------|------------|-------------|
| id                        | INTEGER PK | Auto-increment |
| case_id                   | INTEGER    | Cases.id |
| run_number                | INTEGER    | Cycle number (matches step_number) |
| triggered_by_iteration_id | INTEGER    | Iteration_steps.id for the same cycle |
| output_text               | TEXT       | JSON — full conclusions array with severity, confidence, bounding boxes |
| created_at                | TEXT       | Timestamp |

Relationships

Patient
  └──< Cases
         ├──< Scans
         ├──< Iteration_steps
         └──< Conclusion_runs
                    └── triggered_by → Iteration_steps

Data

FHIR Patient Records

  • Synthetic patient records generated by Synthea
  • Format: FHIR R4 JSON bundles
  • Location: data/fhir/patient.json
  • Contains: demographics, conditions, medications, procedures, observations, social history

DICOM Scans

  • Format: .dcm (DICOM)
  • Location: data/scans/visit_1/, data/scans/visit_2/
  • Dataset: RIDER Lung CT (two visits, ~5 weeks apart) — suitable for interval change analysis
  • Conversion: server/utils/dicomConverter.js converts DICOM → PNG using dicom-parser + sharp (pure JavaScript, no Python required)
  • The scan date, type, and DICOM metadata are extracted at seed time and stored in the Scans table, giving AI models full imaging context alongside the visual data

Knowledge Base

  • Location: data/knowledge/
  • Current contents: textbook1.pdf, textbook2.pdf (medical reference textbooks)
  • All PDFs in this folder are loaded on server startup by server/utils/knowledgeLoader.js, base64-encoded, and cached in memory for the lifetime of the process — no re-reading from disk on each API call
  • Sent as document content blocks to the iterative and conclusions models on every call, placed before the scan images in the message content
  • Both models are instructed to cross-reference their findings against the textbooks and cite specific guidelines in their reasoning, making deductions verifiable against source material
  • To add more reference material: drop additional PDFs into data/knowledge/ and restart the server
  • If combined PDF size exceeds 50MB, the server logs a warning at startup
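The load-once, cache-for-the-process-lifetime behaviour is a simple memoization pattern. A sketch with the expensive loader injected — the real `knowledgeLoader.js` reads and base64-encodes the PDFs; here the loader is a parameter so the pattern stands on its own:

```javascript
// Run the expensive loader at most once per process; every later call
// reuses the same promise (so concurrent first calls also share one load).
function memoizeOnce(load) {
  let pending; // the in-flight or settled promise, created on first call
  return () => {
    if (!pending) pending = load();
    return pending;
  };
}

// Hypothetical usage: const getKnowledge = memoizeOnce(loadAllPdfs);
```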
