Adithya-Rama/Hallucination-Aware-Medical-Image-Synthesis

Hallucination-Aware Medical Image Synthesis

Python 3.8+ PyTorch License: MIT

Official implementation of "Hallucination-Aware Medical Image Synthesis Using Multi-Constraint Guided Diffusion for Colonoscopy Data Augmentation"

Authors: Adithya Rama, Justin Paul Kolengadan

Affiliation: Australian National University

Course: COMP8539/ENGN8501 - Advanced Topics



🔬 Overview

This repository implements a 4-constraint guided diffusion model for synthesizing realistic colonoscopy images with controllable polyp characteristics. Our approach addresses hallucination in medical imaging through:

  • Multi-stage LoRA fine-tuning (SD 1.5) for domain adaptation
  • ControlNet for spatial mask conditioning
  • 4 constraint heads: Segmentation, Size, BBPS Quality, Instrument Detection
  • Latent-space guidance during diffusion sampling
  • Comprehensive medical verification suite

Key Innovation: We guide the diffusion process using gradients from 4 independent classifiers, steering generated images toward the medical constraints (polyp size, bowel preparation quality, instrument presence, spatial accuracy).
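The guidance update can be written compactly. This is a generic sketch of classifier-gradient guidance, not necessarily the exact rule used in the notebook: $z_t$ is the latent at sampling step $t$, $\mathcal{L}$ is the weighted sum of the four head losses, and $\eta$ is an assumed step size (presumably what `step_scale` controls).

```latex
z_t \leftarrow z_t - \eta \, \nabla_{z_t} \mathcal{L}(z_t)
```

Each update moves the latent a small distance in the direction that lowers the combined constraint loss, so later denoising steps start from a point the heads score more favorably.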


✨ Key Features

Multi-Constraint Framework

  • Polyp Segmentation (U-Net ResNet34): Binary mask IoU ≥ 0.45
  • Size Classification (ResNet18): Small/Medium/Large polyp categorization
  • BBPS Quality (ResNet18): 4-class bowel preparation scoring (0-3)
  • Instrument Detection (ResNet18): Binary tool presence classification
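To make the segmentation criterion concrete, here is a minimal pure-Python sketch of binary-mask IoU checked against the 0.45 threshold. The function name `binary_iou` and the nested-list mask format are illustrative assumptions; the notebook presumably works on tensors instead.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def binary_iou(pred: Mask, target: Mask) -> float:
    """IoU of two same-shaped binary masks given as nested 0/1 sequences."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; returning 1.0 is a convention chosen here.
    return 1.0 if union == 0 else intersection / union


IOU_THRESHOLD = 0.45  # threshold quoted in this README

pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
iou = binary_iou(pred, target)   # 2 shared pixels out of 4 in the union
passes = iou >= IOU_THRESHOLD
```

In this toy case the masks share 2 of the 4 foreground pixels in their union, so the IoU is 0.5 and the sample clears the threshold.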

Training Pipeline

  1. Phase 1: Masked domain adaptation on Kvasir-SEG (2000 steps)
  2. Phase 2: Rich prompt conditioning on HyperKvasir (1500 steps)
  3. ControlNet: Mask-conditioned spatial control (3000 steps)
  4. Latent Guidance: Every-k-step gradient descent in latent space
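The every-k-step schedule in item 4 can be sketched as follows. The exact rule is an assumption: this version guides at a start step and then every k steps after it, using the values listed under Key Hyperparameters (28 sampling steps, k = 3, start at step 3). The function name `guided_steps` is hypothetical.

```python
from typing import List


def guided_steps(num_steps: int, guide_every_k: int, start_at: int) -> List[int]:
    """Return the 0-based sampling steps that receive a guidance update.

    Assumed rule: guide at `start_at`, then every `guide_every_k` steps after it.
    """
    if guide_every_k <= 0:
        raise ValueError("guide_every_k must be positive")
    return list(range(start_at, num_steps, guide_every_k))


steps = guided_steps(num_steps=28, guide_every_k=3, start_at=3)
```

Under these assumptions only 9 of the 28 steps pay the cost of running the constraint heads and backpropagating into the latent.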

Generation Quality

  • Automated filtering: Keep only outputs passing all 4 constraints
  • Medical verification: Specular highlights, brightness, edge density checks
  • Ablation support: Toggle individual constraints to measure contribution
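The automated filter can be pictured as a simple conjunction over the four head outputs. The sketch below is illustrative only: `HeadOutputs` and `passes_all_constraints` are hypothetical names, the default targets mirror the values under Constraint Configuration, and treating BBPS as an exact-match check is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HeadOutputs:
    """Per-image predictions from the four constraint heads (hypothetical container)."""
    iou: float      # IoU between the predicted mask and the conditioning mask
    size_idx: int   # 0=small, 1=medium, 2=large
    bbps_idx: int   # BBPS score 0-3
    tool_idx: int   # 0=no tool, 1=tool present


def passes_all_constraints(
    out: HeadOutputs,
    target_size: int = 1,
    target_bbps: int = 2,
    target_tool: int = 0,
    iou_threshold: float = 0.45,
) -> bool:
    """True only if every one of the four constraints is satisfied."""
    return (
        out.iou >= iou_threshold
        and out.size_idx == target_size
        and out.bbps_idx == target_bbps
        and out.tool_idx == target_tool
    )


candidates = [
    HeadOutputs(iou=0.62, size_idx=1, bbps_idx=2, tool_idx=0),  # passes
    HeadOutputs(iou=0.30, size_idx=1, bbps_idx=2, tool_idx=0),  # IoU too low
    HeadOutputs(iou=0.70, size_idx=2, bbps_idx=2, tool_idx=0),  # wrong size class
]
kept = [c for c in candidates if passes_all_constraints(c)]
pass_rate = len(kept) / len(candidates)
```

A pass rate computed this way is the fraction of generated samples that survive all four checks at once.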

🏗️ Architecture

┌─────────────────────────────────────────────────┐
│   Stable Diffusion 1.5 Backbone                 │
│   + LoRA (r=8, Phase 1 → Phase 2)               │
│   + ControlNet (Mask Conditioning)              │
└──────────────┬──────────────────────────────────┘
               │
               ▼
      ┌────────────────┐
      │ Latent Guidance│ (Every 3 steps)
      │ L = λ₁·L_seg + │
      │     λ₂·L_size +│
      │     λ₃·L_BBPS +│
      │     λ₄·L_tool  │
      └────────────────┘
               │
               ▼
    ┌──────────────────────────┐
    │ 4 Constraint Heads       │
    ├──────────────────────────┤
    │ 1. Seg (U-Net ResNet34)  │ → IoU ≥ 0.45
    │ 2. Size (ResNet18)       │ → Match target
    │ 3. BBPS (ResNet18)       │ → Score 0-3
    │ 4. Instrument (ResNet18) │ → Present/Absent
    └──────────────────────────┘
               │
               ▼
      ┌────────────────┐
      │ Medical Checks │
      │ - Specular     │
      │ - Brightness   │
      │ - Edge Density │
      └────────────────┘
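The weighted guidance loss in the diagram, together with the per-constraint toggles used for ablations, can be sketched with plain floats. In the real pipeline the per-head losses would be differentiable tensors; the function name `total_guidance_loss` and the dictionary layout are assumptions made for illustration, while the weights are the tuned values quoted in this README.

```python
from typing import Dict

# Guidance weights from this README (seg, size, BBPS, tool).
LAMBDAS: Dict[str, float] = {"seg": 1.0, "size": 0.6, "bbps": 0.4, "tool": 0.3}


def total_guidance_loss(
    losses: Dict[str, float],
    enabled: Dict[str, bool],
    lambdas: Dict[str, float] = LAMBDAS,
) -> float:
    """Weighted sum of the per-head losses, skipping heads that are switched off."""
    return sum(
        lambdas[name] * losses[name]
        for name in lambdas
        if enabled.get(name, False)
    )


# Toy per-head loss values, chosen only to make the arithmetic easy to follow.
losses = {"seg": 0.5, "size": 1.0, "bbps": 0.5, "tool": 2.0}

all_on = {"seg": True, "size": True, "bbps": True, "tool": True}
full = total_guidance_loss(losses, all_on)             # 0.5 + 0.6 + 0.2 + 0.6
seg_only = total_guidance_loss(losses, {"seg": True})  # 1.0 * 0.5
```

Disabling a head simply drops its term from the sum, which is what makes single-constraint ablations cheap to configure.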

🛠️ Installation

Prerequisites

  • Python 3.8+
  • CUDA 11.8+ (for GPU)
  • 24GB GPU RAM recommended (Colab Pro with A100/L4)

Quick Setup

# Clone repository
git clone https://github.com/Adithya-Rama/Hallucination-Aware-Medical-Image-Synthesis.git
cd Hallucination-Aware-Medical-Image-Synthesis

# Install dependencies
pip install -r requirements.txt

# Or use Colab with our provided notebook
# (Upload Research_Project.ipynb to Google Colab)

Key Dependencies

torch>=2.0.0
diffusers==0.30.1
transformers==4.35.2
peft==0.8.2
segmentation-models-pytorch
timm
albumentations

📊 Dataset Setup

Required Datasets

Our pipeline uses 4 public colonoscopy datasets:

| Dataset | Purpose | Size | Download |
|---|---|---|---|
| Kvasir-SEG | Polyp segmentation, Phase 1 training | 1,000 images | Kaggle |
| HyperKvasir | Rich captions, Phase 2 training | 10k+ labeled | Simula |
| Kvasir-Instrument | Tool detection training | 590 frames | Kaggle |
| Nerthus | BBPS quality training | Video frames | Simula |

Automatic Download (in Colab)

# Set up Kaggle credentials
!mkdir -p ~/.kaggle
!cp /path/to/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download datasets (automated in notebook)
!kaggle datasets download -d fkarimovv/kvasir-seg
!kaggle datasets download -d debeshjha1/kvasirinstrument
# ... (see notebook for full commands)

Expected Directory Structure

data/
├── kvasir_seg/
│   ├── images/          # 1000 colonoscopy images
│   └── masks/           # Binary polyp masks
├── hyper_kvasir/
│   ├── labeled_images/
│   │   └── image-labels.csv
│   └── segmented_images/
│       ├── images/
│       └── masks/
├── kvasir_instrument/
│   └── images/          # Frames with tools
└── nerthus_videos/
    └── nerthus-dataset-frames/  # BBPS-scored frames
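Before starting a long training run it can help to confirm that the layout above is in place. The following is a small optional helper, not part of the repository; `missing_dirs` is a hypothetical name and the folder list is copied from the tree above.

```python
from pathlib import Path
from typing import List

# Folders expected under the data root, taken from the tree above.
EXPECTED_DIRS = [
    "kvasir_seg/images",
    "kvasir_seg/masks",
    "hyper_kvasir/labeled_images",
    "hyper_kvasir/segmented_images/images",
    "hyper_kvasir/segmented_images/masks",
    "kvasir_instrument/images",
    "nerthus_videos/nerthus-dataset-frames",
]


def missing_dirs(data_root: str) -> List[str]:
    """Return the expected sub-folders that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


# An empty list means every expected folder was found.
missing = missing_dirs("data")
```

Printing `missing` right after the download step gives a quick list of any datasets that were not unpacked where the notebook expects them.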

🎓 Training Pipeline

Step 1: Train Constraint Heads

All 4 heads are trained in the corresponding notebook sections:
- U-Net Segmenter (10 epochs, Dice+BCE loss)
- Size Classifier (10 epochs, CrossEntropy)
- Instrument Classifier (8 epochs, weighted sampling)
- BBPS Classifier (8 epochs, 4-class)
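For readers unfamiliar with the combined Dice+BCE objective used for the segmenter, here is a minimal pure-Python sketch over flattened pixel probabilities. Equal weighting of the two terms and the smoothing constant `eps` are assumptions; the notebook may weight or smooth them differently.

```python
import math
from typing import Sequence


def dice_bce_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    eps: float = 1e-6,
) -> float:
    """Soft Dice loss plus mean binary cross-entropy over flattened mask pixels."""
    n = len(probs)
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    # Clamp probabilities away from 0 and 1 so the logarithms stay finite.
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    return dice + bce


perfect = dice_bce_loss([1.0, 0.0, 1.0], [1, 0, 1])  # prediction equals target
poor = dice_bce_loss([0.2, 0.8, 0.3], [1, 0, 1])     # prediction mostly wrong
```

The Dice term rewards overlap regardless of how small the polyp is, while the BCE term keeps per-pixel probabilities calibrated, which is the usual motivation for combining them.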

Expected Performance:

  • Segmentation: Val IoU ~0.85
  • Size: Val Accuracy ~92%
  • Instrument: Val F1 ~0.95
  • BBPS: Val F1 ~0.87

Step 2: LoRA Fine-Tuning

Phase 1: Masked Domain Adaptation (2000 steps)

Trains on Kvasir-SEG with binary masks applied
Loss: MSE between predicted and actual noise
LoRA rank: 8, alpha: 16
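Written out, the Phase 1 objective and the LoRA parameterization take the following standard form. This is a sketch under the usual latent-diffusion and LoRA conventions rather than a transcription of the notebook: $\epsilon_\theta$ is the U-Net noise prediction, $c$ the text conditioning, $W$ a frozen weight matrix, and $B$, $A$ the trainable low-rank factors of rank $r$.

```latex
\mathcal{L}_{\text{MSE}}
  = \mathbb{E}_{z_0,\; \epsilon \sim \mathcal{N}(0, I),\; t}
    \left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right],
\qquad
W' = W + \frac{\alpha}{r}\, B A,
\quad
\frac{\alpha}{r} = \frac{16}{8} = 2
```

With rank 8 and alpha 16, the low-rank update is therefore scaled by a factor of 2 before being added to the frozen weights, assuming the standard $\alpha / r$ scaling.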

Phase 2: Rich Prompt Conditioning (1500 steps)

Trains on HyperKvasir with semantic captions
Example: "A colonoscopy image of the lower GI tract, 
          showing polyp, classified as adenoma."
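A caption in this style could be assembled from per-image labels with a simple template. The sketch below is hypothetical: `build_caption` and its argument names are illustrative, and the real notebook may derive captions from `image-labels.csv` differently.

```python
def build_caption(tract: str, finding: str, classification: str) -> str:
    """Fill the caption template shown in the Phase 2 example."""
    return (
        f"A colonoscopy image of the {tract}, "
        f"showing {finding}, classified as {classification}."
    )


caption = build_caption("lower GI tract", "polyp", "adenoma")
```

Keeping the template in one function makes it easy to generate consistent prompts for both training and later guided generation.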

Step 3: ControlNet Training (3000 steps)

Combines Kvasir-SEG + HyperKvasir-SEG masks
FP16 mixed precision, DPMSolver++ scheduler
Checkpoints every 500 steps

Training Time (Colab A100):

  • Phase 1: ~2 hours
  • Phase 2: ~1.5 hours
  • ControlNet: ~4 hours
  • Total: ~7.5 hours

🚀 Inference & Generation

Guided Generation (4-Constraint)

from src.generation import guided_generate_single

# Generate image with all constraints
image = guided_generate_single(
    mask_pil=mask,
    prompt="Colonoscopy showing medium polyp, clean prep, no tools",
    seed=42,
    use_guidance=True,
    use_seg=True,
    use_size=True,
    use_bbps=True,
    use_tool=True,
    step_scale=0.10,
    guide_k=3  # Guide every 3 steps
)

Constraint Configuration

# Constraint targets
TARGET_SIZE_IDX = 1        # 0=small, 1=medium, 2=large
TARGET_BBPS_IDX = 2        # 0-3 BBPS score
TARGET_TOOL_IDX = 0        # 0=no tool, 1=tool present
IOU_THRESHOLD = 0.45       # Min segmentation IoU

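# Guidance weights (tuned values)
LAMBDA_SEG  = 1.0
LAMBDA_SIZE = 0.6
L_BBPS      = 0.4          # weight on the BBPS loss term, not the loss itself
L_TOOL      = 0.3          # weight on the instrument loss term, not the loss itself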

Batch Generation with Filtering

# Generates 4 candidates per mask, keeps top 2
python scripts/generate_guided.py \
    --masks data/processed/kvasir_kv_manifest.csv \
    --output results/synthetic \
    --candidates 4 \
    --topk 2
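The candidate-then-top-k step can be sketched in a few lines. The ranking criterion is not stated in this README, so the score here is an assumption (for example, the segmentation IoU of each candidate); `select_top_k` and the file names are hypothetical.

```python
from typing import List, Tuple


def select_top_k(
    candidates: List[Tuple[str, float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Keep the k highest-scoring (name, score) candidates, best first."""
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:k]


# Four candidates for one mask, scored by an assumed quality metric.
candidates = [
    ("cand_0.png", 0.41),
    ("cand_1.png", 0.78),
    ("cand_2.png", 0.63),
    ("cand_3.png", 0.52),
]
top2 = select_top_k(candidates, k=2)
```

Generating several candidates per mask and keeping the best two trades extra sampling time for a higher-quality synthetic set.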

🔬 Ablation Studies

We systematically ablate each constraint to measure its contribution:

| Ablation | Seg | Size | BBPS | Tool | Pass Rate | Mean IoU |
|---|---|---|---|---|---|---|
| No Guidance | ❌ | ❌ | ❌ | ❌ | 34.2% | 0.52 |
| Seg Only | ✅ | ❌ | ❌ | ❌ | 48.1% | 0.61 |
| Seg + Size | ✅ | ✅ | ❌ | ❌ | 62.3% | 0.67 |
| Seg + Size + BBPS | ✅ | ✅ | ✅ | ❌ | 71.8% | 0.72 |
| Full (All 4) | ✅ | ✅ | ✅ | ✅ | 82.4% | 0.78 |

Run Ablations

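# Defined in notebook final sections
# Fields appear to be (name, use_guidance, use_seg, use_size, use_bbps, use_tool, ...);
# the first three rows elide the trailing fields starting at use_tool.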
ABLATIONS = [
    ("no_guidance", False, False, False, False, ...),
    ("seg_only", True, True, False, False, ...),
    ("seg+size", True, True, True, False, ...),
    ("seg+size+bbps", True, True, True, True, False, ...),
    ("seg+size+bbps+tool", True, True, True, True, True, ...)
]

# Run all ablations
run_ablation_suite()

📈 Downstream Evaluation

We evaluate generalization by training a segmentation model on:

  1. Real only (Kvasir-SEG)
  2. Real + Synthetic (1:1 ratio)

Tested on CVC-ClinicDB (612 images, external dataset).
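One way to build the 1:1 real-plus-synthetic training set is to draw as many synthetic samples as there are real ones and shuffle the union. This is a hedged sketch: `mix_one_to_one` is a hypothetical helper and the sampling strategy is an assumption, since the README only states the ratio.

```python
import random
from typing import List


def mix_one_to_one(real: List[str], synthetic: List[str], seed: int = 0) -> List[str]:
    """Combine all real samples with an equal number of randomly drawn synthetic ones."""
    if len(synthetic) < len(real):
        raise ValueError("need at least as many synthetic samples as real ones")
    rng = random.Random(seed)  # local generator keeps the split reproducible
    mixed = list(real) + rng.sample(synthetic, len(real))
    rng.shuffle(mixed)
    return mixed


real_paths = ["real_0.png", "real_1.png", "real_2.png"]
synthetic_paths = [f"synth_{i}.png" for i in range(5)]
mixed = mix_one_to_one(real_paths, synthetic_paths, seed=0)
```

Fixing the seed keeps the synthetic subset identical across runs, which matters when comparing against the real-only baseline.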

Results

| Training Data | ClinicDB IoU | Improvement |
|---|---|---|
| Real Only | 0.712 | baseline |
| Real + Synthetic | 0.758 | +6.5% (relative) |
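The improvement figure is a relative gain over the real-only baseline, not an absolute IoU difference. A short check of the arithmetic, using only the two IoU values from the table (the helper name `relative_gain_percent` is illustrative):

```python
def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


absolute_gain = 0.758 - 0.712                        # about 0.046 IoU points
relative_gain = relative_gain_percent(0.712, 0.758)  # about 6.46, reported as +6.5%
```

In absolute terms the gain is roughly 0.046 IoU, which corresponds to about 6.5% of the baseline value.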

Run Downstream Evaluation

# Automated in notebook Section 8
python scripts/downstream_eval.py \
    --real-train data/kvasir_seg \
    --synthetic results/synthetic_filtered \
    --eval data/clinicdb \
    --epochs 12

Key Findings:

  • Synthetic data improves generalization to an unseen external dataset (CVC-ClinicDB)
  • No overfitting: Training on mixed data doesn't hurt real-only performance
  • Efficiency: Achieves a 6.5% relative IoU gain (0.712 to 0.758) without collecting new real data

📊 Results

Quantitative Metrics

Generation Quality (1000 samples):

| Metric | Baseline | Full Pipeline | Δ |
|---|---|---|---|
| Pass Rate | 34.2% | 82.4% | +48.2pp |
| Mean IoU | 0.52 | 0.78 | +0.26 |
| Size Accuracy | 41.3% | 89.7% | +48.4pp |
| BBPS Accuracy | - | 84.2% | - |
| Tool Accuracy | - | 91.6% | - |

Medical Verification:

  • Specular Ratio: 2.1% (safe < 5%)
  • Edge Density: 0.18 (realistic)
  • Brightness Distribution: Normal
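A specular-highlight check of this kind is often implemented by counting very bright, nearly colorless pixels. The sketch below is one such heuristic, not the notebook's implementation: the function name and both thresholds (`value_min`, `saturation_max`) are assumptions, and only the 5% safety limit comes from this README.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def specular_ratio(
    pixels: Iterable[RGB],
    value_min: float = 0.9,
    saturation_max: float = 0.2,
) -> float:
    """Fraction of pixels that are bright and weakly saturated (assumed glare heuristic)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    count = 0
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if v >= value_min and s <= saturation_max:
            count += 1
    return count / len(pixels)


# Toy image: one pure-white glare pixel among 39 reddish tissue pixels.
image_pixels = [(255, 255, 255)] + [(150, 60, 50)] * 39
ratio = specular_ratio(image_pixels)
is_safe = ratio < 0.05  # 5% limit quoted above
```

Here 1 of 40 pixels counts as glare, giving a ratio of 0.025, which falls under the 5% limit.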

Visual Comparison

See results/figures/ for:

  • Input mask → Generated image comparisons
  • Ablation visual examples
  • Downstream segmentation predictions

🎯 Reproducing Results

Complete Workflow (Colab)

  1. Upload Research_Project.ipynb to Google Colab
  2. Mount Google Drive (for saving checkpoints)
  3. Run all cells sequentially:
     • Data download (30 min)
     • Head training (2 hours)
     • LoRA Phase 1+2 (3.5 hours)
     • ControlNet (4 hours)
     • Generation (2 hours for 200 masks)
     • Ablations (4 hours)
     • Downstream (1 hour)

Total Time: ~17 hours on Colab Pro (A100)

Key Hyperparameters

# LoRA
LORA_RANK = 8
LORA_ALPHA = 16

# Generation
STEPS = 28
GUIDANCE_SCALE = 7.5
HEIGHT, WIDTH = 512, 512

# Latent Guidance
GUIDE_EVERY_K = 3
START_GUIDE_AT = 3
STEP_SCALE = 0.10
EMA_BETA = 0.8

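# Constraints (guidance weights)
LAMBDA_SEG = 1.0
LAMBDA_SIZE = 0.6
L_BBPS = 0.4   # weight on the BBPS loss term, not the loss itself
L_TOOL = 0.3   # weight on the instrument loss term, not the loss itself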
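The README lists `EMA_BETA = 0.8` among the latent-guidance settings without saying what is smoothed. One plausible reading, shown here purely as an assumption, is an exponential moving average over successive guidance gradients so that a single noisy gradient does not dominate the update. The name `ema_update` is hypothetical and the toy gradients are plain lists rather than tensors.

```python
from typing import List, Optional


def ema_update(
    previous: Optional[List[float]],
    gradient: List[float],
    beta: float = 0.8,
) -> List[float]:
    """Blend the new gradient into a running average: beta * old + (1 - beta) * new."""
    if previous is None:
        # First guided step: nothing to average with yet.
        return list(gradient)
    return [beta * p + (1.0 - beta) * g for p, g in zip(previous, gradient)]


smoothed: Optional[List[float]] = None
for grad in ([1.0, -1.0], [0.0, 1.0]):
    smoothed = ema_update(smoothed, grad)
```

After the second step the second component is still negative, showing how a beta of 0.8 damps an abrupt sign flip in the incoming gradient.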

📁 Repository Structure

.
├── Research_Project.ipynb      # Main Colab notebook (all-in-one)
├── requirements.txt             # Dependencies
├── README.md                    # This file
│
├── classifiers/                 # Trained constraint heads (download)
│   ├── seg_unet_resnet34.pth
│   ├── size_cls_resnet18.pth
│   ├── bbpsq_resnet18.pth
│   └── instrument_resnet18.pth
│
├── lora_colonoscopy_phase1/     # Phase 1 LoRA weights
├── lora_colonoscopy_phase2/     # Phase 2 LoRA weights
├── controlnet_adapter/          # ControlNet weights
│
├── results/
│   ├── synthetic/               # Generated images
│   └── synth_report.csv         # Generation metrics
│
└── experiments/
    └── downstream/              # Segmentation checkpoints

🔗 Downloads

Pre-trained Weights

All trained models available on Google Drive:

Datasets (Preprocessed)


📝 Citation

If you use this code in your research, please cite:

@inproceedings{adithya2025hallucination,
  title={Hallucination-Aware Medical Image Synthesis Using Multi-Constraint Guided Diffusion for Colonoscopy Data Augmentation},
  author={Adithya Rama and Justin Paul Kolengadan},
  booktitle={NeurIPS Workshop on Medical Imaging},
  year={2025},
  organization={Australian National University}
}

🙏 Acknowledgments

  • Kvasir, HyperKvasir, Nerthus dataset providers
  • Stable Diffusion, ControlNet communities
  • PyTorch, Diffusers, SMP frameworks

📧 Contact


📄 License

MIT License - see LICENSE file


Last Updated: November 2025

Status: ✅ Code tested on Colab Pro with A100 GPU

About

Hallucination-Aware Medical Image Synthesis: Multi-Constraint Guided Diffusion for Reliable Colonoscopy Data Augmentation
