Official implementation of "Hallucination-Aware Medical Image Synthesis Using Multi-Constraint Guided Diffusion for Colonoscopy Data Augmentation"
Authors: Adithya Rama, Justin Paul Kolengadan
Affiliation: Australian National University
Course: COMP8539/ENGN8501 - Advanced Topics
- Overview
- Key Features
- Architecture
- Installation
- Dataset Setup
- Training Pipeline
- Inference & Generation
- Ablation Studies
- Downstream Evaluation
- Results
This repository implements a 4-constraint guided diffusion model for synthesizing realistic colonoscopy images with controllable polyp characteristics. Our approach addresses hallucination in medical imaging through:
- Multi-stage LoRA fine-tuning (SD 1.5) for domain adaptation
- ControlNet for spatial mask conditioning
- 4 constraint heads: Segmentation, Size, BBPS Quality, Instrument Detection
- Latent-space guidance during diffusion sampling
- Comprehensive medical verification suite
Key Innovation: We guide the diffusion process using gradients from 4 independent classifiers, ensuring generated images satisfy medical constraints (polyp size, bowel preparation quality, instrument presence, spatial accuracy).
- Polyp Segmentation (U-Net ResNet34): Binary mask IoU ≥ 0.45
- Size Classification (ResNet18): Small/Medium/Large polyp categorization
- BBPS Quality (ResNet18): 4-class bowel preparation scoring (0-3)
- Instrument Detection (ResNet18): Binary tool presence classification
- Phase 1: Masked domain adaptation on Kvasir-SEG (2000 steps)
- Phase 2: Rich prompt conditioning on HyperKvasir (1500 steps)
- ControlNet: Mask-conditioned spatial control (3000 steps)
- Latent Guidance: Every-k-step gradient descent in latent space
- Automated filtering: Keep only outputs passing all 4 constraints
- Medical verification: Specular highlights, brightness, edge density checks
- Ablation support: Toggle individual constraints to measure contribution
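The automated filtering step can be pictured as one pass/fail check per generated image. The sketch below is a minimal illustration and not the notebook's code: `Prediction`, `Targets`, `mask_iou`, and `passes_all_constraints` are hypothetical names, and it assumes each head's output has already been reduced to a binary mask or a class index. Only the 0.45 IoU threshold is taken from this README.

```python
from dataclasses import dataclass
from typing import Sequence


def mask_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection-over-union of two flattened binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks are treated as a perfect match by convention.
    return inter / union if union else 1.0


@dataclass
class Prediction:
    """Hypothetical container for the four heads' outputs on one image."""
    mask: Sequence[int]
    size_idx: int
    bbps_idx: int
    tool_idx: int


@dataclass
class Targets:
    """Hypothetical container for the requested constraint targets."""
    mask: Sequence[int]
    size_idx: int
    bbps_idx: int
    tool_idx: int
    iou_threshold: float = 0.45  # threshold stated in this README


def passes_all_constraints(pred: Prediction, tgt: Targets) -> bool:
    """Keep an image only if every one of the four constraints is met."""
    return (
        mask_iou(pred.mask, tgt.mask) >= tgt.iou_threshold
        and pred.size_idx == tgt.size_idx
        and pred.bbps_idx == tgt.bbps_idx
        and pred.tool_idx == tgt.tool_idx
    )
```

Requiring all four checks to hold is what makes the filter strict: one wrong class prediction is enough to discard an otherwise well-aligned image.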
┌─────────────────────────────────────────────────┐
│          Stable Diffusion 1.5 Backbone          │
│         + LoRA (r=8, Phase 1 → Phase 2)         │
│        + ControlNet (Mask Conditioning)         │
└──────────────┬──────────────────────────────────┘
               │
               ▼
       ┌────────────────┐
       │ Latent Guidance│  (Every 3 steps)
       │ L = λ₁·L_seg + │
       │     λ₂·L_size +│
       │     λ₃·L_BBPS +│
       │     λ₄·L_tool  │
       └────────────────┘
               │
               ▼
  ┌──────────────────────────┐
  │ 4 Constraint Heads       │
  ├──────────────────────────┤
  │ 1. Seg (U-Net ResNet34)  │ → IoU ≥ 0.45
  │ 2. Size (ResNet18)       │ → Match target
  │ 3. BBPS (ResNet18)       │ → Score 0-3
  │ 4. Instrument (ResNet18) │ → Present/Absent
  └──────────────────────────┘
               │
               ▼
       ┌────────────────┐
       │ Medical Checks │
       │ - Specular     │
       │ - Brightness   │
       │ - Edge Density │
       └────────────────┘
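The weighted objective and the "every 3 steps" schedule in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's implementation: the function names and dictionary keys are hypothetical, the weights are the values listed in this README's configuration, and the step indexing (0-based, counted from the first guided step) is a guess.

```python
from typing import Dict, List

# Guidance weights as listed in this README; the keys are illustrative.
LAMBDAS: Dict[str, float] = {"seg": 1.0, "size": 0.6, "bbps": 0.4, "tool": 0.3}


def total_guidance_loss(losses: Dict[str, float], enabled: Dict[str, bool]) -> float:
    """Weighted sum of per-head losses; heads that are toggled off contribute nothing."""
    return sum(
        LAMBDAS[name] * value
        for name, value in losses.items()
        if enabled.get(name, False)
    )


def guided_steps(num_steps: int, start_at: int = 3, every_k: int = 3) -> List[int]:
    """Sampling steps at which guidance is applied (assumed 0-based indexing)."""
    return [
        step
        for step in range(num_steps)
        if step >= start_at and (step - start_at) % every_k == 0
    ]
```

Under these assumptions, `guided_steps(28)` selects nine steps (3, 6, ..., 27), so the four heads are evaluated on roughly a third of the sampling steps rather than on all of them.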
- Python 3.8+
- CUDA 11.8+ (for GPU)
- 24GB GPU RAM recommended (Colab Pro with A100/L4)
# Clone repository
git clone https://github.com/yourusername/hallucination-aware-medical-synthesis.git
cd hallucination-aware-medical-synthesis
# Install dependencies
pip install -r requirements.txt
# Or use Colab with our provided notebook
# (Upload Research_Project.ipynb to Google Colab)
torch>=2.0.0
diffusers==0.30.1
transformers==4.35.2
peft==0.8.2
segmentation-models-pytorch
timm
albumentations
Our pipeline uses 4 public colonoscopy datasets:
| Dataset | Purpose | Size | Download |
|---|---|---|---|
| Kvasir-SEG | Polyp segmentation, Phase 1 training | 1,000 images | Kaggle |
| HyperKvasir | Rich captions, Phase 2 training | 10k+ labeled | Simula |
| Kvasir-Instrument | Tool detection training | 590 frames | Kaggle |
| Nerthus | BBPS quality training | Video frames | Simula |
# Set up Kaggle credentials
!mkdir -p ~/.kaggle
!cp /path/to/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
# Download datasets (automated in notebook)
!kaggle datasets download -d fkarimovv/kvasir-seg
!kaggle datasets download -d debeshjha1/kvasirinstrument
# ... (see notebook for full commands)
data/
├── kvasir_seg/
│   ├── images/                  # 1000 colonoscopy images
│   └── masks/                   # Binary polyp masks
├── hyper_kvasir/
│   ├── labeled_images/
│   │   └── image-labels.csv
│   └── segmented_images/
│       ├── images/
│       └── masks/
├── kvasir_instrument/
│   └── images/                  # Frames with tools
└── nerthus_videos/
    └── nerthus-dataset-frames/  # BBPS-scored frames
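Before starting a long training run, it can help to confirm that the directory tree above is actually in place. The helper below is a hypothetical convenience, not part of the repository; the relative paths are read off the tree, and `missing_paths` is an illustrative name.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree in this README.
EXPECTED_PATHS = [
    "kvasir_seg/images",
    "kvasir_seg/masks",
    "hyper_kvasir/labeled_images/image-labels.csv",
    "hyper_kvasir/segmented_images/images",
    "hyper_kvasir/segmented_images/masks",
    "kvasir_instrument/images",
    "nerthus_videos/nerthus-dataset-frames",
]


def missing_paths(data_root: str) -> List[str]:
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_paths("data")` returns an empty list when everything is present, and otherwise names exactly which dataset still needs to be downloaded or unpacked.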
All 4 heads are trained in notebook sections:
- U-Net Segmenter (10 epochs, Dice+BCE loss)
- Size Classifier (10 epochs, CrossEntropy)
- Instrument Classifier (8 epochs, weighted sampling)
- BBPS Classifier (8 epochs, 4-class)
Expected Performance:
- Segmentation: Val IoU ~0.85
- Size: Val Accuracy ~92%
- Instrument: Val F1 ~0.95
- BBPS: Val F1 ~0.87
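The segmenter's "Dice+BCE loss" can be illustrated with a small pure-Python sketch. It assumes one common formulation, binary cross-entropy plus (1 - soft Dice) with equal weighting, computed over flattened per-pixel probabilities; the notebook may weight or smooth the two terms differently, and `dice_bce_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def dice_bce_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """BCE + (1 - soft Dice) over flattened per-pixel probabilities (equal weights assumed)."""
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and the same length")
    # Clip probabilities so log() never sees exactly 0 or 1.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clipped, targets)
    ) / len(clipped)
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    return bce + (1.0 - dice)
```

The BCE term penalizes each pixel independently, while the Dice term rewards overlap with the polyp region as a whole, which helps when polyps cover only a small part of the frame.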
Phase 1: Masked Domain Adaptation (2000 steps)
- Trains on Kvasir-SEG with binary masks applied
- Loss: MSE between predicted and actual noise
- LoRA rank: 8, alpha: 16
Phase 2: Rich Prompt Conditioning (1500 steps)
- Trains on HyperKvasir with semantic captions
- Example: "A colonoscopy image of the lower GI tract, showing polyp, classified as adenoma."
ControlNet: Mask-Conditioned Spatial Control (3000 steps)
- Combines Kvasir-SEG + HyperKvasir-SEG masks
- FP16 mixed precision, DPMSolver++ scheduler
- Checkpoints every 500 steps
Training Time (Colab A100):
- Phase 1: ~2 hours
- Phase 2: ~1.5 hours
- ControlNet: ~4 hours
- Total: ~7.5 hours
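For reference, the Phase 1 objective ("MSE between predicted and actual noise") matches the standard noise-prediction loss used to train latent diffusion models, and a LoRA adapter with rank 8 and alpha 16 corresponds to a scaling factor of 2 under the usual LoRA parameterization. The notation below (z_t, c, ε_θ, W_0, A, B) is introduced here for illustration and is not taken from the notebook; how the binary mask enters the Phase 1 loss is not specified in this README and is left out.

```latex
\mathcal{L}_{\text{noise}}
  = \mathbb{E}_{z_0,\, c,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
    \left[ \left\lVert \epsilon - \epsilon_\theta(z_t, t, c) \right\rVert_2^2 \right]

W = W_0 + \frac{\alpha}{r}\, B A,
\qquad r = 8,\ \alpha = 16 \;\Rightarrow\; \frac{\alpha}{r} = 2
```

Only the low-rank factors A and B are trained; the Stable Diffusion weights W_0 stay frozen, which is what keeps both LoRA phases within the reported training times.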
from src.generation import guided_generate_single
# Generate image with all constraints
image = guided_generate_single(
mask_pil=mask,
prompt="Colonoscopy showing medium polyp, clean prep, no tools",
seed=42,
use_guidance=True,
use_seg=True,
use_size=True,
use_bbps=True,
use_tool=True,
step_scale=0.10,
guide_k=3 # Guide every 3 steps
)
# Constraint targets
TARGET_SIZE_IDX = 1 # 0=small, 1=medium, 2=large
TARGET_BBPS_IDX = 2 # 0-3 BBPS score
TARGET_TOOL_IDX = 0 # 0=no tool, 1=tool present
IOU_THRESHOLD = 0.45 # Min segmentation IoU
# Guidance weights (tuned values)
LAMBDA_SEG = 1.0
LAMBDA_SIZE = 0.6
L_BBPS = 0.4
L_TOOL = 0.3
# Generates 4 candidates per mask, keeps top 2
python scripts/generate_guided.py \
--masks data/processed/kvasir_kv_manifest.csv \
--output results/synthetic \
--candidates 4 \
--topk 2
We systematically ablate each constraint to measure its contribution:
| Ablation | Seg | Size | BBPS | Tool | Pass Rate | Mean IoU |
|---|---|---|---|---|---|---|
| No Guidance | ✗ | ✗ | ✗ | ✗ | 34.2% | 0.52 |
| Seg Only | ✓ | ✗ | ✗ | ✗ | 48.1% | 0.61 |
| Seg + Size | ✓ | ✓ | ✗ | ✗ | 62.3% | 0.67 |
| Seg + Size + BBPS | ✓ | ✓ | ✓ | ✗ | 71.8% | 0.72 |
| Full (All 4) | ✓ | ✓ | ✓ | ✓ | 82.4% | 0.78 |
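The Pass Rate and Mean IoU columns can be read as simple aggregates over per-image results. The sketch below assumes each generated image yields a record with an `iou` value and a boolean `passed` flag, and that Mean IoU is averaged over all generated images rather than only the passing ones. Both the field names and that averaging choice are assumptions, since the schema of the generation report is not shown here.

```python
from typing import Dict, List, Tuple


def summarize(records: List[Dict[str, object]]) -> Tuple[float, float]:
    """Return (pass rate in percent, mean IoU) for a list of per-image records."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    pass_rate = 100.0 * sum(1 for r in records if r["passed"]) / n
    mean_iou = sum(float(r["iou"]) for r in records) / n
    return pass_rate, mean_iou
```

Running the same summary once per ablation setting gives one table row each, so the rows stay directly comparable.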
# Defined in notebook final sections
ABLATIONS = [
("no_guidance", False, False, False, False, ...),
("seg_only", True, True, False, False, ...),
("seg+size", True, True, True, False, ...),
("seg+size+bbps", True, True, True, True, False, ...),
("seg+size+bbps+tool", True, True, True, True, True, ...)
]
# Run all ablations
run_ablation_suite()
We evaluate generalization by training a segmentation model on:
- Real only (Kvasir-SEG)
- Real + Synthetic (1:1 ratio)
Tested on CVC-ClinicDB (612 images, external dataset).
| Training Data | ClinicDB IoU | Improvement |
|---|---|---|
| Real Only | 0.712 | baseline |
| Real + Synthetic | 0.758 | +6.5% (relative) |
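The improvement figure in the table is a relative gain over the real-only baseline, which is easy to confuse with an absolute IoU difference. A short check of the arithmetic, using only the two IoU values from the table (the function names are illustrative):

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Difference in IoU points."""
    return new - baseline


def relative_gain_percent(baseline: float, new: float) -> float:
    """Improvement expressed as a percentage of the baseline."""
    return 100.0 * (new - baseline) / baseline


abs_gain = absolute_gain(0.712, 0.758)          # about 0.046 IoU
rel_gain = relative_gain_percent(0.712, 0.758)  # about 6.46, reported as +6.5%
```

In absolute terms the gain is about 0.046 IoU, which is the figure to use when comparing against papers that report IoU differences directly.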
# Automated in notebook Section 8
python scripts/downstream_eval.py \
--real-train data/kvasir_seg \
--synthetic results/synthetic_filtered \
--eval data/clinicdb \
--epochs 12
Key Findings:
- Synthetic data improves generalization to unseen domains
- No overfitting: Training on mixed data doesn't hurt real-only performance
- Efficiency: Achieves a +6.5% relative IoU gain (0.712 → 0.758) without collecting new real data
Generation Quality (1000 samples):
| Metric | Baseline | Full Pipeline | Δ |
|---|---|---|---|
| Pass Rate | 34.2% | 82.4% | +48.2pp |
| Mean IoU | 0.52 | 0.78 | +0.26 |
| Size Accuracy | 41.3% | 89.7% | +48.4pp |
| BBPS Accuracy | - | 84.2% | - |
| Tool Accuracy | - | 91.6% | - |
Medical Verification:
- Specular Ratio: 2.1% (safe < 5%)
- Edge Density: 0.18 (realistic)
- Brightness Distribution: Normal
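The three verification metrics can be approximated with very little code. The sketch below is pure Python on a grayscale image stored as rows of 0-255 integers, purely for readability; a real pipeline would more likely use NumPy or OpenCV. The function names, the brightness threshold of 240, and the gradient threshold of 30 are placeholder assumptions. Only the 5% specular safety bound comes from this README, and the notebook's definition of edge density may differ.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 intensities

SPECULAR_SAFE_MAX = 0.05  # "safe < 5%" as stated in this README


def specular_ratio(img: Gray, threshold: int = 240) -> float:
    """Fraction of pixels at or above a near-saturation threshold (assumed 240)."""
    pixels = [v for row in img for v in row]
    return sum(1 for v in pixels if v >= threshold) / len(pixels)


def mean_brightness(img: Gray) -> float:
    """Mean intensity scaled to the range 0-1."""
    pixels = [v for row in img for v in row]
    return sum(pixels) / (255.0 * len(pixels))


def edge_density(img: Gray, threshold: int = 30) -> float:
    """Fraction of positions whose forward-difference gradient exceeds a threshold."""
    h, w = len(img), len(img[0])
    edges = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if abs(gx) + abs(gy) > threshold:
                edges += 1
    return edges / ((h - 1) * (w - 1))
```

An image would then be flagged when `specular_ratio(img) >= SPECULAR_SAFE_MAX`; acceptable ranges for brightness and edge density would have to be calibrated against real colonoscopy frames.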
See results/figures/ for:
- Input mask → Generated image comparisons
- Ablation visual examples
- Downstream segmentation predictions
- Upload Research_Project.ipynb to Google Colab
- Mount Google Drive (for saving checkpoints)
- Run all cells sequentially:
  - Data download (30 min)
  - Head training (2 hours)
  - LoRA Phase 1+2 (3.5 hours)
  - ControlNet (4 hours)
  - Generation (2 hours for 200 masks)
  - Ablations (4 hours)
  - Downstream (1 hour)
Total Time: ~17 hours on Colab Pro (A100)
# LoRA
LORA_RANK = 8
LORA_ALPHA = 16
# Generation
STEPS = 28
GUIDANCE_SCALE = 7.5
HEIGHT, WIDTH = 512, 512
# Latent Guidance
GUIDE_EVERY_K = 3
START_GUIDE_AT = 3
STEP_SCALE = 0.10
EMA_BETA = 0.8
# Constraints
LAMBDA_SEG = 1.0
LAMBDA_SIZE = 0.6
L_BBPS = 0.4
L_TOOL = 0.3
.
├── Research_Project.ipynb       # Main Colab notebook (all-in-one)
├── requirements.txt             # Dependencies
├── README.md                    # This file
│
├── classifiers/                 # Trained constraint heads (download)
│   ├── seg_unet_resnet34.pth
│   ├── size_cls_resnet18.pth
│   ├── bbpsq_resnet18.pth
│   └── instrument_resnet18.pth
│
├── lora_colonoscopy_phase1/     # Phase 1 LoRA weights
├── lora_colonoscopy_phase2/     # Phase 2 LoRA weights
├── controlnet_adapter/          # ControlNet weights
│
├── results/
│   ├── synthetic/               # Generated images
│   └── synth_report.csv         # Generation metrics
│
└── experiments/
    └── downstream/              # Segmentation checkpoints
All trained models are available on Google Drive:
- Kvasir-SEG Manifest (CSV): Download
- CVC-ClinicDB (612 images): Official Site
If you use this code in your research, please cite:
@inproceedings{adithya2025hallucination,
title={Hallucination-Aware Medical Image Synthesis Using Multi-Constraint Guided Diffusion for Colonoscopy Data Augmentation},
author={Adithya Rama and Justin Paul Kolengadan},
booktitle={NeurIPS Workshop on Medical Imaging},
year={2025},
organization={Australian National University}
}
- Kvasir, HyperKvasir, Nerthus dataset providers
- Stable Diffusion, ControlNet communities
- PyTorch, Diffusers, SMP frameworks
- Primary: adithya.rama@anu.edu.au
- Issues: GitHub Issues
MIT License - see LICENSE file
Last Updated: November 2025
Status: ✓ Code tested on Colab Pro with A100 GPU