KNITPhoenix/MIRACLE

Code for the paper accepted at the P2P-CV workshop @ WACV 2026

Paper Overview

intro_diag

In this work, we present MIRACLE (Multi-modal Integrated Radiomics And Clinical Language-based Explanation), a unified deep learning framework that integrates:

  • Structured clinical features collected during preoperative evaluations
  • Chest CT-derived radiological features
  • LLM-generated, evidence-anchored textual explanations

Traditional methods use only clinical data and offer no explainability, making them black-box systems. The proposed model integrates clinical and radiological data with explanatory remarks, enabling transparency and intervention as a glass-box system.

Proposed Architecture

architecture

The proposed architecture consists of three main modules:

  • Two separate Bayesian MLP networks, one each for clinical and radiological features
  • An encoding module for the textual remarks, built on an encoder that was fine-tuned on medical data and is kept frozen
  • A fusion network for final prediction

The processing pipeline is illustrated in the paper.
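
As a rough illustration of how the three modules could fit together, the sketch below runs a toy forward pass in plain Python. It is not the code in this repository: Monte Carlo dropout is swapped in as a simple stand-in for the Bayesian MLPs, the text embedding is random instead of coming from the frozen encoder, and the names `mc_dropout_mlp`, `predict`, `HIDDEN`, and `TEXT_DIM` are hypothetical. Only the input sizes (17 clinical variables, 113 radiomic features) come from this README.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Small random weights; a trained model would load these from a checkpoint."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mc_dropout_mlp(x, layer, rng, p_drop=0.2):
    """One stochastic pass: ReLU, then dropout that stays active at inference.
    This is MC dropout, used here only as a stand-in for a Bayesian MLP."""
    hidden = [max(0.0, h) for h in linear(x, *layer)]
    return [0.0 if rng.random() < p_drop else h / (1.0 - p_drop) for h in hidden]

def predict(clinical, radiomic, text_embedding, params, rng, n_samples=20):
    """Average the fused probability over several stochastic passes."""
    probs = []
    for _ in range(n_samples):
        h_clin = mc_dropout_mlp(clinical, params["clinical"], rng)
        h_rad = mc_dropout_mlp(radiomic, params["radiomic"], rng)
        fused = h_clin + h_rad + text_embedding  # concatenate the three branches
        logit = linear(fused, *params["fusion"])[0]
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    mean = sum(probs) / len(probs)
    spread = math.sqrt(sum((p - mean) ** 2 for p in probs) / len(probs))
    return mean, spread

rng = random.Random(0)
HIDDEN, TEXT_DIM = 8, 16  # illustrative sizes, not taken from the paper
params = {
    "clinical": init_layer(17, HIDDEN, rng),   # 17 structured clinical variables
    "radiomic": init_layer(113, HIDDEN, rng),  # 113 radiomic features
    "fusion": init_layer(2 * HIDDEN + TEXT_DIM, 1, rng),
}
clinical = [rng.random() for _ in range(17)]
radiomic = [rng.gauss(0, 1) for _ in range(113)]
text_embedding = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]  # placeholder for the frozen encoder output

mean_prob, uncertainty = predict(clinical, radiomic, text_embedding, params, rng)
print(f"p(complication) = {mean_prob:.3f} +/- {uncertainty:.3f}")
```

The spread across stochastic passes gives a crude uncertainty estimate alongside the averaged probability; refer to the paper and train.py for the actual layers and fusion strategy.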

Installation and environment setup

conda create -n miracle python=3.9.21
conda activate miracle
pip install -r requirements.txt

We used PyTorch 2.2.2 with CUDA 12.2.

Preparing data and pretrained checkpoints

Datasets used in training, validation and testing

The proposed dataset, called POC-L, was acquired from real patients who underwent lung cancer surgery at Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.

Dataset statistics:

  • 3,094 patients who underwent lung cancer surgery between 2009 and 2023 at Roswell Park Comprehensive Cancer Center
  • Patients were split into training (2,694; 22.6% complications), validation (200; 47.5%) and testing (200; 53.5%) splits
  • 57% of patients are female and 43% male
  • The dataset consists predominantly of White patients, with representation from African-American and Asian populations
  • All records were de-identified prior to analysis and the study was approved under IRB protocol BDR 176423
  • Each case has 17 structured preoperative clinical variables and 113 standardized radiomic features
  • Postoperative complications were defined across ten major complication events curated by domain experts
  • The presence of any of the ten complication events was aggregated into a binary global outcome label indicating whether a patient experienced at least one postoperative complication
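
The aggregation described in the last bullet amounts to a logical OR over the ten event indicators. A minimal sketch, assuming each patient is represented as a list of ten 0/1 flags (the function names and the toy data are hypothetical, not taken from the dataset):

```python
def global_label(event_flags):
    """Return 1 if at least one complication event is present, else 0."""
    return int(any(event_flags))

def complication_rate(patients):
    """Fraction of patients with a positive global label."""
    labels = [global_label(flags) for flags in patients]
    return sum(labels) / len(labels)

# Toy example: three patients, ten event flags each.
patients = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # no complication
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # one event
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # two events still map to a single positive label
]
labels = [global_label(flags) for flags in patients]
print(labels)  # [0, 1, 1]
```

Applied to a whole split, `complication_rate` yields the kind of per-split percentage quoted in the statistics above.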

Preprocessing the data

  • Preprocessing is done with the preprocess_input_data.py script, located in the dataset folder
  • Continuous clinical features were normalized using Min-Max scaling fitted on the training split, while categorical variables were label encoded
  • Radiomic features were standardized to ensure numerical stability
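
The three transformations above can be sketched in plain Python as follows. This is not the content of preprocess_input_data.py: the helper names and toy values are hypothetical, "standardized" is assumed to mean z-scoring, and fitting the radiomic statistics on the training split is also an assumption.

```python
import statistics

def fit_min_max(train_values):
    """Learn the scaling range from the training split only."""
    return min(train_values), max(train_values)

def apply_min_max(values, lo, hi):
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def fit_label_encoder(train_categories):
    """Map each category seen in training to an integer code."""
    return {c: i for i, c in enumerate(sorted(set(train_categories)))}

def fit_standardizer(train_values):
    """Mean and population standard deviation of the training values."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def apply_standardizer(values, mean, std):
    return [(v - mean) / std if std else 0.0 for v in values]

# Continuous clinical feature (toy values): fit on train, apply to test.
lo, hi = fit_min_max([40.0, 60.0, 80.0])
scaled_test = apply_min_max([50.0, 90.0], lo, hi)
print(scaled_test)  # [0.25, 1.25]

# Categorical clinical feature (toy categories).
encoder = fit_label_encoder(["never", "former", "current", "never"])
encoded = [encoder[c] for c in ["current", "never"]]
print(encoded)  # [0, 2]

# Radiomic feature (toy values), z-scored with training statistics.
mean, std = fit_standardizer([1.0, 2.0, 3.0])
z = apply_standardizer([2.0, 4.0], mean, std)
```

Because the Min-Max range comes from the training split, validation or test values outside that range fall outside [0, 1], as the 1.25 above shows.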

Pretrained models and Finetuned checkpoints

Download the zip file from Link and unzip its contents into the main directory so that the training and evaluation scripts can use them.

Training

  • MIRACLE is trained with the train.py script on the training split of POC-L and validated on the validation split

Testing and Evaluation

  • The trained model checkpoint is evaluated on the testing split of POC-L with the inference.py script

Qualitative Analysis of the LLM-generated remarks

The LLM-generated remarks are compared against remarks given by surgeons on the testing split of POC-L. We employed three distinct types of LLMs (instruction-based, reasoning, and fine-tuned) for remark generation. Despite being given the same inputs, the models produced different remarks. To quantify the relative quality of the LLM explanations, we carried out a two-stage evaluation:

  1. Automated Adjudication: The comparison is done using an LLM as a judge. Most of the LLM-generated remarks are completely aligned with the remarks given by surgeons.
remark_alignment
  2. Expert Manual Review: A panel of thoracic surgery specialists inspected paired surgeon and LLM-generated remarks for a representative subset of test cases. They labeled each LLM explanation as one of:
      • Performs better
      • Performs comparably
      • Performs worse
examples_final

Quantitative Results

Performance across different models

| Model | AUC (%) | TAR (%) @ FAR=0.2 | TAR (%) @ FAR=0.3 |
|---|---|---|---|
| Llama 3.3 70B-Instruct | 69.68 | 41.12 | 74.77 |
| DeepSeek R1-Distill Qwen-32B | 64.49 | 54.21 | 56.07 |
| OpenBioLLM-70B | 71.01 | 52.34 | 60.75 |
| Multivariate logistic regression | 80.89 | 73.83 | 80.37 |
| Random Forest Classifier | 77.00 | 62.62 | 74.76 |
| XGBoost | 75.17 | 53.27 | 64.48 |
| Gradient Boosting Classifier | 78.53 | 65.42 | 67.29 |
| LightGBM | 74.77 | 46.73 | 69.16 |
| MIRACLE (DeepSeek R1 distill) | 80.94 | 73.83 | 81.31 |
| MIRACLE (Llama 3.3 70B-Instruct) | 80.84 | 71.03 | 81.31 |
| MIRACLE (OpenBioLLM-70B) | 81.04 | 71.96 | 81.31 |

Surgeons: 44.86 (single reported value)

ROC for all the models

combined_ROC

Confusion Matrix for surgeons' performance

Confusion_matrix
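
A binary rater such as a surgeon gives yes/no predictions, so performance reduces to a single confusion matrix and one (FAR, TAR) operating point instead of a full ROC curve. A minimal sketch on toy data (the helper names and values are hypothetical and unrelated to the figure above):

```python
def confusion_counts(labels, predictions):
    """Count TP, FN, FP, TN for binary labels and binary predictions."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for y, p in zip(labels, predictions):
        if y == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts

def operating_point(counts):
    """Return (FAR, TAR) for one confusion matrix."""
    tar = counts["tp"] / (counts["tp"] + counts["fn"])
    far = counts["fp"] / (counts["fp"] + counts["tn"])
    return far, tar

# Toy example: 4 patients with complications, 4 without.
labels      = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 1, 0, 0, 0]

counts = confusion_counts(labels, predictions)
far, tar = operating_point(counts)
print(counts)    # {'tp': 2, 'fn': 2, 'fp': 1, 'tn': 3}
print(far, tar)  # 0.25 0.5
```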

Ablation Study (to analyze the contribution of each module)

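| Clinical | Radiological | LLM Remarks module | AUC (%) | TAR (%) @ FAR=0.2 | TAR (%) @ FAR=0.3 |
|---|---|---|---|---|---|
| | | | 74.81 | 57.94 | 66.35 |
| | | | 78.64 | 64.48 | 76.64 |
| | | | 80.94 | 73.83 | 81.31 |

The last row matches the full MIRACLE (DeepSeek R1 distill) result reported in the model comparison, i.e. all three modules enabled.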

Contact

For more information or any questions, feel free to reach us at spandey8@buffalo.edu

License

MIRACLE is CC-BY-NC 4.0 licensed, as found in the LICENSE file. It is released for academic research / non-commercial use only.

Citation

If you use our method or refer to our study in your research, please cite our work as:

@InProceedings{Pandey_2026_WACV,
    author    = {Pandey, Shubham and Jawade, Bhavin and Setlur, Srirangaraj and Govindaraju, Venu and Seastedt, Kenneth},
    title     = {LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {434-443}
}
