[P2P-CV @ WACV 2026] LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery

Code for the paper accepted at the P2P-CV workshop @ WACV 2026

Paper Overview

In this work, we present MIRACLE (Multi-modal Integrated Radiomics And Clinical Language-based Explanation), a unified deep learning framework that integrates:

Structured clinical features collected during preoperative evaluations
Chest CT-derived radiological features
LLM-generated, evidence-anchored textual explanations

Traditional methods use only clinical data and offer no explainability, making them black-box systems. The proposed model integrates clinical and radiological data with explanatory remarks, enabling transparency and intervention as a glass-box system.

Proposed Architecture

The proposed architecture consists of three main modules:

Two separate Bayesian MLP networks, one each for clinical and radiological features
An encoding module for the textual remarks, built on a frozen encoder that was fine-tuned on medical data
A fusion network for final prediction

The processing pipeline is illustrated in the paper.

Installation and environment creation

conda create -n miracle python=3.9.21
conda activate miracle
pip install -r requirements.txt

We used PyTorch 2.2.2 with CUDA 12.2.
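To confirm that the activated environment matches the PyTorch version stated above, a check along the following lines may help. This is a minimal sketch: the helper names are illustrative and not part of this repository, and it uses only the standard-library `importlib.metadata` module, so it runs even when PyTorch is missing.

```python
from importlib import metadata

EXPECTED_TORCH = "2.2.2"  # PyTorch version stated in this README


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(found, expected=EXPECTED_TORCH):
    """True when `found` equals `expected`, ignoring a local build tag after '+'."""
    return found is not None and found.split("+")[0] == expected


found = installed_version("torch")
print("torch version found:", found)
print("matches README version:", matches_expected(found))
```

The comparison drops anything after a `+` because wheels built for a specific CUDA release often carry a local version suffix.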

Preparing data and pretrained checkpoints

Datasets used in training, validation and testing

The proposed dataset, called POC-L, was acquired from real lung cancer patients who underwent surgery at Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.

Dataset statistics:

3,094 patients who underwent lung cancer surgery between 2009 and 2023 at Roswell Park Comprehensive Cancer Center
Patients were split into training (2,694; 22.6% complications), validation (200; 47.5%) and testing (200; 53.5%) splits
57% of patients are female and 43% male
The dataset is dominated by White patients, with representation from African-American and Asian populations
All records were de-identified prior to analysis and the study was approved under IRB protocol BDR 176423
Each case has 17 structured preoperative clinical variables and 113 standardized radiomic features
Postoperative complications were defined across ten major complication events curated by domain experts
The presence of any of the ten complication events was aggregated into a binary global outcome label indicating whether a patient experienced at least one postoperative complication
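The aggregation into a global outcome label can be sketched as below. The event names are placeholders (this README does not list the ten events), and the function is illustrative rather than the code used to build POC-L.

```python
def global_outcome(events):
    """Return 1 if at least one complication event is present, otherwise 0.

    `events` maps an event name to a 0/1 indicator for one patient.
    """
    return int(any(bool(flag) for flag in events.values()))


# Placeholder names: event_1 ... event_10 stand in for the ten curated events.
no_complication = {f"event_{i}": 0 for i in range(1, 11)}
one_complication = dict(no_complication, event_4=1)

print(global_outcome(no_complication))   # 0
print(global_outcome(one_complication))  # 1
```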

Preprocessing the data

Preprocessing is done using the "preprocess_input_data.py" script, located in the dataset folder
Continuous clinical features were normalized using Min-Max scaling fitted on the training split, while categorical variables were label encoded
Radiomic features were standardized to ensure numerical stability
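The three transformations described above can be sketched in plain Python as follows. This is not the content of preprocess_input_data.py: the function names and the toy values are made up for illustration, "standardized" is read here as z-scoring, and fitting the radiomic statistics on the training split only is an assumption.

```python
from statistics import mean, pstdev


def fit_min_max(train_values):
    """Learn the min and max of a continuous feature from the training split."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with training-split statistics; constant features map to 0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def fit_label_encoder(train_categories):
    """Map each category seen in training to an integer code (sorted order)."""
    return {cat: idx for idx, cat in enumerate(sorted(set(train_categories)))}


def fit_standardizer(train_values):
    """Learn the mean and population standard deviation (z-score assumption)."""
    return mean(train_values), pstdev(train_values)


def apply_standardizer(values, mu, sigma):
    """Center and scale values; constant features map to 0."""
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Toy values for a hypothetical continuous feature.
lo, hi = fit_min_max([40, 60, 80])
print(apply_min_max([50, 90], lo, hi))  # [0.25, 1.25]

# Toy values for a hypothetical categorical feature.
print(fit_label_encoder(["never", "former", "current", "never"]))

# Toy values for a hypothetical radiomic feature.
mu, sigma = fit_standardizer([1.0, 2.0, 3.0])
print(apply_standardizer([2.0], mu, sigma))  # [0.0]
```

Because the scaler is fitted on the training split only, validation or test values outside the training range fall outside [0, 1], as the 1.25 in the first example shows.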

Pretrained models and Finetuned checkpoints

Download the zip file from Link and unzip its contents into the main directory so that the evaluation and training scripts can use them
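If a command-line unzip tool is not available, the archive can be extracted with Python's standard `zipfile` module. The helper below is a sketch; the archive name in the usage comment is hypothetical, since the actual file name depends on the download.

```python
import zipfile
from pathlib import Path


def extract_checkpoints(zip_path, dest="."):
    """Extract every member of the archive into `dest` and return the member names."""
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


# Usage from the repository root (hypothetical archive name):
# extract_checkpoints("downloaded_checkpoints.zip", ".")
```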

Training

MIRACLE is trained using the train.py script on the training split of POC-L and validated on the validation split

Testing and Evaluation

The trained model checkpoint is evaluated on the testing split of POC-L using the inference.py script

Qualitative Analysis of the LLM generated remarks

The LLM-generated remarks are compared against the remarks given by surgeons on the testing split of POC-L. We employed three distinct types of LLMs (instruction-based, reasoning, and fine-tuned) for remark generation. Despite being given the same set of inputs, the models produced different remarks. To quantify the relative quality of the LLM explanations, we carried out a two-stage evaluation:

Automated Adjudication: The comparison is done using an LLM as a judge. Most of the LLM-generated remarks are completely aligned with the remarks given by surgeons.

Expert Manual Review: A panel of thoracic surgery specialists inspected paired surgeon and LLM-generated remarks for a representative subset of test cases. They labeled each LLM explanation as:

Performs better
Performs comparably
Performs worse

Quantitative Results

Performance across different models

Model	AUC(%)	TAR(%)@FAR=0.2	TAR(%)@FAR=0.3
Llama 3.3 70B-Instruct	69.68	41.12	74.77
DeepSeek R1-Distill Qwen-32B	64.49	54.21	56.07
OpenBioLLM-70B	71.01	52.34	60.75
Multivariate logistic regression	80.89	73.83	80.37
Random Forest Classifier	77.00	62.62	74.76
XGBoost	75.17	53.27	64.48
Gradient Boosting Classifier	78.53	65.42	67.29
LightGBM	74.77	46.73	69.16
Surgeons	‐	44.86	‐
MIRACLE (DeepSeek R1 distill)	80.94	73.83	81.31
MIRACLE (Llama 3.3 70B-Instruct)	80.84	71.03	81.31
MIRACLE (OpenBioLLM-70B)	81.04	71.96	81.31
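Reading TAR and FAR as true accept rate and false accept rate, TAR@FAR=0.2 is the fraction of complication cases flagged at a threshold where at most 20% of complication-free cases are flagged. The sketch below shows one way to compute it from predicted scores. The function name, the threshold rule (score >= threshold, no interpolation), and the toy data are assumptions; the repository's evaluation code may differ.

```python
def tar_at_far(scores, labels, far_target):
    """Return the highest TAR over thresholds whose FAR does not exceed `far_target`.

    A case is accepted (predicted as a complication) when score >= threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    best = 0.0
    for threshold in sorted(set(scores)):
        far = sum(s >= threshold for s in neg) / len(neg)
        if far <= far_target:
            tar = sum(s >= threshold for s in pos) / len(pos)
            best = max(best, tar)
    return best


# Toy data: four complication cases (label 1) and five complication-free cases (label 0).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0]

print(tar_at_far(scores, labels, 0.2))  # 0.75
print(tar_at_far(scores, labels, 0.4))  # 1.0
```

With 107 complication cases in the testing split (53.5% of 200), each TAR value in the table corresponds to a whole number of detected cases; for example, 73.83% is consistent with 79 of 107.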

ROC for all the models

Confusion Matrix for surgeons' performance

Ablation Study (to analyze the contribution of each module)

Clinical	Radiological	LLM Remarks module	AUC(%)	TAR(%)@FAR=0.2	TAR(%)@FAR=0.3
✓	‐	‐	74.81	57.94	66.35
✓	✓	‐	78.64	64.48	76.64
✓	✓	✓	80.94	73.83	81.31
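The contribution of each module can be read off the ablation table as the difference between consecutive rows. The short script below performs that arithmetic on the values reported above; the row labels and the helper name are chosen here for illustration.

```python
# (configuration, AUC %, TAR % @ FAR=0.2, TAR % @ FAR=0.3), copied from the ablation table.
ablation = [
    ("clinical", 74.81, 57.94, 66.35),
    ("clinical + radiological", 78.64, 64.48, 76.64),
    ("clinical + radiological + LLM remarks", 80.94, 73.83, 81.31),
]


def incremental_gains(rows):
    """Return, for each added module, the change in every metric versus the previous row."""
    gains = []
    for prev, cur in zip(rows, rows[1:]):
        deltas = tuple(round(c - p, 2) for p, c in zip(prev[1:], cur[1:]))
        gains.append((cur[0],) + deltas)
    return gains


for name, d_auc, d_tar02, d_tar03 in incremental_gains(ablation):
    print(f"{name}: AUC +{d_auc:.2f}, TAR@0.2 +{d_tar02:.2f}, TAR@0.3 +{d_tar03:.2f}")
```

Adding radiological features raises AUC by 3.83 points and TAR@FAR=0.2 by 6.54 points; adding the LLM remarks module adds a further 2.30 and 9.35 points, respectively.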

Contact

For more information or any questions, feel free to reach out to us at spandey8@buffalo.edu

License

MIRACLE is CC-BY-NC 4.0 licensed, as found in the LICENSE file. It is released for academic research / non-commercial use only.

Citation

If you use our method or refer to our study in your research, please cite our work as:

@InProceedings{Pandey_2026_WACV,
    author    = {Pandey, Shubham and Jawade, Bhavin and Setlur, Srirangaraj and Govindaraju, Venu and Seastedt, Kenneth},
    title     = {LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {434-443}
}
