[P2P-CV @ WACV 2026] LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery
In this work, we present MIRACLE (Multi-modal Integrated Radiomics And Clinical Language-based Explanation), a unified deep learning framework that integrates:
- Structured clinical features collected during preoperative evaluations
- Chest CT-derived radiological features
- LLM-generated, evidence-anchored textual explanations
Traditional methods use only clinical data and offer no explainability, making them black-box systems. The proposed model integrates clinical and radiological data with explanatory remarks, enabling transparency and intervention as a glass-box system.
The proposed architecture consists of three main modules:
- Two separate Bayesian MLP networks, one each for clinical and radiological features
- An encoding module for the textual remarks, built on a frozen encoder that was fine-tuned on medical data
- A fusion network for final prediction
The processing pipeline is illustrated in the paper.
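As a rough illustration of how the three modules listed above could fit together, the following dependency-free Python sketch samples the weights of two small Bayesian branches, concatenates their outputs with a precomputed text embedding, and averages the fused prediction over several stochastic passes. It is not the released implementation (the repository itself uses PyTorch): the single-layer branches, the concatenation-based fusion, and all names (`bayesian_linear`, `branch`, `fuse`, `predict`, `params`) are assumptions made for this example.

```python
import math
import random


def bayesian_linear(x, w_mu, w_sigma, rng):
    """Stochastic linear layer: every weight is drawn from N(mu, sigma^2)."""
    out = []
    for mu_row, sigma_row in zip(w_mu, w_sigma):
        out.append(sum(xi * rng.gauss(m, s)
                       for xi, m, s in zip(x, mu_row, sigma_row)))
    return out


def branch(x, layer, rng):
    """Hypothetical one-layer Bayesian branch with a ReLU activation."""
    w_mu, w_sigma = layer
    return [max(0.0, a) for a in bayesian_linear(x, w_mu, w_sigma, rng)]


def fuse(embeddings, fusion_weights, bias):
    """Concatenate the embeddings and map them to a probability (assumed fusion)."""
    z = [a for emb in embeddings for a in emb]
    logit = sum(w * a for w, a in zip(fusion_weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def predict(clinical, radiomic, text_embedding, params, n_samples=20, seed=0):
    """Average the fused prediction over several weight samples.

    The text embedding is treated as fixed, mirroring a frozen encoder.
    Returns the mean probability and its variance across samples.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        c = branch(clinical, params["clinical"], rng)
        r = branch(radiomic, params["radiomic"], rng)
        probs.append(fuse([c, r, text_embedding],
                          params["fusion_w"], params["fusion_b"]))
    mean = sum(probs) / len(probs)
    var = sum((p - mean) ** 2 for p in probs) / len(probs)
    return mean, var
```

The variance returned by `predict` is one simple way to expose the uncertainty that a Bayesian branch provides; the actual uncertainty handling in MIRACLE may differ.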
conda create -n miracle python=3.9.21
conda activate miracle
pip install -r requirements.txt
We used PyTorch 2.2.2 with CUDA 12.2.
The proposed dataset, called POC-L, was acquired from real lung cancer patients who underwent surgery at Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.
Dataset statistics:
- 3,094 patients who underwent lung cancer surgery between 2009 and 2023 at Roswell Park Comprehensive Cancer Center
- Patients were split into training (2,694; 22.6% complications), validation (200; 47.5%) and testing (200; 53.5%) splits
- 57% of patients are female and 43% male
- The dataset is dominated by White patients, with representation from African-American and Asian populations
- All records were de-identified prior to analysis and the study was approved under IRB protocol BDR 176423
- Each case has 17 structured preoperative clinical variables and 113 standardized radiomic features
- Postoperative complications were defined across ten major complication events curated by domain experts
- The presence of any of the ten complication events was aggregated into a binary global outcome label indicating whether a patient experienced at least one postoperative complication
- Preprocessing is done using the "preprocess_input_data.py" script, located in the dataset folder
- Continuous clinical features were normalized using Min-Max scaling fitted on the training split, while categorical variables were label encoded
- Radiomic features were standardized to ensure numerical stability
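The labeling and preprocessing steps described in the list above can be sketched in plain Python as follows. This is a minimal illustration and not the contents of "preprocess_input_data.py": the function names are hypothetical, and the use of z-score standardization for the radiomic features is an assumption, since the text only says they were standardized.

```python
from statistics import mean, pstdev


def aggregate_label(event_flags):
    """Return 1 if any complication event occurred, otherwise 0."""
    return int(any(event_flags))


def fit_min_max(train_values):
    """Learn the Min-Max range from the training split only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with a range fitted on training data.

    Values outside the training range fall outside [0, 1].
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def fit_label_encoder(train_categories):
    """Map each category seen in training to an integer code."""
    return {cat: idx for idx, cat in enumerate(sorted(set(train_categories)))}


def fit_standardizer(train_values):
    """Learn mean and population standard deviation (assumed z-score scheme)."""
    return mean(train_values), pstdev(train_values)


def standardize(values, mu, sigma):
    """Apply z-score standardization; constant features map to 0."""
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Fitting the scaling statistics on the training split and reusing them for validation and testing keeps information from the held-out splits out of the preprocessing step.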
Download the zip file from Link and unzip its contents into the main directory so that the evaluation and training scripts can use it
- MIRACLE is trained using the train.py script on the training split of POC-L and validated on the validation split
- The trained model checkpoint is evaluated on the testing split of POC-L using the inference.py script
The remarks generated by the LLMs are compared against remarks given by surgeons on the testing split of POC-L. We employed three distinct types of LLMs (instruction-based, reasoning, and fine-tuned) for remark generation. Despite being given the same set of inputs, the remarks varied across the models. To quantify the relative quality of LLM explanations, we carried out a two-stage evaluation:
- Automated Adjudication: The comparison is done using an LLM as a judge. Most of the LLM-generated remarks are completely aligned with the remarks given by surgeons.
- Expert Manual Review: A panel of thoracic surgery specialists inspected paired surgeon and LLM-generated remarks for a representative subset of test cases. They labeled each LLM explanation as:
- Performs better
- Performs comparably
- Performs worse
| Model | AUC(%) | TAR(%)@FAR=0.2 | TAR(%)@FAR=0.3 |
|---|---|---|---|
| Llama 3.3 70B-Instruct | 69.68 | 41.12 | 74.77 |
| DeepSeek R1-Distill Qwen-32B | 64.49 | 54.21 | 56.07 |
| OpenBioLLM-70B | 71.01 | 52.34 | 60.75 |
| Multivariate logistic regression | 80.89 | 73.83 | 80.37 |
| Random Forest Classifier | 77.00 | 62.62 | 74.76 |
| XGBoost | 75.17 | 53.27 | 64.48 |
| Gradient Boosting Classifier | 78.53 | 65.42 | 67.29 |
| LightGBM | 74.77 | 46.73 | 69.16 |
| Surgeons | ‐ | 44.86 | ‐ |
| MIRACLE (DeepSeek R1 distill) | 80.94 | 73.83 | 81.31 |
| MIRACLE (Llama 3.3 70B-Instruct) | 80.84 | 71.03 | 81.31 |
| MIRACLE (OpenBioLLM-70B) | 81.04 | 71.96 | 81.31 |
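For readers unfamiliar with the metrics in the table above, the sketch below shows one common way to compute AUC and TAR at a fixed FAR from binary labels and predicted scores. It assumes that TAR and FAR correspond to the true-positive and false-positive rates and that thresholds are swept over the observed scores; the function names are hypothetical, and this is not the evaluation code from inference.py.

```python
def _split_scores(labels, scores):
    """Separate scores of positive (1) and negative (0) cases."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    return positives, negatives


def tar_at_far(labels, scores, target_far):
    """Highest TAR over all thresholds whose FAR stays at or below target_far."""
    positives, negatives = _split_scores(labels, scores)
    best = 0.0
    for t in sorted(set(scores)):
        # A case is "accepted" (predicted positive) when its score is >= t.
        far = sum(s >= t for s in negatives) / len(negatives)
        if far <= target_far:
            tar = sum(s >= t for s in positives) / len(positives)
            best = max(best, tar)
    return best


def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win.
    """
    positives, negatives = _split_scores(labels, scores)
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A call such as `tar_at_far(labels, scores, 0.2)` would correspond to the TAR(%)@FAR=0.2 column once multiplied by 100, under the assumptions stated above.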
| Clinical | Radiological | LLM Remarks module | AUC(%) | TAR(%)@FAR=0.2 | TAR(%)@FAR=0.3 |
|---|---|---|---|---|---|
| ✓ | ‐ | ‐ | 74.81 | 57.94 | 66.35 |
| ✓ | ✓ | ‐ | 78.64 | 64.48 | 76.64 |
| ✓ | ✓ | ✓ | 80.94 | 73.83 | 81.31 |
For more information or any questions, feel free to reach us at spandey8@buffalo.edu
MIRACLE is CC-BY-NC 4.0 licensed, as found in the LICENSE file. It is released for academic research / non-commercial use only.
If you use our method or refer to our study in your research, please cite our work as:
@InProceedings{Pandey_2026_WACV,
author = {Pandey, Shubham and Jawade, Bhavin and Setlur, Srirangaraj and Govindaraju, Venu and Seastedt, Kenneth},
title = {LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
month = {March},
year = {2026},
pages = {434-443}
}