Official implementation of the paper "Improving Uncertainty Estimation with Confidence-Aware Training Data" by Korchagin S. et al. The work was presented at the 2025 Winter Conference on Applications of Computer Vision.
AI-driven second-opinion systems play a crucial role in decision-making, especially in medicine, where accurate predictions guide clinicians. However, quantifying uncertainty in deep learning is challenging, as current methods often rely on hard class labels, which do not reflect true prediction confidence. This often results in overconfident predictions and slow convergence to true probabilities. To address this, we suggest a new method that separates uncertainty into two types: epistemic and aleatoric. We estimate these uncertainties using hard and soft confidence labels, with experts providing confidence levels that indicate the likelihood of misclassification. We release an updated blood typing dataset consisting of 3139 images, with soft labels of uncertainty annotations from six experts and hard labels collected from medical records. The proposed approach improves SotA uncertainty estimation quality by a factor of two for blood typing (classification) and by 62% for histology (segmentation).
The code was run on Python 3.10. To install all necessary dependencies, run
pip install -r requirements.txt
The code is split in three parts:
- All code and instructions to validate experiments for the blood typing task (classification) are located in the uncertaint_classification directory.
- All code and instructions to validate experiments for the lung CT scan segmentation and retinal fundus image segmentation tasks are located in the segmentation directory.
- All code and instructions to validate experiments on synthetic data will be made available soon.

The nn directory contains auxiliary files related to the classification task.
The blood typing BloodyWell dataset is available here. The markup/BloodyWell directory stores additional metadata that was used for training and testing the models.
The LIDC-IDRI and RIGA datasets are available on the internet. TODO: add proper links.
For an input image
The first term represents epistemic uncertainty and the second term represents aleatoric uncertainty. Both terms can be approximated with an ensemble of
where
The aleatoric component can suffer from modern neural networks being overconfident in their predictions.
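As an illustration of how an ensemble can approximate the two terms, here is a minimal sketch. It assumes the common entropy-based decomposition (total predictive entropy = mutual information + expected entropy), which may differ in detail from the formulation in the paper. The function names are hypothetical and are not part of this repository.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for a single input.

    member_probs: list of class-probability lists, one per ensemble member.
    Returns (epistemic, aleatoric) under the ASSUMED entropy-based
    decomposition; see the paper for the authors' exact definition.
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    # Ensemble prediction: average of the member distributions.
    mean_probs = [
        sum(m[c] for m in member_probs) / n_models for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    # Aleatoric term: average entropy of the individual members.
    aleatoric = sum(entropy(m) for m in member_probs) / n_models
    # Epistemic term: what remains, i.e. disagreement between members.
    epistemic = total - aleatoric
    return epistemic, aleatoric


# Members that agree on an ambiguous input: the uncertainty is all aleatoric.
agree = [[0.5, 0.5], [0.5, 0.5]]
# Members that are individually confident but disagree: mostly epistemic.
disagree = [[0.99, 0.01], [0.01, 0.99]]

print(decompose_uncertainty(agree))
print(decompose_uncertainty(disagree))
```

Note that the aleatoric term here is computed directly from each member's predicted probabilities, so it inherits any overconfidence of the individual networks.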
We propose leveraging information from multiple experts that is often available with medical data to provide soft labels
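One simple way to turn several expert annotations into a soft label is sketched below. It is an assumption made for illustration, not the procedure from the paper: each expert is taken to give a class and a confidence in that class, the remaining probability mass is spread uniformly over the other classes, and the per-expert vectors are averaged. All function names are hypothetical.

```python
import math


def expert_to_probs(label, confidence, n_classes):
    """Convert one expert's (label, confidence) into a probability vector.

    ASSUMPTION: the mass not assigned to the chosen class is spread
    uniformly over the remaining classes.
    """
    rest = (1.0 - confidence) / (n_classes - 1)
    return [confidence if c == label else rest for c in range(n_classes)]


def soft_label(annotations, n_classes):
    """Average the per-expert probability vectors into one soft label."""
    vectors = [expert_to_probs(lbl, conf, n_classes) for lbl, conf in annotations]
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(n_classes)]


def soft_cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(
        t * math.log(max(p, eps)) for p, t in zip(pred_probs, target_probs)
    )


# Three hypothetical experts: two choose class 0, one chooses class 1.
annotations = [(0, 0.9), (0, 0.7), (1, 0.6)]
target = soft_label(annotations, n_classes=2)
print(target)

# An overconfident prediction is penalised more than a calibrated one.
print(soft_cross_entropy([0.99, 0.01], target))
print(soft_cross_entropy(target, target))
```

With a soft target, the loss is minimised when the predicted distribution matches the experts' aggregate confidence rather than a one-hot vector, which is the intuition behind using soft labels against overconfidence.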
We test our approach first on synthetic data and then on three real-life tasks: blood typing (classification), lung CT scan segmentation (binary segmentation) and retinal fundus image segmentation (multi-class segmentation). We compare our results to various methods of uncertainty estimation.
Specific details on how we train models and produce soft labels, as well as evaluation metrics, can be found in the paper. Below is the summary of our results.
Using soft labels yields better Mean Absolute Error (MAE) and Expected Calibration Error (ECE) values. Additionally, we show that training on a mixture of hard and soft labels produces better MAE values than training on hard labels alone.
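For readers unfamiliar with ECE, the following is a minimal sketch of the standard equal-width-bin estimator. The number of bins and the binning scheme are assumptions; the exact evaluation protocol is described in the paper, and the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE with equal-width confidence bins.

    confidences: predicted confidence of the chosen class, in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# A model that claims 90% confidence but is right only half of the time.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, False, False]
)
print(overconfident)
```

A perfectly calibrated model has an ECE of zero; the gap grows as stated confidence drifts away from observed accuracy.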
Using CAEs to estimate aleatoric uncertainty significantly reduces the Area Above the accuracy-rejection Curve (AAC) as well as the Throwaway Rate required to achieve accuracy above 99% (TRA-99).
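The rejection-based metrics can be sketched as follows. This is an illustrative reading of AAC and TRA, assuming that samples are discarded in order of decreasing uncertainty and that the area is computed with the trapezoidal rule; the precise definitions are given in the paper, and the function names are hypothetical.

```python
def accuracy_rejection_curve(uncertainties, correct):
    """Accuracy on the retained samples after rejecting the k most uncertain.

    Returns a list of (rejection_rate, accuracy) pairs for k = 0 .. n - 1.
    """
    n = len(correct)
    # Indices ordered from most certain to least certain.
    order = sorted(range(n), key=lambda i: uncertainties[i])
    curve = []
    for k in range(n):
        kept = order[: n - k]
        accuracy = sum(1 for i in kept if correct[i]) / len(kept)
        curve.append((k / n, accuracy))
    return curve


def area_above_curve(curve):
    """Trapezoidal area between the line accuracy = 1 and the curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * ((1.0 - y0) + (1.0 - y1)) / 2.0
    return area


def throwaway_rate(curve, target=0.99):
    """Smallest rejection rate at which accuracy reaches the target.

    ASSUMPTION: returns 1.0 when the target is never reached.
    """
    for rate, accuracy in curve:
        if accuracy >= target:
            return rate
    return 1.0


# Toy example: the single error is also the most uncertain prediction.
uncertainties = [0.1, 0.2, 0.8, 0.9]
correct = [True, True, True, False]
curve = accuracy_rejection_curve(uncertainties, correct)
print(area_above_curve(curve))
print(throwaway_rate(curve))
```

A good uncertainty estimate ranks the errors first for rejection, so accuracy climbs quickly and both the area above the curve and the throwaway rate stay small.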
Our approach improves AAC by over five times compared to only using a basic ensemble on the RIGA segmentation task. Throwaway Rate required to achieve Dice of
However, results on the LIDC-IDRI segmentation task show reduced performance compared to the basic approach. We hypothesize that this is due to low agreement between experts, which introduces a high level of noise into the soft labels.
If you find our work useful, please give the repository a star and cite our paper.
@InProceedings{Korchagin_2025_WACV,
author = {Korchagin, Sergey and Zaychenkova, Ekaterina and Khalin, Aleksei and Yugay, Aleksandr and Zaytsev, Alexey and Ershov, Egor},
title = {Improving Uncertainty Estimation with Confidence-Aware Training Data},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {7980-7990}
}




