UMEssen/Radbill
RAD-Bill: Automated Billing Code Predictions from Radiology Reports


This project uses Large Language Models (LLMs) to predict appropriate codes from the German fee schedule for physicians (Gebührenordnung für Ärzte, GOÄ) based on radiology findings. The LLMs can either be fine-tuned with the included training scripts or used in zero-shot and few-shot settings for out-of-the-box predictions.

🎯 Overview

RAD-Bill addresses the challenge of medical billing in radiology by automatically identifying and predicting relevant GOÄ codes from German radiology reports. The system uses fine-tuned LLMs (e.g., Mistral) trained on radiology findings to predict comma-separated billing codes.
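Because the model emits its prediction as comma-separated text, the generation has to be parsed into a set of codes before it can be scored against the reference labels. A minimal sketch (the function name and exact cleanup rules are illustrative, not taken from the repository):

```python
def parse_goa_codes(generation: str) -> set[str]:
    """Split a comma-separated LLM generation into a set of GOÄ codes.

    Strips whitespace, drops empty fragments, and deduplicates repeated
    codes so the result can be compared set-wise against the references.
    """
    return {code.strip() for code in generation.split(",") if code.strip()}
```

Treating the output as a set (rather than an ordered list) is what makes the multi-label metrics below well-defined.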

Key Features

  • Fine-tuning Pipeline: Train LLMs with the TRL library
  • Multi-label Evaluation: Comprehensive metrics including precision, recall, F1-score, and Jaccard similarity
  • Bootstrap Confidence Intervals: Statistical evaluation with confidence intervals
  • OpenAI Integration: Evaluation support for OpenAI models
  • Weights & Biases Integration: Experiment tracking and logging
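The bootstrap confidence intervals listed above can be obtained by resampling per-report scores with replacement and taking percentiles of the resampled means; a hedged sketch using only NumPy (function and parameter names are assumptions, not the repository's API):

```python
import numpy as np


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean of per-sample scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Draw n_resamples bootstrap samples (with replacement) and average each.
    means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi
```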

🚀 Installation

Prerequisites

  • Python >= 3.13
  • CUDA-compatible GPU (recommended)

Setup

  1. Install dependencies using uv or pip:
# Using uv (recommended)
uv sync

# Or using pip
pip install -e .
  2. Create a .env file with the required environment variables:
# Model Configuration
MODEL=<path-to-base-model>
INPUT_LENGTH=<max-input-tokens>
TARGET_LENGTH=<max-target-tokens>

# Directory Configuration
OUTPUT_DIR=<path-to-output-directory>
DATASET_DIR=<path-to-dataset>
EVAL_DIR=<path-to-evaluation-output>

# Training Hyperparameters
TRAIN_TEST_SPLIT=0.8
TRAIN_BATCH_SIZE=4
EVAL_BATCH_SIZE=4
NUM_TRAIN_EPOCHS=3
GRADIENT_CHECKPOINTING=true
GRADIENT_ACCUMULATION_STEPS=4
LEARNING_RATE=2e-5

# API Keys
WANDB_API_KEY=<your-wandb-api-key>
OPENAI_API_KEY=<your-openai-api-key>  # Optional, for OpenAI evaluation
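In the scripts these variables would typically be available after python-dotenv's load_dotenv() populates os.environ; the sketch below reads os.environ directly and shows only the type conversions, with the helper name and default values as assumptions:

```python
import os

# In the repository, `from dotenv import load_dotenv; load_dotenv()` would
# run first; here we read os.environ directly for a self-contained example.
def load_training_config() -> dict:
    return {
        "model": os.environ["MODEL"],  # required, no default
        "train_test_split": float(os.environ.get("TRAIN_TEST_SPLIT", "0.8")),
        "train_batch_size": int(os.environ.get("TRAIN_BATCH_SIZE", "4")),
        "learning_rate": float(os.environ.get("LEARNING_RATE", "2e-5")),
        "gradient_checkpointing": os.environ.get("GRADIENT_CHECKPOINTING", "true").lower() == "true",
    }
```

Note that every value arrives as a string, so booleans like GRADIENT_CHECKPOINTING need explicit parsing.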

🏋️ Training

The training pipeline uses the TRL library for fine-tuning large language models.

Running Training

python training/training.py

Training Configuration

The training script supports the following configurations:

  • Model: Any Hugging Face causal language model (tested with Mistral)
  • Data Splitting: Group-based train/test split by patient ID
  • Logging: Weights & Biases integration for experiment tracking
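The group-based split mentioned above keeps all reports from one patient on the same side of the train/test boundary, preventing leakage between splits; with scikit-learn (already a dependency) this can be sketched as follows, where the function name is illustrative:

```python
from sklearn.model_selection import GroupShuffleSplit


def split_by_patient(reports, patient_ids, train_size=0.8, seed=42):
    """Split report indices so that no patient appears in both partitions."""
    splitter = GroupShuffleSplit(n_splits=1, train_size=train_size, random_state=seed)
    # `groups` ties each report to its patient; whole groups move together.
    train_idx, test_idx = next(splitter.split(reports, groups=patient_ids))
    return train_idx, test_idx
```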

📊 Evaluation

Evaluating Fine-tuned Models

python evaluation/sample/evaluate_finetuned_models.py

Calculating Metrics

python evaluation/calculate_scores.py

Metrics

The evaluation framework computes:

  • Micro/Macro Precision, Recall, F1-Score
  • Subset Accuracy: Exact match accuracy
  • Jaccard Score: Set similarity metric
  • Hamming Loss: Fraction of incorrect labels
  • Bootstrap Confidence Intervals: Statistical significance
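With predictions and references represented as sets of codes, the listed metrics map directly onto scikit-learn's multilabel API. A sketch of the computation (the binarization step and function name are assumptions about how the repository scores, not its actual code):

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import (
    precision_score, recall_score, f1_score,
    accuracy_score, jaccard_score, hamming_loss,
)


def multilabel_scores(y_true_sets, y_pred_sets):
    """Binarize code sets over the union of observed labels, then score."""
    mlb = MultiLabelBinarizer()
    mlb.fit(list(y_true_sets) + list(y_pred_sets))
    y_true = mlb.transform(y_true_sets)
    y_pred = mlb.transform(y_pred_sets)
    return {
        "micro_precision": precision_score(y_true, y_pred, average="micro", zero_division=0),
        "micro_recall": recall_score(y_true, y_pred, average="micro", zero_division=0),
        "micro_f1": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "subset_accuracy": accuracy_score(y_true, y_pred),  # exact set match
        "jaccard": jaccard_score(y_true, y_pred, average="samples", zero_division=0),
        "hamming_loss": hamming_loss(y_true, y_pred),
    }
```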

🔧 Dependencies

Package        Version
transformers   >=4.57.3
torch          >=2.9.1
peft           >=0.18.0
trl            >=0.26.2
datasets       >=4.4.2
scikit-learn   >=1.8.0
pandas         >=2.3.3
wandb          >=0.23.1
openai         >=2.14.0
python-dotenv  >=1.2.1

📄 License

This project is licensed under the MIT License.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📧 Contact

For questions or collaboration inquiries, please open an issue in this repository.
