ramon349/mammo_parse

repo_parse

A Python package that extracts the weight (value and units) of removed breast tissue from surgical reports using a multi-stage pipeline.

Pipeline Architecture

  1. Regex & Sectionizer: Splits the report into clinical sections and identifies candidate mentions of weights with their immediate context.
  2. LLM Disambiguation: Uses an LLM (via Hugging Face) and a Jinja2-rendered prompt to select the correct weights for the left and right breasts from the candidates.
  3. Auditor Pass: Programmatically verifies that the numerical value the LLM claims to have found actually exists within its provided evidence string.
  4. Evaluation: Measures accuracy, Mean Absolute Error (MAE), and audit failure rates against ground truth data.
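
As a rough illustration of stages 1 and 3, the sketch below finds weight mentions with a regex and then checks that a claimed value really appears in an evidence string. It is not taken from the package: `WEIGHT_RE`, `Candidate`, `find_candidates`, and `audit` are hypothetical names, and the pattern assumes weights are written in grams.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern (an assumption, not the package's own regex):
# a number followed by a gram unit such as "g", "gm", or "grams".
WEIGHT_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>grams?|gm|g)\b",
    re.IGNORECASE,
)


@dataclass
class Candidate:
    value: float   # numeric weight found in the text
    unit: str      # unit as written, lower-cased
    context: str   # surrounding text handed to the LLM


def find_candidates(text: str, window: int = 40) -> list[Candidate]:
    """Return every weight mention with `window` characters of context."""
    candidates = []
    for match in WEIGHT_RE.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        candidates.append(
            Candidate(
                value=float(match.group("value")),
                unit=match.group("unit").lower(),
                context=text[start:end],
            )
        )
    return candidates


def audit(claimed_value: float, evidence: str) -> bool:
    """Return True if the claimed number appears in the evidence string."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", evidence)]
    return any(abs(n - claimed_value) < 1e-9 for n in numbers)
```

Comparing parsed numbers rather than raw substrings means that a claim of 45 is not accepted just because the evidence contains "450".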

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

CLI Usage

The package can be run directly from the command line:

Process a CSV file containing reports

python -m repo_parse csv=reports.csv column="Surgical Note"

Process a CSV and save results

python -m repo_parse csv=reports.csv output=results.csv

Run evaluation against ground truth

python -m repo_parse eval=true samples=data/samples gt=data/ground_truth.csv
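
The metrics reported by the evaluation stage could be computed along the lines of the sketch below. The function name `evaluate`, the `tolerance` parameter, and the treatment of missing predictions (counted as incorrect for accuracy, excluded from MAE) are assumptions for illustration; the package's own definitions may differ.

```python
from typing import Optional


def evaluate(
    predictions: list[Optional[float]],
    ground_truth: list[float],
    audit_passed: list[bool],
    tolerance: float = 0.0,
) -> dict:
    """Compute accuracy, MAE, and audit failure rate for one breast side.

    Assumptions: a missing prediction (None) counts as incorrect, and MAE
    is averaged only over reports that have a prediction.
    """
    if not (len(predictions) == len(ground_truth) == len(audit_passed)):
        raise ValueError("all inputs must have the same length")
    if not predictions:
        raise ValueError("at least one report is required")

    total = len(predictions)
    # Absolute errors for reports where the pipeline produced a value.
    errors = [
        abs(pred - truth)
        for pred, truth in zip(predictions, ground_truth)
        if pred is not None
    ]
    correct = sum(1 for err in errors if err <= tolerance)
    failures = sum(1 for ok in audit_passed if not ok)

    return {
        "accuracy": correct / total,
        "mae": sum(errors) / len(errors) if errors else None,
        "audit_failure_rate": failures / total,
    }
```

For example, `evaluate([450.0, 500.0, None], [450.0, 510.0, 300.0], [True, True, False])` scores one of three reports as correct, averages the absolute error over the two reports with a prediction, and counts one audit failure.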

Development

  • Run tests: pytest
  • Linting: make lint
  • Formatting: make format
