repo_parse

A Python package for extracting breast tissue removal data (weight/units) from surgical reports using a multi-stage pipeline.

Pipeline Architecture

1. Regex & Sectionizer: Splits the report into clinical sections and identifies candidate weight mentions along with their immediate context.
2. LLM Disambiguation: Uses an LLM (via Hugging Face) and a Jinja2-rendered prompt to select the correct weights for the left and right breasts from the candidates.
3. Auditor Pass: Programmatically verifies that each numerical value the LLM reports actually appears in the evidence string it cites.
4. Evaluation: Measures accuracy, Mean Absolute Error (MAE), and audit failure rates against ground truth data.
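
The first and third stages above can be illustrated with a minimal sketch. This is not the package's actual implementation: the names `WEIGHT_RE`, `Candidate`, `find_candidates`, and `audit`, the 30-character context window, and the set of recognised units are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a number followed by a gram or kilogram unit.
WEIGHT_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>grams?|kg|g)\b", re.IGNORECASE
)


@dataclass
class Candidate:
    value: float
    unit: str
    context: str  # text surrounding the mention


def find_candidates(text, window=30):
    """Return every weight mention with `window` characters of context on each side."""
    candidates = []
    for m in WEIGHT_RE.finditer(text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        candidates.append(
            Candidate(float(m.group("value")), m.group("unit").lower(), text[start:end])
        )
    return candidates


def audit(value, evidence):
    """Return True if the claimed value appears as a number in the evidence string."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", evidence)]
    return any(abs(n - value) < 1e-9 for n in numbers)


report = "Right breast: 450 g of tissue removed. Left breast: 512.5 grams."
candidates = find_candidates(report)
```

Comparing parsed numbers rather than raw substrings means that a claimed value of 45 does not pass the audit merely because the evidence contains "450".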

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

CLI Usage

The package can be run directly from the command line:

Process a CSV file containing reports

python -m repo_parse csv=reports.csv column="Surgical Note"

Process a CSV and save results

python -m repo_parse csv=reports.csv output=results.csv

Run evaluation against ground truth

python -m repo_parse eval=true samples=data/samples gt=data/ground_truth.csv
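
For reference, the three metrics reported by the evaluation step could be computed roughly as follows. This is a sketch under stated assumptions, not the package's code: the `evaluate` function, its `tolerance` parameter, and the choice to count a missing prediction as incorrect while excluding it from the MAE are all hypothetical.

```python
def evaluate(predictions, ground_truth, audit_passed, tolerance=0.0):
    """Compute accuracy, MAE, and audit failure rate.

    predictions:  list of float or None (None = no weight extracted)
    ground_truth: list of float, same length as predictions
    audit_passed: list of bool, one per prediction
    """
    if not (len(predictions) == len(ground_truth) == len(audit_passed)):
        raise ValueError("all inputs must have the same length")
    if not predictions:
        raise ValueError("inputs must not be empty")

    # Absolute errors only for rows where a prediction exists.
    errors = [abs(p - t) for p, t in zip(predictions, ground_truth) if p is not None]

    # Assumption: a missing prediction counts as incorrect.
    correct = sum(1 for e in errors if e <= tolerance)

    return {
        "accuracy": correct / len(predictions),
        "mae": sum(errors) / len(errors) if errors else None,
        "audit_failure_rate": audit_passed.count(False) / len(audit_passed),
    }


metrics = evaluate(
    predictions=[450.0, 500.0, None],
    ground_truth=[450.0, 510.0, 300.0],
    audit_passed=[True, False, True],
)
```

In this example one of three predictions is exact, the two available predictions are off by 0 and 10, and one of three audits failed.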

Development

Run tests: pytest
Linting: make lint
Formatting: make format
