Analysis of racial disparities in Emergency Severity Index (ESI) triage decisions using propensity score matching and high-risk symptom detection.
This repository analyzes emergency department triage data and computationally implements the ESI algorithm to identify potential racial disparities in ESI triage assignments. The analysis combines:
- High-risk symptom detection from patient complaints
- Danger zone vital signs identification
- Propensity score matching to control for confounding variables
- Statistical analysis of odds ratios across racial groups
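As a rough illustration of the last two steps, the sketch below pairs units by propensity score and then computes an odds ratio with a Wald confidence interval. It is not the code in `src/propensity_score_matching.py`: the function names `greedy_match` and `odds_ratio_ci`, the caliper value, and the greedy 1:1 matching strategy are all assumptions chosen for brevity, and the propensity scores are taken as given rather than estimated.

```python
import math


def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping a unit id to its propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores
    differ by at most `caliper` (a hypothetical default).
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control unit by absolute score difference.
        c_id = min(available, key=lambda k: abs(available[k] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with outcome     b: group 1 without outcome
    c: group 2 with outcome     d: group 2 without outcome
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
    control = {"c1": 0.31, "c2": 0.60, "c3": 0.10, "c4": 0.33}
    print(greedy_match(treated, control))
    print(odds_ratio_ci(20, 80, 10, 90))
```

Greedy matching is order-dependent and can discard units that an optimal matcher would keep; it is used here only because it fits in a few lines.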
ESI/
├── binarization_code/ # Data binarization scripts per center
├── src/ # Core analysis modules
│ ├── high_risk_dictionary.py # High-risk symptom detection functions
│ ├── vital_signs.py # Danger zone vital signs analysis
│ └── propensity_score_matching.py # PSM analysis, odds ratio calculations, and plotting
├── center_configs.json # Hospital-specific configuration variables
├── notebooks/ # Jupyter notebooks for analysis and visualization
├── main.py # Main analysis pipeline
├── plot.py # Forest plot generation
├── requirements.txt # Python dependencies
└── README.md
# Clone the repository
git clone https://github.com/cavalab/ESI.git
cd ESI
# Optional: create an environment with Python 3.11 using conda or mamba
mamba env create
# Alternatively, create a virtual environment with Python 3.11
python3.11 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
tl;dr, after following the preprocessed data extraction setup:
bash binarize.sh # binarize covariates from preprocessed data
bash run_analysis.sh results/ # run analysis, saving to results/
If you need to generate these files from raw data, follow this two-step process:
Step 1: Raw Data → Preprocessed Data
Use the scripts in ed-preprocessing/ for center-specific preprocessing.
These scripts produce preprocessed datasets, e.g.:
- preprocessed_BIDMC.csv: Derived from the publicly available MIMIC-IV-ED dataset from Beth Israel Deaconess (Adult East)
- preprocessed_Stanford.csv: Derived from the publicly available MC-MED dataset from Stanford Hospital (Adult West)
Step 2: Preprocessed Data → Binarized Covariates
Run the appropriate binarization script for each center:
python binarization-CHLA.py
python binarization-BIDMC.py
python binarization-Stanford.py
python binarization-BCH.py
This creates data/preprocessed_{center}.csv files which are used in the main analysis.
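To make the idea of binarized covariates concrete, here is a minimal sketch that expands one categorical column into 0/1 indicator columns sharing a prefix (for example `race_`). The helper name `binarize` and the sample rows are hypothetical; the center-specific scripts may bin continuous variables such as age or handle missing values differently.

```python
def binarize(rows, column, prefix=None):
    """Replace a categorical column with one 0/1 indicator column per level.

    rows: list of dicts (one per patient visit).
    column: name of the categorical column to expand.
    prefix: prefix for the new columns; defaults to "<column>_".
    """
    prefix = prefix if prefix is not None else f"{column}_"
    levels = sorted({row[column] for row in rows})
    binarized = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{prefix}{level}"] = int(row[column] == level)
        binarized.append(new_row)
    return binarized


if __name__ == "__main__":
    visits = [
        {"id": 1, "race": "White"},
        {"id": 2, "race": "Black"},
    ]
    for row in binarize(visits, "race"):
        print(row)
```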
See run_analysis.sh, which computes the odds ratio (OR) comparisons as follows:
python main.py \
--path_base ${data_base_directory} \
--mode flagged_vs_unflagged \
--center ${center} \
--save_dir ${results_directory}
notebooks/forest_plot.ipynb visualizes these results.
The analysis supports four hospital centers, two of which (BIDMC and Stanford) have publicly available datasets:
- CHLA: Children's Hospital Los Angeles
- BIDMC: Beth Israel Deaconess Medical Center
- Stanford: Stanford Hospital
- BCH: Boston Children's Hospital
The --mode argument accepts two values:
- flagged_vs_unflagged: Compares HB level 2, HB level 3, and HB level 2+3
- all_combinations: Compares HB level 2, HB2: danger zone vitals, HB2: high risk symptoms, HB level 3
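The flags, centers, and modes listed above could be validated with a parser along these lines. This is only a sketch: the flag names come from the command shown earlier, but `build_parser`, the help strings, and the choice to restrict values with `choices` are assumptions, and the parser in `main.py` may differ.

```python
import argparse

CENTERS = ["CHLA", "BIDMC", "Stanford", "BCH"]
MODES = ["flagged_vs_unflagged", "all_combinations"]


def build_parser():
    """Build a parser for the four flags used by run_analysis.sh."""
    parser = argparse.ArgumentParser(
        description="Sketch of a command-line interface for the ESI analysis."
    )
    parser.add_argument("--path_base", required=True,
                        help="Base directory containing the input data")
    parser.add_argument("--mode", required=True, choices=MODES,
                        help="Which set of comparisons to run")
    parser.add_argument("--center", required=True, choices=CENTERS,
                        help="Hospital center to analyze")
    parser.add_argument("--save_dir", required=True,
                        help="Directory where results are written")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.center, args.mode, args.path_base, args.save_dir)
```

Restricting `--center` and `--mode` with `choices` makes a mistyped value fail immediately with a usage message instead of partway through the pipeline.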
Hospital-specific variables are defined in center_configs.json:
{
"CHLA": {
"triage_col": "esi_acuity",
"complaint_col": "chief_complaint",
"race_predictor": "race_",
"race_names": ["White", "Black", "Hispanic", "Asian"],
"race_order": ["White", "Black", "Hispanic", "Asian"],
"covariate_prefixes": ["age", "gender", "insurance"]
}
}
Output is written to {save_dir}/{center}/{mode}/:
- complaint_with_mask_and_vitals_{center}.csv: Acuity data with high-risk flags
- odds_ratios.csv: Odds ratios and confidence intervals
- significance.csv: Statistical significance results
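The sketch below shows one way the per-center configuration and the output layout could be used together: it loads a center's entry from `center_configs.json`, picks covariate columns by prefix, and builds the `{save_dir}/{center}/{mode}/` path. The helper names and the example column names are hypothetical and are not taken from `main.py`.

```python
import json
from pathlib import Path


def load_center_config(config_path, center):
    """Return the configuration dictionary for one center."""
    with open(config_path) as f:
        configs = json.load(f)
    if center not in configs:
        raise KeyError(f"No configuration found for center {center!r}")
    return configs[center]


def select_covariates(columns, prefixes):
    """Keep the columns whose names start with any covariate prefix."""
    return [c for c in columns if any(c.startswith(p) for p in prefixes)]


def output_dir(save_dir, center, mode):
    """Build the results directory {save_dir}/{center}/{mode}."""
    return Path(save_dir) / center / mode


if __name__ == "__main__":
    config = load_center_config("center_configs.json", "CHLA")
    # Hypothetical column names, for illustration only.
    columns = ["age_0_5", "gender_F", "insurance_public", "esi_acuity"]
    print(select_covariates(columns, config["covariate_prefixes"]))
    print(output_dir("results", "CHLA", "flagged_vs_unflagged"))
```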
This code is used to produce the analysis in the following preprint:
Romero Mila, B., Coggan, H., Fine, A.M., Barak-Corren, Y., Reis, B.Y., Aysola, J., Chaudhari, P., La Cava, W.G., 2026. The Benefit of the Doubt Phenomenon in Emergency Triage Assignment Disparities. https://doi.org/10.64898/2026.02.12.26346184
This work was partially supported by NLM R01LM014300.
Contact: @lacava