0717CCC/NYCU-DSP

Early Prediction of Sepsis

Competition: PhysioNet Challenge 2019
Research Aim: Develop a model to predict sepsis at an early stage, before clinical diagnosis.

Challenges

  • Limited Raw Features: Only 40 features are available in the dataset.
  • High Missing Rates: Most features have missing rates above 90%.
  • Variable-Length Inputs: The number of hourly records available differs from patient to patient and grows with each hour of stay.
  • Label Imbalance: Only 7.26% of patients develop sepsis, and only 1.8% of all hourly records are labeled positive.
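The missing rates and the two imbalance figures above (per patient and per record) are different quantities, and it can help to see how each would be computed. The following is a minimal sketch, assuming records are stored as dictionaries with `None` for missing values and labels as per-patient lists of hourly 0/1 values; the function names and data layout are illustrative and not taken from this repository.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # hypothetical layout: feature name -> value or None


def missing_rates(records: List[Record], features: List[str]) -> Dict[str, float]:
    """Fraction of records in which each feature is missing (None or absent)."""
    n = len(records)
    return {f: sum(r.get(f) is None for r in records) / n for f in features}


def prevalence(labels_by_patient: Dict[str, List[int]]) -> Tuple[float, float]:
    """Return (patient-level, record-level) positive rates.

    `labels_by_patient` maps a patient id to that patient's hourly 0/1 labels.
    """
    n_patients = len(labels_by_patient)
    n_records = sum(len(v) for v in labels_by_patient.values())
    pos_patients = sum(any(v) for v in labels_by_patient.values())
    pos_records = sum(sum(v) for v in labels_by_patient.values())
    return pos_patients / n_patients, pos_records / n_records
```

The record-level rate is lower than the patient-level rate because a patient who develops sepsis still contributes many negative hours before the positive ones.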

Contributions

  • Address dataset imbalance and variable-length time series issues
  • Improve clinical outcome prediction
  • Enhance model interpretability

Methodology

Pipeline

Input → Feature Engineering → Missing Value Imputation (LOCF) → Sampling (Down/Up) → Modeling (XGBoost)
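As an illustration of the imputation step, the sketch below forward-fills one patient's hourly series, that is, it carries the last observed value forward. This assumes the pipeline's imputation step refers to last-observation-carried-forward; the repository's actual implementation is not shown here, and `forward_fill` is a hypothetical helper.

```python
from typing import List, Optional


def forward_fill(series: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward within one patient's series.

    Hours before the first observation stay None, since there is
    nothing earlier to carry forward.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled
```

Filling must be done per patient so that values never leak from one stay into another. Leading gaps remain missing and are left for the downstream model to handle.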

Feature Engineering

  • Feature Selection: Remove non-informative features (Bilirubin_direct, TroponinI, Fibrinogen)
  • Missingness Indicators: Add features reflecting missing data patterns
  • Sliding-Window Statistics: Compute mean, variance, and other time-windowed statistics
  • Empirical Scores: Incorporate clinical scores such as SOFA and ∆SOFA
  • Textual Representation: Serialize each record as "[column name] [value]" text and embed it with BioClinicalBERT
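Three of the steps above (missingness indicators, sliding-window statistics, and the text serialization that precedes embedding) can be sketched in plain Python. The window length of 6 hours, the choice of population variance, and all function names are assumptions made for illustration. The embedding call itself is omitted, so `to_text` only produces the string that would be passed to a language model.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Optional, Tuple

Series = List[Optional[float]]


def missing_indicator(series: Series) -> List[int]:
    """1 where the value is missing, 0 where it was observed."""
    return [1 if v is None else 0 for v in series]


def rolling_stats(series: Series, window: int = 6) -> List[Tuple[Optional[float], Optional[float]]]:
    """Mean and population variance over the last `window` hours.

    Only the current and earlier hours are used, so no future
    information leaks into the feature. Missing values are skipped.
    """
    out = []
    for t in range(len(series)):
        recent = [v for v in series[max(0, t - window + 1): t + 1] if v is not None]
        if recent:
            out.append((fmean(recent), pvariance(recent)))
        else:
            out.append((None, None))
    return out


def to_text(record: Dict[str, Optional[float]]) -> str:
    """Serialize observed fields as '[column name] [value]' pairs."""
    return " ".join(f"{name} {value}" for name, value in record.items() if value is not None)
```

For example, `to_text({"HR": 80.0, "Lactate": None, "Temp": 37.2})` yields `"HR 80.0 Temp 37.2"`. Skipping missing fields is one possible choice; another would be to emit an explicit token for them.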

Evaluation

Evaluation Metric

Evaluation Strategy

Addressing Key Challenges

  • High Missing Rate: Missing-value imputation combined with XGBoost, which can handle remaining missing values natively
  • Limited Features & Variable-Length Inputs: Feature engineering and BioClinicalBERT representations
  • Imbalanced Dataset: Explore up-sampling, down-sampling, and weighted loss approaches
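The three imbalance strategies listed above can be sketched as index-level operations on a label vector. This is a minimal illustration under assumed settings (random sampling, a 1:1 target ratio, a fixed seed), not the repository's code. The weight returned by `positive_class_weight` is the negative-to-positive ratio, a value commonly passed to XGBoost's `scale_pos_weight` parameter.

```python
import random
from typing import List


def downsample_negatives(labels: List[int], ratio: float = 1.0, seed: int = 0) -> List[int]:
    """Keep all positives and at most `ratio * n_pos` randomly chosen negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, k))


def upsample_positives(labels: List[int], seed: int = 0) -> List[int]:
    """Repeat randomly chosen positives until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        raise ValueError("no positive samples to up-sample")
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    return sorted(pos + neg + extra)


def positive_class_weight(labels: List[int]) -> float:
    """Negative-to-positive ratio, usable as a weight on the positive class."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("no positive samples")
    return (len(labels) - n_pos) / n_pos
```

Resampling should be applied to the training split only, so that validation and test sets keep the original class distribution.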

Results

Model Performance Comparison

Effectiveness of Class Imbalance Strategies

Impact of Different Feature Engineering Techniques

Contribution of Text-Based Features

Effect of Adding Text Features to a Deep Learning Model (MLP)

Comparing Deep Learning Models (FT-Transformer) vs. XGBoost

Feature Importance Analysis

About

Data Science Project (DSP): Develop a model to predict sepsis at an early stage, before clinical diagnosis (Early Prediction of Sepsis).
