
🧠 ELMHA-AI: Terminological and Caregiver Assist System for Alzheimer’s Detection

Repository: https://github.com/iUtsa/ELMHA-AI/tree/main
Author: Arnab Das Utsa • Stockton University
License: MIT


📘 Overview

ELMHA-AI (Early Linguistic Markers of Human Alzheimer’s – AI) is a dual-purpose system that integrates a terminological NLP pipeline for linguistic Alzheimer’s detection with a caregiver-assistive website for interpretability and education.

The repository contains two core modules:

  1. 🧩 NLP Pipeline – Extracts and analyzes linguistic, semantic, and terminological features to identify Alzheimer’s-related patterns in text or transcripts.
  2. 🌐 Caregiver Website – A browser-based portal offering explainable AI outputs, terminology definitions, and interactive dashboards for caregivers.

🧠 NLP Pipeline

1️⃣ Preprocessing and Linguistic Normalization

  • Tokenization & Sentence Splitting: Divide transcripts into analyzable lexical units.
  • Lemmatization / Stemming: Reduce words to canonical forms for consistent term mapping.
  • Stopword & Noise Removal: Filter disfluencies (“uh,” “um”), fillers, and transcription noise.
  • Part-of-Speech Tagging & Dependency Parsing: Identify syntactic relations to measure complexity and coherence.
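
The first and third steps above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `preprocess.py`: the function names, the regex-based splitting, and the filler list (limited to the two examples the README names) are assumptions, and lemmatization and parsing are left to a library such as spaCy.

```python
import re

# Assumed filler list: the README names only "uh" and "um" as examples.
FILLERS = {"uh", "um"}


def split_sentences(text: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Lowercase, then keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def preprocess(text: str) -> list[list[str]]:
    """Return one token list per sentence, with fillers removed."""
    cleaned = []
    for sentence in split_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in FILLERS]
        if tokens:
            cleaned.append(tokens)
    return cleaned
```

For example, `preprocess("Um, I went to the, uh, store. I forgot why!")` yields two token lists, one per sentence, with "um" and "uh" dropped.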

2️⃣ Terminological Mapping and Ontology Alignment

  • Lexicon Lookup: Map tokens to entries in terminological databases such as UMLS, SNOMED CT, or the custom Alzheimer’s Cognitive Term Dictionary (ACTD).
  • Semantic Disambiguation: Contextual embedding similarity (SentenceTransformers MiniLM) selects the most relevant clinical sense.
  • Concept Hierarchy: Group terms by cognitive domains (memory, fluency, comprehension).
  • Term Frequency & Context Windows: Quantify Alzheimer’s-related word usage patterns and semantic neighborhoods.
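
A rough sketch of lexicon lookup, domain grouping, and context windows follows. The four-entry lexicon is a made-up stand-in for the ACTD (its terms and domain labels are illustrative only), the function names are hypothetical, and the embedding-based disambiguation step is not shown.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping a term to a cognitive domain.
# The real ACTD entries are not reproduced here.
LEXICON = {
    "forgot": "memory",
    "remember": "memory",
    "thing": "fluency",
    "understand": "comprehension",
}


def match_terms(tokens, lexicon=LEXICON, window=2):
    """Return (term, domain, context) for every token found in the lexicon.

    The context is the slice of up to `window` tokens on either side.
    """
    matches = []
    for i, token in enumerate(tokens):
        domain = lexicon.get(token)
        if domain is not None:
            start = max(0, i - window)
            context = tokens[start:i + window + 1]
            matches.append((token, domain, context))
    return matches


def domain_frequencies(matches):
    """Count matched terms per cognitive domain."""
    return Counter(domain for _, domain, _ in matches)
```

Exact-match lookup like this would miss inflected forms, which is one reason the lemmatization step precedes term mapping.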

3️⃣ Feature Extraction and Semantic Drift

  • Lexical Diversity: Measure vocabulary richness and type-token ratios.
  • Syntactic Complexity: Sentence length, clause depth, dependency density.
  • Semantic Shift & Drift: Track how meaning or emotional valence changes over time.
  • Sentiment Analysis: Detect affective flattening or emotional variability.
  • Contextual Embeddings: Generate high-dimensional representations to compare cognitive coherence.
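
Three of these features are simple enough to sketch directly. The code below assumes tokenized input and precomputed embedding vectors (for instance, one per session or transcript). Defining drift as one minus the cosine similarity of consecutive embeddings is an assumption made for illustration, not a definition taken from the repository.

```python
import math


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_sentence_length(sentences):
    """Average number of tokens per sentence (0.0 for empty input)."""
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_drift(embeddings):
    """Assumed drift measure: 1 - cosine similarity of consecutive vectors."""
    return [
        1.0 - cosine_similarity(a, b)
        for a, b in zip(embeddings, embeddings[1:])
    ]
```

Type-token ratio is sensitive to text length, so comparisons are more meaningful between samples of similar size.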

4️⃣ Classification and Explainability

  • Modeling: Logistic Regression and LightGBM trained on linguistic + terminological features.
  • Evaluation: Stratified k-fold cross-validation, with reported F1 ≈ 0.86 and accuracy ≈ 0.89.
  • Explainability: SHAP and LIME highlight decisive terms, syntactic shifts, and lexical markers.
  • Confidence Calibration: Outputs confidence intervals and human-readable justifications.
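
The evaluation protocol can be sketched with scikit-learn as follows. The feature matrix here is synthetic (`make_classification`), standing in for the linguistic and terminological features, so the printed scores say nothing about the project's reported results. The fold count, the scaling step, and the random seeds are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)

# Scale features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class ratio roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "f1"))

mean_accuracy = float(np.mean(scores["test_accuracy"]))
mean_f1 = float(np.mean(scores["test_f1"]))
print(f"accuracy={mean_accuracy:.2f} f1={mean_f1:.2f}")
```

Wrapping the scaler and classifier in one pipeline means the scaler is refit on each training fold, which avoids leaking test-fold statistics into training.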

🧩 Pipeline Structure

/nlp_pipeline
├── preprocess.py
├── term_matcher.py
├── feature_extractor.py
├── classify_model.py
└── explain_layer.py



🌐 Caregiver Website

Purpose

The website translates the pipeline's linguistic output into accessible, interpretable insights, so that caregivers can follow cognitive-linguistic changes without medical jargon.

Backend (Flask / FastAPI)

  • Exposes endpoints /analyze, /explain, /dictionary.
  • Runs the NLP pipeline on submitted text or transcripts.
  • Returns structured JSON with predictions, top features, and terminological matches.
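
A minimal Flask sketch of the `/analyze` endpoint is shown below. The request field `text`, the response keys, and the `run_pipeline` stub are all assumptions made for illustration; the actual schema in `api_routes.py` may differ, and the stub returns placeholders rather than real predictions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_pipeline(text):
    """Hypothetical stand-in for the NLP pipeline.

    Returns fixed placeholder values, not real model output.
    """
    return {
        "prediction": {"label": "placeholder", "confidence": None},
        "top_features": [],
        "term_matches": [],
        "n_characters": len(text),
    }


@app.route("/analyze", methods=["POST"])
def analyze():
    # Tolerate missing or malformed JSON bodies instead of raising.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    text = payload.get("text", "")
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "Field 'text' must be a non-empty string."}), 400
    return jsonify(run_pipeline(text))
```

The endpoint can be exercised without starting a server through `app.test_client().post("/analyze", json={"text": "..."})`.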

Frontend (React / Bootstrap)

  • Dashboard: Visualizes language trends (syntax, sentiment, vocabulary).
  • Term Dictionary: Lookup tool linking clinical terms to lay explanations and resources.
  • Explainable Panel: Displays feature importances and contextual sentences.
  • Resource Center: Curated caregiver education materials.

Folder Layout

/webapp
├── backend/
│   ├── app.py
│   ├── api_routes.py
│   └── utils/
├── frontend/
│   ├── components/
│   ├── pages/
│   └── static/
└── templates/



⚙️ Tech Stack

| Layer | Tools / Libraries | Description |
|---|---|---|
| NLP | spaCy 3.7, NLTK 3.8.1, SentenceTransformers MiniLM | Linguistic & semantic processing |
| ML / Explainability | scikit-learn 1.3.2, LightGBM 4.1.0, SHAP 0.43.0, LIME 0.2.0.1 | Modeling & interpretability |
| Web | Flask + React | Interactive caregiver dashboard |
| Storage | JSON / CSV Terminology Dictionary | Domain lexicon |
| Visualization | Matplotlib / Recharts | Trend and feature graphs |

📊 Evaluation Snapshot

| Metric | Value | Description |
|---|---|---|
| Accuracy | 0.89 | Balanced classification |
| F1-Score | 0.86 | Weighted linguistic performance |
| Trust Index | 0.91 | Caregiver-rated understandability |

🧩 Ethical Use

  • Privacy: All datasets de-identified (compliant with 45 CFR 46).
  • Transparency: Outputs include full feature explanations.
  • Disclaimer: Not a medical diagnostic device; intended for research and caregiver education.

📂 Repository Layout

/ELMHA-AI
├── /data/          # Processed text & sample inputs
├── /dictionary/    # Alzheimer’s term lexicon
├── /nlp_pipeline/  # Core NLP scripts
├── /models/        # Saved checkpoints
├── /webapp/        # Website (backend + frontend)
├── /docs/          # Diagrams & papers
└── README.md



🔖 Citation

@software{DasUtsa2025_ELMHA_AI,
  author       = {Arnab Das Utsa},
  title        = {ELMHA-AI: Terminological and Caregiver Assist System for Alzheimer's Detection},
  year         = {2025},
  institution  = {Stockton University},
  url          = {https://github.com/iUtsa/ELMHA-AI}
}
