🧠 ELMHA-AI: Terminological and Caregiver Assist System for Alzheimer’s Detection

Repository: https://github.com/iUtsa/ELMHA-AI/tree/main
Author: Arnab Das Utsa • Stockton University
License: MIT

📘 Overview

ELMHA-AI (Early Linguistic Markers of Human Alzheimer’s – AI) is a dual-purpose system that integrates a terminological NLP pipeline for linguistic Alzheimer’s detection with a caregiver-assistive website for interpretability and education.

The repository contains two core modules:

🧩 NLP Pipeline – Extracts and analyzes linguistic, semantic, and terminological features to identify Alzheimer’s-related patterns in text or transcripts.
🌐 Caregiver Website – A browser-based portal offering explainable AI outputs, terminology definitions, and interactive dashboards for caregivers.

🧠 NLP Pipeline

1️⃣ Preprocessing and Linguistic Normalization

Tokenization & Sentence Splitting: Divide transcripts into analyzable lexical units.
Lemmatization / Stemming: Reduce words to canonical forms for consistent term mapping.
Stopword & Noise Removal: Filter disfluencies (“uh,” “um”), fillers, and transcription noise.
Part-of-Speech Tagging & Dependency Parsing: Identify syntactic relations to measure complexity and coherence.
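The normalization steps above can be sketched without any external models. This is a minimal, stdlib-only sketch and not the repository's preprocess.py. The regex tokenizer, the hypothetical FILLERS set, and the crude suffix-stripping `naive_stem` are assumptions standing in for spaCy/NLTK tokenization and lemmatization. POS tagging and dependency parsing are omitted.

```python
import re

# Hypothetical filler/disfluency inventory; the real pipeline's list may differ.
FILLERS = {"uh", "um", "er", "erm", "hmm"}

# Naive sentence boundary: split on whitespace that follows ., ! or ?.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")
# Lower-case words, allowing one internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def naive_stem(token: str) -> str:
    """Strip a few English suffixes; a crude stand-in for lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(transcript: str) -> list[list[str]]:
    """Return one list of normalized tokens per sentence, fillers removed."""
    sentences = []
    for raw in SENTENCE_RE.split(transcript.strip()):
        tokens = [t for t in TOKEN_RE.findall(raw.lower()) if t not in FILLERS]
        if tokens:  # drop sentences that were only fillers
            sentences.append([naive_stem(t) for t in tokens])
    return sentences


if __name__ == "__main__":
    print(preprocess("Um, the boy is climbing. Uh, the cookies are falling!"))
```

For the sample input this yields `[['the', 'boy', 'is', 'climb'], ['the', 'cookie', 'are', 'fall']]`. A real lemmatizer would also map "are" to "be", which this suffix rule cannot do.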

2️⃣ Terminological Mapping and Ontology Alignment

Lexicon Lookup: Map tokens to entries in terminological databases such as UMLS, SNOMED CT, or the custom Alzheimer’s Cognitive Term Dictionary (ACTD).
Semantic Disambiguation: Contextual embedding similarity (SentenceTransformers MiniLM) selects the most relevant clinical sense.
Concept Hierarchy: Group terms by cognitive domains (memory, fluency, comprehension).
Term Frequency & Context Windows: Quantify Alzheimer’s-related word usage patterns and semantic neighborhoods.
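Lexicon lookup, domain grouping, term frequency, and context windows can be illustrated with a toy in-memory lexicon. The LEXICON entries and domain labels below are hypothetical placeholders and do not come from UMLS, SNOMED CT, or the ACTD. The embedding-based disambiguation step is omitted.

```python
from collections import Counter, defaultdict

# Hypothetical ACTD-style entries: surface term -> cognitive domain.
LEXICON = {
    "forget": "memory",
    "remember": "memory",
    "thing": "fluency",  # vague, low-content word
    "stuff": "fluency",
    "understand": "comprehension",
}


def match_terms(tokens, lexicon=LEXICON, window=2):
    """Return (term frequencies, matched terms grouped by domain, context windows)."""
    freq = Counter()
    by_domain = defaultdict(list)
    contexts = []
    for i, tok in enumerate(tokens):
        domain = lexicon.get(tok)
        if domain is None:
            continue
        freq[tok] += 1
        by_domain[domain].append(tok)
        # Up to `window` tokens on each side of the matched term.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        contexts.append((tok, left, right))
    return freq, dict(by_domain), contexts


if __name__ == "__main__":
    freq, by_domain, contexts = match_terms(
        ["i", "forget", "the", "thing", "you", "know", "the", "thing"])
    print(freq, by_domain, contexts, sep="\n")
```

Multi-word terms and sense selection would need phrase matching and the contextual-embedding comparison described above; this sketch only handles single-token, unambiguous entries.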

3️⃣ Feature Extraction and Semantic Drift

Lexical Diversity: Measure vocabulary richness and type-token ratios.
Syntactic Complexity: Sentence length, clause depth, dependency density.
Semantic Shift & Drift: Track how meaning or emotional valence changes over time.
Sentiment Analysis: Detect affective flattening or emotional variability.
Contextual Embeddings: Generate high-dimensional representations to compare cognitive coherence.
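Several of the lexical and syntactic features above reduce to simple counts over tokenized sentences. The sketch below computes type-token ratio (TTR), moving-average TTR (MATTR, which averages TTR over sliding windows and so depends less on transcript length), mean sentence length, and a bag-of-words cosine similarity between adjacent sentences as a crude coherence proxy. The function names and the default window of 50 are assumptions. The cosine step is a simple stand-in for the MiniLM embeddings the pipeline uses, and clause depth and sentiment are omitted.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct words / total words (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mattr(tokens, window=50):
    """Moving-average TTR over all windows of `window` tokens; plain TTR if shorter."""
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [type_token_ratio(tokens[i:i + window])
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def extract_features(sentences):
    """sentences: list of token lists, e.g. the output of a preprocessing step."""
    tokens = [t for s in sentences for t in s]
    bows = [Counter(s) for s in sentences]
    adjacent = [cosine(a, b) for a, b in zip(bows, bows[1:])]
    return {
        "ttr": type_token_ratio(tokens),
        "mattr": mattr(tokens),
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "adjacent_coherence": sum(adjacent) / len(adjacent) if adjacent else 0.0,
    }


if __name__ == "__main__":
    print(extract_features([["the", "boy", "is", "climb"],
                            ["the", "cookie", "are", "fall"]]))
```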

4️⃣ Classification and Explainability

Modeling: Logistic Regression and LightGBM trained on linguistic + terminological features.
Evaluation: Stratified k-fold CV with F1 ≈ 0.86 and Accuracy ≈ 0.89.
Explainability: SHAP and LIME highlight decisive terms, syntactic shifts, and lexical markers.
Confidence Calibration: Outputs confidence intervals and human-readable justifications.
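Stratified k-fold CV keeps each test fold's class balance close to that of the full dataset, which matters when Alzheimer's and control transcripts are unevenly represented. The stdlib-only sketch below forms such folds and computes accuracy and positive-class F1. In the listed stack this is presumably handled by scikit-learn (`StratifiedKFold`, `f1_score`). The majority-class "model" is a placeholder, not the project's Logistic Regression or LightGBM.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin across the k folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive class (0.0 if no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 20  # toy labels: 1 = Alzheimer's, 0 = control
    for train, test in stratified_k_fold(labels, k=5):
        # Placeholder "model": always predict the training fold's majority class.
        train_labels = [labels[i] for i in train]
        majority = max(set(train_labels), key=train_labels.count)
        y_true = [labels[i] for i in test]
        y_pred = [majority] * len(test)
        print(len(test), sum(y_true), accuracy(y_true, y_pred), f1_binary(y_true, y_pred))
```

On the toy labels every test fold holds 2 positives and 4 controls, and the majority baseline scores about 0.67 accuracy but 0.0 F1. This is why F1 is worth reporting alongside accuracy on imbalanced data.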

🧩 Pipeline Structure

```
/nlp_pipeline
├── preprocess.py
├── term_matcher.py
├── feature_extractor.py
├── classify_model.py
└── explain_layer.py
```

🌐 Caregiver Website

Purpose

Translates complex linguistic AI output into accessible, interpretable insights, helping caregivers understand cognitive-linguistic changes without medical jargon.

Backend (Flask / FastAPI)

Exposes endpoints /analyze, /explain, /dictionary.
Runs the NLP pipeline on submitted text or transcripts.
Returns structured JSON with predictions, top features, and terminological matches.
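The request/response contract for /analyze can be illustrated without committing to Flask or FastAPI. The sketch below is a stdlib-only WSGI app that accepts `{"text": ...}` via POST and returns JSON with a prediction, top features, and terminological matches. The field names, the TERMS mapping, and the toy density-based score are hypothetical placeholders for the real pipeline calls. /explain and /dictionary are omitted.

```python
import json

# Hypothetical stand-in for the term matcher; not the project's dictionary.
TERMS = {"forget": "memory", "thing": "fluency"}
JSON_HEADERS = [("Content-Type", "application/json")]


def analyze(text):
    """Placeholder for the NLP pipeline; returns a JSON-serializable dict."""
    tokens = text.lower().split()
    matches = [{"term": t, "domain": TERMS[t]} for t in tokens if t in TERMS]
    density = len(matches) / max(len(tokens), 1)
    score = min(1.0, 5 * density)  # toy score, NOT a trained classifier
    return {
        "prediction": "flagged" if score >= 0.5 else "not_flagged",
        "confidence": round(score, 2),
        "top_features": [{"name": "term_density", "value": round(density, 3)}],
        "term_matches": matches,
    }


def _respond(start_response, status, obj, extra_headers=()):
    body = json.dumps(obj).encode("utf-8")
    headers = JSON_HEADERS + [("Content-Length", str(len(body)))] + list(extra_headers)
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """Minimal WSGI router: POST /analyze with a JSON body {"text": "..."}."""
    if environ.get("PATH_INFO") != "/analyze":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    if environ.get("REQUEST_METHOD") != "POST":
        return _respond(start_response, "405 Method Not Allowed",
                        {"error": "use POST"}, [("Allow", "POST")])
    size = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
    except ValueError:  # malformed JSON or undecodable bytes
        return _respond(start_response, "400 Bad Request", {"error": "invalid JSON"})
    text = payload.get("text", "") if isinstance(payload, dict) else ""
    return _respond(start_response, "200 OK", analyze(text))
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it. Flask applications are themselves WSGI callables, so the same shape maps onto a Flask route such as `@app.post("/analyze")` (Flask 2.0+).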

Frontend (React / Bootstrap)

Dashboard: Visualizes language trends (syntax, sentiment, vocabulary).
Term Dictionary: Lookup tool linking clinical terms to lay explanations and resources.
Explainable Panel: Displays feature importances and contextual sentences.
Resource Center: Curated caregiver education materials.
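For a linear model such as the Logistic Regression used in the pipeline, the feature importances shown in the Explainable Panel have a closed form. Assuming independent features, the SHAP value of feature i in log-odds space is w_i(x_i − E[x_i]), and these values sum to f(x) − f(E[x]). The sketch below computes and ranks those contributions in plain Python. The weights, feature names, and background means are made-up illustrative values, not the trained model's. LightGBM and LIME explanations would need their respective libraries.

```python
import math

# Hypothetical weights (log-odds scale) and intercept; NOT the trained model's.
WEIGHTS = {"ttr": -3.0, "mean_sentence_length": -0.2, "term_density": 4.0}
INTERCEPT = 1.5
# Hypothetical background means E[x_i], e.g. averaged over training data.
BACKGROUND = {"ttr": 0.55, "mean_sentence_length": 12.0, "term_density": 0.05}


def log_odds(x):
    """Linear predictor f(x) = b + sum_i w_i * x_i."""
    return INTERCEPT + sum(w * x[name] for name, w in WEIGHTS.items())


def linear_shap(x):
    """Exact SHAP values for a linear model with independent features:
    phi_i = w_i * (x_i - E[x_i]); they sum to f(x) - f(E[x])."""
    return {name: w * (x[name] - BACKGROUND[name]) for name, w in WEIGHTS.items()}


def explain(x, top_k=2):
    """Predicted probability plus the top_k contributions by absolute size."""
    phi = linear_shap(x)
    ranked = sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True)
    probability = 1.0 / (1.0 + math.exp(-log_odds(x)))
    return probability, ranked[:top_k]


if __name__ == "__main__":
    sample = {"ttr": 0.45, "mean_sentence_length": 10.0, "term_density": 0.30}
    prob, top = explain(sample)
    print(round(prob, 3), top)
```

Positive contributions push the prediction toward the positive class and negative ones away from it. A panel could show the ranked list next to the sentences that produced each feature value.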

Folder Layout

```
/webapp
├── backend/
│   ├── app.py
│   ├── api_routes.py
│   └── utils/
├── frontend/
│   ├── components/
│   ├── pages/
│   └── static/
└── templates/
```

⚙️ Tech Stack

| Layer | Tools / Libraries | Description |
|---|---|---|
| NLP | spaCy 3.7, NLTK 3.8.1, SentenceTransformers MiniLM | Linguistic & semantic processing |
| ML / Explainability | scikit-learn 1.3.2, LightGBM 4.1.0, SHAP 0.43.0, LIME 0.2.0.1 | Modeling & interpretability |
| Web | Flask + React | Interactive caregiver dashboard |
| Storage | JSON / CSV Terminology Dictionary | Domain lexicon |
| Visualization | Matplotlib / Recharts | Trend and feature graphs |
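The stack stores the domain lexicon as a JSON/CSV terminology dictionary. The repository's actual schema is not documented here, so this sketch assumes a CSV with hypothetical columns `term`, `domain`, and `lay_explanation`, and loads it into a case-insensitive lookup of the kind a /dictionary endpoint or the Term Dictionary page could use. The two sample rows are illustrative only.

```python
import csv
import io

# Hypothetical sample rows; the real dictionary's columns and content may differ.
SAMPLE_CSV = """term,domain,lay_explanation
anomia,fluency,Difficulty finding the right word for things.
Semantic memory,memory,Memory for facts and the meanings of words.
"""


def load_dictionary(fileobj):
    """Map lower-cased term -> row dict, skipping rows with an empty term."""
    reader = csv.DictReader(fileobj)
    return {row["term"].strip().lower(): row
            for row in reader if (row.get("term") or "").strip()}


def lookup(dictionary, term):
    """Return the lay explanation for a term, or None if it is not listed."""
    entry = dictionary.get(term.strip().lower())
    return entry["lay_explanation"] if entry else None


if __name__ == "__main__":
    terms = load_dictionary(io.StringIO(SAMPLE_CSV))
    print(lookup(terms, "Anomia"))
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO`; the `newline=""` argument is what the `csv` module documentation recommends.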

📊 Evaluation Snapshot

| Metric | Value | Description |
|---|---|---|
| Accuracy | 0.89 | Balanced classification |
| F1-Score | 0.86 | Weighted linguistic performance |
| Trust Index | 0.91 | Caregiver-rated understandability |

🧩 Ethical Use

Privacy: All datasets are de-identified (compliant with 45 CFR 46).
Transparency: Outputs include full feature explanations.
Disclaimer: Not a medical diagnostic device; intended for research and caregiver education.

📂 Repository Layout

```
/ELMHA-AI
├── /data/            # Processed text & sample inputs
├── /dictionary/      # Alzheimer's term lexicon
├── /nlp_pipeline/    # Core NLP scripts
├── /models/          # Saved checkpoints
├── /webapp/          # Website (backend + frontend)
├── /docs/            # Diagrams & papers
└── README.md
```

🔖 Citation

@software{DasUtsa2025_ELMHA_AI,
  author       = {Arnab Das Utsa},
  title        = {ELMHA-AI: Terminological and Caregiver Assist System for Alzheimer's Detection},
  year         = {2025},
  institution  = {Stockton University},
  url          = {https://github.com/iUtsa/ELMHA-AI}
}
