Valid8 is a microservices-based platform designed to automate the ingestion, cleaning, and validation of healthcare provider data. It leverages Large Language Models (LLMs) and external registries (NPI) to detect discrepancies, normalize messy datasets, and assign confidence scores to provider records.
Unlike traditional synchronous validation tools, Valid8 implements an Asynchronous Job Orchestration pattern, allowing for scalable processing of large datasets with real-time status feedback to the user interface.
The system follows a distributed microservices architecture:
graph LR
User[Frontend UI] -- HTTP/Polling --> Orch[Orchestrator Service]
Orch -- Async Task --> Ingest[Ingestion Service]
Orch -- Async Task --> Validate[Validation Service]
Ingest -- Cleaned Data --> Orch
Validate -- Validation Results --> Orch
Validate -- NPI Registry API --> Ext[External Sources]
- Frontend (Next.js): A responsive React application that manages file uploads, initiates async jobs, polls for status updates, and visualizes validation results.
- Orchestrator Service (FastAPI): The central control plane. It accepts file uploads, spawns background tasks, and manages the state of the processing pipeline (Ingestion → Validation).
- Ingestion Service (FastAPI): Responsible for parsing raw CSVs, normalizing data formats, and using LLMs (Gemini) to clean unstructured text fields.
- Validation Service (FastAPI): Performs authoritative checks against the CMS NPI Registry and uses LLMs to compare provided data vs. registry data, generating confidence scores and discrepancy reports.
- Framework: Next.js 14 (App Router)
- Styling: Tailwind CSS, Shadcn UI
- State Management: React Hooks (Polling architecture for async jobs)
- Visualization: Lucide React Icons, Custom Progress Cards
- Framework: FastAPI (Python)
- Concurrency: Python asyncio, BackgroundTasks
- AI/LLM: Google Gemini 1.5 Flash (via google.generativeai SDK)
- Networking: REST (Components communicate via HTTP)
- Port: 8000
- Role: API Gateway & Workflow Manager.
- Key Endpoints:
  - POST /start-job: Accepts a file, generates a job_id, and starts the async pipeline.
  - GET /status/{job_id}: Returns real-time progress (0-100%), current stage (ingestion, validation), and logs.
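The two endpoints above can be pictured with a small in-memory sketch. It is not the Orchestrator's actual code: the `JOBS` store, `fake_stage`, the 50/100 progress values, and the `status` field are all hypothetical, and plain `asyncio` tasks stand in for FastAPI's `BackgroundTasks` so the example runs with the standard library alone.

```python
import asyncio
import uuid

# Hypothetical in-memory job store; the real service may keep state elsewhere.
JOBS: dict[str, dict] = {}
# Keep references to running tasks so they are not garbage-collected.
_TASKS: set = set()


async def fake_stage(name: str) -> None:
    """Placeholder for the HTTP call to the Ingestion or Validation service."""
    await asyncio.sleep(0.01)


async def run_pipeline(job_id: str) -> None:
    """Run ingestion, then validation, updating progress after each stage."""
    job = JOBS[job_id]
    for stage, progress in (("ingestion", 50), ("validation", 100)):
        job["stage"] = stage
        job["logs"].append(f"{stage} started")
        await fake_stage(stage)
        job["progress"] = progress
    job["status"] = "completed"


def start_job() -> str:
    """Rough analogue of POST /start-job: register a job, schedule the pipeline."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running", "stage": None, "progress": 0, "logs": []}
    task = asyncio.create_task(run_pipeline(job_id))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return job_id


def get_status(job_id: str) -> dict:
    """Rough analogue of GET /status/{job_id}: return a snapshot of the job."""
    job = JOBS[job_id]
    return {**job, "logs": list(job["logs"])}


async def demo() -> tuple[dict, dict]:
    job_id = start_job()
    first = get_status(job_id)       # available immediately, before any stage runs
    await asyncio.gather(*_TASKS)    # wait for the background pipeline to finish
    return first, get_status(job_id)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the pattern is that `start_job` returns as soon as the task is scheduled, so the caller gets a job ID without waiting for ingestion or validation to finish.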
- Port: 8001
- Role: Data Cleaning & Normalization.
- Process:
  - Accepts a raw CSV file.
  - Uses an LLM to correct spelling, fix formatting (phone, address), and normalize specialties.
  - Returns a structured JSON of cleaned_providers.
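To make the input and output of this step concrete, here is a rough sketch that parses a CSV and returns a `cleaned_providers` payload. It deliberately replaces the LLM cleaning with simple rules, so it only illustrates the data flow. The column names (`name`, `npi`, `phone`) and the phone format are assumptions, not the service's actual schema.

```python
import csv
import io
import re


def normalize_phone(raw: str) -> str:
    """Format a 10-digit US number as (XXX) XXX-XXXX; otherwise return it stripped."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return raw.strip()
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def ingest_csv(text: str) -> dict:
    """Parse raw CSV text and return a dict with a 'cleaned_providers' list."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        cleaned.append({
            # Naive cleanup: collapse whitespace and title-case the name.
            "name": " ".join(row["name"].split()).title(),
            "npi": row["npi"].strip(),
            "phone": normalize_phone(row["phone"]),
        })
    return {"cleaned_providers": cleaned}


if __name__ == "__main__":
    sample = "name,npi,phone\n jane  DOE ,1234567893,555.123.4567\n"
    print(ingest_csv(sample))
```

Rules like these cannot fix misspellings or map free-text specialties to a standard vocabulary, which is where the LLM step described above comes in.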
- Port: 8002
- Role: Verification & Risk Analysis.
- Process:
  - Iterates through cleaned providers.
  - Queries https://npiregistry.cms.hhs.gov for authoritative data.
  - Uses an LLM to compare Input vs. Registry data.
  - Logic: Enforces strict rules (e.g., Missing NPI = 0% Confidence, Critical Risk).
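The strict-rule step could look something like the sketch below. Only the "Missing NPI = 0% Confidence, Critical Risk" rule comes from the description above. The field-by-field comparison is a simple stand-in for the LLM comparison, and `COMPARED_FIELDS`, the `"Low"`/`"High"` risk labels, and the result shape are hypothetical. The URL builder assumes the public NPI Registry API's `version` and `number` query parameters and makes no network call.

```python
from typing import Optional
from urllib.parse import urlencode

NPI_API = "https://npiregistry.cms.hhs.gov/api/"
COMPARED_FIELDS = ("name", "phone")  # hypothetical set of fields to compare


def registry_url(npi: str) -> str:
    """Build a lookup URL for one NPI (assumes API version 2.1)."""
    return f"{NPI_API}?{urlencode({'version': '2.1', 'number': npi})}"


def _norm(value) -> str:
    """Case- and whitespace-insensitive form used for comparison."""
    return str(value or "").strip().lower()


def score_provider(provider: dict, registry: Optional[dict]) -> dict:
    """Apply the missing-NPI rule first, then a naive field comparison."""
    npi = str(provider.get("npi") or "").strip()
    if not npi:
        # Strict rule from the text: no NPI means 0% confidence, Critical risk.
        return {"confidence": 0, "risk": "Critical", "discrepancies": ["missing NPI"]}

    registry = registry or {}
    mismatched = [
        field for field in COMPARED_FIELDS
        if _norm(provider.get(field)) != _norm(registry.get(field))
    ]
    matched = len(COMPARED_FIELDS) - len(mismatched)
    return {
        "confidence": round(100 * matched / len(COMPARED_FIELDS)),
        "risk": "Low" if not mismatched else "High",  # placeholder labels
        "discrepancies": mismatched,
    }


if __name__ == "__main__":
    print(registry_url("1234567893"))
    print(score_provider({"npi": "", "name": "Jane Doe"}, None))
```

Checking the hard rule before any comparison keeps the outcome for records without an NPI deterministic, regardless of what a model might say about the remaining fields.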
- Node.js v18+
- Python 3.10+
- Google Gemini API Key
Create .env files in the respective backend folders:
backend/ingestion/.env & backend/validation/.env
GEMINI_API_KEY=your_gemini_key_here
GEMINI_MODEL=gemini-1.5-flash
backend/orchestrator/.env
INGESTION_URL=http://localhost:8001/ingest/csv
VALIDATION_URL=http://localhost:8002/validate
You need 4 terminal instances to run the full stack:
1. Orchestrator
cd backend/orchestrator
python -m uvicorn main:app --reload --port 8000
2. Ingestion Service
cd backend/ingestion
python -m uvicorn main:app --reload --port 8001
3. Validation Service
cd backend/validation
python -m uvicorn main_v:app --reload --port 8002
4. Frontend
# Root directory
npm run dev
valid8/
├── app/ # Next.js App Router pages
├── components/ # React UI components
│ └── pages/ # Page-specific logic (Upload, Progress, Results)
├── backend/
│ ├── orchestrator/ # Orchestrator Service code
│ ├── ingestion/ # Ingestion Service code
│ └── validation/ # Validation Service code
├── public/ # Static assets
└── package.json # Frontend dependencies
- User uploads CSV on the UI.
- UI calls Orchestrator (/start-job), receiving a Job ID.
- UI polls Orchestrator (/status/{job_id}) every 1s.
- Orchestrator sends file to Ingestion Service.
- Ingestion cleans data and returns it to Orchestrator.
- Orchestrator sends cleaned data to Validation Service.
- Validation verifies NPIs and returns scores.
- UI detects completion via polling and renders the Results Dashboard.
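The polling half of this workflow can be sketched as a small loop. The real UI does this in React; the version below is in Python only to match the backend examples. `fetch_status` is a hypothetical callable that would wrap the GET /status/{job_id} request, and treating `progress >= 100` as "done" is an assumption about the response shape. The timeout is an addition that the workflow above does not mention.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
) -> dict:
    """Call fetch_status every interval_s seconds until progress reaches 100."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("progress", 0) >= 100:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Canned responses standing in for the Orchestrator's replies.
    replies = iter([
        {"progress": 0, "stage": "ingestion"},
        {"progress": 50, "stage": "validation"},
        {"progress": 100, "stage": "validation"},
    ])
    print(poll_until_done(lambda: next(replies), interval_s=0))
```

Passing the fetch function in as a parameter keeps the loop testable without a running Orchestrator.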