Upload messy multi-country transaction data. Get back clean datasets, error reports, and AI-generated insights — automatically.
Organizations pull transaction data from dozens of sources and countries — and that data always arrives messy. Broken phone numbers, inconsistent date formats, missing fields, duplicate IDs, invalid payment modes. Cleaning it by hand doesn't scale.
Stratos is a layered, streaming validation pipeline. Drop in a CSV or XLSX file, and it flows down through every layer of the stack — schema checks, country-specific rule validation, async background processing, and an AI reasoning layer — emerging as a clean dataset, a detailed error report, and a plain-English quality summary.
The name is deliberate: like the layered strata of the atmosphere, Stratos validates in layers, carrying data from raw upload down to AI-refined insight.
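To make the layering concrete, here is a minimal sketch of the idea in plain Python (not Polars/Pandera). Each record passes through a schema layer and then a country-rule layer, and lands in either the clean set or the error report with a category and a reason. The field names (`txn_id`, `country`, `phone`, `amount`), the category labels, and the helper names are all hypothetical; only the per-country phone lengths come from this README.

```python
from dataclasses import dataclass, field

# Hypothetical required columns; the actual Stratos schema is not shown in this README.
REQUIRED_FIELDS = ("txn_id", "country", "phone", "amount")

# Phone lengths from the country rules listed in this README.
PHONE_DIGITS = {"IN": 10, "SG": 8, "US": 10}


@dataclass
class Result:
    clean: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def schema_layer(record):
    """Layer 1: structural checks. Returns (category, reason) or None."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        return ("schema", f"missing fields: {', '.join(missing)}")
    return None


def rule_layer(record):
    """Layer 2: country-specific rules. Returns (category, reason) or None."""
    expected = PHONE_DIGITS.get(record["country"])
    if expected is None:
        return ("rule", f"unsupported country {record['country']}")
    # Keep digits only; this sketch does not strip calling codes such as +91.
    digits = "".join(ch for ch in str(record["phone"]) if ch.isdigit())
    if len(digits) != expected:
        return ("rule", f"phone must have {expected} digits, got {len(digits)}")
    return None


def validate(records):
    result = Result()
    accepted_ids = set()
    for record in records:
        # `or` short-circuits: the rule layer only runs if the schema layer passed.
        failure = schema_layer(record) or rule_layer(record)
        if failure is None and record["txn_id"] in accepted_ids:
            failure = ("duplicate", f"duplicate txn_id {record['txn_id']}")
        if failure is None:
            accepted_ids.add(record["txn_id"])
            result.clean.append(record)
        else:
            category, reason = failure
            result.errors.append({**record, "category": category, "reason": reason})
    return result
```

Because the layers short-circuit, a record that fails the schema layer never reaches the country rules, so every error row carries exactly one category.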
```mermaid
flowchart TD
A[📤 User Uploads CSV / XLSX] --> B[Next.js 16 Frontend]
B --> C[Litestar API]
C --> D[(PostgreSQL<br/>Job Created)]
C --> E[Redis Queue]
E --> F[RQ Worker]
F --> G[Polars + Pandera<br/>Validation Engine]
G --> H[Groq · Llama 3.3 70B<br/>AI Analysis]
H --> I[(PostgreSQL<br/>Results Stored)]
I --> J[📊 Dashboard]
J --> K[⬇️ Clean Data + Reports]
style A fill:#7C9CFF,stroke:#5C7CFA,color:#0f1226
style G fill:#9D8CFF,stroke:#7B61FF,color:#0f1226
style H fill:#5EE7C8,stroke:#4FD1B8,color:#0f1226
style K fill:#7C9CFF,stroke:#5C7CFA,color:#0f1226
```
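The hand-off in the flowchart can be approximated in-process: the API records a job as queued and pushes it onto a queue, and a worker later pops it, runs the processing stages, and stores the outcome. In this sketch `queue.Queue` and a dict stand in for Redis/RQ and PostgreSQL, and the status names and function names are hypothetical rather than Stratos's actual API.

```python
import queue
import uuid

jobs = {}                  # stand-in for the PostgreSQL jobs table
job_queue = queue.Queue()  # stand-in for the Redis queue


def submit_upload(filename, rows):
    """API side: create a job record, enqueue it, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"file": filename, "status": "queued", "result": None}
    job_queue.put((job_id, rows))
    return job_id


def run_worker_once(process):
    """Worker side: take one job off the queue and run `process` on its rows.

    `process` stands in for the validation and AI analysis stages.
    """
    job_id, rows = job_queue.get()
    jobs[job_id]["status"] = "processing"
    try:
        jobs[job_id]["result"] = process(rows)
        jobs[job_id]["status"] = "completed"
    except Exception as exc:  # record the failure instead of crashing the worker
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = str(exc)
    finally:
        job_queue.task_done()
    return job_id
```

The point of the split is that the upload request returns as soon as the job is queued; with RQ, `Queue.enqueue` and a separate worker process take the place of `job_queue` and `run_worker_once`.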
| Layer | Technology | Purpose |
|---|---|---|
| 🎨 Frontend | Next.js 16, TypeScript, React | Upload UI, live dashboard, report downloads |
| 🚪 API | Litestar, msgspec | High-performance request handling |
| 🛡️ Validation | Polars, Pandera | Schema + rule-based data validation |
| 📨 Queue | Redis, RQ | Background job orchestration |
| ⚙️ Worker | Python (RQ worker) | Executes validation & report generation |
| 🗄️ Database | PostgreSQL + SQLAlchemy (async) | Jobs, rules, logs, AI reports |
| 🤖 AI | Groq API — Llama 3.3 70B | Quality scoring, summaries, recommendations |
| 📚 Docs | OpenAPI / Swagger | Live API reference at /api/docs |
```bash
npm install
npm run dev
```

This spins up PostgreSQL, Redis, the API, and the worker via Docker, plus the Next.js frontend.
| Service | URL |
|---|---|
| App | http://localhost:3000 |
| API | http://localhost:8000 |
| API Docs | http://localhost:8000/api/docs |
Backend:

```bash
cd backend
cp .env.example .env   # set DB_PASSWORD and GROQ_API_KEY
docker compose up --build -d
```

Frontend:

```bash
cd xeno-data-hub
npm install
cp .env.example .env.local   # set NEXT_PUBLIC_API_URL if the API is remote
npm run dev
```

💡 In local dev, `/api/*` is automatically proxied to `http://localhost:8000`; you only need `NEXT_PUBLIC_API_URL` when the API lives elsewhere.
| Output | What you get |
|---|---|
| 📁 Clean Dataset | Only the records that passed every check, ready to load downstream. |
| 🧾 Error Report | Every failed record, paired with why it failed and which category it falls under. |
| 📦 Chunked Outputs | Large files are auto-split into manageable pieces (…). |
| 🧠 AI Insights Report | Executive summary, quality score, country analysis, and next-step recommendations, generated by Llama 3.3. |
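A minimal sketch of how an output could be split into chunked CSV files, assuming a fixed row limit per chunk. The `chunk_size` default and the function names are hypothetical; this README does not state the actual chunking threshold Stratos uses.

```python
import csv
import io


def chunk_rows(rows, chunk_size):
    """Yield consecutive slices of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def to_csv_chunks(rows, chunk_size=50_000):  # hypothetical default limit
    """Render each chunk as a standalone CSV string with its own header row."""
    if not rows:
        return []
    fieldnames = list(rows[0].keys())
    outputs = []
    for chunk in chunk_rows(rows, chunk_size):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(chunk)
        outputs.append(buffer.getvalue())
    return outputs
```

Repeating the header in every chunk means each piece can be loaded downstream on its own, without needing the first file.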
Validation rules flex per country instead of forcing one global format:
- 🇮🇳 India → 10-digit phone · UPI, CARD, NETBANKING, CASH, WALLET
- 🇸🇬 Singapore → 8-digit phone · PAYNOW, NETS, GRABPAY
- 🇺🇸 USA → 10-digit phone · standard card/ACH modes
New countries and rule sets plug in without touching the core engine.
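One way to get that plug-in behavior is a registry keyed by country code, so the core check only looks rules up and never branches on a specific country. This is a sketch under assumptions: `CountryRule`, `register`, and `check` are hypothetical names, and the US payment-mode names are placeholders, since this README only says "standard card/ACH modes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryRule:
    phone_digits: int
    payment_modes: frozenset


RULES = {}


def register(country, phone_digits, payment_modes):
    """Add or replace a country's rules without touching the check logic."""
    RULES[country] = CountryRule(phone_digits, frozenset(payment_modes))


# Rules from this README; the US mode names are placeholders.
register("IN", 10, {"UPI", "CARD", "NETBANKING", "CASH", "WALLET"})
register("SG", 8, {"PAYNOW", "NETS", "GRABPAY"})
register("US", 10, {"CARD", "ACH"})


def check(country, phone, payment_mode):
    """Return a list of rule violations; an empty list means the record passes."""
    rule = RULES.get(country)
    if rule is None:
        return [f"no rules registered for {country}"]
    problems = []
    digits = sum(ch.isdigit() for ch in phone)
    if digits != rule.phone_digits:
        problems.append(f"phone needs {rule.phone_digits} digits, got {digits}")
    if payment_mode.upper() not in rule.payment_modes:
        problems.append(f"payment mode {payment_mode} not allowed in {country}")
    return problems
```

Adding a country is then a single `register(...)` call; `check` itself stays unchanged.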
- Real-time validation (Kafka)
- Multi-tenant architecture
- S3-backed storage
- ML-based anomaly detection
- Data lineage tracking
- Role-based access control + audit logs