"When learning meets data, growth becomes measurable and inevitable."
Data Analytics | Data Engineering | AI Systems
Building end-to-end data solutions across ETL, analytics, and machine learning.
Current Project: 🚲 Bike Demand ML System — 4-city Random Forest inference API live on GCP Cloud Run (v4.4.0); RMSE accuracy gates in CI, cost-audit alerting via Slack, Cloud Logging + Prometheus metrics; companion R Shiny dashboard with live GBFS + weather feeds across 6 cities — next: drift monitoring pipeline (v4.5.0)
I'm a data-driven professional passionate about applying AI, data engineering, and analytics to improve business and Learning and Development (L&D) outcomes.
After a successful career in aviation training and airport operations, I've transitioned into data engineering and data analytics, where I apply analytical methods to solve learning and business problems.
I build data-driven solutions covering:
- AI/ML Engineering — end-to-end training pipelines, inference APIs, and production cloud deployment
- Cloud Data Engineering — GCP Cloud Run, Artifact Registry, BigQuery; containerised CI/CD
- Observability — structured JSON logging (Cloud Logging), Prometheus metrics endpoints
- ETL pipelines and data workflows
- Exploratory data analysis and visualization
- Predictive modelling using Python and R
- Analytics dashboards and reporting systems (R Shiny, Tableau, Looker Studio)
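
As an illustration of the structured JSON logging item above, here is a minimal sketch using only the Python standard library. It is not the code from any of the projects listed here: the `JsonFormatter` class, the `build_logger` helper, and the `fields` extra key are hypothetical names chosen for this example. It assumes the service writes one JSON object per line to stdout, which Cloud Run forwards to Cloud Logging, where `severity` and `message` are, to my understanding, treated as special fields.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            # Illustrative field name; not a claim about Cloud Logging's schema.
            "logged_at": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
        }
        # Optional structured context passed as extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str = "inference", stream=sys.stdout) -> logging.Logger:
    """Return a logger that emits JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("prediction served", extra={"fields": {"city": "seoul"}})` would then emit one JSON line carrying the message, the severity, and the `city` field.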
Programming & Analysis:
ML Engineering & APIs:
Visualization & Reporting:
Cloud Data Engineering:
| Project | Description | Tools |
|---|---|---|
| 🏗️ Sales Data Pipeline (ETL) | Built a production-grade ETL pipeline using Medallion architecture (Bronze/Silver/Gold) to transform raw sales data into validated, analytics-ready datasets with automated data quality checks, feature engineering, and CI/CD workflows. | Python, Pandas, DuckDB, Docker, GitHub Actions |
| 🚲 Bike Demand Prediction System | Built a 6-city live demand dashboard integrating OpenWeather forecasts, GBFS live station data, and a FastAPI ML backend. Features UC1 fleet rebalancing alerts and UC2 rider demand scores across Seoul, London, NYC, DC, Paris, and Chicago. | R, Shiny, httr, Leaflet, GBFS, FastAPI (backend), Docker, GitHub Actions |
| ⚙️ Bike Demand ML System | Production ML inference API live on GCP Cloud Run (v4.4.0). Trains 4-city Random Forest models (Seoul, London, NYC, DC); models baked into Docker image at build time. CI auto-publishes to GHCR + Artifact Registry and redeploys on merge via `gcloud run deploy`. RMSE accuracy gates in CI, cost-audit alerting via Slack, structured JSON logging → Cloud Logging, Prometheus `/metrics` endpoint. | Python, FastAPI, scikit-learn, Pydantic, Docker, GCP Cloud Run, Prometheus, GitHub Actions |
| 🏠 StayOps — Rental Ops Console | Multi-channel booking reconciliation engine and AI-assisted ops console for short/mid-term rental operators. Ingests bookings from CSV and Google Sheets (idempotent SHA-256 dedup), detects 4 conflict types automatically (duplicates, double-bookings, pricing anomalies, gap nights), and surfaces live KPI dashboards and SQL reports — built end-to-end with Claude Code on Next.js 16 + Supabase. Phase 2: Claude tool-calling agent layer. | TypeScript, Next.js 16, Drizzle ORM, Supabase, shadcn/ui, Anthropic SDK, Vercel |
| 🎓 Corporate Training Analytics Platform | Refactored and then rewrote a full-stack training records and analytics system for managing multi-course training programmes, featuring a unified data model, role-based admin dashboard, KPI tracking, event/result management, and reporting abstraction. | Java, SQL, Data Modeling, KPI Analytics, Role-Based Access |
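
The "RMSE accuracy gates in CI" mentioned for the Bike Demand ML System can be pictured with a small sketch. This is an assumption about how such a gate might work, not the project's actual code: the function names `rmse` and `failing_cities` and the per-city threshold values in the usage note are hypothetical. The idea is that CI computes RMSE on a held-out set for each city model and fails the build if any city exceeds its limit.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_error / len(y_true))


def failing_cities(scores, thresholds):
    """Return the cities whose RMSE is missing or above its threshold.

    scores:     {"city": rmse_value} measured on held-out data
    thresholds: {"city": max_allowed_rmse} (hypothetical limits)
    """
    failures = []
    for city, limit in thresholds.items():
        score = scores.get(city)
        # A missing score counts as a failure so a skipped model cannot pass.
        if score is None or score > limit:
            failures.append(city)
    return failures
```

A CI step could end with something like `sys.exit(1 if failing_cities(scores, thresholds) else 0)`, so that a regression in any one city blocks the merge and the subsequent redeploy.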
The Sales Data Pipeline evolves from a production-grade Medallion ETL into a full customer analytics platform — unifying transactions, segmentation, and retention into a single source of truth.
```mermaid
flowchart TD
    subgraph Sources["📥 Data Sources"]
        S1["CRM"] & S2["POS / Transactions"] & S3["Web Analytics"]
    end
    subgraph ETL["🏗️ Medallion ETL ✅ Built: Python · Pandas · Pydantic"]
        B["🥉 Bronze: Ingest + schema validation"]
        C["🥈 Silver: Clean · Dedup · Feature engineering"]
        D["🥇 Gold: Star schema · AOV · CLV pre-aggregated"]
    end
    subgraph Infra["⚙️ Data Infrastructure 🔜 Planned: Airflow · BigQuery · Snowflake"]
        ORC["Apache Airflow<br/>Scheduled DAGs · Dependency tracking"]
        WH["BigQuery / Snowflake<br/>Partitioned · Clustered · Cost-optimised"]
    end
    subgraph Seg["🧠 Customer Segmentation 🔜 Planned: scikit-learn · Databricks"]
        E["RFM Analysis<br/>Recency · Frequency · Monetary"]
        F["Cohort Analysis<br/>Signup cohorts · Engagement lifecycle"]
        G["K-Means Clustering<br/>Unsupervised persona discovery"]
    end
    subgraph Ret["🔁 Retention Analytics 🔜 Planned: scikit-learn · Databricks"]
        H["Churn Classification<br/>At-risk flagging · Re-engagement triggers"]
        I["LTV Correlation<br/>High-value segment identification"]
    end
    subgraph Serving["⚡ Serving Layer ✅ Built: FastAPI · DuckDB · Docker"]
        J["🦆 DuckDB: In-process analytics"]
        K["FastAPI REST API"]
    end
    subgraph Dash["📊 Analytics Dashboard 🔜 Planned: Tableau · Streamlit"]
        L["KPI tracking · Segment views<br/>Retention curves · LTV by cohort"]
    end
    Sources --> B
    B --> C
    C --> D
    D --> ORC
    ORC --> WH
    D --> J
    WH --> E
    WH --> F
    E --> G
    G --> H
    F --> H
    H --> I
    J --> K
    I --> K
    K --> L
```
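
The RFM Analysis node in the roadmap is marked as planned, so the following is only an illustrative sketch of what Recency, Frequency, and Monetary values are, not an existing part of the pipeline. The `rfm_table` function and the `(customer_id, order_date, amount)` tuple layout are assumptions made for this example; a real version would more likely run over the Gold tables with Pandas or SQL.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Compute per-customer Recency, Frequency, and Monetary values.

    transactions: iterable of (customer_id, order_date, amount) tuples
                  (hypothetical layout for this sketch)
    as_of:        reference date used to measure recency
    """
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)

    for customer_id, order_date, amount in transactions:
        # Recency is measured from the most recent order per customer.
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount

    return {
        customer_id: {
            "recency_days": (as_of - last_order[customer_id]).days,
            "frequency": frequency[customer_id],
            "monetary": round(monetary[customer_id], 2),
        }
        for customer_id in last_order
    }
```

These three values per customer are the kind of features that the downstream K-Means clustering step in the diagram could consume, typically after scaling.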
- Google Data Analytics Professional Certificate
- IBM Data Analytics Professional Certificate with Excel & R
My mission is to bridge data engineering and learning, using data to make learning, business analysis, and training more effective.
📍 Mumbai, India
📧 deepanmehta@live.com
🔗 LinkedIn
💼 GitHub Projects
