🎯
Building end-to-end data pipelines and analytics systems
  • Mumbai, India


👋 Hi, I'm Deepan Mehta

"When learning meets data, growth becomes measurable and inevitable."

Data Analytics | Data Engineering | AI Systems

Building end-to-end data solutions across ETL, analytics, and machine learning.

Current Project: 🚲 Bike Demand ML System, a 4-city Random Forest inference API live on GCP Cloud Run (v4.4.0). It ships with RMSE accuracy gates in CI, cost-audit alerting via Slack, and Cloud Logging plus Prometheus metrics, alongside a companion R Shiny dashboard with live GBFS and weather feeds across 6 cities. Next: drift monitoring pipeline (v4.5.0).
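
As a rough illustration of what an RMSE accuracy gate in CI can look like, here is a minimal sketch. The function names, city keys, and threshold values are hypothetical and are not taken from the actual repository; the idea is simply that the build fails when any city's holdout RMSE exceeds its limit.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def failing_cities(scores, thresholds):
    """Return the cities whose RMSE is above their (hypothetical) threshold."""
    return sorted(city for city, value in scores.items() if value > thresholds[city])


# Illustrative numbers only, not results from the real models.
holdout = {
    "seoul": ([120, 95, 200], [110, 100, 190]),
    "london": ([40, 60, 55], [42, 58, 70]),
}
thresholds = {"seoul": 10.0, "london": 5.0}
scores = {city: rmse(truth, pred) for city, (truth, pred) in holdout.items()}
failed = failing_cities(scores, thresholds)
```

A CI step could then exit non-zero whenever `failed` is non-empty, which blocks the merge until the model is retrained or the threshold is revisited.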


🌟 About Me

I'm a data-driven professional passionate about applying AI, data engineering, and analytics to improve business and Learning and Development (L&D) outcomes.

After a successful career in aviation training and airport operations, I've transitioned into data engineering and data analytics, where I apply analytical methods to solve learning and business problems.

I build data-driven solutions covering:

  • AI/ML Engineering — end-to-end training pipelines, inference APIs, and production cloud deployment
  • Cloud Data Engineering — GCP Cloud Run, Artifact Registry, BigQuery; containerised CI/CD
  • Observability — structured JSON logging (Cloud Logging), Prometheus metrics endpoints
  • ETL pipelines and data workflows
  • Exploratory data analysis and visualization
  • Predictive modelling using Python and R
  • Analytics dashboards and reporting systems (R Shiny, Tableau, Looker Studio)
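
To make the observability bullet concrete, here is a minimal sketch of structured JSON logging using only the Python standard library. The `JsonFormatter` class, the `build_logger` helper, and the `city` and `latency_ms` fields are assumptions made for illustration. On Cloud Run, JSON lines written to stdout are picked up by Cloud Logging, which treats `severity` and `message` as special fields.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Values passed via `extra=` become attributes on the record.
        # These two field names are hypothetical examples.
        for key in ("city", "latency_ms"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(name="inference", stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage would be along the lines of `build_logger().info("prediction served", extra={"city": "seoul", "latency_ms": 12})`.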

🧠 Core Competencies

Programming & Analysis:

Python, R, Java, TypeScript, Next.js, SQL, MySQL, BigQuery, Excel

ML Engineering & APIs:

scikit-learn, FastAPI, Pydantic, Docker, GitHub Actions, Prometheus, Anthropic

Visualization & Reporting:

Tableau, Looker Studio, ggplot2, Shiny

Cloud Data Engineering:

Google Cloud, BigQuery, Cloud Run, Vertex AI, Artifact Registry


💼 Featured Projects

| Project | Description | Tools |
| --- | --- | --- |
| 🏗️ Sales Data Pipeline (ETL) | Built a production-grade ETL pipeline using Medallion architecture (Bronze/Silver/Gold) to transform raw sales data into validated, analytics-ready datasets with automated data quality checks, feature engineering, and CI/CD workflows. | Python, Pandas, DuckDB, Docker, GitHub Actions |
| 🚲 Bike Demand Prediction System | Built a 6-city live demand dashboard integrating OpenWeather forecasts, GBFS live station data, and a FastAPI ML backend. Features UC1 fleet rebalancing alerts and UC2 rider demand scores across Seoul, London, NYC, DC, Paris, and Chicago. | R, Shiny, httr, Leaflet, GBFS, FastAPI (backend), Docker, GitHub Actions |
| ⚙️ Bike Demand ML System | Production ML inference API live on GCP Cloud Run (v4.4.0). Trains 4-city Random Forest models (Seoul, London, NYC, DC); models are baked into the Docker image at build time. CI auto-publishes to GHCR and Artifact Registry and redeploys on merge via gcloud run deploy. Includes RMSE accuracy gates in CI, cost-audit alerting via Slack, structured JSON logging to Cloud Logging, and a Prometheus /metrics endpoint. | Python, FastAPI, scikit-learn, Pydantic, Docker, GCP Cloud Run, Prometheus, GitHub Actions |
| 🏠 StayOps: Rental Ops Console | Multi-channel booking reconciliation engine and AI-assisted ops console for short/mid-term rental operators. Ingests bookings from CSV and Google Sheets (idempotent SHA-256 dedup), automatically detects 4 conflict types (duplicates, double-bookings, pricing anomalies, gap nights), and surfaces live KPI dashboards and SQL reports. Built end-to-end with Claude Code on Next.js 16 + Supabase. Phase 2: Claude tool-calling agent layer. | TypeScript, Next.js 16, Drizzle ORM, Supabase, shadcn/ui, Anthropic SDK, Vercel |
| 🎓 Corporate Training Analytics Platform | A refactor that grew into a rewrite: a full-stack training records and analytics system for managing multi-course training programmes, featuring a unified data model, role-based admin dashboard, KPI tracking, event/result management, and a reporting abstraction. | Java, SQL, Data Modeling, KPI Analytics, Role-Based Access |
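
The idempotent SHA-256 dedup mentioned for StayOps can be sketched roughly as follows. StayOps itself is written in TypeScript; this Python version only illustrates the idea, and the field names and the in-memory `seen` store are hypothetical, since the real schema is not shown here.

```python
import hashlib

# Hypothetical identifying fields; the actual booking schema may differ.
KEY_FIELDS = ("channel", "listing_id", "check_in", "check_out", "guest")


def booking_key(row):
    """Stable SHA-256 fingerprint over the fields that identify a booking."""
    parts = [str(row[field]).strip().lower() for field in KEY_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def ingest(rows, seen):
    """Store rows whose fingerprint is new; return how many were added."""
    added = 0
    for row in rows:
        key = booking_key(row)
        if key not in seen:
            seen[key] = row
            added += 1
    return added
```

Because the key is derived from the row content, re-importing the same CSV or sheet adds nothing the second time, which is what makes the ingest idempotent.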

📊 Architecture: Sales Data Pipeline — Current & Roadmap

The Sales Data Pipeline is evolving from a production-grade Medallion ETL into a full customer analytics platform that unifies transactions, segmentation, and retention into a single source of truth.
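
For readers unfamiliar with the Medallion pattern, the three layers can be sketched in plain Python as below. This is a simplified stand-in, not the project's code: the real pipeline uses Pandas and Pydantic, and the column names (`order_id`, `customer_id`, `amount`) are assumed for illustration.

```python
from collections import defaultdict

REQUIRED = {"order_id", "customer_id", "amount"}  # assumed columns


def bronze(raw_rows):
    """Ingest: keep rows that carry every required column."""
    return [row for row in raw_rows if REQUIRED.issubset(row)]


def silver(bronze_rows):
    """Clean: cast amounts, drop invalid values, dedup on order_id."""
    cleaned = {}
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # unparseable amount
        if amount > 0 and row["order_id"] not in cleaned:
            cleaned[row["order_id"]] = {**row, "amount": amount}
    return list(cleaned.values())


def gold(silver_rows):
    """Aggregate: orders, revenue, and average order value per customer."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in silver_rows:
        entry = totals[row["customer_id"]]
        entry["orders"] += 1
        entry["revenue"] += row["amount"]
    return {
        customer: {**entry, "aov": entry["revenue"] / entry["orders"]}
        for customer, entry in totals.items()
    }
```

Each layer only reads the output of the previous one, so `gold(silver(bronze(rows)))` yields the analytics-ready view.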

flowchart TD
    subgraph Sources["📥 Data Sources"]
        S1["CRM"] & S2["POS / Transactions"] & S3["Web Analytics"]
    end

    subgraph ETL["🏗️ Medallion ETL  ✅  Built — Python · Pandas · Pydantic"]
        B["🥉 Bronze — Ingest + schema validation"]
        C["🥈 Silver — Clean · Dedup · Feature engineering"]
        D["🥇 Gold — Star schema · AOV · CLV pre-aggregated"]
    end

    subgraph Infra["⚙️ Data Infrastructure  🔜  Planned — Airflow · BigQuery · Snowflake"]
        ORC["Apache Airflow<br/>Scheduled DAGs · Dependency tracking"]
        WH["BigQuery / Snowflake<br/>Partitioned · Clustered · Cost-optimised"]
    end

    subgraph Seg["🧠 Customer Segmentation  🔜  Planned — scikit-learn · Databricks"]
        E["RFM Analysis<br/>Recency · Frequency · Monetary"]
        F["Cohort Analysis<br/>Signup cohorts · Engagement lifecycle"]
        G["K-Means Clustering<br/>Unsupervised persona discovery"]
    end

    subgraph Ret["🔁 Retention Analytics  🔜  Planned — scikit-learn · Databricks"]
        H["Churn Classification<br/>At-risk flagging · Re-engagement triggers"]
        I["LTV Correlation<br/>High-value segment identification"]
    end

    subgraph Serving["⚡ Serving Layer  ✅  Built — FastAPI · DuckDB · Docker"]
        J["🦆 DuckDB — In-process analytics"]
        K["FastAPI REST API"]
    end

    subgraph Dash["📊 Analytics Dashboard  🔜  Planned — Tableau · Streamlit"]
        L["KPI tracking · Segment views<br/>Retention curves · LTV by cohort"]
    end

    Sources --> B
    B --> C
    C --> D
    D --> ORC
    ORC --> WH
    D --> J
    WH --> E
    WH --> F
    E --> G
    G --> H
    F --> H
    H --> I
    J --> K
    I --> K
    K --> L
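
The RFM step in the diagram is marked as planned, so the following is only a sketch of what computing Recency, Frequency, and Monetary values might involve. The `rfm` function and the `(customer, date, amount)` tuple layout are assumptions, not part of the existing pipeline.

```python
from datetime import date


def rfm(orders, today):
    """Map each customer to (recency_days, frequency, monetary).

    `orders` is an iterable of (customer, order_date, amount) tuples.
    """
    summary = {}
    for customer, day, amount in orders:
        last, freq, total = summary.get(customer, (day, 0, 0.0))
        summary[customer] = (max(last, day), freq + 1, total + amount)
    return {
        customer: ((today - last).days, freq, total)
        for customer, (last, freq, total) in summary.items()
    }
```

The resulting triples could then feed the clustering step shown in the diagram, for example after scaling each dimension.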

🎓 Certifications

  • Google Data Analytics Professional Certificate
  • IBM Data Analytics Professional Certificate with Excel & R

My mission is to bridge data engineering and learning: using data to make learning, business analysis, and training more effective.


📫 Contact

📍 Mumbai, India
📧 deepanmehta@live.com
🔗 LinkedIn
💼 GitHub Projects


"When learning meets data, growth becomes measurable and inevitable."

Pinned

  1. sales-data-pipeline (Public)

    Production-ready sales data pipeline with ETL layers, testing, and CI/CD setup

    Python

  2. bike-demand-ml-system (Public)

    End-to-end bike demand prediction system evolving from data analytics to ML engineering and AI systems, with modular pipelines, feature engineering, model training, and future API + deployment inte…

    Python

  3. bike-demand-prediction (Public)

    End-to-end bike demand prediction system (R + Shiny) with real-time API integration and geospatial analytics.

    Jupyter Notebook

  4. stayops (Public)

    AI-assisted operations console for short/mid-term rental operators — multi-channel booking reconciliation, KPI dashboard, AI agent layer. Built with Next.js 15 + Vercel + Supabase + Claude Code.

    TypeScript