Mukul Desai


About

I build data infrastructure that people can actually rely on. Pipelines that do not break silently. Transformation layers that stay consistent. Systems that scale without requiring someone to babysit them.

Currently embedded with Johnson & Johnson via LTI Mindtree, building enterprise data reliability and access automation systems. Previously the sole data and AI engineer at TripForCure, designing the full data platform across 31 hospital locations before handing it off to a QA team.


Tech Stack

Languages

Python SQL R

Data Engineering

Apache Airflow dbt Apache Kafka Apache Flink Apache Spark

Platforms & Databases

Snowflake PostgreSQL MongoDB MySQL

Cloud & DevOps

AWS Docker Jenkins

Analytics & BI

Power BI Tableau

AI & ML

LangChain OpenAI Scikit-learn FastAPI


Featured Projects

Snowflake dbt Airflow Python Apr 2026

End-to-end ELT pipeline on Snowflake and dbt, using a Medallion Architecture (staging, intermediate, and mart layers) to ingest multi-source hospital and claims data. Automated quality checks cover freshness validation, null detection, and duplicate identification. Airflow orchestrates the pipeline, with custom anomaly monitoring to surface SLA breaches and data inconsistencies.
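
The three quality checks named above can be illustrated with a minimal plain-Python sketch. It is not the project's code: it assumes rows arrive as a list of dicts, and the field names (`claim_id`, `hospital_id`, `loaded_at`) and function names are hypothetical. It only shows the logic of each check, not how the pipeline wires them into dbt or Airflow.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(rows, ts_field, max_age, now=None):
    """Return True if the newest timestamp in `rows` is no older than `max_age`."""
    if now is None:
        now = datetime.now(timezone.utc)
    latest = max(r[ts_field] for r in rows)
    return now - latest <= max_age


def find_nulls(rows, required_fields):
    """Return (row_index, field) pairs where a required field is missing or None."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field in required_fields
        if row.get(field) is None
    ]


def find_duplicates(rows, key_fields):
    """Return the sorted list of key tuples that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)
```

In a real ELT stack these checks would typically run as declarative tests against warehouse tables rather than over in-memory rows; the sketch keeps them as pure functions so each rule is easy to read in isolation.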


Kafka Flink Scikit-learn PostgreSQL Mar 2025

Fault-tolerant streaming pipeline that ingests and processes financial market data in real time. AI-powered anomaly detection cut fraudulent-transaction detection time from hours to seconds. Portfolio risk models (VaR, CVaR, and the Sharpe and Sortino ratios) improved assessment accuracy by 23% over traditional batch methods.
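
The four risk metrics named above have standard textbook definitions, sketched here with the standard library only. This is not the project's implementation and says nothing about how the 23% figure was measured. It assumes a list of periodic (e.g. daily) returns, historical simulation for VaR/CVaR, a per-period risk-free rate, and 252 trading days for annualization. The function names are hypothetical.

```python
import math
import statistics


def tail_returns(returns, confidence):
    """Worst (1 - confidence) share of returns, at least one observation."""
    n_tail = max(1, round((1 - confidence) * len(returns)))
    return sorted(returns)[:n_tail]


def historical_var(returns, confidence=0.95):
    """Historical VaR as a positive loss: the least-bad return in the worst tail."""
    return -max(tail_returns(returns, confidence))


def historical_cvar(returns, confidence=0.95):
    """CVaR (expected shortfall) as a positive loss: mean return in the worst tail."""
    tail = tail_returns(returns, confidence)
    return -sum(tail) / len(tail)


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Like Sharpe, but only below-target (negative excess) returns count as risk."""
    excess = [r - risk_free for r in returns]
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0:
        raise ValueError("no downside returns; Sortino ratio is undefined")
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)
```

VaR quantile conventions differ (interpolated versus nearest-rank); this sketch uses a simple rounded tail count, so small samples can differ from library implementations.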


LangChain OpenAI FastAPI Jul 2025

Four-agent AI system (code search, task recommendations, learning guidance, and real-time assistance) with context shared across all agents. OpenAI LLMs are integrated with ChromaDB vector search via FastAPI for semantic knowledge retrieval across enterprise codebases.


Airflow Power BI Python Jun 2025

Automated the full DCF valuation pipeline, from API ingestion through 5-year forecasting and peer benchmarking, eliminating 10 hours of manual modeling per valuation cycle. Power BI dashboards provide scenario-driven recommendations; an analysis of Adobe (ADBE) surfaced a -53% downside signal.
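
The core arithmetic of a five-year DCF can be sketched as below. It is a minimal illustration under stated assumptions, not the pipeline's code: a constant WACC, caller-supplied yearly free-cash-flow growth rates, and a Gordon-growth terminal value. The function names and parameters are hypothetical, and the sketch omits API ingestion, peer benchmarking, and scenario logic.

```python
def dcf_enterprise_value(base_fcf, growth_rates, wacc, terminal_growth):
    """Present value of forecast free cash flows plus a Gordon-growth terminal value."""
    if wacc <= terminal_growth:
        raise ValueError("wacc must exceed terminal_growth for the Gordon growth formula")
    fcf = base_fcf
    pv_total = 0.0
    # Grow FCF year by year and discount each year back at the constant WACC.
    for year, growth in enumerate(growth_rates, start=1):
        fcf *= 1 + growth
        pv_total += fcf / (1 + wacc) ** year
    # Terminal value at the end of the forecast horizon, discounted to today.
    terminal_value = fcf * (1 + terminal_growth) / (wacc - terminal_growth)
    pv_total += terminal_value / (1 + wacc) ** len(growth_rates)
    return pv_total


def implied_upside(enterprise_value, net_debt, shares_outstanding, market_price):
    """Implied per-share value relative to market price (negative means downside)."""
    per_share_value = (enterprise_value - net_debt) / shares_outstanding
    return per_share_value / market_price - 1
```

With `growth_rates=[0.0] * 5` and `terminal_growth=0.0`, the result collapses to a plain perpetuity, `base_fcf / wacc`, which is a handy sanity check for the discounting.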


OpenAI Next.js Firebase Apr 2025

Full-stack interview preparation platform with a dynamic interview simulator generating role-specific questions, an AI-powered resume analyzer with quality scoring, and a progress dashboard tracking improvement over time.


Certifications

IBM Data Engineering, Google Cloud, IBM Data Science


Education

🎓 M.S. Information Systems — Northeastern University (2023–2025)

🎓 B.E. Electronics & Telecommunications — University of Mumbai, VESIT (2019–2023)


Open to Data Engineer, Analytics Engineer, and Data Platform roles. Building at the intersection of reliable data infrastructure and applied AI.

Pinned

  1. ZeroDay

    Python

  2. Marketing-and-Social-Media-Content-Generation-Tool

    Generating high-quality, tailored content for different digital platforms.

    Python

  3. Lung-Cancer-Detection

    Python

  4. Real-Time-Fraud-Detection-System

    Python

  5. Financial-Pipeline

    Python

  6. InterviewGPT

    TypeScript