- 🚧 Building: Improving MALLARD — adding smart column profiling & export pipelines
- 📚 Learning: Advanced Spark internals, Delta Lake, and data contract patterns
- 🎯 Goal: Land a Data Engineering internship where I can work on real production pipelines
- 🏆 Recent Win: Top 10 International Finalist at IDSC 2026 with ROC-AUC 0.9801
Also working with: Apache Spark · Apache Airflow · Dagster · dbt · DuckDB · MinIO · Great Expectations · Soda Core · PyInstaller · Streamlit
Production-oriented data platform simulating real-world e-commerce analytics with streaming ingestion and robust failure handling.
| Component | Implementation |
|---|---|
| Architecture | Kafka + Medallion (Bronze → Silver → Gold) |
| Reliability | Dead Letter Queue + idempotent pipelines |
| Data Quality | Validation gates at Silver layer |
| Orchestration | Airflow DAGs with failure handling |
| Serving | FastAPI + DuckDB zero-ETL endpoints |
| Docs | Architecture Decision Records (ADR) |
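The Dead Letter Queue row above can be pictured as a routing step in front of the Bronze layer: records that fail a basic contract check are preserved for replay instead of being dropped. A minimal sketch with in-memory stand-ins for the Kafka topics and a hypothetical `order_id` contract (the real pipeline's schema and topics will differ):

```python
import json

def route_record(raw: bytes, bronze: list, dlq: list) -> None:
    """Route a raw event to the Bronze sink, or to the DLQ with an error tag."""
    try:
        event = json.loads(raw)
        # Minimal contract check before the record enters Bronze
        if "order_id" not in event:
            raise ValueError("missing order_id")
        bronze.append(event)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        # Bad records are preserved, not dropped, so they can be replayed later
        dlq.append({"payload": raw.decode(errors="replace"), "error": str(exc)})

bronze, dlq = [], []
route_record(b'{"order_id": 1, "amount": 9.5}', bronze, dlq)
route_record(b'not json at all', bronze, dlq)
```

Keeping the failed payload alongside its error message is what makes the pipeline idempotent-friendly: a fixed record can be re-routed through the same function later.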
100% local, zero-server data tool — drop CSV, Excel, Parquet, or JSON and let DuckDB handle the rest. No cloud, no API keys, no setup.
- 🧹 Auto Deep Clean — deduplication, type healing, wide-to-long pivot detection
- 📊 Analytics Explorer — Histogram, Bar, Scatter, Line, Box, Correlation Heatmap
- 🖥️ Custom SQL — run any DuckDB query directly against ingested tables
- 📦 Standalone `.exe` — distributable to Windows users with zero Python setup
Modern data stack (MDS) with strong data contracts and full pipeline automation.
- SCD Type 2 for historical change tracking
- Fully automated CI/CD with testing & validation on every push
- Data quality checks as deployment gates via Soda Core
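SCD Type 2 boils down to one rule: when a tracked attribute changes, expire the current row and append a new version with fresh validity dates. A hand-rolled sketch of that rule, using a hypothetical customer dimension keyed on `customer_id` and tracking `city` (in the actual project this would live in a dbt snapshot, not application code):

```python
from datetime import date

def apply_scd2(dim: list, update: dict, as_of: date) -> None:
    """Type 2 SCD: expire the current version of a changed record, append the new one."""
    for row in dim:
        if row["customer_id"] == update["customer_id"] and row["is_current"]:
            if row["city"] == update["city"]:
                return  # unchanged: keep the current version as-is
            row["is_current"] = False
            row["valid_to"] = as_of  # close out the old version
    dim.append({**update, "valid_from": as_of, "valid_to": None, "is_current": True})

dim = [{"customer_id": 7, "city": "Jakarta", "valid_from": date(2024, 1, 1),
        "valid_to": None, "is_current": True}]
apply_scd2(dim, {"customer_id": 7, "city": "Bandung"}, date(2025, 3, 1))
```

After the call, the dimension holds both versions: the Jakarta row closed out on 2025-03-01 and a current Bandung row, which is exactly the historical trail SCD2 exists to provide.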
End-to-end distributed pipeline processing millions of records with automated data validation.
- Processed 2.9M+ records using distributed Spark clusters
- Simulated cloud data lake storage using MinIO (S3-compatible)
- Automated data validation with Great Expectations
- Star schema data warehouse for analytics-ready reporting
- Modular dbt transformations with tests and lineage tracking
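The automated validation step is what Great Expectations provides declaratively; conceptually it reduces to running a batch of rows through column-level rules and blocking promotion if any fail. A hand-rolled stand-in for that idea, with hypothetical rules on `order_id` and `amount` (the real suite would be defined as Great Expectations expectations, not Python):

```python
def validate_batch(rows: list) -> dict:
    """Toy validation gate: collect rule violations instead of failing fast."""
    failures = []
    for i, row in enumerate(rows):
        if not row.get("order_id"):
            failures.append((i, "order_id must not be null"))
        if row.get("amount") is None or row["amount"] < 0:
            failures.append((i, "amount must be present and non-negative"))
    # A pipeline would promote the batch only when success is True
    return {"success": not failures, "failures": failures}

good = validate_batch([{"order_id": "A1", "amount": 10}])
bad = validate_batch([{"order_id": None, "amount": -5}])
```

The key design point, which Great Expectations shares, is that validation returns a result object rather than raising: the orchestrator decides whether a failed batch halts the DAG or lands in quarantine.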
- Fault-tolerant ETL for real-time market data ingestion
- Retry logic, scheduling, and containerized deployment
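The retry logic mentioned above typically means exponential backoff around the flaky network call. A minimal sketch, with a hypothetical `flaky_fetch` standing in for the market data client (delays shortened for illustration):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the scheduler
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("market data feed unavailable")
    return {"symbol": "BBCA", "price": 10_000}

quote = with_retries(flaky_fetch)
```

Catching only transient error types matters: a schema error should fail immediately and loudly, while a dropped connection earns another attempt.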
- Role: Principal Data Engineer & Team Lead
- Dimensional modeling + cloud ETL pipelines on Azure
- Fully reproducible ML pipeline with Docker + DVC + fixed seeds
- Achieved ROC-AUC 0.9801 on blind test set
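The fixed-seed part of the reproducibility claim comes down to pinning every random number generator before any split or training step, so two runs of the pipeline produce byte-identical results. A stdlib-only sketch of the idea (a full pipeline would pass the same seed to NumPy, the ML framework, and the train/test splitter as well):

```python
import random

def set_seeds(seed: int = 42) -> None:
    """Pin the RNG so repeated runs draw identical random sequences."""
    random.seed(seed)
    # In the full pipeline: np.random.seed(seed), framework seeds,
    # and random_state=seed on any split utilities.

set_seeds(42)
run_a = [random.random() for _ in range(3)]
set_seeds(42)
run_b = [random.random() for _ in range(3)]
```

Combined with Docker (frozen environment) and DVC (frozen data), pinned seeds close the last source of run-to-run drift.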
Primary focus is Data Engineering & platform reliability — ML projects demonstrate systems thinking applied to the full model lifecycle.
- 🌍 International Finalist (Top 10) — IDSC 2026, hosted by Universiti Putra Malaysia
- 🇮🇩 National Semifinalist (Rank 16) — Satria Data: Big Data Challenge 2025
- 🛢️ Pertamina Sobat Bumi Scholar — Top 2.3% from 23,313 applicants
- ☁️ Microsoft Azure Scholarship Recipient
- 📄 SINTA 2 Published Researcher — AI-Based Website Development Training
- 🥇 Gold Medal, National English Competition
- 🥈 Silver Medal, National Mathematics Competition