- 🚧 Building: Improving MALLARD — adding smart column profiling & export pipelines
- 📚 Learning: Advanced Spark internals, Delta Lake, and data contract patterns
- 🎯 Goal: Land a Data Engineering internship where I can work on real production pipelines
- 🏆 Recent Win: Top 10 International Finalist at IDSC 2026 with ROC-AUC 0.9801
Also working with: Apache Spark · Apache Airflow · Dagster · dbt · DuckDB · MinIO · Great Expectations · Soda Core · PyInstaller · Streamlit
Production-oriented data platform simulating real-world e-commerce analytics with streaming ingestion and robust failure handling.
| Component | Implementation |
|---|---|
| Architecture | Kafka + Medallion (Bronze → Silver → Gold) |
| Reliability | Dead Letter Queue + idempotent pipelines |
| Data Quality | Validation gates at Silver layer |
| Orchestration | Airflow DAGs with failure handling |
| Serving | FastAPI + DuckDB zero-ETL endpoints |
| Docs | Architecture Decision Records (ADR) |
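The Dead Letter Queue row above can be pictured as a routing step in front of the Bronze layer: records that fail a basic contract check are preserved for replay instead of being dropped. A minimal sketch with in-memory stand-ins for the Kafka topics and a hypothetical `order_id` contract (the real pipeline's schema and topics will differ):

```python
import json

def route_record(raw: bytes, bronze: list, dlq: list) -> None:
    """Route a raw event to the Bronze sink, or to the DLQ with an error tag."""
    try:
        event = json.loads(raw)
        # Minimal contract check before the record enters Bronze
        if "order_id" not in event:
            raise ValueError("missing order_id")
        bronze.append(event)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        # Bad records are preserved, not dropped, so they can be replayed later
        dlq.append({"payload": raw.decode(errors="replace"), "error": str(exc)})

bronze, dlq = [], []
route_record(b'{"order_id": 1, "amount": 9.5}', bronze, dlq)
route_record(b'not json at all', bronze, dlq)
```

Keeping the failed payload alongside its error message is what makes the pipeline idempotent-friendly: a fixed record can be re-routed through the same function later.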
100% local, zero-server data tool — drop CSV, Excel, Parquet, or JSON and let DuckDB handle the rest. No cloud, no API keys, no setup.
- 🧹 Auto Deep Clean — deduplication, type healing, wide-to-long pivot detection
- 📊 Analytics Explorer — Histogram, Bar, Scatter, Line, Box, Correlation Heatmap
- 🖥️ Custom SQL — run any DuckDB query directly against ingested tables
- 📦 Standalone `.exe` — distributable to Windows users with zero Python setup
Modern data stack (MDS) with strong data contracts and full pipeline automation.
- SCD Type 2 for historical change tracking
- Fully automated CI/CD with testing & validation on every push
- Data quality checks as deployment gates via Soda Core
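SCD Type 2 boils down to one rule: when a tracked attribute changes, expire the current row and append a new version with fresh validity dates. A hand-rolled sketch of that rule, using a hypothetical customer dimension keyed on `customer_id` and tracking `city` (in the actual project this would live in a dbt snapshot, not application code):

```python
from datetime import date

def apply_scd2(dim: list, update: dict, as_of: date) -> None:
    """Type 2 SCD: expire the current version of a changed record, append the new one."""
    for row in dim:
        if row["customer_id"] == update["customer_id"] and row["is_current"]:
            if row["city"] == update["city"]:
                return  # unchanged: keep the current version as-is
            row["is_current"] = False
            row["valid_to"] = as_of  # close out the old version
    dim.append({**update, "valid_from": as_of, "valid_to": None, "is_current": True})

dim = [{"customer_id": 7, "city": "Jakarta", "valid_from": date(2024, 1, 1),
        "valid_to": None, "is_current": True}]
apply_scd2(dim, {"customer_id": 7, "city": "Bandung"}, date(2025, 3, 1))
```

After the call, the dimension holds both versions: the Jakarta row closed out on 2025-03-01 and a current Bandung row, which is exactly the historical trail SCD2 exists to provide.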
End-to-end distributed pipeline processing millions of records with automated data validation.
- Processed 2.9M+ records using distributed Spark clusters
- Simulated cloud data lake storage using MinIO (S3-compatible)
- Automated data validation with Great Expectations
- Star schema data warehouse for analytics-ready reporting
- Modular dbt transformations with tests and lineage tracking
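The automated validation step is what Great Expectations provides declaratively; conceptually it reduces to running a batch of rows through column-level rules and blocking promotion if any fail. A hand-rolled stand-in for that idea, with hypothetical rules on `order_id` and `amount` (the real suite would be defined as Great Expectations expectations, not Python):

```python
def validate_batch(rows: list) -> dict:
    """Toy validation gate: collect rule violations instead of failing fast."""
    failures = []
    for i, row in enumerate(rows):
        if not row.get("order_id"):
            failures.append((i, "order_id must not be null"))
        if row.get("amount") is None or row["amount"] < 0:
            failures.append((i, "amount must be present and non-negative"))
    # A pipeline would promote the batch only when success is True
    return {"success": not failures, "failures": failures}

good = validate_batch([{"order_id": "A1", "amount": 10}])
bad = validate_batch([{"order_id": None, "amount": -5}])
```

The key design point, which Great Expectations shares, is that validation returns a result object rather than raising: the orchestrator decides whether a failed batch halts the DAG or lands in quarantine.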
- Fault-tolerant ETL for real-time market data ingestion
- Retry logic, scheduling, and containerized deployment
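The retry logic mentioned above typically means exponential backoff around the flaky network call. A minimal sketch, with a hypothetical `flaky_fetch` standing in for the market data client (delays shortened for illustration):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the scheduler
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("market data feed unavailable")
    return {"symbol": "BBCA", "price": 10_000}

quote = with_retries(flaky_fetch)
```

Catching only transient error types matters: a schema error should fail immediately and loudly, while a dropped connection earns another attempt.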
- Role: Principal Data Engineer & Team Lead
- Dimensional modeling + cloud ETL pipelines on Azure
- Fully reproducible ML pipeline with Docker + DVC + fixed seeds
- Achieved ROC-AUC 0.9801 on blind test set
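The fixed-seed part of the reproducibility claim comes down to pinning every random number generator before any split or training step, so two runs of the pipeline produce byte-identical results. A stdlib-only sketch of the idea (a full pipeline would pass the same seed to NumPy, the ML framework, and the train/test splitter as well):

```python
import random

def set_seeds(seed: int = 42) -> None:
    """Pin the RNG so repeated runs draw identical random sequences."""
    random.seed(seed)
    # In the full pipeline: np.random.seed(seed), framework seeds,
    # and random_state=seed on any split utilities.

set_seeds(42)
run_a = [random.random() for _ in range(3)]
set_seeds(42)
run_b = [random.random() for _ in range(3)]
```

Combined with Docker (frozen environment) and DVC (frozen data), pinned seeds close the last source of run-to-run drift.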
Primary focus is Data Engineering & platform reliability — ML projects demonstrate systems thinking applied to the full model lifecycle.
- 🌍 International Finalist (Top 10) — IDSC 2026, hosted by Universiti Putra Malaysia
- 🇮🇩 National Semifinalist (Rank 16) — Satria Data: Big Data Challenge 2025
- 🛢️ Pertamina Sobat Bumi Scholar — Top 2.3% from 23,313 applicants
- ☁️ Microsoft Azure Scholarship Recipient
- 📄 SINTA 2 Published Researcher — AI-Based Website Development Training
- 🥇 Gold Medal, National English Competition
- 🥈 Silver Medal, National Mathematics Competition