KMoex-HZ/README.md
LinkedIn GitHub Email

🔥 Currently

  • 🚧 Building: Improving MALLARD — adding smart column profiling & export pipelines
  • 📚 Learning: Advanced Spark internals, Delta Lake, and data contract patterns
  • 🎯 Goal: Land a Data Engineering internship where I can work on real production pipelines
  • 🏆 Recent Win: Top 10 International Finalist at IDSC 2026 with a ROC-AUC of 0.9801

🛠️ Tech Stack



Also working with: Apache Spark · Apache Airflow · Dagster · dbt · DuckDB · MinIO · Great Expectations · Soda Core · PyInstaller · Streamlit


🚀 Featured Data Engineering Projects

Kafka Spark dbt Airflow FastAPI DuckDB Docker

Production-oriented data platform simulating real-world e-commerce analytics with streaming ingestion and robust failure handling.

| Component | Implementation |
|---|---|
| Architecture | Kafka + Medallion (Bronze → Silver → Gold) |
| Reliability | Dead Letter Queue + idempotent pipelines |
| Data Quality | Validation gates at Silver layer |
| Orchestration | Airflow DAGs with failure handling |
| Serving | FastAPI + DuckDB zero-ETL endpoints |
| Docs | Architecture Decision Records (ADR) |
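
The reliability row pairs a dead letter queue with idempotent writes. A minimal plain-Python sketch of that idea follows; it is not the project's actual code. The function name `process_batch`, the fields `event_id` and `amount`, and the in-memory `sink` and `dead_letters` containers are all hypothetical stand-ins for a Kafka consumer, a Silver table, and a DLQ topic.

```python
import json


def process_batch(raw_messages, sink, dead_letters):
    """Route valid events into `sink`, keyed by event_id; park bad ones in `dead_letters`.

    All names here are illustrative assumptions, not taken from the project.
    """
    for raw in raw_messages:
        try:
            event = json.loads(raw)
            event_id = event["event_id"]
            amount = float(event["amount"])
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed or incomplete message: keep the payload and the reason,
            # then move on so one bad record does not stop the batch.
            dead_letters.append({"raw": raw, "error": repr(exc)})
            continue
        # Upsert by key: replaying the same message overwrites instead of duplicating.
        sink[event_id] = {"event_id": event_id, "amount": amount}
    return sink, dead_letters
```

Because the write is keyed on `event_id`, re-running a batch after a failure leaves the sink unchanged, which is the property that makes retries safe.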

DuckDB Streamlit Python Plotly

100% local, zero-server data tool — drop CSV, Excel, Parquet, or JSON and let DuckDB handle the rest. No cloud, no API keys, no setup.

  • 🧹 Auto Deep Clean — deduplication, type healing, wide-to-long pivot detection
  • 📊 Analytics Explorer — Histogram, Bar, Scatter, Line, Box, Correlation Heatmap
  • 🖥️ Custom SQL — run any DuckDB query directly against ingested tables
  • 📦 Standalone .exe — distributable to Windows users with zero Python setup
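
As an illustration of wide-to-long pivot detection, here is a small sketch in plain Python. The heuristic (treat year-like column headers as values to unpivot) and the names `detect_wide_columns`, `wide_to_long`, `period`, and `value` are assumptions made for this example; the tool's real detection rules are not described here.

```python
import re

# Assumed heuristic: headers such as "2021" or "1998" are data values, not attributes.
YEAR_LIKE = re.compile(r"^(19|20)\d{2}$")


def detect_wide_columns(header):
    """Return the columns whose names look like years."""
    return [col for col in header if YEAR_LIKE.match(col)]


def wide_to_long(rows, var_name="period", value_name="value"):
    """Unpivot year-like columns of a list of dict rows into (period, value) pairs."""
    if not rows:
        return []
    header = list(rows[0].keys())
    wide_cols = detect_wide_columns(header)
    if len(wide_cols) < 2:
        # Fewer than two year-like columns: treat the table as already long.
        return [dict(row) for row in rows]
    id_cols = [col for col in header if col not in wide_cols]
    long_rows = []
    for row in rows:
        for col in wide_cols:
            record = {c: row[c] for c in id_cols}
            record[var_name] = col
            record[value_name] = row[col]
            long_rows.append(record)
    return long_rows
```

In DuckDB itself the same reshaping could be expressed in SQL; the Python version is only meant to show the shape of the transformation.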

Dagster dbt DuckDB Soda GitHub Actions

Modern data stack (MDS) with strong data contracts and full pipeline automation.

  • SCD Type 2 for historical change tracking
  • Fully automated CI/CD with testing & validation on every push
  • Data quality checks as deployment gates via Soda Core
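
SCD Type 2 keeps history by closing the current row and appending a new one whenever a tracked attribute changes. The sketch below shows that logic on a list of dicts. The column names `valid_from`, `valid_to`, and `is_current` and the function `apply_scd2` are assumptions for illustration; in the project this would more likely live in dbt than in hand-written Python.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension with SCD Type 2 history applied.

    dimension: existing rows, each with valid_from, valid_to, is_current
    incoming:  latest source records
    key:       business key column name
    tracked:   columns whose changes should create a new version
    as_of:     effective date of this load
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current_idx = {row[key]: i for i, row in enumerate(result) if row["is_current"]}
    for rec in incoming:
        idx = current_idx.get(rec[key])
        if idx is not None:
            existing = result[idx]
            if all(existing[col] == rec[col] for col in tracked):
                continue  # nothing changed, keep the current version
            # Close out the old version.
            existing["valid_to"] = as_of
            existing["is_current"] = False
        new_row = {key: rec[key]}
        new_row.update({col: rec[col] for col in tracked})
        new_row.update({"valid_from": as_of, "valid_to": None, "is_current": True})
        result.append(new_row)
        current_idx[rec[key]] = len(result) - 1
    return result
```

Running the same load twice adds no rows, because unchanged records are skipped.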

Spark Airflow MinIO PostgreSQL

End-to-end distributed pipeline processing millions of records with automated data validation.

  • Processed 2.9M+ records using distributed Spark clusters
  • Simulated cloud data lake storage using MinIO (S3-compatible)
  • Automated data validation with Great Expectations

dbt PostgreSQL Docker

  • Star schema data warehouse for analytics-ready reporting
  • Modular dbt transformations with tests and lineage tracking

Airflow Docker PostgreSQL

  • Fault-tolerant ETL for real-time market data ingestion
  • Retry logic, scheduling, and containerized deployment
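
Retry logic for a flaky ingestion step can be sketched as exponential backoff around a callable. This is a plain-Python analogue, not the project's code: in Airflow the same behaviour is normally configured per task (for example with `retries` and `retry_delay`). The helper name `with_retries` and its parameters are hypothetical.

```python
import time


def with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `task()` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failure: base_delay, 2 * base_delay, and so on.
    `sleep` is injectable so the backoff can be tested without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

A call such as `with_retries(fetch_quotes, max_attempts=5)` would wrap a hypothetical `fetch_quotes` function that pulls market data from an upstream API.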

Azure SSIS

  • Role: Principal Data Engineer & Team Lead
  • Dimensional modeling + cloud ETL pipelines on Azure

🧠 Additional Experience

PyTorch Docker DVC

  • Fully reproducible ML pipeline with Docker + DVC + fixed seeds
  • Achieved a ROC-AUC of 0.9801 on a blind test set

Primary focus is Data Engineering & platform reliability — ML projects demonstrate systems thinking applied to the full model lifecycle.


🏆 Achievements

  • 🌍 International Finalist (Top 10) — IDSC 2026, hosted by Universiti Putra Malaysia
  • 🇮🇩 National Semifinalist (Rank 16) — Satria Data: Big Data Challenge 2025
  • 🛢️ Pertamina Sobat Bumi Scholar — Top 2.3% of 23,313 applicants
  • ☁️ Microsoft Azure Scholarship Recipient
  • 📄 SINTA 2 Published Researcher — AI-Based Website Development Training
  • 🥇 Gold Medal, National English Competition
  • 🥈 Silver Medal, National Mathematics Competition

🤝 Let's Connect

I'm a student actively looking for internship opportunities in Data Engineering. Always happy to connect, collaborate, or just geek out about pipelines!

LinkedIn Email GitHub

Semester 6 · Open to Data Engineering internships · Remote friendly

Pinned

  1. glowcart Public

    End-to-end e-commerce data platform | Kafka · Airflow · Spark · dbt · Docker

    Python 2 1

  2. mallard Public

    Local data warehouse + auto cleaner. Drop CSV/Excel/Parquet/JSON, let DuckDB do the work.

    Python

  3. modern-data-platform-dagster Public

    A production-grade Modern Data Stack (MDS) implementation featuring automated ELT, SCD Type 2 history tracking, and CI/CD quality guardrails using Dagster, dbt Core, DuckDB, and Soda.

    Python 1

  4. nyc-taxi-pipeline-spark-airflow Public

    An automated end-to-end data pipeline using Apache Airflow, Spark, and MinIO for processing NYC Taxi datasets. Features containerized infrastructure (Docker), distributed transformations, and data …

    Python