kubuka/README.md

Hi, I'm Kuba! 👋

Aspiring Data Engineer focused on building robust ETL/ELT pipelines and scalable data architectures.

🚀 Technical Stack

  • Core: Python, SQL
  • Data Engineering: PySpark, dbt Core, DuckDB, Parquet
  • Orchestration & DevOps: Apache Airflow, Docker, Git
  • Visualization: Streamlit, Matplotlib, Seaborn

🛠️ Featured Projects

  • Crypto Anomalies Lakehouse – Local lakehouse orchestrated by Airflow. Uses PySpark for ingestion, dbt + DuckDB for Star Schema modeling, and Streamlit for anomaly visualization. Fully containerized.
  • YouTube Sentiment ETL – ETL pipeline fetching YouTube comments, routing them through local Hugging Face NLP models for translation/sentiment analysis, and loading structured data into PostgreSQL.
  • LoL Pro-Play SQL Analytics – Esports analytics using advanced PostgreSQL queries (window functions, CTEs) to uncover meta shifts and player economics in a massive dataset.

📫 Connect with me: LinkedIn | babecjakub@gmail.com

Pinned

  1. crypto-anomaly-detector (Public)

    Local data pipeline that automatically fetches crypto prices and detects market anomalies using Spark and dbt on DuckDB. Everything is orchestrated by Airflow, fully containerized with Docker, and …

    Python

  2. youtube-gaming-sentiment (Public)

    An ETL pipeline that fetches YouTube video comments via API and processes them using local Hugging Face NLP models for multi-language sentiment analysis. The structured data is loaded into a Postgr…

    Python

  3. lol-proplay-analysis (Public)

    Advanced esports analytics pipeline that uses PostgreSQL to process a massive dataset of professional League of Legends matches.

    Python