Aspiring Data Engineer focused on building robust ETL/ELT pipelines and scalable data architectures.
- Core: Python, SQL
- Data Engineering: PySpark, dbt Core, DuckDB, Parquet
- Orchestration & DevOps: Apache Airflow, Docker, Git
- Visualization: Streamlit, Matplotlib, Seaborn
- Crypto Anomalies Lakehouse – Local lakehouse orchestrated by Airflow. Uses PySpark for ingestion, dbt + DuckDB for star-schema modeling, and Streamlit for anomaly visualization. Fully containerized.
- YouTube Sentiment ETL – Pipeline that fetches YouTube comments, runs them through local Hugging Face NLP models for translation and sentiment analysis, and loads the structured results into PostgreSQL.
- LoL Pro-Play SQL Analytics – Esports analytics using advanced PostgreSQL queries (window functions, CTEs) to uncover meta shifts and player economics in a large dataset.
📫 Connect with me: LinkedIn | babecjakub@gmail.com