federico1809/README.md

Federico Ceballos Torres

Data Scientist | End-to-End ML · Financial Risk & RecSys · AWS
Python · SQL · LightGBM · XGBoost

I build end-to-end data and machine learning solutions for decision-making, with a strong focus on validation, reproducibility, and production readiness.

My background combines:

  • 5+ years in QA Engineering and Scrum Master roles, where I developed structured validation habits and a systems mindset.
  • Hands-on work across the full data workflow: SQL extraction, EDA, feature engineering, supervised learning, evaluation, and deployment.

Featured Projects

End-to-end ML system for loan default prediction.

What it includes:

  • EDA, feature engineering, and model development with XGBoost.
  • Threshold optimization framed as a business cost minimization problem.
  • SHAP explainability, Optuna tuning, and Docker-based reproducibility.
  • Projected 38.3% cost reduction ($16.8M).

Tech: Python · XGBoost · SHAP · Optuna · Docker · scikit-learn
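As a rough illustration of framing threshold selection as cost minimization, the sketch below scans candidate thresholds and keeps the one with the lowest total misclassification cost. It is not the project's actual code: the function names, the 5:1 cost ratio between a missed default (false negative) and a wrongly rejected applicant (false positive), and the sample data are all assumptions made for the example.

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Total cost of classifying as 'default' when probability >= threshold."""
    # False negatives: actual defaults scored below the threshold.
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    # False positives: good loans scored at or above the threshold.
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(y_true, y_prob, cost_fn=5.0, cost_fp=1.0, steps=100):
    """Grid-search thresholds in [0, 1]; cost values here are hypothetical."""
    candidates = [i / steps for i in range(steps + 1)]
    # min() returns the first (lowest) threshold among ties.
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_prob, t, cost_fn, cost_fp),
    )


if __name__ == "__main__":
    labels = [0, 0, 1, 1]                    # illustrative labels (1 = default)
    scores = [0.123, 0.4, 0.352, 0.8]        # illustrative model probabilities
    t = best_threshold(labels, scores)
    print(t, total_cost(labels, scores, t, 5.0, 1.0))
```

In practice the two cost parameters would come from the business case (for example, expected loss on a default versus margin lost on a rejected good loan), and the search would run on a held-out validation set rather than on training data.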


Production-oriented recommendation system built on the Instacart dataset (~2M transactions).

What it includes:

  • LightGBM model with Optuna tuning.
  • F1: 0.42 · AUC-ROC: 0.82 · +296% uplift vs. popularity baseline.
  • REST API deployed on AWS ECS Fargate.
  • Streamlit app, MLflow experiment tracking, CI/CD, and automated drift monitoring (PSI/KS).
  • Dual inference logic with cold-start fallback for full service availability.
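One way to picture dual inference with a cold-start fallback is sketched below. This is a guess at the general pattern, not the deployed service: the `recommend` function, the use of a popularity list as the fallback, and the returned `source` tag are all assumptions made for illustration.

```python
def recommend(user_id, personalized, popular_items, k=5):
    """Return (items, source), falling back when the model has nothing to offer.

    personalized: hypothetical callable mapping user_id -> list of items,
                  or None / empty for users without purchase history.
    popular_items: hypothetical precomputed list of globally popular items.
    """
    try:
        items = personalized(user_id)
    except Exception:
        # Treat any model-side failure as "no personalized result" so the
        # endpoint can still answer.
        items = None
    if not items:
        return list(popular_items[:k]), "popularity_fallback"
    return list(items[:k]), "model"


if __name__ == "__main__":
    history = {"u1": ["bananas", "milk", "eggs"]}   # illustrative data
    top_sellers = ["bananas", "strawberries", "spinach"]
    print(recommend("u1", history.get, top_sellers, k=2))
    print(recommend("new_user", history.get, top_sellers, k=2))
```

Returning the source alongside the items makes it easy to log how often the fallback path is taken, which is useful when monitoring the share of cold-start traffic.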

Tech: Python · LightGBM · FastAPI · Streamlit · MLflow · Docker · AWS · PostgreSQL
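For readers unfamiliar with the PSI part of the drift monitoring mentioned above, here is a minimal sketch of the Population Stability Index between a reference sample and a current sample of one feature. It is not taken from the repository: the equal-width binning on the reference range, the bin count, and the `eps` floor for empty bins are assumptions chosen to keep the example short.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index using equal-width bins on the reference range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clipped into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        n = len(values)
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / n, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]        # illustrative training data
    shifted = [0.5 + i / 200 for i in range(100)]    # illustrative drifted data
    print(psi(reference, reference))  # identical samples give zero
    print(psi(reference, shifted))    # a shifted sample gives a positive value
```

A commonly cited rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, though suitable alert thresholds depend on the feature and the bin layout.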


Production-oriented ML pipeline focused on reproducibility and maintainability.

What it includes:

  • Dockerized training environment.
  • Model versioning and reproducible workflows.
  • CI/CD automation with GitHub Actions.
  • Modular project structure.

Tech: Python · scikit-learn · Docker · GitHub Actions

What I do best

  • Translate business problems into measurable ML prediction tasks.
  • Build reproducible and testable data workflows.
  • Combine analytical rigor with engineering discipline.
  • Bridge the gap between QA thinking and ML engineering, catching edge cases before they become production failures.

Background

QA Engineer & Scrum Master

I spent more than five years in QA and Agile environments, which trained me to think in terms of edge cases, traceability, defect patterns, and delivery reliability. That perspective now shapes how I build data products.

Political Science

My academic background strengthened my research mindset, quantitative reasoning, and ability to interpret complex systems.

Contact

Resume

If you're reviewing my profile for a Data Science, Data Analyst, or ML-focused role, my resume is available on LinkedIn.

Pinned

  1. world-cup-2026-predictor (Public)

    Jupyter Notebook

  2. insight-commerce-recsys (Public)

    Next Basket Recommendation system built on the Instacart dataset. Full ETL pipeline, LightGBM model (F1 ~0.42 · AUC-ROC ~0.82), REST API deployed on AWS ECS Fargate, and interactive Streamlit front…

    Jupyter Notebook

  3. credit-risk-modeling (Public)

    End-to-end ML system for loan default prediction with interpretable models and production-ready code

    Jupyter Notebook 1

  4. air-quality-forecast (Public)

    Jupyter Notebook