Data Scientist | End-to-End ML · Financial Risk & RecSys · AWS
Python · SQL · LightGBM · XGBoost
I build end-to-end data and machine learning solutions for decision-making, with a strong focus on validation, reproducibility, and production readiness.
My background combines:
- 5+ years in QA Engineering and Scrum Master roles, where I developed structured validation habits and a systems mindset.
- Hands-on work across the full data workflow: SQL extraction, EDA, feature engineering, supervised learning, evaluation, and deployment.
End-to-end ML system for loan default prediction.
What it includes:
- EDA, feature engineering, and model development with XGBoost.
- Threshold optimization framed as a business cost minimization problem.
- SHAP explainability, Optuna tuning, and Docker-based reproducibility.
- Projected 38.3% cost reduction ($16.8M).
Tech: Python · XGBoost · SHAP · Optuna · Docker · scikit-learn
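To illustrate the threshold framing above, here is a minimal sketch in plain Python. It is not the project's actual code. The function names, the cost values, and the 0.01 grid step are assumptions chosen for illustration. The idea is to score every candidate threshold by the total cost of the errors it produces (missed defaults vs. wrongly flagged good loans) and keep the cheapest one.

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Total misclassification cost when scores >= threshold are flagged as defaults."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if label == 1 and not flagged:
            cost += cost_fn  # missed default (false negative)
        elif label == 0 and flagged:
            cost += cost_fp  # good loan flagged as risky (false positive)
    return cost


def best_threshold(y_true, y_prob, cost_fn, cost_fp, steps=99):
    """Return (threshold, cost) for the cheapest threshold on an evenly spaced grid."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.01 .. 0.99 by default
    scored = [(t, total_cost(y_true, y_prob, t, cost_fn, cost_fp)) for t in grid]
    return min(scored, key=lambda pair: pair[1])


# Toy inputs (hypothetical): a missed default is assumed to cost 10x a false alarm.
labels = [0, 0, 1, 1]
scores = [0.105, 0.405, 0.355, 0.805]
threshold, cost = best_threshold(labels, scores, cost_fn=10.0, cost_fp=1.0)
```

With asymmetric costs like these, the selected threshold tends to sit well below 0.5, because missing a default is penalized far more heavily than a false alarm.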
Production-oriented recommendation system built on the Instacart dataset (~2M transactions).
What it includes:
- LightGBM model with Optuna tuning.
- F1: 0.42 · AUC-ROC: 0.82 · +296% uplift vs. popularity baseline.
- REST API deployed on AWS ECS Fargate.
- Streamlit app, MLflow experiment tracking, CI/CD, and automated drift monitoring (PSI/KS).
- Dual inference logic with cold-start fallback for full service availability.
Tech: Python · LightGBM · FastAPI · Streamlit · MLflow · Docker · AWS · PostgreSQL
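For readers unfamiliar with the PSI drift check mentioned above, here is a minimal sketch of the Population Stability Index in plain Python. It is not the deployed monitoring code. The equal-width binning, the bin count, and the `eps` floor are assumptions made for illustration. PSI compares the share of values per bin in a reference sample against a recent sample.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a recent sample.

    Bins are equal-width over the reference range (an assumption of this sketch);
    values outside that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the log ratio below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy inputs (hypothetical): the recent sample is the reference shifted by 0.3.
reference = [i / 100 for i in range(100)]
recent = [v + 0.3 for v in reference]
drift_score = psi(reference, recent)
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Any alerting threshold used in a real pipeline would need to be validated against that pipeline's own data.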
Production-oriented ML pipeline focused on reproducibility and maintainability.
What it includes:
- Dockerized training environment.
- Model versioning and reproducible workflows.
- CI/CD automation with GitHub Actions.
- Modular project structure.
Tech: Python · scikit-learn · Docker · GitHub Actions
- Translate business problems into measurable ML prediction tasks.
- Build reproducible and testable data workflows.
- Combine analytical rigor with engineering discipline.
- Bridge the gap between QA thinking and ML engineering — catching edge cases before they become production failures.
I spent more than five years in QA and Agile environments, which trained me to think in terms of edge cases, traceability, defect patterns, and delivery reliability. That perspective now shapes how I build data products.
My academic background strengthened my research mindset, quantitative reasoning, and ability to interpret complex systems.
- LinkedIn: https://www.linkedin.com/in/federico-ceballos-torres/
- GitHub: https://github.com/federico1809
- Email: federico.ct@gmail.com
If you're reviewing my profile for a Data Science, Data Analyst, or ML-focused role, my resume is available on LinkedIn.

