cacelass/README.md

Alex | ML Systems Engineer

I build production-ready machine learning systems — from raw data to deployed models — designed to be reproducible, testable, and operational in real environments.

Most ML projects fail outside the notebook. I focus on production constraints: data quality, reproducibility, evaluation correctness, and system reliability.

Background in systems administration (ASIR). I design ML systems with infrastructure, failure modes, and performance in mind before model complexity.


What I deliver

  • End-to-end ML pipelines (ingestion → validation → feature engineering → training → evaluation → inference)
  • Data pipelines with schema validation, quality checks, and leakage prevention
  • Reproducible environments with versioned data and locked dependencies (uv, Docker)
  • Calibrated probability outputs (Brier score, reliability curves), not raw model scores
  • Evaluation schemes matched to the problem type (walk-forward validation for time series, stratified CV otherwise)
  • ML systems decoupled from business decision logic
  • Batch inference pipelines designed for scheduled production workloads

Tech Stack

Python SQL Docker Linux Azure Git

HDFS Hive Sqoop

Pandas NumPy Scikit-learn PyTorch Polars

Power BI

Certified: Microsoft Azure Data Fundamentals · Power BI (DAX)
Credly


Featured Projects

dskit — Reproducible ML project scaffold

Production-grade ML template designed to eliminate environment drift and enforce consistent project structure.

Why it matters
Most ML failures are not model failures — they are reproducibility and data consistency failures.

What it enforces

  • Strict project structure (data/, features/, models/, pipelines/)
  • Dependency locking with uv
  • Documentation system with Sphinx
  • Pandas / Polars interoperability

Result
Faster setup, consistent engineering standards, zero environment ambiguity.


credit-risk-classifier — Credit risk scoring system

ML system designed for real decision-making, focused on calibrated probabilities instead of raw predictions.

Key decisions

  • Logistic Regression + Random Forest for interpretability vs performance trade-off
  • Probability calibration (Platt scaling / isotonic regression)
  • Decision threshold decoupled from model (business layer owns decision policy)
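The separation between scoring and decision policy can be sketched as follows. This is a minimal illustration, not code from the repository: `ThresholdPolicy`, `decide_batch`, and `toy_model` are hypothetical names, and the toy scorer stands in for a calibrated classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ThresholdPolicy:
    """Business-owned decision rule (hypothetical name, not from the repository)."""

    threshold: float

    def flag(self, probability: float) -> bool:
        """Return True when the calibrated probability reaches the threshold."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"expected a probability in [0, 1], got {probability}")
        return probability >= self.threshold


def decide_batch(
    predict_proba: Callable[[Sequence[float]], float],
    rows: Sequence[Sequence[float]],
    policy: ThresholdPolicy,
) -> List[bool]:
    """Score each row with the model, then apply the policy to the score."""
    return [policy.flag(predict_proba(row)) for row in rows]


def toy_model(row: Sequence[float]) -> float:
    """Stand-in scorer (assumption): mean of the features, clipped to [0, 1]."""
    return min(1.0, max(0.0, sum(row) / len(row)))


rows = [[0.1, 0.3], [0.6, 0.8], [0.4, 0.5]]

# The same model output yields different decisions under different policies.
low = decide_batch(toy_model, rows, ThresholdPolicy(threshold=0.3))
high = decide_batch(toy_model, rows, ThresholdPolicy(threshold=0.6))
```

Changing the threshold here requires no retraining and no change to the scorer, which is the point of keeping the policy outside the model.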

Evaluation

  • Stratified k-fold cross-validation
  • Brier score + AUC as primary metrics
  • Strict leakage prevention across time and folds
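For readers unfamiliar with the calibration metrics named above, here is a minimal pure-Python sketch of the Brier score and the binned statistics behind a reliability curve. The function names and the toy data are illustrative assumptions; the project itself may well rely on a library implementation instead.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Per non-empty equal-width bin: (mean predicted prob, observed rate, count)."""
    totals = [[0.0, 0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        totals[idx][0] += p
        totals[idx][1] += y
        totals[idx][2] += 1
    return [(s / n, pos / n, n) for s, pos, n in totals if n > 0]


# Toy data (made up for illustration only).
y_true = [0, 0, 1, 1]
y_prob = [0.0, 0.5, 0.5, 1.0]

score = brier_score(y_true, y_prob)
bins = reliability_bins(y_true, y_prob, n_bins=2)
```

A well-calibrated model has bins where the mean predicted probability is close to the observed positive rate; lower Brier scores are better, and a constant prediction of 0.5 always scores 0.25.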

Result
AUC: 0.81 with calibrated outputs suitable for operational decision systems.


stock-market-prediction — Time series under real constraints

ML applied to a non-stationary, low signal-to-noise environment under realistic constraints.

What most people do wrong
Random splits → leakage → inflated performance

What this project enforces

  • Walk-forward validation (deployment simulation)
  • Baseline-first evaluation discipline
  • Strict no-leakage constraints
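Walk-forward validation can be sketched as an index generator with an expanding training window. This is an illustrative assumption about the splitting scheme, not the project's actual code; `walk_forward_splits` and its parameters are hypothetical names.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    The training window expands by `test_size` each step, and every test
    index is strictly later than every training index.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    while train_end + test_size <= n_samples:
        train_idx = list(range(train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size


# Ten time-ordered observations, first four used as the initial training set.
splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
```

Unlike a random split, no fold ever trains on observations that come after the ones it is tested on, which mirrors how the model would be used in deployment.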

Result
Marginal improvement over baseline, consistent with efficient market behavior.


Positioning

I design ML systems that remain stable under real-world constraints: shifting data distributions, imperfect labels, and production latency.

I don’t optimize notebooks. I design systems that survive production.


About me

I enjoy learning new technologies and adapting quickly to different problem domains. I’m comfortable working across the full ML stack and iterating on systems from prototype to production.

Pinned

  1. dskit (Public)

    Copier template for Data Science projects with uv, Sphinx, and a modular structure

    Python 3

  2. Stock-Market-Prediction (Public)

    Machine Learning project applied to financial time series that explores predicting stock market movements, highlighting the challenges of overfitting and high market uncertainty

    Jupyter Notebook

  3. credit-risk-classifier (Public)

    Supervised Machine Learning system that predicts the probability of granting credit to customers from demographic and historical data

    Jupyter Notebook

  4. global-exclusion-risk-ml (Public)

    Analysis of socioeconomic indicators using unsupervised clustering to segment countries by their actual level of development and risk of exclusion, without relying on traditional classifications.

    Jupyter Notebook

  5. retainml (Public)

    Supervised Machine Learning system that predicts the probability of customer churn from demographic, usage, and interaction data, making it possible to identify risks and optimize strate…

    Jupyter Notebook

  6. rps-predictive-agent (Public)

    Python implementation of an intelligent agent for Rock-Paper-Scissors that uses frequency-based prediction to detect the opponent's patterns and optimize its decisions, following the model…

    Python