DuqueOM/README.md

Duque Ortega Mutis - MLOps & Production ML

I build ML systems that survive production — three deployed services (GKE + EKS), measured incidents with documented root causes, and an open-source production template. 14 years of operations leadership behind the engineering.

Portfolio LinkedIn YouTube Email


Three production incidents — diagnosed from first principles, not guesswork


 81% error rate under load  →  uvicorn --workers is an anti-pattern under K8s
                                (shared CPU budget = thrashing, not parallelism)
                                Fixed: asyncio + ThreadPoolExecutor, GIL analysis
                                Result: 81% errors → 0%, 2000m CPU → 1000m

 SHAP returning all zeros   →  TreeExplainer incompatible with StackingClassifier
                                Fixed: KernelExplainer in original feature space
                                Evaluated 4 alternatives before deciding

 HPA never scales down      →  Memory-based HPA + fixed ML footprint
                                = mathematically impossible to scale down
                                Fixed: CPU-only HPA, 3→1 pods in 8 minutes
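The arithmetic behind the third incident can be sketched with the replica formula from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change while the ratio stays inside a tolerance band (10% by default). The `desired_replicas` helper and the utilization figures below are hypothetical, chosen only for illustration; they are not the values measured in the incident, and the sketch ignores stabilization windows and scaling policies.

```python
import math


def desired_replicas(current, utilization, target, tolerance=0.1, min_replicas=1):
    """Simplified HPA replica calculation (hypothetical helper)."""
    ratio = utilization / target
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    return max(min_replicas, math.ceil(current * ratio))


# Memory: a loaded model keeps each pod near the same utilization even with
# zero traffic (assumed 75% of the request, against an assumed 70% target).
memory_idle = desired_replicas(current=3, utilization=0.75, target=0.70)

# CPU: utilization falls with traffic (assumed 10% when idle, same target).
cpu_idle = desired_replicas(current=3, utilization=0.10, target=0.70)

print(memory_idle, cpu_idle)
```

Because the per-pod memory figure does not depend on load, the ratio never falls low enough to remove a pod, while the CPU ratio does as soon as traffic drops.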

I spent 14 years running business operations — teams, vendors, budgets and customer-facing processes — before moving into ML engineering. That background is why my portfolio looks the way it does: a model is useful only when it can be deployed, monitored, explained, improved, and operated with cost discipline.

I build working ML systems, document trade-offs, measure failures, and turn those lessons into reusable patterns. I am early-career in formal ML employment, but experienced in ownership, working under pressure, and making systems easier for the next person to operate.

Target roles: Junior ML Engineer · MLOps / Production ML · AI Engineer I · ML Platform / Data Engineering with ML workflows.


| Strength | How it shows up |
| --- | --- |
| Operations mindset | I care about reliability, cost, handoffs, and real user impact. |
| Production ML fundamentals | FastAPI, Docker, Kubernetes, MLflow, CI/CD, monitoring, drift detection, and model versioning. |
| Debugging discipline | I document root causes instead of only showing final demos. |
| Business judgment | I connect engineering decisions to cost, risk, and maintainability. |
| Learning velocity | I use open-source projects to turn new tools into working systems. |

Flagship Open-Source — ML-MLOps-Production-Template

The template is a reusable foundation for teams that want safer defaults when moving machine learning services toward production.

I created it after building my portfolio and seeing the same failure patterns repeat: blocked APIs, fragile deployments, missing monitoring, unclear model promotion rules, weak secrets handling, and documentation that does not match how the system actually behaves.

The template packages those lessons into a reusable starting point:

| Layer | What's encoded |
| --- | --- |
| 32 anti-patterns (D-01→D-32) | Runtime · Training · Infrastructure · EDA · Security · Closed-loop monitoring |
| SLSA L2 supply chain | Gitleaks → Trivy → Syft SBOM → Cosign keyless (OIDC) → Kyverno admission |
| Closed-loop monitoring | Ground truth ingestion · Sliced performance · Champion/Challenger (McNemar + bootstrap ΔAUC) |
| Governed AI-assisted development | Agent behavior protocol (AUTO/CONSULT/STOP) · audit trail · eval gates; AI coding made reviewable, not hidden |
| Quad-IDE native | Windsurf · Claude Code · Cursor · Codex; same invariants, native config for each |
| 28 ADRs | Each decision documented with alternatives rejected and revisit triggers |

# Zero to working fraud detection service in one command
git clone https://github.com/DuqueOM/ML-MLOps-Production-Template.git
cd ML-MLOps-Production-Template && make bootstrap

Template repo  |  QUICK_START.md  |  28 ADRs
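The Champion/Challenger row above names McNemar's test. As a rough illustration of what such a comparison involves, here is a minimal sketch of the exact (binomial) form of the test using only the standard library. The function name `mcnemar_exact` and the toy labels are hypothetical; this is not the template's implementation, which may use a different variant or library.

```python
import math


def mcnemar_exact(y_true, champion_pred, challenger_pred):
    """Exact two-sided McNemar test on paired predictions (hypothetical helper).

    Returns (champion_only_correct, challenger_only_correct, p_value).
    Only the discordant pairs, where exactly one model is right, carry
    information about which model is better.
    """
    champ_only = sum(
        1 for y, a, b in zip(y_true, champion_pred, challenger_pred)
        if a == y and b != y
    )
    chall_only = sum(
        1 for y, a, b in zip(y_true, champion_pred, challenger_pred)
        if a != y and b == y
    )
    n = champ_only + chall_only
    if n == 0:
        return champ_only, chall_only, 1.0  # no disagreement, no evidence
    # Under the null hypothesis each discordant pair is a fair coin flip.
    k = min(champ_only, chall_only)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return champ_only, chall_only, min(1.0, 2 * tail)


# Toy data: the challenger fixes all ten of the champion's mistakes.
y_true = [1] * 10
champion = [0] * 10
challenger = [1] * 10
b, c, p_value = mcnemar_exact(y_true, champion, challenger)
print(b, c, p_value)
```

A promotion rule would typically combine a small p-value with a check on the direction and size of the improvement; the threshold for either is a policy choice outside the scope of this sketch.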


Production Portfolio - ML-MLOps-Portfolio

The portfolio shows three end-to-end ML projects built beyond notebooks: model training, APIs, containers, deployment artifacts, monitoring, tests, and documented engineering decisions.

| Project | What it demonstrates | Main result |
| --- | --- | --- |
| BankChurn Predictor | Classification API, SHAP explanations, threshold tuning, model serving. | AUC 0.87, 90% coverage. |
| NLPInsight Analyzer | Financial sentiment API, honest dataset selection, CPU-friendly serving. | 80.6% accuracy, 98% coverage. |
| ChicagoTaxi Pipeline | PySpark ETL, demand forecasting, data leakage correction. | R² 0.96, 6.3M rows processed. |

One example from the portfolio: a load test exposed an 81% error rate in an ML API. I traced the issue to a serving pattern (multiple uvicorn workers sharing one pod's CPU budget) that created CPU contention under Kubernetes, then redesigned the inference path with asynchronous execution and a thread pool. The redesign dropped the error rate from 81% to 0% and cut CPU from 2000m to 1000m.

That story matters because it shows the habit I want to bring to a junior role: measure the problem, understand the cause, fix it, and document the lesson.
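The serving redesign described above can be sketched with the standard library alone: a single event loop accepts requests and hands blocking inference to a bounded thread pool instead of forking extra worker processes. Everything here is an assumption for illustration: `blocking_predict` is a stand-in that sleeps instead of calling a real model, the pool size of 4 is arbitrary, and the real service wires this into FastAPI rather than a bare `asyncio.run`. The pattern helps only when the inference call releases the GIL, as `time.sleep` does here and as native numerical code commonly does.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# One bounded pool per process keeps thread count within the pod's CPU budget.
EXECUTOR = ThreadPoolExecutor(max_workers=4)


def blocking_predict(features):
    """Hypothetical stand-in for a blocking model.predict call."""
    time.sleep(0.05)  # simulates inference work that releases the GIL
    return sum(features) / len(features)


async def predict(features):
    """Offload the blocking call so the event loop keeps accepting requests."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(EXECUTOR, blocking_predict, features)


async def serve_batch(requests):
    """Handle several concurrent requests on a single event loop."""
    return await asyncio.gather(*(predict(r) for r in requests))


results = asyncio.run(serve_batch([[1.0, 2.0, 3.0]] * 8))
print(results)
```

With a single process there is one copy of the model in memory and one scheduler deciding how the CPU limit is spent, which is the property the multi-worker setup lacked.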

📐 18 ADRs →  |  📋 Engineering Highlights →  |  📺 3min Demo →


Core Stack

Python · scikit-learn · XGBoost · LightGBM · PySpark · FastAPI · Docker · Kubernetes · Terraform · GitHub Actions · MLflow · DVC · Prometheus · Grafana · SHAP · AWS · GCP

TripleTen Data Science
14 years operations -> MLOps & Production ML


AI Transparency

I use AI-assisted coding tools — and I engineer that workflow instead of hiding it. My production template encodes the governance layer: behavior protocols (AUTO/CONSULT/STOP), path-scoped rules, an append-only audit trail and eval gates that keep AI-generated changes reviewable and bounded. Architecture, trade-off analysis, incident diagnosis, and final technical decisions are my responsibility; the tooling accelerates the rest.

I consider this a core engineering skill for 2026, not a disclaimer.


Open to entry-level and junior opportunities in MLOps, Production ML, Applied AI, and ML Platform.
Remote preferred - Mexico City (CST)

Pinned

  1. ML-MLOps-Portfolio (Public)

    Production-grade MLOps platform: 3 end-to-end ML projects with CI/CD, Terraform (GCP GKE, AWS EKS), Kubernetes, MLflow, Docker, and 90-96% test coverage

    Python 3

  2. ML-MLOps-Production-Template (Public template)

    Production ML template with 32 encoded anti-patterns, multi-cloud K8s, agent rules (AUTO/CONSULT/STOP), and supply-chain security for Windsurf, Claude Code, and Cursor.

    Python 2