Machine Learning Engineer – LLM Systems, Inference & Applied AI
I design, deploy, and optimize production AI systems with a focus on LLM inference, performance engineering, and real-world deployment.
My work sits at the intersection of:
- Large Language Models & agents
- inference performance & GPU efficiency
- ML systems & cloud infrastructure
- applied AI for document intelligence & decision workflows
- vLLM deployment & optimization
- speculative decoding & batching strategies
- KV-cache monitoring & memory pressure mitigation
- latency & throughput optimization
- GPU scheduling & orchestration
- RAG agents & document intelligence
- structured information extraction
- evaluation pipelines & schema design
- prompt & system design
- Kubernetes, Docker, ArgoCD
- GPU workloads & CUDA environments
- monitoring & performance profiling
- scalable API deployment
Languages: Python • C++ • Bash • JS
AI & ML: PyTorch • Transformers • vLLM • Triton • scikit-learn
Infrastructure & MLOps: Docker • Kubernetes • ArgoCD • Linux • Git
- Fine-tuning small Vision-Language models for structured extraction
- Context window scaling & memory implications
- Deploying GPT-OSS-20B with vLLM
- Training & workshops on LLM agents and deployment
ENSAE Paris – Institut Polytechnique de Paris, Engineering Program
Focus areas: Advanced & Bayesian Statistics • Machine Learning • Optimization • Econometrics • Parallel Computing • Deep Learning • NLP
Lightweight always-on-top AI assistant
- Instant screenshot capture injected into LLM context
- Global shortcut & frictionless workflow
- Public distribution with landing page
https://floatpilot.app
| Hackathon Name | Description | Technologies Used | Link |
|---|---|---|---|
| Hackathon Banque de France (WINNER) | Solution that automatically identifies legal topics relevant to the business from its documentation and generates legal monitoring content on these topics (articles, news, codes of conduct, European legislation) for distribution via a newsletter. | Python, Azure, React, RSS feeds, GPT API, TF-IDF | Solution cannot be shared |
| H-Gen AI 2025 (WINNER) | Document analysis tool developed for Gide, a leading international law firm. The application streamlines the audit process by analyzing PDF documents with Large Language Models (LLMs) and generating structured audit reports in Word format from predefined templates. | Python, AWS, RAG | GitHub Repo |
| H!Paris | Model trained to predict groundwater levels over time. | Python, XGBoost, LSTM | None |
Here are some of the key projects I've worked on:
| Project Name | Description | Technologies Used | Link |
|---|---|---|---|
| Investment opportunities identification - Ardian | Implementation of a search engine leveraging BERT and additional data to identify firms with high acquisition potential. | Python, BERT, Azure | GitHub Repo |
| Document Chat Application (RAG) | Web application that lets users upload documents and chat about their content using Large Language Models. Built with FastAPI, Firebase authentication, and OpenAI's GPT models. | OpenAI, FastAPI, Docker, K8s | GitHub Repo |
| Analysis of LLM at small scale - INRIA | Implementation and training of small-scale language models (<100M parameters) using the Transformers library on AWS GPU instances, trained on the full English Wikipedia. | Python, Transformers, PyTorch, wandb.ai | GitHub Repo |
| Project Name | Description | Technologies Used | Link |
|---|---|---|---|
| Double Descent | The project explores the double descent phenomenon, where test error decreases again once a model is overparameterized, using linear regression, random Fourier features (RFF), and neural networks. Experiments confirm that implicit biases enable overparameterized models to generalize effectively, challenging traditional views of overfitting. | Python, PyTorch, Git | GitHub Repo |
| Bayesian Statistics: Optimal Bayesian Estimation of t-Student Mixtures with a Growing Number of Components | The project extends Bayesian estimation for Gaussian mixture models to t-Student mixtures, which are better suited to heavy-tailed data. Although the heavy tails of the t-Student distribution raise theoretical challenges, empirical simulations show that Bayesian methods perform robustly, particularly on complex or heavy-tailed distributions. | Python, Git | GitHub Repo |
| Time Series Analysis of the French Industrial Production Index for Electricity Production | Data cleaning, transformation to stationarity, and model selection and validation using ARMA and ARIMA models. | R | GitHub Repo |
| Sentiment Analysis | Web scraping to extract data, followed by sentiment analysis of the top 100 box office films | Python, Selenium, NLTK, SpaCy, Scikit-learn, Pandas | GitHub Repo |
French (native) • English (B2–C1) • Russian (B1)
- LinkedIn: My LinkedIn
- Email: Vincent.gimenes@gmail.com
Feel free to explore the repository and reach out if you'd like to collaborate or discuss exciting ideas!
Application is the alchemy that transforms your acquired knowledge into gold

