Vincent Gimenes VincentG1234

Vincent Gimenes

Machine Learning Engineer — LLM Systems, Inference & Applied AI

I design, deploy, and optimize production AI systems with a focus on LLM inference, performance engineering, and real-world deployment.

My work sits at the intersection of:

Large Language Models & agents
inference performance & GPU efficiency
ML systems & cloud infrastructure
applied AI for document intelligence & decision workflows

🧠 Core Expertise

LLM Systems & Inference

vLLM deployment & optimization
speculative decoding & batching strategies
KV-cache monitoring & memory pressure mitigation
latency & throughput optimization
GPU scheduling & orchestration

Applied AI & Retrieval Systems

RAG agents & document intelligence
structured information extraction
evaluation pipelines & schema design
prompt & system design

ML Engineering & Infrastructure

Kubernetes, Docker, ArgoCD
GPU workloads & CUDA environments
monitoring & performance profiling
scalable API deployment

🛠 Technical Stack

Languages Python • C++ • Bash • JS

AI & ML PyTorch • Transformers • vLLM • Triton • scikit-learn

Infrastructure & MLOps Docker • Kubernetes • ArgoCD • Linux • Git

✍️ Writing Blog posts for Quickscale AI

Fine-tuning small Vision-Language models for structured extraction
Context window scaling & memory implications
Deploying GPT-OSS-20B with vLLM
Training & workshops on LLM agents and deployment

🎓 Education

ENSAE Paris — Institut Polytechnique de Paris Engineering Program

Focus areas: Advanced & Bayesian Statistics • Machine Learning • Optimization • Econometrics • Parallel Computing • Deep Learning • NLP

⭐ Indie Projects

🖥 FloatPilot — Desktop LLM Client

Lightweight always-on-top AI assistant

Instant screenshot capture injected into LLM context
Global shortcut & frictionless workflow
Public distribution with landing page

➡️ https://floatpilot.app

🏆 Hackathon Projects

Hackathon Name	Description	Technologies Used	Link
Hackathon Banque de France (WINNER)	A solution that automatically identifies the legal topics the business currently handles, based on its documentation, and generates legal monitoring content on those topics (articles, news, codes of conduct, European legislation) for distribution via a newsletter.	Python, Azure, React, RSS feeds, GPT API, TF-IDF	Solution cannot be shared
H-Gen AI 2025 (WINNER)	Document analysis tool developed for Gide, a leading international law firm. The application streamlines the audit process by automatically analyzing PDF documents with Large Language Models (LLMs) and generating structured audit reports in Word format from predefined templates.	Python, AWS, RAG	GitHub Repo
H!Paris	Model trained to predict water table levels over time.	Python, XGBoost, LSTM	None

📂 Major Projects

Here are some of the key projects I've worked on:

Project Name	Description	Technologies Used	Link
Investment opportunities identification - Ardian	Implementation of a search engine leveraging BERT and additional data to identify firms with high acquisition potential.	Python, BERT, Azure	GitHub Repo
Document Chat Application (RAG)	A web application that lets users upload documents and hold conversations about their content using a Large Language Model. Built with FastAPI, Firebase authentication, and OpenAI's GPT models.	OpenAI, FastAPI, Docker, K8s	GitHub Repo
Analysis of LLM at small scale - INRIA	Implementation and training of small-scale language models (<100M parameters) using the Transformers library on AWS cloud GPUs, trained on the full English Wikipedia.	Python, Transformers, PyTorch, wandb.ai	GitHub Repo

🛠️ Other Projects

Project Name	Description	Technologies Used	Link
Double Descent	The project explores the double descent phenomenon, where test error improves after overparameterization, using linear regression, RFF, and neural networks. Experiments confirm that implicit biases enable overparameterized models to generalize effectively, challenging traditional overfitting views.	Python, PyTorch, Git	GitHub Repo
Bayesian Statistics: Optimal Bayesian Estimation of t-Student Mixtures with a Growing Number of Components	The project extends Bayesian estimation for Gaussian mixture models to t-Student mixtures, leveraging their suitability for heavy-tailed data. While theoretical challenges arise due to the t-Student's heavy tails, empirical simulations show Bayesian methods perform robustly, particularly in scenarios with complex or heavy-tailed distributions, making them valuable for real-world applications.	Python, Git	GitHub Repo
Time Series Analysis of the French Industrial Production Index for Electricity Production	Data cleaning, transformation to stationarity, model selection, and validation using ARMA and ARIMA models.	R	GitHub Repo
Sentiment Analysis	Web scraping to extract data, followed by sentiment analysis of the top 100 box office films	Python, Selenium, NLTK, SpaCy, Scikit-learn, Pandas	GitHub Repo

🌍 Languages

French (native) • English (B2–C1) • Russian (B1)

🚀 Get In Touch

LinkedIn: My LinkedIn
Email: Vincent.gimenes@gmail.com

Feel free to explore the repository and reach out if you’d like to collaborate or discuss exciting ideas!

Application is the alchemy that transforms your acquired knowledge into gold
