DEMO VIDEO: link
This framework automates the complete benchmarking lifecycle for containerized AI services:
```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Recipe    │───▶│   Deploy    │───▶│  Load Gen   │───▶│   Analyze   │───▶│   Report    │
│   (YAML)    │    │   (Slurm)   │    │  (Clients)  │    │  (Metrics)  │    │  (MD/JSON)  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
```
- One-command execution from declarative YAML recipes
- Automated analysis including saturation detection and bottleneck attribution
- Full reproducibility with embedded metadata and rerun support
- Dual interface with CLI and web UI feature parity
- Real-time monitoring via Prometheus/Grafana integration
- Python 3.10+
- SSH access to MeluXina (or compatible HPC cluster)
- SSH key authentication configured
```bash
# Clone the repository
git clone https://github.com/EUMASTER4HPC/Team1_EUMASTER4HPC2526.git
cd Team1_EUMASTER4HPC2526

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate      # Linux/macOS
# or: venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Verify the installation
python src/frontend.py --help
```

Configure SSH access to MeluXina with a host alias `meluxina` in `~/.ssh/config`:
```
# ~/.ssh/config
Host meluxina
    HostName login.lxp.lu
    User your_username
    IdentityFile ~/.ssh/id_meluxina
```

```bash
# 1. Run a Redis benchmark
python src/frontend.py examples/recipe_redis.yaml

# 2. Monitor execution
python src/frontend.py --watch BM-20260112-001

# 3. Generate an analysis report
python src/frontend.py --report BM-20260112-001

# 4. View results in the web interface
python src/frontend.py --web
# Open http://localhost:5000
```

Alternatively, run `python src/frontend.py --ui` for a guided menu of benchmark operations.
| Category | Service | Port | Use Case |
|---|---|---|---|
| Inference | vLLM | 8000 | High-performance LLM serving with PagedAttention |
| | Ollama | 11434 | Lightweight local LLM deployment |
| Database | PostgreSQL | 5432 | OLTP transactional workloads |
| | Redis | 6379 | In-memory caching and pub/sub |
| | MinIO | 9000 | S3-compatible object storage |
| Vector DB | ChromaDB | 8000 | Embedding storage for RAG |
| | Qdrant | 6333 | High-performance vector search |
Recipes define complete experiments in declarative YAML:
```yaml
configuration:
  target: meluxina
  service:
    type: vllm                 # Service type
    partition: gpu             # Slurm partition
    num_gpus: 1                # GPU allocation
    time_limit: "01:00:00"     # Job time limit
    settings:
      model: facebook/opt-125m # Model to serve
  client:
    type: vllm_smoke           # Client type
    partition: cpu
    settings:
      num_requests: 100
      max_tokens: 50
  benchmarks:
    num_clients: 4             # Concurrent clients
```

See docs/RECIPE_REFERENCE.md for the complete reference.
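Once a recipe is parsed (e.g. with `yaml.safe_load`), it can be sanity-checked before submission. A minimal validation sketch — the required-key set is inferred from the example above, not the project's actual schema, and `validate_recipe` is a hypothetical helper:

```python
def validate_recipe(recipe: dict) -> dict:
    """Check a parsed recipe for the keys used in the example above.

    `recipe` is the dict produced by e.g. yaml.safe_load(); the required
    key set here is an assumption, not the project's full schema.
    """
    cfg = recipe.get("configuration")
    if cfg is None:
        raise ValueError("recipe missing 'configuration' section")
    for key in ("target", "service", "client"):
        if key not in cfg:
            raise ValueError(f"recipe missing required key: {key}")
    for key in ("type", "partition"):
        if key not in cfg["service"]:
            raise ValueError(f"service section missing: {key}")
    return cfg
```

Failing fast on a malformed recipe is cheaper than discovering the problem after a Slurm job has been queued.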
Saturation detection identifies the optimal operating point using maximum-curvature analysis:
```bash
# Run a concurrency sweep
python src/frontend.py examples/recipe_redis.yaml --clients 1,2,4,8,16,32

# Generate a sweep report
python src/frontend.py --sweep-report BM-001,BM-002,BM-003,BM-004,BM-005
```

Outputs:
- Latency knee point (where P99 grows superlinearly)
- Throughput saturation (max sustainable RPS)
- SLO compliance limit (max concurrency under latency target)
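The knee point can be located with a standard maximum-curvature computation over the normalized latency-vs-concurrency curve; a minimal NumPy sketch (illustrative, not the project's `saturation.py`):

```python
import numpy as np

def knee_point(clients, p99_ms):
    """Return the client count at the latency knee (maximum curvature).

    Both axes are min-max normalized so curvature is scale-free; gradients
    use np.gradient with the sample positions, so the sweep may be
    non-uniform (e.g. powers of two). Assumes at least 3 sweep points.
    """
    x = np.asarray(clients, dtype=float)
    y = np.asarray(p99_ms, dtype=float)
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    dy = np.gradient(yn, xn)            # first derivative
    d2y = np.gradient(dy, xn)           # second derivative
    curvature = np.abs(d2y) / (1.0 + dy**2) ** 1.5
    return int(x[np.argmax(curvature)])

# e.g. knee_point([1, 2, 4, 8, 16, 32], [5, 6, 8, 14, 40, 160])
```

Normalizing before differentiating matters: without it, the curvature of a millisecond-scale axis against a client-count axis is dominated by units rather than shape.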
Bottleneck attribution classifies the limiting factor from resource-utilization patterns:
| Bottleneck | Indicators |
|---|---|
| GPU-bound | GPU util >80%, stable CPU, rising TTFT |
| CPU-bound | High CPU time, low GPU, stable memory |
| Memory-bound | High RSS, OOM errors, latency spikes |
| Queueing | Throughput plateau, exploding P99 |
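The table above translates naturally into a rule-based classifier. A hedged sketch — the metric names and threshold values are illustrative, not the ones used in the project's `bottleneck.py`:

```python
def classify_bottleneck(m: dict) -> str:
    """Map aggregated utilization indicators to a bottleneck label.

    `m` holds normalized metrics (0.0-1.0 utilizations, trend slopes);
    names and thresholds are illustrative assumptions.
    """
    if m["gpu_util"] > 0.80 and m["ttft_trend"] > 0:
        return "GPU-bound"       # saturated GPU, rising time-to-first-token
    if m["cpu_util"] > 0.80 and m["gpu_util"] < 0.30:
        return "CPU-bound"       # high CPU time with an idle GPU
    if m["oom_errors"] > 0 or m["rss_frac"] > 0.90:
        return "Memory-bound"    # OOM events or near-full resident memory
    if m["throughput_slope"] <= 0 and m["p99_growth"] > 1.0:
        return "Queueing"        # throughput plateau while P99 explodes
    return "Unclassified"
```

Rule order encodes priority: an OOM error is a memory verdict even if the GPU also looks busy, so a real implementation would tune both the ordering and the thresholds.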
```bash
python src/frontend.py --compare BM-001 BM-002
```

A regression is flagged when:
- P99 latency increases >10%
- Throughput decreases >10%
- Success rate drops >1%
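These thresholds amount to three comparisons over two summary-metric dicts; a sketch (the key names are assumptions about `summary.json`, and `detect_regressions` is a hypothetical helper):

```python
def detect_regressions(baseline: dict, candidate: dict) -> list:
    """Compare two summary-metric dicts against the thresholds above.

    Returns human-readable regression flags; an empty list means pass.
    Key names (p99_ms, rps, success_rate) are schema assumptions.
    """
    flags = []
    if candidate["p99_ms"] > baseline["p99_ms"] * 1.10:
        flags.append(f"P99 latency up {candidate['p99_ms'] / baseline['p99_ms'] - 1:.0%}")
    if candidate["rps"] < baseline["rps"] * 0.90:
        flags.append(f"Throughput down {1 - candidate['rps'] / baseline['rps']:.0%}")
    if candidate["success_rate"] < baseline["success_rate"] - 0.01:
        flags.append("Success rate dropped more than 1 point")
    return flags
```

Note the asymmetry: latency and throughput use relative (10%) thresholds, while success rate uses an absolute 1-point drop, since a ratio of two near-1.0 rates hides meaningful failures.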
```bash
# Benchmark operations
python src/frontend.py <recipe.yaml>          # Run a benchmark
python src/frontend.py --ui                   # Interactive mode

# Monitoring
python src/frontend.py --list                 # List all benchmarks
python src/frontend.py --watch <id>           # Live status
python src/frontend.py --logs <id>            # View logs
python src/frontend.py --stop <id>            # Cancel jobs

# Results
python src/frontend.py --collect <id>         # Download artifacts
python src/frontend.py --metrics <id>         # View metrics
python src/frontend.py --report <id>          # Generate a report

# Analysis
python src/frontend.py --compare <a> <b>      # Regression detection
python src/frontend.py --sweep-report <ids>   # Saturation analysis

# Web interface
python src/frontend.py --web                  # Launch at :5000
```

Launch the web interface with `python src/frontend.py --web` and open http://localhost:5000.
Pages:
- Dashboard - Overview of all benchmarks with status
- Run Recipe - Deploy benchmarks from UI
- Benchmarks - Detailed benchmark views
- Monitoring - Prometheus/Grafana integration
- Metrics - Charts and statistics
- Reports - Generated analysis with plots
```
Team1_EUMASTER4HPC2526/
├── src/
│   ├── frontend.py              # CLI entry point, recipe parsing
│   ├── core/
│   │   ├── manager.py           # Service/client orchestration
│   │   ├── aggregator.py        # Metrics aggregation
│   │   ├── saturation.py
│   │   ├── bottleneck.py
│   │   ├── lifecycle.py         # Job lifecycle management
│   │   └── collector.py         # Artifact collection
│   ├── infra/
│   │   ├── communicator.py      # SSH/Slurm abstraction
│   │   └── storage.py           # Benchmark state persistence
│   ├── models/
│   │   ├── service.py
│   │   └── client.py
│   ├── builders/
│   │   └── command_builders.py  # Sbatch script generation
│   ├── monitoring/
│   │   ├── manager.py           # Prometheus/Grafana stack
│   │   └── monitor.py           # Metrics collection
│   ├── reporting/
│   │   ├── reporter.py          # Report generation
│   │   ├── plotting.py          # Chart generation
│   │   └── artifacts.py
│   └── web/
│       └── flask_app.py         # Web interface
├── examples/                    # Recipe templates
├── measurements/                # Benchmark campaigns
├── results/                     # Benchmark artifacts
├── reports/                     # Generated reports
├── docs/                        # Documentation
│   ├── RECIPE_REFERENCE.md      # Recipe format reference
│   └── methodology.md           # Benchmarking methodology
├── scripts/                     # Automation scripts
├── requirements.txt             # Python dependencies
└── README.md
```
Each benchmark produces:
```
results/<benchmark_id>/
├── run.json          # Complete metadata + embedded recipe
├── requests.jsonl    # Per-request timing (microsecond precision)
├── summary.json      # Aggregated metrics
└── logs/             # Service and client logs
```

```
reports/<benchmark_id>/
├── report.md
├── report.json
└── plots/            # Visualization PNGs
    ├── latency_percentiles.png
    ├── throughput_timeline.png
    └── success_rate.png
```
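The per-request file can be aggregated with the standard library alone. A sketch assuming one JSON object per line with a `latency_ms` field (the field name is an assumption about the artifact schema):

```python
import json
import statistics

def latency_percentiles(jsonl_path: str, field: str = "latency_ms") -> dict:
    """Aggregate per-request latencies from a requests.jsonl file.

    Expects one JSON object per line; blank lines are skipped.
    """
    latencies = []
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if line:
                latencies.append(json.loads(line)[field])
    # quantiles(n=100) yields the 99 percentile cut points
    qs = statistics.quantiles(latencies, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98], "n": len(latencies)}
```

For large runs a streaming digest (e.g. t-digest) would replace the in-memory list, but for per-benchmark files this exact approach is usually sufficient.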
Every benchmark is fully reproducible:
```bash
# Rerun with an identical configuration
python src/frontend.py --rerun BM-20260112-001
```

Captured metadata includes:
- Complete YAML recipe (embedded)
- Container image digests
- Slurm job IDs and node allocations
- Timestamps for all lifecycle events
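Taken together, the embedded metadata in run.json might look like the following. Every field name and value here is illustrative, sketched from the bullet list above rather than the actual schema:

```json
{
  "benchmark_id": "BM-20260112-001",
  "recipe": { "configuration": { "target": "meluxina", "service": { "type": "redis" } } },
  "container_digest": "sha256:…",
  "slurm": { "service_job_id": 1234567, "client_job_ids": [1234568], "nodes": ["mel0123"] },
  "lifecycle": {
    "submitted": "2026-01-12T09:00:00Z",
    "started":   "2026-01-12T09:02:10Z",
    "completed": "2026-01-12T09:17:42Z"
  }
}
```

Because the recipe is embedded rather than referenced, a rerun does not depend on the original YAML file still existing on disk.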
| Component | Technology |
|---|---|
| Language | Python 3.10+ |
| Cluster Communication | Fabric, Paramiko (SSH) |
| Job Scheduling | Slurm |
| Containerization | Apptainer |
| Web Framework | Flask |
| Visualization | Matplotlib, Chart.js |
| Monitoring | Prometheus, Grafana |