Model Evaluation Workshop

Accuracy scores and leaderboard metrics look impressive, but production-grade AI requires evals that reflect real-world performance, reliability, and user satisfaction. Traditional benchmarks rarely show how your LLM will perform when embedded in complex workflows or agentic systems. How can you realistically measure reasoning quality, agent consistency, MCP integration, and user-focused outcomes?

In this practical, example-driven workshop, we'll go beyond standard benchmarks and dive into tangible evaluation strategies using open-source frameworks such as GuideLLM and lm-eval-harness. You'll see concrete examples of how to create custom eval suites tailored to your use case, integrate human-in-the-loop feedback effectively, and implement agent reliability checks that reflect production conditions. Walk away with actionable insights and best practices for evaluating and improving your LLMs so that they meet real-world expectations, not just leaderboard positions.

Preparing Your System

Follow the instructions below that match your system.

Red Hat Enterprise Linux system on RHDP

NVIDIA GPUs

The NVIDIA drivers are already installed, but the NVIDIA Container Toolkit still needs to be installed.

Install NVIDIA Container Toolkit

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf config-manager --enable nvidia-container-toolkit-experimental
sudo dnf install -y nvidia-container-toolkit podman
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
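
After generating the CDI spec, it can help to confirm that it is populated before starting any containers. The following is a minimal sketch, not part of the original workshop steps: the helper name `has_cdi_gpu_kind`, the `CDI_SPEC` override variable, and the choice of the `registry.access.redhat.com/ubi9/ubi` image are assumptions made for illustration. It assumes the spec was written to /etc/cdi/nvidia.yaml as in the command above.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the CDI spec produced by `nvidia-ctk cdi generate`.
# has_cdi_gpu_kind and CDI_SPEC are hypothetical names used only in this example.

# has_cdi_gpu_kind SPEC_PATH
# Succeeds when the file exists, is non-empty, and mentions the nvidia.com/gpu kind.
has_cdi_gpu_kind() {
  local spec="$1"
  [ -s "$spec" ] && grep -q 'nvidia.com/gpu' "$spec"
}

spec_path="${CDI_SPEC:-/etc/cdi/nvidia.yaml}"

if has_cdi_gpu_kind "$spec_path"; then
  echo "CDI spec at $spec_path looks populated"

  # List the device names that the spec exposes, if the CLI is available.
  if command -v nvidia-ctk >/dev/null 2>&1; then
    nvidia-ctk cdi list || true
  fi

  # Run nvidia-smi inside a container with all GPUs attached via CDI.
  # The image is an assumption; any image you already use should work.
  if command -v podman >/dev/null 2>&1; then
    podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable \
      registry.access.redhat.com/ubi9/ubi nvidia-smi -L || true
  fi
else
  echo "CDI spec at $spec_path is missing or empty; re-run the generate step" >&2
fi
```

If the container prints one line per GPU, Podman can see the devices through CDI. If the spec check fails, re-running the `nvidia-ctk cdi generate` command above is the first thing to try.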
