Add additional Model Runs #13

Open
AnkitaNaik wants to merge 11 commits into main from dev/add_models

Conversation

@AnkitaNaik
Collaborator

AnkitaNaik commented Apr 27, 2026

  1. Add the following models to the harness:
  • Qwen/Qwen3.5-397B-A17B-FP8
  • Mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4
  • meta-llama/llama-3-3-70b-instruct
  • Qwen/Qwen2.5-72B-Instruct
  • KimiK2
  2. Add a script to automate the model runs. The script runs the benchmark_runner for the given task, retries failed domains up to three times, and runs the evaluator once the benchmark_runner finishes.
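The retry flow in item 2 could look roughly like the sketch below. It is not the PR's actual code: `run_with_retries`, `run_domain`, and `evaluate` are hypothetical names, and the real benchmark_runner and evaluator are modeled as injected callables so that the control flow can be shown without guessing their interfaces. It also assumes that "retries failed domains thrice" means one initial attempt plus up to three retries per failed domain, and that the evaluator runs even if some domains still fail.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

MAX_RETRIES = 3  # assumption: "thrice" = up to 3 retries after the first attempt


def run_with_retries(
    domains: Iterable[str],
    run_domain: Callable[[str], bool],  # hypothetical: True if the domain's run succeeded
    evaluate: Callable[[], None],  # hypothetical: stands in for the evaluator step
    max_retries: int = MAX_RETRIES,
) -> list[str]:
    """Run every domain once, retry failures, then evaluate.

    Returns the domains that still failed after all retries.
    """
    # First pass: attempt every domain once and keep the failures.
    pending = [d for d in domains if not run_domain(d)]

    # Retry only the failed domains, stopping early once nothing is pending.
    for _ in range(max_retries):
        if not pending:
            break
        pending = [d for d in pending if not run_domain(d)]

    # The evaluator runs after the runner phase, regardless of leftover failures.
    evaluate()
    return pending


if __name__ == "__main__":
    attempts: dict[str, int] = {}

    def fake_run(domain: str) -> bool:
        # Hypothetical runner: "flaky" succeeds on its 2nd attempt, "broken" never does.
        attempts[domain] = attempts.get(domain, 0) + 1
        if domain == "broken":
            return False
        if domain == "flaky":
            return attempts[domain] >= 2
        return True

    failed = run_with_retries(["ok", "flaky", "broken"], fake_run, lambda: print("evaluator ran"))
    print(failed, attempts)
```

Passing the runner and evaluator in as callables keeps the retry logic testable on its own. In the real script they would presumably wrap invocations of the benchmark_runner and evaluator.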

AnkitaNaik force-pushed the dev/add_models branch 3 times, most recently from 46c6a51 to acbdc34 on April 27, 2026 18:45
@@ -0,0 +1,499 @@
#!/usr/bin/env python3
from __future__ import annotations
Collaborator

Missing usage block.

Also, is this a script for our own benchmarking? If so, can we move it into "scripts"?

If the idea is to expose this to users, we should add the appropriate documentation.

Collaborator Author

Good point. This script is for internal use only. I will move it under scripts.

Collaborator

anupamamurthi left a comment

Looks good; I requested a few minor changes.

Labels

None yet

Projects

None yet

2 participants