Add benchmark results: 6-scenario replay batches, Gemini pro/lite rej…

50ea61b
Draft

Cross-model debate harness research, policy, and scenarios #6
