Add benchmark aggregates, activity sort, agent filters, and search facets #38

Draft
cursor[bot] wants to merge 1 commit into main from cursor/product-feature-opportunities-408b

cursor[bot] commented Jun 13, 2026

Summary

Implements four product improvements grounded in existing artifact data and UI patterns for the LLM Debate Research dashboard.

Features

1. Benchmark run-level aggregates (detail page)

Surfaces payload.summary metrics that were stored in artifacts but not shown in the UI:

  • Consensus mean ± stddev
  • Critique max severity mean ± stddev
  • Stability mean, stddev, and min–max range
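
As a rough illustration of how the stored aggregates listed above might be rendered, here is a minimal sketch. The `MetricSummary` shape and both helper names are assumptions made for this example; the PR only states that the values live in `payload.summary`, not how they are keyed.

```typescript
// Hypothetical shape of one metric inside `payload.summary`; the real field
// names are not shown in this PR and are assumed here for illustration.
interface MetricSummary {
  mean: number;
  stddev: number;
  min?: number;
  max?: number;
}

// Renders "mean ± stddev", e.g. for consensus or critique max severity.
function formatMeanStd(metric: MetricSummary, digits = 2): string {
  return `${metric.mean.toFixed(digits)} ± ${metric.stddev.toFixed(digits)}`;
}

// Renders "min–max", or a placeholder when the artifact lacks either bound.
function formatRange(metric: MetricSummary, digits = 2): string {
  if (metric.min === undefined || metric.max === undefined) return "n/a";
  return `${metric.min.toFixed(digits)}–${metric.max.toFixed(digits)}`;
}
```

Returning a placeholder for a missing range keeps older artifacts, which may not carry min/max values, from breaking the detail page.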

2. Activity feed sort

Adds a newest-first / oldest-first sort option to /activity and /api/activity. The selected sort is preserved in pagination links and exports.
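
One way the sort option could be parsed, applied, and carried through pagination is sketched below. Only the `sort=oldest` query value comes from this PR; the `newest` default, the `page` parameter, the `timestamp` field, and all function names are assumptions for illustration.

```typescript
type ActivitySort = "newest" | "oldest";

// Unknown or missing values fall back to newest-first (assumed default).
function parseSort(raw: string | null | undefined): ActivitySort {
  return raw === "oldest" ? "oldest" : "newest";
}

// Returns a sorted copy; `timestamp` (epoch ms) is a hypothetical field name.
function sortActivity<T extends { timestamp: number }>(
  items: readonly T[],
  sort: ActivitySort,
): T[] {
  const direction = sort === "oldest" ? 1 : -1;
  return [...items].sort((a, b) => direction * (a.timestamp - b.timestamp));
}

// Builds a pagination link that keeps a non-default sort in the query string.
function pageHref(basePath: string, page: number, sort: ActivitySort): string {
  const params = new URLSearchParams({ page: String(page) });
  if (sort !== "newest") params.set("sort", sort);
  return `${basePath}?${params.toString()}`;
}
```

Omitting the parameter for the default order keeps existing links unchanged, while any explicit `sort=oldest` survives page changes.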

3. Agent pipeline stats filters

Scopes /agents stats to filtered runs using the same model/preset/fast/date filters as the runs and activity pages. Includes an empty state when the filters match no runs, and a link to browse the matching runs.
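
The scoping described above could look roughly like the following sketch. The `Run` and `RunFilters` shapes, the `agentCalls` counter, the substring match on model, and the `/runs` link target are all assumptions; the PR names the filter dimensions but not their implementation.

```typescript
// Hypothetical run record and filter shapes, assumed for this example.
interface Run {
  id: string;
  model: string;
  preset: string;
  fast: boolean;
  createdAt: string; // ISO 8601 timestamp
  agentCalls: number;
}

interface RunFilters {
  model?: string;
  preset?: string;
  fast?: boolean;
  from?: string; // inclusive lower bound, ISO 8601
  to?: string; // inclusive upper bound, ISO 8601
}

function matchesFilters(run: Run, filters: RunFilters): boolean {
  if (filters.model && !run.model.toLowerCase().includes(filters.model.toLowerCase())) return false;
  if (filters.preset && run.preset !== filters.preset) return false;
  if (filters.fast !== undefined && run.fast !== filters.fast) return false;
  const created = Date.parse(run.createdAt);
  if (filters.from && created < Date.parse(filters.from)) return false;
  if (filters.to && created > Date.parse(filters.to)) return false;
  return true;
}

// Returns null when nothing matches so the page can render its empty state.
function scopedStats(runs: readonly Run[], filters: RunFilters) {
  const matched = runs.filter((run) => matchesFilters(run, filters));
  if (matched.length === 0) return null;
  const totalAgentCalls = matched.reduce((sum, run) => sum + run.agentCalls, 0);
  return { runCount: matched.length, totalAgentCalls };
}

// Link to the runs listing with the same filters applied ("/runs" is assumed).
function runsLink(filters: RunFilters): string {
  const params = new URLSearchParams();
  if (filters.model) params.set("model", filters.model);
  if (filters.preset) params.set("preset", filters.preset);
  if (filters.fast !== undefined) params.set("fast", String(filters.fast));
  if (filters.from) params.set("from", filters.from);
  if (filters.to) params.set("to", filters.to);
  const query = params.toString();
  return query ? `/runs?${query}` : "/runs";
}
```

Sharing one predicate between the stats and the "browse matching runs" link is what keeps the two views consistent.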

4. Search advanced filters

Extends /search with model, preset, fast mode, and date filters. Supports filter-only searches (no text query required).
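
To make the filter-only behavior concrete, the sketch below parses a query string into a search request and checks whether it carries any criteria at all. The `model` and `preset` parameter names appear in the manual-testing URLs of this PR; `q`, `fast`, `from`, `to`, and both function names are assumptions.

```typescript
interface SearchRequest {
  q: string; // free-text query; empty string when absent
  model?: string;
  preset?: string;
  fast?: boolean;
  from?: string;
  to?: string;
}

function parseSearchParams(params: URLSearchParams): SearchRequest {
  // Treat missing and whitespace-only values the same way.
  const text = (key: string): string | undefined => {
    const value = params.get(key)?.trim();
    return value ? value : undefined;
  };
  const fastRaw = text("fast");
  return {
    q: text("q") ?? "",
    model: text("model"),
    preset: text("preset"),
    fast: fastRaw === undefined ? undefined : fastRaw === "true" || fastRaw === "1",
    from: text("from"),
    to: text("to"),
  };
}

// A search should run when there is text OR at least one filter.
function hasCriteria(request: SearchRequest): boolean {
  return (
    request.q !== "" ||
    request.model !== undefined ||
    request.preset !== undefined ||
    request.fast !== undefined ||
    request.from !== undefined ||
    request.to !== undefined
  );
}
```

With this check, `/search?model=gpt` runs a search even though `q` is empty, while a bare `/search` can still show the initial prompt instead of every run.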

Testing

  • pnpm test — 176 tests passed
  • pnpm typecheck — passed
  • pnpm web:typecheck — passed
  • pnpm web:build — passed

Manual testing

  1. Open a benchmark detail page (e.g. /benchmarks/benchmark_1771342676099_703f78c9eaf418) — verify "Run-level aggregates" section
  2. Visit /activity?sort=oldest — confirm chronological order reverses
  3. Visit /agents?preset=research_deep — confirm stats scope to filtered runs
  4. Visit /search?model=gpt or /search?preset=standard — confirm filtered results without text query

…cets

- Show consensus, critique severity, and stability aggregates on benchmark detail
- Add oldest-first sort option to activity feed and API
- Scope agent pipeline stats with model/preset/fast/date filters
- Extend search page with model, preset, fast mode, and date filters

Co-authored-by: Eamon Boyle <eamonboyle@users.noreply.github.com>
vercel[bot] commented Jun 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| llm-debate-research | Ready | Preview, Comment | Jun 13, 2026 8:08am |
