Pull requests: nearai/benchmarks
docs(benchmark): running /benchmark with DeepSeek + on the reborn runtime
#106 opened Jun 15, 2026 by pranavraja99 (Contributor)
feat(analyze): automated benchmark-failure → harness-fix suggester
#98 opened Jun 13, 2026 by pranavraja99 (Contributor)
fix(reborn): mount per-task workspace at /workspace + use scoped paths
#97 opened Jun 12, 2026 by pranavraja99 (Contributor)
feat(clawbench): ClawBench reborn-yolo batch runner (spike)
#68 opened Jun 10, 2026 by pranavraja99 (Contributor), Draft
feat(reborn): run ironclaw via the reborn runtime (PlannedDriver) — WIP
#66 opened Jun 9, 2026 by pranavraja99 (Contributor), Draft
feat(automation-workflows): add self-learning/ benchmarks for self-learning-loops brief
#60 opened Jun 8, 2026 by sergeiest (Contributor), 2 tasks
feat(automation-workflows): add secrets/ benchmarks for vault-backed http calls from Slack DM
#55 opened Jun 5, 2026 by sergeiest (Contributor), 3 tasks
feat(automation-workflows): add channels/slack benchmarks for Slack channel routing
#54 opened Jun 5, 2026 by sergeiest (Contributor), 3 tasks
Reborn: simulated-company eval harness — PRD + golden validator + default-all tools
#50 opened Jun 4, 2026 by pranavraja99 (Contributor)
chore(baselines): refresh ironclaw baseline to 749f584 ⚠️ pass_rate shifted >5pp
Labels: automated, baselines
#47 opened May 29, 2026 by github-actions (Bot)
chore(baselines): refresh ironclaw baseline to 9df5e8d ⚠️ pass_rate shifted >5pp
Labels: automated, baselines
#40 opened May 27, 2026 by github-actions (Bot)
feat: Build abstraction — score configs (framework + tools + model), not frameworks
#25 opened May 8, 2026 by pranavraja99 (Contributor), 4 of 6 tasks
feat: structured-data benchmark, 22 scenarios across 4 household categories
#14 opened Apr 10, 2026 by standardtoaster, 4 of 5 tasks
feat: workspace.documents seeding, memory tools, tool restriction, numeric assertions
#13 opened Apr 9, 2026 by standardtoaster, 4 tasks done
feat: workplace simulation benchmark suite
#11 opened Mar 31, 2026 by ilblackdragon (Member), Draft, 2 of 4 tasks