Benchmark Phase 1: semi-automated runner over a 3-5 task set #79

Child of #75. Builds on Phase 0 (#78). Turn the manual method into a semi-automated runner over a small task set.

## Goal
Run 3 to 5 tasks through a runner that launches the agent, detects acceptance, times the run, and logs structured human-intervention events.
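
A minimal sketch of that loop, to make the moving parts concrete. Everything named here (`timeRun`, `Env`, `LaunchAgent`, `InterventionEvent`, the `"passed"` / `"timeout"` status values, the poll interval and timeout) is an assumption for illustration, not something decided in #75 or #78. The clock, sleep, and acceptance check are injected so the loop can be exercised without a real agent.

```typescript
// Hypothetical event shape: the issue only says interventions are "structured".
interface InterventionEvent {
  atSeconds: number; // offset from the start of the run
  kind: string;      // e.g. "clarification" or "manual-fix" (assumed labels)
  note?: string;
}

interface RunOutcome {
  seconds: number;
  interventions: InterventionEvent[];
  status: "passed" | "timeout"; // assumed status values
}

interface AgentHandle {
  stop(): Promise<void>;
}

// The launcher receives a callback it can use to record a human intervention.
type LaunchAgent = (
  onIntervention: (kind: string, note?: string) => void,
) => Promise<AgentHandle>;

// Injected environment: acceptance check, clock (ms), and sleep.
interface Env {
  accept(): Promise<boolean>;
  now(): number;
  sleep(ms: number): Promise<void>;
}

async function timeRun(
  launch: LaunchAgent,
  env: Env,
  opts = { pollMs: 5_000, timeoutMs: 30 * 60_000 }, // assumed defaults
): Promise<RunOutcome> {
  const interventions: InterventionEvent[] = [];
  const start = env.now();
  const secondsSinceStart = () => (env.now() - start) / 1000;

  const agent = await launch((kind, note) => {
    interventions.push({ atSeconds: secondsSinceStart(), kind, note });
  });

  let status: RunOutcome["status"] = "timeout";
  let seconds = 0;
  try {
    // Poll the acceptance check until it passes or the time budget runs out.
    while (env.now() - start < opts.timeoutMs) {
      if (await env.accept()) {
        status = "passed";
        break;
      }
      await env.sleep(opts.pollMs);
    }
    seconds = secondsSinceStart();
  } finally {
    await agent.stop(); // always stop the agent, even if polling throws
  }
  return { seconds, interventions, status };
}
```

In a real runner, `accept` would presumably spawn the task's acceptance script and treat a zero exit code as acceptance; that convention is also an assumption here.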

## Deliverables
- `runner`: check out the starting commit -> launch agent (GemStack | Next.js adapter) -> stream events -> poll acceptance script -> emit `report.json` `{ framework, task, runIndex, seconds, interventions, status }`.
- Two thin adapters (GemStack / Next.js) so the runner stays framework-agnostic.
- A task set of 3 to 5 tasks from the categories in the #75 design (feature add, schema change, bug fix, AI integration, refactor), each with an acceptance script.
- N runs per task (starting with N = 5); one raw `report.json` per run.
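
One possible shape for these deliverables, as a sketch only. The `RunReport` field names come from the list above; their value types, the zero-based `runIndex`, the `AgentAdapter` interface, the `results/...` directory layout, and the two launch commands are all assumptions (the commands are placeholders, not real CLIs).

```typescript
type Framework = "GemStack" | "Next.js";

// Field names are from the issue; the value types are assumptions.
interface RunReport {
  framework: Framework;
  task: string;
  runIndex: number;         // assumed zero-based
  seconds: number;
  interventions: unknown[]; // assumed to be a list; the event shape is not fixed yet
  status: string;           // assumed free-form, e.g. "passed"
}

// Thin adapter: the only framework-specific piece the runner sees.
interface AgentAdapter {
  framework: Framework;
  launchArgv(taskId: string): string[]; // command line used to start the agent
}

// Placeholder commands: neither of these is a real CLI.
const adapters: AgentAdapter[] = [
  { framework: "GemStack", launchArgv: (t) => ["placeholder-gemstack-agent", t] },
  { framework: "Next.js", launchArgv: (t) => ["placeholder-nextjs-agent", t] },
];

interface PlannedRun {
  adapter: AgentAdapter;
  task: string;
  runIndex: number;
  reportPath: string;
}

// Enumerate every (framework, task, runIndex) combination and where its report goes.
function planRuns(
  adapterList: AgentAdapter[],
  tasks: string[],
  runsPerTask = 5,
): PlannedRun[] {
  const plan: PlannedRun[] = [];
  for (const adapter of adapterList) {
    for (const task of tasks) {
      for (let runIndex = 0; runIndex < runsPerTask; runIndex++) {
        const dir = `results/${adapter.framework.toLowerCase()}/${task}/run-${runIndex}`;
        plan.push({ adapter, task, runIndex, reportPath: `${dir}/report.json` });
      }
    }
  }
  return plan;
}

// Raw per-run report, serialized as pretty-printed JSON.
function serializeReport(report: RunReport): string {
  return JSON.stringify(report, null, 2);
}
```

Keeping the plan as plain data means the runner can iterate it without knowing which framework it is driving; only `launchArgv` differs between the two adapters.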

## Out of scope
Aggregation/reporting polish and the committed baseline land in Phase 2.

