Review sub-sessions saturate parallelism slots, starving implementation progress #339

@mickume

Problem

After a sync barrier hot-loads reviewer sub-sessions for already-completed specs, all accumulated review nodes enter the ready queue simultaneously and monopolize every parallelism slot. Implementation groups are starved for extended periods despite being higher-value work.

Observed Impact

During a 12-spec orchestration run (specs 91-102) with parallel = 3:

10:01–10:27 (26 minutes): All 3 slots ran :reviewer:pre-review sessions. Zero implementation groups were scheduled. Specs 101 (groups 2-7) and 102 (groups 5-7) were entirely blocked.

Line  Time     Slot 1                    Slot 2                    Slot 3
 76   10:01    91:0:reviewer:pre-review  92:0:reviewer:pre-review  93:0:reviewer:pre-review
 82   10:06    91:0:reviewer:pre-review  92:0:reviewer:pre-review  94:0:reviewer:pre-review
 86   10:08    94:0:reviewer:pre-review  95:0:reviewer:pre-review  96:0:reviewer:pre-review
 90   10:14    97:0:reviewer:pre-review  98:0:reviewer:pre-review  99:0:reviewer:pre-review
 92   10:17    97:0:reviewer:pre-review  99:0:reviewer:pre-review  100:0:reviewer:pre-review
 ...
 97   10:27    (reviews finally drain, implementation resumes)

Overall run stats:

  • 131 total nodes: 85 implementation, 46 review (reviews are 35% of all nodes)
  • Review sessions add ~50% to wall-clock time despite being individually fast (3-5 min each)
  • A single stuck audit-review session (1h 47m) blocked all of spec 101 for nearly 2 hours

Root Cause

The scheduler treats review sub-sessions and implementation groups with equal priority. When a sync barrier triggers and reviewer injection creates sub-sessions for multiple already-completed specs, all review nodes have their dependencies satisfied simultaneously and flood the ready queue.

With parallel = 3 and 12 review nodes suddenly ready, the scheduler runs 4 rounds of 3 reviews before any implementation group gets a slot.
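
The arithmetic above can be checked with a small sketch. It assumes a FIFO ready queue and that all sessions in a round finish together; neither detail is confirmed from the scheduler's source, and `rounds_before_first_impl` is a hypothetical helper written only for illustration.

```python
from collections import deque


def rounds_before_first_impl(ready, parallel):
    """Count the full rounds that run before any implementation node gets a slot.

    Assumes FIFO dispatch and lock-step rounds (a simplification, not the
    real scheduler's behavior).
    """
    queue = deque(ready)
    rounds = 0
    while queue:
        # Fill up to `parallel` slots from the front of the queue.
        batch = [queue.popleft() for _ in range(min(parallel, len(queue)))]
        if any(kind == "impl" for kind in batch):
            return rounds
        rounds += 1
    return rounds


# 12 review nodes land in the queue ahead of the waiting implementation groups.
ready = ["review"] * 12 + ["impl"] * 3
print(rounds_before_first_impl(ready, parallel=3))  # prints 4
```

Under these assumptions the implementation groups wait four full rounds, which matches the 26-minute gap in the table if each round takes roughly 5-7 minutes.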

Suggested Mitigations

Several options, not mutually exclusive:

  1. Priority-based scheduling: Give implementation groups higher priority than review sub-sessions. Reviews only run when no implementation group is ready.

  2. Review concurrency cap: Limit review sessions to at most 1 of N parallel slots (e.g., with parallel = 3, reserve at least 2 slots for implementation).

  3. Deferred review injection: Don't inject review sub-sessions at sync barriers for already-completed groups. Instead, inject them lazily when a slot would otherwise be idle.

  4. Review batching: Instead of one review session per spec per review type, batch multiple specs into a single review session (e.g., review all 12 pre-reviews in one session that checks each sequentially).
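
A minimal sketch of how options 1 and 2 might combine in the slot-selection step. Everything here is an assumption for discussion: `Node`, `pick_next`, `max_reviews`, and the implementation node ID format are hypothetical names, not the scheduler's actual API. Note that it is softer than option 1 as written, since a review can still take a leftover slot while implementation groups are running.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    is_review: bool


def pick_next(ready, running, parallel, max_reviews=1):
    """Choose nodes to start, preferring implementation and capping reviews.

    ready:   nodes whose dependencies are satisfied, in queue order
    running: nodes currently occupying slots
    """
    free = parallel - len(running)
    reviews_running = sum(1 for n in running if n.is_review)
    chosen = []
    # Option 1: implementation groups take free slots first.
    for node in ready:
        if len(chosen) >= free:
            break
        if not node.is_review:
            chosen.append(node)
    # Option 2: reviews fill leftover slots, but never exceed the cap.
    for node in ready:
        if len(chosen) >= free or reviews_running >= max_reviews:
            break
        if node.is_review:
            chosen.append(node)
            reviews_running += 1
    return chosen


ready = [
    Node("91:0:reviewer:pre-review", True),
    Node("92:0:reviewer:pre-review", True),
    Node("impl-101-group-2", False),  # hypothetical ID for an implementation group
]
print([n.node_id for n in pick_next(ready, running=[], parallel=3)])
# prints ['impl-101-group-2', '91:0:reviewer:pre-review']
```

With a strict cap, one slot stays idle in this example even though a second review is ready. A variant could raise the cap whenever no implementation group is ready, which is closer to the lazy injection in option 3.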

Environment

  • parallel = 3
  • 12 specs (91-102) active in the run
  • 4 review types per spec: skeptic, auditor, reviewer:pre-review, reviewer:audit-review
  • Sync barrier triggered after specs 91-99 completed, hot-loading reviewer passes for all of them at once
