Problem
After a sync barrier hot-loads reviewer sub-sessions for already-completed specs, all accumulated review nodes enter the ready queue simultaneously and monopolize every parallelism slot. Implementation groups are starved for extended periods despite being higher-value work.
Observed Impact
During a 12-spec orchestration run (specs 91-102) with parallel = 3:
10:01–10:27 (26 minutes): All 3 slots ran :reviewer:pre-review sessions. Zero implementation groups were scheduled. Specs 101 (groups 2-7) and 102 (groups 5-7) were entirely blocked.
| Line | Time | Slot 1 | Slot 2 | Slot 3 |
|------|-------|--------|--------|--------|
| 76 | 10:01 | 91:0:reviewer:pre-review | 92:0:reviewer:pre-review | 93:0:reviewer:pre-review |
| 82 | 10:06 | 91:0:reviewer:pre-review | 92:0:reviewer:pre-review | 94:0:reviewer:pre-review |
| 86 | 10:08 | 94:0:reviewer:pre-review | 95:0:reviewer:pre-review | 96:0:reviewer:pre-review |
| 90 | 10:14 | 97:0:reviewer:pre-review | 98:0:reviewer:pre-review | 99:0:reviewer:pre-review |
| 92 | 10:17 | 97:0:reviewer:pre-review | 99:0:reviewer:pre-review | 100:0:reviewer:pre-review |
| ... | | | | |
| 97 | 10:27 | (reviews finally drain, implementation resumes) | | |
Overall run stats:
- 131 total nodes: 85 implementation, 46 review (35% review overhead)
- Review sessions add ~50% to wall-clock time despite being individually fast (3-5 min each)
- A single stuck audit-review session (1h 47m) blocked all of spec 101 for nearly 2 hours
Root Cause
The scheduler treats review sub-sessions and implementation groups with equal priority. When a sync barrier triggers and reviewer injection creates sub-sessions for multiple already-completed specs, all review nodes have their dependencies satisfied simultaneously and flood the ready queue.
With parallel = 3 and 12 review nodes suddenly ready, the scheduler runs 4 rounds of 3 reviews before any implementation group gets a slot.
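The arithmetic above can be reproduced with a small simulation. This is only a sketch of an equal-priority FIFO ready queue, not the actual scheduler code: it assumes lock-step rounds (every session in a round finishes before the next round starts, which the real trace does not do), and the function name and the implementation node IDs `101:2` and `102:5` are hypothetical.

```python
from collections import deque


def rounds_before_first_impl(ready, parallel):
    """Count lock-step rounds that run before any implementation node gets a slot.

    `ready` is a FIFO-ordered list of (kind, node_id) tuples, where kind is
    "review" or "impl". Assumption: the scheduler pops nodes strictly in
    queue order with no notion of priority.
    """
    queue = deque(ready)
    rounds = 0
    while queue:
        # Fill up to `parallel` slots from the head of the queue.
        batch = [queue.popleft() for _ in range(min(parallel, len(queue)))]
        if any(kind == "impl" for kind, _ in batch):
            return rounds
        rounds += 1
    return rounds


# 12 review nodes become ready at once, ahead of two implementation groups.
reviews = [("review", f"{spec}:0:reviewer:pre-review") for spec in range(91, 103)]
impl = [("impl", "101:2"), ("impl", "102:5")]  # hypothetical node IDs

print(rounds_before_first_impl(reviews + impl, parallel=3))  # 4
```

If the implementation groups were ordered ahead of the reviews, the same function would return 0, which is the effect the mitigations below aim for.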
Suggested Mitigations
Several options, not mutually exclusive:
- Priority-based scheduling: Give implementation groups higher priority than review sub-sessions. Reviews only run when no implementation group is ready.
- Review concurrency cap: Limit review sessions to at most 1 of N parallel slots (e.g., with parallel = 3, reserve at least 2 slots for implementation).
- Deferred review injection: Don't inject review sub-sessions at sync barriers for already-completed groups. Instead, inject them lazily when a slot would otherwise be idle.
- Review batching: Instead of one review session per spec per review type, batch multiple specs into a single review session (e.g., review all 12 pre-reviews in one session that checks each sequentially).
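As a rough illustration of how the first two options could combine, here is a sketch of a slot-selection function. Everything in it is an assumption for discussion: `Node`, `pick_next`, and `max_reviews` are hypothetical names, not the scheduler's real API. It is also slightly looser than strict priority: implementation groups fill slots first, and reviews may take leftover slots up to the cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    is_review: bool


def pick_next(ready, running, parallel, max_reviews=1):
    """Choose which ready nodes to start now.

    Implementation groups are taken first; review sub-sessions only fill
    leftover slots, and never more than `max_reviews` run at once.
    """
    free = parallel - len(running)
    if free <= 0:
        return []
    # Reviews already running count against the cap.
    review_budget = max_reviews - sum(n.is_review for n in running)
    impl = [n for n in ready if not n.is_review]
    reviews = [n for n in ready if n.is_review]
    chosen = impl[:free]
    free -= len(chosen)
    chosen += reviews[:max(0, min(free, review_budget))]
    return chosen


# Same situation as the report: 12 reviews ready ahead of two implementation groups.
ready = [Node(f"{spec}:0:reviewer:pre-review", True) for spec in range(91, 103)]
ready += [Node("101:2", False), Node("102:5", False)]  # hypothetical node IDs

started = pick_next(ready, running=[], parallel=3)
print([n.node_id for n in started])
# ['101:2', '102:5', '91:0:reviewer:pre-review']
```

One trade-off to be aware of: with a hard cap of 1, slots stay idle when only reviews are ready. Raising `max_reviews` when no implementation group is ready would avoid that, at the cost of a more complicated policy.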
Environment
- parallel = 3
- 12 specs (91-102) active in the run
- 4 review types per spec: skeptic, auditor, reviewer:pre-review, reviewer:audit-review
- Sync barrier triggered after specs 91-99 completed, hot-loading reviewer passes for all of them at once