Problem
Copilot noted a remaining edge case in PR #266: the backfill query currently uses
LIMIT remaining_limit, but the Rust packing pass can still skip some of the returned rows.
As a result, the server may return fewer jobs than it could. If the top remaining_limit
backfill candidates do not pack together, lower-ranked candidates outside that limited
window are never fetched, even though they would fit the remaining resources.
Example shape:
remaining CPU = 4
remaining_limit = 4
backfill candidates returned by SQL:
  job A: 3 CPU
  job B: 3 CPU
  job C: 3 CPU
  job D: 3 CPU
Rust claims A, then skips B/C/D because only 1 CPU remains.
Lower-ranked 1-CPU jobs may exist, but the backfill query did not fetch them.
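The skip behavior in this example can be reproduced with a minimal sketch. The `Job` struct and `pack` function are hypothetical stand-ins, not the server's actual types, and the sketch assumes the packing pass is a simple greedy walk in priority order that tracks only CPU.

```rust
/// Hypothetical candidate row; only the fields needed for this illustration.
#[derive(Debug, Clone)]
struct Job {
    name: &'static str,
    cpu: u32,
}

/// Assumed greedy packing: walk candidates in priority order, claim each one
/// that still fits, and skip the rest. Returns the names of claimed jobs.
fn pack(candidates: &[Job], mut remaining_cpu: u32) -> Vec<&'static str> {
    let mut claimed = Vec::new();
    for job in candidates {
        if job.cpu <= remaining_cpu {
            remaining_cpu -= job.cpu;
            claimed.push(job.name);
        }
    }
    claimed
}

fn main() {
    // The window returned by SQL with LIMIT remaining_limit = 4.
    let window: Vec<Job> = ["A", "B", "C", "D"]
        .iter()
        .map(|&name| Job { name, cpu: 3 })
        .collect();
    // Only A is claimed; B, C, and D each need 3 CPU but 1 remains.
    println!("{:?}", pack(&window, 4));

    // If a lower-ranked 1-CPU job had been fetched, it would also be claimed.
    let mut wider = window.clone();
    wider.push(Job { name: "E", cpu: 1 });
    println!("{:?}", pack(&wider, 4));
}
```

Each of A through D passes a per-row "fits in remaining CPU" filter on its own, which is why the SQL filter alone cannot prevent the shortfall.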
Scope
This is distinct from the GPU-saturation paging fix in PR #266. That PR keeps the query bounded and
addresses the observed case where a primary page is dominated by higher-priority GPU jobs and
lower-priority CPU jobs can fill leftover CPU capacity.
Possible approaches
- Over-fetch a bounded multiple of remaining_limit, with a reasonable cap.
- Make the backfill pass iterative/page-based until either the claim limit is met, resources are
saturated, or a maximum number of backfill pages has been scanned.
- Add instrumentation first to see whether skips in the backfill pass are common enough to justify a
broader heuristic.
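As a rough illustration of the iterative/page-based option, the sketch below pages through candidates until the claim limit is met, CPU is exhausted, or a page cap is reached. Everything here is an assumption for illustration: `fetch_page` is an in-memory stand-in for the SQL backfill query (modeled as a keyset-style cursor over the priority order), `max_pages` is a hypothetical cap, and only CPU is tracked.

```rust
/// Hypothetical candidate row; `id` doubles as priority rank in this sketch.
#[derive(Debug, Clone)]
struct Job {
    id: u64,
    cpu: u32,
}

/// Stand-in for the bounded SQL query: starting at `cursor`, return up to
/// `limit` jobs that individually fit `remaining_cpu`, plus the next cursor.
fn fetch_page(queue: &[Job], cursor: usize, remaining_cpu: u32, limit: usize) -> (Vec<Job>, usize) {
    let mut page = Vec::new();
    let mut next = cursor;
    for (i, job) in queue.iter().enumerate().skip(cursor) {
        if page.len() == limit {
            break;
        }
        next = i + 1;
        if job.cpu <= remaining_cpu {
            page.push(job.clone());
        }
    }
    (page, next)
}

/// Page-based backfill: stops when the claim limit is met, CPU is exhausted,
/// no fitting candidates remain, or `max_pages` pages have been scanned.
fn backfill(queue: &[Job], mut remaining_cpu: u32, mut remaining_limit: usize, max_pages: usize) -> Vec<u64> {
    let mut claimed = Vec::new();
    let mut cursor = 0;
    for _ in 0..max_pages {
        if remaining_limit == 0 || remaining_cpu == 0 {
            break;
        }
        let (page, next) = fetch_page(queue, cursor, remaining_cpu, remaining_limit);
        if page.is_empty() {
            break; // nothing after the cursor fits the remaining CPU
        }
        cursor = next;
        // Each page holds at most `remaining_limit` rows, so the claim limit
        // cannot be exceeded inside this loop.
        for job in page {
            if job.cpu <= remaining_cpu {
                remaining_cpu -= job.cpu;
                remaining_limit -= 1;
                claimed.push(job.id);
            }
        }
    }
    claimed
}

fn main() {
    // Four 3-CPU jobs ranked ahead of two 1-CPU jobs.
    let mut queue: Vec<Job> = (1..=4).map(|id| Job { id, cpu: 3 }).collect();
    queue.extend((5..=6).map(|id| Job { id, cpu: 1 }));

    println!("one page:    {:?}", backfill(&queue, 4, 4, 1));
    println!("three pages: {:?}", backfill(&queue, 4, 4, 3));
}
```

With a single page the sketch claims only job 1, matching the current behavior. With a cap of three pages it also claims job 5, and total query work stays bounded by `max_pages` pages of at most `remaining_limit` rows each. Because later pages only ever see lower-ranked rows, priority ordering among claimed jobs is preserved in this model.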
Acceptance criteria
- Add a regression test where the first backfill window contains candidates that individually fit the
SQL remaining-resource filters but do not pack together, while lower-ranked candidates would fit.
- Keep total SQL work bounded.
- Preserve existing priority ordering and scheduler fallback behavior.