Summary
Add support in torc for job runners to prefetch multiple jobs from the server and execute them sequentially from a local in-memory queue, instead of fetching only a single job at a time.
This is intended to reduce idle time on node resources, especially GPUs, between short-running jobs.
Problem
Today, a torc job runner works roughly like this:
- Poll the server for work
- Receive a single job
- Run the job
- Report completion back to the server
- Wait until the next polling cycle to request more work
Because polling happens on a configurable interval, there can be a gap between when one job finishes and when the next job begins. During that time, node resources are idle.
This is particularly noticeable for short-running jobs. In my current workload, individual jobs run for about 5 seconds. When running across 10 nodes, and potentially scaling to 200, the fetch/report/poll cycle can leave GPUs underutilized.
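For concreteness, the current single-job cycle looks roughly like the sketch below. All names here (`fetch_one_job`, `report_completion`) are hypothetical placeholders, not torc's actual API:

```python
import time

POLL_INTERVAL_S = 30  # stand-in for the configurable polling interval

def run_forever(client):
    """Current model: fetch and run one job per poll cycle (names hypothetical)."""
    while True:
        job = client.fetch_one_job()       # poll the server for work
        if job is not None:
            job.run()                      # ~5 s for the short jobs described above
            client.report_completion(job)  # report completion back to the server
        time.sleep(POLL_INTERVAL_S)        # idle until the next polling cycle
```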
Requested Feature
Allow each job runner to request multiple eligible jobs at once and keep them in a local in-memory queue.
The runner should then:
- start the next queued job immediately after the current one finishes
- continue executing queued jobs sequentially
- periodically check back in with the server
- request additional work before it becomes idle, when possible
This should help keep resources busy continuously instead of waiting on the next poll cycle after every single job.
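A minimal sketch of that loop, assuming a hypothetical client API (`fetch_jobs`, `report_completion`) and a simple refill threshold; none of these names are existing torc interfaces:

```python
from collections import deque

BATCH_SIZE = 8        # jobs fetched per server interaction (configurable)
REFILL_THRESHOLD = 2  # request more work when the queue gets this low

def run_forever(client):
    """Proposed model: prefetch a batch, run jobs back-to-back (names hypothetical)."""
    queue = deque(client.fetch_jobs(limit=BATCH_SIZE))
    while queue:
        job = queue.popleft()
        job.run()                          # next job starts immediately, no poll wait
        client.report_completion(job)
        if len(queue) <= REFILL_THRESHOLD:  # refill before the queue runs dry
            queue.extend(client.fetch_jobs(limit=BATCH_SIZE - len(queue)))
```

Refilling at a low threshold rather than at empty is what lets the runner request additional work before it becomes idle.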
Desired Behavior
- A runner can prefetch more than one job at a time
- Prefetched jobs are stored in memory on the runner
- Jobs are executed one after another as runner resources become available
- Batch size should be configurable
- Prefetch amount should take available resources into account
For example:
- if a runner has 4 GPUs, it should be able to request enough work to keep those GPUs occupied for multiple job durations
- torc already has resource-matching logic, so the runner should continue receiving only jobs appropriate for its available resources
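As a rough illustration, the prefetch amount could be derived from free GPUs and per-job GPU requirements. The heuristic and the `target_cycles` multiplier below are assumptions for discussion, not existing torc parameters:

```python
def prefetch_count(free_gpus: int, gpus_per_job: int, target_cycles: int = 3) -> int:
    """Request enough jobs to keep every free GPU busy for several
    back-to-back job durations (illustrative heuristic only)."""
    concurrent_jobs = max(free_gpus // gpus_per_job, 1)
    return concurrent_jobs * target_cycles

# A runner with 4 free GPUs running 1-GPU jobs would request 12 jobs.
print(prefetch_count(free_gpus=4, gpus_per_job=1))  # -> 12
```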
Why This Matters
The current one-job-at-a-time model introduces unnecessary idle time between jobs, which is especially costly for short simulations.
Expected benefits:
- higher GPU utilization
- reduced idle time between jobs
- fewer poll cycles per unit of work
- better throughput for short-duration workloads
- less resource waste on busy clusters
The main success metric for this feature would be improved GPU utilization, ideally keeping GPUs as close to 100% utilized as possible.
Scope / Assumptions
- This is a feature request for internal development
- Job dependencies are already handled by the server, not the runner
- Resource compatibility is already handled by existing torc tooling
- The runner only needs to execute its assigned jobs in order, based on available local capacity
Failure / Recovery Considerations
If a runner checks out multiple jobs and then stops reporting back within a configured timeout window, the server should return any outstanding jobs to the available pool.
Since runners already check in regularly, this seems like a reasonable recovery model for prefetched but unfinished work.
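One possible server-side model is a standard lease sweep; the schema below (a `checked_out` state, a `lease_expires_at` field) is hypothetical and only meant to illustrate the idea:

```python
from datetime import datetime, timezone

def reclaim_expired_leases(db) -> int:
    """Return jobs held by unresponsive runners to the available pool
    (hypothetical schema: jobs carry a lease that runners renew on check-in)."""
    now = datetime.now(timezone.utc)
    reclaimed = 0
    for job in db.jobs_in_state("checked_out"):
        if job.lease_expires_at < now:       # runner missed its check-in window
            job.state = "available"
            job.assigned_runner = None
            reclaimed += 1
    return reclaimed
```

Runners that are merely slow, not dead, would renew their lease on each regular check-in, so only genuinely unresponsive runners lose their queued work.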
Reporting Considerations
There may be flexibility in how often runners report status.
Possible options:
- report after every completed job
- report completion in batches on a larger interval
- report periodically while also requesting more work to avoid becoming idle
Any implementation should preserve correctness while reducing the amount of idle time introduced by per-job reporting.
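The last option could be served by a single combined check-in endpoint, so one round trip both reports completions and refills the queue. A hypothetical shape (the `/runners/check-in` route and its payload are invented for illustration):

```python
def check_in(client, completed, queue_depth, batch_size, refill_threshold):
    """Report finished jobs and request more work in one round trip
    (hypothetical combined endpoint)."""
    want = batch_size - queue_depth if queue_depth <= refill_threshold else 0
    response = client.post("/runners/check-in", json={
        "completed_job_ids": [job.id for job in completed],
        "requested_jobs": want,              # 0 means "just reporting"
    })
    return response.json().get("jobs", [])   # new work, possibly empty
```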
Suggested Configuration
Potential configuration options:
- prefetch enabled/disabled
- batch size
- max queued jobs per runner
- refill threshold, for example: request more work when queue depth drops below a configured level
- lease / timeout for checked-out jobs
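For discussion, these options could map onto a config block like the following; all names and defaults are suggestions, not existing torc settings:

```python
from dataclasses import dataclass

@dataclass
class PrefetchConfig:
    """Suggested runner prefetch settings (illustrative names and defaults)."""
    enabled: bool = False            # prefetch off by default for compatibility
    batch_size: int = 8              # jobs requested per server interaction
    max_queued_jobs: int = 16        # hard cap on the local in-memory queue
    refill_threshold: int = 2        # request more work at or below this depth
    lease_timeout_s: int = 300       # server reclaims jobs after this silence
```

Keeping prefetch disabled by default would preserve current behavior for existing deployments.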
Example
Current
Runner polls, gets 1 job, runs it for ~5 seconds, reports completion, waits for next poll, then gets another job.
Proposed
Runner polls, gets a batch of eligible jobs, runs them back-to-back from memory, and refills the queue before it runs dry.
Acceptance Criteria
- Runner can request multiple jobs in a single interaction with the server
- Runner can maintain a local in-memory queue of prefetched jobs
- Runner starts the next job immediately after the previous one completes, assuming resources are available
- Batch size is configurable
- Existing resource-matching behavior is preserved
- Server can reclaim jobs that were checked out by an unresponsive runner after timeout
- GPU idle time between short jobs is measurably reduced
- Overall GPU utilization improves for short-running workloads