
Support runner-side job prefetching / batching to reduce idle time between short jobs #276

@nkeilbart

Description

Summary

Add support in torc for job runners to prefetch multiple jobs from the server and execute them sequentially from a local in-memory queue, instead of fetching only a single job at a time.

This is intended to reduce idle time on node resources, especially GPUs, between short-running jobs.

Problem

Today, a torc job runner works roughly like this:

  1. Poll the server for work
  2. Receive a single job
  3. Run the job
  4. Report completion back to the server
  5. Wait until the next polling cycle to request more work

Because polling happens on a configurable interval, there can be a gap between when one job finishes and when the next job begins. During that time, node resources are idle.

This is particularly noticeable for short-running jobs. In my current workload, individual jobs run for about 5 seconds; when the polling interval is comparable to the job duration, a node can spend nearly as much time idle as working. Running across 10 nodes, and potentially scaling to 200 nodes, the per-job fetch/report/poll cycle leaves GPUs significantly underutilized.

Requested Feature

Allow each job runner to request multiple eligible jobs at once and keep them in a local in-memory queue.

The runner should then:

  • start the next queued job immediately after the current one finishes
  • continue executing queued jobs sequentially
  • periodically check back in with the server
  • request additional work before it becomes idle, when possible

This should help keep resources busy continuously instead of waiting on the next poll cycle after every single job.
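
A minimal sketch of what that runner loop might look like, in Python. The `client.fetch_jobs` and `client.report_complete` calls are hypothetical placeholders for the runner/server interaction, not existing torc API:

```python
import collections
import time

POLL_INTERVAL_S = 10      # existing fallback poll interval
PREFETCH_COUNT = 8        # jobs requested per server interaction
REFILL_THRESHOLD = 2      # refill before the queue runs dry

def run_forever(client):
    """Prefetching runner loop (sketch; `client` calls are hypothetical)."""
    queue = collections.deque()
    while True:
        # Refill the local queue before it empties, so the next job
        # can start without waiting for a full poll cycle.
        if len(queue) <= REFILL_THRESHOLD:
            queue.extend(client.fetch_jobs(count=PREFETCH_COUNT - len(queue)))
        if not queue:
            time.sleep(POLL_INTERVAL_S)  # nothing eligible; fall back to polling
            continue
        job = queue.popleft()
        job.run()                        # execute sequentially from memory
        client.report_complete(job)      # could also be batched (see Reporting Considerations)
```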

Desired Behavior

  • A runner can prefetch more than one job at a time
  • Prefetched jobs are stored in memory on the runner
  • Jobs are executed one after another as runner resources become available
  • Batch size should be configurable
  • Prefetch amount should take available resources into account

For example:

  • if a runner has 4 GPUs, it should be able to request enough work to keep those GPUs occupied for multiple job durations
  • torc already has resource-matching logic, so the runner should continue receiving only jobs appropriate for its available resources
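
For instance, a resource-aware prefetch amount might be computed roughly like this (a sketch; the GPU counts are illustrative inputs, and torc's existing resource matching would still decide which jobs are actually returned):

```python
def prefetch_count(num_gpus: int, gpus_per_job: int, depth: int = 4) -> int:
    """How many jobs to request so each GPU slot has `depth` jobs queued.

    Sketch only: assumes homogeneous jobs; the server's existing
    resource-matching logic still filters which jobs are eligible.
    """
    concurrent_slots = max(1, num_gpus // max(1, gpus_per_job))
    return concurrent_slots * depth

# e.g. a 4-GPU runner with 1-GPU jobs keeps 16 jobs queued
assert prefetch_count(num_gpus=4, gpus_per_job=1, depth=4) == 16
```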

Why This Matters

The current one-job-at-a-time model introduces unnecessary idle time between jobs, which is especially costly for short simulations.

Expected benefits:

  • higher GPU utilization
  • reduced idle time between jobs
  • fewer poll cycles per unit of work
  • better throughput for short-duration workloads
  • less resource waste on busy clusters

The main success metric for this feature would be improved GPU utilization, ideally keeping GPUs as close to 100% utilized as possible.

Scope / Assumptions

  • This is a feature request for internal development
  • Job dependencies are already handled by the server, not the runner
  • Resource compatibility is already handled by existing torc tooling
  • The runner only needs to execute its assigned jobs in order, as local capacity allows

Failure / Recovery Considerations

If a runner checks out multiple jobs and then stops reporting back within a configured timeout window, the server should return any outstanding jobs to the available pool.

Since runners already check in regularly, this seems like a reasonable recovery model for prefetched but unfinished work.
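
A sketch of the server-side reclaim pass under that model, assuming each checked-out job tracks the time of its runner's last check-in (`last_heartbeat`, `status`, and `runner_id` are illustrative fields, not existing torc schema):

```python
import time

LEASE_TIMEOUT_S = 300  # configurable; see Suggested Configuration

def reclaim_stale_jobs(checked_out_jobs, now=None):
    """Return checked-out jobs whose runner missed its heartbeat window.

    Sketch: `job.last_heartbeat` would be updated on every runner
    check-in; reclaimed jobs go back to the available pool.
    """
    now = now if now is not None else time.time()
    stale = [j for j in checked_out_jobs
             if now - j.last_heartbeat > LEASE_TIMEOUT_S]
    for job in stale:
        job.status = "available"   # return to the pool for other runners
        job.runner_id = None
    return stale
```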

Reporting Considerations

There may be flexibility in how often runners report status. Possible options:

  • report after every completed job
  • report completion in batches on a larger interval
  • report periodically while also requesting more work to avoid becoming idle

Any implementation should preserve correctness while reducing the amount of idle time introduced by per-job reporting.
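
For instance, batched reporting could be folded into the refill request so a single round trip both reports completions and tops up the queue. `report_and_fetch` below is a hypothetical combined endpoint, not existing torc API:

```python
def check_in(client, completed, queue, prefetch_count):
    """One round trip: report finished jobs and refill the local queue.

    Sketch only; `report_and_fetch` is a hypothetical combined endpoint.
    Batching must not delay failure detection past the server's lease
    timeout, so this would still run at least once per heartbeat
    interval even when `completed` is empty.
    """
    new_jobs = client.report_and_fetch(
        completed=completed,
        requested=max(0, prefetch_count - len(queue)),
    )
    completed.clear()
    queue.extend(new_jobs)
```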

Suggested Configuration

Potential configuration options:

  • prefetch enabled/disabled
  • batch size
  • max queued jobs per runner
  • refill threshold, for example request more work when queue depth drops below a certain level
  • lease / timeout for checked-out jobs
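
These might surface in the runner configuration roughly as follows (a sketch; the option names are illustrative, not existing torc settings):

```python
from dataclasses import dataclass

@dataclass
class PrefetchConfig:
    """Sketch of possible runner options; names are illustrative."""
    enabled: bool = False        # prefetch off by default (current behavior)
    batch_size: int = 8          # jobs requested per server interaction
    max_queued: int = 16         # cap on the local in-memory queue
    refill_threshold: int = 2    # request more work below this queue depth
    lease_timeout_s: int = 300   # server reclaims jobs after this silence
```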

Example

Current

Runner polls, gets 1 job, runs it for ~5 seconds, reports completion, waits for next poll, then gets another job.

Proposed

Runner polls, gets a batch of eligible jobs, runs them back-to-back from memory, and refills the queue before it runs dry.

Acceptance Criteria

  • Runner can request multiple jobs in a single interaction with the server
  • Runner can maintain a local in-memory queue of prefetched jobs
  • Runner starts the next job immediately after the previous one completes, assuming resources are available
  • Batch size is configurable
  • Existing resource-matching behavior is preserved
  • Server can reclaim jobs that were checked out by an unresponsive runner after timeout
  • GPU idle time between short jobs is measurably reduced
  • Overall GPU utilization improves for short-running workloads
