Summary
Add support in torc for job runners to prefetch multiple jobs from the server and execute them sequentially from a local in-memory queue, instead of fetching only a single job at a time.
This is intended to reduce idle time on node resources, especially GPUs, between short-running jobs.
Problem
Today, a torc job runner works roughly like this:
- Poll the server for work
- Receive a single job
- Run the job
- Report completion back to the server
- Wait until the next polling cycle to request more work
Because polling happens on a configurable interval, there can be a gap between when one job finishes and when the next job begins. During that time, node resources are idle.
This is particularly noticeable for short-running jobs. In my current workload, individual jobs run for about 5 seconds. When running across 10 nodes, and potentially scaling to 200, the fetch/report/poll cycle can leave GPUs underutilized.
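For concreteness, the current single-job cycle looks roughly like the sketch below. All names here (`fetch_one_job`, `report_completion`) are hypothetical placeholders, not torc's actual API:

```python
import time

POLL_INTERVAL_S = 30  # stand-in for the configurable polling interval

def run_forever(client):
    """Current model: fetch and run one job per poll cycle (names hypothetical)."""
    while True:
        job = client.fetch_one_job()       # poll the server for work
        if job is not None:
            job.run()                      # ~5 s for the short jobs described above
            client.report_completion(job)  # report completion back to the server
        time.sleep(POLL_INTERVAL_S)        # idle until the next polling cycle
```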
Requested Feature
Allow each job runner to request multiple eligible jobs at once and keep them in a local in-memory queue.
The runner should then:
- start the next queued job immediately after the current one finishes
- continue executing queued jobs sequentially
- periodically check back in with the server
- request additional work before it becomes idle, when possible
This should help keep resources busy continuously instead of waiting on the next poll cycle after every single job.
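A minimal sketch of that loop, assuming a hypothetical client API (`fetch_jobs`, `report_completion`) and a simple refill threshold; none of these names are existing torc interfaces:

```python
from collections import deque

BATCH_SIZE = 8        # jobs fetched per server interaction (configurable)
REFILL_THRESHOLD = 2  # request more work when the queue gets this low

def run_forever(client):
    """Proposed model: prefetch a batch, run jobs back-to-back (names hypothetical)."""
    queue = deque(client.fetch_jobs(limit=BATCH_SIZE))
    while queue:
        job = queue.popleft()
        job.run()                          # next job starts immediately, no poll wait
        client.report_completion(job)
        if len(queue) <= REFILL_THRESHOLD:  # refill before the queue runs dry
            queue.extend(client.fetch_jobs(limit=BATCH_SIZE - len(queue)))
```

Refilling at a low threshold rather than at empty is what lets the runner request additional work before it becomes idle.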
Desired Behavior
- A runner can prefetch more than one job at a time
- Prefetched jobs are stored in memory on the runner
- Jobs are executed one after another as runner resources become available
- Batch size should be configurable
- Prefetch amount should take available resources into account
For example:
- if a runner has 4 GPUs, it should be able to request enough work to keep those GPUs occupied for multiple job durations
- torc already has resource-matching logic, so the runner should continue receiving only jobs appropriate for its available resources
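As a rough illustration, the prefetch amount could be derived from free GPUs and per-job GPU requirements. The heuristic and the `target_cycles` multiplier below are assumptions for discussion, not existing torc parameters:

```python
def prefetch_count(free_gpus: int, gpus_per_job: int, target_cycles: int = 3) -> int:
    """Request enough jobs to keep every free GPU busy for several
    back-to-back job durations (illustrative heuristic only)."""
    concurrent_jobs = max(free_gpus // gpus_per_job, 1)
    return concurrent_jobs * target_cycles

# A runner with 4 free GPUs running 1-GPU jobs would request 12 jobs.
print(prefetch_count(free_gpus=4, gpus_per_job=1))  # -> 12
```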
Why This Matters
The current one-job-at-a-time model introduces unnecessary idle time between jobs, which is especially costly for short simulations.
Expected benefits:
- higher GPU utilization
- reduced idle time between jobs
- fewer poll cycles per unit of work
- better throughput for short-duration workloads
- less resource waste on busy clusters
The main success metric for this feature would be improved GPU utilization, ideally keeping GPUs as close to 100% utilized as possible.
Scope / Assumptions
- This is a feature request for internal development
- Job dependencies are already handled by the server, not the runner
- Resource compatibility is already handled by existing torc tooling
- The runner only needs to execute its assigned jobs in order, based on available local capacity
Failure / Recovery Considerations
If a runner checks out multiple jobs and then stops reporting back within a configured timeout window, the server should return any outstanding jobs to the available pool.
Since runners already check in regularly, this seems like a reasonable recovery model for prefetched but unfinished work.
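One possible server-side model is a standard lease sweep; the schema below (a `checked_out` state, a `lease_expires_at` field) is hypothetical and only meant to illustrate the idea:

```python
from datetime import datetime, timezone

def reclaim_expired_leases(db) -> int:
    """Return jobs held by unresponsive runners to the available pool
    (hypothetical schema: jobs carry a lease that runners renew on check-in)."""
    now = datetime.now(timezone.utc)
    reclaimed = 0
    for job in db.jobs_in_state("checked_out"):
        if job.lease_expires_at < now:       # runner missed its check-in window
            job.state = "available"
            job.assigned_runner = None
            reclaimed += 1
    return reclaimed
```

Runners that are merely slow, not dead, would renew their lease on each regular check-in, so only genuinely unresponsive runners lose their queued work.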
Reporting Considerations
There may be flexibility in how often runners report status.
Possible options:
- report after every completed job
- report completion in batches on a larger interval
- report periodically while also requesting more work to avoid becoming idle
Any implementation should preserve correctness while reducing the amount of idle time introduced by per-job reporting.
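The last option could be served by a single combined check-in endpoint, so one round trip both reports completions and refills the queue. A hypothetical shape (the `/runners/check-in` route and its payload are invented for illustration):

```python
def check_in(client, completed, queue_depth, batch_size, refill_threshold):
    """Report finished jobs and request more work in one round trip
    (hypothetical combined endpoint)."""
    want = batch_size - queue_depth if queue_depth <= refill_threshold else 0
    response = client.post("/runners/check-in", json={
        "completed_job_ids": [job.id for job in completed],
        "requested_jobs": want,              # 0 means "just reporting"
    })
    return response.json().get("jobs", [])   # new work, possibly empty
```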
Suggested Configuration
Potential configuration options:
- prefetch enabled/disabled
- batch size
- max queued jobs per runner
- refill threshold, for example: request more work when queue depth drops below a configured level
- lease / timeout for checked-out jobs
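For discussion, these options could map onto a config block like the following; all names and defaults are suggestions, not existing torc settings:

```python
from dataclasses import dataclass

@dataclass
class PrefetchConfig:
    """Suggested runner prefetch settings (illustrative names and defaults)."""
    enabled: bool = False            # prefetch off by default for compatibility
    batch_size: int = 8              # jobs requested per server interaction
    max_queued_jobs: int = 16        # hard cap on the local in-memory queue
    refill_threshold: int = 2        # request more work at or below this depth
    lease_timeout_s: int = 300       # server reclaims jobs after this silence
```

Keeping prefetch disabled by default would preserve current behavior for existing deployments.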
Example
Current
Runner polls, gets 1 job, runs it for ~5 seconds, reports completion, waits for next poll, then gets another job.
Proposed
Runner polls, gets a batch of eligible jobs, runs them back-to-back from memory, and refills the queue before it runs dry.
Acceptance Criteria
- Runner can request multiple jobs in a single interaction with the server
- Runner can maintain a local in-memory queue of prefetched jobs
- Runner starts the next job immediately after the previous one completes, assuming resources are available
- Batch size is configurable
- Existing resource-matching behavior is preserved
- Server can reclaim jobs that were checked out by an unresponsive runner after timeout
- GPU idle time between short jobs is measurably reduced
- Overall GPU utilization improves for short-running workloads