Reduce SQLite write contention by introducing in-memory job buffering and batched database updates in torc server #277

@nkeilbart

Description

Summary

The torc server currently handles job requests from many job runners by interacting with the database for each request and job state update. With a large number of runners and short job durations (around 5 seconds), this creates heavy pressure on SQLite’s single-writer model and leads to lock contention.

This issue proposes changing the torc server to maintain a bounded in-memory set of ready jobs and to batch state updates back to the database on a periodic interval, instead of performing database writes for each individual runner interaction.

Problem

Today, job runners contact the torc server for work, and the server interacts with SQLite to claim and update jobs. Since SQLite only allows one writer at a time, a workload with many runners and short-lived jobs causes contention around that write path.

This is especially noticeable when:

many job runners are polling for work
jobs complete quickly, around 5 seconds
each request results in DB activity for claim/update operations
priority ordering must still be respected on each claim

The current model amplifies writes and lock contention because the effective write rate scales with runner activity.

Proposed change

Move job dispatch and near-term state tracking into the torc server process:

maintain a bounded in-memory queue or pool of ready-to-run jobs
periodically poll the database to refill that in-memory pool
hand jobs to runners directly from memory when they request work
accept runner status updates immediately in memory
periodically flush accumulated state changes back to the database in batches

Under this model, the torc server remains the only writer to the database, but instead of writing once per runner interaction, it writes on a controlled periodic cadence.
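
To make that shape concrete, here is a minimal sketch of the two in-memory structures this implies: a bounded pool of ready jobs that runners draw from, and a buffer of state transitions awaiting the next flush. Python is used purely for illustration, and the class and method names are hypothetical rather than part of the existing torc code.

```python
# Sketch only: hypothetical class and method names, not the existing torc API.
import collections
import threading


class InMemoryDispatcher:
    """Bounded pool of ready jobs plus a buffer of pending state updates.

    The torc server stays the only database writer; runners touch these
    structures only through the server's request handlers.
    """

    def __init__(self, max_ready_jobs: int = 500):
        self._lock = threading.Lock()
        self._ready = collections.deque()    # jobs prefetched from SQLite
        self._pending_updates = []           # state transitions awaiting flush
        self.max_ready_jobs = max_ready_jobs

    def remaining_capacity(self) -> int:
        with self._lock:
            return self.max_ready_jobs - len(self._ready)

    def extend_ready(self, job_ids) -> None:
        """Add newly prefetched jobs to the bounded in-memory pool."""
        with self._lock:
            self._ready.extend(job_ids)

    def assign_next(self):
        """Hand a job to a runner without an immediate database write."""
        with self._lock:
            return self._ready.popleft() if self._ready else None

    def record_update(self, job_id, new_state) -> None:
        """Accept a runner status update immediately, in memory."""
        with self._lock:
            self._pending_updates.append((job_id, new_state))

    def drain_updates(self):
        """Hand back all buffered updates so they can be flushed in one batch."""
        with self._lock:
            updates, self._pending_updates = self._pending_updates, []
            return updates
```
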
Goals

reduce SQLite write contention
reduce lock contention caused by many runners competing through the server
decouple runner request rate from database write rate
preserve existing priority behavior
support bounded prefetching so memory usage and staleness remain controlled

Non-goals

changing the runner-to-server communication pattern
changing job semantics beyond temporary in-memory buffering
guaranteeing that in-memory state survives a torc server crash

Expected behavior

A possible implementation would look like this:

Server periodically queries SQLite for a bounded set of ready jobs.
The number prefetched is based on estimated runnable capacity over the next polling interval, plus a configurable buffer.
Jobs are ordered or selected in a way that preserves current priority semantics.
Runners request work from the server.
The server assigns jobs from the in-memory pool without requiring an immediate DB write for each request.
Runner updates, such as started, completed, or failed, are recorded in memory immediately.
The server flushes accumulated state transitions back to SQLite on a periodic interval.
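
The refill and flush steps above could look roughly like the following sketch. It assumes the hypothetical InMemoryDispatcher from the earlier sketch and a jobs table with id, status, and priority columns; the intervals, column names, and limits are illustrative and not taken from the actual torc schema.

```python
# Sketch only: assumes the hypothetical InMemoryDispatcher above and a
# jobs(id, status, priority) table; intervals and limits are illustrative.
import sqlite3

POLL_INTERVAL_S = 2.0   # refill cadence
PREFETCH_BUFFER = 20    # extra jobs beyond the estimated runnable capacity


def refill_ready_jobs(conn: sqlite3.Connection, dispatcher, recent_jobs_per_sec: float):
    """Prefetch a bounded, priority-ordered batch of ready jobs."""
    # Estimated runnable capacity over the next polling interval, plus a buffer.
    want = int(recent_jobs_per_sec * POLL_INTERVAL_S) + PREFETCH_BUFFER
    want = min(want, dispatcher.remaining_capacity())
    if want <= 0:
        return
    # NOTE: a real implementation must also exclude jobs already held in memory
    # (an in-memory ID set or persisted reservation); see the open questions below.
    rows = conn.execute(
        "SELECT id FROM jobs WHERE status = 'ready' "
        "ORDER BY priority DESC, id LIMIT ?",
        (want,),
    ).fetchall()
    dispatcher.extend_ready(row[0] for row in rows)


def flush_updates(conn: sqlite3.Connection, dispatcher):
    """Persist all buffered state transitions in a single write transaction."""
    updates = dispatcher.drain_updates()
    if not updates:
        return
    with conn:  # one transaction per flush instead of one write per runner call
        conn.executemany(
            "UPDATE jobs SET status = ? WHERE id = ?",
            [(state, job_id) for job_id, state in updates],
        )
```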

Rationale

This should shift the SQLite write rate from scaling with the number of active runners to being driven primarily by the torc server’s flush interval. That should significantly reduce contention on SQLite’s single-writer path.

For workloads with many runners and short jobs, this should improve scheduler throughput and responsiveness while keeping SQLite as the durable source of truth, updated on a delayed basis.

Tradeoffs and considerations

Crash recovery: If the torc server crashes before flushing in-memory updates, some jobs may not be marked complete in SQLite and may be re-run. This is acceptable for this proposal.
Temporary DB staleness: SQLite will lag behind the server’s in-memory state until the next flush.
Priority correctness: Any prefetching logic must preserve existing priority behavior.
Bounded memory: The in-memory ready queue should be capped and not attempt to load all ready jobs.
Flush interval tuning: There is a tradeoff between lower contention and longer persistence delay.
Observability: It may be useful to expose metrics for prefetched jobs, flush latency, batch sizes, and queue depth.

Possible implementation details

configurable polling interval for refilling ready jobs
configurable flush interval for persisting updates
configurable prefetch window and buffer size
adaptive prefetch sizing based on recent runner throughput
batching of state transitions in a single DB write transaction where possible
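
A possible shape for these knobs, including the adaptive prefetch sizing, is sketched below. All option names and defaults are placeholders rather than existing torc configuration.

```python
# Sketch only: placeholder option names and defaults, not existing torc settings.
from dataclasses import dataclass


@dataclass
class BufferingConfig:
    poll_interval_s: float = 2.0    # how often the ready pool is refilled
    flush_interval_s: float = 2.0   # how often buffered updates are persisted
    max_ready_jobs: int = 500       # hard cap on the in-memory ready pool
    prefetch_buffer: int = 20       # extra jobs beyond the estimated capacity
    adaptive_prefetch: bool = True  # scale prefetch with recent throughput


def prefetch_target(cfg: BufferingConfig, recent_jobs_per_sec: float) -> int:
    """Adaptive prefetch sizing based on recent runner throughput."""
    if not cfg.adaptive_prefetch:
        return cfg.prefetch_buffer
    estimated = int(recent_jobs_per_sec * cfg.poll_interval_s)
    return min(estimated + cfg.prefetch_buffer, cfg.max_ready_jobs)
```
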

Open questions

Should claimed jobs be marked in the DB only on flush, or should some lightweight reservation metadata be persisted earlier?
Should the prefetch window be fixed, configurable, or adaptive?
How should jobs be handled if they are prefetched but not assigned before the next refill cycle?
Are there any fairness concerns across priorities or queues when jobs are prefetched in batches?

Success criteria

fewer SQLite lock contention events under high runner counts
reduced time runners spend waiting for job assignment
reduced database write frequency relative to runner request frequency
no regression in priority scheduling behavior
