feat(backend): run scheduled tasks with ShedLock#6705
feat(backend): run scheduled tasks with ShedLock#6705theosanderson-agent wants to merge 12 commits into
Conversation
Introduce ShedLock so that the backend's scheduled tasks execute on only one replica at a time, instead of once per replica. This addresses #6704: previously the effective frequency of every @scheduled task scaled with the number of backend replicas. ShedLock coordinates via a shared `shedlock` table in the existing Postgres database (added in migration V1.31). Before a task runs, the replica tries to acquire a row-level lock keyed by the task name; only the replica that wins runs the task, the others skip that tick. Changes: - Add shedlock-spring and shedlock-provider-jdbc-template (7.7.0) deps. - Register a JdbcTemplateLockProvider backed by the existing DataSource, using `usingDbTime()` so lock timing relies on the database clock and is unaffected by clock drift between replicas. - Enable locking with @EnableSchedulerLock (defaultLockAtMostFor=PT30M as a safety net if a replica dies mid-task). - Annotate all four scheduled tasks with @SchedulerLock, each with a lockAtMostFor sized above its expected runtime. - Add Flyway migration V1.31 creating the `shedlock` table. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Claude finished @anna-parker's task in 2m 6s —— View job Code Review
Overall this is a clean, well-scoped implementation. The approach (ShedLock backed by the existing Postgres DB,
Test coverage for the lock mechanism — The existing task tests call |
…ble.sql Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
|
OK, I will let you/someone else merge as I haven't looked into this enough to take responsibility in the way that merging would imply (thanks a lot for checking!) |
|
@corneliusroemer and I took a closer look and discovered that the behavior is slightly different than I expected from quickly reading the docs.
If a second instance tries to start a task while the first instance is running the same task (i.e. that task is locked in the table) the second task run will be skipped. However, if the second instance tried to run the task AFTER the initial instance has finished running (and removed the lock) the second task would still run. As backends will not all start in unison in theory we could get to a state where a task is still run multiple times during its defined frequency period. This will definitely prevent peaks of load when multiple backends start a task in unison but not fully prevent multiple executions of a task. I believe that a way to fix this would be to set the
|
|
Yes my vague expectation was what you say. Avoiding parallel runs is anyway a good start. I was thinking we could maybe run on a timetable based on times and then discard the runs that are stopped by locks (possibly) |
|
alternative suggestion that should handle these edge cases with self rolled code (quite simple as only covers what we need and leads to no external dependencies): #6711 |
|
From discussing with @theosanderson we should increase the This will require some test changes |
|
yeah I guess I would phrase it differently, I would say we set |
…astFor The previous ShedLock setup only set lockAtMostFor, which prevents *simultaneous* runs but not frequency-scaling with replica count: because the lock is released as soon as the task finishes and replicas fire out of sync, the task could still run up to N times per interval with N replicas. Add a configurable lockAtLeastFor to every scheduled task so the lock is held for at least the desired interval, which is what actually guarantees "run once per interval regardless of replica count". The schedulers now poll frequently (ShedLock skips rather than queues when a lock is held), and lockAtLeastFor governs the real cadence. For the GC-style aux cleanup, lockAtMostFor is set above lockAtLeastFor so a long run keeps the lock (avoiding parallelism) while still releasing if a replica dies mid-task. Lock minimums are overridable via loculus.locks.<task>.atLeast and are set to PT0S in tests (ShedLock's in-memory LockRecordRegistry is incompatible with truncating the shedlock table between tests). ShedLockIntegrationTest now verifies lockAtLeastFor directly through the LockProvider. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca6520502e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Addresses Codex review: useNewerProcessingPipelineVersion hardcoded a lockAtLeastFor of PT10S, which silently overrode operator-configured pipelineVersionUpgradeCheckIntervalSeconds values below 10s (schema allows down to 1s). lockAtLeastFor now defaults to the configured interval via a nested placeholder, so the interval is honored rather than capped. The Helm chart passes both bounds derived from the same value (lockAtLeastFor = interval, lockAtMostFor = 5x interval) so the at-most ceiling always stays above the floor for any configured interval. cleanUpStaleSequencesInProcessing likewise defaults its floor to run-every-seconds. ShedLock does not evaluate SpEL in the duration attributes (only in name), so the 5x is computed in the chart rather than the annotation. Added ShedLockPropertyResolutionTest to cover the production default path, which the DB-backed test can't (it always overrides atLeast to PT0S). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…piry boundary cleanUpStaleSequencesInProcessing used fixedRate with lockAtLeastFor equal to the run-every interval. fixedRate fires on a wall-clock grid every R while the lock expires at acquire+R, so the next tick lands exactly on the expiry boundary and is skipped or not depending on sub-millisecond clock jitter, making the effective cadence drift toward ~2x. Switching to fixedDelay schedules from completion, so each poll fires after lockAtLeastFor has already elapsed since acquisition and reliably re-acquires, while still holding the lock for the full interval (at-most-once per interval across replicas preserved). The other tasks are unaffected: aux/crossref poll far more often than their lock, and pipeline is already fixedDelay. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove restatements of the lock mechanism that the annotation values already convey, keeping only the non-obvious rationale (fixedDelay choice, the unique test lock name, and why the property-resolution test exists). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| // non-Helm fallback. Both overridable via `loculus.locks.*` (tests use PT0S). | ||
| lockAtLeastFor = "\${loculus.locks.useNewerProcessingPipelineVersion.atLeast:" + | ||
| "PT\${${BackendSpringProperty.PIPELINE_VERSION_UPGRADE_CHECK_INTERVAL_SECONDS}}S}", | ||
| lockAtMostFor = "\${loculus.locks.useNewerProcessingPipelineVersion.atMost:PT1M}", |
There was a problem hiding this comment.
This hard coding is problematic when the task starts taking longer on big instances
There was a problem hiding this comment.
Ugh this is actually configurable but bypassing our explicit config variables
There was a problem hiding this comment.
yeah agree it could be confusing to have this set in two ways. Would be happy to discard one
|
Closing as Anya and Cornelius prefer a hand-rolled version: #6711 |
Summary
Implementation of ShedLock to close #6704.
Today every
@Scheduledtask runs independently on each backend replica, so the effective execution frequency scales with the replica count (e.g. with 3 replicas the cleanup task runs ~3× as often, and the CrossRef citation fetch hits the external service 3×). ShedLock fixes this by having replicas compete for a shared, named lock before running a task — only the winner executes that tick; the rest skip it. The result is "run once per interval, regardless of replica count".How it works
ShedLock stores locks in a small
shedlocktable in the existing Postgres database (no new infrastructure). When a scheduled method fires:INSERT/UPDATEa row keyed by the task name, settinglock_until.lockAtMostForis a safety net: if the holding replica crashes mid-task, the lock auto-expires after that duration so another replica can take over. We use the database clock (usingDbTime()) for all lock timing, so clock drift between replicas can't cause double-runs.Changes
build.gradle— addnet.javacrumbs.shedlock:shedlock-springandshedlock-provider-jdbc-template(7.7.0);gradle.lockfileregenerated (STRICT lock mode).BackendSpringConfig.kt—@EnableSchedulerLock(defaultLockAtMostFor = "PT30M")and aJdbcTemplateLockProviderbean backed by the existingDataSourcewithusingDbTime().Migration
V1.31__add_shedlock_table.sql— creates theshedlocktable.All four scheduled tasks annotated with
@SchedulerLock, eachlockAtMostForsized above its expected runtime:lockAtMostForCleanUpStaleSequencesInProcessingTaskPT5MUseNewerProcessingPipelineVersionTaskPT10MCleanUpAuxTableTaskPT15MSeqSetCrossRefCitationsTaskPT1HNote the
org.jetbrains.kotlin.plugin.springplugin auto-opens Spring components, so the@SchedulerLockmethods are proxyable without extraopenmodifiers.Testing
./gradlew compileKotlin ktlintFormat— pass../gradlew dependencies --write-locks— lockfile updated.BackendApplicationTest(context load with mockedDataSource) — pass; thelockProviderbean wires up without touching the DB at construction.EndpointTestagainst the Postgres test container — pass; Flyway log confirmsMigrating schema "public" to version "1.31 - add shedlock table"andSuccessfully applied 32 migrations.🤖 Generated with Claude Code
🚀 Preview: Add
previewlabel to enable