Skip to content

feat(backend): run scheduled tasks with ShedLock#6705

Closed
theosanderson-agent wants to merge 12 commits into
mainfrom
feat/backend-shedlock-scheduled-tasks
Closed

feat(backend): run scheduled tasks with ShedLock#6705
theosanderson-agent wants to merge 12 commits into
mainfrom
feat/backend-shedlock-scheduled-tasks

Conversation

@theosanderson-agent

@theosanderson-agent theosanderson-agent commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Implementation of ShedLock to close #6704.

Today every @Scheduled task runs independently on each backend replica, so the effective execution frequency scales with the replica count (e.g. with 3 replicas the cleanup task runs ~3× as often, and the CrossRef citation fetch hits the external service 3×). ShedLock fixes this by having replicas compete for a shared, named lock before running a task — only the winner executes that tick; the rest skip it. The result is "run once per interval, regardless of replica count".

How it works

ShedLock stores locks in a small shedlock table in the existing Postgres database (no new infrastructure). When a scheduled method fires:

  1. The replica attempts to INSERT/UPDATE a row keyed by the task name, setting lock_until.
  2. If it wins the row lock, it runs the task and releases the lock when done.
  3. Other replicas, seeing an unexpired lock, skip that execution.

lockAtMostFor is a safety net: if the holding replica crashes mid-task, the lock auto-expires after that duration so another replica can take over. We use the database clock (usingDbTime()) for all lock timing, so clock drift between replicas can't cause double-runs.

Changes

  • build.gradle — add net.javacrumbs.shedlock:shedlock-spring and shedlock-provider-jdbc-template (7.7.0); gradle.lockfile regenerated (STRICT lock mode).

  • BackendSpringConfig.kt@EnableSchedulerLock(defaultLockAtMostFor = "PT30M") and a JdbcTemplateLockProvider bean backed by the existing DataSource with usingDbTime().

  • Migration V1.31__add_shedlock_table.sql — creates the shedlock table.

  • All four scheduled tasks annotated with @SchedulerLock, each lockAtMostFor sized above its expected runtime:

    Task Interval lockAtMostFor
    CleanUpStaleSequencesInProcessingTask ~60s PT5M
    UseNewerProcessingPipelineVersionTask ~10s PT10M
    CleanUpAuxTableTask 1h PT15M
    SeqSetCrossRefCitationsTask 6h PT1H

Note the org.jetbrains.kotlin.plugin.spring plugin auto-opens Spring components, so the @SchedulerLock methods are proxyable without extra open modifiers.

Testing

  • ./gradlew compileKotlin ktlintFormat — pass.
  • ./gradlew dependencies --write-locks — lockfile updated.
  • BackendApplicationTest (context load with mocked DataSource) — pass; the lockProvider bean wires up without touching the DB at construction.
  • A DB-backed EndpointTest against the Postgres test container — pass; Flyway log confirms Migrating schema "public" to version "1.31 - add shedlock table" and Successfully applied 32 migrations.

🤖 Generated with Claude Code

🚀 Preview: Add preview label to enable

Introduce ShedLock so that the backend's scheduled tasks execute on only
one replica at a time, instead of once per replica. This addresses #6704:
previously the effective frequency of every @scheduled task scaled with
the number of backend replicas.

ShedLock coordinates via a shared `shedlock` table in the existing Postgres
database (added in migration V1.31). Before a task runs, the replica tries
to acquire a row-level lock keyed by the task name; only the replica that
wins runs the task, the others skip that tick.

Changes:
- Add shedlock-spring and shedlock-provider-jdbc-template (7.7.0) deps.
- Register a JdbcTemplateLockProvider backed by the existing DataSource,
  using `usingDbTime()` so lock timing relies on the database clock and is
  unaffected by clock drift between replicas.
- Enable locking with @EnableSchedulerLock (defaultLockAtMostFor=PT30M as a
  safety net if a replica dies mid-task).
- Annotate all four scheduled tasks with @SchedulerLock, each with a
  lockAtMostFor sized above its expected runtime.
- Add Flyway migration V1.31 creating the `shedlock` table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@anna-parker anna-parker marked this pull request as ready for review June 17, 2026 14:07
@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @anna-parker's task in 2m 6s —— View job


Code Review

  • Gather context
  • Read changed files
  • Analyze code quality, security, performance, and testing
  • Post inline comments
  • Finalize review

Overall this is a clean, well-scoped implementation. The approach (ShedLock backed by the existing Postgres DB, usingDbTime() for clock-safe locking) is the right one for this problem. Two things worth addressing before merging, and one smaller note:

TIMESTAMP vs TIMESTAMP WITH TIME ZONE in migration — The ShedLock PostgreSQL documentation recommends TIMESTAMP WITH TIME ZONE for the lock_until and locked_at columns so that timezone-aware comparisons work correctly when usingDbTime() calls NOW() (which returns timestamptz). Plain TIMESTAMP will work because PostgreSQL silently strips the timezone on assignment, but aligning with the documented schema is safer. Inline suggestion posted.

lockAtMostFor for UseNewerProcessingPipelineVersionTask — This task fires every 10 seconds; a PT10M safety timeout means a crashed replica locks out all others for up to 10 minutes. The actual work is a lightweight DB read, so PT1M is still generous and keeps the failure window much smaller. Inline suggestion posted.

Test coverage for the lock mechanism — The existing task tests call task() directly (bypassing the ShedLock AOP proxy), so they never exercise the lock itself. The business logic tests are fine as-is, but there's no test confirming the shedlock table gets populated or that the lock correctly prevents double-execution. This may be acceptable for a demo PR, but worth tracking before the replica count is increased. Inline comment posted.

anna-parker and others added 3 commits June 17, 2026 16:14
…ble.sql

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
@anna-parker anna-parker self-requested a review June 17, 2026 14:23
anna-parker
anna-parker previously approved these changes Jun 17, 2026

@anna-parker anna-parker left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

I confirmed tasks are running on the preview!

@anna-parker anna-parker added the preview Triggers a deployment to argocd label Jun 17, 2026
@theosanderson

Copy link
Copy Markdown
Member

OK, I will let you/someone else merge as I haven't looked into this enough to take responsibility in the way that merging would imply (thanks a lot for checking!)

@anna-parker

anna-parker commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

@corneliusroemer and I took a closer look and discovered that the behavior is slightly different than I expected from quickly reading the docs.

  1. Shedlock prevents instances running a task in parallel. It does not check that a task is executed only as many times as defined by the task schedule frequency.

If a second instance tries to start a task while the first instance is running the same task (i.e. that task is locked in the table) the second task run will be skipped. However, if the second instance tried to run the task AFTER the initial instance has finished running (and removed the lock) the second task would still run. As backends will not all start in unison in theory we could get to a state where a task is still run multiple times during its defined frequency period. This will definitely prevent peaks of load when multiple backends start a task in unison but not fully prevent multiple executions of a task.

I believe that a way to fix this would be to set the lockAtMostFor and lockAtLeastFor parameters to the task frequency. However, I am wondering if it makes sense to just add code for this instead of using a library.

  1. lockAtMostFor should be greater than the time the task is expected to run for so that another task doesnt try to start while the first task is still running.

@theosanderson

Copy link
Copy Markdown
Member

Yes my vague expectation was what you say. Avoiding parallel runs is anyway a good start. I was thinking we could maybe run on a timetable based on times and then discard the runs that are stopped by locks (possibly)

@anna-parker

Copy link
Copy Markdown
Contributor

alternative suggestion that should handle these edge cases with self rolled code (quite simple as only covers what we need and leads to no external dependencies): #6711

@anna-parker

Copy link
Copy Markdown
Contributor

From discussing with @theosanderson we should increase the lockAtMostFor to at least the task run frequency for all tasks (for the GC task this can be higher), and probably set lockAtLeastFor to the task frequency (slightly less is ok if we are ok with potentially higher task run frequency).

This will require some test changes

@theosanderson

Copy link
Copy Markdown
Member

yeah I guess I would phrase it differently, I would say we set lockAtLeastFor to the desired frequency. We set lockAtMostFor to the highest interval we would be prepared to tolerate in the rare event that a job died.

…astFor

The previous ShedLock setup only set lockAtMostFor, which prevents
*simultaneous* runs but not frequency-scaling with replica count: because
the lock is released as soon as the task finishes and replicas fire out of
sync, the task could still run up to N times per interval with N replicas.

Add a configurable lockAtLeastFor to every scheduled task so the lock is
held for at least the desired interval, which is what actually guarantees
"run once per interval regardless of replica count". The schedulers now poll
frequently (ShedLock skips rather than queues when a lock is held), and
lockAtLeastFor governs the real cadence. For the GC-style aux cleanup,
lockAtMostFor is set above lockAtLeastFor so a long run keeps the lock
(avoiding parallelism) while still releasing if a replica dies mid-task.

Lock minimums are overridable via loculus.locks.<task>.atLeast and are set
to PT0S in tests (ShedLock's in-memory LockRecordRegistry is incompatible
with truncating the shedlock table between tests). ShedLockIntegrationTest
now verifies lockAtLeastFor directly through the LockProvider.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@theosanderson

Copy link
Copy Markdown
Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca6520502e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Theo Sanderson and others added 3 commits June 19, 2026 10:27
Addresses Codex review: useNewerProcessingPipelineVersion hardcoded a
lockAtLeastFor of PT10S, which silently overrode operator-configured
pipelineVersionUpgradeCheckIntervalSeconds values below 10s (schema allows
down to 1s).

lockAtLeastFor now defaults to the configured interval via a nested
placeholder, so the interval is honored rather than capped. The Helm chart
passes both bounds derived from the same value (lockAtLeastFor = interval,
lockAtMostFor = 5x interval) so the at-most ceiling always stays above the
floor for any configured interval. cleanUpStaleSequencesInProcessing
likewise defaults its floor to run-every-seconds.

ShedLock does not evaluate SpEL in the duration attributes (only in name),
so the 5x is computed in the chart rather than the annotation. Added
ShedLockPropertyResolutionTest to cover the production default path, which
the DB-backed test can't (it always overrides atLeast to PT0S).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…piry boundary

cleanUpStaleSequencesInProcessing used fixedRate with lockAtLeastFor equal to
the run-every interval. fixedRate fires on a wall-clock grid every R while the
lock expires at acquire+R, so the next tick lands exactly on the expiry
boundary and is skipped or not depending on sub-millisecond clock jitter,
making the effective cadence drift toward ~2x.

Switching to fixedDelay schedules from completion, so each poll fires after
lockAtLeastFor has already elapsed since acquisition and reliably re-acquires,
while still holding the lock for the full interval (at-most-once per interval
across replicas preserved). The other tasks are unaffected: aux/crossref poll
far more often than their lock, and pipeline is already fixedDelay.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove restatements of the lock mechanism that the annotation values already
convey, keeping only the non-obvious rationale (fixedDelay choice, the unique
test lock name, and why the property-resolution test exists).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@theosanderson theosanderson changed the title feat(backend): run scheduled tasks on a single replica with ShedLock feat(backend): run scheduled tasks with ShedLock Jun 19, 2026
// non-Helm fallback. Both overridable via `loculus.locks.*` (tests use PT0S).
lockAtLeastFor = "\${loculus.locks.useNewerProcessingPipelineVersion.atLeast:" +
"PT\${${BackendSpringProperty.PIPELINE_VERSION_UPGRADE_CHECK_INTERVAL_SECONDS}}S}",
lockAtMostFor = "\${loculus.locks.useNewerProcessingPipelineVersion.atMost:PT1M}",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hard coding is problematic when the task starts taking longer on big instances

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh this is actually configurable but bypassing our explicit config variables

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah agree it could be confusing to have this set in two ways. Would be happy to discard one

@theosanderson

Copy link
Copy Markdown
Member

Closing as Anya and Cornelius prefer a hand-rolled version: #6711

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend related to the loculus backend component update_db_schema

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make scheduled tasks not scale in frequency with number of backend replicas

5 participants