# feat(jobs): durable JobStore backends (file + astra) #58
Merged
Adds two durable `JobStore` impls so async ingest job records
survive a process restart. The backend is auto-matched to
`controlPlane.driver` — `memory` keeps the existing ephemeral
behavior, `file` writes `<root>/jobs.json` alongside the other
control-plane files, and `astra` reuses the already-open tables
bundle to read/write `wb_jobs_by_workspace`.
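Here is a minimal sketch of write-through persistence to `jobs.json`. It assumes a simplified synchronous store. `FileJobStore` and `JobRecord` are hypothetical names, and the temp-file-plus-rename write is an illustrative choice, not necessarily what the `file` backend does. Reopening a second instance over the same root stands in for a process restart.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical job record; the real JobStore types may carry more fields.
interface JobRecord {
  workspace: string;
  jobId: string;
  status: "pending" | "running" | "succeeded" | "failed";
  result?: unknown;
}

// Sketch of a file-backed store: every job lives in <root>/jobs.json.
class FileJobStore {
  private readonly file: string;
  private readonly jobs = new Map<string, JobRecord>();

  constructor(root: string) {
    this.file = path.join(root, "jobs.json");
    // Reload whatever a previous process persisted.
    if (fs.existsSync(this.file)) {
      const rows: JobRecord[] = JSON.parse(fs.readFileSync(this.file, "utf8"));
      for (const row of rows) this.jobs.set(FileJobStore.key(row.workspace, row.jobId), row);
    }
  }

  // Composite key scoped by workspace; JSON encoding avoids separator clashes.
  private static key(workspace: string, jobId: string): string {
    return JSON.stringify([workspace, jobId]);
  }

  create(job: JobRecord): void {
    this.jobs.set(FileJobStore.key(job.workspace, job.jobId), job);
    this.flush();
  }

  get(workspace: string, jobId: string): JobRecord | undefined {
    return this.jobs.get(FileJobStore.key(workspace, jobId));
  }

  // Write-through: rewrite the whole file via temp file + rename, so readers
  // never see a half-written jobs.json (rename is atomic on POSIX filesystems).
  private flush(): void {
    const tmp = `${this.file}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(Array.from(this.jobs.values()), null, 2));
    fs.renameSync(tmp, this.file);
  }
}

// Usage: a second instance over the same root simulates a restart.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "jobs-"));
new FileJobStore(root).create({ workspace: "ws1", jobId: "j1", status: "running" });
const reopened = new FileJobStore(root);
console.log(reopened.get("ws1", "j1")?.status); // → running
```

Rewriting the whole file on every change keeps the on-disk format trivial to inspect. That is only reasonable while the job set stays small.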
## Shared plumbing
- `src/jobs/subscriptions.ts` — in-process pub/sub helper. Every
backend reuses it, so listener management can't drift between them.
- `src/jobs/memory-store.ts` — refactored to use `JobSubscriptions`.
`applyUpdate()` extracted as the shared patch helper so all three
backends agree on update semantics.
- `src/jobs/factory.ts` — builds the right impl from
`ControlPlaneConfig`. Memory/file need no extra deps; astra takes
the existing `TablesBundle` from `buildControlPlane()` so we don't
open a second Data API connection.
- `src/control-plane/factory.ts` — new `buildControlPlane()` that
returns `{ store, astraTables? }`; old `buildControlPlaneStore()`
and `storeFromConfig()` kept as thin back-compat wrappers. A new
`controlPlaneFromConfig()` returns the bundle for callers that
want the tables.
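The following sketches the subscription helper and the shared patch helper. It assumes two things: a subscriber gets the current snapshot replayed when it subscribes, and `applyUpdate` is a shallow merge. The `Job` shape and both method signatures are hypothetical. The sketch illustrates the replay, unsubscribe, and throwing-listener isolation behavior the contract tests describe.

```typescript
// Hypothetical job shape for this sketch.
interface Job {
  id: string;
  status: "pending" | "running" | "succeeded" | "failed";
  progress?: number;
  result?: unknown;
}

type JobPatch = Partial<Omit<Job, "id">>;
type Listener = (job: Job) => void;

// Shared patch helper: a shallow merge that returns a new object and
// never lets a patch change the job's identity.
function applyUpdate(current: Job, patch: JobPatch): Job {
  return { ...current, ...patch, id: current.id };
}

// In-process pub/sub keyed by job id.
class JobSubscriptions {
  private readonly listeners = new Map<string, Set<Listener>>();

  // Registers a listener, replays `current` to it if given, and returns an
  // unsubscribe function.
  subscribe(jobId: string, listener: Listener, current?: Job): () => void {
    const set = this.listeners.get(jobId) ?? new Set<Listener>();
    this.listeners.set(jobId, set);
    set.add(listener);
    if (current) this.safeCall(listener, current);
    return () => {
      set.delete(listener);
      // Only drop the map entry if it still points at this set, so a stale
      // unsubscribe can't remove a newer set for the same job.
      if (set.size === 0 && this.listeners.get(jobId) === set) this.listeners.delete(jobId);
    };
  }

  publish(job: Job): void {
    const set = this.listeners.get(job.id);
    if (!set) return;
    // Copy first so listeners may unsubscribe during delivery.
    for (const listener of Array.from(set)) this.safeCall(listener, job);
  }

  // A throwing listener is logged and skipped; the others still get the event.
  private safeCall(listener: Listener, job: Job): void {
    try {
      listener(job);
    } catch (err) {
      console.error("job listener threw:", err);
    }
  }
}

// Usage: replay on subscribe, delivery despite a throwing peer, then unsubscribe.
const subs = new JobSubscriptions();
let job: Job = { id: "j1", status: "pending" };
const seen: string[] = [];
subs.subscribe("j1", () => { throw new Error("boom"); });
const unsub = subs.subscribe("j1", (j) => seen.push(j.status), job);
job = applyUpdate(job, { status: "running" });
subs.publish(job);
unsub();
subs.publish(applyUpdate(job, { status: "succeeded" }));
console.log(seen); // → [ 'pending', 'running' ]
```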
## Astra wiring
- New `wb_jobs_by_workspace` table: partition by workspace, sort by
`job_id`, nullable `catalog_uid` / `document_uid`, serialized
`result_json` (same text-column pattern as `filter_json` on saved
queries).
- `JobRow` row type + in-store converters (kept inside the astra
store since they're not shared).
- `openAstraClient` creates the table on startup with
`ifNotExists: true`, same as every other `wb_*` table.
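Here is a hedged sketch of the `JobRow` converters. It assumes snake_case column names taken from the description above (`job_id`, `catalog_uid`, `document_uid`, `result_json`). It also assumes a `status` text column, which the description does not mention explicitly. Both the `Job` and `JobRow` shapes are hypothetical.

```typescript
// Hypothetical domain shape for this sketch.
interface Job {
  workspace: string;
  jobId: string;
  status: string;
  catalogUid?: string;
  documentUid?: string;
  result?: unknown;
}

// Hypothetical row shape: workspace is the partition key, job_id the sort key.
interface JobRow {
  workspace: string;
  job_id: string;
  status: string;
  catalog_uid: string | null;
  document_uid: string | null;
  result_json: string | null;
}

function jobToRow(job: Job): JobRow {
  return {
    workspace: job.workspace,
    job_id: job.jobId,
    status: job.status,
    catalog_uid: job.catalogUid ?? null,
    document_uid: job.documentUid ?? null,
    // Structured result goes into a text column as JSON; null when absent.
    result_json: job.result === undefined ? null : JSON.stringify(job.result),
  };
}

function rowToJob(row: JobRow): Job {
  const job: Job = { workspace: row.workspace, jobId: row.job_id, status: row.status };
  // `!= null` treats a null value and a missing column alike.
  if (row.catalog_uid != null) job.catalogUid = row.catalog_uid;
  if (row.document_uid != null) job.documentUid = row.document_uid;
  if (row.result_json != null) job.result = JSON.parse(row.result_json);
  return job;
}

// Usage: a nested result survives the round trip through the text column.
const job: Job = { workspace: "ws1", jobId: "j1", status: "succeeded", result: { chunks: 3, ids: ["a", "b"] } };
const row = jobToRow(job);
console.log(row.result_json); // → {"chunks":3,"ids":["a","b"]}
console.log(rowToJob(row));
```

The text-column approach requires `result` to be JSON-serializable. For example, a `Date` inside a result comes back as an ISO string, not a `Date`.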
## Tests
- `tests/jobs/contract.ts` — 8-assertion shared contract. Covers
create→get round-trip, workspace scoping, update→get persists,
serialized `result` round-trip, subscribe replay + fire + unsub,
throwing-listener isolation.
- `tests/jobs/memory-store.test.ts` — now a one-liner that runs the
contract.
- `tests/jobs/file-store.test.ts` — contract + a "second instance
over the same root sees prior jobs" case to pin durability.
- `tests/jobs/astra-store.test.ts` — contract against the fake tables
bundle + a nested-result serialization test.
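The shared-contract pattern can be sketched as one async function that takes a store factory, so each backend's test file only supplies its own constructor. The `JobStore` surface here (`create` / `get` / `update`) is an assumption, and `MapJobStore` is a stand-in backend. Only four checks are shown; the subscription checks are omitted.

```typescript
// Minimal JobStore surface assumed for this sketch; the real interface may
// have more methods (subscribe, list, ...).
interface Job { workspace: string; id: string; status: string; result?: unknown }
interface JobStore {
  create(job: Job): Promise<void>;
  get(workspace: string, id: string): Promise<Job | undefined>;
  update(workspace: string, id: string, patch: Partial<Job>): Promise<void>;
}

// One contract, many backends: returns the names of the checks that failed.
async function runJobStoreContract(makeStore: () => JobStore): Promise<string[]> {
  const failures: string[] = [];
  const check = (ok: boolean, name: string) => { if (!ok) failures.push(name); };

  const store = makeStore();
  await store.create({ workspace: "ws1", id: "j1", status: "pending" });
  check((await store.get("ws1", "j1"))?.status === "pending", "create→get round-trip");
  check((await store.get("ws2", "j1")) === undefined, "workspace scoping");

  const result = { nested: { n: [1, 2] } };
  await store.update("ws1", "j1", { status: "succeeded", result });
  const after = await store.get("ws1", "j1");
  check(after?.status === "succeeded", "update→get persists");
  check(JSON.stringify(after?.result) === JSON.stringify(result), "result round-trip");
  return failures;
}

// Stand-in backend used to exercise the contract.
class MapJobStore implements JobStore {
  private readonly jobs = new Map<string, Job>();
  private key(ws: string, id: string): string { return JSON.stringify([ws, id]); }
  async create(job: Job) { this.jobs.set(this.key(job.workspace, job.id), { ...job }); }
  async get(ws: string, id: string) {
    const job = this.jobs.get(this.key(ws, id));
    return job && { ...job };
  }
  async update(ws: string, id: string, patch: Partial<Job>) {
    const current = this.jobs.get(this.key(ws, id));
    if (!current) throw new Error(`no job ${id} in ${ws}`);
    this.jobs.set(this.key(ws, id), { ...current, ...patch, workspace: ws, id });
  }
}

runJobStoreContract(() => new MapJobStore()).then((failures) => {
  console.log(failures.length === 0 ? "contract passed" : failures); // → contract passed
});
```

Collecting failure names rather than throwing on the first one lets a single run report every broken check. In a test runner, each check would more naturally be its own test case.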
## What didn't change
The `/jobs/{id}` and `/jobs/{id}/events` routes are untouched — they
consume `JobStore`, and every backend satisfies the same interface.
In-flight jobs still don't resume after restart (the worker that
owned them is gone); the record now survives for operator
inspection. Cross-replica pub/sub is the remaining follow-up for a
truly multi-node deployment.
Total: 443 tests (was 421; +22 for the new contract × 3 backends +
two backend-specific tests).
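## Summary
Async ingest jobs now survive process restart. The `JobStore` backend auto-matches `controlPlane.driver`:
- `memory`: ephemeral, same behavior as before
- `file`: persisted to `<root>/jobs.json`
- `astra`: persisted to the `wb_jobs_by_workspace` table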
## Not in scope
In-flight job resume after restart: the record survives, but the worker is gone. The operator-facing fix (a timeout-based `running` → `failed` flip) and worker-side resume are a separate slice. Cross-replica pub/sub (e.g. Redis) also stays future work; the seam is isolated in `JobSubscriptions`.
First of three PRs in this Phase 2b-tail batch; Astra-native hybrid/rerank and UI catch-up follow.