dpub serve: persistent server mode for catalogue-scale transcription (deferred) #28

@roelvangils

Status

Deferred. This covers Tier 2, items 1 and 2, in the strategic roadmap (`.claude/plans/can-you-look-at-quirky-allen.md`), originally framed as a "daemon" with a 3–4 week effort estimate. After review, the conclusion is that it is overkill for current realistic workloads. Filed for tracking; do not start without a concrete customer use case.

What was proposed

A long-running process that holds the loaded Whisper model + a warm EPUBCheck JVM in memory, accepts conversion jobs over a Unix socket / HTTP / stdio JSON-RPC, and returns results. The strategic intent was to amortise one-time costs (Whisper model load, JVM startup) across hundreds of books in a catalogue, and to enable a "dpub processes a 500-book catalogue in N hours; Pipeline 2 takes 30× longer just on JVM startup" benchmark as a positioning artefact.

Why it's overkill (today)

| Cost | Per invocation | Daemon helps? |
| --- | --- | --- |
| Whisper model load | ~10–15 s with `--transcribe` | yes, big |
| EPUBCheck JVM | ~3–5 s with `--validate` | yes, modest |
| ACE / Node | ~1–2 s with `--a11y` | yes, modest |
| dpub binary cold start | ~100 ms | irrelevant |

  • For a single-book convert, these costs are dwarfed by the conversion itself (minutes to hours), so the user doesn't notice.
  • For `dpub batch` (Tier 1, already shipped) over a small catalogue without `--transcribe`, the win is near zero — the binary runs and exits, no expensive load to amortise.
  • The only realistic workflow where the daemon is meaningfully faster is large transcribed catalogues (≥20 books with `--transcribe`). Eleven Ways' day-to-day (handfuls of books, manually triggered) doesn't hit that case.
  • Pipeline-2-vs-dpub catalogue benchmarks are positioning, not user pain. We can claim them when needed.
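
To put rough numbers on the points above, here is a back-of-the-envelope sketch. The function name is hypothetical, and it assumes that without a server every book pays the startup cost once, while with a server only the first book does.

```rust
/// Startup seconds saved by keeping one process alive across a catalogue.
/// Assumption (not measured): without the server each book pays
/// `per_book_startup_secs` once; with it, only the first book does.
fn daemon_savings_secs(books: u32, per_book_startup_secs: f64) -> f64 {
    // The first book pays the cost either way, so only books 2..=N save time.
    f64::from(books.saturating_sub(1)) * per_book_startup_secs
}
```

Using only the Whisper load figure from the table (10–15 s), a 20-book catalogue would save about 190–285 s in total, and a 500-book catalogue about 4,990–7,485 s (roughly 83–125 minutes). Against conversions that take minutes to hours per book, the 20-book saving is small, which is why the threshold sits where it does.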

If we ever do build it

The 3–4 week estimate from the plan covered a polished daemon (cross-platform Unix socket + Windows named pipe, robust lifecycle, parallel jobs with backpressure, progress streaming, full test surface, docs). A pragmatic minimum viable version is much smaller:

  • `dpub serve --port N`: HTTP + JSON over TCP, sync, single-threaded, one job at a time
  • Use `tiny_http` (sync, ~no deps) — no tokio runtime
  • Skip Unix-socket-vs-named-pipe juggling; HTTP works everywhere
  • Skip parallel jobs — the existing rayon parallelism inside a single convert is plenty
  • Skip lifecycle ceremony — start it, Ctrl-C kills it, that's it

Estimated scope: ~3–5 days of focused work, ~200 lines of code plus a small HTTP client. Same headline benefit (one model load amortised across N conversions); a fraction of the surface.
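
For orientation only, a minimal sketch of that shape: load once, then handle one job at a time over plain HTTP. It uses `std::net` instead of `tiny_http` so the example stays dependency-free; that is a swap made for illustration, not a recommendation. `Model`, `convert`, `serve`, and `max_jobs` are hypothetical names, the conversion is a placeholder, and the request body is treated as an opaque job string rather than JSON.

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::net::{TcpListener, TcpStream};

/// Hypothetical stand-in for the loaded Whisper model (real type not shown here).
struct Model;

impl Model {
    /// The expensive step; the whole point is that it runs once per process.
    fn load() -> Model {
        Model
    }
    /// Placeholder for a real conversion job.
    fn convert(&self, job: &str) -> String {
        format!("converted:{}", job.trim())
    }
}

/// Read one HTTP request and return its body (Content-Length only, no chunking).
fn read_body(stream: &TcpStream) -> std::io::Result<String> {
    let mut reader = BufReader::new(stream);
    let mut content_length = 0usize;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            break; // connection closed before the headers ended
        }
        let line = line.trim_end();
        if line.is_empty() {
            break; // blank line marks the end of the headers
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.eq_ignore_ascii_case("content-length") {
                content_length = value.trim().parse().unwrap_or(0);
            }
        }
    }
    let mut body = vec![0u8; content_length];
    reader.read_exact(&mut body)?;
    Ok(String::from_utf8_lossy(&body).into_owned())
}

/// Run one job and write the result back; the stream closes when it is dropped.
fn handle(model: &Model, mut stream: TcpStream) -> std::io::Result<()> {
    let job = read_body(&stream)?;
    let result = model.convert(&job);
    write!(
        stream,
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        result.len(),
        result
    )
}

/// Sequential accept loop. `max_jobs` exists only so the sketch can stop in a
/// test; pass `None` to run until the process is killed.
fn serve(listener: TcpListener, max_jobs: Option<usize>) {
    let model = Model::load(); // paid once, reused for every job
    for (served, stream) in listener.incoming().enumerate() {
        if let Err(err) = stream.and_then(|s| handle(&model, s)) {
            eprintln!("job failed: {err}"); // a bad request must not kill the server
        }
        if max_jobs.map_or(false, |max| served + 1 >= max) {
            break;
        }
    }
}
```

Under these assumptions, `dpub serve --port N` would amount to binding a `TcpListener` on `("127.0.0.1", N)` and calling `serve(listener, None)`. Because the loop is sequential, a second client simply waits in the accept queue until the current job finishes, which matches the "one job at a time" constraint without any explicit locking.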

When to revisit

Reopen and start work when any of these is true:

  1. A customer has a catalogue-scale transcription workflow (≥20 books with `--transcribe`) where the per-book model reload is a real friction point.
  2. We want to publish a Pipeline-2-vs-dpub catalogue benchmark for marketing.
  3. M7 (WASM) ships and we want a complementary "server" surface for non-WASM-friendly workloads.

Out of scope for the eventual work

  • Watch-folder mode (separate concern)
  • Real OS daemon-isation (systemd, launchd, Windows service) — let the user manage the process
  • Native-code daemons in C++ etc. — Rust is enough

Related

  • Tier 2 in `.claude/plans/can-you-look-at-quirky-allen.md` (preserved for context).
  • M7 WASM (next in Tier 2; preferred priority instead).
  • M6.5 word-level Media Overlay sync (separate Tier 2 item, also deferred until requested).

Metadata

Labels: enhancement (New feature or request)
