nikazzio · nikazzio · Apr 17, 2026 · Apr 13, 2026 · Apr 17, 2026
diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md
@@ -1,25 +1,97 @@
 # Architecture Summary
 
-Scriptoria separates UI orchestration from reusable core services.
+Scriptoria is built as a Python application with a strict separation between the web UI shell, the reusable core services, and the CLI. The split is not stylistic. It is the load-bearing rule that makes the same product work as a FastHTML application, a CLI tool, and a research workbench against unstable upstream providers.
 
-## Main Layers
+## Three Layers, One Direction Of Dependency
 
-- `studio_ui/` renders the FastHTML and HTMX application.
-- `universal_iiif_core/` owns provider resolution, storage, downloads, export, OCR, and runtime policy.
-- `universal_iiif_cli/` exposes CLI behavior on top of the same core layer.
+The repository is organized around three packages.
 
-## Important Boundary
+- `studio_ui/` renders the FastHTML and HTMX interface. It owns routes, components, and presentation glue.
+- `universal_iiif_core/` owns provider resolution, storage, download orchestration, export logic, OCR services, networking, and runtime policy.
+- `universal_iiif_cli/` exposes the CLI entry point on top of the same core services.
 
-The UI depends on the core layer. The core layer does not depend on UI modules.
+The dependency direction is one-way. The UI and CLI both depend on the core layer. The core layer does not import from `studio_ui/` or `universal_iiif_cli/`. That rule is what allows the web and CLI surfaces to share resolution, download, and export behavior without diverging implementations.
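
The one-way rule above lends itself to an automated check. The following is a minimal sketch, not part of the repository: the package names come from the text, while `forbidden_imports` and `FORBIDDEN_PREFIXES` are hypothetical names introduced for illustration. It parses a piece of core source code and reports any absolute import of the UI or CLI packages.

```python
import ast

# Package names are taken from the document; the checker itself is hypothetical.
FORBIDDEN_PREFIXES = ("studio_ui", "universal_iiif_cli")


def forbidden_imports(source: str) -> list[str]:
    """Return forbidden module names imported by a piece of core source code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports stay in-package.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Match the package itself or any of its submodules.
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                found.append(name)
    return found
```

A test suite could run a function like this over every file under `universal_iiif_core/` and fail when the returned list is non-empty.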
 
-## Why This Matters
+## Routes Orchestrate, Core Implements
 
-This boundary lets Scriptoria:
+Inside `studio_ui/`, the route modules in `studio_ui/routes/` exist to register HTTP endpoints and wire user actions to core services. They orchestrate. They do not contain business logic.
+
+The same applies to handlers and helpers under `studio_ui/routes/_studio/` and the panes under `studio_ui/components/`. These exist to keep presentation focused and small. Anything that resolves manuscripts, validates pages, manages jobs, or persists state belongs in `universal_iiif_core/`.
+
+This is the boundary that makes refactors safe. When a route module grows complex, the fix is usually to move logic into a core service, not to keep adding presentation-side conditionals.
+
+## Why The Layering Is The Important Part
+
+Without this separation, the application would still work, but CLI and web behavior could drift apart, and provider quirks would have to be patched in two places. With the separation, Scriptoria can:
 
 - share resolution and download behavior between web and CLI;
 - keep runtime policy in one place;
-- reduce duplication in storage, networking, and export behavior.
+- make CLI scripts first-class operational tools rather than thin demos;
+- reduce duplication in storage, networking, and export logic;
+- evolve the UI surface without rewriting core behavior.
+
+## Core Service Areas
+
+The core layer is intentionally subdivided so that each service area is replaceable without touching the others.
+
+### Discovery
+
+Discovery uses a shared provider registry and a typed orchestration layer in `universal_iiif_core.discovery`. It classifies user input, runs direct resolution against the appropriate provider, and routes free-text queries to provider-specific search adapters. Results are normalized into a shared search contract before they reach the UI. This is what makes Discovery look like one feature even though every provider behaves differently.
+
+### Downloading
+
+Download orchestration is handled by the core services together with a job manager backed by local state. The runtime prefers native PDF when one is advertised and configured, falls back to IIIF image acquisition otherwise, validates pages in staging, and only promotes them into the local scan set when the configured policy allows it. Resume and retry safety across partial runs is part of the model, not a bolt-on.
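
The acquisition preference described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the project's implementation: `AcquisitionInputs`, `choose_acquisition`, and the two strategy labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionInputs:
    """Hypothetical inputs to the decision; field names are assumptions."""
    native_pdf_advertised: bool  # the provider offers a native PDF
    native_pdf_enabled: bool     # local configuration allows using it


def choose_acquisition(inputs: AcquisitionInputs) -> str:
    """Prefer native PDF when advertised and configured, else IIIF images."""
    if inputs.native_pdf_advertised and inputs.native_pdf_enabled:
        return "native_pdf"
    return "iiif_images"
```

Whichever strategy is chosen, the text implies that the resulting pages still pass through staging and validation before promotion.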
+
+### Storage
+
+`VaultManager` and the storage services in `universal_iiif_core/services/storage/` keep track of manuscript records, download and export jobs, UI preferences, and snippet/OCR-related rows. The vault is local SQLite. Runtime files live under managed directories resolved through `ConfigManager`.
+
+### Export
+
+Export is a dedicated service rather than a side effect of the download path. It owns PDF inventory discovery, profile-driven export jobs, local and temporary remote high-resolution image sourcing, and cleanup and retention policies. The capability model is explicit so the UI can advertise roadmap surfaces without faking them.
+
+### OCR
+
+OCR services abstract local Kraken and remote OpenAI/Anthropic engines behind a common workflow, while the UI handles asynchronous job feedback. The core is engine-agnostic so that the UI does not have to know which backend produced a transcription.
+
+### Networking
+
+The centralized HTTP client (`universal_iiif_core.http_client`) is the only sanctioned transport layer for runtime network operations. It owns retry and backoff policy, per-library customization, concurrency and rate limiting, and request hygiene around hostile or fragile upstream services. New code must not create ad-hoc `requests` sessions; doing so bypasses the entire network policy layer.
+
+## Runtime Configuration As An Architectural Concern
+
+Configuration in Scriptoria is not cosmetic. `ConfigManager` is the single source of truth for runtime paths and policy. Hardcoded paths are forbidden because they break test isolation, packaging, and per-user installation. Hardcoded network behavior is also forbidden because providers behave too differently to be served by one fixed policy.
+
+When a feature needs a path or a policy value, it asks `ConfigManager`. This keeps the rest of the codebase free of environmental assumptions.
+
+## End-To-End Flow
+
+The high-level flow across the three layers looks like this.
+
+1. **Discovery to Library**: the user submits a URL, identifier, shelfmark, or query; Discovery resolves or searches through the provider registry; results are normalized; `Add item` writes local metadata without forcing a download.
+2. **Download to local working state**: a download job is enqueued through the job manager; the runtime acquires native PDF or IIIF images; pages are validated in staging and promoted into local scans when policy allows.
+3. **Library to Studio**: the user opens a local item; Studio builds a workspace context from local records and manifest state; the viewer chooses local or remote mode based on coverage and policy.
+4. **Output and export**: the Output tab reads PDF inventory and page state; the user picks a profile or a page-level repair action; export jobs run through the storage-backed job system; artifacts are persisted under managed runtime paths.
+
+Every step in that sequence touches multiple core services but stays inside the rules of the dependency layout: UI orchestrates, core implements, storage persists, networking transports.
+
+## Design Rules That Hold This Together
+
+- `docs/` is the documentation source of truth; wiki pages are derived publish targets.
+- Runtime paths are resolved through `ConfigManager`, not hardcoded strings.
+- `scans/` is the operational image source for local study workflows.
+- Staging and retry behavior must remain safe for partial and resumed downloads.
+- UI package structure should reflect responsibility boundaries, not just file size limits.
+- The core layer never imports from UI or CLI packages.
 
 ## Deep Dive
 
-For the more detailed component breakdown, see [Project Architecture](../ARCHITECTURE.md).
+For the more detailed component breakdown, including current UI package structure and contributor hotspots, see [Project Architecture](../ARCHITECTURE.md).
+
+## Related Docs
+
+- [Storage Model](storage-model.md)
+- [Job Lifecycle](job-lifecycle.md)
+- [Discovery And Provider Model](discovery-provider-model.md)
+- [Export And PDF Model](export-and-pdf-model.md)
+- [Security And Path Safety](security-and-path-safety.md)
diff --git a/docs/explanation/job-lifecycle.md b/docs/explanation/job-lifecycle.md
@@ -1,26 +1,106 @@
 # Job Lifecycle
 
-Scriptoria treats download and export work as tracked jobs backed by local state.
+Scriptoria treats long-running work as tracked jobs backed by local state. Downloads and exports are not fire-and-forget calls: they have an identity, a status row in the vault, and a defined set of transitions. This is what makes acquisition and export reliable across providers that behave inconsistently and across sessions that can be interrupted.
 
-## Download Jobs
+## Why The Job Layer Exists
 
-A typical download job:
+Manuscript acquisition is large, slow, and failure-prone. Pages can fail individually, providers rate-limit aggressively, and a long download can be interrupted at any point. A naive script-style approach would lose work whenever something went wrong.
 
-1. is created from discovery or library actions;
-2. records progress in local state;
-3. stages validated pages first;
-4. promotes them according to storage policy;
-5. can be paused, resumed, retried, or cancelled.
+The job layer is the safety mechanism that prevents that. It records the work in progress, exposes pause and resume operations, lets the user retry only what failed, and survives application restarts without losing the partial state that was already on disk.
+
+## The Job Manager
+
+The download and export job manager is a process-wide singleton implemented in `universal_iiif_core/jobs.py`. It owns:
+
+- a registry of in-flight job records keyed by short job ids;
+- a download queue that admits jobs up to a configured concurrency limit;
+- the threading boundaries that isolate worker exceptions from the UI;
+- the bridge between in-memory job state and the persistent `download_jobs` table in the local vault.
+
+The concurrency cap is taken from network policy. The default value of `max_concurrent_download_jobs` is `2`, which is intentionally conservative: most upstream providers prefer a few well-paced clients over many aggressive ones.
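
Admission up to a concurrency limit can be modeled without threads. The sketch below is hypothetical and simplified: `DownloadQueue` and its methods are illustrative names, and only the default cap of `2` comes from the text. It shows jobs waiting in FIFO order until a slot frees up.

```python
from collections import deque


class DownloadQueue:
    """Hypothetical admission queue; the real manager also handles threads."""

    def __init__(self, max_concurrent: int = 2):
        self.max_concurrent = max_concurrent
        self.waiting = deque()  # job ids in `queued` state
        self.running = set()    # job ids in `running` state

    def submit(self, job_id: str) -> list[str]:
        """Queue a job and return the ids that were admitted as a result."""
        self.waiting.append(job_id)
        return self._admit()

    def finish(self, job_id: str) -> list[str]:
        """Release a slot and return the ids admitted into it."""
        self.running.discard(job_id)
        return self._admit()

    def _admit(self) -> list[str]:
        started = []
        while self.waiting and len(self.running) < self.max_concurrent:
            job_id = self.waiting.popleft()
            self.running.add(job_id)
            started.append(job_id)
        return started
```

With the default cap, a third submitted job stays queued until one of the first two finishes.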
+
+## Status Values
+
+The vault recognizes a small, fixed set of statuses for download and export jobs.
+
+### Transitional States
+
+A job is in a transitional state when something is actively happening or being requested:
+
+- `queued`: the job has been created and is waiting for an execution slot;
+- `running`: a worker thread is actively processing the job;
+- `cancelling`: the user requested cancellation while the job was running and the worker is winding down;
+- `pausing`: the user requested a pause and the worker is winding down to a paused state.
+
+### Terminal States
+
+Terminal states describe a job that is no longer doing work:
+
+- `paused`: the worker stopped at a clean point and the job can be resumed later;
+- `cancelled`: the worker stopped after a cancel request and the job will not resume automatically;
+- `completed`: the job finished successfully;
+- `error`: the job stopped because of an error, and the failure reason is recorded in `error_message`.
+
+The vault enforces terminality. Once a row is in `paused`, `cancelled`, `completed`, or `error`, transitional updates that would overwrite that state are ignored. This prevents late worker callbacks from undoing a user-driven decision.
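
The terminality guard can be expressed as a pure function. This is a sketch under assumptions: the status names come from the lists above, while `apply_status_update` is a hypothetical helper. It models only late worker callbacks; an explicit user-driven resume is assumed to take a separate path that is not shown here.

```python
TRANSITIONAL = {"queued", "running", "cancelling", "pausing"}
TERMINAL = {"paused", "cancelled", "completed", "error"}


def apply_status_update(current: str, requested: str) -> str:
    """Return the status a row should hold after an update request."""
    if requested not in TRANSITIONAL | TERMINAL:
        raise ValueError(f"unknown status: {requested}")
    if current in TERMINAL and requested in TRANSITIONAL:
        # A late callback must not overwrite a terminal state.
        return current
    return requested
```

For example, a worker that reports `running` after the row was already marked `paused` leaves the row at `paused`.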
+
+## Lifecycle Of A Download Job
+
+A typical download job goes through this sequence.
+
+1. The route layer calls into the job manager to enqueue the job. A row is created in `download_jobs` with status `queued` and a `job_origin` such as `library_download`.
+2. When a slot frees up, the job is promoted to `running`, `started_at` is set, and the worker thread starts acquiring pages.
+3. Pages are written to a staging area first. They are validated as image files before being considered acceptable for promotion.
+4. As the worker progresses, `current_page` and `total_pages` are updated.
+5. On a clean finish, the job moves to `completed`, `finished_at` is recorded, and validated pages are promoted into the local scan set according to the configured promotion policy.
+6. On a fatal error, the job moves to `error` and the failure reason is stored.
+7. If the user pauses or cancels, the worker first goes through `pausing` or `cancelling`, then settles into the corresponding terminal state.
+
+The pause and cancel transitions exist as their own states because the worker cannot stop instantaneously. Acknowledging the request and the actual stop are different events, and the lifecycle reflects that.
+
+## Why Staging Comes Before Promotion
+
+Staged pages are not the same thing as local scans. The job writes to a temporary directory under the configured temp root, validates each image, and only moves the result into the manuscript's `scans/` directory once promotion is allowed.
+
+The promotion policy is governed by the `storage.partial_promotion_mode` setting:
+
+- `never`: only fully completed runs promote staged pages into `scans/`;
+- `on_pause`: a clean pause also promotes the staged pages it managed to validate.
+
+This separation is what allows partial work to survive a restart without polluting the local scan set with half-validated images. It is also what lets `Retry missing` and `Retry range` make sense as targeted operations rather than always implying a full redownload.
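
The two promotion modes reduce to a small predicate. The sketch below assumes the mode values `never` and `on_pause` from the list above; `should_promote` and its `outcome` argument are hypothetical names used for illustration.

```python
def should_promote(mode: str, outcome: str) -> bool:
    """Decide whether staged pages may move into `scans/`.

    `mode` mirrors `storage.partial_promotion_mode`; `outcome` is the
    terminal status the job settled into.
    """
    if mode not in ("never", "on_pause"):
        raise ValueError(f"unknown promotion mode: {mode}")
    if outcome == "completed":
        # Fully completed runs promote under either mode.
        return True
    # Only `on_pause` lets a clean pause promote its validated pages.
    return mode == "on_pause" and outcome == "paused"
```

In this model, `error` and `cancelled` outcomes never promote, which keeps half-finished work in staging.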
+
+## Resume Safety
+
+Resume is a first-class operation, not a side effect. When a paused or interrupted job is resumed, the manager:
+
+- reuses the existing vault row instead of creating a duplicate;
+- skips pages that already exist in the staging or scan directory;
+- continues acquisition from where the previous run stopped;
+- transitions the row back through `queued` and `running` like any new job.
+
+This is why Library exposes `Retry missing` and `Retry range` separately from `Download full`: they all rely on the same resume-safe job model, but they scope the work differently.
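
The skip-what-exists rule behind resume and the targeted retries can be sketched as follows. The function name, its arguments, and the use of 1-based page numbers are assumptions made for illustration; the real manager works against directories on disk rather than sets.

```python
def pages_to_fetch(total_pages, staged, scanned, page_range=None):
    """Return the page numbers a resumed or retried job still has to acquire.

    `staged` and `scanned` are collections of page numbers already present
    in the staging and scan directories. `page_range` is an optional
    inclusive (first, last) pair, as a range-scoped retry might supply.
    """
    first, last = page_range or (1, total_pages)
    already_present = set(staged) | set(scanned)
    return [page for page in range(first, last + 1) if page not in already_present]
```

A full download, a retry of missing pages, and a retry of a range then differ only in the scope they pass in, not in the underlying mechanism.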
 
 ## Export Jobs
 
-A typical export job:
+Export jobs follow the same overall lifecycle but live in their own job records and run through the export service. Each export job stores scope type, document ids, library identity, export format, output kind, page-selection mode, destination, progress counters, the final output path, and any terminal error.
+
+The route layer creates the job entry first and only then spawns the worker thread. That order is important: it lets the UI poll status, cancel active jobs, and retain history after completion, without depending on whether the worker has had time to start.
+
+On startup, the application also marks as `error` any export rows that a previous crashed run left in a transitional state, so that stale jobs do not appear to be still running.
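
A startup sweep of this kind could look like the sketch below. It is hypothetical: the table name `export_jobs`, the column layout, and the message text are assumptions, and only the status names and the `error_message` field come from the text.

```python
import sqlite3

TRANSITIONAL = ("queued", "running", "cancelling", "pausing")


def fail_stale_export_jobs(conn: sqlite3.Connection) -> int:
    """Mark export rows left in a transitional state as `error`.

    Returns the number of rows that were updated.
    """
    placeholders = ", ".join("?" for _ in TRANSITIONAL)
    cursor = conn.execute(
        "UPDATE export_jobs SET status = 'error', error_message = ? "
        f"WHERE status IN ({placeholders})",
        ("interrupted by application restart", *TRANSITIONAL),
    )
    conn.commit()
    return cursor.rowcount
```

Running the sweep once at startup, before any worker is spawned, avoids touching rows that belong to jobs of the current session.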
 
-1. starts from a profile or page-level action;
-2. records progress in local state;
-3. uses local or temporary remote assets depending on the profile;
-4. persists output artifacts under managed paths.
+## Job Origin
+
+Download jobs carry a `job_origin` field. Common values include `library_download`, `discovery_add_and_download`, and similar markers indicating where the job was triggered from. This is mostly diagnostic, but it lets the system distinguish between user-initiated downloads and chained operations when a problem needs to be traced.
 
 ## Why The Model Matters
 
-The job layer is the safety mechanism that keeps partial work understandable and recoverable.
+The job layer is the safety mechanism that keeps partial work understandable and recoverable. Without it, the application would silently lose progress on every interruption, retries would be all-or-nothing, and pause would either not exist or would corrupt the local scan set.
+
+With it, Scriptoria can run long acquisitions on flaky upstream providers, survive restarts cleanly, and let the user reason about their workspace in terms of states like `partial` and `complete` instead of just "did the download finish?".
+
+## Related Docs
+
+- [Storage Model](storage-model.md)
+- [First Manuscript Workflow](../guides/first-manuscript-workflow.md)
+- [Discovery And Library](../guides/discovery-and-library.md)
+- [Export And PDF Model](export-and-pdf-model.md)
+- [Configuration Reference](../CONFIG_REFERENCE.md)