Merged
7 changes: 4 additions & 3 deletions docs/CONFIG_REFERENCE.md
@@ -133,7 +133,7 @@ Default download policies used when library-specific override is not enabled.

## `settings.network.libraries.<library>`

Libraries supported: `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `unknown`.
Libraries supported: `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `internet_culturale` (BETA), `unknown`.

**HTTPClient Integration**: These settings are used by the centralized `HTTPClient` class for per-library network policies (rate limiting, retry, backoff, concurrency).

@@ -145,7 +145,7 @@ Global-only fields (never overridden by library):

Library override fields (used only when `use_custom_policy=true`):
- `enabled` (`bool`, default: `true`)
- `use_custom_policy` (`bool`, default: `true` for `gallica`, otherwise `false`)
- `use_custom_policy` (`bool`, default: `true` for `gallica` and `internet_culturale`, otherwise `false`)
- When `true`, library-specific settings override global defaults
- When `false`, global defaults from `settings.network.download.*` are used
- `workers_per_job` (`int`, `1..8`)
@@ -432,8 +432,9 @@ Discovery search configuration. Editable from Settings > Discovery tab in the we
- `max_results_per_provider` (`int`, default: `20`)
- Maximum number of results returned by each search provider per query.
- Clamped to [1, 50] at runtime and on save.
- For paginatable providers (Archive.org, Harvard, LOC, Gallica), additional results can be loaded via the "Carica altri risultati" button.
- For paginatable providers (Archive.org, Harvard, LOC, Gallica, Internet Culturale (BETA)), additional results can be loaded via the "Carica altri risultati" button.
- Non-paginatable providers (Vatican, Bodleian, Cambridge, Heidelberg, Institut, e-codices) return at most this many results from a single API call.
- For Internet Culturale (BETA) the upstream page size is fixed at 20 regardless of `max_results_per_provider`; the "has more" check relies on the authoritative `totalPages` parsed from the HTML instead of the result cap.
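
The two rules in this list (the `[1, 50]` clamp and the `totalPages`-based "has more" check) can be sketched as below. This is only an illustration: `clamp_max_results` and `has_more_results` are hypothetical names, not functions from the Scriptoria codebase. Only the numeric bounds and the reliance on `totalPages` come from the text above.

```python
ICCU_PAGE_SIZE = 20  # fixed upstream page size, per the note above


def clamp_max_results(value: int) -> int:
    """Clamp a configured max_results_per_provider to the documented [1, 50] range."""
    return max(1, min(50, value))


def has_more_results(current_page: int, total_pages: int) -> bool:
    """Decide whether another page exists, using the upstream page count.

    The result cap is deliberately not consulted: for Internet Culturale
    the page size is fixed, so only the parsed total page count is reliable.
    """
    if current_page < 1 or total_pages < 1:
        return False
    return current_page < total_pages
```

With these assumptions, a query that yields 3 upstream pages still offers "Carica altri risultati" on pages 1 and 2, whatever value `max_results_per_provider` holds.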

## Migration Notes

2 changes: 2 additions & 0 deletions docs/guides/discovery-and-library.md
@@ -57,6 +57,8 @@ Discovery also reflects provider-specific result behavior. Some providers can ex

The practical posture is to treat Discovery as a normalized gateway, not as proof that every library offers the same search ergonomics.

Internet Culturale **(BETA)** is a special case worth calling out explicitly. It sits at the bottom of the provider select because the integration is experimental: useful when ICCU is the only channel to reach an Italian record, but less reliable than any native IIIF provider. It is an aggregator that fronts around fifty Italian libraries (Laurenziana, Marciana, BNCF, BNCR, Estense, and many smaller partners) and it routinely returns thousands of results for a single keyword. Scriptoria shows the upstream total as "Mostrati X di Y risultati" so the size of the result set is visible, and "Carica altri risultati" walks through the remaining pages twenty at a time. Because the upstream does not expose a IIIF manifest directly, the manifest used internally is converted on-the-fly from ICCU's MAG/XML document; partial records (those declaring more pages than the server actually serves) are still saved as partial scans rather than failing outright, but expect occasional teaser records where only the frontispiece is really available.

## What Library Does

Library is the local catalog of manuscript records and their current working state.
6 changes: 5 additions & 1 deletion docs/reference/configuration.md
@@ -78,14 +78,18 @@ It is split into three layers:
- `settings.network.download.*` for default document download behavior;
- `settings.network.libraries.<provider>.*` for provider-specific overrides.

The supported provider keys under `settings.network.libraries.*` are `gallica`, `vaticana`, `bodleian`, `institut_de_france`, `internet_culturale` **(BETA)**, and `unknown`. Setting `use_custom_policy: false` on a library makes it inherit the `settings.network.download.*` defaults; `true` activates the per-library override fields.

`internet_culturale` (BETA) ships with a conservative default policy (2 workers per job, 1.0–3.0s delay, 300s cooldown on 403/429, 40 requests per 60s burst window) because the ICCU aggregator is shared infrastructure and is noticeably less tolerant of request bursts than large IIIF-native providers.
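
The inheritance rule described above can be sketched as a small resolver. Treat it as a sketch under stated assumptions: `effective_policy` is a hypothetical helper, the global values are placeholders, and every key name except `use_custom_policy` and `workers_per_job` is invented for illustration. The `internet_culturale` numbers are the ones quoted in the paragraph above.

```python
from typing import Any

# Placeholder global defaults (stand-in for settings.network.download.*).
GLOBAL_DOWNLOAD: dict[str, Any] = {
    "workers_per_job": 4,
    "min_delay_s": 0.5,
    "max_delay_s": 1.5,
}

# Stand-in for settings.network.libraries.*; key names are assumed.
LIBRARIES: dict[str, dict[str, Any]] = {
    "internet_culturale": {
        "use_custom_policy": True,
        "workers_per_job": 2,
        "min_delay_s": 1.0,
        "max_delay_s": 3.0,
        "cooldown_s": 300,          # applied after a 403/429
        "burst_max_requests": 40,
        "burst_window_s": 60,
    },
    "vaticana": {"use_custom_policy": False, "workers_per_job": 1},
}


def effective_policy(library: str) -> dict[str, Any]:
    """Return the policy actually applied to a library.

    Global defaults are always the base; library fields override them
    only when use_custom_policy is true.
    """
    policy = dict(GLOBAL_DOWNLOAD)
    lib_cfg = LIBRARIES.get(library, {})
    if lib_cfg.get("use_custom_policy"):
        overrides = {k: v for k, v in lib_cfg.items() if k != "use_custom_policy"}
        policy.update(overrides)
    return policy
```

In this sketch `vaticana` keeps the global `workers_per_job` even though it declares its own value, because its override switch is off.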

You touch this family when:

- a provider rate-limits too aggressively;
- downloads need to be slower or more parallel;
- one library needs stricter policy than the global default;
- you want reproducible network behavior across machines.

This family is directly reflected in the Settings `Network` pane.
This family is directly reflected in the Settings `Network & Libraries` pane, which exposes per-library override cards for each supported provider key.

### `settings.images.*`

26 changes: 24 additions & 2 deletions docs/reference/provider-support.md
@@ -16,6 +16,7 @@ The shared provider registry currently exposes these providers:
- Harvard University
- Library of Congress
- Internet Archive
- Internet Culturale (ICCU) **[BETA]**
- generic direct IIIF manifest URL

These entries come from the runtime provider registry in `src/universal_iiif_core/providers.py`, which is the source used by both UI and shared resolution logic.
@@ -56,6 +57,7 @@ The provider has a stronger search-first experience and can reasonably be used f
| Harvard | DRS-bearing item URL | `fallback` | Usually best treated as URL-driven |
| Library of Congress | public item URL | `fallback` | Prefer `loc.gov/item/...` URLs |
| Internet Archive | item URL or text query | `search_first` | Good discovery-first behavior for many cases |
| Internet Culturale **[BETA]** | text query, OAI ID, or magparser/viewresource URL | `search_first` | Gateway for ~50 Italian libraries (Laurenziana, Marciana, BNCF/BNCR, Estense, Marucelliana, Ambrosiana partners, etc.). Integration is experimental: many upstream records are incomplete and image quality is variable. Use only when ICCU is the only channel available |
| Generic / direct manifest | exact manifest URL | `direct` | Use only when you already have a valid IIIF manifest URL |

## Per-Provider Notes
@@ -100,6 +102,23 @@ Library of Congress is best approached with the public `loc.gov/item/...` page a

Internet Archive supports a more discovery-first workflow than many other providers in the registry. It is usually comfortable both for direct item URLs and for broad text search.

### Internet Culturale (ICCU) **[BETA]**

Internet Culturale is the Italian national aggregator run by ICCU. The integration is currently **BETA**: it is good enough to reach content that is otherwise unreachable from Scriptoria, but far from the reliability of native IIIF providers. Treat it as a last-resort channel when no other provider covers the item.

Unlike the other providers in the registry, ICCU does not expose a native IIIF Presentation manifest. Instead, Scriptoria fetches the upstream MAG/XML document (the `jmms/magparser` endpoint) and converts it to a IIIF v2 manifest on the fly. Canvas image URLs come from the real `src` attribute of each `<page>` element — the `/jmms/thumbnail?page=N` endpoint ignores the page parameter and must not be used.
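
A minimal sketch of that conversion step follows. It is not the project's `mag_parser` implementation: `pages_to_iiif_v2` is a hypothetical name, and the input is assumed to be a simplified, namespace-free XML in which each `<page>` carries a `src` attribute (real MAG documents are richer and namespaced). Canvas dimensions are omitted because the simplified input carries none, so a production manifest would need `width` and `height` added.

```python
import xml.etree.ElementTree as ET


def pages_to_iiif_v2(xml_text: str, manifest_id: str, label: str) -> dict:
    """Build a IIIF Presentation 2 style manifest from <page src="..."> elements."""
    root = ET.fromstring(xml_text)
    canvases = []
    for page in root.iter("page"):
        # Use the real image URL from the src attribute, never a thumbnail endpoint.
        src = (page.get("src") or "").strip()
        if not src:
            continue  # skip pages that declare no image
        canvas_id = f"{manifest_id}/canvas/{len(canvases) + 1}"
        canvases.append(
            {
                "@id": canvas_id,
                "@type": "sc:Canvas",
                "label": str(len(canvases) + 1),
                "images": [
                    {
                        "@type": "oa:Annotation",
                        "motivation": "sc:painting",
                        "on": canvas_id,
                        "resource": {"@id": src, "@type": "dctypes:Image"},
                    }
                ],
            }
        )
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }
```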

Search works by scraping the HTML of the advanced search page, paginated with `pag=N` (not `paginate_pageNum`, which the server silently ignores). The parser extracts the total result count and the total page count so the UI can show "Mostrati X di Y risultati" and enable "Carica altri". Result sets typically run into the thousands.
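
The pagination parameter and the counter text can be illustrated as follows. Only `pag` and the "Mostrati X di Y risultati" wording come from the text above; the base URL, the `q` parameter, and both function names are placeholders invented for this sketch.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real advanced-search URL is not reproduced here.
SEARCH_BASE = "https://example.invalid/search"


def build_search_url(query: str, page: int = 1) -> str:
    """Build a paginated search URL using the `pag` parameter."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # "q" is a hypothetical query parameter name.
    return f"{SEARCH_BASE}?{urlencode({'q': query, 'pag': page})}"


def shown_of_total(page: int, total_results: int, page_size: int = 20) -> str:
    """Format the counter, capping the shown count at the upstream total."""
    seen = min(page * page_size, total_results)
    return f"Mostrati {seen} di {total_results} risultati"
```

Multiplying by a fixed page size rather than by the length of the current page keeps the counter correct on a short last page.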

Known BETA limitations:

- Many ICCU records are "teaser" entries: the MAG XML declares several pages but only the first image is actually served upstream. The downloader applies a partial-finalize mode for ICCU manifests so partial downloads still land correctly in `scans/`, but the user-visible experience is still "you asked for N pages and only got M".
- Image quality and resolution vary widely between teche and between records in the same teca.
- The external viewer path used by Scriptoria is the canonical `/jmms/iccuviewer/iccu.jsp?id=...&mode=all&teca=...`. The older `viewresource` URL renders as a blank page for some teche (BNCF in particular).
- For Mirador-based local reading Scriptoria exposes an internal proxy endpoint, `/api/iccu/manifest?url=...`, that serves the converted manifest as JSON with CORS-friendly headers.
- The ICCU Image API v2.1 does exist at `internetculturale.it/iiif/image/2.1/{id_b64}/...` but is level 0 only (no tile server, no zoom). Static fullsize is the only tier available regardless of which download path is chosen.
- For native IIIF access to Biblioteca Estense records, prefer a dedicated Estense provider (Jarvis backend) when available, rather than going through ICCU.
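
The proxy rewrite mentioned in the list above can be sketched like this. The `/api/iccu/manifest?url=...` path and the full percent-encoding of the upstream URL follow this change; `looks_like_magparser_url` is a hypothetical stand-in for the project's `is_iccu_magparser_url`, whose actual matching rules are not shown here.

```python
from urllib.parse import quote, urlsplit


def looks_like_magparser_url(url: str) -> bool:
    """Hypothetical check: an internetculturale.it URL hitting /jmms/magparser."""
    parts = urlsplit(url)
    host_ok = parts.netloc.endswith("internetculturale.it")
    return host_ok and "/jmms/magparser" in parts.path


def to_local_proxy(manifest_url: str) -> str:
    """Route ICCU MAG URLs through the local manifest proxy; pass others through."""
    if looks_like_magparser_url(manifest_url):
        # safe="" also encodes "/", "?", "&" and "=", so the upstream URL
        # survives as a single query-string value.
        return f"/api/iccu/manifest?url={quote(manifest_url, safe='')}"
    return manifest_url
```

Encoding with `safe=""` matters because the upstream URL carries its own query string, which would otherwise be split into separate parameters of the proxy request.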

### Generic Direct Manifest

This path is intentionally simple. It exists for the case where the source is IIIF-compatible but not covered by one of the dedicated resolvers. Scriptoria expects a valid direct manifest URL and does not try to infer provider-specific behavior beyond that.
@@ -112,9 +131,12 @@ In both cases, the registry metadata already assumes that browser-assisted searc

## Provider Filters

The current provider registry exposes one dedicated provider filter: the `Gallica` material type filter.
The current provider registry exposes two dedicated provider filters:

- `Gallica` — material type (all, manuscripts, printed books).
- `Internet Culturale` **[BETA]** — material type (all, `Manoscritto`, `Libro moderno`, `Musica`, `Fotografia`).

It lets users narrow Gallica results to all materials, manuscripts, or printed books. More provider-specific filters can be added later, but only when the upstream service and the user workflow justify them.
Both filters map directly to server-side parameters and survive pagination, so "Carica altri" preserves the selected material type. More provider-specific filters can be added later, but only when the upstream service and the user workflow justify them.
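
How a filter survives pagination can be shown with the payload a "Carica altri" request would carry. The keys mirror the ones used by the load-more component in this change (`library`, `shelfmark`, `gallica_type`, `ic_type`, `page`); the function name `next_page_payload` is hypothetical.

```python
def next_page_payload(pagination: dict) -> dict:
    """Build the request payload for the next page, carrying both type filters."""
    page = int(pagination.get("page") or 1)
    return {
        "library": pagination["library"],
        "shelfmark": pagination["shelfmark"],
        # Filters default to "all" when the current page did not set them.
        "gallica_type": pagination.get("gallica_type", "all"),
        "ic_type": pagination.get("ic_type", "all"),
        "page": page + 1,
    }
```

Because the selected material type travels with every page request, the server can apply the same filter to each subsequent page.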

## How To Choose The Right Input

34 changes: 32 additions & 2 deletions src/studio_ui/components/discovery_results.py
@@ -8,7 +8,7 @@
from fasthtml.common import H3, A, Button, Div, Img, P, Span


def _provider_viewer_fallback(library: str, doc_id: str, ark: str = "") -> str:
def _provider_viewer_fallback(library: str, doc_id: str, ark: str = "", manifest_url: str = "") -> str:
if library == "Gallica" and ark:
return f"https://gallica.bnf.fr/{ark}"
if library == "Gallica" and doc_id:
@@ -21,6 +21,12 @@ def _provider_viewer_fallback(library: str, doc_id: str, ark: str = "") -> str:
return f"https://digital.bodleian.ox.ac.uk/objects/{doc_id}"
if library == "Archive.org" and doc_id:
return f"https://archive.org/details/{doc_id}"
if library == "Internet Culturale" and manifest_url:
from universal_iiif_core.resolvers.mag_parser import build_viewer_url, extract_oai_and_teca_from_url

oai, teca = extract_oai_and_teca_from_url(manifest_url)
if oai and teca:
return build_viewer_url(oai, teca)
return ""


@@ -30,12 +36,15 @@ def _resolve_viewer_url(data: dict) -> str:
raw = data.get("raw")
if not viewer_url and isinstance(raw, dict):
viewer_url = str(raw.get("viewer_url") or "").strip()
if not viewer_url:
viewer_url = str(data.get("source_detail_url") or "").strip()
if viewer_url:
return viewer_url
return _provider_viewer_fallback(
str(data.get("library") or ""),
str(data.get("id") or ""),
str(data.get("ark") or ""),
str(data.get("url") or data.get("manifest") or ""),
)


@@ -72,6 +81,7 @@ def _render_load_more_section(pagination: dict | None) -> Div | str:
"library": pagination["library"],
"shelfmark": pagination["shelfmark"],
"gallica_type": pagination.get("gallica_type", "all"),
"ic_type": pagination.get("ic_type", "all"),
"page": page + 1,
}
)
@@ -249,14 +259,34 @@ def _build_result_cards(results: list) -> list:
return cards


def _results_header_text(results: list, pagination: dict | None) -> str:
    """Build the 'Trovati N risultati' header, using total search size when known."""
    shown = len(results)
    total = 0
    if results:
        raw = results[0].get("raw") if isinstance(results[0], dict) else None
        if isinstance(raw, dict):
            try:
                total = int(raw.get("_search_total_results") or 0)
            except (TypeError, ValueError):
                total = 0
    page = int((pagination or {}).get("page") or 1)
    per_page = shown if shown else 0
    if total and per_page:
        seen = min(page * per_page, total)
        return f"Mostrati {seen} di {total} risultati"
    return f"Trovati {shown} risultati"


def render_search_results_list(results: list, *, pagination: dict | None = None) -> Div:
"""Render list of search results aligned with global app theme."""
cards = _build_result_cards(results)
load_more = _render_load_more_section(pagination)
header_text = _results_header_text(results, pagination)

return Div(
Div(
H3(f"Trovati {len(results)} risultati", cls="text-lg font-semibold text-slate-900 dark:text-slate-100"),
H3(header_text, cls="text-lg font-semibold text-slate-900 dark:text-slate-100"),
Span(
"Seleziona un risultato per aggiungerlo in Libreria o avviare il download.",
cls="text-xs text-slate-500",
14 changes: 9 additions & 5 deletions src/studio_ui/components/library_stats.py
@@ -236,11 +236,15 @@ def render_stats_page_content(manuscripts: list[dict]) -> Div:
)

recent = manuscripts[:6]
recent_panel = Div(
P("Ultimi aggiornati", cls="text-xs uppercase tracking-widest text-slate-500 dark:text-slate-400 mb-2"),
Ul(*[_recent_activity_row(m) for m in recent], cls="divide-y divide-slate-100 dark:divide-slate-800"),
cls=_CARD_CLS + " mb-6",
) if recent else Div()
recent_panel = (
Div(
P("Ultimi aggiornati", cls="text-xs uppercase tracking-widest text-slate-500 dark:text-slate-400 mb-2"),
Ul(*[_recent_activity_row(m) for m in recent], cls="divide-y divide-slate-100 dark:divide-slate-800"),
cls=_CARD_CLS + " mb-6",
)
if recent
else Div()
)

detail_placeholder = Div(
id="stats-detail-panel",
21 changes: 21 additions & 0 deletions src/studio_ui/components/settings/panes/network.py
@@ -490,6 +490,20 @@ def _build_network_pane(cm, s):
**{"data-network-tab-pane": "institut_de_france"},
)

    internet_culturale_section = Div(
        _build_network_library_card(
            title="Internet Culturale (ICCU) [BETA]",
            policy_key="internet_culturale",
            policy_cfg=libraries_cfg.get(
                "internet_culturale",
                defaults["libraries"]["internet_culturale"],
            ),
            global_cfg=global_cfg,
        ),
        cls="hidden",
        **{"data-network-tab-pane": "internet_culturale"},
    )

return Div(
Div(H3("Network & Libraries", cls="text-lg font-bold text-slate-800 dark:text-slate-100 mb-3")),
P(
@@ -527,13 +541,20 @@ def _build_network_pane(cm, s):
cls="app-btn app-btn-neutral",
**{"data-network-tab-btn": "institut_de_france"},
),
Button(
"Internet Culturale [BETA]",
type="button",
cls="app-btn app-btn-neutral",
**{"data-network-tab-btn": "internet_culturale"},
),
cls="flex items-center flex-wrap gap-2 mb-4",
),
global_section,
gallica_section,
vaticana_section,
bodleian_section,
institut_section,
internet_culturale_section,
_network_subtabs_script(),
cls="p-4",
data_pane="network",
6 changes: 3 additions & 3 deletions src/studio_ui/routes/_studio/manifest_helpers.py
@@ -9,9 +9,9 @@
from fasthtml.common import Div

from universal_iiif_core.config_manager import get_config_manager
from universal_iiif_core.http_client import get_http_client
from universal_iiif_core.iiif_logic import total_canvases as manifest_total_canvases
from universal_iiif_core.logger import get_logger
from universal_iiif_core.resolvers.manifest_fetch import fetch_manifest_dict
from universal_iiif_core.services.storage.vault_manager import VaultManager

from .ui_utils import _with_toast
@@ -96,7 +96,7 @@ def _load_studio_manifest_context(
return manifest_json, initial_canvas, True

if remote_manifest_url:
remote_manifest = get_http_client().get_json(remote_manifest_url, retries=2) or {}
remote_manifest = fetch_manifest_dict(remote_manifest_url, retries=2) or {}
if isinstance(remote_manifest, dict) and remote_manifest:
return remote_manifest, _resolve_initial_canvas(remote_manifest, page), False

@@ -126,7 +126,7 @@ def _resolve_manifest_for_selected_source(
) -> tuple[dict, str | None, bool, str, str]:
manifest_exists_local = manifest_path.exists()
if read_source_mode == "remote" and remote_manifest_url:
remote_manifest = get_http_client().get_json(remote_manifest_url, retries=2) or {}
remote_manifest = fetch_manifest_dict(remote_manifest_url, retries=2) or {}
if isinstance(remote_manifest, dict) and remote_manifest:
return (
remote_manifest,
3 changes: 3 additions & 0 deletions src/studio_ui/routes/_studio/workspace.py
@@ -12,6 +12,7 @@
from studio_ui.components.studio.tabs import render_studio_tabs
from universal_iiif_core.config_manager import get_config_manager
from universal_iiif_core.logger import get_logger
from universal_iiif_core.resolvers.mag_parser import is_iccu_magparser_url
from universal_iiif_core.services.ocr.storage import OCRStorage
from universal_iiif_core.services.storage.vault_manager import VaultManager
from universal_iiif_core.utils import load_json
@@ -197,6 +198,8 @@ def _resolve_workspace_manifest_context(
local_pages_count=int(inventory.local_pages_count),
manifest_exists_local=manifest_exists_local,
)
if is_iccu_magparser_url(manifest_url):
manifest_url = f"/api/iccu/manifest?url={quote(manifest_url, safe='')}"
return {
"manifest_url": manifest_url,
"manifest_json": manifest_json,
1 change: 1 addition & 0 deletions src/studio_ui/routes/discovery.py
@@ -24,6 +24,7 @@ def setup_discovery_routes(app):
app.post("/api/discovery/load_more")(discovery_handlers.load_more_results)
app.post("/api/library/add_prefetch_light")(discovery_handlers.add_to_library)
app.get("/api/discovery/pdf_capability")(discovery_handlers.pdf_capability)
app.get("/api/iccu/manifest")(discovery_handlers.serve_iccu_manifest)
app.post("/api/start_download")(discovery_handlers.start_download)
app.get("/api/download_status/{download_id}")(discovery_handlers.get_download_status)
app.post("/api/cancel_download/{download_id}")(discovery_handlers.cancel_download)