Public website for browsing and searching Romania's Monitorul Oficial Partea a II-a (parliamentary records).
This repo is the read-only frontend. The scraping, extraction, and Elasticsearch indexing pipeline lives in monitorul-ii. This app queries the indices produced by that pipeline.
For the data shape this app reads from, see docs/architecture.md.
```bash
bun install
bun run dev   # http://localhost:3020
```

| Command | What it does |
|---|---|
| `bun run dev` | dev server |
| `bun run build` | production build |
| `bun run start` | run the production build |
| `bun run lint` / `lint:fix` | oxlint |
| `bun run fmt` / `fmt:check` | oxfmt |
Bun is the runtime (Next is invoked via bun --bun).
```bash
ES_URL=https://es-1.example.com:9200,https://es-2.example.com:9200 # comma-separate for round-robin
ES_API_KEY=<read-only "monitorul_reader" API key minted by monitorul-ii es-init>
ES_VERIFY_CERTS= # leave unset for self-signed; set to 1 only on managed ES
EMBED_PROVIDER=local # local | cloud; picks the embed backend for hybrid search
EMBED_URL=http://127.0.0.1:8000 # `local` provider: FastAPI embedder (monitorul-ii)
EMBED_CLOUD_URL= # `cloud` provider: OpenAI-compatible /v1/embeddings URL
EMBED_CLOUD_TOKEN= # `cloud` provider: bearer token
EMBED_CLOUD_MODEL=bge-m3 # model id sent in the payload (default bge-m3)
QUERY_LOG_WRITE= # set to 1 to write search telemetry to monitorul_query_log

# S3-compatible storage (Cloudflare R2 in prod) for the original PDFs.
# The bucket stays PRIVATE: the server mints short-lived presigned URLs and
# 302-redirects from /mo/<year>/<part>/<issue>/pdf. Leave blank for
# search-only deployments; the PDF link is hidden when unset.
S3_ENDPOINT=https://<account>.r2.cloudflarestorage.com
S3_ACCESS_KEY_ID=
S3_SECRET_ACCESS_KEY=
S3_BUCKET=monitorul-ii
S3_REGION=auto

# Upstash Redis: sliding-window rate limit for the public MCP server.
# Without these, the limiter no-ops (dev only; set both in production).
UPSTASH_REDIS_REST_URL=
UPSTASH_REDIS_REST_TOKEN=

NEXT_PUBLIC_POSTHOG_KEY= # optional; unset disables browser analytics entirely
NEXT_PUBLIC_POSTHOG_HOST=/trace # first-party PostHog proxy; use https://eu.i.posthog.com without proxy
NEXT_PUBLIC_SITE_URL=https://monitorul.ai
```
Validated at startup by src/env.ts (via @t3-oss/env-nextjs + Zod). next dev and next build fail fast on missing or malformed values. Set SKIP_ENV_VALIDATION=1 to bypass (useful for lint-only CI). The monitorul_reader key and the local embedder both come from the monitorul-ii repo; NEXT_PUBLIC_SITE_URL controls absolute canonical URLs and JSON-LD @id values.
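A rough sketch of the kind of fail-fast checks described above, written in plain TypeScript so it runs without dependencies. The real src/env.ts uses @t3-oss/env-nextjs + Zod; the function names, the choice of ES_URL and ES_API_KEY as the required pair, and the `roundRobin` helper are assumptions for illustration only.

```typescript
// Plain-TypeScript sketch of fail-fast env checks. NOT the real src/env.ts,
// which uses @t3-oss/env-nextjs + Zod; the required set here is assumed.
type RawEnv = Record<string, string | undefined>;

interface ServerEnv {
  esUrls: string[];
  esApiKey: string;
  embedProvider: "local" | "cloud";
}

// Split ES_URL on commas and reject anything that is not an http(s) URL.
function parseEsUrls(raw: string): string[] {
  const urls = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  if (urls.length === 0) throw new Error("ES_URL: expected at least one URL");
  for (const u of urls) {
    const { protocol } = new URL(u); // throws TypeError on malformed input
    if (protocol !== "https:" && protocol !== "http:") {
      throw new Error(`ES_URL: unsupported protocol in ${u}`);
    }
  }
  return urls;
}

// Returns null when SKIP_ENV_VALIDATION=1 (e.g. lint-only CI); otherwise
// throws on missing or malformed values.
function validateEnv(env: RawEnv): ServerEnv | null {
  if (env.SKIP_ENV_VALIDATION === "1") return null;
  const missing = ["ES_URL", "ES_API_KEY"].filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing env vars: ${missing.join(", ")}`);
  }
  const provider = env.EMBED_PROVIDER || "local";
  if (provider !== "local" && provider !== "cloud") {
    throw new Error(`EMBED_PROVIDER must be local | cloud, got "${provider}"`);
  }
  return {
    esUrls: parseEsUrls(env.ES_URL as string),
    esApiKey: env.ES_API_KEY as string,
    embedProvider: provider as "local" | "cloud",
  };
}

// Hypothetical helper: cycle through the configured ES nodes in order.
function roundRobin(urls: string[]): () => string {
  let i = 0;
  return () => urls[i++ % urls.length];
}
```

Throwing (rather than logging) is what makes `next dev` and `next build` stop immediately on a bad value instead of failing later at the first search request.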
PostHog analytics is optional, anonymous, and cookieless. When the key is set, custom events, $pageview, and $web_vitals (FCP, LCP, INP, CLS) are sent from src/components/posthog-provider.tsx. Do Not Track is respected, and person profiles, autocapture, surveys, and session replay stay disabled. Web Vitals attribution is enabled, so poor samples include element selectors and timing/resource breakdowns.
Auth + DB env vars (DATABASE_URL, DATABASE_DIRECT_URL, BETTER_AUTH_SECRET, BETTER_AUTH_URL, GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET) have their own provisioning recipe in docs/auth-setup.md — Neon project + branches, Google Cloud Console consent screen + two OAuth clients, Vercel env-var matrix, plus common-pitfall debugging.
Embed provider toggle. EMBED_PROVIDER=local (default) calls the FastAPI embedder at EMBED_URL; this is the dev path against the monitorul-ii box. EMBED_PROVIDER=cloud calls any OpenAI-compatible /v1/embeddings endpoint that serves BGE-M3 (1024-dim) at EMBED_CLOUD_URL, sending Authorization: Bearer $EMBED_CLOUD_TOKEN. The cloud provider is required on Vercel, where the local embedder is unreachable. Both sets of credentials can stay populated in .env.local; flip the single EMBED_PROVIDER line to switch. If the selected provider is missing credentials, embedding returns null and search silently degrades to BM25 only.
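A minimal sketch of how such a toggle can be wired, assuming the global `fetch` of Node 18+/Bun. The cloud branch follows the OpenAI-compatible embeddings shape (`{ model, input }` in, `data[0].embedding` out); the local `/embed` route and its `{ embedding }` response are hypothetical stand-ins for the FastAPI embedder, and all function names are illustrative.

```typescript
// Sketch of the EMBED_PROVIDER switch. The local "/embed" route and its
// { embedding } response are hypothetical; the cloud branch uses the
// OpenAI-compatible /v1/embeddings request/response shape.
type Env = Record<string, string | undefined>;

type EmbedConfig =
  | { provider: "local"; url: string }
  | { provider: "cloud"; url: string; token: string; model: string };

const EMBED_DIM = 1024; // BGE-M3 dense vector size

// Returns null when the selected provider lacks credentials.
function resolveEmbedConfig(env: Env): EmbedConfig | null {
  const provider = env.EMBED_PROVIDER || "local";
  if (provider === "local") {
    return env.EMBED_URL ? { provider: "local", url: env.EMBED_URL } : null;
  }
  if (provider === "cloud" && env.EMBED_CLOUD_URL && env.EMBED_CLOUD_TOKEN) {
    return {
      provider: "cloud",
      url: env.EMBED_CLOUD_URL,
      token: env.EMBED_CLOUD_TOKEN,
      model: env.EMBED_CLOUD_MODEL || "bge-m3",
    };
  }
  return null;
}

// Any failure yields null so the caller can fall back to BM25-only search.
async function embed(text: string, cfg: EmbedConfig | null): Promise<number[] | null> {
  if (!cfg) return null;
  try {
    const res =
      cfg.provider === "cloud"
        ? await fetch(cfg.url, {
            method: "POST",
            headers: { "Content-Type": "application/json", Authorization: `Bearer ${cfg.token}` },
            body: JSON.stringify({ model: cfg.model, input: text }),
          })
        : await fetch(`${cfg.url}/embed`, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ text }),
          });
    if (!res.ok) return null;
    const json = (await res.json()) as { data?: { embedding: number[] }[]; embedding?: number[] };
    const vector = cfg.provider === "cloud" ? json.data?.[0]?.embedding : json.embedding;
    // Reject vectors of the wrong size rather than sending them to kNN.
    return vector && vector.length === EMBED_DIM ? vector : null;
  } catch {
    return null; // network error: degrade to BM25
  }
}
```

Returning `null` instead of throwing keeps the degradation path in one place: the search code only has to check for a missing vector.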
| Path | Status | Notes |
|---|---|---|
| `/` | live | civic-gazette landing: archive stats register, three entry-point cards, discourse-analysis register (open card framed in ink-16, 2×2 grid of the four framework cells in their schema colors → `/statistici` + `/despre/discurs`), MCP register, methodology note. ISR 1h. |
| `/mo` | live | sessions register: per-year sparkbar over the whole archive plus the document list for the selected year (defaults to the most recent active year; switch via `?year=YYYY`). `?year=` is non-canonical; `alternates.canonical` always points at `/mo`. JSON-LD CollectionPage, ISR 1h. |
| `/mo/[year]/[part]/[issue]` | live | document page with a Cuprins (TOC) and an inline body that groups every per-doc child under its agenda item in source order: speeches, votes, interpellations, questions, and committee meetings (anchored `#comisie-<position>`, with a linked committee name). The heading switches to "Ședințe" for `committee_synthesis` issues. When any speech is coded, a discourse stat strip sits between the header and the Cuprins (DISCURSURI ANALIZATE · H≥1 · V≥1 · MARCHERI PRINCIPALI slots, with voice + confidence chip toggles); per-agenda Cuprins rows gain a `<CuprinsMarkerIndicator>` chip (e.g. `5 marcheri · H2 · V2`); and each speaker line in the body gains a compact `[H=N V=N DQI=L*]` `<InlineSpeechChip>` (hidden for clean speeches). JSON-LD, ISR 1h. |
| `/mo/[year]/[part]/[issue]/pdf` | live | route handler: 302-redirects to a 5-minute SigV4-presigned R2 URL for the original PDF. The bucket stays private (credentials never leave the server). The PDF link is hidden from the document page when the `S3_*` env vars aren't set. |
| `/cauta?q=…` | live | hybrid speech search (BM25 + kNN/RRF) with highlights; `noindex, follow`. Each hit shows a word count + 5-segment length meter, links the speaker to their politician page when `person_id` is populated, and includes a "Vezi în context →" link to the document anchor. Diacritic-insensitive: `sosoaca` matches indexed `șoșoacă` via the `.folded` subfields on `text` / `agenda_title` / `speaker.name_search`. The filter panel (`<details>` titled Filtre · N, expanded when any filter is active) covers: multi-year selection (chip row backed by Radix Checkbox; multiple years compose, e.g. `?year=2024,2007`; the Alt an select adds older years to the same set); chamber and sort via shadcn RadioGroup rendered as chips; speaker via `<SpeakerCombobox>` (shadcn command + popover, hits `/api/search/persons`); grup parlamentar (parliamentary group) at the time of the speech via shadcn Select with the top 12 plus an Alte optgroup; speech length via XS/S/M/L/XL chips (`?length=s,m`); and a procedural-toggle shadcn Checkbox. Active filters render as removable shadcn Badge chips above the result list (one badge per year or length bucket). URL params are all in English (`?q&page&from&to&year&chamber&speaker&party&length&procedural&sort`); the form is a plain `method="GET"` form with a `'use client'` wrapper that strips empty inputs, joins year and length checkboxes into comma-separated values, and uses `router.push` for soft navigation. |
| `/politicieni` | live | politicians register: per-year sparkbar plus the top 100 most active politicians for the selected year. An inline Ordonare (sort) toggle flips between substantive-only (default, `?mode=substantive`) and all interventions including procedural turn-taking (`?mode=all`). Defaults to the most recent active year; switch via `?year=YYYY`. JSON-LD CollectionPage, ISR 1h. |
| `/politicieni/[slug]` | live | person profile: name, mandates, paginated recent speeches (20 per page; each row shows word count + 5-segment length meter), JSON-LD Person, ISR 1h. A discourse trajectory panel appears when the politician has any coded speeches: framework tab strip (Populism · Anti-pluralism · DQI · Voce), career-wide coverage band (per-year coded fraction, hashed bars for un-coded years), aggregate band (per-month stacked H=0/1/2 / V=0/1/2 / DQI L0–L3 distribution), and a speech-dot scatter (one dot per coded speech, click → `/discurs/<slug>`, dot color encodes the secondary axis, dot size = `marker_count`). The Voce tab swaps the scatter for a stacked-area voice-mix chart. Voice + confidence chip toggles are preserved across framework switches. |
| `/comisii` | live | committees register: per-year meeting sparkbar plus the top 100 most active committees for the selected year, derived live from `mo-committee-meetings` (there is no upstream `mo-committees` index). Defaults to the most recent active year; switch via `?year=YYYY`. JSON-LD CollectionPage, ISR 1h. |
| `/comisii/[committee_id]` | live | committee profile: name, kind, joint-with, total / first / last meeting, per-year meeting sparkbar, list of meetings for the selected year (date, agenda preview, attendance, outcome counts). JSON-LD GovernmentOrganization, ISR 1h. |
| `/statistici` | live | discourse-analysis dashboard with four panels over the four-prompt analysis layer (Hawkins populism, V-Party anti-pluralism, DQI deliberative quality, voice attribution): (1) system-wide H≥1 / V≥1 monthly time series, (2) H × V cross-tab heatmap with the illiberal cluster (H=2 + V≥1) annotated, (3) three mini-rankings (top H, V, and DQI politicians, with a Wilson 95% CI on each rate), (4) marker-kind treemap. Filter chips at the top: year / chamber / voice / confidence. Methodology + model disclaimer block at the bottom. ISR 1h, JSON-LD WebPage. |
| `/discurs/[slug]` | live | individual speech page: speaker (linked to `/politicieni/<person_id>` when populated), agenda category + outcome, agenda title (linked back to `#agenda-<ord>` on the parent doc), full speech body split into paragraphs, word count + length meter, bill refs, and an "În contextul ședinței →" link to `#discurs-<position>`. Discourse overlay when codings exist: a `<SpeechDiscourseSummary>` strip shows per-framework score + confidence dot + producer footnote; the speech body wraps each evidence span in a voice-encoded `<mark>` (styling encodes voice attribution: azure for first-person, italic gray for quoted/reported, strikethrough for negated, dotted underline for apophasis) with a numbered margin chip per marker; and a right-rail `<DiscourseSidePanel>` lists one card per marker (framework + kind + confidence + voice + rationale + verbatim evidence). Bidirectional click-and-flash links body spans ↔ panel cards. Voice + confidence chip toggles sit at the top of the panel; the URL params `?voice=all` and `?conf=07` are safe to share. Empty / pre-coverage states render the plain body + a "neacoperit încă" footer linked to `/despre`. Slug-once: the server matches on the trailing `<short_id>` only; variant slug prefixes 308-redirect to the canonical `url_path`. `index, follow` for substantive speeches; `noindex, follow` for procedural turns. JSON-LD Quotation + Person + isPartOf, ISR 1h. |
| `/despre` | live | methodology page in two parts. Part I (Pentru toți cititorii, "for all readers") is plain Romanian for non-technical readers: what the archive is, where the data comes from, what it covers, why links stay stable, how search works, and how to flag errors. Part II (Detalii tehnice, "technical details") carries the 8-step pipeline register, the `record_id` / `content_fingerprint` / slug-once URL contract, and the BM25 + kNN/RRF search internals. The footer's Metodologie column links into `#identitate` and `#corectii` (Part I); cross-refs jump from Part I down to `#identitate-tehnic` / `#cautare-tehnic`. JSON-LD AboutPage, ISR 1h. |
| agenda/vote/etc. | not yet wired | linked from chrome but ship in subsequent phases |
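
The `/statistici` rankings attach a Wilson 95% CI to each politician's rate. The sketch below implements the standard Wilson score interval; ranking by the interval's lower bound is one common way to keep tiny samples from topping a list, but whether `/statistici` actually sorts that way is an assumption, and the names are illustrative.

```typescript
// Wilson score interval for a proportion k/n (z = 1.96 gives ~95%):
//   center = (p + z^2/(2n)) / (1 + z^2/n)
//   half   = z / (1 + z^2/n) * sqrt(p(1 - p)/n + z^2/(4n^2))
interface Interval {
  lower: number;
  upper: number;
}

function wilsonInterval(successes: number, n: number, z = 1.96): Interval {
  if (n <= 0) return { lower: 0, upper: 1 }; // no data: the whole [0, 1] range
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}

interface RateRow {
  name: string;
  successes: number; // e.g. speeches coded H>=1
  n: number; // coded speeches in total
}

// Hypothetical ranking rule: sort by the lower bound, so 2/2 (rate 1.0,
// lower ~0.34) ranks below 40/50 (rate 0.8, lower ~0.67).
function rankByLowerBound(rows: RateRow[]): RateRow[] {
  return [...rows].sort(
    (a, b) => wilsonInterval(b.successes, b.n).lower - wilsonInterval(a.successes, a.n).lower,
  );
}
```

Unlike the naive p ± z·sqrt(p(1-p)/n) interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width at p = 0 or p = 1.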
All ES interaction goes through src/lib/search.ts — the only path from app code to Elasticsearch.
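The hybrid search behind `/cauta` fuses a BM25 ranking and a kNN ranking with Reciprocal Rank Fusion (RRF). How src/lib/search.ts asks Elasticsearch for this is not shown here; the sketch below only illustrates the RRF scoring formula in isolation, with k = 60 assumed as the rank constant (a common default) and hypothetical names throughout.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with 1-based ranks; a document absent from a list gets nothing from it.
interface Fused {
  id: string;
  score: number;
}

function reciprocalRankFusion(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties broken by id for a stable order.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}

// Example: "b" is near the top of both lists, so it wins overall.
const bm25 = ["a", "b", "c"];
const knn = ["b", "d", "a"];
const fused = reciprocalRankFusion([bm25, knn]);
```

Because RRF uses only ranks, BM25 scores and kNN similarities never have to be put on a common scale, which is the usual reason to prefer it over a weighted sum of raw scores.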
A public, OAuth-authenticated Model Context Protocol server is mounted on the same Next.js app:
| URL | Purpose |
|---|---|
| https://monitorul.ai/mcp | Presentation page: what it is and how to plug it into AI clients |
| https://monitorul.ai/mcp/server | Streamable-HTTP endpoint; paste this into your client's MCP config |
| https://monitorul.ai/cont | Account page: connected clients, revoke, sign-out |
It exposes 16 Zod-typed tools that wrap the same lib/search.ts functions the web pages use, so any MCP-capable client (Claude Desktop, Cursor, Codex, claude.ai) can ask multi-step questions over the corpus and get back hits pinned to one-click-verifiable URLs on this site.
First connection (Prima conectare): the JSON below is the same as for an anonymous MCP setup; mcp-remote (and the native MCP clients) handle OAuth transparently. The first request opens a browser to sign in with Google; you accept the consent screen, and the client caches the access and refresh tokens locally. Subsequent calls are silent until you revoke the client from /cont.
Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
```json
{
  "mcpServers": {
    "monitorul-ai": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://monitorul.ai/mcp/server"]
    }
  }
}
```

Cursor / Cline — point the client's MCP config at https://monitorul.ai/mcp/server directly (both speak streamable HTTP + OAuth natively).
Codex CLI — same mcp-remote shape as Claude Desktop, in ~/.codex/config.toml.
Local dev — same shape, swap the URL: http://localhost:3020/mcp/server (with --allow-http for mcp-remote).
A typical session starts with describe_corpus (chambers, topics, counts, URL templates) and chains downstream — e.g. search_persons → person_page → search_speeches(speaker_person_id=...) → get_speech for verbatim quotes. The full tool surface is documented in docs/mcp.md, and the user-facing presentation lives at /mcp.
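On the wire, each step of such a chain is an MCP JSON-RPC 2.0 `tools/call` request. The sketch below only builds those payloads; mcp-remote or a native client handles the initialize handshake, OAuth, and transport. Tool names come from this README, but every argument name except `speaker_person_id` is a hypothetical placeholder; docs/mcp.md has the real schemas.

```typescript
// MCP tool invocations are JSON-RPC 2.0 "tools/call" requests.
// Argument names other than speaker_person_id are hypothetical.
interface ToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown> = {}): ToolCall {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The describe_corpus -> search_persons -> person_page -> search_speeches
// -> get_speech chain, given IDs the client read from earlier results.
function typicalSession(personQuery: string, personId: string, speechId: string): ToolCall[] {
  return [
    toolCall(1, "describe_corpus"),
    toolCall(2, "search_persons", { query: personQuery }),
    toolCall(3, "person_page", { person_id: personId }),
    toolCall(4, "search_speeches", { speaker_person_id: personId }),
    toolCall(5, "get_speech", { speech_id: speechId }),
  ];
}
```

In a real session each call is sent only after the previous result arrives, since the person and speech IDs come from those results.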
Internally the route file is at src/app/mcp/server/route.ts. The handler tells mcp-handler to dispatch on the literal pathname /mcp/server (streamableHttpEndpoint config), so the public URL and the file location agree — no rewrites involved.
Rate limits: 30 req/min per IP and 30 req/min per user for general calls; 6 req/min per IP and 6 req/min per user for heavy calls (RRF / kNN-only search_speeches). A request must clear both the IP and the user limit; whichever limit returns 429 first short-circuits the request. Without UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN set, the limiter is a no-op (dev only; production must configure both).
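A self-contained sketch of the two-axis rule, using an in-memory sliding-window log in place of Upstash Redis. The class and function names are hypothetical, and so is one design choice: here heavy calls count only against the heavy limits, since the text above does not say whether they also consume the general budget.

```typescript
// In-memory sliding-window log. Illustrative only: production keeps its
// counters in Upstash Redis, and these names are hypothetical.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the hit and returns true if `key` had fewer than `limit` hits in
  // the window ending at `now`; otherwise returns false without recording.
  tryHit(key: string, now: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter((t) => t > now - this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

type Tier = "general" | "heavy";
const PER_MINUTE: Record<Tier, number> = { general: 30, heavy: 6 };

// One IP limiter and one user limiter per tier; both must admit a request.
function createMcpLimiter(windowMs = 60_000) {
  const pair = (n: number) => ({
    ip: new SlidingWindowLimiter(n, windowMs),
    user: new SlidingWindowLimiter(n, windowMs),
  });
  const axes: Record<Tier, ReturnType<typeof pair>> = {
    general: pair(PER_MINUTE.general),
    heavy: pair(PER_MINUTE.heavy),
  };
  return (tier: Tier, ip: string, userId: string, now = Date.now()): 200 | 429 => {
    const { ip: byIp, user: byUser } = axes[tier];
    if (!byIp.tryHit(ip, now)) return 429; // short-circuit: user axis not consulted
    return byUser.tryHit(userId, now) ? 200 : 429;
  };
}
```

Keying the second axis on the authenticated user means rotating IPs does not buy extra heavy queries, while the IP axis still caps a single host that cycles through accounts.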
Releases are automated via release-please on every push to main, so use Conventional Commits.
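For reference, a Conventional Commits header has the shape `type(scope)!: subject`. The sketch below parses that header and maps it to the semver bump release-please would typically propose (`feat` → minor, `fix` → patch, `!` or a `BREAKING CHANGE` footer → major). It is a simplification for illustration, not release-please's own logic: the footer is passed in as a flag, other commit types are assumed not to trigger a release, and pre-1.0 bump rules are ignored.

```typescript
// Simplified mapping from a Conventional Commits header to a semver bump.
type Bump = "major" | "minor" | "patch" | null;

// type, optional (scope), optional "!", then ": " and a subject.
const HEADER = /^(?<type>[a-z]+)(?:\((?<scope>[^()\r\n]+)\))?(?<breaking>!)?: (?<subject>.+)$/;

function bumpFor(header: string, breakingFooter = false): Bump {
  const m = HEADER.exec(header);
  if (!m || !m.groups) return null; // not a Conventional Commit header
  if (m.groups.breaking || breakingFooter) return "major";
  if (m.groups.type === "feat") return "minor";
  if (m.groups.type === "fix") return "patch";
  return null; // docs, chore, refactor, ...: assumed to need no release
}
```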