Advanced email search: structured filters, sort, date-integrity (closes #288, #247, #298, #304, #372) #382

Closed
kisst wants to merge 51 commits into LogicLabs-OU:main from kisst:feat/advanced-search-381

@kisst kisst commented May 25, 2026

Refs #381 (umbrella). Closes #288, #247, #298, #304, #372.

This PR ships the full advanced-search initiative as one cohesive change. The work is split into 51 atomic commits organized by the master plan in docs/plans/advanced-search.md, so reviewers can audit each slice (P0 → P6) independently even though the merge is one PR.

What this PR delivers

Bug fixes

New features

  • Structured filters across from, to, cc, bcc, subject, timestamp, path, tags, hasAttachments, sizeBytes, isOnLegalHold, threadId, attachments.sha256, ingestionSourceId, userEmail.
  • Sort by date / sender / subject / size (closes #298 "Need sorting feature by date in search result." and #304 "Search in Date Range / Sort Columns").
  • Persisted matching-strategy default via localStorage (closes #247 "Set default search option or remember.").
  • Result-card meta strip showing path, attachment count, tags, size, legal-hold lock.
  • Advanced filter panel with date-range presets, address chip input with op toggle, multi-select source filter, dual include/exclude path lists, tri-state booleans, size range with unit selection, free-text tag chips.
  • Mobile filter UI as a side-drawer Sheet.
  • "Archived on" row in the email detail view, separate from the original Date header.

Backend infrastructure

  • Hardened filter translator at packages/backend/src/services/search/filterTranslator.ts — field allowlist, op allowlist per kind, escaped-quote rendering, ingestionSourceId group expansion preserved, dotted-key flattening for attachments.sha256, path.exclude support.
  • Reindex orchestrator (reindexQueue + processor + worker + ReindexService + admin REST endpoints at /api/v1/admin/reindex) for re-indexing email subsets into Meilisearch after schema changes.
  • Date backfill worker (dateBackfillQueue + processor + worker + DateBackfillService + admin REST endpoints at /api/v1/admin/jobs/date-backfill) — streams archived_emails cursor-paginated, re-parses .eml from storage, recomputes sentAt and originalDateSource, enqueues reindex only for changed rows. Idempotent resume via the date_backfilled_at column. CLI script at packages/backend/scripts/run-date-backfill.ts.
  • Extended Meilisearch index settings (path, tags, hasAttachments, sizeBytes, isOnLegalHold, threadId, attachments.sha256 all filterable; subject, sizeBytes, from all sortable; maxValuesPerFacet: 10000 to fix #363, "fix: increase maxValuesPerFacet to fix Top Senders dashboard panel").
  • New schema columns: archived_emails.original_date_source ('header' | 'received' | 'unknown', default 'header') and archived_emails.date_backfilled_at (nullable). Migration 0035_lonely_roulette.sql.
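
To make the translator hardening concrete, here is a minimal sketch of a field allowlist, a per-kind op allowlist, and quote escaping that renders a Meilisearch filter string. It is not the code in filterTranslator.ts: the type names, the op names (`eq`, `in`, `gte`, `lte`), the field subset, and the backslash-escaping rule are all assumptions made for illustration.

```typescript
// Hypothetical types; the real filterTranslator.ts may be shaped differently.
type FieldKind = "string" | "number" | "boolean";
type Op = "eq" | "in" | "gte" | "lte";
type Scalar = string | number | boolean;
interface Clause { op: Op; value: Scalar | Scalar[] }

// Field allowlist (a subset of the PR's fields, chosen for illustration).
const FIELD_KINDS: Record<string, FieldKind> = {
  from: "string",
  tags: "string",
  "attachments.sha256": "string",
  sizeBytes: "number",
  hasAttachments: "boolean",
};

// Op allowlist per field kind.
const OPS_BY_KIND: Record<FieldKind, readonly Op[]> = {
  string: ["eq", "in"],
  number: ["eq", "gte", "lte"],
  boolean: ["eq"],
};

const SYMBOL: Record<"eq" | "gte" | "lte", string> = { eq: "=", gte: ">=", lte: "<=" };

// Render one value; strings are double-quoted with backslashes and quotes escaped.
function renderValue(kind: FieldKind, value: Scalar): string {
  if (kind === "string") {
    if (typeof value !== "string") throw new Error("expected a string value");
    return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
  }
  if (kind === "number") {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error("expected a finite number");
    }
    return String(value);
  }
  if (typeof value !== "boolean") throw new Error("expected a boolean value");
  return String(value);
}

// Translate a structured filter object into a single AND-joined filter string.
function translateFilters(filters: Record<string, Clause>): string {
  const parts: string[] = [];
  for (const [field, clause] of Object.entries(filters)) {
    const known = Object.prototype.hasOwnProperty.call(FIELD_KINDS, field);
    const kind: FieldKind | undefined = known ? FIELD_KINDS[field] : undefined;
    if (kind === undefined) throw new Error(`field not allowed: ${field}`);
    const op = clause.op;
    if (!OPS_BY_KIND[kind].includes(op)) {
      throw new Error(`op ${op} not allowed for ${field}`);
    }
    const value = clause.value;
    if (op === "in") {
      if (!Array.isArray(value)) throw new Error("in expects an array");
      parts.push(`${field} IN [${value.map((v) => renderValue(kind, v)).join(", ")}]`);
    } else {
      if (Array.isArray(value)) throw new Error(`${op} expects a scalar`);
      parts.push(`${field} ${SYMBOL[op]} ${renderValue(kind, value)}`);
    }
  }
  return parts.join(" AND ");
}
```

Rejecting unknown fields and ops before any string is built means user input only ever reaches the filter expression as a rendered value, never as an attribute name or operator.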

Frontend

  • 10 new shadcn-svelte UI primitives wrapping bits-ui v2 (popover, calendar, range-calendar, date-picker, command, combobox, toggle-group, tabs, sheet, tooltip).
  • Search route migrated from GET /search?keywords= to POST /search with full structured body.
  • URL state is the source of truth — every filter encodes to flat URL params, every URL decodes back to the same filter draft. Tests cover the round-trip for every filter type, including special chars and multi-value chip lists.
  • Vitest scaffolding for both backend and frontend; 82 backend tests + 97 frontend tests, all green.
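
The URL round-trip property can be sketched as a pair of encode/decode functions over flat query params. This is an illustration only: the `FilterDraft` shape and the param names (`from`, `tag`, `hasAttachments`, `sizeMin`) are hypothetical and are not taken from the PR's URL-state module.

```typescript
// Hypothetical draft shape; the PR's real filter draft has more fields.
interface FilterDraft {
  from: string[];
  tags: string[];
  hasAttachments?: boolean; // tri-state: true, false, or unset
  sizeMin?: number;
}

// Encode every filter as flat params; multi-value chips repeat the key.
function encodeDraft(draft: FilterDraft): URLSearchParams {
  const params = new URLSearchParams();
  for (const v of draft.from) params.append("from", v);
  for (const v of draft.tags) params.append("tag", v);
  if (draft.hasAttachments !== undefined) {
    params.set("hasAttachments", String(draft.hasAttachments));
  }
  if (draft.sizeMin !== undefined) params.set("sizeMin", String(draft.sizeMin));
  return params;
}

// Decode params back into a draft, ignoring malformed values.
function decodeDraft(params: URLSearchParams): FilterDraft {
  const draft: FilterDraft = {
    from: params.getAll("from"),
    tags: params.getAll("tag"),
  };
  const has = params.get("hasAttachments");
  if (has === "true" || has === "false") draft.hasAttachments = has === "true";
  const size = params.get("sizeMin");
  if (size !== null && /^\d+$/.test(size)) draft.sizeMin = Number(size);
  return draft;
}
```

Because `URLSearchParams` percent-encodes on serialization, values containing `&`, `=`, spaces, or commas survive the trip through the address bar without any custom escaping.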

Decisions recorded for review

The master plan flagged 10 open questions (docs/plans/advanced-search.md §7). Decisions taken — push back on any of these and I'll adjust:

| # | Decision | Rationale |
|---|----------|-----------|
| Q1 | Default limit 25 on POST, 10 on GET shim | Spec example said 25; back-compat preserved for GET. |
| Q2 | Empty body → returns all results capped by limit | CASL filter already bounds per-user; useful as a "browse my archive" entry point. |
| Q3 | No separate search:filter permission | Overkill for v1; one permission suffices. |
| Q4 | Backfill is manual via CLI / admin REST | Auto-run on upgrade would hammer storage for hours on large archives. |
| Q5 | No sent_at_idx DB index added | Out of P2 scope; tracked as a 3-line follow-up. |
| Q6 | userPreferences stays localStorage in v1 | DB-backed prefs is a separate feature; localStorage carries a documented migration path. |
| Q7 | Vitest yes, Playwright no | Playwright harness deserves its own focused PR. |
| Q8 | Tags filter is free-text, no /v1/tags endpoint yet | Autocomplete is a follow-up. |
| Q9 | Mailbox filter is feature-flagged off pending /v1/archive/mailboxes endpoint | Avoids shipping a filter that returns inaccurate results. |
| Q10 | OpenAPI regenerated as part of this PR | Manual per-PR for now. |

Rollout

The migration is non-destructive but must run before the new code starts ingesting:

1. Apply migration 0035_lonely_roulette.sql (sent_at nullable + new columns)
2. Deploy backend + workers (new emails get correct dates or null)
3. Run backfill for existing rows:
   pnpm --filter @open-archiver/backend exec node scripts/run-date-backfill.ts
   # or POST /api/v1/admin/jobs/date-backfill
4. Trigger reindex to populate new EmailDocument fields on existing rows:
   POST /api/v1/admin/reindex with { "scope": "full" }

Estimated reindex time: a few minutes per 100k emails on local storage, longer on S3 (storage-bound). The queue is pausable.
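
For reviewers who want a mental model of the backfill in step 3, the loop below sketches cursor pagination with idempotent resume over an in-memory array. It is a simplified assumption of how DateBackfillService behaves, not its code: `Row`, `Reparse`, and `runBackfill` are hypothetical names, and the real worker reads from archived_emails and re-parses .eml files from storage.

```typescript
// Hypothetical in-memory stand-in for an archived_emails row.
interface Row {
  id: number;
  sentAt: string | null;
  dateBackfilledAt: string | null; // mirrors date_backfilled_at
}

// Hypothetical stand-in for re-parsing the stored .eml and recomputing the date.
type Reparse = (row: Row) => string | null;

// Returns the ids whose sentAt changed, i.e. the rows that would need a reindex.
function runBackfill(rows: Row[], reparse: Reparse, batchSize: number, now: string): number[] {
  const changed: number[] = [];
  let cursor = 0; // id of the last row processed
  for (;;) {
    // Cursor page: next rows after the cursor that were never backfilled.
    const batch = rows
      .filter((r) => r.id > cursor && r.dateBackfilledAt === null)
      .sort((a, b) => a.id - b.id)
      .slice(0, batchSize);
    if (batch.length === 0) break;
    for (const row of batch) {
      const next = reparse(row);
      if (next !== row.sentAt) {
        row.sentAt = next;
        changed.push(row.id); // only changed rows are queued for reindex
      }
      row.dateBackfilledAt = now; // marks the row done, so a rerun skips it
      cursor = row.id;
    }
  }
  return changed;
}
```

Stamping every visited row, changed or not, is what makes an interrupted run safe to restart: the next invocation only sees rows that still have a null marker.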

Commit map (51 atomic commits)

Reviewable by slice. Each commit builds independently and passes its own tests.

  • P0: edd975e test scaffolding
  • P1 (#288, "API search not using filters parameter"): 9ca2f91 types, 75f2a88 translator, b993932 service wiring, 7a1cb6b POST endpoint, c0ea672 tests
  • P2-types: 1007839
  • P2-migration: db41c2b schema + migration
  • P2-code: 98fa81d extractor, fea8369 connectors, 433bced services, 4a53d78 OpenAPI
  • P2-frontend: 77d32d4 i18n, ee2c7c9 UI null-handling, 4b85765 Archived-on row
  • P2-backfill: 3afd81c queue/worker, 24e110b service+REST, bcfca5c CLI, 0713225 tests, 74c6e75 OpenAPI
  • P3: 47e688a types, c8aa586 Meili settings, 98882ef translator allowlist, e606769 indexer, 053da53 reindex orchestrator, 755158e tests, 811a2ae OpenAPI
  • P4a: d767676 UI primitives
  • P4b: 885f4ab SearchResults extraction
  • P4c: 811a99b URL state, d4d90dd preferences, ce1e6c4 tests
  • P4d: 7efb494 filter components, 8b3b73a panel, 495f189 POST switch, c700a15 meta strip, ac3b45d i18n, c96c825 clear-all fix
  • P5 (#298, "Need sorting feature by date in search result."; #304, "Search in Date Range / Sort Columns"): 2559f83 types widening, 3d17f97 sort precedence, 94824ae SortControl, 510ee56 i18n
  • P6 (#247, "Set default search option or remember."): bdd1881 preference UI, 6f21235 i18n
  • Verification: 35bc1dc jsdom, 135e01b SQL annotation, 46467f1 vi.hoisted, 37f0ee1 vitest alias, 5a2d3f2 OpenAPI regen
  • Misc: e6e5269 plan doc, 5a05fce bg locale fix

Test plan

  • Apply migration on a copy of production DB; confirm \d archived_emails shows sent_at nullable plus the two new columns.
  • Ingest a new email via IMAP; confirm sentAt matches the original Date header.
  • Ingest an email with the Date: header removed; confirm sentAt is null and detail view shows "Original date unknown" with "Archived on: ..." separately.
  • POST /v1/search { "filters": { "from": { "op": "contains", "value": "@acme.com" } } } returns matches with no query.
  • GET /v1/search (no keywords) returns 200 with Deprecation: true header (regression for #288, "API search not using filters parameter").
  • POST /v1/admin/jobs/date-backfill returns a jobId; GET /:jobId/status reports progress.
  • POST /v1/admin/reindex { "scope": "full" } re-indexes the corpus; POST /v1/search { "filters": { "hasAttachments": true } } returns matches after completion.
  • Open /dashboard/search; advanced filter panel toggles open; apply a date-range + From filter; URL contains both as flat query params; reload preserves filters.
  • Set matching strategy to "Verbatim", click "Set as default", reload — strategy persists.
  • Sort dropdown "Largest first" — results reorder, URL contains sort=sizeBytes:desc.
  • Mobile viewport: filter trigger opens a side-sheet.
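
The `sort=sizeBytes:desc` check above implies a `field:direction` param that has to be validated before it reaches the search backend. A minimal parsing sketch follows; the `SORTABLE` list and the `parseSort` name are assumptions (in particular, `timestamp` as the date sort key is a guess based on the filter field name), not the PR's implementation.

```typescript
// Assumed sortable fields; the PR lists subject, sizeBytes, and from as sortable
// and mentions sorting by date, for which "timestamp" is a guess.
const SORTABLE = ["timestamp", "from", "subject", "sizeBytes"] as const;
type SortField = (typeof SORTABLE)[number];

interface Sort {
  field: SortField;
  direction: "asc" | "desc";
}

// Parse "field:direction"; return null for anything not on the allowlist.
function parseSort(raw: string | null): Sort | null {
  if (raw === null) return null;
  const parts = raw.split(":");
  if (parts.length !== 2) return null;
  const field = parts[0] ?? "";
  const direction = parts[1] ?? "";
  const match = SORTABLE.find((f) => f === field);
  if (match === undefined) return null;
  if (direction !== "asc" && direction !== "desc") return null;
  return { field: match, direction };
}
```

Returning null rather than throwing lets the page fall back to relevance ordering when a shared or hand-edited URL carries an unknown sort value.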

Follow-ups (not in this PR)

  • Playwright e2e harness (#TBD)
  • /v1/archive/mailboxes endpoint to unblock the MailboxFilter
  • /v1/tags endpoint or facet-driven tag autocomplete
  • DB index on archived_emails.sent_at for retention-policy queries
  • Promotion of userPreferences from localStorage to DB-backed table
  • Translation fan-out for 10 non-English locales (en.json shipped; others fall back to English until then)

cc / refs: #146 #244 #130 #40 #212 #137 #363

kisst added 30 commits May 25, 2026 12:24
Adds vitest.config.ts and one smoke test per package, plus
test/test:watch scripts. Dependency declared in package.json;
install deferred to verification phase.
…bs-OU#372)

Allow archived emails to record that the original Date header was missing
or unparseable, rather than silently substituting the ingestion timestamp.
Adds OriginalDateSource union and ArchivedEmail.originalDateSource.

Adds popover, calendar, range-calendar, date-picker, command,
combobox, toggle-group, tabs, sheet, and tooltip components.
No business-logic consumers yet; primitives only.

The Bulgarian locale ships in packages/frontend/src/lib/translations/bg/
and the settings page already lists it as a selectable language, but the
type union omitted it, causing a svelte-check error on the settings page.

Pre-existing on main; surfacing it during this PR's svelte-check runs.
kisst added 21 commits May 25, 2026 13:34
@github-actions
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@kisst kisst marked this pull request as draft May 25, 2026 12:50
@kisst kisst closed this May 25, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators May 25, 2026
