Fix stats search by tiborrr · Pull Request #3322 · eikek/docspell

tiborrr · 2026-06-18T16:43:36Z

Why this PR

Opening the dashboard was slow on my instance (~800k docs). Each box that shows search summaries triggered POST /searchStats, and the dashboard fired several of those at once — often the same query more than once. On a database with a large custom_field_value table, a single stats request could take 7–12 seconds even for a default empty query.

Before — slowest query on dashboard load (general profile, fieldCount):

SELECT COUNT(DISTINCT cf.id) AS num
FROM custom_field_value cvf
INNER JOIN custom_field cf ON (cf.id = cvf.field AND cf.coll_id = $1)
WHERE cvf.item_id IN (
  SELECT i.itemid FROM item i
  WHERE i.coll_id = $2 AND i.state IN ($3, $4) AND i.folder_id IS NULL
) AND NOT cf.ftype IN ($5, $6)

~11.7s on ~895k matching items (full table scan of custom_field_value against a huge IN subselect).

Solution

1. Split `searchStats` into profiles

Introduce statsProfile so callers fetch only what they need:

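| Profile | count | tagCloud | fieldStats | folderStats | dimension lists | fieldCount / orgCount / … |
|---------|-------|----------|------------|-------------|-----------------|---------------------------|
| `full` | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| `general` | ✓ | ✓ | — | — | — | ✓ |
| `fields` | ✓ | — | ✓ | — | — | — |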

Backend: POST /sec/item/searchStats/{profile}. The profile is resolved in this order: the path segment, then statsProfile in the body, then the default full.
Dashboard: Fetches general + fields instead of repeated full stats; StatsCache dedupes so boxes sharing a query hit the backend once per profile.
Backward compatible: POST /searchStats without a profile still returns full.
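
To illustrate the two behaviours described above (profile precedence and one backend hit per profile and query), here is a minimal Python sketch. It is not the PR's code: the real backend is Scala and the real StatsCache lives in the webapp. `resolve_profile`, the synchronous `StatsCache` class, and the `fetch` callback are hypothetical names introduced for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

# The three profiles named in this PR.
PROFILES = ("full", "general", "fields")


def resolve_profile(path_profile: Optional[str], body_profile: Optional[str]) -> str:
    """Pick the profile: path segment first, then the body field, then 'full'."""
    for candidate in (path_profile, body_profile):
        if candidate is not None:
            if candidate not in PROFILES:
                raise ValueError(f"unknown stats profile: {candidate}")
            return candidate
    return "full"  # no profile given anywhere: behave like the old endpoint


class StatsCache:
    """Hypothetical, synchronous stand-in: one fetch per (profile, query) pair."""

    def __init__(self, fetch: Callable[[str, str], dict]) -> None:
        self._fetch = fetch
        self._results: Dict[Tuple[str, str], dict] = {}

    def get(self, profile: str, query: str) -> dict:
        key = (profile, query)
        if key not in self._results:
            # Only the first box asking for this pair reaches the backend.
            self._results[key] = self._fetch(profile, query)
        return self._results[key]
```

With this shape, two dashboard boxes that share a query and both need `general` trigger a single fetch, while a box needing `fields` for the same query triggers one more.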

2. Large-database field-stats optimizations

Three targeted fixes in QItemFieldStats (verified on PostgreSQL with ~895k items):

| Issue | Fix |
|-------|-----|
| OOM loading ~895k item IDs into the JVM | Count first; only load IDs when match set ≤ 200 |
| `fields` profile ~12s aggregating text fields the UI never shows | For large match sets, only aggregate money/numeric fields (`SearchStatsView` filters to `sum > 0` anyway) |
| `general.fieldCount` ~12s via `cvf IN (subselect)` | cf-first `EXISTS` probe using `custom_field_value.field` index |
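
The "count first" fix can be sketched as follows. This is a simplified illustration in Python with SQLite, not the Scala code in QItemFieldStats: the function name `count_then_maybe_load`, the constant `MAX_LOADED_IDS`, and the reduced `item` schema (no state filter) are assumptions made for the example. Only the threshold of 200 comes from this PR.

```python
import sqlite3
from typing import List, Optional, Tuple

MAX_LOADED_IDS = 200  # threshold quoted in this PR; the constant name is hypothetical


def count_then_maybe_load(
    conn: sqlite3.Connection, coll_id: str
) -> Tuple[int, Optional[List[str]]]:
    """Count matching items first; materialize their ids only for small match sets."""
    where = "coll_id = ? AND folder_id IS NULL"
    (total,) = conn.execute(
        f"SELECT COUNT(itemid) FROM item WHERE {where}", (coll_id,)
    ).fetchone()
    if total > MAX_LOADED_IDS:
        # Large match set: never pull the ids into application memory.
        return total, None
    rows = conn.execute(
        f"SELECT itemid FROM item WHERE {where} ORDER BY itemid", (coll_id,)
    ).fetchall()
    return total, [row[0] for row in rows]
```

A caller that receives `None` for the id list would fall back to queries that filter inside the database instead of passing an explicit id list.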

After: fields aggregation for text-only collectives completes in <1ms; fieldCount uses the fast EXISTS pattern instead of the 11s IN subselect.
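
The PR text does not show the rewritten fieldCount query, so the following is only a sketch of what a cf-first `EXISTS` probe could look like, run against SQLite with a reduced schema. The `EXISTS_FORM` SQL, the table definitions, the index name, and the literal type names `'numeric'` and `'money'` (standing in for `$5, $6`) are assumptions. The example checks that both forms return the same count on a small data set; it says nothing about PostgreSQL timings.

```python
import sqlite3

# Shape of the "before" query from this PR, with literals in place of $5, $6.
IN_FORM = """
SELECT COUNT(DISTINCT cf.id)
FROM custom_field_value cvf
INNER JOIN custom_field cf ON (cf.id = cvf.field AND cf.coll_id = :coll)
WHERE cvf.item_id IN (
  SELECT i.itemid FROM item i
  WHERE i.coll_id = :coll AND i.folder_id IS NULL
) AND NOT cf.ftype IN ('numeric', 'money')
"""

# Assumed cf-first variant: start from the (small) field definitions and
# probe for at least one value on a matching item.
EXISTS_FORM = """
SELECT COUNT(*)
FROM custom_field cf
WHERE cf.coll_id = :coll
  AND cf.ftype NOT IN ('numeric', 'money')
  AND EXISTS (
    SELECT 1 FROM custom_field_value cvf
    INNER JOIN item i ON i.itemid = cvf.item_id
    WHERE cvf.field = cf.id
      AND i.coll_id = :coll AND i.folder_id IS NULL
  )
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
      CREATE TABLE item (itemid TEXT PRIMARY KEY, coll_id TEXT, folder_id TEXT);
      CREATE TABLE custom_field (id TEXT PRIMARY KEY, coll_id TEXT, ftype TEXT);
      CREATE TABLE custom_field_value (
        id TEXT PRIMARY KEY, item_id TEXT, field TEXT, field_value TEXT);
      CREATE INDEX cfv_field_idx ON custom_field_value (field);
    """)
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
        ("i1", "c1", None), ("i2", "c1", "folder-x"),
        ("i3", "c2", None), ("i4", "c1", None),
    ])
    conn.executemany("INSERT INTO custom_field VALUES (?, ?, ?)", [
        ("f_text", "c1", "text"), ("f_date", "c1", "date"),
        ("f_num", "c1", "numeric"), ("f_unused", "c1", "text"),
        ("g_text", "c2", "text"),
    ])
    conn.executemany("INSERT INTO custom_field_value VALUES (?, ?, ?, ?)", [
        ("v1", "i1", "f_text", "a"), ("v2", "i1", "f_num", "1"),
        ("v3", "i2", "f_date", "2026-01-01"), ("v4", "i4", "f_text", "b"),
        ("v5", "i3", "g_text", "c"), ("v6", "i4", "f_date", "2026-02-02"),
    ])
    return conn


def field_count(conn: sqlite3.Connection, sql: str, coll: str) -> int:
    """Run one of the two query forms for a collective."""
    return conn.execute(sql, {"coll": coll}).fetchone()[0]
```

In the demo data, collective `c1` has two non-numeric fields with values on folderless items (`f_text` and `f_date`), so `field_count(conn, IN_FORM, "c1")` and `field_count(conn, EXISTS_FORM, "c1")` both return 2.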

API / docs

New StatsProfile type (full | general | fields).
OpenAPI: profile matrix, path routes, fieldCount semantics, deprecation note for ?statsProfile= on POST.
Changelog entry under v0.44.0.

Tests

SearchStatsTest (H2): each profile omits/includes expected sections; general.fieldCount counts non-numeric field definitions with values.

Manual verification (PostgreSQL)

CI uses H2 only. On a large DB (~900k items):

Dashboard loads without OOM.
Network tab shows /searchStats/general and /searchStats/fields, not duplicate full stats calls.
pg_stat_statements: no multi-second COUNT(DISTINCT cf.id) … cvf.item_id IN (SELECT …) on dashboard load.
Numeric custom fields still show sums/avgs when present.

|calls|total_time_seconds|avg_time_ms|query|
|-----|------------------|-----------|-----|
|3|1.415178407|471.72613566666666|SELECT COUNT(i.itemid)  AS num  FROM item i  WHERE (i.coll_id = $1  AND i.state IN ($2 , $3 ) AND i.folder_id is null )|

Split searchStats into full, general, and fields profiles so the dashboard can fetch lighter summaries. Extract field-stats queries into QItemFieldStats, add SearchStatsTest, and document the API in OpenAPI and Changelog. For large match sets, avoid loading item ids into memory and only aggregate money/numeric custom fields, which matches what the UI displays. Co-authored-by: Cursor <cursoragent@cursor.com>

Route dashboard stats boxes to general and fields profile endpoints and dedupe fetches through StatsCache so multiple boxes sharing a query only hit the backend once. Co-authored-by: Cursor <cursoragent@cursor.com>

The general stats profile still counted non-numeric custom fields via cvf.item_id IN (large item subselect), which took ~12s on ~900k items. Use a cf-first EXISTS probe instead and skip resolveStatsItemContext when searchStatsGeneral already has the item count. Co-authored-by: Cursor <cursoragent@cursor.com>

Tibor Casteleijn and others added 3 commits June 18, 2026 18:25

feat(webapp): use statsProfile on dashboard with deduped stats cache

24a21ad

tiborrr wants to merge 3 commits into eikek:master from tiborrr:fix-stats-search
