
Fix stats search #3322

Open · tiborrr wants to merge 3 commits into eikek:master from tiborrr:fix-stats-search


tiborrr commented Jun 18, 2026

Why this PR

Opening the dashboard was slow on my instance (~800k docs). Each box that shows search summaries triggered its own POST /searchStats, and the dashboard fired several of these at once, often for the same query. On a database with a large custom_field_value table, a single stats request could take 7–12 seconds, even for the default empty query.


Before: slowest query on dashboard load (general profile, fieldCount):

```sql
SELECT COUNT(DISTINCT cf.id) AS num
FROM custom_field_value cvf
INNER JOIN custom_field cf ON (cf.id = cvf.field AND cf.coll_id = $1)
WHERE cvf.item_id IN (
  SELECT i.itemid FROM item i
  WHERE i.coll_id = $2 AND i.state IN ($3, $4) AND i.folder_id IS NULL
) AND NOT cf.ftype IN ($5, $6)
```

~11.7s on ~895k matching items (full table scan of custom_field_value against a huge IN subselect).

Solution

1. Split searchStats into profiles

Introduce statsProfile so callers fetch only what they need:

| Profile | count | tagCloud | fieldStats | folderStats | dimension lists | fieldCount / orgCount / … |
|---|---|---|---|---|---|---|
| full | | | | | | |
| general | | | | | | |
| fields | | | | | | |

  • Backend: POST /sec/item/searchStats/{profile}. The profile is taken from the path segment, else from statsProfile in the request body, else it defaults to full.
  • Dashboard: fetches the general and fields profiles instead of repeated full stats; StatsCache deduplicates requests so boxes sharing a query hit the backend once per profile.
  • Backward compatible: POST /searchStats without a profile still returns full.
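A minimal Python sketch of the two behaviours in the list above. It assumes nothing about the actual Scala routes or the dashboard's StatsCache: `resolve_profile` only applies the stated precedence (path segment, then body `statsProfile`, then `full`), and the `StatsCache` class here is a hypothetical synchronous memo keyed by (query, profile). The real cache may also deduplicate in-flight requests; this version only illustrates the keying.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class StatsProfile(Enum):
    """The three profiles named in the PR."""
    FULL = "full"
    GENERAL = "general"
    FIELDS = "fields"


def resolve_profile(path_profile: Optional[str], body_profile: Optional[str]) -> StatsProfile:
    """Path segment wins, then the body's statsProfile, else the full profile."""
    for candidate in (path_profile, body_profile):
        if candidate is not None:
            return StatsProfile(candidate)  # raises ValueError for unknown names
    return StatsProfile.FULL


class StatsCache:
    """Hypothetical memo: at most one backend call per (query, profile) pair."""

    def __init__(self, fetch: Callable[[str, StatsProfile], dict]):
        self._fetch = fetch
        self._results: Dict[Tuple[str, StatsProfile], dict] = {}

    def get(self, query: str, profile: StatsProfile) -> dict:
        key = (query, profile)
        if key not in self._results:
            self._results[key] = self._fetch(query, profile)
        return self._results[key]


# Usage: two boxes share the empty query, one box has its own query.
calls = []

def fake_fetch(query: str, profile: StatsProfile) -> dict:
    calls.append((query, profile))
    return {"query": query, "profile": profile.value}

cache = StatsCache(fake_fetch)
for box_query in ["", "", "tag:invoice"]:
    cache.get(box_query, resolve_profile("general", None))
cache.get("", resolve_profile(None, "fields"))
print(len(calls))  # 3 backend calls instead of 4
```

Keying on the profile as well as the query is what lets the general and fields requests for the same query coexist without one masking the other.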

2. Large-database field-stats optimizations

Three targeted fixes in QItemFieldStats (verified on PostgreSQL with ~895k items):

| Issue | Fix |
|---|---|
| OOM loading ~895k item IDs into the JVM | Count first; only load IDs when match set ≤ 200 |
| fields profile ~12s aggregating text fields the UI never shows | For large match sets, only aggregate money/numeric fields (SearchStatsView filters to sum > 0 anyway) |
| general.fieldCount ~12s via cvf IN (subselect) | cf-first EXISTS probe using custom_field_value.field index |
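The first two rows of the table can be sketched as a small planning step. This is illustrative Python, not the QItemFieldStats code: the 200-item threshold comes from the table, while `plan_field_stats`, `FieldStatsPlan`, the callbacks, and the `money`/`numeric` type names are assumptions. `item_ids=None` stands for "let the database filter via a subquery", which is also an assumption about what replaces the in-memory ID list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Set

# Threshold taken from the PR description; the constant name is hypothetical.
ID_LOAD_THRESHOLD = 200
# Assumed names of the custom field types that carry sums/averages.
NUMERIC_FTYPES = frozenset({"money", "numeric"})


@dataclass
class FieldStatsPlan:
    match_count: int
    item_ids: Optional[List[str]]  # None: let the DB filter with a subquery (assumption)
    ftypes: Optional[Set[str]]     # None: aggregate every field type


def plan_field_stats(count_matches: Callable[[], int],
                     load_ids: Callable[[], Sequence[str]]) -> FieldStatsPlan:
    """Hypothetical planner: count first, load ids only for small match sets."""
    n = count_matches()  # a cheap count avoids materializing the ids at all
    if n <= ID_LOAD_THRESHOLD:
        # Small match set: loading the ids is cheap; keep all field types.
        return FieldStatsPlan(n, list(load_ids()), None)
    # Large match set: never load ids, and only aggregate money/numeric fields,
    # since the UI only shows fields whose sum is > 0 anyway.
    return FieldStatsPlan(n, None, set(NUMERIC_FTYPES))


# Usage with stub callbacks standing in for the real queries.
loads = []

def load_ids() -> List[str]:
    loads.append(1)
    return ["a", "b"]

small = plan_field_stats(lambda: 2, load_ids)
large = plan_field_stats(lambda: 895_000, load_ids)
print(small.item_ids, large.item_ids, sorted(large.ftypes), len(loads))
# ['a', 'b'] None ['money', 'numeric'] 1
```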

After: fields aggregation for text-only collectives completes in <1ms; fieldCount uses the fast EXISTS pattern instead of the 11s IN subselect.
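To illustrate that a cf-first `EXISTS` probe answers the same question as the original `IN (subselect)` query, here is a self-contained Python/SQLite sketch. The schema is cut down to the columns the slow query touches; the state values (`created`, `confirmed`) and numeric type names (`money`, `numeric`) stand in for the bound parameters `$3`–`$6` and are assumptions, as is the exact shape of the PR's new query. SQLite will not reproduce the PostgreSQL timings; the point is that both forms count the same field definitions, while the `EXISTS` form is driven by the few rows in `custom_field` and probes values through the index on `custom_field_value.field`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (itemid TEXT PRIMARY KEY, coll_id TEXT, state TEXT, folder_id TEXT);
CREATE TABLE custom_field (id TEXT PRIMARY KEY, coll_id TEXT, ftype TEXT);
CREATE TABLE custom_field_value (id INTEGER PRIMARY KEY, item_id TEXT, field TEXT, value TEXT);
CREATE INDEX cfv_field_idx ON custom_field_value (field);
"""

# Original shape: walk the values and filter them against a big item subselect.
IN_SUBSELECT = """
SELECT COUNT(DISTINCT cf.id) FROM custom_field_value cvf
JOIN custom_field cf ON cf.id = cvf.field AND cf.coll_id = :coll
WHERE cvf.item_id IN (
  SELECT i.itemid FROM item i
  WHERE i.coll_id = :coll AND i.state IN ('created', 'confirmed') AND i.folder_id IS NULL)
AND cf.ftype NOT IN ('money', 'numeric')
"""

# cf-first: iterate field definitions, probe for one matching value per field.
CF_FIRST_EXISTS = """
SELECT COUNT(*) FROM custom_field cf
WHERE cf.coll_id = :coll AND cf.ftype NOT IN ('money', 'numeric')
AND EXISTS (
  SELECT 1 FROM custom_field_value cvf
  JOIN item i ON i.itemid = cvf.item_id
  WHERE cvf.field = cf.id
    AND i.coll_id = :coll AND i.state IN ('created', 'confirmed') AND i.folder_id IS NULL)
"""


def field_count(conn: sqlite3.Connection, sql: str, coll: str) -> int:
    return conn.execute(sql, {"coll": coll}).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    ("i1", "c1", "created", None),
    ("i2", "c1", "confirmed", None),
    ("i3", "c1", "created", "f1"),  # in a folder: excluded
])
conn.executemany("INSERT INTO custom_field VALUES (?, ?, ?)", [
    ("note", "c1", "text"), ("due", "c1", "date"),
    ("amount", "c1", "money"), ("unused", "c1", "text"),
])
conn.executemany("INSERT INTO custom_field_value (item_id, field, value) VALUES (?, ?, ?)", [
    ("i1", "note", "a"), ("i2", "note", "b"),
    ("i3", "due", "2026-01-01"),  # only on the excluded item
    ("i1", "amount", "12.50"),    # numeric type: not counted
])
print(field_count(conn, IN_SUBSELECT, "c1"), field_count(conn, CF_FIRST_EXISTS, "c1"))  # 1 1
```

Only `note` qualifies in both forms: `due` has a value only on the folder item, `amount` is a money field, and `unused` has no values at all.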

API / docs

  • New StatsProfile type (full | general | fields).
  • OpenAPI: profile matrix, path routes, fieldCount semantics, deprecation note for ?statsProfile= on POST.
  • Changelog entry under v0.44.0.

Tests

  • SearchStatsTest (H2): each profile omits/includes expected sections; general.fieldCount counts non-numeric field definitions with values.

Manual verification (PostgreSQL)

CI uses H2 only. On a large DB (~900k items):

  1. Dashboard loads without OOM.
  2. Network tab shows /searchStats/general and /searchStats/fields, not duplicate full stats calls.
  3. pg_stat_statements: no multi-second COUNT(DISTINCT cf.id) … cvf.item_id IN (SELECT …) on dashboard load.
  4. Numeric custom fields still show sums/avgs when present.

|calls|total_time_seconds|avg_time_ms|query|
|-----|------------------|-----------|-----|
|3|1.415178407|471.72613566666666|SELECT COUNT(i.itemid)  AS num  FROM item i  WHERE (i.coll_id = $1  AND i.state IN ($2 , $3 ) AND i.folder_id is null )|

Tibor Casteleijn and others added 3 commits June 18, 2026 18:25

Split searchStats into full, general, and fields profiles so the dashboard
can fetch lighter summaries. Extract field-stats queries into QItemFieldStats,
add SearchStatsTest, and document the API in OpenAPI and Changelog.

For large match sets, avoid loading item ids into memory and only aggregate
money/numeric custom fields, which matches what the UI displays.

Co-authored-by: Cursor <cursoragent@cursor.com>

Route dashboard stats boxes to general and fields profile endpoints and
dedupe fetches through StatsCache so multiple boxes sharing a query only
hit the backend once.

Co-authored-by: Cursor <cursoragent@cursor.com>

The general stats profile still counted non-numeric custom fields via
cvf.item_id IN (large item subselect), which took ~12s on ~900k items.
Use a cf-first EXISTS probe instead and skip resolveStatsItemContext
when searchStatsGeneral already has the item count.

Co-authored-by: Cursor <cursoragent@cursor.com>