Hi! Following up on #167 / #180 — while testing the HNSW index for a pgvector comparison benchmark, I found several issues that cause crashes and incorrect results. Fix is in #181.
Bugs
1. SIGSEGV on repeated k-NN queries (crash)
Running two or more k-NN queries of the form ORDER BY embedding <=> query LIMIT k in the same psql session crashes the PostgreSQL backend with signal 11 (segfault). The fault address is typically 0x1, a garbage value read from stale palloc memory.
Root cause: hnsw_beginscan calls RelationGetIndexScan but never allocates the xs_orderbyvals / xs_orderbynulls arrays. The executor assumes these are valid and writes distance values into whatever memory they happen to point at. First query often works (palloc0'd to null); second query crashes.
Repro:
```sql
CREATE TABLE t (id serial, embedding ruvector(5));
INSERT INTO t (embedding) VALUES ('[1,0,0,0,0]'), ('[0,1,0,0,0]'), ('[0,0,1,0,0]');
CREATE INDEX ON t USING hnsw (embedding ruvector_cosine_ops);
SET enable_seqscan = off;
-- First query: usually works
SELECT id FROM t ORDER BY embedding <=> '[1,0,0,0,0]'::ruvector LIMIT 3;
-- Second query: SIGSEGV
SELECT id FROM t ORDER BY embedding <=> '[0,1,0,0,0]'::ruvector LIMIT 3;
```
2. Empty HNSW graph (no results)
connect_node_to_neighbors is a no-op TODO stub, so hnsw_build inserts nodes but never creates edges. The search traversal finds only the entry point.
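For illustration, here is a minimal single-layer sketch of what a connect step has to do: link the new node to its nearest already-inserted nodes, add the reverse edges, and trim any neighbor list that grows past a cap. This is not ruvector's code. The Graph type, max_neighbors, the brute-force candidate selection, and the keep-the-closest pruning are all assumptions made for the sketch (real HNSW selects neighbors per layer with a heuristic).

```rust
/// Hypothetical single-layer graph used only for this sketch.
struct Graph {
    vectors: Vec<Vec<f32>>,
    neighbors: Vec<Vec<usize>>, // adjacency list per node
    max_neighbors: usize,       // assumed cap, playing the role of HNSW's M
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

impl Graph {
    fn new(max_neighbors: usize) -> Self {
        Graph { vectors: Vec::new(), neighbors: Vec::new(), max_neighbors }
    }

    /// Insert a vector and connect it, in both directions, to its nearest existing nodes.
    fn insert(&mut self, v: Vec<f32>) -> usize {
        let id = self.vectors.len();
        // Brute force keeps the sketch short; HNSW would use the layer search results here.
        let mut cands: Vec<usize> = (0..id).collect();
        cands.sort_by(|&a, &b| l2(&self.vectors[a], &v).total_cmp(&l2(&self.vectors[b], &v)));
        cands.truncate(self.max_neighbors);
        self.vectors.push(v);
        self.neighbors.push(cands.clone()); // forward edges
        for &n in &cands {
            self.neighbors[n].push(id); // reverse edge
            self.prune(n);
        }
        id
    }

    /// Keep only the max_neighbors closest neighbors of `node`.
    fn prune(&mut self, node: usize) {
        let base = self.vectors[node].clone();
        let vectors = &self.vectors;
        let list = &mut self.neighbors[node];
        list.sort_by(|&a, &b| l2(&vectors[a], &base).total_cmp(&l2(&vectors[b], &base)));
        list.truncate(self.max_neighbors);
    }

    /// Count nodes reachable from `entry` by following edges.
    fn reachable_from(&self, entry: usize) -> usize {
        let mut seen = vec![false; self.vectors.len()];
        let mut stack = vec![entry];
        seen[entry] = true;
        let mut count = 0;
        while let Some(n) = stack.pop() {
            count += 1;
            for &m in &self.neighbors[n] {
                if !seen[m] {
                    seen[m] = true;
                    stack.push(m);
                }
            }
        }
        count
    }
}
```

With the three vectors from the repro, every node is reachable from the entry point once edges exist; with a no-op connect step, reachable_from(0) would return 1.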
3. Wrong distance metric (wrong results)
hnsw_build uses HnswConfig::default(), which hardcodes DistanceMetric::Euclidean, even when the index is created with ruvector_cosine_ops. The search therefore computes Euclidean distances on data that should be compared by cosine distance.
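A minimal sketch of deriving the metric from the operator class instead of from a default, and of why it matters. DistanceMetric and HnswConfig here are simplified stand-ins, not the extension's real types, and only ruvector_cosine_ops comes from this report; ruvector_l2_ops is an assumed name. In pgrx the opclass would be looked up from the index relation rather than passed as a string.

```rust
/// Simplified stand-ins for the extension's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DistanceMetric {
    Euclidean,
    Cosine,
}

#[derive(Debug)]
struct HnswConfig {
    metric: DistanceMetric,
}

/// Hypothetical mapping from operator class name to config; unknown names yield None.
fn config_for_opclass(opclass: &str) -> Option<HnswConfig> {
    let metric = match opclass {
        "ruvector_cosine_ops" => DistanceMetric::Cosine,
        "ruvector_l2_ops" => DistanceMetric::Euclidean, // assumed name
        _ => return None,
    };
    Some(HnswConfig { metric })
}

fn distance(metric: DistanceMetric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        DistanceMetric::Euclidean => a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt(),
        DistanceMetric::Cosine => {
            let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            1.0 - dot / (na * nb)
        }
    }
}

/// Index of the closest candidate under `metric` (panics if `cands` is empty).
fn nearest(metric: DistanceMetric, q: &[f32], cands: &[Vec<f32>]) -> usize {
    (0..cands.len())
        .min_by(|&i, &j| distance(metric, q, &cands[i]).total_cmp(&distance(metric, q, &cands[j])))
        .unwrap()
}
```

For q = [1, 0] and candidates [10, 1] and [0.5, 0.5], Euclidean picks the second candidate while cosine picks the first, so a cosine index built with the Euclidean default can return a different nearest neighbor.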
4. Wrong result ordering (wrong results)
BinaryHeap::into_iter().take(k) iterates over the heap's backing array in arbitrary order, not in sorted order, so the results are k arbitrary candidates from the ef_search pool rather than the k closest.
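One way to get the k closest in order is BinaryHeap::into_sorted_vec(), which returns the elements in ascending order (popping repeatedly from a min-heap would also work). The Candidate type below is an assumption for the sketch: distances are f32, which is not Ord, so ordering goes through f32::total_cmp with the id as a tiebreaker.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical search candidate. Ordered by distance, so BinaryHeap<Candidate>
/// is a max-heap with the farthest candidate on top (a common ef_search pool shape).
#[derive(Debug, Clone, Copy)]
struct Candidate {
    dist: f32,
    id: u64,
}

impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Candidate {}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist.total_cmp(&other.dist).then(self.id.cmp(&other.id))
    }
}

/// Return the k closest candidates in ascending distance order.
/// into_sorted_vec() sorts ascending; into_iter() gives no ordering guarantee.
fn top_k(pool: BinaryHeap<Candidate>, k: usize) -> Vec<Candidate> {
    let mut sorted = pool.into_sorted_vec();
    sorted.truncate(k);
    sorted
}
```

For example, a pool with distances 0.9, 0.1, 0.5, 0.3, 0.7 (ids 0 to 4) and k = 2 yields ids 1 and 3, in that order.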
5. "index returned tuples in wrong order" (error on PG17)
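If xs_recheckorderby is set to true, PG17's IndexNextWithReorder recalculates each ORDER BY distance from the heap tuple and raises this error whenever the recalculated value is smaller than the distance the index reported. Floating-point precision differences between the two calculations therefore cause spurious errors.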
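The failure mode can be modeled in a few lines. This is a simplified model of the executor's comparison, not PostgreSQL's code, and index_distance is hypothetical: it assumes the index keeps distances as f32 and widens them to f64, while the recheck computes in f64. Rounding to f32 can land slightly above or below the exact value, so the check fails only for some rows.

```rust
/// Simplified model of the recheck: the recomputed ORDER BY value must not be
/// smaller than the value the index reported.
fn recheck_orderby(index_reported: f64, recomputed: f64) -> Result<(), &'static str> {
    if recomputed < index_reported {
        Err("index returned tuples in wrong order")
    } else {
        Ok(())
    }
}

/// Hypothetical index path: distance stored as f32, widened back to f64.
fn index_distance(exact: f64) -> f64 {
    (exact as f32) as f64
}
```

For instance, 0.1 rounds to 0.100000001490116... as f32, which is above the f64 value 0.1, so recheck_orderby(index_distance(0.1), 0.1) returns the error even though both sides describe the same distance.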
6. Use-after-free in endscan
hnsw_endscan unconditionally calls Box::from_raw on scan->opaque without checking for null, risking a double-free if called after a rescan.
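A minimal sketch of a null-checked teardown, using plain Rust stand-ins instead of pgrx types. FakeScan and ScanState are hypothetical; the pattern is to swap the pointer out for null before reconstructing the Box, so the state is freed at most once no matter how many times teardown runs (the same swap would apply in a rescan path that frees the state).

```rust
use std::ffi::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering as AtomicOrdering};

/// Counts drops so the sketch can show the state is freed exactly once.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical per-scan state stored behind scan->opaque.
struct ScanState {
    _results: Vec<u64>,
}

impl Drop for ScanState {
    fn drop(&mut self) {
        DROPS.fetch_add(1, AtomicOrdering::SeqCst);
    }
}

/// Stand-in for the opaque field of the scan descriptor.
struct FakeScan {
    opaque: *mut c_void,
}

fn beginscan() -> FakeScan {
    let state = Box::new(ScanState { _results: Vec::new() });
    FakeScan { opaque: Box::into_raw(state) as *mut c_void }
}

/// Take ownership once and null the field, so a repeated call is a no-op.
fn endscan(scan: &mut FakeScan) {
    let raw = std::mem::replace(&mut scan.opaque, ptr::null_mut());
    if !raw.is_null() {
        // SAFETY: raw came from Box::into_raw in beginscan, and every path that
        // frees it also nulls scan.opaque, so it has not been freed yet.
        drop(unsafe { Box::from_raw(raw as *mut ScanState) });
    }
}
```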
Environment
- PostgreSQL 17.7
- pgrx 0.12.9
- ruvector-postgres 2.0.1
- Linux x86_64
Fix
PR #181 addresses all six issues. The same xs_orderbyvals allocation fix is also applied to ivfflat_ambeginscan.