Bug
`sanitize_fts5_query` (`crates/khive-db/src/stores/text.rs`) removes `-` and `.` from queries instead of replacing them with spaces, which makes any hyphenated or dotted term silently unfindable through the FTS leg.

Pass 1 space-replaces `( ) , :`; Pass 2 then filters out `* " ' + - ^ . ~ ! $`. So:

- query `khive-pack-memory` → sanitized to `khivepackmemory`
- indexed content still contains the literal `khive-pack-memory`, whose trigrams include the hyphens (`e-p`, `k-m`, ...)
- the sanitized trigrams (`epa`, `ckm`, ...) never occur in the indexed text → 0 hits, no error
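
The mismatch can be reproduced without a database. The sketch below is a hypothetical model of the two-pass behavior described above, not the actual body of `sanitize_fts5_query`; the names `sanitize_current`, `trigrams`, and `missing_trigrams` are made up for illustration, and the trigram helper only approximates what a trigram tokenizer indexes.

```rust
use std::collections::HashSet;

/// Hypothetical model of the current two-pass sanitizer (assumed from the
/// description in this issue, not copied from the crate).
fn sanitize_current(query: &str) -> String {
    const SPACE_REPLACED: &[char] = &['(', ')', ',', ':'];
    const FILTERED: &[char] = &['*', '"', '\'', '+', '-', '^', '.', '~', '!', '$'];
    query
        .chars()
        // Pass 1: these separators become spaces, so neighbouring terms stay apart.
        .map(|c| if SPACE_REPLACED.contains(&c) { ' ' } else { c })
        // Pass 2: these characters are dropped, which glues neighbouring terms together.
        .filter(|c| !FILTERED.contains(c))
        .collect()
}

/// All overlapping 3-character windows of `text`.
fn trigrams(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Trigrams of `query` that never occur in `content`, sorted for stable output.
fn missing_trigrams(query: &str, content: &str) -> Vec<String> {
    let indexed = trigrams(content);
    let mut missing: Vec<String> = trigrams(query)
        .into_iter()
        .filter(|t| !indexed.contains(t))
        .collect();
    missing.sort();
    missing
}
```

Under this model, `missing_trigrams(&sanitize_current("khive-pack-memory"), "khive-pack-memory")` yields `ckm`, `epa`, `kme`, and `vep`: the trigrams that straddle the positions where a hyphen used to be, and that therefore cannot match the indexed text.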
Verified empirically against 0.3.0 (in-memory runtime, no embedders, so the text leg is the only leg):
| query | hits against content `LEGACY-FLAT-NOTE` |
|---|---|
| `LEGACY` | 1 |
| `LEGACY FLAT NOTE` | matches |
| `LEGACY-FLAT` | 0 |
| `LEGACY-FLAT-NOTE` (exact content) | 0 |

Why it matters

The failure is silent: the query still comes back as a success (`ok: true`), unlike the hard error seen before #389 ("fix(db): sanitize FTS5 $ metachar and fail-open on FTS search errors"), so nothing signals that the query was mangled.

Suggested fix

Move `-` and `.` (and plausibly `+ ~ ^`) from the Pass 2 filter set into the Pass 1 space-replacement set, for exactly the reason the existing Pass 1 comment gives for `:`: `tenant:isolation` → `tenant isolation`, not `tenantisolation`. With that change, `LEGACY-FLAT-NOTE` → `LEGACY FLAT NOTE`, whose trigrams all occur in the indexed content, restoring the match.

Characters that FTS5 rejects outright regardless of position (`$`, quotes, `*`) stay in Pass 2.

Regression test suggestion: index a document containing a kebab-case token and assert the exact hyphenated query returns it through the text-only path.
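
A rough sketch of the proposed change and the regression check follows. It is illustrative only: `sanitize_proposed` and `text_leg_matches` are hypothetical names, the character sets are assumed from the description above (with `!` left in the filter set, which this issue does not settle), and the matcher is a plain substring stand-in for the trigram index rather than a call into the real store.

```rust
/// Hypothetical sanitizer with `-` and `.` (plus `+ ~ ^`) moved into the
/// space-replacement set. Character sets are assumptions taken from this issue.
fn sanitize_proposed(query: &str) -> String {
    const SPACE_REPLACED: &[char] = &['(', ')', ',', ':', '-', '.', '+', '~', '^'];
    const FILTERED: &[char] = &['*', '"', '\'', '!', '$'];
    query
        .chars()
        // Pass 1: separators become spaces, so `a-b` turns into `a b`.
        .map(|c| if SPACE_REPLACED.contains(&c) { ' ' } else { c })
        // Pass 2: characters that are never valid in a query are dropped.
        .filter(|c| !FILTERED.contains(c))
        .collect()
}

/// Stand-in for the text-only path: every whitespace-separated term must occur
/// as a substring of the content. This only approximates trigram matching.
fn text_leg_matches(sanitized: &str, content: &str) -> bool {
    let mut terms = sanitized.split_whitespace().peekable();
    // An empty query matches nothing in this model.
    terms.peek().is_some() && terms.all(|term| content.contains(term))
}

#[test]
fn exact_hyphenated_query_finds_kebab_case_content() {
    let content = "LEGACY-FLAT-NOTE";
    let sanitized = sanitize_proposed("LEGACY-FLAT-NOTE");
    assert_eq!(sanitized, "LEGACY FLAT NOTE");
    assert!(text_leg_matches(&sanitized, content));
}
```

A real regression test would index the document through the actual text store with embedders disabled and run the hyphenated query against it; the substring model here only shows the shape of the assertion.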