Component: PAI/TOOLS/KnowledgeHarvester.ts (classifyDomain + extractTags, v5)
Affected source: Releases/v5.0.0/.claude/PAI/TOOLS/KnowledgeHarvester.ts @ 2fde1bb
Severity: Medium — misfiles notes into the wrong KNOWLEDGE domain. Platform-independent (reproduces everywhere).
Summary
classifyDomain and extractTags match domain keywords by substring, so "person" matches
"personal"/"persona", and a single incidental keyword ("contact", "profile") in an otherwise technical
note is enough to win the high-precision People domain. Result: technical notes get misfiled as People.
Bug — Substring keyword matching + no precision floor
Line 322, inside classifyDomain (line 342 in extractTags has the same shape):
const score = keywords.reduce((acc, kw) => acc + (text.includes(kw) ? 1 : 0), 0);
Two problems:
- Substring matching: text.includes("person") is true for "personal" and "persona", so unrelated
words count as keyword hits.
- No precision floor: one weak keyword ("contact", "profile") in an otherwise technical note is enough
to route it to People.
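The two problems can be reproduced in isolation with a small sketch. The keyword list and the sample note below are hypothetical stand-ins (the real list lives in KnowledgeHarvester.ts), and the text is assumed to be lowercased already; only the scoring expression mirrors the line quoted above.

```typescript
// Hypothetical People keywords, for illustration only.
const peopleKeywords: string[] = ["person", "contact", "profile"];

// Same shape as the current scoring line: plain substring matching.
function substringScore(text: string, keywords: string[]): number {
  return keywords.reduce((acc, kw) => acc + (text.includes(kw) ? 1 : 0), 0);
}

// A technical note that mentions no person at all.
const technicalNote = "store the personal access token in the shell config";

// "personal" contains "person", so the People score is non-zero.
const peopleScore = substringScore(technicalNote, peopleKeywords);
console.log(peopleScore); // 1
```

With no minimum hit count, that single accidental hit is enough for People to win whenever no other domain scores higher.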
Proposed fix
Word-boundary matching, and require a ≥2-keyword signal for the high-precision People domain (genuine
OSINT/dossier notes hit several: osint, dossier, linkedin, profile, background…):
// word-boundary so "person" doesn't match "personal"/"persona"
scores[domain] = keywords.reduce(
(acc, kw) => acc + (new RegExp(`\\b${kw}\\b`).test(text) ? 1 : 0), 0);
// People is high-precision: a single incidental keyword must not route here
if (scores.People < 2) scores.People = 0;
extractTags gets the same \b-boundary treatment.
Note: \b boundaries won't help multi-word or hyphenated keywords, but the current keyword set has
none, so no further change is needed today.
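As a self-contained sketch of the same idea, the matching and the floor could be factored into two helpers. The names escapeRegExp, wordBoundaryScore, and applyPrecisionFloor are hypothetical and do not exist in KnowledgeHarvester.ts; the escaping step is an extra precaution that the current keyword set may not need, and the text is assumed to be lowercased before scoring.

```typescript
// Escape regex metacharacters so a keyword is always matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count the keywords that appear in the text as whole words.
function wordBoundaryScore(text: string, keywords: string[]): number {
  return keywords.reduce(
    (acc, kw) =>
      acc + (new RegExp(`\\b${escapeRegExp(kw)}\\b`).test(text) ? 1 : 0),
    0,
  );
}

// Zero out a high-precision domain that has fewer than minHits keyword hits.
function applyPrecisionFloor(
  scores: Record<string, number>,
  domain: string,
  minHits: number,
): Record<string, number> {
  const current = scores[domain] ?? 0;
  return current < minHits ? { ...scores, [domain]: 0 } : scores;
}

// "personal" no longer counts as a hit for "person".
console.log(wordBoundaryScore("store the personal access token", ["person"])); // 0

// A single incidental hit is dropped by the floor.
console.log(applyPrecisionFloor({ People: 1, Ideas: 2 }, "People", 2).People); // 0
```

Keeping the floor in a separate helper would make it easy to apply the same rule to another high-precision domain later, should one be added.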
Verification
- A Linux PAI 5.x install (empirical): after the fix, all 23 graduated notes classified with
0 misfiled into People (21 Ideas / 7 Research / 1 Companies). Acceptable residuals noted: invidious
→ Companies via "startup"; relationship notes → Ideas/Research rather than People dossiers.
- This bug is not platform-specific — the substring logic misclassifies on macOS and Linux alike.
Related issues
Split from the Linux-path issue #1366 (that one is a Linux platform blocker; this is a platform-independent
classification-precision bug — both were found and fixed in the same pass). No existing issue covers
classifier/domain precision (searched 2026-06-20). Adjacent but distinct: #1171 (writes directly to
KNOWLEDGE, bypassing the _harvest-queue curation step) and #1351 (queue review/promote lifecycle)
concern the harvest pipeline, not keyword classification.
Suggested labels
bug, precision, tool:KnowledgeHarvester