W-430: make emoji tags searchable #145
Emojibase keywords (`meditation`, `happy`, `zen`, …) were decoded into the Emoji model but never indexed, so concept searches missed emoji whose only relevant word is a tag: `:meditation` couldn't find 🧘, `:happy` missed 😀. 77% of the corpus had at least one unreachable tag keyword.
Index tags as lower-priority haystacks: penalized in scoring and excluded from the prefix tier, so real shortcodes/labels always rank first. Tags are kept out of the exact-match index (`:happy:` shouldn't resolve to an arbitrary one of dozens). Scanned only for needles ≥2 chars: a 1-char query already matches almost everything, and tags ~triple the haystack count, so gating keeps the worst-case per-keystroke cost at its prior baseline.
Refs: W-430
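
The gating described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Emoji` and `Haystack` types and the `haystacks(for:needle:)` helper are hypothetical, and the value `2` for `tagMinNeedle` is inferred from the "needles ≥2 chars" wording.

```swift
// Hypothetical model types for illustration only.
struct Emoji {
    let glyph: String
    let shortcodes: [String]
    let tags: [String]
}

struct Haystack {
    let text: String
    let isTag: Bool
}

// Assumed value, taken from "scanned only for needles ≥2 chars".
let tagMinNeedle = 2

/// Returns the strings to scan for one emoji.
/// Shortcodes are always included; tags join only when the needle is long enough.
func haystacks(for emoji: Emoji, needle: String) -> [Haystack] {
    var result = emoji.shortcodes.map { Haystack(text: $0, isTag: false) }
    if needle.count >= tagMinNeedle {
        result += emoji.tags.map { Haystack(text: $0, isTag: true) }
    }
    return result
}

let lotus = Emoji(
    glyph: "🧘",
    shortcodes: ["person_in_lotus_position", "lotus_position"],
    tags: ["meditation", "yoga", "zen"]
)

// A 1-char needle scans shortcodes only, so its cost matches the old baseline.
let oneChar = haystacks(for: lotus, needle: "m")
// A longer needle also scans the tags.
let longer = haystacks(for: lotus, needle: "medit")
```

With this shape, the 1-char query never touches the extra tag haystacks, which is the property the commit message relies on for the worst-case cost.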
The implementation is correct and well-structured. Tag haystacks are properly gated, excluded from the prefix tier, and penalized in scoring — the invariants hold exactly as described. Tests cover the three key properties: a tag-only keyword surfaces the right emoji, a concept keyword works, and a shortcode match outranks a tag-only match when both exist.
Scoring logic trace: non-tag prefix matches go to prefixBestScore as before; non-tag non-prefix matches get the raw score in bestScore; tag haystacks always fall to the else branch (never enter the prefix tier) and carry the -6.0 penalty. The 1-char gate leaves the worst-case single-keystroke cost unchanged. No regressions to existing non-tag paths.
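
A rough sketch of the branch structure traced above, as it stood in this round. The `Match` type, the `fold` helper, and the raw scores are made up for illustration; only the names `prefixBestScore`/`bestScore` (shortened here) and the -6.0 penalty come from the review.

```swift
// Hypothetical record of one haystack that matched the query.
struct Match {
    let text: String      // the haystack text
    let isTag: Bool
    let rawScore: Double  // stand-in for the fuzzy scorer's output
}

// Penalty value quoted in the review.
let tagScorePenalty = -6.0

/// Folds one emoji's matches into a prefix-tier best and a non-prefix best.
/// Tag haystacks never enter the prefix tier and always carry the penalty.
func fold(_ matches: [Match], query: String) -> (prefixBest: Double?, best: Double?) {
    var prefixBest: Double? = nil
    var best: Double? = nil
    for m in matches {
        if !m.isTag && m.text.hasPrefix(query) {
            // Non-tag prefix match: prefix tier, raw score.
            prefixBest = max(prefixBest ?? m.rawScore, m.rawScore)
        } else {
            // Everything else: raw score for non-tags, penalized score for tags.
            let score = m.isTag ? m.rawScore + tagScorePenalty : m.rawScore
            best = max(best ?? score, score)
        }
    }
    return (prefixBest, best)
}

// A tag that starts with the query still lands in the non-prefix tier.
let tagOnly = fold([Match(text: "happy", isTag: true, rawScore: 9.0)], query: "happ")
// A shortcode that starts with the query lands in the prefix tier.
let shortcode = fold([Match(text: "smile", isTag: false, rawScore: 5.0)], query: "smi")
```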
CI: only the ssot run (this review workflow) is in progress — no failed checks.
nit: The doc comments on `tagScorePenalty` and `tagMinNeedle` in `FuzzyMatcher.swift` run 4 and 3 lines respectively. CLAUDE.md says "be brief." Both magic numbers warrant a comment (the why is non-obvious), but they could each be distilled to one line, e.g. `// penalty that puts all tag matches below any shortcode prefix hit` and `// tags ~3× the haystacks; skip for 1-char queries where everything matches anyway`.
- Carry `isPrefix` on `ScoredEmoji` and use it for easter-egg hint placement instead of `matchedShortcode.hasPrefix(query)`. A tag display can start with the query without being a prefix-tier match, so the string proxy mis-placed the hint one slot once tags became searchable.
- `shortcodeMatchOutranksTagMatch` now requires both emoji present and searches the full corpus; the penalized tag-only match doesn't reach the default top-12, so the old conditional assertion passed vacuously.
Refs: W-430
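
To illustrate why the string proxy and the flag can disagree, here is a small sketch. The `ScoredEmoji` fields shown are a simplified guess at the real struct, and the assumption that the hint sits directly after the leading prefix-tier rows is mine, not something the PR states.

```swift
// Simplified stand-in for the real ScoredEmoji; field set is assumed.
struct ScoredEmoji {
    let glyph: String
    let matchedShortcode: String  // text displayed for the row
    let isPrefix: Bool            // set by the scorer; false for tag matches
}

let query = "smi"
let rows = [
    // Real prefix-tier match on the `smile` shortcode.
    ScoredEmoji(glyph: "😄", matchedShortcode: "smile", isPrefix: true),
    // Tag-only match whose display text happens to start with the query.
    ScoredEmoji(glyph: "😀", matchedShortcode: "smile", isPrefix: false),
]

// Old proxy: count leading rows whose display text starts with the query.
let proxySlot = rows.prefix(while: {
    $0.matchedShortcode.lowercased().hasPrefix(query)
}).count

// New check: count leading rows the scorer actually put in the prefix tier.
let flagSlot = rows.prefix(while: { $0.isPrefix }).count
```

Under these assumptions the proxy counts the tag row as a prefix match and lands one slot later than the flag does.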
Round 2 follow-up — previous findings addressed, one nit carried over plus a new one.
The new commit is solid. The isPrefix field is correctly placed on the local Candidate struct (the diff I received was missing that hunk, but reading the file confirmed it at line 146) and flows cleanly to ScoredEmoji. The egg-hint placement fix is an improvement: the old $0.matchedShortcode.lowercased().hasPrefix(lowercased) check could misclassify a tag-only result whose text happened to start with the query; $0.isPrefix is authoritative because it's set during the scoring loop's prefix-tier assignment. The new ranking test covers the ordering invariant clearly.
No blocking findings.
nits (both non-blocking):
- The round-1 nit on the `tagScorePenalty` and `tagMinNeedle` comments (4 and 3 lines) was not addressed; they're still the same verbose block. CLAUDE.md says one line max.
- The new `ScoredEmoji.isPrefix` doc comment also runs 3 lines. The why is non-obvious enough to warrant a comment, but it could be collapsed, e.g. `/// Set during scoring — false for tag-only matches, which must not enter the prefix tier even when the tag text starts with the query.`
The flat tag score penalty was the wrong model: it demoted an *exact* tag match below *loose* shortcode subsequences, so `:happ` led with ♿️ (handicapped) and 👆 (backhand_index_pointing_up) ahead of the actual happy faces. Drop the penalty; rank the non-prefix tier by fzy relevance with shortcode-over-tag as a sort tiebreak. The prefix tier still guarantees canonical shortcode matches win (`:smile` → 😄 first).
Label tag-matched rows with the emoji's own primary shortcode instead of the matched keyword, so a shared tag no longer renders a run of identical `:happy:` rows.
Refs: W-430
Round 3 follow-up — previous findings addressed, no new blocking issues.
The latest commit (e993ea8) is a solid improvement. Replacing the flat tagScorePenalty with a sort tiebreaker (isTag as a third rank key after isPrefix and score) is the correct fix: an exact tag match now legitimately outranks a loose shortcode subsequence by raw fzy score, while prefix-tier shortcode matches still always win. The tagScorePenalty nit from rounds 1–2 is now moot — that constant is gone. The makeSpecialRow path correctly omits isPrefix (defaults to false), and the egg-hint placement update in round 2 is preserved and correct here.
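
The three-key ordering described above (`isPrefix`, then score, then `isTag` as a tiebreak) could look roughly like this. The `Candidate` fields, the `ranksBefore` helper, the labels, and the scores are all illustrative assumptions rather than the actual implementation.

```swift
// Hypothetical candidate record; the real struct may carry more fields.
struct Candidate {
    let label: String
    let isPrefix: Bool
    let score: Double
    let isTag: Bool
}

/// True when `a` should be listed before `b`.
func ranksBefore(_ a: Candidate, _ b: Candidate) -> Bool {
    // 1. Prefix-tier matches always come first.
    if a.isPrefix != b.isPrefix { return a.isPrefix }
    // 2. Within a tier, higher relevance score wins.
    if a.score != b.score { return a.score > b.score }
    // 3. On equal score, a shortcode match beats a tag match.
    return !a.isTag && b.isTag
}

let candidates = [
    Candidate(label: "looseShortcode", isPrefix: false, score: 2.0, isTag: false),
    Candidate(label: "tieTag", isPrefix: false, score: 5.0, isTag: true),
    Candidate(label: "exactTag", isPrefix: false, score: 8.0, isTag: true),
    Candidate(label: "tieShortcode", isPrefix: false, score: 5.0, isTag: false),
    Candidate(label: "prefixHit", isPrefix: true, score: 1.0, isTag: false),
]

let ranked = candidates.sorted(by: ranksBefore).map(\.label)
```

Here the exact tag match outranks the loose shortcode subsequence on score alone, while the prefix-tier row still leads despite its low score.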
No blocking findings.
nits (non-blocking, carried over):
Both of these were flagged in rounds 1 and 2 and are still present:
- `tagMinNeedle` in `FuzzyMatcher.swift` has a 4-line doc comment. CLAUDE.md: one line max. Could be: `// tags ~3× haystacks; skip for 1-char queries where shortcodes already match everything`
- `ScoredEmoji.isPrefix` has a 3-line doc comment. Could be: `/// False for tag-only matches — tags must not enter the prefix tier even when the tag text starts with the query.`
What
Emoji search ignored emojibase tags (keywords). 🧘 has shortcodes `person_in_lotus_position`/`lotus_position` and carries `meditation`, `yoga`, `zen` only as tags, so `:meditation` (and `:medit`) found nothing, even though the emoji is in the DB. The reporter saw it as "the emoji is missing."
It's broad: 1,497 of 1,949 emoji (77%) had at least one tag keyword unreachable by search (`:happy` missed 😀😃😄, `:laugh` missed 😆🤣, etc.). The full-browser search field hits the same path, which is why "browse the whole emojis" didn't surface it either.
Tags were decoded into the `Emoji` model but never added to the search haystack.
Fix
Index tags as lower-priority haystacks:
- `:smile` still leads with the `smile` shortcode; 😀, which has `smile` only as a tag, ranks below.
- `:happy:` shouldn't resolve to an arbitrary one of the dozens that share the tag.
Perf
Optimized (`swiftc -O`) scan over the real corpus, old vs new:
[timing table not recoverable; the only surviving row label is `a` (1 char)]
Haystacks grow 2.89× but time only ~1.5× on real queries (the scorer early-bails on non-matching tags), and the 1-char worst case is unchanged by the gate. Well inside a frame.
Test plan
- `FuzzyMatcherTests`: `:meditation` surfaces 🧘, `:happy` surfaces 😀, and a shortcode match still outranks a tag-only match.
- `scripts/run-tests.sh`
- `:meditation` / `:happy` now return the expected emoji.
Refs W-430