W-430: make emoji tags searchable by wr · Pull Request #145 · wr/mojito

wr · 2026-06-24T19:30:56Z

What

Emoji search ignored emojibase tags (keywords). 🧘 has shortcodes person_in_lotus_position / lotus_position and carries meditation, yoga, zen only as tags — so :meditation (and :medit) found nothing, even though the emoji is in the DB. The reporter saw it as "the emoji is missing."

It's broad: 1,497 of 1,949 emoji (77%) had at least one tag keyword unreachable by search (:happy missed 😀😃😄, :laugh missed 😆🤣, etc.). The full-browser search field hits the same path, which is why "browse the whole emojis" didn't surface it either.

Tags were decoded into the Emoji model but never added to the search haystack.

Fix

Index tags as lower-priority haystacks:

- Penalized in scoring and excluded from the prefix tier, so a real shortcode/label match always outranks a tag-only one (:smile still leads with the smile shortcode; 😀, which has smile only as a tag, ranks below).
- Kept out of the exact-match index: :happy: shouldn't resolve to an arbitrary one of the dozens of emoji that share the tag.
- Scanned only for needles ≥2 chars. Tags ~triple the haystack count; a 1-char query already matches nearly everything, so gating keeps the worst-case per-keystroke cost at its prior baseline.
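A minimal sketch of the indexing rules listed above, under stated assumptions: the type and function names (`Emoji`, `Haystack`, `buildHaystacks`, `buildExactIndex`, `scannable`) are hypothetical and not taken from the mojito source, and the `grinning` shortcode is illustrative sample data. It only shows the shape of the idea: tags become flagged haystacks, the exact-match index is built from shortcodes alone, and tag haystacks are skipped for needles shorter than two characters.

```swift
// Hypothetical types for illustration; not the ones in mojito.
struct Emoji {
    let glyph: String
    let shortcodes: [String]
    let tags: [String]
}

struct Haystack {
    let text: String
    let isTag: Bool      // true when the text came from an emojibase tag
    let emojiIndex: Int
}

// Assumed constant, taken from the "needles ≥2 chars" rule above.
let tagMinNeedle = 2

// Shortcodes and tags both become haystacks; tags carry a flag.
func buildHaystacks(_ corpus: [Emoji]) -> [Haystack] {
    var out: [Haystack] = []
    for (i, emoji) in corpus.enumerated() {
        for code in emoji.shortcodes {
            out.append(Haystack(text: code, isTag: false, emojiIndex: i))
        }
        for tag in emoji.tags {
            out.append(Haystack(text: tag, isTag: true, emojiIndex: i))
        }
    }
    return out
}

// Exact-match index: shortcodes only, so a shared tag never resolves here.
func buildExactIndex(_ corpus: [Emoji]) -> [String: Int] {
    var index: [String: Int] = [:]
    for (i, emoji) in corpus.enumerated() {
        for code in emoji.shortcodes where index[code] == nil {
            index[code] = i
        }
    }
    return index
}

// Haystacks the scan visits for a given needle: tags only at 2+ chars.
func scannable(_ haystacks: [Haystack], needle: String) -> [Haystack] {
    let includeTags = needle.count >= tagMinNeedle
    return haystacks.filter { includeTags || !$0.isTag }
}

let sampleCorpus = [
    Emoji(glyph: "🧘",
          shortcodes: ["person_in_lotus_position", "lotus_position"],
          tags: ["meditation", "yoga", "zen"]),
    Emoji(glyph: "😀", shortcodes: ["grinning"], tags: ["happy", "smile"]),
]
let sampleHaystacks = buildHaystacks(sampleCorpus)
print(scannable(sampleHaystacks, needle: "a").count)   // shortcode haystacks only
print(scannable(sampleHaystacks, needle: "me").count)  // shortcodes plus tags
```

With this shape, a one-character query walks the same haystacks it did before tags were indexed, which is the property the gate is meant to preserve.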

Perf

Optimized (swiftc -O) scan over the real corpus, old vs new:

| query | old | new (gated) |
| --- | --- | --- |
| `a` (1 char) | 2.55 ms | 2.56 ms |
| typical 2–5 char | 0.26–0.73 ms | 0.31–1.24 ms |
| avg | 0.60 ms | 0.88 ms |

Haystacks grow 2.89× but time only ~1.5× on real queries (the scorer early-bails on non-matching tags), and the 1-char worst case is unchanged by the gate. Well inside a frame.

Test plan

- New unit tests in FuzzyMatcherTests: :meditation surfaces 🧘, :happy surfaces 😀, and a shortcode match still outranks a tag-only match.
- Full suite green (scripts/run-tests.sh).
- Built + ran locally; :meditation / :happy now return the expected emoji.

Refs W-430

Emojibase keywords (`meditation`, `happy`, `zen`, …) were decoded into the Emoji model but never indexed, so concept searches missed emoji whose only relevant word is a tag: `:meditation` couldn't find 🧘, `:happy` missed 😀. 77% of the corpus had at least one unreachable tag keyword.

Index tags as lower-priority haystacks: penalized in scoring and excluded from the prefix tier, so real shortcodes/labels always rank first. Tags are kept out of the exact-match index (`:happy:` shouldn't resolve to an arbitrary one of dozens). Scanned only for needles ≥2 chars: a 1-char query already matches almost everything, and tags ~triple the haystack count, so gating keeps the worst-case per-keystroke cost at its prior baseline.

Refs: W-430

linear-code · 2026-06-24T19:30:59Z

W-430

wr-claude-reviewer

The implementation is correct and well-structured. Tag haystacks are properly gated, excluded from the prefix tier, and penalized in scoring — the invariants hold exactly as described. Tests cover the three key properties: a tag-only keyword surfaces the right emoji, a concept keyword works, and a shortcode match outranks a tag-only match when both exist.

Scoring logic trace: non-tag prefix matches go to prefixBestScore as before; non-tag non-prefix matches get the raw score in bestScore; tag haystacks always fall to the else branch (never enter the prefix tier) and carry the -6.0 penalty. The 1-char gate leaves the worst-case single-keystroke cost unchanged. No regressions to existing non-tag paths.
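The trace above can be sketched roughly as follows. This is an illustration under assumptions, not the code in FuzzyMatcher.swift: `prefixBestScore`, `bestScore`, and the -6.0 penalty are named in the review, but the `Candidate` layout, the `evaluate` function, and the `score` stand-in (a simple subsequence check, not fzy) are hypothetical.

```swift
// Stand-in scorer (NOT fzy): nil when `needle` is not a subsequence of `text`,
// otherwise the fraction of `text` that the needle covers.
func score(_ needle: String, in text: String) -> Double? {
    var remaining = Substring(needle)
    for ch in text {
        if let next = remaining.first, next == ch {
            remaining = remaining.dropFirst()
        }
    }
    guard remaining.isEmpty else { return nil }
    return Double(needle.count) / Double(max(text.count, 1))
}

// Value quoted in the review; how it is applied here is an assumption.
let tagScorePenalty = -6.0

// Hypothetical per-emoji accumulator.
struct Candidate {
    var prefixBestScore: Double?
    var bestScore: Double?
}

func evaluate(needle: String, haystacks: [(text: String, isTag: Bool)]) -> Candidate {
    var c = Candidate(prefixBestScore: nil, bestScore: nil)
    for h in haystacks {
        guard let raw = score(needle, in: h.text) else { continue }
        if !h.isTag && h.text.hasPrefix(needle) {
            // Prefix tier: reserved for non-tag haystacks.
            c.prefixBestScore = max(c.prefixBestScore ?? -Double.infinity, raw)
        } else {
            // Everything else; tags land here even when they start with the needle.
            let s = h.isTag ? raw + tagScorePenalty : raw
            c.bestScore = max(c.bestScore ?? -Double.infinity, s)
        }
    }
    return c
}

let shortcodeHit = evaluate(needle: "smile", haystacks: [(text: "smile", isTag: false)])
let tagOnlyHit = evaluate(needle: "smile",
                          haystacks: [(text: "grinning", isTag: false),
                                      (text: "smile", isTag: true)])
print(shortcodeHit.prefixBestScore as Any, tagOnlyHit.bestScore as Any)
```

In this sketch the tag-only candidate never gets a prefix score, so any candidate with a shortcode prefix match sorts ahead of it regardless of raw score.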

CI: only the ssot run (this review workflow) is in progress — no failed checks.

nit: The doc comments on tagScorePenalty and tagMinNeedle in FuzzyMatcher.swift run 4 and 3 lines respectively. CLAUDE.md says "be brief." Both magic numbers warrant a comment (the why is non-obvious), but each could be distilled to one line, e.g. `// penalty that puts all tag matches below any shortcode prefix hit` and `// tags ~3× the haystacks; skip for 1-char queries where everything matches anyway`.

- Carry `isPrefix` on `ScoredEmoji` and use it for easter-egg hint placement instead of `matchedShortcode.hasPrefix(query)`. A tag display can start with the query without being a prefix-tier match, so the string proxy mis-placed the hint one slot once tags became searchable.
- `shortcodeMatchOutranksTagMatch` now requires both emoji present and searches the full corpus. The penalized tag-only match doesn't reach the default top-12, so the old conditional assertion passed vacuously.

Refs: W-430
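To illustrate why the string proxy drifts once tags are searchable, here is a small sketch. `ScoredEmoji`, `matchedShortcode`, and `isPrefix` are named in the commit message; the struct layout and the two counting helpers are hypothetical, and the assumption that hint placement depends on the number of prefix-tier rows is mine, not stated in the PR.

```swift
// Hypothetical row layout; only the field names come from the PR.
struct ScoredEmoji {
    let glyph: String
    let matchedShortcode: String  // display text; at this commit it can be a tag
    let isPrefix: Bool            // set by the scorer's prefix-tier assignment
}

// Old proxy: infer the prefix tier from the display string.
func prefixCountByString(_ rows: [ScoredEmoji], query: String) -> Int {
    let q = query.lowercased()
    return rows.filter { $0.matchedShortcode.lowercased().hasPrefix(q) }.count
}

// New: trust the flag carried over from scoring.
func prefixCountByFlag(_ rows: [ScoredEmoji]) -> Int {
    rows.filter { $0.isPrefix }.count
}

let hintRows = [
    ScoredEmoji(glyph: "😄", matchedShortcode: "smile", isPrefix: true),   // shortcode match
    ScoredEmoji(glyph: "😀", matchedShortcode: "smile", isPrefix: false),  // tag-only match
]
print(prefixCountByString(hintRows, query: "smile"))  // counts both rows
print(prefixCountByFlag(hintRows))                    // counts only the shortcode row
```

The two counts differ by one for this input, which matches the "one slot" misplacement described in the commit message.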

wr-claude-reviewer

Round 2 follow-up — previous findings addressed, one nit carried over plus a new one.

The new commit is solid. The isPrefix field is correctly placed on the local Candidate struct (the diff I received was missing that hunk, but reading the file confirmed it at line 146) and flows cleanly to ScoredEmoji. The egg-hint placement fix is an improvement: the old $0.matchedShortcode.lowercased().hasPrefix(lowercased) check could misclassify a tag-only result whose text happened to start with the query; $0.isPrefix is authoritative because it's set during the scoring loop's prefix-tier assignment. The new ranking test covers the ordering invariant clearly.

No blocking findings.

nits (both non-blocking):

- The round-1 nit on tagScorePenalty and tagMinNeedle comments (4 and 3 lines) was not addressed; they're still the same verbose block. CLAUDE.md says one line max.
- The new ScoredEmoji.isPrefix doc comment also runs 3 lines. The why is non-obvious enough to warrant a comment, but it could be collapsed, e.g. `/// Set during scoring — false for tag-only matches, which must not enter the prefix tier even when the tag text starts with the query.`

The flat tag score penalty was the wrong model: it demoted an *exact* tag match below *loose* shortcode subsequences, so `:happ` led with ♿️ (handicapped) and 👆 (backhand_index_pointing_up) ahead of the actual happy faces.

Drop the penalty; rank the non-prefix tier by fzy relevance with shortcode-over-tag as a sort tiebreak. The prefix tier still guarantees canonical shortcode matches win (`:smile` → 😄 first).

Label tag-matched rows with the emoji's own primary shortcode instead of the matched keyword, so a shared tag no longer renders a run of identical `:happy:` rows.

Refs: W-430
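The ordering described in this commit message (prefix tier first, then relevance, then shortcode-over-tag as a tiebreak) could be expressed as a three-key comparator. This is a sketch only: the `Ranked` type, the `rank` function, and the scores are made up for illustration, and the real scores come from fzy.

```swift
// Hypothetical row type; scores below are invented, not fzy output.
struct Ranked {
    let glyph: String
    let isPrefix: Bool
    let score: Double
    let isTag: Bool
}

func rank(_ rows: [Ranked]) -> [Ranked] {
    rows.sorted { a, b in
        // 1. Prefix-tier matches always come first.
        if a.isPrefix != b.isPrefix { return a.isPrefix }
        // 2. Within a tier, higher relevance wins, tag or not.
        if a.score != b.score { return a.score > b.score }
        // 3. On equal relevance, a shortcode match beats a tag match.
        return !a.isTag && b.isTag
    }
}

let unranked = [
    Ranked(glyph: "♿️", isPrefix: false, score: 0.2, isTag: false),  // loose shortcode subsequence
    Ranked(glyph: "😀", isPrefix: false, score: 0.8, isTag: true),   // close tag match
    Ranked(glyph: "👆", isPrefix: false, score: 0.2, isTag: true),   // ties with ♿️ on score
    Ranked(glyph: "😄", isPrefix: true, score: 0.1, isTag: false),   // prefix-tier shortcode
]
print(rank(unranked).map { $0.glyph })
```

Because the tag flag is consulted only after the score, a strong tag match is no longer pushed below weak shortcode subsequences, while the prefix tier keeps its guarantee.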

wr-claude-reviewer

Round 3 follow-up — previous findings addressed, no new blocking issues.

The latest commit (e993ea8) is a solid improvement. Replacing the flat tagScorePenalty with a sort tiebreaker (isTag as a third rank key after isPrefix and score) is the correct fix: an exact tag match now legitimately outranks a loose shortcode subsequence by raw fzy score, while prefix-tier shortcode matches still always win. The tagScorePenalty nit from rounds 1–2 is now moot — that constant is gone. The makeSpecialRow path correctly omits isPrefix (defaults to false), and the egg-hint placement update in round 2 is preserved and correct here.

No blocking findings.

nits (non-blocking, carried over):

Both of these were flagged in rounds 1 and 2 and are still present:

- tagMinNeedle in FuzzyMatcher.swift has a 4-line doc comment. CLAUDE.md: one line max. Could be: `// tags ~3× haystacks; skip for 1-char queries where shortcodes already match everything`
- ScoredEmoji.isPrefix has a 3-line doc comment. Could be: `/// False for tag-only matches — tags must not enter the prefix tier even when the tag text starts with the query.`

wr-claude-reviewer Bot approved these changes Jun 24, 2026

wr merged commit 782ddaa into main Jun 24, 2026
10 checks passed

wr deleted the wells/w-430-meditation-emoji-not-present branch June 24, 2026 22:09

wr merged 3 commits into main from wells/w-430-meditation-emoji-not-present
