
W-430: make emoji tags searchable #145

Merged: wr merged 3 commits into main from wells/w-430-meditation-emoji-not-present on Jun 24, 2026

wr (Owner) commented Jun 24, 2026

What

Emoji search ignored emojibase tags (keywords). 🧘 has shortcodes person_in_lotus_position / lotus_position and carries meditation, yoga, zen only as tags — so :meditation (and :medit) found nothing, even though the emoji is in the DB. The reporter saw it as "the emoji is missing."

It's broad: 1,497 of 1,949 emoji (77%) had at least one tag keyword unreachable by search (:happy missed 😀😃😄, :laugh missed 😆🤣, etc.). The full-browser search field hits the same path, which is why "browse the whole emojis" didn't surface it either.

Tags were decoded into the Emoji model but never added to the search haystack.

Fix

Index tags as lower-priority haystacks:

  • Penalized in scoring and excluded from the prefix tier, so a real shortcode/label match always outranks a tag-only one (:smile still leads with the smile shortcode; 😀, which has smile only as a tag, ranks below).
  • Kept out of the exact-match index: :happy: shouldn't resolve to an arbitrary one of the dozens that share the tag.
  • Scanned only for needles ≥2 chars. Tags ~triple the haystack count; a 1-char query already matches nearly everything, so gating keeps the worst-case per-keystroke cost at its prior baseline.
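
The indexing rules above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `HaystackKind`, `Haystack`, `haystacks(for:needle:)`, and `exactIndex(_:)` are hypothetical names, the `Emoji` fields are simplified, and building the exact-match index from shortcodes only is an assumption. The `tagMinNeedle` value of 2 follows the "≥2 chars" rule stated above.

```swift
// Hypothetical types for illustration; the project's real model is not shown in this PR page.
enum HaystackKind { case shortcode, label, tag }

struct Haystack {
    let text: String
    let kind: HaystackKind
}

struct Emoji {
    let glyph: String
    let shortcodes: [String]
    let label: String
    let tags: [String]
}

// Tags are scanned only for needles of at least this many characters.
let tagMinNeedle = 2

/// Searchable strings for one emoji. Tags are appended only when the needle passes the gate,
/// so a 1-char query scans the same haystacks it did before tags were indexed.
func haystacks(for emoji: Emoji, needle: String) -> [Haystack] {
    var result = emoji.shortcodes.map { Haystack(text: $0, kind: .shortcode) }
    result.append(Haystack(text: emoji.label, kind: .label))
    if needle.count >= tagMinNeedle {
        result += emoji.tags.map { Haystack(text: $0, kind: .tag) }
    }
    return result
}

/// Exact-match lookup for `:name:` resolution. Tags are deliberately left out,
/// because many emoji share a tag and none is the canonical answer.
/// (Assumption: only shortcodes are indexed here.)
func exactIndex(_ corpus: [Emoji]) -> [String: String] {
    var index: [String: String] = [:]
    for emoji in corpus {
        for code in emoji.shortcodes { index[code] = emoji.glyph }
    }
    return index
}

// Illustrative entry using the shortcodes and tags quoted in the description.
let lotus = Emoji(glyph: "🧘",
                  shortcodes: ["person_in_lotus_position", "lotus_position"],
                  label: "person in lotus position",
                  tags: ["meditation", "yoga", "zen"])
```

With this shape, a 1-char needle sees three haystacks for 🧘 and a longer needle sees six, while `:meditation:` still has no exact-match entry.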

Perf

Optimized (swiftc -O) scan over the real corpus, old vs new:

| query | old | new (gated) |
| --- | --- | --- |
| a (1 char) | 2.55 ms | 2.56 ms |
| typical 2–5 char | 0.26–0.73 ms | 0.31–1.24 ms |
| avg | 0.60 ms | 0.88 ms |

Haystacks grow 2.89× but time only ~1.5× on real queries (the scorer early-bails on non-matching tags), and the 1-char worst case is unchanged by the gate. Well inside a frame.

Test plan

  • New unit tests in FuzzyMatcherTests: :meditation surfaces 🧘, :happy surfaces 😀, and a shortcode match still outranks a tag-only match.
  • Full suite green (scripts/run-tests.sh).
  • Built + ran locally; :meditation / :happy now return the expected emoji.

Refs W-430

Emojibase keywords (`meditation`, `happy`, `zen`, …) were decoded into the
Emoji model but never indexed, so concept searches missed emoji whose only
relevant word is a tag — `:meditation` couldn't find 🧘, `:happy` missed 😀.
77% of the corpus had at least one unreachable tag keyword.

Index tags as lower-priority haystacks: penalized in scoring and excluded
from the prefix tier, so real shortcodes/labels always rank first. Tags are
kept out of the exact-match index (`:happy:` shouldn't resolve to an
arbitrary one of dozens). Scanned only for needles ≥2 chars — a 1-char query
already matches almost everything, and tags ~triple the haystack count, so
gating keeps the worst-case per-keystroke cost at its prior baseline.

Refs: W-430
linear-code Bot commented Jun 24, 2026

W-430

wr-claude-reviewer Bot left a comment

The implementation is correct and well-structured. Tag haystacks are properly gated, excluded from the prefix tier, and penalized in scoring — the invariants hold exactly as described. Tests cover the three key properties: a tag-only keyword surfaces the right emoji, a concept keyword works, and a shortcode match outranks a tag-only match when both exist.

Scoring logic trace: non-tag prefix matches go to prefixBestScore as before; non-tag non-prefix matches get the raw score in bestScore; tag haystacks always fall to the else branch (never enter the prefix tier) and carry the -6.0 penalty. The 1-char gate leaves the worst-case single-keystroke cost unchanged. No regressions to existing non-tag paths.
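
For readers following the trace, here is a rough Swift sketch of the tiering it describes at this stage of the PR. It is an approximation under assumptions: `rawScore` is a toy stand-in for the real fuzzy scorer, `Haystack` and `score(needle:haystacks:)` are hypothetical, and only the names `prefixBestScore`, `bestScore`, `tagScorePenalty`, `tagMinNeedle` and the -6.0 value come from the review text.

```swift
// Hypothetical haystack shape for illustration.
struct Haystack {
    let text: String
    let isTag: Bool
}

let tagScorePenalty = -6.0  // value quoted in the review
let tagMinNeedle = 2        // tags are skipped for shorter needles

/// True when every character of `needle` appears in `haystack` in order.
func isSubsequence(_ needle: String, of haystack: String) -> Bool {
    var iterator = needle.makeIterator()
    var wanted = iterator.next()
    for ch in haystack where ch == wanted {
        wanted = iterator.next()
    }
    return wanted == nil
}

/// Toy stand-in for the real scorer: nil on no match, higher for tighter matches.
func rawScore(_ needle: String, _ haystack: String) -> Double? {
    guard isSubsequence(needle, of: haystack) else { return nil }
    return Double(needle.count) / Double(haystack.count) * 10
}

/// Best score per tier for one emoji's haystacks.
func score(needle: String, haystacks: [Haystack]) -> (prefix: Double?, other: Double?) {
    var prefixBestScore: Double?
    var bestScore: Double?
    for h in haystacks {
        // Gate: tags are not scanned at all for 1-char needles.
        if h.isTag && needle.count < tagMinNeedle { continue }
        guard let s = rawScore(needle, h.text) else { continue }
        if !h.isTag && h.text.hasPrefix(needle) {
            // Only non-tag haystacks can enter the prefix tier.
            prefixBestScore = max(prefixBestScore ?? -Double.infinity, s)
        } else {
            // Tags always land here and carry the penalty.
            let adjusted = h.isTag ? s + tagScorePenalty : s
            bestScore = max(bestScore ?? -Double.infinity, adjusted)
        }
    }
    return (prefixBestScore, bestScore)
}

// "smile" as a shortcode reaches the prefix tier; "smile" as a tag does not.
let shortcodeHit = score(needle: "smile",
                         haystacks: [Haystack(text: "smile", isTag: false)])
let tagOnlyHit = score(needle: "smile",
                       haystacks: [Haystack(text: "grinning", isTag: false),
                                   Haystack(text: "smile", isTag: true)])
```

In this toy setup the shortcode match scores 10.0 in the prefix tier, while the tag-only match scores 4.0 in the non-prefix tier, which is the ordering the trace claims.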

CI: only the ssot run (this review workflow) is in progress — no failed checks.


nit: The doc comments on tagScorePenalty and tagMinNeedle in FuzzyMatcher.swift run 4 and 3 lines respectively. CLAUDE.md says "be brief." Both magic numbers warrant a comment (the why is non-obvious), but each could be distilled to one line, e.g. `// penalty that puts all tag matches below any shortcode prefix hit` and `// tags ~3× the haystacks; skip for 1-char queries where everything matches anyway`.

- Carry `isPrefix` on `ScoredEmoji` and use it for easter-egg hint
  placement instead of `matchedShortcode.hasPrefix(query)`. A tag display
  can start with the query without being a prefix-tier match, so the string
  proxy mis-placed the hint one slot once tags became searchable.
- `shortcodeMatchOutranksTagMatch` now requires both emoji present and
  searches the full corpus — the penalized tag-only match doesn't reach the
  default top-12, so the old conditional assertion passed vacuously.

Refs: W-430

wr-claude-reviewer Bot left a comment

Round 2 follow-up — previous findings addressed, one nit carried over plus a new one.

The new commit is solid. The isPrefix field is correctly placed on the local Candidate struct (the diff I received was missing that hunk, but reading the file confirmed it at line 146) and flows cleanly to ScoredEmoji. The egg-hint placement fix is an improvement: the old $0.matchedShortcode.lowercased().hasPrefix(lowercased) check could misclassify a tag-only result whose text happened to start with the query; $0.isPrefix is authoritative because it's set during the scoring loop's prefix-tier assignment. The new ranking test covers the ordering invariant clearly.
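
A small Swift sketch of why the string proxy and the flag can disagree. It assumes, purely for illustration, that the egg hint is inserted right after the prefix-tier rows; `hintIndexByString` and `hintIndexByFlag` are hypothetical helpers, and `ScoredEmoji` is reduced to the three fields discussed here.

```swift
// Reduced, hypothetical version of the result row.
struct ScoredEmoji {
    let glyph: String
    let matchedShortcode: String  // display text; at this commit it may be a tag
    let isPrefix: Bool            // set by the scorer when the match is in the prefix tier
}

/// Old proxy: infer the prefix tier from the displayed text.
func hintIndexByString(_ rows: [ScoredEmoji], query: String) -> Int {
    let lowercased = query.lowercased()
    return rows.filter { $0.matchedShortcode.lowercased().hasPrefix(lowercased) }.count
}

/// New approach: trust the flag recorded during scoring.
func hintIndexByFlag(_ rows: [ScoredEmoji]) -> Int {
    rows.filter { $0.isPrefix }.count
}

// 😄 matches via its `smile` shortcode (prefix tier);
// 😀 matches only via the `smile` tag, so it is not a prefix-tier row.
let rows = [
    ScoredEmoji(glyph: "😄", matchedShortcode: "smile", isPrefix: true),
    ScoredEmoji(glyph: "😀", matchedShortcode: "smile", isPrefix: false),
]

let oldIndex = hintIndexByString(rows, query: "smile")
let newIndex = hintIndexByFlag(rows)
```

Here the string proxy counts both rows and the flag counts one, which is the one-slot misplacement the commit message describes.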

No blocking findings.


nits (both non-blocking):

  • The round-1 nit on tagScorePenalty and tagMinNeedle comments (4 and 3 lines) was not addressed — they're still the same verbose block. CLAUDE.md says one line max.
  • The new ScoredEmoji.isPrefix doc comment also runs 3 lines. The why is non-obvious enough to warrant a comment, but it could be collapsed, e.g. `/// Set during scoring; false for tag-only matches, which must not enter the prefix tier even when the tag text starts with the query.`

The flat tag score penalty was the wrong model: it demoted an *exact* tag
match below *loose* shortcode subsequences, so `:happ` led with ♿️
(handicapped) and 👆 (backhand_index_pointing_up) ahead of the actual happy
faces. Drop the penalty; rank the non-prefix tier by fzy relevance with
shortcode-over-tag as a sort tiebreak. The prefix tier still guarantees
canonical shortcode matches win (`:smile` → 😄 first).

Label tag-matched rows with the emoji's own primary shortcode instead of the
matched keyword, so a shared tag no longer renders a run of identical
`:happy:` rows.

Refs: W-430

wr-claude-reviewer Bot left a comment

Round 3 follow-up — previous findings addressed, no new blocking issues.

The latest commit (e993ea8) is a solid improvement. Replacing the flat tagScorePenalty with a sort tiebreaker (isTag as a third rank key after isPrefix and score) is the correct fix: an exact tag match now legitimately outranks a loose shortcode subsequence by raw fzy score, while prefix-tier shortcode matches still always win. The tagScorePenalty nit from rounds 1–2 is now moot — that constant is gone. The makeSpecialRow path correctly omits isPrefix (defaults to false), and the egg-hint placement update in round 2 is preserved and correct here.
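
The three-key ordering described here could look roughly like the following Swift comparator. It is a sketch under assumptions: `Candidate` is reduced to the fields named in the review, `ranked(_:)` is a hypothetical helper, and the scores are invented for illustration rather than taken from the real fzy scorer.

```swift
// Reduced, hypothetical candidate row.
struct Candidate {
    let glyph: String
    let score: Double
    let isPrefix: Bool
    let isTag: Bool
}

/// Rank by: prefix tier first, then higher score, then shortcode over tag on ties.
func ranked(_ candidates: [Candidate]) -> [Candidate] {
    candidates.sorted { a, b in
        if a.isPrefix != b.isPrefix { return a.isPrefix }
        if a.score != b.score { return a.score > b.score }
        return !a.isTag && b.isTag
    }
}

// Invented scores: an exact tag match (0.9) now beats a loose shortcode
// subsequence (0.3), but loses a tie to a shortcode match and never beats the prefix tier.
let order = ranked([
    Candidate(glyph: "👆", score: 0.3, isPrefix: false, isTag: false),
    Candidate(glyph: "😀", score: 0.9, isPrefix: false, isTag: true),
    Candidate(glyph: "🙂", score: 0.9, isPrefix: false, isTag: false),
    Candidate(glyph: "😄", score: 0.5, isPrefix: true, isTag: false),
]).map { $0.glyph }
```

Because the tag flag only breaks ties, it cannot push a strong tag match below a weak shortcode match, which is the failure the flat penalty caused for `:happ`.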

No blocking findings.


nits (non-blocking, carried over):

Both are still present (the first was flagged in rounds 1 and 2, the second in round 2):

  • tagMinNeedle in FuzzyMatcher.swift has a 4-line doc comment. CLAUDE.md: one line max. Could be: `// tags ~3× haystacks; skip for 1-char queries where shortcodes already match everything`
  • ScoredEmoji.isPrefix has a 3-line doc comment. Could be: `/// False for tag-only matches; tags must not enter the prefix tier even when the tag text starts with the query.`

wr merged commit 782ddaa into main on Jun 24, 2026
10 checks passed
wr deleted the wells/w-430-meditation-emoji-not-present branch on June 24, 2026 22:09