W-432: drop component-group modifiers from the emoji corpus #147

Closed
wr wants to merge 1 commit into wells/w-431-picker-mouse-clicks from wells/w-432-filter-components

@wr wr commented Jun 24, 2026

What

Bare Fitzpatrick skin-tone swatches (🏻–🏿, e.g. :medium_skin_tone:) and hair modifiers (🦰🦱🦳🦲) surfaced as standalone search hits. They're emojibase group 2 ("component") — combining modifiers, useless inserted alone. The browser already skips this group, so they only leaked through live search.
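For context, a minimal Swift sketch of why these code points only make sense attached to a base emoji. The helper `applying(_:to:)` is hypothetical and not taken from this codebase; it simply appends a Fitzpatrick modifier scalar (U+1F3FB through U+1F3FF) to a base emoji's scalars.

```swift
// Hypothetical helper, not from this PR: appends a modifier scalar
// to the end of a base emoji's Unicode scalars.
func applying(_ modifier: Unicode.Scalar, to base: String) -> String {
    var result = base
    result.unicodeScalars.append(modifier)
    return result
}

let mediumSkinTone: Unicode.Scalar = "\u{1F3FD}"    // 🏽, the bare swatch
let thumbsUp = "\u{1F44D}"                          // 👍
let toned = applying(mediumSkinTone, to: thumbsUp)  // 👍 followed by 🏽

// The result is two scalars: the base emoji plus the modifier.
print(toned, toned.unicodeScalars.count)
```

Inserted on its own, the modifier has no base to attach to, which is why a search hit for it is not useful.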

Fix

Filter the component group out of the loaded corpus in EmojiDatabase.load, so it's absent from search and exact-match — matching the browser. SkinTone applies modifiers via hardcoded scalars, not the DB, so skin-tone application is unaffected.
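A rough sketch of the shape of this filter, for readers without the diff open. Only `EmojiDatabase.load`, `componentGroup`, `all`, and `byHexcode` are names mentioned in this PR; the `Emoji` fields, the `load(from:)` signature, and the sample JSON are assumptions made for illustration.

```swift
import Foundation

// Assumed minimal model; the real type likely carries more fields.
struct Emoji: Decodable {
    let hexcode: String
    let label: String
    let group: Int?
}

struct EmojiDatabase {
    // emojibase group 2 is "component" (skin-tone and hair modifiers).
    static let componentGroup = 2

    let all: [Emoji]
    let byHexcode: [String: Emoji]

    // Assumed signature: decode, then drop components before any index
    // is derived, so nothing downstream can see them.
    static func load(from data: Data) throws -> EmojiDatabase {
        let decoded = try JSONDecoder().decode([Emoji].self, from: data)
        let filtered = decoded.filter { $0.group != componentGroup }
        let index = Dictionary(filtered.map { ($0.hexcode, $0) },
                               uniquingKeysWith: { first, _ in first })
        return EmojiDatabase(all: filtered, byHexcode: index)
    }
}

// Hypothetical two-entry corpus: one regular emoji, one component.
let json = """
[{"hexcode": "1F44D", "label": "thumbs up", "group": 1},
 {"hexcode": "1F3FD", "label": "medium skin tone", "group": 2}]
"""
let db = try EmojiDatabase.load(from: Data(json.utf8))
print(db.all.map(\.label))
```

Filtering once at load time, rather than in each consumer, keeps search and exact-match consistent without extra checks at the call sites.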

Notes

  • Stacked on #(W-431) — base will retarget to main once that merges. Review only the top commit.
  • Removes the 4 hair components from search too (same rationale); easy to narrow to skin-tones-only if we want hair kept.

Test plan

  • New componentModifiersAreExcluded test: no group-2 entries in the corpus, 🏽 gone from byHexcode and from a :medium search.
  • Full suite green; verified in the dev app.
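The three checks in the new test can be sketched in a self-contained form as follows. This is not the actual test: the `Emoji` struct, the sample data, and the `search` function are stand-ins, and in particular `search` is a naive substring match used in place of `FuzzyMatcher.search`, whose API is not shown in this PR.

```swift
import Foundation

// Stand-in model; only the shape of the assertions matters here.
struct Emoji {
    let hexcode: String
    let label: String
    let group: Int?
}

let componentGroup = 2  // emojibase "component" group

// Hypothetical sample data, filtered the same way the loader would.
let raw = [
    Emoji(hexcode: "1F44D", label: "thumbs up", group: 1),
    Emoji(hexcode: "1F3FD", label: "medium skin tone", group: 2),
    Emoji(hexcode: "1F9B0", label: "red hair", group: 2),
]
let corpus = raw.filter { $0.group != componentGroup }
let byHexcode = Dictionary(uniqueKeysWithValues: corpus.map { ($0.hexcode, $0) })

// Naive substring search, standing in for the real fuzzy matcher.
func search(_ query: String, in emojis: [Emoji]) -> [Emoji] {
    emojis.filter { $0.label.contains(query) }
}

// 1. No group-2 entries survive in the corpus.
assert(corpus.allSatisfy { $0.group != componentGroup })
// 2. The reported case: 🏽 (1F3FD) is gone from the hexcode index.
assert(byHexcode["1F3FD"] == nil)
// 3. The original symptom: a "medium" search no longer returns the swatch.
assert(search("medium", in: corpus).isEmpty)
```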

Refs W-432

Bare Fitzpatrick skin-tone swatches (🏻–🏿) and hair modifiers (🦰🦱🦳🦲)
were surfacing as standalone search hits (e.g. `:medium_skin_tone:`). They're
emojibase group 2 ("component") — combining modifiers, useless inserted alone.
The browser already skips this group, so they only leaked through live search.

Filter the component group out of the loaded corpus so it's absent from search
and exact-match, matching the browser. SkinTone applies modifiers via
hardcoded scalars, not the DB, so skin-tone application is unaffected.

Refs: W-432

linear-code Bot commented Jun 24, 2026

W-432

@wr-claude-reviewer wr-claude-reviewer Bot left a comment

Clean fix. The filter is placed at exactly the right point (right after decode, before self.all is assigned), so the component emojis can never leak into byHexcode, the indexed corpus, or search results. The componentGroup constant being static makes it accessible from the test file without needing a test-only accessor.

Test coverage is solid: the allSatisfy guard checks no group-2 survivors exist, the byHexcode assertion is the reported-case regression test, and the FuzzyMatcher.search assertion closes the loop on the original symptom. Nothing to change.

Note: the ssot CI run was still in progress at review time — no failures observed.

@wr wr deleted the branch wells/w-431-picker-mouse-clicks June 25, 2026 01:09
@wr wr closed this Jun 25, 2026