Suggest categories for uncategorized chats using similarity scores #1

@monapdx

Goal

Help users quickly scan uncategorized chats and identify likely topics by showing how similar each chat is to existing user-defined categories.

This is a discovery + triage tool, not a validation tool.


UI Behavior

For chats that are not yet assigned to a category, display 1–3 similarity indicators:

Likely: Coding (82%)
Also: GitHub (74%)

or more compact:

Coding 82% · GitHub 74%

Do NOT show this on already categorized chats (v1).
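A minimal sketch of how the two label formats above could be rendered. The function name `suggestion_label` and its parameters are hypothetical, and scores are assumed to be floats in the 0 to 1 range:

```python
from typing import Optional, Sequence, Tuple


def suggestion_label(
    assigned_category: Optional[str],
    matches: Sequence[Tuple[str, float]],
    compact: bool = False,
    limit: int = 3,
) -> str:
    """Render up to `limit` suggestions for one chat.

    `matches` holds (category name, similarity in 0..1) pairs in any order.
    Returns an empty string for chats that already have a category (v1 rule).
    """
    if assigned_category is not None:
        return ""

    # Strongest matches first, capped at the display limit.
    top = sorted(matches, key=lambda m: m[1], reverse=True)[:limit]
    if not top:
        return ""

    if compact:
        return " · ".join(f"{name} {round(score * 100)}%" for name, score in top)

    lines = []
    for index, (name, score) in enumerate(top):
        prefix = "Likely" if index == 0 else "Also"
        lines.append(f"{prefix}: {name} ({round(score * 100)}%)")
    return "\n".join(lines)
```

Passing the chat's current category in keeps the "uncategorized only" rule in one place, so the UI code does not need its own check.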


How It Works (Conceptually)

  • User creates categories (e.g. Coding, Writing, Business)
  • User assigns a minimum number of chats to each category (e.g. ≥5)
  • Each category gets a text fingerprint/profile
  • Uncategorized chats are compared against these category profiles
  • The UI shows the strongest matches as suggestions
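The first two steps above could be sketched roughly as follows. The chat shape (dicts with `id`, `text`, and `category` keys) and the names `split_chats` and `MIN_CHATS_PER_CATEGORY` are assumptions for illustration, not an existing schema:

```python
from collections import defaultdict

# Assumption: uses the example minimum from this issue; the right value
# is still an open question.
MIN_CHATS_PER_CATEGORY = 5


def split_chats(chats, min_chats=MIN_CHATS_PER_CATEGORY):
    """Separate chats into eligible categories and uncategorized chats.

    `chats` is an iterable of dicts with 'id', 'text', and 'category'
    keys, where 'category' is None for uncategorized chats.
    """
    by_category = defaultdict(list)
    uncategorized = []
    for chat in chats:
        if chat["category"] is None:
            uncategorized.append(chat)
        else:
            by_category[chat["category"]].append(chat)

    # Only categories with enough examples get a profile.
    eligible = {
        name: members
        for name, members in by_category.items()
        if len(members) >= min_chats
    }
    return eligible, uncategorized
```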

Important UX Constraints

  • This is not auto-categorization
  • This is not a ground truth label
  • Language should reflect uncertainty:
    • “Likely”
    • “Similar to”
    • “Matches pattern”
  • The feature should feel assistive, not authoritative

MVP Implementation Plan

Start simple and local-first:

  • Use TF-IDF + cosine similarity
  • Compare:
    • Chat title + user prompts (default)
    • (Optional later: full conversation)
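For reference, a dependency-free sketch of TF-IDF weighting plus cosine similarity over "title + user prompts" text. All function names here are hypothetical, the tokenizer is deliberately naive, and the smoothed IDF formula is one common variant rather than a decided choice:

```python
import math
import re
from collections import Counter


def chat_text(title, user_prompts):
    """Default comparison text: chat title plus the user's prompts."""
    return " ".join([title, *user_prompts])


def tokenize(text):
    """Naive tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {
        term: math.log((1 + n) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; unknown terms are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    if not a or not b:
        return 0.0
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)
```

A library vectorizer would likely replace most of this in practice; the sketch is only meant to show that the MVP needs nothing beyond term counts and a dot product.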

Performance Strategy

Avoid expensive comparisons:

Instead of:

chat → every chat in every category

Do:

chat → category fingerprint

Where each category fingerprint is:

  • A cached vector representation
  • Built from all chats in that category
  • Recomputed only when category assignments change
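One way the caching rule above could look, as a sketch. `FingerprintCache` and `term_counts` are hypothetical names, and raw term counts stand in for whatever vector representation is chosen:

```python
from collections import Counter


def term_counts(texts):
    """Stand-in fingerprint builder: summed term counts for a category."""
    return Counter(word for text in texts for word in text.lower().split())


class FingerprintCache:
    """Caches one fingerprint per category; rebuilds only after changes."""

    def __init__(self, build=term_counts):
        self._build = build
        self._texts = {}   # category -> {chat_id: text}
        self._cache = {}   # category -> fingerprint
        self.rebuilds = 0  # exposed to make the caching behavior observable

    def assign(self, category, chat_id, text):
        self._texts.setdefault(category, {})[chat_id] = text
        self._cache.pop(category, None)  # assignments changed: invalidate

    def unassign(self, category, chat_id):
        removed = self._texts.get(category, {}).pop(chat_id, None)
        if removed is not None:
            self._cache.pop(category, None)

    def get(self, category):
        if category not in self._cache:
            texts = list(self._texts.get(category, {}).values())
            self._cache[category] = self._build(texts)
            self.rebuilds += 1
        return self._cache[category]
```

With this shape, scoring an uncategorized chat costs one comparison per category rather than one per categorized chat.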

Acceptance Criteria

  • Suggestions only appear when:
    • At least 1 category exists
    • That category has ≥5 assigned chats (categories below the minimum are not suggested)
  • Shows top 1–3 category matches
  • Scores are cached locally (SQLite)
  • Suggestions update when:
    • A chat is assigned to or removed from a category
  • Can be toggled on/off in settings
  • Does not noticeably slow down the UI
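A rough sketch of the local score cache using Python's built-in `sqlite3` module. The table name `suggestion_scores`, its columns, and the helper names are all assumptions, not an existing schema:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS suggestion_scores (
    chat_id  TEXT NOT NULL,
    category TEXT NOT NULL,
    score    REAL NOT NULL,
    PRIMARY KEY (chat_id, category)
)
"""


def open_cache(path=":memory:"):
    """Open (or create) the score cache; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_scores(conn, chat_id, scores):
    """Replace all cached scores for one chat with `scores` ({category: score})."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM suggestion_scores WHERE chat_id = ?", (chat_id,))
        conn.executemany(
            "INSERT INTO suggestion_scores (chat_id, category, score) VALUES (?, ?, ?)",
            [(chat_id, category, score) for category, score in scores.items()],
        )


def top_matches(conn, chat_id, limit=3):
    """Return up to `limit` (category, score) rows, strongest first."""
    return conn.execute(
        "SELECT category, score FROM suggestion_scores "
        "WHERE chat_id = ? ORDER BY score DESC LIMIT ?",
        (chat_id, limit),
    ).fetchall()


def invalidate_category(conn, category):
    """Drop cached scores for a category whose assignments changed."""
    with conn:
        conn.execute("DELETE FROM suggestion_scores WHERE category = ?", (category,))
```

Calling `invalidate_category` whenever a chat is categorized or uncategorized would cover the update criterion; the affected scores can then be recomputed lazily the next time the list is shown.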

Future Enhancements (NOT in this issue)

  • “Auto-suggest category” button
  • Batch categorize similar chats
  • “Similar chats” panel
  • Embeddings-based similarity (optional)
  • Adjustable weighting (title vs prompt vs full text)

Open Questions

  • What minimum category size feels right? (3? 5? 10?)
  • Should we normalize scores across categories?
  • Should we hide scores below a threshold (e.g. <50%)?
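To make the last two questions concrete, here is a small sketch of both options. The helper names are hypothetical, and "normalize" is interpreted here as each category's share of the total score, which is only one possible reading:

```python
def filter_by_threshold(scores, min_score=0.5):
    """Keep only categories scoring at or above `min_score` (0..1 scale)."""
    return {name: score for name, score in scores.items() if score >= min_score}


def normalize_to_share(scores):
    """Rescale scores so they sum to 1 across categories."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: score / total for name, score in scores.items()}
```

One trade-off worth noting: share-style normalization always sums to 1, so a chat that matches every category weakly can still show a large percentage for its best match. That may read as more authoritative than intended, which is an argument for applying a threshold to the raw scores first.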

Why This Matters

  • Reduces manual sorting friction
  • Makes large chat histories scannable
  • Helps surface patterns the user didn’t explicitly define yet

Metadata

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)