Suggest categories for uncategorized chats using similarity scores
Goal
Help users quickly scan uncategorized chats and identify likely topics by showing how similar each chat is to existing user-defined categories.
This is a discovery + triage tool, not a validation tool.
UI Behavior
For chats that are not yet assigned to a category, display 1–3 similarity indicators:
Likely: Coding (82%)
Also: GitHub (74%)
or more compact:
Coding 82% · GitHub 74%
Do NOT show this on already categorized chats (v1).
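A minimal sketch of how the two display variants above could be rendered, assuming scores arrive as fractions in [0, 1]. The function name `format_suggestions` and its parameters are hypothetical, not part of any existing code:

```python
from typing import List, Tuple


def format_suggestions(
    matches: List[Tuple[str, float]],
    compact: bool = False,
    limit: int = 3,
) -> str:
    """Render up to `limit` (category, score) pairs, strongest match first.

    Hypothetical helper: scores are assumed to be fractions in [0, 1].
    """
    top = sorted(matches, key=lambda m: m[1], reverse=True)[:limit]
    if compact:
        # Compact form: "Coding 82% · GitHub 74%"
        return " · ".join(f"{name} {round(score * 100)}%" for name, score in top)
    # Verbose form: the strongest match is "Likely", the rest are "Also".
    labels = ["Likely"] + ["Also"] * (len(top) - 1)
    return "\n".join(
        f"{label}: {name} ({round(score * 100)}%)"
        for label, (name, score) in zip(labels, top)
    )
```

An empty match list yields an empty string, so the caller can simply skip rendering the indicator.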
How It Works (Conceptually)
- User creates categories (e.g. Coding, Writing, Business)
- User assigns a minimum number of chats to each category (e.g. ≥5)
- Each category gets a text fingerprint/profile
- Uncategorized chats are compared against these category profiles
- The UI shows the strongest matches as suggestions
Important UX Constraints
- This is not auto-categorization
- This is not a ground truth label
- Language should reflect uncertainty:
  - “Likely”
  - “Similar to”
  - “Matches pattern”
- The feature should feel assistive, not authoritative
MVP Implementation Plan
Start simple and local-first:
- Use TF-IDF + cosine similarity
- Compare:
  - Chat title + user prompts (default)
  - (Optional later: full conversation)
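The TF-IDF + cosine similarity step could look roughly like the sketch below. It uses only the standard library and one common smoothed IDF variant. The tokenizer, the function names, and the sparse-dict vector representation are all assumptions made for illustration:

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase alphanumeric tokens are good enough for an MVP.
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(documents: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(documents)
    df: Counter = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text: str, idf: Dict[str, float]) -> Vector:
    counts = Counter(tokenize(text))
    # Terms that were not seen when the IDF table was built are dropped.
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Here each "document" would be a chat's title plus its user prompts. The cosine of a chat vector against a category vector gives the score that the UI shows as a percentage.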
Performance Strategy
Avoid expensive comparisons:
Instead of:
chat → every chat in every category
Do:
chat → category fingerprint
Where each category fingerprint is:
- A cached vector representation
- Built from all chats in that category
- Recomputed only when category assignments change
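One way the cached fingerprint could work is sketched below. The class `FingerprintCache` and its methods are hypothetical, and the `vectorize` callable stands in for whatever text-to-vector function the MVP ends up using. The fingerprint here is the sum of the member chats' vectors; because cosine similarity ignores vector length, summing and averaging give the same scores.

```python
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]  # sparse vector: term -> weight


class FingerprintCache:
    """Hypothetical cache: one vector per category, rebuilt lazily."""

    def __init__(self, vectorize: Callable[[str], Vector]) -> None:
        self._vectorize = vectorize
        self._texts: Dict[str, Dict[str, str]] = {}  # category -> {chat_id: text}
        self._cache: Dict[str, Vector] = {}
        self.rebuilds = 0  # exposed only to make the caching behavior visible

    def assign(self, category: str, chat_id: str, text: str) -> None:
        self._texts.setdefault(category, {})[chat_id] = text
        self._cache.pop(category, None)  # assignment changed: invalidate

    def unassign(self, category: str, chat_id: str) -> None:
        if self._texts.get(category, {}).pop(chat_id, None) is not None:
            self._cache.pop(category, None)  # assignment changed: invalidate

    def fingerprint(self, category: str) -> Vector:
        if category not in self._cache:
            total: Counter = Counter()
            for text in self._texts.get(category, {}).values():
                total.update(self._vectorize(text))  # adds weights term by term
            self._cache[category] = dict(total)
            self.rebuilds += 1
        return self._cache[category]
```

With this shape, scoring an uncategorized chat costs one comparison per category rather than one per categorized chat.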
Acceptance Criteria
- Suggestions only appear when:
  - At least 1 category exists
  - The category has ≥5 assigned chats
- Shows top 1–3 category matches
- Scores are cached locally (SQLite)
- Suggestions update when:
  - A chat is categorized/uncategorized
- Can be toggled on/off in settings
- Does not noticeably slow down UI
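A rough sketch of how the eligibility check and the local SQLite score cache from the criteria above might fit together. The table name `suggestion_scores`, its columns, and every function name here are assumptions for illustration, not an existing schema:

```python
import sqlite3
from typing import Dict, List, Tuple

MIN_CATEGORY_SIZE = 5  # assumed default; the right minimum is still an open question
MAX_SUGGESTIONS = 3


def eligible_categories(sizes: Dict[str, int], min_size: int = MIN_CATEGORY_SIZE) -> List[str]:
    """Categories with enough assigned chats to be suggested."""
    return [name for name, count in sizes.items() if count >= min_size]


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS suggestion_scores (
               chat_id  TEXT NOT NULL,
               category TEXT NOT NULL,
               score    REAL NOT NULL,
               PRIMARY KEY (chat_id, category)
           )"""
    )
    return conn


def store_scores(conn: sqlite3.Connection, chat_id: str, scores: Dict[str, float]) -> None:
    with conn:  # one transaction: replace all cached scores for this chat
        conn.execute("DELETE FROM suggestion_scores WHERE chat_id = ?", (chat_id,))
        conn.executemany(
            "INSERT INTO suggestion_scores (chat_id, category, score) VALUES (?, ?, ?)",
            [(chat_id, category, score) for category, score in scores.items()],
        )


def top_suggestions(
    conn: sqlite3.Connection, chat_id: str, limit: int = MAX_SUGGESTIONS
) -> List[Tuple[str, float]]:
    rows = conn.execute(
        "SELECT category, score FROM suggestion_scores "
        "WHERE chat_id = ? ORDER BY score DESC LIMIT ?",
        (chat_id, limit),
    ).fetchall()
    return [(category, score) for category, score in rows]


def invalidate_category(conn: sqlite3.Connection, category: str) -> None:
    """Drop cached scores for a category whose assignments changed."""
    with conn:
        conn.execute("DELETE FROM suggestion_scores WHERE category = ?", (category,))
```

Reading suggestions from the cache keeps scoring off the render path, which should help with the "does not noticeably slow down UI" criterion. Invalidated rows would be recomputed in the background.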
Future Enhancements (NOT in this issue)
- “Auto-suggest category” button
- Batch categorize similar chats
- “Similar chats” panel
- Embeddings-based similarity (optional)
- Adjustable weighting (title vs prompt vs full text)
Open Questions
- What minimum category size feels right? (3? 5? 10?)
- Should we normalize scores across categories?
- Should we hide scores below a threshold (e.g. <50%)?
Why This Matters
- Reduces manual sorting friction
- Makes large chat histories scannable
- Helps surface patterns the user didn’t explicitly define yet