Category Fingerprint Caching #2

@monapdx

Add category fingerprint caching for efficient similarity scoring

Goal

Improve the performance and scalability of similarity scoring by introducing cached category fingerprints, instead of comparing each chat against every individual chat in a category.


Problem

Naive approach:

chat → every chat in every category

The number of comparisons grows with the total number of categorized chats, so scoring becomes more expensive as the dataset grows and will slow down the UI.


Solution

Introduce a category fingerprint:

A single vector representation of all chats in a category

Then compute:

chat → category fingerprint

This reduces the work per category from N comparisons (one per chat in the category) to a single comparison against one vector.
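
To make the saving concrete, here is a rough count of comparisons needed to score a single chat. The category names and sizes are made up for illustration, and both helper functions are hypothetical.

```python
def naive_comparisons(category_sizes):
    """Naive approach: one comparison per chat in every category."""
    return sum(category_sizes.values())


def fingerprint_comparisons(category_sizes):
    """Fingerprint approach: one comparison per category, whatever its size."""
    return len(category_sizes)


# Hypothetical dataset: category name -> number of chats in it.
sizes = {"work": 120, "recipes": 45, "travel": 300}

print(naive_comparisons(sizes))        # 465
print(fingerprint_comparisons(sizes))  # 3
```

The fingerprint count depends only on the number of categories, so adding chats to an existing category does not make scoring slower.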


How It Works

  • Each category maintains a cached vector (fingerprint)
  • The fingerprint is built from:
    • Chat titles
    • User prompts (default)
  • When a category changes, its fingerprint is recomputed
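
A minimal sketch of building one fingerprint from titles and user prompts. Several things here are assumptions, not part of the issue: the chat shape (`title`, `user_prompts`), the `include_prompts` flag (one reading of "default"), and the use of plain normalized term frequencies as a stand-in for the TF-IDF weights the real vectorizer would produce.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chat_text(chat, include_prompts=True):
    """Collect the text of one chat: its title, plus user prompts if enabled."""
    parts = [chat["title"]]
    if include_prompts:
        parts.extend(chat.get("user_prompts", []))
    return " ".join(parts)


def build_fingerprint(chats, include_prompts=True):
    """Return {term: relative frequency} over all chats in one category."""
    counts = Counter()
    for chat in chats:
        counts.update(tokenize(chat_text(chat, include_prompts)))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


# Hypothetical category contents.
chats = [
    {"title": "Sourdough starter",
     "user_prompts": ["How often should I feed a starter?"]},
    {"title": "Bread scoring",
     "user_prompts": ["Best blade for scoring bread"]},
]
fp = build_fingerprint(chats)
titles_only = build_fingerprint(chats, include_prompts=False)
```

The result is a sparse vector stored as a dict, which keeps the sketch dependency-free. Swapping in the real TF-IDF vectorizer would only change `build_fingerprint`.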

When to Recompute

Rebuild the fingerprint when:

  • A chat is added to a category
  • A chat is removed from a category
  • A chat inside the category is edited (optional, can defer)
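
One way these triggers could be wired up is sketched below. It assumes a lazy strategy: a change only marks the category as dirty, and the rebuild happens on the next read. `FingerprintCache`, its method names, and the `build` callable are all hypothetical. The debug log line is there so rebuilds can be inspected.

```python
import logging
from collections import Counter

logger = logging.getLogger("fingerprints")


class FingerprintCache:
    def __init__(self, build):
        self._build = build        # callable: list of chat texts -> fingerprint
        self._chats = {}           # category -> {chat_id: text}
        self._fingerprints = {}    # category -> cached fingerprint
        self._dirty = set()        # categories that need a rebuild

    def add_chat(self, category, chat_id, text):
        self._chats.setdefault(category, {})[chat_id] = text
        self._dirty.add(category)

    def remove_chat(self, category, chat_id):
        # Only invalidate if the chat was actually in the category.
        if self._chats.get(category, {}).pop(chat_id, None) is not None:
            self._dirty.add(category)

    def edit_chat(self, category, chat_id, text):
        if chat_id in self._chats.get(category, {}):
            self._chats[category][chat_id] = text
            self._dirty.add(category)

    def get(self, category):
        """Return the fingerprint, rebuilding only if the category changed."""
        if category in self._dirty or category not in self._fingerprints:
            texts = list(self._chats.get(category, {}).values())
            self._fingerprints[category] = self._build(texts)
            self._dirty.discard(category)
            logger.debug("rebuilt fingerprint for %r from %d chats",
                         category, len(texts))
        return self._fingerprints[category]


def word_counts(texts):
    """Toy builder used for the example; stands in for the real vectorizer."""
    return Counter(" ".join(texts).lower().split())


cache = FingerprintCache(word_counts)
cache.add_chat("recipes", 1, "sourdough bread")
cache.add_chat("recipes", 2, "bread pudding")
print(cache.get("recipes")["bread"])  # 2
```

Repeated calls to `get` with no intervening change reuse the cached value. Deferring edits, as the issue allows, would amount to leaving `edit_chat` unused for now.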

Storage

  • Store fingerprints locally (SQLite or in-memory cache with persistence)
  • Store metadata:
    • Last updated timestamp
    • Number of chats included
    • Vector size / method used (for future-proofing)
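
A possible SQLite layout for the fingerprint and its metadata, using only the standard library. The table name, column names, and the choice to serialize the vector as JSON are assumptions made for this sketch.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS category_fingerprints (
    category    TEXT PRIMARY KEY,
    vector_json TEXT NOT NULL,     -- sparse vector as {term: weight}
    updated_at  REAL NOT NULL,     -- last updated (Unix timestamp)
    chat_count  INTEGER NOT NULL,  -- number of chats included
    vector_size INTEGER NOT NULL,  -- number of non-zero terms
    method      TEXT NOT NULL      -- e.g. "tfidf"
)
"""


def save_fingerprint(conn, category, vector, chat_count, method="tfidf"):
    """Insert or overwrite the cached fingerprint for one category."""
    conn.execute(
        "INSERT OR REPLACE INTO category_fingerprints VALUES (?, ?, ?, ?, ?, ?)",
        (category, json.dumps(vector), time.time(),
         chat_count, len(vector), method),
    )
    conn.commit()


def load_fingerprint(conn, category):
    """Return the stored fingerprint and metadata, or None if not cached."""
    row = conn.execute(
        "SELECT vector_json, updated_at, chat_count, vector_size, method "
        "FROM category_fingerprints WHERE category = ?",
        (category,),
    ).fetchone()
    if row is None:
        return None
    vector_json, updated_at, chat_count, vector_size, method = row
    return {
        "vector": json.loads(vector_json),
        "updated_at": updated_at,
        "chat_count": chat_count,
        "vector_size": vector_size,
        "method": method,
    }


conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute(SCHEMA)
save_fingerprint(conn, "recipes", {"bread": 0.6, "flour": 0.8}, chat_count=2)
stored = load_fingerprint(conn, "recipes")
```

Recording `method` alongside the vector means a later switch of vectorizer can detect and discard stale fingerprints instead of comparing incompatible vectors.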

Implementation Notes

  • Use the same vectorization method as similarity scoring (TF-IDF for MVP)
  • Consider storing either:
    • The raw vector
    • Or the L2-normalized vector (cosine similarity then reduces to a dot product when the chat vector is normalized too)
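
The trade-off between the two storage options can be illustrated with sparse dict vectors. This is a sketch with hypothetical helper names and made-up weights; it shows that the dot product of two L2-normalized vectors matches the full cosine computation on the raw vectors.

```python
import math


def l2_normalize(vec):
    """Scale a sparse vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


def dot(a, b):
    """Sparse dot product; iterate over the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def cosine(a, b):
    """Full cosine similarity on raw vectors, computing both norms each time."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


chat = {"bread": 2.0, "oven": 1.0}          # hypothetical chat vector
fingerprint = {"bread": 3.0, "flour": 4.0}  # hypothetical category fingerprint

fast = dot(l2_normalize(chat), l2_normalize(fingerprint))
slow = cosine(chat, fingerprint)
```

With normalized fingerprints in the cache, the norm of each category vector is computed once at rebuild time rather than on every scoring call.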

Acceptance Criteria

  • Each category has a cached fingerprint
  • Fingerprints are only recomputed when necessary
  • Similarity scoring uses fingerprints instead of per-chat comparisons
  • No noticeable UI lag when rendering similarity scores
  • Debug/logging option to inspect fingerprint rebuilds

Future Enhancements (NOT in this issue)

  • Incremental updates instead of full recompute
  • Support multiple fingerprint strategies (TF-IDF vs embeddings)
  • Weighted fingerprints (recent chats matter more)

Why This Matters

  • Keeps the app fast as data grows
  • Enables real-time similarity scoring
  • Makes the feature scalable without needing external APIs

Metadata

Labels: enhancement, help wanted
