fix: optimize category initialization to avoid unnecessary embedding #388

Open
evan-ak wants to merge 2 commits into main from optimize-category-init

evan-ak (Collaborator) commented Mar 18, 2026

📝 Pull Request Summary

Please provide a short summary explaining the purpose of this PR.


✅ What does this PR do?

  • Refactors _initialize_categories to skip unnecessary embedding API calls during category initialization. Instead of always embedding all category texts upfront, the method now pre-checks existing categories via list_categories, classifies each config as create, update, or ready, and only batch-embeds the categories that actually need it (new categories, missing embeddings, or changed descriptions).
  • Categories that already exist with a matching description and embedding are reused directly with no repo call or embedding computation. New categories still go through get_or_create_category, while categories needing a description or embedding refresh use update_category directly.
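The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `StoredCategory`, `classify`, and the sample data are hypothetical, and the real return shape of `list_categories` is assumed rather than known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredCategory:
    """Hypothetical shape of a category record returned by list_categories."""
    name: str
    description: str
    embedding: Optional[list[float]] = None


def classify(config: dict[str, str], existing: dict[str, StoredCategory]) -> str:
    """Return 'create', 'update', or 'ready' for one category config."""
    stored = existing.get(config["name"])
    if stored is None:
        return "create"   # new category: needs an embedding and a create call
    if stored.embedding is None or stored.description != config["description"]:
        return "update"   # missing embedding or changed description
    return "ready"        # reuse as-is: no embedding, no repository call


# Sample state standing in for the result of list_categories (assumed data).
existing = {
    "work": StoredCategory("work", "Job and career", [0.1, 0.2]),
    "health": StoredCategory("health", "Old description", [0.3, 0.4]),
    "travel": StoredCategory("travel", "Trips and places", None),
}
configs = [
    {"name": "work", "description": "Job and career"},
    {"name": "health", "description": "Fitness and medical"},
    {"name": "travel", "description": "Trips and places"},
    {"name": "hobbies", "description": "Leisure activities"},
]

plan = {c["name"]: classify(c, existing) for c in configs}
# Only categories that are not 'ready' would be sent to the embedding API, in one batch.
to_embed = [c for c in configs if plan[c["name"]] != "ready"]
```

In this sketch, `to_embed` is the list that would go to a single batched embedding request; entries classified as `create` would then go through `get_or_create_category` and entries classified as `update` through `update_category`.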

🤔 Why is this change needed?

  • Previously, every call to _initialize_categories generated embeddings for all configured categories unconditionally, even when those categories already existed in the database with all fields populated. In steady state (the common case after first run), this meant paying for embedding tokens on every initialization with zero benefit.
  • This change eliminates that waste. In steady state the embedding call is skipped entirely (zero token usage for category init), and on partial updates only the affected categories are re-embedded. When several categories do need embedding, they are still embedded together in a single batch.
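To make the cost difference concrete, here is a small self-contained sketch that counts embedding requests across a first run, a steady-state run, and a partial update. Everything in it is an assumption for illustration: `CountingEmbedder`, the dict-based `store`, and embedding only the description text are stand-ins, not the project's real client or repository.

```python
class CountingEmbedder:
    """Stand-in embedding client that records how much work it was asked to do."""

    def __init__(self):
        self.calls = 0
        self.texts_embedded = 0

    def embed(self, texts):
        self.calls += 1
        self.texts_embedded += len(texts)
        return [[float(len(t))] for t in texts]  # dummy one-dimensional vectors


def initialize_categories(configs, store, embedder):
    """store maps name -> {"description": str, "embedding": list or None} (assumed shape)."""
    pending = [
        c for c in configs
        if c["name"] not in store
        or store[c["name"]]["embedding"] is None
        or store[c["name"]]["description"] != c["description"]
    ]
    if not pending:
        return  # steady state: no embedding request at all
    vectors = embedder.embed([c["description"] for c in pending])  # one batched call
    for c, vec in zip(pending, vectors):
        store[c["name"]] = {"description": c["description"], "embedding": vec}


configs = [
    {"name": "work", "description": "Job and career"},
    {"name": "health", "description": "Fitness and medical"},
]
store = {}
embedder = CountingEmbedder()

initialize_categories(configs, store, embedder)  # first run: both embedded in one batch
initialize_categories(configs, store, embedder)  # steady state: nothing to embed
configs[1]["description"] = "Fitness, sleep, and medical"
initialize_categories(configs, store, embedder)  # partial update: one text embedded
```

Under the previous behavior as described, each of the three runs would have embedded both categories; in this sketch the second run issues no request and the third embeds a single text.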

🔍 Type of Change

Please check what applies:

  • Bug fix
  • New feature
  • Documentation update
  • Refactor / cleanup
  • Other (please explain)

✅ PR Quality Checklist

  • PR title follows the conventional format (feat:, fix:, docs:)
  • Changes are limited in scope and easy to review
  • Documentation updated where applicable
  • No breaking changes (or clearly documented)
  • Related issues or discussions linked

📌 Optional

  • Screenshots or examples added (if applicable)
  • Edge cases considered
  • Follow-up tasks mentioned
