Rebuild and validate article search indexes by user1303836 · Pull Request #233 · user1303836/intelstream

user1303836 · 2026-03-08T00:22:40Z

Summary

add startup health checks and rebuild-from-SQLite recovery for the article search index
make /index perform a full rebuild so it can repair drift instead of only upserting
validate existing zvec collections against the configured embedding dimensions/model and recreate incompatible derived indexes early
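
As a rough illustration of the validation described in the last item, the check could look something like the sketch below. It is not the PR's actual `_collection_needs_recreate`: the standalone function, its parameters, the `"model"` metadata key, and the choice to treat missing metadata as incompatible are all assumptions made for the example. Only the `"dimensions"` key and the comparison against the configured dimensions come from the review discussion.

```python
from __future__ import annotations

from typing import Any


def collection_needs_recreate(
    actual_dimension: int,
    metadata: dict[str, Any] | None,
    expected_dimension: int,
    expected_model: str,
) -> bool:
    """Return True when a derived index should be dropped and rebuilt.

    Hypothetical helper: names and the "model" key are illustrative only.
    """
    # The vector width stored in the collection schema must match the config.
    if actual_dimension != expected_dimension:
        return True
    # Assumption: without metadata we cannot tell which model built the index.
    if metadata is None:
        return True
    # Same width but a different model still produces incomparable vectors.
    if metadata.get("model") != expected_model:
        return True
    # Cross-check the dimensions recorded in the JSON sidecar as well.
    return metadata.get("dimensions") != expected_dimension
```

Because the index is derived from SQLite, answering "recreate" when in doubt costs re-embedding time but should not lose data.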

Testing

PYTHONPATH=/Users/user1303836/Development/intelstream-codex-search-index-recovery/src /Users/user1303836/Development/intelstream/.venv/bin/pytest tests/test_vector_store.py tests/test_discord/test_search.py tests/test_database.py -q
/Users/user1303836/Development/intelstream/.venv/bin/ruff check src/intelstream/database/vector_store.py src/intelstream/database/repository.py src/intelstream/discord/cogs/search.py src/intelstream/bot.py tests/test_vector_store.py tests/test_discord/test_search.py
/Users/user1303836/Development/intelstream/.venv/bin/ruff format --check src/intelstream/database/vector_store.py src/intelstream/database/repository.py src/intelstream/discord/cogs/search.py src/intelstream/bot.py tests/test_vector_store.py tests/test_discord/test_search.py

Closes #228
Closes #229

user1303836

Review

Overall

Good defensive feature -- startup health checks, automatic rebuild from SQLite, and dimension/model validation for zvec collections. Well-structured with solid test coverage.

Blocking

mypy type error. vector_store.py:94 -- _read_collection_metadata returns json.loads(...) which is Any, but the return type is declared as dict[str, Any] | None. CI confirms this fails type checking. Fix with an explicit cast or intermediate variable, e.g.:
```
data = json.loads(path.read_text())
assert isinstance(data, dict)
return data
```
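
For context, here is a fuller sketch of how the whole function might look with that fix applied. mypy reports "Returning Any" errors when `warn_return_any` is enabled (it is part of `--strict`), and an `isinstance` check narrows the value for the type checker. The function body, the `path` parameter, and the decision to return `None` for a missing or malformed file are assumptions, not the PR's code. The sketch uses an `if` instead of `assert` so the guard still runs under `python -O`.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def read_collection_metadata(path: Path) -> dict[str, Any] | None:
    """Load the JSON metadata file, or return None if absent or malformed."""
    if not path.exists():
        return None
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        # Assumption: unreadable metadata is treated the same as no metadata.
        return None
    # json.loads returns Any; isinstance narrows it to dict for the type checker.
    if not isinstance(data, dict):
        return None
    return data
```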

Non-blocking

Redundant dimension checking in _collection_needs_recreate. It checks actual_dimension != self._dimensions from the schema, then also checks metadata.get("dimensions") != self._dimensions from the JSON file. These should always agree after the first write. The second check is only useful if someone manually edits the metadata file. Not harmful, just redundant.
/index always recreates the collection now. The old behavior was additive (upsert). The new behavior calls recreate_articles_collection() which destroys and rebuilds everything. This means running /index always re-embeds all content, even if 99% is already correct. For large datasets this could be expensive in embedding API calls. Was additive-with-cleanup insufficient?
Health check false positive risk. The probe embeds a sample item and searches for it in the top 10 results. If the item exists but isn't in the top 10 (e.g., many similar items), the check reports unhealthy and triggers a full rebuild. HEALTH_CHECK_TOPK = 10 is reasonable but not bulletproof.
rendered_results tracking is a good catch. Previously, if vector search returned IDs that no longer exist in SQLite (orphaned references), the embed would show 0 fields with a misleading "N results" footer. The new code handles this correctly.
Merge conflict with PR #234. Both PRs modify vector_store.py and repository.py from the same base commit. This PR should merge first (it's more foundational), then #234 should rebase.
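
To make the health-check concern above concrete, here is a small self-contained sketch of a top-k self-retrieval probe. It uses a plain dict and brute-force cosine similarity in place of zvec, so every function name here is hypothetical. Only `HEALTH_CHECK_TOPK = 10` comes from the PR.

```python
from __future__ import annotations

import math
from collections.abc import Sequence

HEALTH_CHECK_TOPK = 10  # value quoted in the review


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_ids(
    index: dict[str, Sequence[float]], query: Sequence[float], k: int
) -> list[str]:
    """Brute-force stand-in for a vector search: the k most similar IDs."""
    ranked = sorted(index, key=lambda i: cosine(index[i], query), reverse=True)
    return ranked[:k]


def probe_is_healthy(
    index: dict[str, Sequence[float]],
    sample_id: str,
    sample_vector: Sequence[float],
    k: int = HEALTH_CHECK_TOPK,
) -> bool:
    """Healthy if the sample's own ID appears among its top-k neighbours."""
    return sample_id in top_k_ids(index, sample_vector, k)


# Ten exact duplicates tie with the sample and can crowd it out of the top 10.
crowded = {f"dup{i}": [1.0, 0.0] for i in range(10)}
crowded["sample"] = [1.0, 0.0]
print(probe_is_healthy(crowded, "sample", [1.0, 0.0]))        # False
print(probe_is_healthy(crowded, "sample", [1.0, 0.0], k=11))  # True
```

In this toy case the sample is present in the index but the probe still reports unhealthy, which is the false-alarm scenario described above. Probing several samples, or requiring more than one miss before rebuilding, would be one way to lower that risk.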

CI

Lint failure is pre-existing (7 unrelated files). Type check failure is real and caused by this PR -- must fix before merging.

Verdict

Fix the mypy error, then good to merge. Should go in before PR #234.

greptile-apps bot reviewed Mar 8, 2026

user1303836 commented Mar 8, 2026

user1303836 mentioned this pull request Mar 8, 2026

Partition lore indexes by guild #234

Merged

Rebuild and validate article search indexes

0315d84

user1303836 force-pushed the codex/search-index-recovery branch from a310b73 to 0315d84 Compare March 8, 2026 00:59

user1303836 merged commit b7e2d2f into main Mar 8, 2026
4 checks passed

user1303836 merged 1 commit into main from codex/search-index-recovery