fix: poll for search-index visibility in three flaky query tests #7
Closed
goharanwar wants to merge 1 commit into
…ests

The post-index queries in three tests asserted len(search_results) > 0 right after wait_for(get_document().success), but document storage and search index visibility are eventually consistent on staging: get_document returning 200 only proves the document is stored, not that it is searchable. When the index lagged, the first /v2/query returned 0 results and the test failed.

Replace the immediate query + assertion with a wait_for(...) poll that retries until the query returns results (timeout 30s, interval 2s), mirroring the existing _krakatoa_gone pattern already used on the delete side of the lifecycle test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closing in favour of #8, which now contains both the search-index visibility fix and the retry-policy/observability fix. Combined branch tested end-to-end on staging: 12/12 originally-failing tests pass, plus 87/87 on the broader core profile across agents/corpus/indexing/query services.
goharanwar added a commit that referenced this pull request May 14, 2026
…ility (#8)

Combined fix for two intermittent staging failures.

Bug 1: search-index visibility race (3 tests). Post-index queries asserted on len(search_results) > 0 immediately after wait_for(get_document().success). get_document returning 200 confirms storage, not search visibility. Replaced each immediate query with a wait_for poll (30s/2s).

Bug 2: non-idempotent POST retries (2 tests). urllib3 retried POST on 5xx, producing 409 'already exists' with fresh UUIDs when the first attempt had committed server-side. Restricted retries to GET/HEAD/OPTIONS; added per-request X-Request-Id and retry_history on APIResponse plus a WARNING log when retries fire, so future surprises arrive with the retry trail attached.

Codex-reviewed at high effort. End-to-end verified on staging: 12/12 originally-failing tests pass, 87/87 on the broader core profile across agents/corpus/indexing/query services.

Closes #7.
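To make the Bug 2 reasoning concrete, here is a minimal standard-library sketch of a retry policy limited to idempotent methods. It is not the code from #8: the actual change adjusts urllib3's retry configuration, which is not shown on this page. The names send_with_retries, scripted_transport, and the APIResponse fields are hypothetical, and reusing one X-Request-Id across all attempts of a request is an assumption.

```python
import logging
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("api_client")

# Only methods that are safe to repeat are retried; POST is deliberately absent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


@dataclass
class APIResponse:
    # Hypothetical shape; the real APIResponse fields are not shown in the PR.
    status: int
    request_id: str
    retry_history: List[int] = field(default_factory=list)


def send_with_retries(
    transport: Callable[[str, str, Dict[str, str]], int],
    method: str,
    url: str,
    max_retries: int = 3,
) -> APIResponse:
    """Send a request, retrying 5xx responses only for idempotent methods."""
    request_id = str(uuid.uuid4())
    headers = {"X-Request-Id": request_id}  # assumed: same id on every attempt
    retries_allowed = max_retries if method.upper() in IDEMPOTENT_METHODS else 0
    history: List[int] = []  # statuses of the attempts that were retried
    while True:
        status = transport(method, url, headers)
        if status < 500 or len(history) >= retries_allowed:
            break
        history.append(status)
    if history:
        logger.warning("request %s %s %s retried after %s",
                       request_id, method, url, history)
    return APIResponse(status, request_id, history)


def scripted_transport(statuses: List[int]) -> Tuple[Callable, list]:
    """Hypothetical fake transport that replays a fixed list of status codes."""
    remaining = list(statuses)
    calls: list = []

    def transport(method: str, url: str, headers: Dict[str, str]) -> int:
        calls.append((method, headers["X-Request-Id"]))
        return remaining.pop(0)

    return transport, calls
```

With this policy a POST that fails with a 5xx surfaces the failure to the caller instead of being replayed, so a first attempt that committed server-side cannot turn into a 409 on a second attempt.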
Summary
Three tests have been failing intermittently on staging:
tests/services/corpus/test_filter_attributes_types.py::TestFilterAttributeTypes::test_text_integer_boolean_filters
tests/services/indexing/test_document_lifecycle.py::TestDocumentLifecycle::test_index_query_delete_query_cycle
tests/services/query/test_query_filters.py::TestQueryFiltersCore::test_query_with_valid_metadata_filter
All three failed the same way:
AssertionError: assert 0 > 0
on a len(search_results) > 0 check immediately after indexing.
Root cause
Each test uses this pattern after indexing:
get_document returning 200 only confirms that the document is stored, not that it is searchable. There is an eventual-consistency window between document storage and search-index visibility. When that window is longer than usual on staging, the first /v2/query returns zero results and the test fails. The product is behaving correctly; the synchronization signal is wrong.
test_document_lifecycle.py already has the right pattern on the delete side (_krakatoa_gone polled via wait_for), but the index-and-query side did not.
Fix
Replace each immediate post-index query + assertion with a wait_for(...) poll that retries the query until it returns results (timeout 30s, interval 2s). All other content assertions (correct doc, correct fields, correct filtering) are preserved.
Test plan
get_document ≠ searchable) is well understood from the test code and direct curl reproduction.
https://api.vectara.dev): 3 consecutive runs × 3 tests = 9/9 PASS.
🤖 Generated with Claude Code
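The polling approach described under Fix can be sketched roughly as follows. This is an illustration under assumptions, not the test suite's code: the repository's real wait_for signature is not shown here, so the version below is a stand-in with the stated 30s timeout and 2s interval as defaults. FakeSearchClient and query_until_visible are hypothetical names that imitate an index which lags behind document storage.

```python
import time
from typing import Callable, List


def wait_for(condition: Callable[[], bool],
             timeout: float = 30.0,
             interval: float = 2.0) -> bool:
    """Call condition until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeSearchClient:
    """Hypothetical stand-in: results become visible only after a few queries."""

    def __init__(self, visible_after_calls: int) -> None:
        self.calls = 0
        self.visible_after_calls = visible_after_calls

    def query(self, text: str) -> List[str]:
        self.calls += 1
        return ["doc-1"] if self.calls >= self.visible_after_calls else []


def query_until_visible(client, text: str,
                        timeout: float = 30.0,
                        interval: float = 2.0) -> List[str]:
    """Poll the query until it returns results, then hand them back."""
    results: List[str] = []

    def _has_results() -> bool:
        nonlocal results
        results = client.query(text)
        return len(results) > 0

    found = wait_for(_has_results, timeout=timeout, interval=interval)
    assert found, f"no search results for {text!r} within {timeout}s"
    return results
```

Returning the results from the last successful poll lets the remaining content assertions (correct doc, correct fields, correct filtering) run against exactly the response that satisfied the wait, rather than against a second query.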