fix: poll for search-index visibility in three flaky query tests by goharanwar · Pull Request #7 · vectara/api_test_suite

goharanwar · 2026-04-29T13:47:55Z

Summary

Three tests have been failing intermittently on staging:

tests/services/corpus/test_filter_attributes_types.py::TestFilterAttributeTypes::test_text_integer_boolean_filters
tests/services/indexing/test_document_lifecycle.py::TestDocumentLifecycle::test_index_query_delete_query_cycle
tests/services/query/test_query_filters.py::TestQueryFiltersCore::test_query_with_valid_metadata_filter

All three failed the same way: AssertionError: assert 0 > 0 on a len(search_results) > 0 check immediately after indexing.

Root cause

Each test uses this pattern after indexing:

wait_for(lambda: client.get_document(corpus_key, doc_id).success, ...)
query_resp = client.post("/v2/query", ...)
assert len(query_resp.data["search_results"]) > 0  # flaky

get_document returning 200 only confirms the document is stored — not that it is searchable. There is an eventual-consistency window between document storage and search-index visibility. When that window is longer than usual on staging, the first /v2/query returns zero results and the test fails. The product is behaving correctly; the synchronization signal is wrong.

test_document_lifecycle.py already uses the right pattern on the delete side (_krakatoa_gone polled via wait_for), but not on the index-and-query side.

Fix

Replace each immediate post-index query + assertion with a wait_for(...) poll that retries the query until it returns results (timeout 30s, interval 2s). All other content assertions (correct doc, correct fields, correct filtering) are preserved.
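The poll described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the suite's actual helper: `poll_until` and `FakeSearch` are hypothetical names, the real `wait_for` signature is not shown in this PR, and the fake backend simply becomes "searchable" on its third query to imitate index lag.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T],
               accept: Callable[[T], bool],
               timeout: float = 30.0,
               interval: float = 2.0) -> T:
    """Call fetch() until accept(result) is true or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if accept(result):
            return result
        # Give up if another sleep would carry us past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"condition not met within {timeout}s; last result: {result!r}")
        time.sleep(interval)


class FakeSearch:
    """Hypothetical stand-in for /v2/query: results appear after N calls."""

    def __init__(self, visible_after: int) -> None:
        self.calls = 0
        self.visible_after = visible_after

    def query(self) -> dict:
        self.calls += 1
        hits = [{"document_id": "doc-1"}] if self.calls >= self.visible_after else []
        return {"search_results": hits}


search = FakeSearch(visible_after=3)
response = poll_until(search.query,
                      lambda r: len(r["search_results"]) > 0,
                      timeout=1.0, interval=0.01)

# Content assertions run once, against the first response that had results.
assert response["search_results"][0]["document_id"] == "doc-1"
```

Returning the accepted response, rather than a bare boolean, is one way to keep the existing content assertions unchanged: they run against the same response that satisfied the poll instead of issuing a second query.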

Test plan

Reproduction: the original tests passed when run in isolation, but the failure mode (get_document success ≠ searchable) is well understood from the test code and from direct curl reproduction.
Ran the three patched tests against staging (https://api.vectara.dev): 3 consecutive runs × 3 tests = 9/9 PASS.
Ran all 5 tests in the three modified test modules (incl. error-path tests): 5/5 PASS.
No production code changes — fix is contained to the api_test_suite tests.

🤖 Generated with Claude Code

…ests

The post-index queries in three tests asserted len(search_results) > 0 right after wait_for(get_document().success), but document storage and search index visibility are eventually consistent on staging: get_document returning 200 only proves the document is stored, not that it is searchable. When the index lagged, the first /v2/query returned 0 results and the test failed.

Replace the immediate query + assertion with a wait_for(...) poll that retries until the query returns results (timeout 30s, interval 2s), mirroring the existing _krakatoa_gone pattern already used on the delete side of the lifecycle test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

goharanwar · 2026-05-12T23:24:28Z

Closing in favour of #8, which now contains both the search-index visibility fix and the retry-policy/observability fix. Combined branch tested end-to-end on staging: 12/12 originally-failing tests pass, plus 87/87 on the broader core profile across agents/corpus/indexing/query services.

…ility (#8)

Combined fix for two intermittent staging failures.

Bug 1, search-index visibility race (3 tests): post-index queries asserted on len(search_results) > 0 immediately after wait_for(get_document().success). get_document returning 200 confirms storage, not search visibility. Replaced each immediate query with a wait_for poll (30s/2s).

Bug 2, non-idempotent POST retries (2 tests): urllib3 retried POST on 5xx, producing 409 'already exists' with fresh UUIDs when the first attempt had committed server-side. Restricted retries to GET/HEAD/OPTIONS; added per-request X-Request-Id and retry_history on APIResponse plus a WARNING log when retries fire, so future surprises arrive with the retry trail attached.

Codex-reviewed at high effort. End-to-end verified on staging: 12/12 originally-failing tests pass, 87/87 on the broader core profile across agents/corpus/indexing/query services. Closes #7.
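The retry policy from Bug 2 can be illustrated with a small library-independent sketch. It does not use urllib3 and is not the suite's client code: `send_with_retries`, `ResponseSketch` (a stand-in for the suite's APIResponse) and `flaky_transport` are hypothetical names, and backoff between attempts is omitted for brevity. In urllib3 itself, the set of retried methods is controlled by the `allowed_methods` argument of `Retry`; how the suite configures it is not shown here.

```python
import logging
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

logger = logging.getLogger("api_client_sketch")

# Methods that are safe to replay; POST is excluded on purpose.
RETRYABLE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


@dataclass
class ResponseSketch:
    status: int
    request_id: str
    # Statuses of the failed attempts that preceded the final one.
    retry_history: List[int] = field(default_factory=list)


def send_with_retries(method: str,
                      send: Callable[[str, Dict[str, str]], int],
                      max_retries: int = 3) -> ResponseSketch:
    """Send once; replay on 5xx only when the method is idempotent."""
    request_id = str(uuid.uuid4())
    headers = {"X-Request-Id": request_id}  # same id on every attempt
    history: List[int] = []
    while True:
        status = send(method, headers)
        retryable = (status >= 500
                     and method.upper() in RETRYABLE_METHODS
                     and len(history) < max_retries)
        if not retryable:
            return ResponseSketch(status, request_id, history)
        history.append(status)
        logger.warning("retrying %s (X-Request-Id=%s), failed statuses so far: %s",
                       method, request_id, history)


def flaky_transport(statuses: List[int]) -> Callable[[str, Dict[str, str]], int]:
    """Fake send() that returns the given statuses in order."""
    remaining = iter(statuses)
    return lambda method, headers: next(remaining)


get_resp = send_with_retries("GET", flaky_transport([503, 503, 200]))
post_resp = send_with_retries("POST", flaky_transport([503, 200]))
```

In this sketch the GET is replayed until it succeeds and carries its retry trail, while the POST surfaces the first 5xx to the caller untouched, so a request that may already have committed server-side is never sent twice.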

goharanwar mentioned this pull request May 5, 2026

fix: search-index visibility race + non-idempotent retries + observability #8

Merged

3 tasks

goharanwar closed this May 12, 2026

goharanwar wants to merge 1 commit into main from fix/index-search-visibility-race
