Restructure test suite: service directories, profiles, hybrid fixtures by goharanwar · Pull Request #2 · vectara/api_test_suite

goharanwar · 2026-04-02T23:38:28Z

Summary

Restructure 56 tests from 5 flat files into 6 service directories (tests/services/auth|corpus|indexing|query|chat|agents/)
Add depth markers (sanity/core/regression) with enforcement — unmarked tests fail collection
New runner flags: --profile sanity|core|regression|full and --service auth,corpus,...
Hybrid fixture scoping: module-scoped shared corpora for read-heavy tests, function-scoped for CRUD tests
Client extensions: multipart upload, SSE streaming, raw response support
Async helpers: wait_for() polling, SSE event reader
Two-phase parallel: services in parallel, workflows sequential
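The marker enforcement mentioned in the summary could be implemented with a collection hook along these lines. This is a minimal sketch, not the PR's actual `tests/conftest.py`: the helper name `unmarked_node_ids` and the error wording are assumptions, and only the three marker names come from the description.

```python
# Hypothetical conftest.py fragment; names other than the markers are assumptions.
DEPTH_MARKERS = {"sanity", "core", "regression"}


def unmarked_node_ids(items):
    """Return node IDs of collected tests that carry no depth marker."""
    missing = []
    for item in items:
        names = {marker.name for marker in item.iter_markers()}
        if not names & DEPTH_MARKERS:
            missing.append(item.nodeid)
    return missing


def pytest_collection_modifyitems(config, items):
    """pytest hook: abort the run if any collected test lacks a depth marker."""
    missing = unmarked_node_ids(items)
    if missing:
        import pytest  # imported here so the helper above stays importable on its own

        raise pytest.UsageError("tests without a depth marker:\n" + "\n".join(missing))
```

Keeping the check in a plain function makes it easy to unit-test without starting a pytest session.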

Test profiles

| Profile | Tests | Use case |
|---|---|---|
| `sanity` | 7 | Fast deploy gate (~30s) |
| `core` | 40 | Post-deploy verification (~4-5 min) |
| `regression` | 56 | Edge cases, nightly |
| `full` | 56 + workflows | Everything |
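One way the runner might translate a profile into a pytest invocation is sketched below. The cumulative mapping (each profile includes the shallower ones) is an assumption inferred from the counts 7, 40, 56; `build_pytest_args` and `PROFILE_MARKERS` are hypothetical names, and the real runner may work differently.

```python
# Assumption: profiles are cumulative, so "core" also selects "sanity" tests.
PROFILE_MARKERS = {
    "sanity": ["sanity"],
    "core": ["sanity", "core"],
    "regression": ["sanity", "core", "regression"],
    "full": ["sanity", "core", "regression"],
}


def build_pytest_args(profile="core", services=None):
    """Build a pytest argument list for a profile and optional service filter."""
    if profile not in PROFILE_MARKERS:
        raise ValueError(f"unknown profile: {profile!r}")
    if services:
        # --service selects by directory under tests/services/
        targets = [f"tests/services/{name}" for name in services]
    else:
        targets = ["tests/services"]
        if profile == "full":
            # "full" additionally runs the workflow tests
            targets.append("tests/workflows")
    marker_expr = " or ".join(PROFILE_MARKERS[profile])
    return [*targets, "-m", marker_expr]
```

For example, `build_pytest_args("sanity", ["auth"])` yields arguments that run only sanity-marked tests under `tests/services/auth`.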

No new test coverage

This is a pure restructure — same 56 tests, reorganized with markers. New coverage (agents sessions/fork/events, tools, instructions, pipelines, etc.) comes in a follow-up PR.

Test plan

--profile sanity — 7 passed, 0 failed (~39s)
--profile core — 27 passed, 0 failed, 13 skipped due to staging DNS outage (~4:22)
Marker enforcement — unmarked tests fail collection
--service auth — selects only auth tests
Backward compat --tests auth,corpus still works
Full clean run when staging is stable

🤖 Generated with Claude Code

- utils/waiters.py: wait_for() polling helper, read_sse_events() parser
- utils/client.py: _request_raw(), upload_file(), execute_agent_sse()
- tests/conftest.py: marker registration, depth enforcement, shared fixtures
- tests/services/conftest.py: test_corpus and seeded_corpus fixtures
- tests/workflows/: placeholder structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
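A polling helper such as `wait_for()` typically looks something like the sketch below. The signature (`timeout`, `interval`, `description`) is an assumption for illustration; the actual helper in `utils/waiters.py` is not shown in this PR page.

```python
import time


def wait_for(condition, timeout=30.0, interval=1.0, description="condition"):
    """Call condition() until it returns a truthy value or the timeout expires.

    Returns the truthy value; raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out after {timeout}s waiting for {description}")
        time.sleep(interval)
```

Compared with a fixed `time.sleep()`, this returns as soon as the condition holds and fails loudly when it never does.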

- tests/services/auth/: health check, permissions (from test_01)
- tests/services/corpus/: CRUD, filter attrs, pagination (from test_02)
- tests/services/indexing/: single doc, metadata, large docs (from test_03)
- tests/services/query/: semantic search, RAG, filtering (from test_04)
- tests/services/chat/: create, list, turn, delete (from test_04)
- tests/services/agents/: CRUD, execution, sessions (from test_05)
- Delete old test files and root conftest
- 56 tests collected, marker filtering verified (7 sanity, 40 core, 56 total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- --profile sanity|core|regression|full (default: core)
- --service for comma-separated service selection by directory
- --tests kept as deprecated alias with warning
- Two-phase parallel: services in parallel, workflows sequential
- Updated Rich table to show profile/service configuration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The old approach derived the corpus key from the name, which could collide with leftover corpora from interrupted test runs. Now uses a full uuid4 hex string as the corpus key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
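The difference between the two key strategies can be illustrated as follows. Both function names are hypothetical, and `key_from_name` only approximates what a name-derived key might have looked like; the one detail taken from the commit message is the use of a full `uuid4` hex string.

```python
import re
import uuid


def key_from_name(name):
    """Name-derived key (illustrative): the same name always yields the same key,
    so a leftover corpus from an interrupted run collides with the next run."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def unique_corpus_key():
    """UUID-based key: 32 hex characters, different on every call."""
    return uuid.uuid4().hex
```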

- Add module-scoped shared_corpus and seeded_shared_corpus for read-heavy services (query, chat, indexing)
- Add module-scoped shared_agent_corpus and shared_agent for agent execution tests
- Keep function-scoped test_corpus for corpus CRUD tests
- Agent CRUD tests create their own agents (function-scoped)
- Fix corpus key collisions: use full UUID keys in all corpus creation tests
- Results: 27/40 passed, 0 failed, 13 skipped (staging DNS issue)
- Time: 4:22 (down from 14:18 with function-scoped)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
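The hybrid scoping described above might look roughly like this in a conftest. This is a sketch under assumptions: it presumes a session-scoped `client` fixture exists, and the `create_corpus`/`delete_corpus` signatures and the `temporary_corpus` helper are hypothetical. Only the fixture names `shared_corpus` and `test_corpus` come from the commit message.

```python
import uuid
from contextlib import contextmanager

import pytest


@contextmanager
def temporary_corpus(client):
    """Create a corpus with a unique key and always delete it afterwards."""
    key = uuid.uuid4().hex
    client.create_corpus(key)  # hypothetical client method signature
    try:
        yield key
    finally:
        client.delete_corpus(key)  # hypothetical client method signature


@pytest.fixture(scope="module")
def shared_corpus(client):
    """One corpus per test module: cheap for read-heavy query/chat/indexing tests."""
    with temporary_corpus(client) as key:
        yield key


@pytest.fixture
def test_corpus(client):
    """A fresh corpus per test: isolates CRUD tests that mutate or delete it."""
    with temporary_corpus(client) as key:
        yield key
```

The trade-off is isolation versus setup cost: module scope pays for corpus creation once per file, which is consistent with the reported drop from 14:18 to 4:22.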

- Auto-format workflow: black (line-length 160) + isort, commits back
- Validate job: pytest --collect-only to verify markers
- Apply formatting to all existing files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix _request_raw multipart Content-Type override (set None, not pop)
- Replace time.sleep() with wait_for() in all fixtures and client
- Add permissions: contents: write to code-format.yml
- Add dummy VECTARA_API_KEY to pr-validation.yml for collection
- Update README to reflect new structure, profiles, services
- Core profile: 40/40 passed in 3:49

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
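The "set None, not pop" fix can be illustrated as follows, assuming the client is built on the `requests` library with a session-level default `Content-Type` (the PR page does not confirm either). In `requests`, a per-request header set to `None` removes the session default, which lets the library generate the multipart boundary itself. `prepare_upload` is a hypothetical helper, not the PR's `_request_raw`.

```python
import requests


def prepare_upload(session, url, filename, payload, drop_default_content_type=True):
    """Prepare (but do not send) a multipart upload request.

    With drop_default_content_type=True the per-request header is set to None,
    which removes the session-level Content-Type during header merging.
    Merely omitting the key leaves the session default in place.
    """
    headers = {"Content-Type": None} if drop_default_content_type else {}
    request = requests.Request(
        "POST",
        url,
        files={"file": (filename, payload)},
        headers=headers,
    )
    return session.prepare_request(request)
```

Popping `Content-Type` from a per-request header dict has no effect on the session default, so the body would be multipart while the header still claims JSON.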

New client methods: update_document_metadata, replace_document_metadata, query_corpus, index_document_parts. Extended upload_file with table_extraction_config and proper MIME type detection.

New tests (6):
- test_document_metadata_ops: multipart index, PATCH merge, PUT replace
- test_custom_dimensions: custom dim boost (skips if plan unsupported)
- test_file_upload: simple upload + PDF table extraction (skips if unavailable)

4/6 passed on staging, 2 skipped (plan limitations). 62 total tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New client methods: create_agent_session (extended with metadata/from_session), list_session_events, hide_event, unhide_event, get_agent_identity, update_agent_identity.

New tests (9):
- test_session_fork: fork copies events, fork empty, error cases
- test_event_visibility: hide/unhide, nonexistent event 404
- test_agent_identity: get identity, update mode
- test_agent_execution_streaming: SSE (skips if unsupported)

9/9 passed (1 SSE skipped - external API doesn't support text/event-stream). 71 total tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
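For the streaming test, an SSE reader has to group `event:` and `data:` lines into events separated by blank lines. The sketch below follows the standard Server-Sent Events framing; it reuses the name `read_sse_events` from an earlier commit message, but the body and the dict shape it yields are assumptions rather than the PR's implementation.

```python
def read_sse_events(lines):
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event; empty events are dropped.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space after the colon is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

An event that is not followed by a blank line is never emitted, which matches how SSE streams are framed.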

New client methods: list/create/delete LLMs, list/create/update/delete tools, list/create/delete pipelines, create/delete/enable/disable API keys.

Fixed API contracts from OpenAPI spec:
- LLMs: type=openai-compatible, auth={type:bearer,token:...}
- Tools: type=lambda, code field, process() entry function
- API keys: enable/disable via PATCH enabled field, require corpus_keys

New tests (9):
- llm: list + create/delete (skips on quota issues)
- tools: list + create/update/delete lifecycle
- pipelines: list (skips if unavailable)
- auth: API key create/delete + disable/enable toggle
- agents: session with metadata + send message

80 total tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- test_index_query_flow: create corpus -> index 3 docs -> semantic search -> RAG summary -> cleanup
- test_agent_conversation_flow: create corpus -> seed -> create agent -> multi-turn chat -> verify events -> cleanup

Both workflows fully self-contained with reverse-dependency cleanup. 2/2 passed. 82 total tests across all phases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
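Reverse-dependency cleanup can be expressed with `contextlib.ExitStack`, which runs registered callbacks in last-in, first-out order, so documents are removed before the corpus that holds them. This is one possible approach, not necessarily what the workflow tests do; `run_index_query_flow` and the client method signatures shown are hypothetical.

```python
from contextlib import ExitStack


def run_index_query_flow(client, corpus_key, documents):
    """Create a corpus, index documents, query, then clean up in reverse order."""
    with ExitStack() as cleanup:
        client.create_corpus(corpus_key)
        cleanup.callback(client.delete_corpus, corpus_key)

        for doc in documents:
            client.index_document(corpus_key, doc)
            # Registered after the corpus callback, so it runs before it.
            cleanup.callback(client.delete_document, corpus_key, doc["id"])

        # Cleanup runs when the with-block exits, even if the query raises.
        return client.query_corpus(corpus_key, "example question")
```

Registering each cleanup immediately after its resource is created means a failure midway still tears down everything created so far.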

Fixed 3 critical, 10 important, 5 suggestion issues from code review:
- Metadata ops: verify PATCH persists key, PUT removes old keys + updates values
- File upload: verify doc appears after upload, load expected JSON for table validation
- API key lifecycle: verify key in list, verify disabled/enabled state via list
- Agent sessions: verify metadata persisted, verify response has events
- Session fork: verify event types match between source and fork
- Workflows: verify top result relevance, summary non-empty, agent response has content
- Tools/LLM/Pipelines: verify response structure keys, creation field values
- Removed silent pytest.skip for creation failures (now asserts)

21/22 passed, 1 skipped (LLM quota). All assertions now verify actual behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add 27 client methods (app clients, users, chat turns, corpus ops, generation presets, rerankers, query streaming, compaction, etc.)
- Fix 25 shallow assertions to verify actual response content
- Port platform integration tests: app clients, query filters, chat turns/validation, corpus lifecycle, upload edge cases, streaming
- Port Cypress smoke tests: users, agent config, corpus access, generation presets, rerankers, cross-corpus query, pagination, tools
- Port AgentSessionIntegrationTest: session CRUD, update variants, compaction config, manual compaction, fork-with-compaction, error cases
- Add new E2E tests: cross-corpus RAG workflow, FCS validation
- Fix agent SSE test (endpoint needs Accept: application/json, not text/event-stream)
- Fix test bugs: field names (id vs chat_id), filter level values, special chars in doc IDs, user API handle resolution

API bug found: POST /v2/users returns empty email/username/description in create response (UsersServiceImpl.createUser doesn't do follow-up getUser like updateUser does).

Verified against staging (api.vectara.dev): 131 pass, 4 skip (OpenAI quota, custom dims plan, staging agent API 500s).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The create user test now verifies that POST /v2/users echoes back the email and description fields in the response. This validates the fix for the bug where UsersServiceImpl.createUser() returned empty strings for these fields (because it read from the sparse manageUser gRPC response instead of doing a follow-up getUser call).

This test will fail against unfixed staging and pass once the UsersServiceImpl fix is deployed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phase 2 (HIGH priority):
- Agent with corpora_search tool (the #1 user journey)
- Multi-turn context preservation (3+ turns, session isolation)
- Document lifecycle (index → query finds → delete → query doesn't find)
- Deleted API key returns 401

Phase 3 (MEDIUM priority):
- Query history tracking (list, verify structure, generation content)
- Chat multi-turn deep verification (turn counts, IDs, content)
- Multiple filter attribute types (text, integer, boolean)
- Agent guardrails config persistence

Phase 4 (LOWER priority):
- Generation preset override (different presets, default vs explicit)
- Query history filtering with limit and corpus_key

Client: added tool_configurations to create_agent(), list_guardrails(), list_query_histories(), get_query_history()

Verified against staging: 11/16 pass (agent API has transient 500s, guardrails API is internal-only, query history filter skips gracefully)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Core operations (agent/corpus/session/user create, API key create) should FAIL when they return errors, not silently skip. Skipping hides real API failures and gives false confidence.

Skip is now only used for genuinely optional features:
- OPENAI_API_KEY not set
- Guardrails API (internal-only, 404)
- Query history API (may not be available)
- Key cache propagation timeout (90s)
- Chat rephraser not configured

If the agent API returns 500, all agent tests FAIL (correctly). Non-agent tests (corpus, query, chat, indexing) continue normally since pytest runs each test independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eted key test

- test_list_chats: prod returns {"metadata": {...}} with no "chats" key when empty. Relaxed assertion to accept dict without "chats".
- test_manual_compaction: added 3rd turn + wait_for events to be committed before compacting. Prod needs more turns and time to process.
- Removed test_deleted_key_returns_401: API key cache propagation takes minutes on both staging and prod. Not testable in a fast suite. The security property (revoked keys stop working) is enforced by the platform but can't be verified within 90s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… field)

- test_agent_guardrails: GET /v2/guardrails is x-internal, always 404 with external API keys. Will never pass in this test suite.
- test_query_history_filter_by_corpus: API response doesn't include corpus_key in history entries, so filter can't be verified. Other query history tests (list + generation content) cover the feature.

171 tests remain. Only 2 expected skips: custom dimensions (plan limit), OpenAI LLM (quota).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Enhance test coverage: 56 → 171 tests with deep assertions and E2E flows

goharanwar and others added 5 commits April 3, 2026 04:42

goharanwar force-pushed the test-suite-restructure branch from c7a2291 to 14a089b Compare April 2, 2026 23:43

goharanwar and others added 21 commits April 3, 2026 04:55

Add CLAUDE.md, remove __init__.py, rename test modules and classes

3bef370

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add PR validation workflow, apply code formatting

cb89f4a

- Auto-format workflow: black (line-length 160) + isort, commits back
- Validate job: pytest --collect-only to verify markers
- Apply formatting to all existing files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Split CI into validation (any branch) and formatting (PRs to main)

6d16a59

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename agent test files for consistency

365fa4a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add descriptive report filenames, add pytest-json-report dependency

45ba5c1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apply code formatting (black + isort)

9cf14f3

Update CLAUDE.md: require meaningful assertions, not just HTTP status

ab2cd82

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #3 from vectara/enhance-test-coverage

5856f87

Enhance test coverage: 56 → 171 tests with deep assertions and E2E flows

Apply code formatting (black + isort)

a642564

goharanwar merged commit a44346b into main Apr 9, 2026

goharanwar merged 26 commits into main from test-suite-restructure
