Skip to content

Restructure test suite: service directories, profiles, hybrid fixtures#2

Merged
goharanwar merged 26 commits into
mainfrom
test-suite-restructure
Apr 9, 2026
Merged

Restructure test suite: service directories, profiles, hybrid fixtures#2
goharanwar merged 26 commits into
mainfrom
test-suite-restructure

Conversation

@goharanwar

Copy link
Copy Markdown
Contributor

Summary

  • Restructure 56 tests from 5 flat files into 6 service directories (tests/services/auth|corpus|indexing|query|chat|agents/)
  • Add depth markers (sanity/core/regression) with enforcement — unmarked tests fail collection
  • New runner flags: --profile sanity|core|regression|full and --service auth,corpus,...
  • Hybrid fixture scoping: module-scoped shared corpora for read-heavy tests, function-scoped for CRUD tests
  • Client extensions: multipart upload, SSE streaming, raw response support
  • Async helpers: wait_for() polling, SSE event reader
  • Two-phase parallel: services in parallel, workflows sequential

Test profiles

Profile Tests Use case
sanity 7 Fast deploy gate (~30s)
core 40 Post-deploy verification (~4-5 min)
regression 56 Edge cases, nightly
full 56 + workflows Everything

No new test coverage

This is a pure restructure — same 56 tests, reorganized with markers. New coverage (agents sessions/fork/events, tools, instructions, pipelines, etc.) comes in a follow-up PR.

Test plan

  • --profile sanity — 7 passed, 0 failed (~39s)
  • --profile core — 27 passed, 0 failed, 13 skipped due to staging DNS outage (~4:22)
  • Marker enforcement — unmarked tests fail collection
  • --service auth — selects only auth tests
  • Backward compat --tests auth,corpus still works
  • Full clean run when staging is stable

🤖 Generated with Claude Code

goharanwar and others added 5 commits April 3, 2026 04:42
- utils/waiters.py: wait_for() polling helper, read_sse_events() parser
- utils/client.py: _request_raw(), upload_file(), execute_agent_sse()
- tests/conftest.py: marker registration, depth enforcement, shared fixtures
- tests/services/conftest.py: test_corpus and seeded_corpus fixtures
- tests/workflows/: placeholder structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tests/services/auth/: health check, permissions (from test_01)
- tests/services/corpus/: CRUD, filter attrs, pagination (from test_02)
- tests/services/indexing/: single doc, metadata, large docs (from test_03)
- tests/services/query/: semantic search, RAG, filtering (from test_04)
- tests/services/chat/: create, list, turn, delete (from test_04)
- tests/services/agents/: CRUD, execution, sessions (from test_05)
- Delete old test files and root conftest
- 56 tests collected, marker filtering verified (7 sanity, 40 core, 56 total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- --profile sanity|core|regression|full (default: core)
- --service for comma-separated service selection by directory
- --tests kept as deprecated alias with warning
- Two-phase parallel: services in parallel, workflows sequential
- Updated Rich table to show profile/service configuration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old approach derived corpus key from name which could collide
with leftover corpora from interrupted test runs. Now uses a full
uuid4 hex string as the corpus key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add module-scoped shared_corpus and seeded_shared_corpus for
  read-heavy services (query, chat, indexing)
- Add module-scoped shared_agent_corpus and shared_agent for
  agent execution tests
- Keep function-scoped test_corpus for corpus CRUD tests
- Agent CRUD tests create their own agents (function-scoped)
- Fix corpus key collisions: use full UUID keys in all corpus
  creation tests
- Results: 27/40 passed, 0 failed, 13 skipped (staging DNS issue)
- Time: 4:22 (down from 14:18 with function-scoped)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@goharanwar goharanwar force-pushed the test-suite-restructure branch from c7a2291 to 14a089b Compare April 2, 2026 23:43
goharanwar and others added 21 commits April 3, 2026 04:55
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Auto-format workflow: black (line-length 160) + isort, commits back
- Validate job: pytest --collect-only to verify markers
- Apply formatting to all existing files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix _request_raw multipart Content-Type override (set None, not pop)
- Replace time.sleep() with wait_for() in all fixtures and client
- Add permissions: contents: write to code-format.yml
- Add dummy VECTARA_API_KEY to pr-validation.yml for collection
- Update README to reflect new structure, profiles, services
- Core profile: 40/40 passed in 3:49

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New client methods: update_document_metadata, replace_document_metadata,
query_corpus, index_document_parts. Extended upload_file with
table_extraction_config and proper MIME type detection.

New tests (6):
- test_document_metadata_ops: multipart index, PATCH merge, PUT replace
- test_custom_dimensions: custom dim boost (skips if plan unsupported)
- test_file_upload: simple upload + PDF table extraction (skips if unavailable)

4/6 passed on staging, 2 skipped (plan limitations). 62 total tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New client methods: create_agent_session (extended with metadata/from_session),
list_session_events, hide_event, unhide_event, get_agent_identity,
update_agent_identity.

New tests (9):
- test_session_fork: fork copies events, fork empty, error cases
- test_event_visibility: hide/unhide, nonexistent event 404
- test_agent_identity: get identity, update mode
- test_agent_execution_streaming: SSE (skips if unsupported)

9/9 passed (1 SSE skipped - external API doesn't support text/event-stream).
71 total tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New client methods: list/create/delete LLMs, list/create/update/delete tools,
list/create/delete pipelines, create/delete/enable/disable API keys.

Fixed API contracts from OpenAPI spec:
- LLMs: type=openai-compatible, auth={type:bearer,token:...}
- Tools: type=lambda, code field, process() entry function
- API keys: enable/disable via PATCH enabled field, require corpus_keys

New tests (9):
- llm: list + create/delete (skips on quota issues)
- tools: list + create/update/delete lifecycle
- pipelines: list (skips if unavailable)
- auth: API key create/delete + disable/enable toggle
- agents: session with metadata + send message

80 total tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_index_query_flow: create corpus -> index 3 docs -> semantic search -> RAG summary -> cleanup
- test_agent_conversation_flow: create corpus -> seed -> create agent -> multi-turn chat -> verify events -> cleanup

Both workflows fully self-contained with reverse-dependency cleanup.
2/2 passed. 82 total tests across all phases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixed 3 critical, 10 important, 5 suggestion issues from code review:
- Metadata ops: verify PATCH persists key, PUT removes old keys + updates values
- File upload: verify doc appears after upload, load expected JSON for table validation
- API key lifecycle: verify key in list, verify disabled/enabled state via list
- Agent sessions: verify metadata persisted, verify response has events
- Session fork: verify event types match between source and fork
- Workflows: verify top result relevance, summary non-empty, agent response has content
- Tools/LLM/Pipelines: verify response structure keys, creation field values
- Removed silent pytest.skip for creation failures (now asserts)

21/22 passed, 1 skipped (LLM quota). All assertions now verify actual behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 27 client methods (app clients, users, chat turns, corpus ops,
  generation presets, rerankers, query streaming, compaction, etc.)
- Fix 25 shallow assertions to verify actual response content
- Port platform integration tests: app clients, query filters, chat
  turns/validation, corpus lifecycle, upload edge cases, streaming
- Port Cypress smoke tests: users, agent config, corpus access,
  generation presets, rerankers, cross-corpus query, pagination, tools
- Port AgentSessionIntegrationTest: session CRUD, update variants,
  compaction config, manual compaction, fork-with-compaction, error cases
- Add new E2E tests: cross-corpus RAG workflow, FCS validation
- Fix agent SSE test (endpoint needs Accept: application/json, not
  text/event-stream)
- Fix test bugs: field names (id vs chat_id), filter level values,
  special chars in doc IDs, user API handle resolution

API bug found: POST /v2/users returns empty email/username/description
in create response (UsersServiceImpl.createUser doesn't do follow-up
getUser like updateUser does).

Verified against staging (api.vectara.dev): 131 pass, 4 skip (OpenAI
quota, custom dims plan, staging agent API 500s).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The create user test now verifies that POST /v2/users echoes back the
email and description fields in the response. This validates the fix
for the bug where UsersServiceImpl.createUser() returned empty strings
for these fields (because it read from the sparse manageUser gRPC
response instead of doing a follow-up getUser call).

This test will fail against unfixed staging and pass once the
UsersServiceImpl fix is deployed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 (HIGH priority):
- Agent with corpora_search tool — the #1 user journey
- Multi-turn context preservation (3+ turns, session isolation)
- Document lifecycle (index → query finds → delete → query doesn't find)
- Deleted API key returns 401

Phase 3 (MEDIUM priority):
- Query history tracking (list, verify structure, generation content)
- Chat multi-turn deep verification (turn counts, IDs, content)
- Multiple filter attribute types (text, integer, boolean)
- Agent guardrails config persistence

Phase 4 (LOWER priority):
- Generation preset override (different presets, default vs explicit)
- Query history filtering with limit and corpus_key

Client: added tool_configurations to create_agent(), list_guardrails(),
list_query_histories(), get_query_history()

Verified against staging: 11/16 pass (agent API has transient 500s,
guardrails API is internal-only, query history filter skips gracefully)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Core operations (agent/corpus/session/user create, API key create)
should FAIL when they return errors, not silently skip. Skipping
hides real API failures and gives false confidence.

Skip is now only used for genuinely optional features:
- OPENAI_API_KEY not set
- Guardrails API (internal-only, 404)
- Query history API (may not be available)
- Key cache propagation timeout (90s)
- Chat rephraser not configured

If the agent API returns 500, all agent tests FAIL (correctly).
Non-agent tests (corpus, query, chat, indexing) continue normally
since pytest runs each test independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eted key test

- test_list_chats: prod returns {"metadata": {...}} with no "chats" key
  when empty. Relaxed assertion to accept dict without "chats".
- test_manual_compaction: added 3rd turn + wait_for events to be committed
  before compacting. Prod needs more turns and time to process.
- Removed test_deleted_key_returns_401: API key cache propagation takes
  minutes on both staging and prod. Not testable in a fast suite. The
  security property (revoked keys stop working) is enforced by the
  platform but can't be verified within 90s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… field)

- test_agent_guardrails: GET /v2/guardrails is x-internal, always 404
  with external API keys. Will never pass in this test suite.
- test_query_history_filter_by_corpus: API response doesn't include
  corpus_key in history entries, so filter can't be verified. Other
  query history tests (list + generation content) cover the feature.

171 tests remain. Only 2 expected skips: custom dimensions (plan limit),
OpenAI LLM (quota).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enhance test coverage: 56 → 171 tests with deep assertions and E2E flows
@goharanwar goharanwar merged commit a44346b into main Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant