
feat: unify config-driven retry across VLM and embedding #1049

Open
snemesh wants to merge 10 commits into volcengine:main from snemesh:feature/unified-retry

Conversation

Contributor

@snemesh snemesh commented Mar 28, 2026

Summary

Closes #922

Unifies retry behavior across all VLM and embedding paths with a shared transient retry module, replacing scattered per-backend implementations.

Changes

  • New module: openviking/models/retry.py provides is_transient_error(), transient_retry(), transient_retry_async(), with error classification (28 error types), exponential backoff with jitter, and count-based retry
  • VLM — Remove function-level max_retries param from get_completion_async(). Retry is now config-driven via vlm.max_retries (default 3, was 2). Add retry to get_vision_completion_async() (was zero retry) and sync methods
  • Embedding — Add embedding.max_retries config (default 3). Apply unified retry to all 8 providers (OpenAI, Volcengine, VikingDB, Gemini, MiniMax, Jina, Voyage, LiteLLM)
  • SDK retry disabled everywhere to prevent double-retry explosion (SDK retry × our retry)
  • Kwargs migration — Switch VLM call chains from positional to keyword arguments (fixes pre-existing tool_choice/messages positional bug in vlm_config.py)
  • Backward compatible: exponential_backoff_retry() unchanged; max_retries=0 disables retry
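As a sketch of the pattern the changes above describe, here is a minimal count-based retry decorator with exponential backoff and full jitter. The names transient_retry and is_transient_error come from the PR description, but the exact signature, parameters, and classification logic shown here are assumptions, not the module's real API (the real is_transient_error covers 28 error types):

```python
# Minimal sketch of a config-driven transient retry decorator.
# transient_retry / is_transient_error mirror the names in the PR;
# base_delay and the two-type classification are illustrative only.
import random
import time
from functools import wraps

def is_transient_error(exc: Exception) -> bool:
    # Stand-in classification; the real module matches ~28 error types
    # (HTTP 429/5xx, rate limits, connection/timeout errors, ...).
    return isinstance(exc, (ConnectionError, TimeoutError))

def transient_retry(max_retries: int = 3, base_delay: float = 0.5):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Re-raise on exhaustion or on a permanent error.
                    if attempt == max_retries or not is_transient_error(exc):
                        raise
                    # Exponential backoff with full jitter.
                    time.sleep(base_delay * (2 ** attempt) * random.random())
        return wrapper
    return decorator
```

With max_retries=0 the loop runs exactly one attempt and re-raises, which matches the "max_retries=0 disables retry" guarantee.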

Config

```yaml
vlm:
  max_retries: 3    # default, was 2

embedding:
  max_retries: 3    # new field
```

Breaking Changes

  • max_retries parameter removed from VLMBase.get_completion_async(), StructuredVLM.complete_json_async(), StructuredVLM.complete_model_async(), VLMConfig.get_completion_async()
  • Retry is now fully config-driven

Error Classification

Retryable: HTTP 429/500/502/503/504, TooManyRequests, RateLimit, RequestBurstTooFast, ConnectionError, TimeoutError, openai.RateLimitError

Not retryable: HTTP 400/401/403/404/422, InvalidRequestError, AuthenticationError, unknown errors (conservative default)
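The status-code side of this split can be captured by a small lookup (a sketch only; the real is_transient_error() also matches exception types such as openai.RateLimitError and provider-specific errors):

```python
# Retryable vs. permanent HTTP statuses as listed above. Anything not in
# the retryable set, including unknown statuses, is treated as permanent,
# matching the conservative default stated in the PR.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def is_retryable_status(status: int) -> bool:
    return status in RETRYABLE_STATUSES
```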

Test Plan

  • 84 new tests across 5 files
  • Error classification: 28 parametrized cases (transient vs permanent)
  • Retry behavior: sync + async, backoff verification, edge cases
  • VLM backend integration: retry on 429, no retry on 401, vision retry, SDK disabled
  • Embedding provider integration: OpenAI + VikingDB retry verification
  • Config flow: end-to-end from ov.conf to backend instance
  • Backward compatibility: exponential_backoff_retry unchanged
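The async retry behavior the test plan exercises can be sketched as follows. transient_retry_async is a minimal stand-in here, not the PR's implementation; the delay values and TimeoutError-only classification are illustrative assumptions:

```python
# Sketch of the async variant: a coroutine failing with a transient error
# is re-awaited up to max_retries times, sleeping between attempts.
import asyncio

def transient_retry_async(max_retries: int = 3, base_delay: float = 0.001):
    def decorator(fn):
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await fn(*args, **kwargs)
                except TimeoutError:
                    if attempt == max_retries:
                        raise
                    await asyncio.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

calls = {"n": 0}

@transient_retry_async(max_retries=3)
async def flaky_completion():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("transient")
    return "done"
```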

snemesh added 7 commits March 28, 2026 02:31
Implements `openviking/models/retry.py` with `is_transient_error`,
`transient_retry`, and `transient_retry_async` — a single config-driven
retry layer replacing scattered per-backend implementations.
Adds 50 unit tests covering classification, backoff, jitter, exhaustion,
and custom predicates.
- VLMBase: change max_retries default 2→3, remove max_retries param from
  get_completion_async abstract signature
- OpenAI backend: wrap all 4 methods with transient_retry/transient_retry_async,
  disable SDK retry (max_retries=0 in client constructors), remove manual
  for-loop retry
- VolcEngine backend: same pattern — transient_retry for all methods,
  remove manual for-loop retry
- LiteLLM backend: same pattern — transient_retry for all methods,
  remove manual for-loop retry
volcengine#922)

- VLMConfig: default max_retries 2→3, remove max_retries from
  get_completion_async signature, switch all wrappers to kwargs
- StructuredVLM (llm.py): remove max_retries from complete_json_async and
  complete_model_async, switch all internal calls to kwargs
- memory_react.py: remove max_retries=self.vlm.max_retries (now handled
  internally by backend)
- Update test stubs to match new signatures (remove max_retries=0)
Tests cover OpenAI backend as representative:
- Completion retries on 429, does NOT retry on 401
- Vision completion now retries (was zero before)
- Config max_retries is used (default=3)
- max_retries removed from get_completion_async signature (all backends)
- OpenAI SDK retry disabled (max_retries=0 in client constructors)
- EmbeddingConfig: new max_retries field (default=3) for retry configuration
- EmbeddingConfig._create_embedder(): injects max_retries into params["config"]
- EmbedderBase.__init__(): extracts max_retries from the config dict
- OpenAI: disable SDK retry (max_retries=0), wrap embed/embed_batch
- Volcengine: replace exponential_backoff_retry with transient_retry, drop is_429_error
- VikingDB: add transient_retry (previously no retry at all)
- Gemini: disable SDK HttpRetryOptions (attempts=1), wrap embed/embed_batch
- MiniMax: disable urllib3 Retry (total=0), wrap embed/embed_batch
- Jina: disable SDK retry (max_retries=0), wrap embed/embed_batch
- Voyage: disable SDK retry (max_retries=0), wrap embed/embed_batch
- LiteLLM: wrap litellm.embedding() calls

All providers now use the unified transient_retry with is_transient_error for error classification. The wrapper is placed INSIDE each method, around the raw API call, BEFORE the try/except that converts errors into RuntimeError.
- test_embedding_retry_integration: OpenAI and VikingDB retry on transient/permanent errors
- test_retry_config: VLMConfig and EmbeddingConfig max_retries fields and defaults
- test_backward_compat: exponential_backoff_retry importable, signature unchanged, time-based
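The wrapper-placement rule called out above (retry around the raw provider call, before the try/except that converts failures into RuntimeError) can be sketched as below. Embedder and transient_retry here are self-contained stand-ins, not the PR's code:

```python
# Why placement matters: if the retry sat OUTSIDE the try/except, every
# provider error would already be a RuntimeError by the time the retry
# saw it, and would never classify as transient.
def transient_retry(max_retries: int = 3):
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:  # stand-in transient class
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator

class Embedder:
    def __init__(self, client, max_retries: int = 3):
        self._client = client
        self.max_retries = max_retries

    def embed(self, texts):
        try:
            # Retry applies only to the raw provider call.
            return transient_retry(self.max_retries)(self._client)(texts)
        except Exception as exc:
            # Conversion happens after retries are exhausted.
            raise RuntimeError(f"embedding failed: {exc}") from exc
```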
@github-actions

Failed to generate code suggestions for PR

@qin-ctx qin-ctx self-assigned this Mar 28, 2026
@snemesh snemesh force-pushed the feature/unified-retry branch from e73415d to b59fd97 Compare March 28, 2026 10:32


Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

[Feature]: Unify config-driven retry across VLM and embedding

2 participants