
feat: unify config-driven retry across VLM and embedding #1049

Open
snemesh wants to merge 10 commits into volcengine:main from snemesh:feature/unified-retry

Conversation

Contributor

@snemesh snemesh commented Mar 28, 2026

Summary

Closes #922

Unifies retry behavior across all VLM and embedding paths with a shared transient retry module, replacing scattered per-backend implementations.

Changes

  • New module: openviking/models/retry.py provides is_transient_error(), transient_retry(), transient_retry_async(), with error classification (28 error types), exponential backoff with jitter, and count-based retry
  • VLM — Remove function-level max_retries param from get_completion_async(). Retry is now config-driven via vlm.max_retries (default 3, was 2). Add retry to get_vision_completion_async() (was zero retry) and sync methods
  • Embedding — Add embedding.max_retries config (default 3). Apply unified retry to all 8 providers (OpenAI, Volcengine, VikingDB, Gemini, MiniMax, Jina, Voyage, LiteLLM)
  • SDK retry disabled everywhere to prevent double-retry explosion (SDK retry × our retry)
  • Kwargs migration — Switch VLM call chains from positional to keyword arguments (fixes pre-existing tool_choice/messages positional bug in vlm_config.py)
  • Backward compatible: exponential_backoff_retry() unchanged; max_retries=0 disables retry
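As a sketch of the pattern the changes above describe, here is a minimal count-based retry decorator with exponential backoff and full jitter. The names transient_retry and is_transient_error come from the PR description, but the exact signature, parameters, and classification logic shown here are assumptions, not the module's real API (the real is_transient_error covers 28 error types):

```python
# Minimal sketch of a config-driven transient retry decorator.
# transient_retry / is_transient_error mirror the names in the PR;
# base_delay and the two-type classification are illustrative only.
import random
import time
from functools import wraps

def is_transient_error(exc: Exception) -> bool:
    # Stand-in classification; the real module matches ~28 error types
    # (HTTP 429/5xx, rate limits, connection/timeout errors, ...).
    return isinstance(exc, (ConnectionError, TimeoutError))

def transient_retry(max_retries: int = 3, base_delay: float = 0.5):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Re-raise on exhaustion or on a permanent error.
                    if attempt == max_retries or not is_transient_error(exc):
                        raise
                    # Exponential backoff with full jitter.
                    time.sleep(base_delay * (2 ** attempt) * random.random())
        return wrapper
    return decorator
```

With max_retries=0 the loop runs exactly one attempt and re-raises, which matches the "max_retries=0 disables retry" guarantee.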

Config

```yaml
vlm:
  max_retries: 3    # default, was 2

embedding:
  max_retries: 3    # new field
```

Breaking Changes

  • max_retries parameter removed from VLMBase.get_completion_async(), StructuredVLM.complete_json_async(), StructuredVLM.complete_model_async(), VLMConfig.get_completion_async()
  • Retry is now fully config-driven

Error Classification

Retryable: HTTP 429/500/502/503/504, TooManyRequests, RateLimit, RequestBurstTooFast, ConnectionError, TimeoutError, openai.RateLimitError

Not retryable: HTTP 400/401/403/404/422, InvalidRequestError, AuthenticationError, unknown errors (conservative default)
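The status-code side of this split can be captured by a small lookup (a sketch only; the real is_transient_error() also matches exception types such as openai.RateLimitError and provider-specific errors):

```python
# Retryable vs. permanent HTTP statuses as listed above. Anything not in
# the retryable set, including unknown statuses, is treated as permanent,
# matching the conservative default stated in the PR.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def is_retryable_status(status: int) -> bool:
    return status in RETRYABLE_STATUSES
```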

Test Plan

  • 84 new tests across 5 files
  • Error classification: 28 parametrized cases (transient vs permanent)
  • Retry behavior: sync + async, backoff verification, edge cases
  • VLM backend integration: retry on 429, no retry on 401, vision retry, SDK disabled
  • Embedding provider integration: OpenAI + VikingDB retry verification
  • Config flow: end-to-end from ov.conf to backend instance
  • Backward compatibility: exponential_backoff_retry unchanged
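The async retry behavior the test plan exercises can be sketched as follows. transient_retry_async is a minimal stand-in here, not the PR's implementation; the delay values and TimeoutError-only classification are illustrative assumptions:

```python
# Sketch of the async variant: a coroutine failing with a transient error
# is re-awaited up to max_retries times, sleeping between attempts.
import asyncio

def transient_retry_async(max_retries: int = 3, base_delay: float = 0.001):
    def decorator(fn):
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await fn(*args, **kwargs)
                except TimeoutError:
                    if attempt == max_retries:
                        raise
                    await asyncio.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

calls = {"n": 0}

@transient_retry_async(max_retries=3)
async def flaky_completion():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("transient")
    return "done"
```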

snemesh added 7 commits March 28, 2026 02:31
Implements `openviking/models/retry.py` with `is_transient_error`,
`transient_retry`, and `transient_retry_async` — a single config-driven
retry layer replacing scattered per-backend implementations.
Adds 50 unit tests covering classification, backoff, jitter, exhaustion,
and custom predicates.
- VLMBase: change max_retries default 2→3, remove max_retries param from
  get_completion_async abstract signature
- OpenAI backend: wrap all 4 methods with transient_retry/transient_retry_async,
  disable SDK retry (max_retries=0 in client constructors), remove manual
  for-loop retry
- VolcEngine backend: same pattern — transient_retry for all methods,
  remove manual for-loop retry
- LiteLLM backend: same pattern — transient_retry for all methods,
  remove manual for-loop retry
volcengine#922)

- VLMConfig: default max_retries 2→3, remove max_retries from
  get_completion_async signature, switch all wrappers to kwargs
- StructuredVLM (llm.py): remove max_retries from complete_json_async and
  complete_model_async, switch all internal calls to kwargs
- memory_react.py: remove max_retries=self.vlm.max_retries (now handled
  internally by backend)
- Update test stubs to match new signatures (remove max_retries=0)
Tests cover OpenAI backend as representative:
- Completion retries on 429, does NOT retry on 401
- Vision completion now retries (was zero before)
- Config max_retries is used (default=3)
- max_retries removed from get_completion_async signature (all backends)
- OpenAI SDK retry disabled (max_retries=0 in client constructors)
- EmbeddingConfig: new max_retries field (default=3) for retry configuration
- EmbeddingConfig._create_embedder(): injects max_retries into params["config"]
- EmbedderBase.__init__(): extracts max_retries from the config dict
- OpenAI: disable SDK retry (max_retries=0), wrap embed/embed_batch
- Volcengine: replace exponential_backoff_retry with transient_retry, drop is_429_error
- VikingDB: add transient_retry (previously no retry at all)
- Gemini: disable SDK HttpRetryOptions (attempts=1), wrap embed/embed_batch
- MiniMax: disable urllib3 Retry (total=0), wrap embed/embed_batch
- Jina: disable SDK retry (max_retries=0), wrap embed/embed_batch
- Voyage: disable SDK retry (max_retries=0), wrap embed/embed_batch
- LiteLLM: wrap litellm.embedding() calls

All providers now use the unified transient_retry with is_transient_error for error classification. The wrapper is placed INSIDE each method, around the raw API call, BEFORE the try/except that converts errors into RuntimeError.
- test_embedding_retry_integration: OpenAI and VikingDB retry on transient/permanent errors
- test_retry_config: VLMConfig and EmbeddingConfig max_retries fields and defaults
- test_backward_compat: exponential_backoff_retry importable, signature unchanged, time-based
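The wrapper-placement rule called out above (retry around the raw provider call, before the try/except that converts failures into RuntimeError) can be sketched as below. Embedder and transient_retry here are self-contained stand-ins, not the PR's code:

```python
# Why placement matters: if the retry sat OUTSIDE the try/except, every
# provider error would already be a RuntimeError by the time the retry
# saw it, and would never classify as transient.
def transient_retry(max_retries: int = 3):
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:  # stand-in transient class
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator

class Embedder:
    def __init__(self, client, max_retries: int = 3):
        self._client = client
        self.max_retries = max_retries

    def embed(self, texts):
        try:
            # Retry applies only to the raw provider call.
            return transient_retry(self.max_retries)(self._client)(texts)
        except Exception as exc:
            # Conversion happens after retries are exhausted.
            raise RuntimeError(f"embedding failed: {exc}") from exc
```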
@github-actions

Failed to generate code suggestions for PR

@qin-ctx qin-ctx self-assigned this Mar 28, 2026
@snemesh snemesh force-pushed the feature/unified-retry branch from e73415d to b59fd97 Compare March 28, 2026 10:32


Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

[Feature]: Unify config-driven retry across VLM and embedding

2 participants