feat: unify config-driven retry across VLM and embedding (#1049)
Open
snemesh wants to merge 10 commits into volcengine:main
Conversation
Implements `openviking/models/retry.py` with `is_transient_error`, `transient_retry`, and `transient_retry_async` — a single config-driven retry layer replacing scattered per-backend implementations. Adds 50 unit tests covering classification, backoff, jitter, exhaustion, and custom predicates.
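A minimal sketch of what such a config-driven retry layer can look like. The names `is_transient_error` and `transient_retry` come from the PR; the marker list, backoff constants, and internals here are assumptions, not the repository's code:

```python
import random
import time
from functools import wraps

# Assumed marker list; the PR's classifier covers 28 error types.
TRANSIENT_MARKERS = ("429", "rate limit", "timeout", "connection")

def is_transient_error(exc: Exception) -> bool:
    """Conservative classifier: only known-transient errors are retried."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    msg = str(exc).lower()
    return any(marker in msg for marker in TRANSIENT_MARKERS)

def transient_retry(max_retries: int = 3, base_delay: float = 0.5):
    """Count-based retry with exponential backoff and full jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on permanent errors.
                    if attempt == max_retries or not is_transient_error(exc):
                        raise
                    # Full jitter: sleep in [0, base_delay * 2**attempt].
                    time.sleep(random.uniform(0, base_delay * 2 ** attempt))
        return wrapper
    return decorator
```

An async twin (`transient_retry_async`) would follow the same shape with `async def wrapper` and `await asyncio.sleep(...)`.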
- VLMBase: change `max_retries` default 2→3; remove the `max_retries` param from the `get_completion_async` abstract signature
- OpenAI backend: wrap all 4 methods with `transient_retry`/`transient_retry_async`, disable SDK retry (`max_retries=0` in client constructors), remove the manual for-loop retry
- VolcEngine backend: same pattern (`transient_retry` for all methods, manual for-loop retry removed)
- LiteLLM backend: same pattern (`transient_retry` for all methods, manual for-loop retry removed)
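A sketch of the backend pattern described above: SDK-level retry disabled so only the unified layer retries, and the method wrapped at init time because `max_retries` comes from config. The class, client wiring, and stand-in decorator are illustrative assumptions:

```python
from functools import wraps

def transient_retry(max_retries=3):
    """Stand-in for openviking.models.retry.transient_retry (count-based)."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last = None
            for _ in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError as exc:  # transient in this sketch
                    last = exc
            raise last
        return wrapper
    return deco

class OpenAIBackend:
    """Illustrative backend; not the repository's actual class."""

    def __init__(self, client_factory, max_retries=3):
        # SDK retry disabled so retries happen in exactly one place,
        # mirroring openai.OpenAI(..., max_retries=0) in the PR.
        self.client = client_factory(max_retries=0)
        self.max_retries = max_retries
        # Wrap the bound method here because max_retries is config-driven.
        self.get_completion = transient_retry(self.max_retries)(self.get_completion)

    def get_completion(self, prompt):
        return self.client.complete(prompt)
```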
- VLMConfig: default `max_retries` 2→3; remove `max_retries` from the `get_completion_async` signature; switch all wrappers to kwargs
- StructuredVLM (llm.py): remove `max_retries` from `complete_json_async` and `complete_model_async`; switch all internal calls to kwargs
- memory_react.py: remove `max_retries=self.vlm.max_retries` (now handled internally by the backend)
- Update test stubs to match the new signatures (remove `max_retries=0`)
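Why the switch to kwargs matters when a parameter is removed: with positional calls, dropping `max_retries` silently shifts later arguments into the wrong slots, whereas keyword call sites fail loudly. The function names below are illustrative, not the repository's signatures:

```python
# New-style signature with max_retries removed (illustrative).
def get_completion_new(messages, tool_choice=None):
    return {"messages": messages, "tool_choice": tool_choice}

# An old-style positional call site: the value meant for the removed
# max_retries parameter silently lands in tool_choice. No error is raised.
misbound = get_completion_new(["hi"], 2)

# A keyword call site against the same new signature fails loudly instead:
def old_style_kwargs_call():
    return get_completion_new(messages=["hi"], max_retries=2)  # TypeError
```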
Tests cover the OpenAI backend as representative:
- Completion retries on 429, does NOT retry on 401
- Vision completion now retries (was zero before)
- Config `max_retries` is used (default 3)
- `max_retries` removed from the `get_completion_async` signature (all backends)
- OpenAI SDK retry disabled (`max_retries=0` in client constructors)
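The 429-vs-401 behavior those tests assert can be sketched with a stand-in classifier and retry loop (the real tests exercise the actual OpenAI backend; `FakeHTTPError` and these helpers are invented for illustration):

```python
class FakeHTTPError(Exception):
    """Illustrative error carrying an HTTP status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def is_transient_error(exc):
    # Stand-in mirroring the PR's retryable status list.
    return getattr(exc, "status", None) in {429, 500, 502, 503, 504}

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except FakeHTTPError as exc:
            # Permanent errors (e.g. 401) and exhausted budgets re-raise.
            if attempt == max_retries or not is_transient_error(exc):
                raise
```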
- EmbeddingConfig: new `max_retries` field (default 3) for retry configuration
- EmbeddingConfig._create_embedder(): injects `max_retries` into `params["config"]`
- EmbedderBase.__init__(): reads `max_retries` from the config dict
- OpenAI: disable SDK retry (`max_retries=0`), wrap embed/embed_batch
- Volcengine: replace `exponential_backoff_retry` with `transient_retry`, drop `is_429_error`
- VikingDB: add `transient_retry` (previously had no retry at all)
- Gemini: disable SDK `HttpRetryOptions` (attempts=1), wrap embed/embed_batch
- MiniMax: disable urllib3 `Retry` (total=0), wrap embed/embed_batch
- Jina: disable SDK retry (`max_retries=0`), wrap embed/embed_batch
- Voyage: disable SDK retry (`max_retries=0`), wrap embed/embed_batch
- LiteLLM: wrap the `litellm.embedding()` calls

All providers now use a single `transient_retry` with `is_transient_error` for error classification. The wrapper is placed INSIDE the method, around the raw API call, BEFORE the try/except that converts errors to RuntimeError.
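The wrapper placement rule can be sketched like this: retry surrounds only the raw API call, so the outer try/except still converts exhausted failures into RuntimeError. The `Embedder` class and stand-in decorator below are assumptions for illustration, not the repository's code:

```python
import random
import time
from functools import wraps

def transient_retry(max_retries=3, base_delay=0.0):
    """Stand-in for the PR's unified retry decorator (count-based)."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:  # transient in this sketch
                    if attempt == max_retries:
                        raise
                    time.sleep(random.uniform(0, base_delay * 2 ** attempt))
        return wrapper
    return deco

class Embedder:
    """Illustrative provider following the placement described above."""

    def __init__(self, api_call, max_retries=3):
        self._api_call = api_call
        self.max_retries = max_retries

    def embed(self, text):
        # Retry wraps ONLY the raw API call, inside the method...
        @transient_retry(self.max_retries)
        def _raw():
            return self._api_call(text)
        # ...BEFORE the try/except that normalizes errors to RuntimeError.
        try:
            return _raw()
        except Exception as exc:
            raise RuntimeError(f"embedding failed: {exc}") from exc
```

Placing retry outside the try/except would be wrong: the conversion to RuntimeError would mask the transient error class and defeat classification.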
- test_embedding_retry_integration: OpenAI and VikingDB retry on transient vs. permanent errors
- test_retry_config: VLMConfig and EmbeddingConfig `max_retries` fields and defaults
- test_backward_compat: `exponential_backoff_retry` importable, signature unchanged, time-based
Summary
Closes #922
Unifies retry behavior across all VLM and embedding paths with a shared transient retry module, replacing scattered per-backend implementations.
Changes
- `openviking/models/retry.py`: `is_transient_error()`, `transient_retry()`, `transient_retry_async()` with error classification (28 error types), exponential backoff with jitter, count-based retry
- VLM: remove the `max_retries` param from `get_completion_async()`. Retry is now config-driven via `vlm.max_retries` (default 3, was 2). Add retry to `get_vision_completion_async()` (was zero retry) and the sync methods
- Embedding: new `embedding.max_retries` config (default 3). Apply unified retry to all 8 providers (OpenAI, Volcengine, VikingDB, Gemini, MiniMax, Jina, Voyage, LiteLLM)
- Fix the `tool_choice`/`messages` positional bug in `vlm_config.py`
- Backward compat: `exponential_backoff_retry()` unchanged; `max_retries=0` disables retry
Config
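The two config knobs might look like this. The field names `vlm.max_retries` and `embedding.max_retries` and their defaults come from the PR; the dataclass shapes and the other fields are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class VLMConfig:
    """Sketch of the VLM config; only max_retries is from the PR."""
    model: str = "example-vlm"       # placeholder field
    max_retries: int = 3             # was 2; 0 disables retry entirely

@dataclass
class EmbeddingConfig:
    """Sketch of the embedding config; only max_retries is from the PR."""
    model: str = "example-embedder"  # placeholder field
    max_retries: int = 3             # new field, injected into params["config"]
```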
Breaking Changes
`max_retries` parameter removed from `VLMBase.get_completion_async()`, `StructuredVLM.complete_json_async()`, `StructuredVLM.complete_model_async()`, `VLMConfig.get_completion_async()`
Error Classification
Retryable: HTTP 429/500/502/503/504, `TooManyRequests`, `RateLimit`, `RequestBurstTooFast`, `ConnectionError`, `TimeoutError`, `openai.RateLimitError`
Not retryable: HTTP 400/401/403/404/422, `InvalidRequestError`, `AuthenticationError`, unknown errors (conservative default)
Test Plan
- `exponential_backoff_retry` unchanged