feat(semantic): add exponential backoff retry for LLM rate limiting#889

Closed
mvanhorn wants to merge 1 commit into volcengine:main from mvanhorn:osc/350-feat-semantic-processor-llm-backoff
Conversation

@mvanhorn
Contributor

Description

Add retry with exponential backoff for LLM calls in the semantic processor. When rate-limited (429/TooManyRequests/RequestBurstTooFast), calls now retry up to 3 times with increasing delays (0.5s, 1s, 2s + jitter) instead of failing permanently.

Related Issue

Relates to #350

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Added _llm_with_retry() helper to SemanticProcessor class
  • Replaced 4 bare vlm.get_completion_async() calls with retrying wrapper
  • Added import random for jitter calculation

Why this matters

Users who batch-index large directories hit RequestBurstTooFast 429 errors from LLM providers during summarization. @sponge225 reported in #350 that 281 Markdown files (4,435 sections) consistently trigger rate limiting with Doubao 2.0, causing partial indexing failures.

The embedding path already has exponential_backoff_retry in volcengine_embedders.py, but the LLM summarization path in semantic_processor.py had no retry logic at all. A prior attempt to add it (PR #568) was closed because the contributor didn't sign the CLA, not because of technical rejection.

The retry helper detects rate limit errors by checking for "429", "TooManyRequests", "RateLimit", and "RequestBurstTooFast" in the exception string, matching the pattern used in volcengine_embedders.py:is_429_error(). On persistent failure, it returns an empty string for graceful degradation (the file gets indexed without a summary rather than crashing the pipeline).
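As a rough illustration of the behavior described above (not the actual diff), the helper might look like the sketch below. The function and constant names here are hypothetical; the real implementation lives in SemanticProcessor and wraps vlm.get_completion_async():

```python
import asyncio
import random

# Substrings matched against str(exc), mirroring volcengine_embedders.py:is_429_error()
RETRYABLE_MARKERS = ("429", "TooManyRequests", "RateLimit", "RequestBurstTooFast")


def _is_rate_limit_error(exc: Exception) -> bool:
    """String-based detection of provider rate-limit errors."""
    text = str(exc)
    return any(marker in text for marker in RETRYABLE_MARKERS)


async def llm_with_retry(call, *args, max_retries: int = 3, base_delay: float = 0.5):
    """Retry an async LLM call on rate-limit errors with exponential backoff + jitter.

    Delays are base_delay * 2**attempt (0.5s, 1s, 2s by default) plus a small
    random jitter. On persistent rate limiting, returns "" so the caller can
    index the file without a summary instead of crashing the pipeline.
    """
    for attempt in range(max_retries + 1):
        try:
            return await call(*args)
        except Exception as exc:
            if not _is_rate_limit_error(exc):
                raise  # non-rate-limit errors propagate unchanged
            if attempt == max_retries:
                return ""  # graceful degradation after exhausting retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

Non-retryable exceptions still propagate, so only rate limiting is absorbed.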

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Post-build dogfooding skipped (score 6/10). Tested via code review and ruff linting only. Full OpenViking install requires system dependencies not available in this environment.

This contribution was developed with AI assistance (Claude Code).

Wrap LLM calls in semantic_processor.py with retry logic that handles
429/TooManyRequests/RequestBurstTooFast errors with exponential backoff
and jitter. Previously, a single rate limit error during batch ingestion
caused permanent failure. Now retries up to 3 times with delays of
0.5s, 1s, 2s before giving up gracefully.

Addresses the ingestion pain reported in volcengine#350 where 4435 sections
trigger RequestBurstTooFast with Doubao 2.0.

Relates to volcengine#350

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Failed to generate code suggestions for PR

@qin-ctx
Collaborator

qin-ctx commented Mar 24, 2026

Thanks for the PR. The problem it targets is real, and the retry behavior here is useful, especially for handling rate-limited semantic indexing workloads.

We’re going to close this PR for now, not because the approach is wrong, but because we want to address retry at a lower level instead of adding more business-layer wrappers like _llm_with_retry() in individual modules.

Our follow-up plan is to unify retry handling across both VLM and embedding paths:

  • implement retry policy in the provider/backend layer
  • allow callers to opt in with explicit retry parameters, while falling back to config defaults when not provided
  • add config-driven retry settings for things like max_retries, backoff, jitter, and retryable error classes
  • make sure all relevant call paths use the same mechanism consistently

That said, we do want to borrow from this PR’s design, especially:

  • detection of rate-limit errors such as 429, TooManyRequests, RateLimit, and RequestBurstTooFast
  • exponential backoff with jitter
  • graceful degradation on persistent failure instead of taking down the whole pipeline
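For concreteness, the config-driven direction sketched in the plan above could take a shape like this. Everything here is hypothetical (the names RetryPolicy and resolve_policy are not from the codebase); it only illustrates "explicit caller parameters win, config defaults fill the gaps":

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Retry settings shared by VLM and embedding backends."""
    max_retries: int = 3
    base_delay: float = 0.5
    jitter: float = 0.1
    retryable_markers: tuple = (
        "429", "TooManyRequests", "RateLimit", "RequestBurstTooFast",
    )


# In practice this would be loaded from the project config file.
DEFAULT_POLICY = RetryPolicy()


def resolve_policy(max_retries=None, base_delay=None,
                   config: RetryPolicy = DEFAULT_POLICY) -> RetryPolicy:
    """Merge explicit caller overrides with config defaults.

    A caller that passes max_retries=5 gets 5; one that passes nothing
    inherits whatever the config specifies.
    """
    return RetryPolicy(
        max_retries=max_retries if max_retries is not None else config.max_retries,
        base_delay=base_delay if base_delay is not None else config.base_delay,
        jitter=config.jitter,
        retryable_markers=config.retryable_markers,
    )
```

The provider/backend layer would then consult the resolved policy inside a single shared retry loop, so individual modules never need their own `_llm_with_retry()` wrappers.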

Thanks again for the contribution. We’ll use this as input for the broader retry refactor.
