feat(connectors): retry transient Doris Stream Load failures in-request #3574

Open
ryankert01 wants to merge 1 commit into apache:master from ryankert01:feat/doris-sink-in-request-retry

@ryankert01

Member

Which issue does this PR address?

Relates to #3215

Rationale

Follow-up to the Doris sink (#3215). The connector already classified transient Stream Load outcomes as retryable but had no retry path, so under the runtime's at-most-once delivery a transient backend blip silently dropped the batch.

What changed?

The Doris sink classified transient Stream Load outcomes (5xx/408/429, transport errors, Publish Timeout) as CannotStoreData but never retried them: the runtime commits the consumer offset at poll time before consume() runs and discards its return value, so a transient failure dropped the batch with no replay.
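
As a rough illustration of the classification described above, here is a minimal sketch. The `LoadOutcome` enum and `classify` function are hypothetical names for this example, not the connector's actual types (the connector reports transient failures as CannotStoreData). Treating any body status not named above as permanent is an assumption of the sketch.

```rust
/// Hypothetical outcome type used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoadOutcome {
    Success,
    Transient,
    Permanent,
}

/// Classify one Stream Load attempt.
///
/// `http_status` is `None` when the request failed at the transport level.
/// `body_status` is the `Status` field of the response body, or `None` when
/// the body could not be read.
pub fn classify(http_status: Option<u16>, body_status: Option<&str>) -> LoadOutcome {
    match http_status {
        // Transport error: no response at all, worth retrying.
        None => LoadOutcome::Transient,
        // 408, 429 and every 5xx are treated as transient.
        Some(408) | Some(429) | Some(500..=599) => LoadOutcome::Transient,
        // A 2xx still has to be confirmed by the body.
        Some(200..=299) => match body_status {
            Some("Success") => LoadOutcome::Success,
            Some("Publish Timeout") => LoadOutcome::Transient,
            // Unreadable body: the load may or may not have landed.
            None => LoadOutcome::Transient,
            // "Fail" and anything else (assumption) is permanent.
            Some(_) => LoadOutcome::Permanent,
        },
        // Remaining 4xx, redirects, and other codes are permanent.
        Some(_) => LoadOutcome::Permanent,
    }
}
```

Keeping the unreadable-body case transient is what makes the deterministic label matter: the retry may repeat a load that already landed, and Doris is expected to dedupe it.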

consume() now retries a transiently-failed batch in-request via a new load_batch() that wraps send plus status classification, re-PUTing under the same deterministic label so Doris dedupes a prior attempt that actually landed (e.g. a 2xx whose body could not be read). Permanent failures (4xx, Fail, schema/redirect problems) are never retried. Backoff and jitter come from iggy_connector_sdk::retry, bounded by new max_retries/retry_delay/max_retry_delay config (defaults 3 / 200ms / 5s). This shrinks the at-most-once window within a single poll; cross-poll and crash delivery stay a runtime concern.
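
The retry behavior can be sketched roughly as follows. This is a std-only illustration, not the code in this PR: it does not use iggy_connector_sdk::retry, it omits jitter, and `Attempt`, `RetryConfig`, `backoff_delay`, and `load_with_retry` are hypothetical names. It also assumes max_retries counts retries after the first attempt (so 3 means up to 4 sends) and that the delay doubles per retry up to max_retry_delay.

```rust
use std::time::Duration;

/// Hypothetical result of a single send plus classification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Attempt {
    Landed,
    Transient,
    Permanent,
}

/// Hypothetical config mirroring the three new settings.
pub struct RetryConfig {
    pub max_retries: u32,
    pub retry_delay: Duration,
    pub max_retry_delay: Duration,
}

impl Default for RetryConfig {
    fn default() -> Self {
        // Defaults stated in the PR description: 3 / 200ms / 5s.
        Self {
            max_retries: 3,
            retry_delay: Duration::from_millis(200),
            max_retry_delay: Duration::from_secs(5),
        }
    }
}

/// Delay before the given 0-based retry: retry_delay * 2^retry, capped.
/// (Assumed doubling schedule; the SDK's actual schedule and jitter may differ.)
pub fn backoff_delay(cfg: &RetryConfig, retry: u32) -> Duration {
    let factor = 2u32.saturating_pow(retry);
    cfg.retry_delay.saturating_mul(factor).min(cfg.max_retry_delay)
}

/// Send a batch, retrying only transient outcomes and always reusing `label`.
/// Returns the final outcome and the total number of sends.
pub fn load_with_retry<S, W>(
    cfg: &RetryConfig,
    label: &str,
    mut send: S,
    mut wait: W,
) -> (Attempt, u32)
where
    S: FnMut(&str) -> Attempt,
    W: FnMut(Duration),
{
    let mut attempts: u32 = 0;
    loop {
        // Same label on every attempt so the backend can dedupe.
        let outcome = send(label);
        attempts += 1;
        let retries_used = attempts - 1;
        if outcome != Attempt::Transient || retries_used >= cfg.max_retries {
            return (outcome, attempts);
        }
        wait(backoff_delay(cfg, retries_used));
    }
}
```

Passing the wait step in as a closure keeps the sketch testable without real sleeps; a caller could pass `std::thread::sleep` or an async equivalent instead.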

Local Execution

  • Passed
  • Pre-commit hooks: checks run manually. The license-headers hook cannot execute on this machine (its script needs bash 4+ mapfile; only bash 3.2 is present), but hawkeye check passes directly and CI enforces it. markdownlint, taplo, cargo fmt, cargo clippy -p iggy_connector_doris_sink --all-targets -- -D warnings, and cargo test -p iggy_connector_doris_sink (41 tests) all pass.

AI Usage

  1. Claude Code (Anthropic).
  2. Implemented the retry loop, config plumbing, tests, and README/config updates, after verifying the runtime's at-most-once delivery semantics directly in the runtime source.
  3. Three new wiremock unit tests pin the behavior: transient-then-success (retry fires), exhausted-budget (exact attempt count via .expect), and permanent-not-retried. Full crate suite, clippy, and doc lint pass locally.
  4. Yes.

The Doris sink classified transient Stream Load outcomes (5xx/408/429,
transport errors, Publish Timeout) as retryable but never acted on them: the
runtime commits the consumer offset at poll time before consume() runs and
discards its return value, so a transient backend blip silently dropped the
batch under at-most-once delivery.

consume() now retries a transiently-failed batch in-request, re-PUTing under
the same deterministic label so Doris dedupes a prior attempt that actually
landed (e.g. a 2xx whose body could not be read). Permanent failures are never
retried. Backoff and jitter come from iggy_connector_sdk::retry, bounded by new
max_retries/retry_delay/max_retry_delay config (defaults 3/200ms/5s). This
shrinks the at-most-once window within a single poll; cross-poll and crash
delivery remain a runtime concern, not something a sink can fix.

Relates to apache#3215.
@github-actions


Thanks for the PR. It is labeled S-waiting-on-review and queued for review.

Slash commands (own line, regular comment) move it around the queue:

  • /ready - back to S-waiting-on-review after addressing feedback
  • /author - flip to S-waiting-on-author while you finish changes
  • /request-review @user-or-team - request a reviewer

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the S-waiting-on-review (PR is waiting on a reviewer) label Jun 27, 2026
@codecov

codecov Bot commented Jun 27, 2026


Codecov Report

❌ Patch coverage is 93.10345% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.37%. Comparing base (307fdb1) to head (549e171).

Files with missing lines                      Patch %   Lines
core/connectors/sinks/doris_sink/src/lib.rs   93.10%    5 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3574       +/-   ##
=============================================
- Coverage     74.07%   46.37%   -27.70%     
  Complexity      937      937               
=============================================
  Files          1249     1246        -3     
  Lines        128248   111821    -16427     
  Branches     104116    87689    -16427     
=============================================
- Hits          94994    51853    -43141     
- Misses        30219    57300    +27081     
+ Partials       3035     2668      -367     
Components    Coverage Δ
Rust Core     39.10% <93.10%> (-35.61%) ⬇️
Java SDK      62.44% <ø> (ø)
C# SDK        72.06% <ø> (ø)
Python SDK    88.88% <ø> (ø)
PHP SDK       84.29% <ø> (ø)
Node SDK      91.35% <ø> (ø)
Go SDK        40.14% <ø> (ø)

Files with missing lines                      Coverage Δ
core/connectors/sinks/doris_sink/src/lib.rs   93.12% <93.10%> (+0.77%) ⬆️

... and 352 files with indirect coverage changes

