feat(sdk): support per-request timeout to TCP/QUIC/WebSocket clients by chengxilo · Pull Request #3429 · apache/iggy

chengxilo · 2026-06-06T02:06:06Z

Which issue does this PR address?

Rationale

Clients had no deadline on individual request, so a stalled server could block a caller forever.

What changed?

send_raw would be blocked forever if server doesn't response. I added a request_timeout field (IggyDuration, default 300s) to TcpClientConfig, QuicClientConfig, and WebSocketClientConfig. Each client's send_raw now wraps its I/O in tokio::time::timeout

Local Execution

Passed
Pre-commit hooks ran

AI Usage

Claude Opus 4.6
implementation
manually reviewed
Yes

Change Log

By default, Iggy Rust SDK would have a 30s timeout for each request.

github-actions · 2026-06-06T02:06:19Z

Thanks for the PR. It is labeled S-waiting-on-review and queued for review.

Slash commands (own line, regular comment) move it around the queue:

/ready - back to S-waiting-on-review after addressing feedback
/author - flip to S-waiting-on-author while you finish changes
/request-review @user-or-team - request a reviewer

See CONTRIBUTING.md for details.

codecov · 2026-06-06T02:07:59Z

Codecov Report

❌ Patch coverage is 72.31834% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.44%. Comparing base (74d62eb) to head (1c827e8).
⚠️ Report is 12 commits behind head on master.

Files with missing lines	Patch %	Lines
core/sdk/src/quic/quic_client.rs	57.89%	13 Missing and 3 partials ⚠️
core/sdk/src/websocket/websocket_client.rs	75.00%	10 Missing and 5 partials ⚠️
core/sdk/src/client_provider.rs	65.78%	5 Missing and 8 partials ⚠️
core/sdk/src/clients/client_builder.rs	0.00%	12 Missing ⚠️
core/sdk/src/tcp/tcp_client.rs	82.25%	4 Missing and 7 partials ⚠️
...tion/quic_config/quic_connection_string_options.rs	69.23%	2 Missing and 2 partials ⚠️
...ration/tcp_config/tcp_connection_string_options.rs	69.23%	2 Missing and 2 partials ⚠️
core/common/src/types/args/mod.rs	25.00%	1 Missing and 2 partials ⚠️
...cket_config/websocket_connection_string_options.rs	71.42%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #3429      +/-   ##
============================================
- Coverage     74.41%   72.44%   -1.98%     
  Complexity      937      937              
============================================
  Files          1243     1243              
  Lines        125987   121394    -4593     
  Branches     101854    97305    -4549     
============================================
- Hits          93756    87942    -5814     
- Misses        29218    30167     +949     
- Partials       3013     3285     +272

Components	Coverage Δ
Rust Core	`72.76% <72.31%> (-2.41%)`	⬇️
Java SDK	`62.44% <ø> (ø)`
C# SDK	`71.41% <ø> (-0.70%)`	⬇️
Python SDK	`88.88% <ø> (ø)`
PHP SDK	`84.29% <ø> (ø)`
Node SDK	`91.13% <ø> (ø)`
Go SDK	`40.14% <ø> (ø)`

Files with missing lines	Coverage Δ
core/common/src/error/iggy_error.rs	`100.00% <ø> (ø)`
...es/configuration/quic_config/quic_client_config.rs	`100.00% <100.00%> (ø)`
...guration/quic_config/quic_client_config_builder.rs	`36.60% <100.00%> (+7.60%)`	⬆️
...ypes/configuration/tcp_config/tcp_client_config.rs	`100.00% <100.00%> (ø)`
...figuration/tcp_config/tcp_client_config_builder.rs	`71.55% <100.00%> (+3.51%)`	⬆️
...ration/websocket_config/websocket_client_config.rs	`62.35% <100.00%> (+0.90%)`	⬆️
...ebsocket_config/websocket_client_config_builder.rs	`37.96% <100.00%> (+7.75%)`	⬆️
...cket_config/websocket_connection_string_options.rs	`58.51% <71.42%> (+0.70%)`	⬆️
core/common/src/types/args/mod.rs	`45.51% <25.00%> (-0.54%)`	⬇️
...tion/quic_config/quic_connection_string_options.rs	`69.07% <69.23%> (+0.01%)`	⬆️
... and 6 more

... and 121 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

chengxilo · 2026-06-17T01:10:04Z

/ready

chengxilo · 2026-06-21T04:29:29Z

/author

chengxilo · 2026-06-21T04:29:44Z

After merging there is something need to be changed.

chengxilo · 2026-06-21T08:55:55Z

/ready

hubcio

a few findings that don't map onto changed lines:

the consumer poll loop error classifier (consumer.rs) only treats Disconnected | Unauthenticated | StaleClient as connection errors, so a RequestTimeout (and NotConnected) falls through as a raw error with can_poll left true and no backoff, on every transport. this is the real root of the quic no-recovery behavior noted in the quic comment. consumer.rs isn't touched by this PR so flagging it here.
the PR description says default 300s but the code is 30s everywhere (a later commit reduced it). update the description.
no test for the is_zero() no-timeout branch, nor for the vsr session-reset-on-timeout path.
the timeout is rust-sdk only; java/.net/python/go/c++/node don't get it - worth filing parity issues later on

hubcio · 2026-06-23T07:00:57Z

+            if request_timeout.is_zero() {
+                io.await
+            } else {
+                match tokio::time::timeout(request_timeout.get_duration(), io).await {


the timeout wraps the whole io future including stream.lock().await, so the lock-acquire wait counts against the request budget. on master the read deadline was set after lock+write+flush, timing only the response read. since the heartbeat ping shares this same per-transport stream mutex, a request queued behind a stalled in-flight one can hit RequestTimeout on lock-wait alone and then tear down the shared connection, cascading to every queued request. acquire the stream lock outside the timed region and time only the io. same in quic_client.rs and websocket_client.rs.

hubcio · 2026-06-23T07:00:57Z

        })?;

-        if matches!(result, Err(IggyError::Disconnected)) {
+        if matches!(


this teardown nulls the stream and sets Disconnected directly, bypassing disconnect(), so publish_event(DiagnosticEvent::Disconnected) never fires on a timeout - diagnostic subscribers miss every timeout disconnect. routing the timeout teardown through disconnect() would emit the event from one shared path. same gap on websocket (set_state in the timeout arm) and quic (no set_state at all).

hubcio · 2026-06-23T07:00:57Z

+                    Ok(result) => result,
+                    Err(_) => {
+                        // Reset to prevent response desync on the shared stream.
+                        *stream.lock().await = None;


this inner *stream.lock().await = None is redundant with the outer self.stream.lock().await.take() that already runs for RequestTimeout(_). only the consensus_session reset is unique to this block. dropping the inner null and letting the single outer teardown handle stream + state removes the split-lock window and the duplicate work.

hubcio · 2026-06-23T07:00:57Z

+                        *stream.lock().await = None;
+                        #[cfg(feature = "vsr")]
+                        {
+                            *consensus_session


resetting consensus_session mints a fresh client_id. replicated metadata ops are deduped server-side on (client_id, session, request), so a mutation retried after a timed-out-but-committed op runs under the new client_id, misses dedup, and can double-apply. send_messages/partition writes are unaffected (deduped by message id). preserve the client_id across the timeout-reconnect, or suppress transparent retry of replicated mutations after a session reset. vsr-gated so not a blocker for the non-vsr build, but worth fixing before vsr ships.

hubcio · 2026-06-23T07:00:57Z

+            if request_timeout.is_zero() {
+                io.await
+            } else {
+                match tokio::time::timeout(request_timeout.get_duration(), io).await {


unlike tcp/ws, the quic timeout path resets consensus_session (vsr) but never calls set_state(Disconnected) or closes the connection - state stays Connected. RequestTimeout is also absent from the quic retry list, so a quic client never auto-reconnects after a timeout (tcp/ws recover because their next call sees NotConnected). under a persistent server stall the consumer just re-opens open_bi() and re-times-out every request_timeout, surfacing the error each cycle without recovering. fix: set_state(Disconnected) here too, or add RequestTimeout to the retry list.

hubcio · 2026-06-23T07:00:57Z

+                    request_timeout: parse_duration(&args.request_timeout)?,
                }));
            }
            TransportProtocol::Http => {


the http arm builds HttpClientConfig without request_timeout and sets no reqwest timeout, so --request-timeout is silently dropped when --transport http is used, while it applies to the other three transports. wire it into the reqwest client or reject the combination explicitly.

hubcio · 2026-06-23T07:00:57Z

    #[serde(skip_serializing_if = "Option::is_none")]
    pub websocket_reconnection_interval: Option<String>,
+
+    /// The optional per-request timeout for send/receive operations


worth a note here that request_timeout values "0", "unlimited", "disabled", "none" all map to no timeout (infinite wait), since the is_zero() fast-path skips the timeout entirely. "0" reading as infinite rather than instant is the opposite of what most people expect.

github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 6, 2026

chengxilo force-pushed the client-timeout branch from d6c5688 to 6d48a9b Compare June 6, 2026 06:56

hubcio requested changes Jun 9, 2026

View reviewed changes

github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 9, 2026

chengxilo force-pushed the client-timeout branch from 8b9a947 to 005b84e Compare June 15, 2026 03:20

chengxilo requested a review from hubcio June 17, 2026 01:08

github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 17, 2026

chengxilo added 8 commits June 20, 2026 18:04

feat(sdk): support per-request timeout to TCP/QUIC/WebSocket clients

80d4abc

chore(sdk): reduce request timeout from 300s to 30s

f6da496

fix: reset stream and VSR session on request timeout

35166e4

fix(sdk): avoid instant fail when request_timeout is zero

dd32824

fix(sdk): fix the hard-coded request_timeout in connection string

9ef1a20

fix(sdk): return error instead of panicking on invalid duration args

e847d7d

test: test the duration args validation

8378bd8

cargo fmt

b08a9a2

chengxilo force-pushed the client-timeout branch from 3c5bfab to b08a9a2 Compare June 20, 2026 22:22

github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 21, 2026

chengxilo added 2 commits June 21, 2026 03:14

fix(sdk): use request_timeout instead of inner VSR timeout

ee779fc

fix(sdk): change client to Disconnected state on RequestTimeout

06588b9

github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 21, 2026

Merge branch 'master' into client-timeout

66e9b09

hubcio requested changes Jun 23, 2026

View reviewed changes

github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 23, 2026

Merge branch 'master' into client-timeout

1c827e8

Uh oh!

Conversation

chengxilo commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR address?

Rationale

What changed?

Local Execution

AI Usage

Change Log

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

codecov Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chengxilo commented Jun 17, 2026

Uh oh!

chengxilo commented Jun 21, 2026

Uh oh!

chengxilo commented Jun 21, 2026

Uh oh!

chengxilo commented Jun 21, 2026

Uh oh!

hubcio left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chengxilo commented Jun 6, 2026 •

edited

Loading

codecov Bot commented Jun 6, 2026 •

edited

Loading

hubcio left a comment •

edited

Loading