test(client_server): isolate parallel tests via passive servers + dir… by JustinKovacich · Pull Request #84 · luminartech/simple_someip

JustinKovacich · 2026-04-24T20:45:34Z

…ect subscriber registration

The integration tests in tests/client_server.rs failed intermittently under cargo test's default parallel execution — 6 of 11 tests flaked with the exact subset varying run-to-run. Each test passed cleanly in isolation, which pointed at a shared-resource problem rather than a real concurrency bug inside the tests.

Root cause is structural, not application-level:

The SOME/IP Service Discovery port 30490 is spec-fixed, and the production SD binding uses SO_REUSEADDR + SO_REUSEPORT so that multiple endpoints on the same host can coexist.
Under cargo test, every Server::new and every Client discovery socket joins the same SO_REUSEPORT group at 0.0.0.0:30490. The Linux kernel hash-distributes each incoming SD datagram to exactly one socket in the group. A Subscribe sent by Test-A's Client can therefore be delivered to Test-B's Server, which filters it out by service-id and drops it. Test-A then times out waiting for a subscriber that the kernel routed elsewhere.
Unique per-test service_ids alone do not help — they let the wrong server discard the packet cleanly, but the right server still never sees it.

The permanent fix has three parts, all test-side:

Every Server is now built via Server::new_passive. Passive servers bind SD to an ephemeral port instead of 30490 and are therefore not in the SO_REUSEPORT group. This eliminates the cross-test SD delivery lottery entirely. The crate already documents this as the supported pattern for consumers who own their own SD dispatcher (see Server::new_passive rustdoc).
Subscriber bookkeeping is done directly via EventPublisher::register_subscriber, bypassing the SD handshake. A new bind_and_register_client helper binds the client's unicast socket on a test-specific port (via add_endpoint + a throwaway send_to_service, which exercises the unicast-bind side effect without touching the discovery socket) and then registers the bound address on the publisher. No test client joins the 30490 group anymore except the two tests whose assertion is specifically about SD lifecycle.
A recv_next_unicast helper wraps updates.recv() for tests that do still bind discovery (the two SD-lifecycle tests), filtering out any DiscoveryUpdated cross-talk that leaks from other parallel tests' SD traffic. The filter is cheap and deterministic, and keeps the SD-exercising tests meaningful.

Secondary hygiene: each test now carries a unique SERVICE_ID const (0x5B01..0x5B0A) and a unique CLIENT_PORT const. Service-id uniqueness is defense-in-depth against any residual cross-talk; unicast-port uniqueness is required because UnicastSocket sets only SO_REUSEADDR (not SO_REUSEPORT) and colliding binds would fail.

Result: 10 consecutive clean parallel runs of
cargo test --all-features --test client_server
with 11/11 passing. Runtime also drops from ~2.3s (6-failure flake) to ~0.15s because passive servers skip the SD-group bind round-trip and tests no longer wait on wait_for_subscribers. Full suite (lib 409, integration 11, doc 3) all green.

Tests now passing reliably in parallel:

test_add_endpoint_and_send_to_service
test_client_request_resolves_via_unicast_reply
test_client_server_subscribe_and_receive_event
test_e2e_protect_on_publish_and_check_on_receive
test_multiple_subscribers_receive_events
test_subscribe_auto_binds_discovery
(the other five tests also updated for consistency; they were passing before but only by luck)

…ect subscriber registration The integration tests in tests/client_server.rs failed intermittently under `cargo test`'s default parallel execution — 6 of 11 tests flaked with the exact subset varying run-to-run. Each test passed cleanly in isolation, which pointed at a shared-resource problem rather than a real concurrency bug inside the tests. Root cause is structural, not application-level: * The SOME/IP Service Discovery port 30490 is spec-fixed, and the production SD binding uses `SO_REUSEADDR` + `SO_REUSEPORT` so that multiple endpoints on the same host can coexist. * Under `cargo test`, every `Server::new` and every Client discovery socket joins the same `SO_REUSEPORT` group at `0.0.0.0:30490`. The Linux kernel hash-distributes each incoming SD datagram to exactly one socket in the group. A `Subscribe` sent by Test-A's Client can therefore be delivered to Test-B's Server, which filters it out by service-id and drops it. Test-A then times out waiting for a subscriber that the kernel routed elsewhere. * Unique per-test `service_id`s alone do not help — they let the *wrong* server discard the packet cleanly, but the *right* server still never sees it. The permanent fix has three parts, all test-side: 1. Every Server is now built via `Server::new_passive`. Passive servers bind SD to an ephemeral port instead of 30490 and are therefore not in the `SO_REUSEPORT` group. This eliminates the cross-test SD delivery lottery entirely. The crate already documents this as the supported pattern for consumers who own their own SD dispatcher (see `Server::new_passive` rustdoc). 2. Subscriber bookkeeping is done directly via `EventPublisher::register_subscriber`, bypassing the SD handshake. A new `bind_and_register_client` helper binds the client's unicast socket on a test-specific port (via `add_endpoint` + a throwaway `send_to_service`, which exercises the unicast-bind side effect *without* touching the discovery socket) and then registers the bound address on the publisher. No test client joins the 30490 group anymore except the two tests whose assertion is specifically about SD lifecycle. 3. A `recv_next_unicast` helper wraps `updates.recv()` for tests that do still bind discovery (the two SD-lifecycle tests), filtering out any `DiscoveryUpdated` cross-talk that leaks from other parallel tests' SD traffic. The filter is cheap and deterministic, and keeps the SD-exercising tests meaningful. Secondary hygiene: each test now carries a unique `SERVICE_ID` const (0x5B01..0x5B0A) and a unique `CLIENT_PORT` const. Service-id uniqueness is defense-in-depth against any residual cross-talk; unicast-port uniqueness is required because `UnicastSocket` sets only `SO_REUSEADDR` (not `SO_REUSEPORT`) and colliding `bind`s would fail. Result: 10 consecutive clean parallel runs of `cargo test --all-features --test client_server` with 11/11 passing. Runtime also drops from ~2.3s (6-failure flake) to ~0.15s because passive servers skip the SD-group bind round-trip and tests no longer wait on `wait_for_subscribers`. Full suite (lib 409, integration 11, doc 3) all green. Tests now passing reliably in parallel: * test_add_endpoint_and_send_to_service * test_client_request_resolves_via_unicast_reply * test_client_server_subscribe_and_receive_event * test_e2e_protect_on_publish_and_check_on_receive * test_multiple_subscribers_receive_events * test_subscribe_auto_binds_discovery * (the other five tests also updated for consistency; they were passing before but only by luck) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov-commenter · 2026-04-24T20:46:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@@            Coverage Diff             @@
##             main      #84      +/-   ##
==========================================
+ Coverage   96.55%   96.68%   +0.12%     
==========================================
  Files          30       30              
  Lines        7993     7993              
==========================================
+ Hits         7718     7728      +10     
+ Misses        275      265      -10

see 3 files with indirect coverage changes

Copilot

Pull request overview

Stabilizes tests/client_server.rs under cargo test parallel execution by avoiding shared SOME/IP-SD port (30490) contention and reducing SD cross-talk between concurrently running tests.

Changes:

Switches integration tests to construct servers via Server::new_passive (ephemeral SD port) to avoid SO_REUSEPORT SD datagram hash distribution across tests.
Adds helpers to bind client unicast deterministically and register subscribers directly (bind_and_register_client), bypassing the SD subscribe handshake in most tests.
Adds recv_next_unicast helper to filter out stray DiscoveryUpdated events in tests that still bind discovery.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-24T20:48:17Z

+    client
+        .add_endpoint(service_id, instance_id, server_addr, client_port)
+        .await
+        .expect("add_endpoint failed");
+    let dummy = Message::<RawPayload>::new_sd(0x0001, &empty_sd_header());
+    let _ = client.send_to_service(service_id, instance_id, dummy).await;
+    let client_addr = SocketAddrV4::new(Ipv4Addr::LOCALHOST, client_port);
+    publisher
+        .register_subscriber(service_id, instance_id, event_group_id, client_addr)
+        .await;


bind_and_register_client ignores the result of the dummy send_to_service call. If unicast binding fails (e.g., port unexpectedly in use) or the UDP send fails, the helper will still register the subscriber and the test will later time out in a much less actionable way. Capture the returned PendingResponse and assert the call succeeded (then drop it) so failures surface immediately and ensure the unicast socket is actually bound before registering the subscriber.

Copilot · 2026-04-24T20:48:17Z

+/// (e.g. tests whose assertion is about the SD lifecycle itself). The
+/// outer `Result` is the timeout outcome; the inner `Option` is the
+/// stream state. Returns `None` if the stream closes.


The doc comment for recv_next_unicast says the function returns an outer Result for the timeout outcome, but the implementation uses tokio::time::timeout(...).await.expect(...) and therefore panics on timeout instead of returning it. Please update the docstring to match the actual behavior, or change the function signature to return a Result<Option<ClientUpdate<_>>, Elapsed> and let callers decide how to handle timeouts.

Suggested change

/// (e.g. tests whose assertion is about the SD lifecycle itself). The

/// outer `Result` is the timeout outcome; the inner `Option` is the

/// stream state. Returns `None` if the stream closes.

/// (e.g. tests whose assertion is about the SD lifecycle itself).

/// Returns `None` if the stream closes, and panics if `timeout`

/// elapses before a non-`DiscoveryUpdated` update arrives.

Copilot · 2026-04-24T20:48:18Z

+//! Test-A's Client can land on Test-C's Server socket, Test-C filters
+//! it out by service-id, and Test-A's Server never registers the
+//! subscriber — the test then times out waiting for a subscriber that
+//! was delivered to the wrong process.


The module-level docs describe packets being delivered to the wrong "process", but these integration tests run within a single test process (multiple threads/tasks). It would be more accurate to describe delivery to the wrong socket/server instance (or "test") to avoid confusion when diagnosing SD routing behavior under cargo test parallelism.

Suggested change

//! was delivered to the wrong process.

//! was delivered to the wrong server socket/test instance.

Copilot AI review requested due to automatic review settings April 24, 2026 20:45

Copilot started reviewing on behalf of JustinKovacich April 24, 2026 20:45 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(client_server): isolate parallel tests via passive servers + dir…#84

test(client_server): isolate parallel tests via passive servers + dir…#84
JustinKovacich wants to merge 1 commit into
mainfrom
fix/client_server_test_fixture_collision

JustinKovacich commented Apr 24, 2026

Uh oh!

codecov-commenter commented Apr 24, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	//! was delivered to the wrong process.
	//! was delivered to the wrong server socket/test instance.

Conversation

JustinKovacich commented Apr 24, 2026

Uh oh!

codecov-commenter commented Apr 24, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants