
feat(cli): Add standalone --server and --client modes #110

Open
sberg-rh wants to merge 25 commits into main from feat/standalone-client-server

Conversation


@sberg-rh sberg-rh commented Mar 26, 2026

Add the ability to run the benchmark server and client as independent processes, enabling cross-environment IPC testing (e.g., host and container).

  • Add --server flag to start a standalone server that listens for client connections using the specified IPC mechanism
  • Add --client flag to connect to a running server and execute the benchmark workload with retry logic (100ms backoff, 30s timeout)
  • Promote --socket-path, --shared-memory-name, and --message-queue-name from hidden internal flags to user-facing under "Standalone Mode"
  • Support both async (Tokio) and blocking (std) execution modes
  • Default transport endpoints work without extra flags for simple usage
  • Add 11 tests for CLI flag parsing and transport config building
  • Update examples and doctest to include new Args fields

Relates to #11
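The retry behavior described above (100ms backoff, 30s timeout) can be sketched against a raw TcpStream; this is illustrative only — the actual client retries through the transport abstraction, and the constant names here are assumptions:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Illustrative retry constants matching the behavior described above.
const CONNECT_RETRY_BACKOFF: Duration = Duration::from_millis(100);
const CONNECT_RETRY_TIMEOUT: Duration = Duration::from_secs(30);

// Retry connecting until success or until the timeout deadline passes.
fn connect_with_retry(addr: SocketAddr) -> std::io::Result<TcpStream> {
    let deadline = Instant::now() + CONNECT_RETRY_TIMEOUT;
    loop {
        match TcpStream::connect(addr) {
            Ok(stream) => return Ok(stream),
            // Out of time: surface the last connect error.
            Err(e) if Instant::now() >= deadline => return Err(e),
            // Otherwise back off and try again.
            Err(_) => std::thread::sleep(CONNECT_RETRY_BACKOFF),
        }
    }
}
```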


@sberg-rh sberg-rh self-assigned this Mar 26, 2026
@github-actions

This comment was marked as outdated.

@redhat-performance redhat-performance deleted a comment from github-actions Bot Mar 26, 2026
Collaborator

@dustinblack dustinblack left a comment


Good first step toward #11. The scope is right — standalone server/client as independent processes without touching transport internals. A few things to think about as this develops:

  1. No integration with existing metrics/reporting. The standalone client does its own ad-hoc stats via info! logging (mean/P50/P95/P99) rather than using ResultsManager/MetricsCollector. This means no streaming output, no JSON/CSV, no integration with the reporting pipeline or dashboard. Fine for an initial draft, but worth planning how to close that gap.

  2. Blocking/async code duplication. The blocking and async server/client paths are nearly identical except for await. Consider whether a shared structure or helper could reduce the duplication — the current pattern will be a maintenance burden as features are added.

  3. Two server modes. --server (new, user-facing) and --internal-run-as-server (existing, hidden) now coexist. The distinction is clear in code but worth documenting so users don't stumble into the internal flag, and so future contributors understand when to use which.

  4. Duration mode not supported. The standalone client only supports message-count mode (-i). If someone passes -d, it will silently fall back to the default message count. Should either support it or reject it with a clear error.

Overall the approach is clean and well-tested. Nice to see 5 integration tests covering round-trip, one-way, ping/pong, retry, and shutdown.


@mcurrier2 mcurrier2 self-requested a review March 31, 2026 17:11
@mcurrier2
Collaborator

Empty message count panic — If -i 0 or similar produces zero messages, the round-trip summary will panic: the percentile computation indexes into an empty vector (latencies[msg_count / 2]), and the mean divides by a zero message count.
Add explicit validation for msg_count >= 1 in count mode, and robust percentile guards for tiny sample sets.

main.rs:

```rust
latencies.sort();
let total: std::time::Duration = latencies.iter().sum();
let mean = total / msg_count as u32;
let p50 = latencies[msg_count / 2];
let p95 = latencies[(msg_count as f64 * 0.95) as usize];
let p99 = latencies[(msg_count as f64 * 0.99) as usize];
```
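A hedged sketch of the requested guards — an illustrative helper, not the project's code: reject empty runs up front and clamp percentile indices so tiny sample sets cannot index out of bounds.

```rust
use std::time::Duration;

// Return the p-th percentile of a sorted latency slice, or None for an
// empty run (avoiding the empty-vector panic described above).
fn percentile(latencies: &[Duration], p: f64) -> Option<Duration> {
    if latencies.is_empty() {
        return None;
    }
    // Clamp the computed index to the last element so tiny sample sets
    // (e.g. 1-2 samples) never index past the end.
    let idx = ((latencies.len() as f64 * p) as usize).min(latencies.len() - 1);
    Some(latencies[idx])
}
```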

Ignored config options — concurrency, send_delay, include_first_message, percentiles, streaming/output_file are all silently ignored. Should either be supported or rejected.

No shutdown message — Client relies on transport close for server to detect disconnect. The internal server path uses explicit MessageType::Shutdown which is more deterministic.

main.rs:

```rust
transport.close_blocking()?;
info!("Standalone client finished.");
```

Duration mode is not honored in standalone client paths:
BenchmarkConfig supports duration by setting msg_count = None, but standalone client code falls back to default count.
main.rs:

```rust
let msg_count = config
    .msg_count
    .unwrap_or(ipc_benchmark::defaults::MSG_COUNT);
```

When -d 5s is passed, BenchmarkConfig::from_args sets msg_count = None (duration takes precedence). The standalone client then falls back to the default count via unwrap_or. The config.duration field is never read anywhere in the standalone paths. So --client -d 5s -m tcp --blocking silently runs the default message count instead of a 5-second timed test.
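One way to honor duration mode, sketched with illustrative names (the real config fields and send paths differ): branch on whether a duration is set, and only fall back to the message count when it is not.

```rust
use std::time::{Duration, Instant};

// Run until the duration deadline when set; otherwise send msg_count
// messages. `send_one` stands in for the transport send call.
// Returns the number of messages actually sent.
fn run_count_or_duration(
    duration: Option<Duration>,
    msg_count: usize,
    mut send_one: impl FnMut(u64),
) -> u64 {
    match duration {
        Some(d) => {
            // Duration mode: loop until the deadline passes.
            let deadline = Instant::now() + d;
            let mut sent = 0u64;
            while Instant::now() < deadline {
                send_one(sent);
                sent += 1;
            }
            sent
        }
        None => {
            // Count mode: fixed number of messages.
            for i in 0..msg_count as u64 {
                send_one(i);
            }
            msg_count as u64
        }
    }
}
```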

Heap allocation per message in measurement loop
main.rs:

```rust
for i in 0..msg_count {
    let msg = Message::new(i as u64, payload.clone(), MessageType::OneWay);
    transport.send_blocking(&msg)?;
}
```

payload.clone() allocates a new Vec on every iteration. Per the project's own guidelines ("No allocations in measurement loops"), this adds allocation overhead to every measurement sample. The same pattern appears in the round-trip loop and both async variants. This will skew latency numbers, particularly for small messages where allocation cost is proportionally large.
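A sketch of the reuse pattern that avoids the per-iteration allocation — the Message type here is a minimal stand-in, not the crate's: allocate once, then update only the id in place inside the loop.

```rust
// Illustrative stand-in for the crate's Message type.
struct Message {
    id: u64,
    payload: Vec<u8>,
}

// Send msg_count messages while reusing a single Message value, so the
// measurement loop performs no heap allocation. `send` stands in for
// the transport send call.
fn send_all(msg_count: u64, payload: Vec<u8>, mut send: impl FnMut(&Message)) {
    let mut msg = Message { id: 0, payload }; // single allocation up front
    for i in 0..msg_count {
        msg.id = i; // in-place update; no clone inside the loop
        send(&msg);
    }
}
```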

Standalone client bypasses existing reporting pipeline (ResultsManager/streaming JSON/CSV), unlike established benchmark flows

Test coverage claim mismatch: this branch diff does not add standalone-specific CLI/integration tests; new flags/paths need explicit coverage.

  • --server / --client flag parsing
  • --server + --client conflict handling
  • build_standalone_transport_config()
  • Endpoint override defaults (socket_path, shared_memory_name, message_queue_name)
  • Duration mode in standalone client
  • connect_blocking_with_retry / connect_async_with_retry
  • Shutdown and retry-timeout behavior, including the shutdown message from the client

When I asked to compare against the C2C branch, this PR is still lacking:

  • Server-side latency file support
  • Integration with the full benchmark/results pipeline
  • The --cross-container SHM auto-detection present in C2C


@sberg-rh sberg-rh force-pushed the feat/standalone-client-server branch from f382afc to 93bfac7 Compare April 7, 2026 19:00
Author

sberg-rh commented Apr 9, 2026

Normal vs Standalone Benchmark Comparison

Manual comparison of benchmark results between normal mode (main branch) and standalone client/server mode (this PR). All tests run locally on the same machine with 1000 warmup iterations.

TCP Round-Trip (10000 msgs, 1024 bytes)

| Metric | Normal (main) | Standalone | Diff |
|------------|---------------|---------------|------|
| Mean | 18.93 us | 16.92 us | -11% |
| P95 | 33.70 us | 19.71 us | -41% |
| P99 | 51.81 us | 22.25 us | -57% |
| Min | 10.92 us | 11.75 us | +8% |
| Max | 80.58 us | 37.33 us | -54% |
| Throughput | 50,933 msg/s | 59,550 msg/s | +17% |

UDS Round-Trip (10000 msgs, 1024 bytes)

| Metric | Normal (main) | Standalone | Diff |
|------------|----------------|----------------|------|
| Mean | 8.52 us | 9.56 us | +12% |
| P95 | 10.54 us | 11.75 us | +11% |
| P99 | 11.67 us | 13.38 us | +15% |
| Min | 5.04 us | 5.00 us | -1% |
| Max | 53.83 us | 23.79 us | -56% |
| Throughput | 111,590 msg/s | 104,240 msg/s | -7% |

TCP Large Payload Round-Trip (5000 msgs, 8192 bytes)

| Metric | Normal (main) | Standalone | Diff |
|------------|----------------|----------------|------|
| Mean | 28.24 us | 25.69 us | -9% |
| P95 | 51.81 us | 31.21 us | -40% |
| P99 | 61.82 us | 36.06 us | -42% |
| Throughput | 274,100 msg/s | 315,530 msg/s | +15% |

TCP One-Way (10000 msgs, 1024 bytes)

| Metric | Normal (main) | Standalone |
|------------|----------------|----------------|
| Throughput | 126,180 msg/s | 196,060 msg/s |

Note: Standalone one-way reports throughput only on client side; server reports latency separately. Normal mode measures server-side latency via latency file.

Summary

  • Mean latencies are within ~10-15% of each other, which is expected given different process spawning models
  • Standalone mode shows better tail latency (P95/P99) in most cases, likely due to message struct reuse (no per-message heap allocation)
  • Throughput is comparable or better in standalone mode
  • Results confirm standalone mode is a valid alternative for benchmarking within reasonable tolerance

Author

sberg-rh commented Apr 9, 2026

PR Summary

This PR adds standalone --server and --client modes to enable running the benchmark server and client as independent processes, supporting cross-environment IPC testing (container-to-container, container-to-host). Relates to #11.

Features

  • --server / --client CLI flags with mutual exclusivity, under a "Standalone Mode" help heading
  • Multi-accept server for TCP and UDS (one handler thread per client connection)
  • Concurrency support (-c N) with automatic SHM/PMQ fallback to concurrency=1
  • Connection retry with 100ms backoff and 30s timeout
  • Duration (-d) and message-count (-i) modes
  • send_delay, include_first_message, and percentiles options supported
  • Explicit MessageType::Shutdown for deterministic server exit
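The SHM/PMQ concurrency fallback above can be sketched roughly as follows. The Mechanism enum and its variants are assumptions for illustration; the real helper is the effective_concurrency() mentioned later in this PR.

```rust
// Illustrative transport mechanism enum; the crate's actual type differs.
enum Mechanism {
    Tcp,
    Uds,
    Shm,
    Pmq,
}

// Shared-memory and POSIX message queue transports cannot multiplex
// independent client connections, so concurrency is forced to 1.
fn effective_concurrency(mechanism: &Mechanism, requested: usize) -> usize {
    match mechanism {
        Mechanism::Shm | Mechanism::Pmq => 1,
        Mechanism::Tcp | Mechanism::Uds => requested.max(1),
    }
}
```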

Reporting

  • Full ResultsManager/MetricsCollector integration: JSON output (-o), streaming CSV/JSON, and console summary with HDR histogram percentiles
  • Server-side one-way latency measurement using shared monotonic clock (accurate for same-host and container scenarios)
  • Per-message heap allocation eliminated in measurement loops

Transport

  • BlockingTcpSocket::from_stream() and BlockingUnixDomainSocket::from_stream() for per-client handler threads in multi-accept servers
  • Fixed non-blocking stream inheritance from listeners in both blocking and async multi-accept paths

Code Quality

  • Shared helpers: dispatch_server_message(), effective_concurrency(), handle_client_connection(), aggregate_and_print_server_metrics(), print_server_one_way_latency()
  • SERVER_ACCEPT_GRACE_PERIOD and CONNECT_RETRY_* constants
  • 40 binary tests covering: round-trip, one-way, ping/pong, duration mode, concurrency, retry, shutdown, large payload integrity, canary filtering, mixed message types, from_stream, metrics aggregation, and async multi-accept
  • OS-assigned test ports via get_free_port() to prevent conflicts
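The OS-assigned port pattern behind get_free_port() is typically implemented by binding to port 0; a minimal sketch (illustrative — the crate's helper may differ):

```rust
use std::net::TcpListener;

// Binding to port 0 asks the OS to pick any free port; reading back the
// local address reveals which port was assigned.
fn get_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}
```

Note the port is released when the listener is dropped, so there is a small race window before the test rebinds it; in practice this is acceptable for avoiding fixed-port conflicts in CI.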

Benchmark Comparison

Normal mode vs standalone mode results are within ~10-15% mean latency tolerance, with standalone often showing better tail latency (see comparison tables in PR comments).

Deferred Items

Two maintainability concerns were raised during review: the standalone logic adding ~2800 lines to src/main.rs (now ~3900 total), and blocking/async client code duplication. These are structural issues, not correctness concerns — the feature works and is well-tested. Extracting into a src/standalone.rs module and reducing client duplication (potentially using the spawn_blocking + shared handler pattern from the server side) would improve navigability, but we felt it was better to land the functional feature and iterate on structure separately rather than gate the PR on a large refactor. Open to discussion if the reviewers prefer to address these before merge.

@sberg-rh sberg-rh marked this pull request as ready for review April 9, 2026 16:56
@sberg-rh sberg-rh requested a review from ewchong April 9, 2026 16:56

Collaborator

@dustinblack dustinblack left a comment


Good progress on the implementation — CI is green, the benchmark comparison shows comparable performance, and the integration tests for blocking TCP paths are solid.

However, Matt's review from Mar 31 raised several correctness issues that still appear unaddressed. These need to be resolved before this can be approved:

  1. Panic on -i 0: latencies[msg_count / 2] will panic on an empty vector. Need either input validation rejecting msg_count < 1 or a guard before the percentile computation.

  2. Duration mode silently ignored: --client -d 5s falls back to the default message count via unwrap_or. The config.duration field is never read in the standalone paths. This should either be supported or rejected with a clear error.

  3. payload.clone() in measurement loop: Allocates a new Vec<u8> on every iteration. For a benchmarking tool, this adds measurable overhead, especially for small messages. The existing benchmark runner avoids this — standalone should too.

  4. Silently ignored config options: concurrency, send_delay, include_first_message, percentiles, streaming-output-*, and output-file are all accepted but do nothing. These should either be wired up or rejected so users aren't misled by silent no-ops.

  5. No shutdown message: Client relies on transport close for the server to detect disconnect. The internal server path uses explicit MessageType::Shutdown which is more reliable. Worth aligning these.

  6. Coverage at 7.3%: Almost all standalone code is uncovered. The integration tests exercise the transport layer directly but don't go through run_standalone_server/run_standalone_client. Consider whether the tests added actually cover the code paths being introduced.

@sberg-rh
Author

Thanks for the review! I want to make sure we're looking at the same code — these items were all addressed in commits after mcurrier2's original feedback on March 31. Could you confirm you're reviewing the latest state of the branch (HEAD at c6f99c72)?

Here's where each item was addressed:

| # | Concern | Status | Commit |
|---|---------|--------|--------|
| 1 | Panic on -i 0 | Fixed — percentile computation now uses MetricsCollector (HDR histogram); old manual percentile code removed | fd6bb272 (ResultsManager integration) |
| 2 | Duration mode silently ignored | Fixed — config.duration read and used in all standalone client paths (blocking/async, single/concurrent) | 3c9ed88 |
| 3 | payload.clone() in loop | Fixed — Message reused with in-place id/timestamp updates; only initial creation and warmup canaries clone | 80377f8b |
| 4 | Silently ignored config options | Fixed — concurrency, send_delay, include_first_message, percentiles, streaming-output-*, output-file all wired up | Various commits |
| 5 | No shutdown message | Fixed — explicit MessageType::Shutdown sent before close in all client paths | 642e6a9a |
| 6 | Coverage at 7.3% | Known limitation — tarpaulin doesn't instrument the binary crate (main.rs); 40 tests exist and pass but the coverage tool can't measure them. main.rs shows ~16% coverage across all PRs in this repo for the same reason. | — |

Happy to walk through any of these in more detail if needed.

Collaborator

@dustinblack dustinblack left a comment


First — apologies for the previous review. I reviewed against stale code and repeated issues that had already been addressed. That's on me. This review is against the current HEAD (c6f99c72).

The PR has come a long way — duration mode, concurrency, metrics integration, message reuse, shutdown messages, and streaming output are all properly wired up. 36 tests pass. Here are the remaining items:


High: Code duplication

The blocking/async client and server paths are near-identical copies — roughly 8 variants of similar measurement/server loops. Any future bug fix needs replication across all of them. This also ties into the coverage issue below.

High: Coverage and code placement

The 7.3% changed-line coverage is not purely a tarpaulin limitation — it's because ~800 lines of new logic live in main.rs (binary crate) rather than in the library. handle_client_connection, dispatch_server_message, effective_concurrency, build_standalone_transport_config, aggregate_and_print, and the measurement loops could all live in lib.rs or a new module exposed through it. This would:

  • Let tarpaulin instrument the code, making the 36 existing tests show up in coverage reports
  • Enable integration tests in tests/ to exercise the standalone logic
  • Reduce the blocking/async duplication by sharing testable core logic

Given that development on this project is heavily AI-assisted, measurable test coverage is especially important as a verification mechanism. I'd like to see the core logic moved to the library crate.


Medium items

  1. Server one-way latency includes deserialization. get_monotonic_time_ns() is called after receive_blocking() returns, so bincode deserialization time is included in the measurement. For large payloads this is non-trivial.

  2. Concurrent mode grace period bug. The 2-second SERVER_ACCEPT_GRACE_PERIOD could expire between the one-way and round-trip test phases, causing the server to exit mid-test when both tests are enabled with concurrency > 1.

  3. --quiet flag not honored. Standalone paths always init tracing to stderr.

  4. Mutex unwrap() in multi-threaded handlers. If a handler thread panics while holding the metrics lock, all other threads cascade-panic on the poisoned mutex.

  5. --shm-direct auto-blocking not applied. The automatic enabling of --blocking for --shm-direct happens after the standalone dispatch branch, so --client --shm-direct without an explicit --blocking incorrectly takes the async path.
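For the poisoned-mutex concern (medium item 4), the standard recovery pattern is to take the inner data from the PoisonError instead of unwrap()-ing; a minimal standalone sketch with an illustrative helper name:

```rust
use std::sync::{Arc, Mutex};

// If another handler thread panicked while holding the lock, the mutex
// is poisoned. Recover the inner data instead of cascade-panicking so
// surviving threads can still record their metrics.
fn push_metric(metrics: &Arc<Mutex<Vec<u64>>>, value: u64) {
    let mut guard = metrics
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.push(value);
}
```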

Low items

  1. from_stream() bypasses socket buffer tuning that normal start_server_blocking applies — could cause measurement differences.

  2. Integer division drops remainder messages in concurrent mode (msg_count / concurrency).

  3. Response messages always have empty payloads — asymmetric round-trip. Worth documenting whether this is intentional.
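The integer-division concern (low item 2) could be addressed with a remainder-preserving split; a hedged sketch with an illustrative helper name:

```rust
// Split msg_count across `concurrency` workers, giving the last worker
// any remainder so no messages are silently dropped.
fn split_messages(msg_count: usize, concurrency: usize) -> Vec<usize> {
    let base = msg_count / concurrency;
    let rem = msg_count % concurrency;
    (0..concurrency)
        .map(|i| if i == concurrency - 1 { base + rem } else { base })
        .collect()
}
```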


Add the ability to run the benchmark server and client as independent
processes, enabling cross-environment IPC testing (e.g., host and
container). Relates to #11.

Standalone mode features:
- --server flag starts a server that listens for client connections
- --client flag connects to a running server with retry logic
  (100ms backoff, 30s timeout)
- Both async (Tokio) and blocking (std) execution modes supported
- Duration (-d) and message-count (-i) modes both supported
- Default transport endpoints work without extra flags
- Endpoint flags (--socket-path, --shared-memory-name,
  --message-queue-name) promoted to user-facing

Reporting integration:
- Full ResultsManager/MetricsCollector integration for structured
  output (JSON, streaming CSV, console summary with HDR percentiles)
- Server-side one-way latency measurement using monotonic clock
  (accurate for same-host and container scenarios)
- Round-trip latency with per-message streaming support

Code quality:
- Shared helpers: dispatch_server_message(), retry constants
- 25 tests covering CLI parsing, transport config, server dispatch,
  connection retry, shutdown, duration mode, one-way, round-trip
- Explicit MessageType::Shutdown on client disconnect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sberg-rh and others added 8 commits April 14, 2026 13:44
- test_standalone_blocking_tcp_one_way: verify server received exact
  message count with correct sequential IDs, add shutdown message
- test_standalone_blocking_tcp_duration_round_trip: verify response
  IDs match requests, assert count > 10 for 200ms test, add shutdown
- test_standalone_blocking_tcp_duration_one_way: verify server received
  exact count with sequential IDs, assert count > 10 for 200ms test
- test_concurrency_forced_to_one_for_shm: test actual concurrency
  forcing logic instead of just CLI parsing
- test_standalone_concurrent_tcp_one_way: assert exact message count
  per handler instead of just "greater than zero"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Clean up garbled doc comment on async concurrent test (editing
  artifacts from multiple rewrites)
- Replace silent panic swallowing in async multi-accept servers:
  try_join_next().transpose() silently dropped JoinErrors from
  panicked handler tasks. Now logs warnings via warn!().
- Extract effective_concurrency() helper to deduplicate the
  concurrency-forcing logic (was copied in blocking client, async
  client, and test). Test now calls the actual helper instead of
  reimplementing the logic inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_standalone_large_payload_integrity: 4KB payloads with
  recognizable byte pattern, server echoes back, client verifies
  content byte-for-byte to catch corruption
- test_handle_client_connection_filters_canary: verifies warmup
  canary messages (id=u64::MAX) are excluded from one-way metrics
- test_handle_client_connection_mixed_message_types: interleaved
  OneWay and Request messages on a single connection, verifies
  correct metrics recording and response dispatch
- test_aggregate_and_print_multiple_collectors: aggregation across
  2 collectors with different latency distributions
- test_effective_concurrency_all_mechanisms: covers UDS, PMQ, SHM,
  TCP, and concurrency=1 edge case

Total binary tests: 40.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accepted TCP/UDS streams inherit non-blocking mode from the
listener (set for the accept poll loop). The handler threads need
blocking mode for the transport's read_exact/write_all operations.

This is the blocking-server equivalent of the async into_std fix
in commit 8723429. Without this fix, standalone server handlers
immediately disconnect from clients.

Applies to both run_standalone_server_blocking_multi_accept_tcp
and _uds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
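The fix this commit describes can be sketched over a std TcpListener poll loop. This is illustrative (helper name and poll interval are assumptions): the key line restores blocking mode on the accepted stream, which otherwise inherits non-blocking mode from the listener.

```rust
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

// Non-blocking accept poll loop that returns the first accepted
// connection. Accepted streams inherit non-blocking mode from the
// listener, so they must be switched back to blocking before the
// handler's read_exact/write_all style I/O.
fn accept_one(listener: &TcpListener) -> std::io::Result<TcpStream> {
    listener.set_nonblocking(true)?;
    loop {
        match listener.accept() {
            Ok((stream, _addr)) => {
                // The fix: restore blocking mode on the accepted stream.
                stream.set_nonblocking(false)?;
                return Ok(stream);
            }
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {
                // No pending connection yet; sleep for the poll interval.
                std::thread::sleep(Duration::from_millis(5));
            }
            Err(e) => return Err(e),
        }
    }
}
```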
- Fix grace period bug: reset timer on every new connection, not just
  the first. Prevents premature server exit between one-way and
  round-trip test phases when using concurrency > 1. Applied to all
  four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set,
  suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner())
  instead of unwrap() on mutex locks. If a handler thread panics while
  holding the lock, other threads can still push their metrics instead
  of cascade-panicking.
- Add defensive --shm-direct guard in standalone server and client:
  returns error if --shm-direct is used without --blocking. This is
  normally enforced by main() but the guard protects against future
  refactoring that might change the dispatch order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix grace period bug: reset timer on every new connection, not just
  the first. Prevents premature server exit between one-way and
  round-trip test phases when using concurrency > 1. Applied to all
  four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set,
  suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner())
  instead of unwrap() on mutex locks in handler threads.
- Add defensive --shm-direct guard in standalone server and client.
- Add socket buffer tuning (recv/send buffer sizes) to multi-accept
  TCP servers to match normal transport behavior.
- Fix integer division remainder: last worker now receives any extra
  messages when msg_count is not evenly divisible by concurrency.
- Document empty response payloads as intentional design matching
  existing benchmark runner behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add receive_blocking_timed() to the BlockingTransport trait that
captures a monotonic timestamp after raw bytes are read but before
bincode deserialization. This excludes deserialization overhead from
one-way latency measurements.

- Add default implementation on BlockingTransport trait (backward
  compatible -- captures timestamp after full receive)
- Override in TCP, UDS, and SHM blocking transports to place
  timestamp between raw I/O read and deserialization
- SHM-direct uses default (no bincode deserialization to exclude)
- Update handle_client_connection and standalone single-client
  server to use receive_blocking_timed

Impact is most visible with large payloads where deserialization
is non-trivial. 64KB one-way TCP test shows min latency dropped
from 41us (post-deserialize) to 14us (pre-deserialize), a ~27us
improvement representing the bincode deserialization time excluded
from measurement. Mean dropped 28% (73us to 52us) and P99 dropped
14% (132us to 113us).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
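The trait-default pattern this commit describes can be sketched as follows. The Message and transport types here are illustrative stand-ins: the default implementation stamps after the full receive (backward compatible), and transports can override it to stamp between raw I/O and deserialization.

```rust
use std::time::Instant;

// Illustrative stand-in for the crate's message type.
struct Message(u64);

trait BlockingTransport {
    fn receive_blocking(&mut self) -> std::io::Result<Message>;

    // Default: timestamp captured after the complete receive, which
    // includes deserialization time. Overriders move the timestamp
    // earlier, between the raw read and deserialization.
    fn receive_blocking_timed(&mut self) -> std::io::Result<(Message, Instant)> {
        let msg = self.receive_blocking()?;
        Ok((msg, Instant::now()))
    }
}

// Minimal implementor relying on the default timed method.
struct FakeTransport;

impl BlockingTransport for FakeTransport {
    fn receive_blocking(&mut self) -> std::io::Result<Message> {
        Ok(Message(1))
    }
}
```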
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter
to MessageLatencyRecord::new(). Update all 4 call sites in standalone
client to capture wall-clock timestamp at send time and pass it as
the 6th argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sberg-rh sberg-rh force-pushed the feat/standalone-client-server branch from f320d06 to 3e17727 Compare April 14, 2026 18:50

sberg-rh and others added 2 commits April 15, 2026 11:28
Move ~3100 lines of standalone client/server code from the binary
crate (main.rs) into two new library modules, following the existing
flat-file convention (benchmark.rs/benchmark_blocking.rs pattern).

Structure:
- standalone_server.rs (1982 lines): constants, shared helpers,
  server dispatch, multi-accept TCP/UDS, async server paths
- standalone_client.rs (1146 lines): retry helpers, client dispatch,
  single/concurrent blocking and async paths
- main.rs reduced from ~4200 to ~1120 lines (thin dispatch layer)

Additional changes:
- Promote logging.rs from binary-private to library-public module
- Move set_affinity() to utils.rs as pub function
- All standalone functions now pub for tarpaulin coverage measurement
  and integration test access

No behavioral changes. All 374 tests pass. Benchmark comparison
across 3 runs confirms no performance regression (mean latencies
within 2-5% run-to-run variance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Socket configuration failures (set_nonblocking, set_nodelay) in
  multi-accept servers now log a warning and skip the bad connection
  instead of crashing the entire server with ?
- Thread join panics in blocking multi-accept servers now logged
  with warn! instead of silently dropped with let _ =
- Streaming latency record failures in client now logged with
  debug! instead of silently swallowed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Collaborator

@dustinblack dustinblack left a comment


Thanks for the quick turnaround, Shawn — I can see the grace period, quiet flag, mutex safety, shm-direct guard, socket buffer tuning, and remainder message fixes all made it into the current code. Well done.

One remaining issue: coverage. Now that the standalone logic lives in library modules, tarpaulin can instrument it — so the 1.29% (standalone_client.rs) and 11.59% (standalone_server.rs) coverage isn't a tarpaulin limitation anymore. The helper functions are well-tested (build_standalone_transport_config, dispatch_server_message, connect_blocking_with_retry, handle_client_connection, aggregate_and_print_server_metrics, effective_concurrency), but none of the orchestration functions are called from tests:

  • run_standalone_server_blocking / _single / _multi_accept_tcp / _multi_accept_uds
  • run_standalone_client_blocking_single / _concurrent
  • The async variants of all of the above

These contain the warmup logic, duration mode, metrics finalization, results output, concurrent worker spawning, and the grace period accept loop. Adding a few integration tests that call these entry points end-to-end (e.g., spawn run_standalone_server_blocking_single in a thread, run run_standalone_client_blocking_single against it with a small message count) should bring coverage up substantially and would give us confidence in the orchestration code — especially important given the AI-assisted development workflow on this project.

One minor note: response messages always use empty payloads (Vec::new()) regardless of request size, making round-trip measurements asymmetric. If that's intentional, a brief comment would help.

Server tests (standalone_server.rs, 82.7% coverage):
- test_multi_accept_tcp_server_direct: exercises multi-accept TCP directly
- test_single_server_direct: exercises blocking single-client directly
- test_server_blocking_dispatch: exercises dispatch logic
- test_server_blocking_dispatch_uds: exercises UDS dispatch branch
- test_async_multi_accept_tcp_full: exercises async multi-accept TCP
- test_async_single_server_path: exercises async single-client
- test_async_single_server_one_way_metrics: async one-way metrics
- test_async_multi_accept_uds_full: exercises async UDS multi-accept
- test_multi_accept_server_with_delayed_client: slow sender resilience
- test_multi_accept_server_duration_one_way: duration mode with multi-accept
- test_async_multi_accept_server_duration_one_way: async duration mode
- test_handle_client_connection_send_failure: client disconnect error path
- test_single_server_client_disconnect: single server send error path
- test_multi_accept_server_survives_bad_client: garbage input resilience
- test_handle_client_connection_garbage_input: deserialization error path
- test_run_standalone_server_full_dispatch: full entry point dispatch
- test_run_standalone_server_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_server_rejects_shm_direct: shm-direct guard
- test_run_standalone_server_verbose: -vv logging level branches
- test_aggregate_server_metrics_from_handlers: real handler data
- test_print_server_one_way_latency_with_data/zero: print paths

Client tests (standalone_client.rs, 86.3% coverage):
- test_client_blocking_tcp_round_trip/one_way: single client paths
- test_client_blocking_tcp_duration_round_trip/one_way: duration mode
- test_client_blocking_tcp_concurrent_round_trip/one_way: concurrent
- test_client_async_single_round_trip/one_way: async single
- test_client_async_duration_round_trip/one_way: async duration
- test_client_async_concurrent_round_trip/one_way: async concurrent
- test_client_blocking_with_send_delay: send_delay round-trip branch
- test_client_blocking_one_way_with_send_delay: send_delay one-way branch
- test_client_blocking_with_streaming_output: JSON streaming
- test_client_blocking_combined_streaming: combined mode streaming
- test_client_blocking_csv_streaming: CSV streaming
- test_client_blocking_concurrent_duration_one_way: concurrent duration
- test_client_async_concurrent_duration_one_way: async concurrent duration
- test_run_standalone_client_full_dispatch: full entry point dispatch
- test_run_standalone_client_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_client_rejects_shm_direct: shm-direct guard
- test_connect_async_with_retry_succeeds: async retry path

Also: changed tracing .init() to .try_init() with eprintln fallback
in both server and client for test compatibility.

Coverage: standalone_server 82.7%, standalone_client 86.3%, combined 84.8%
Total lib tests: 355.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

This comment was marked as outdated.

Collaborator

@dustinblack dustinblack left a comment

Coverage looks great — 83.44% changed-line coverage, with standalone_client.rs at 86% and standalone_server.rs at 84%. Nice work addressing all the review items.

Code-wise this is ready. Before I approve, I want to validate with hands-on testing:

  1. Existing benchmark scenarios (non-standalone) still work as expected — regression check
  2. Standalone --server / --client mode on the same host
  3. Standalone mode across container boundaries (container-to-host, container-to-container)

I'll follow up once testing is complete.

Reduce the non-blocking accept loop sleep from 50ms to 5ms in both
TCP and UDS multi-accept servers. This cuts connection acceptance
latency by 10x with no portability concerns.

Discovered during hands-on validation testing of standalone concurrent
mode, where the 50ms polling interval was the primary contributor to
elevated tail latency under multi-client workloads.

Improvement with -c 4 concurrent clients:
- RT P95: -46% (65.9us -> 35.5us)
- RT P99: -49% (91.4us -> 46.9us)
- Throughput: +66% (94.9 -> 157.1 MB/s)

Single-client workloads also benefit from faster initial connection
acceptance (P99 improved 4-7% across all test modes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sberg-rh sberg-rh force-pushed the feat/standalone-client-server branch from 01e7e96 to 15f9c92 Compare April 16, 2026 19:25
@sberg-rh
Author

@dustinblack just a heads-up: I found a performance issue during some hands-on testing and pushed a small fix for it, in case you're doing your own hands-on testing :)

@github-actions

This comment was marked as outdated.

@sberg-rh
Author

I did some validation testing of my own before moving on to the more in-depth customer testing; here are my results:

PR #110 Validation Testing Summary

Scenario 1: Regression Check (PR #110 vs Main)

Method: 100 paired runs, TCP Async/UDS Async/TCP Blocking, 5,000 messages each, alternating branch order.

| Test | main | PR #110 | Delta | 95% CI | Statistically significant? |
|------|------|---------|-------|--------|----------------------------|
| TCP Async RT Mean | 23.20 us | 23.48 us | +1.2% | [+0.16, +0.39] us | Detectable but negligible (0.3 us) |
| TCP Async RT P99 | 32.07 us | 32.26 us | +0.6% | [+0.04, +0.34] us | Detectable but negligible |
| UDS Async RT Mean | 12.69 us | 12.73 us | +0.3% | [-0.11, +0.20] us | No |
| UDS Async RT P99 | 17.95 us | 17.92 us | -0.2% | [-0.22, +0.16] us | No |
| TCP Blocking RT Mean | 17.69 us | 17.69 us | -0.0% | [-0.12, +0.11] us | No |
| TCP Blocking RT P99 | 24.12 us | 24.06 us | -0.2% | [-0.44, +0.32] us | No |

Verdict: No actionable regression. Outlier rates are comparable across branches (42 vs 37 mean, 21 vs 25 P99).


Scenario 2: Standalone Same-Host

Functional tests: All 6 pass (TCP/UDS blocking, TCP async, duration mode, concurrency, output files).

Standalone c=1 vs Internal c=1 (25 runs):

| Metric | Internal | Standalone | Delta |
|--------|----------|------------|-------|
| RT Mean | 17.34 us | 16.82 us | -3.0% (standalone faster) |
| RT P99 | 23.92 us | 22.23 us | -7.1% (standalone faster) |

Standalone won 24/25 runs.

Standalone is measurably faster than internal mode due to avoiding child process spawn and pipe coordination overhead.

Concurrency note: Internal mode's -c flag is a no-op (workers run sequentially in both async and blocking paths). Standalone mode is the first real concurrency implementation in the codebase.

Accept loop optimization: Discovered and fixed a performance bottleneck -- reduced multi-accept polling from 50ms to 5ms. Impact with -c 4: RT P99 improved 49%, throughput improved 66%. Committed as 15f9c92a.


Scenario 3: Cross-Container (Podman)

Functional tests: All 3 pass.

| Test | Topology | RT Mean | RT P99 |
|------|----------|---------|--------|
| 3a | Host server, container client | 164.43 us | 225.79 us |
| 3b | Container server, host client | 155.14 us | 208.90 us |
| 3c | Container-to-container | 40.68 us | 50.53 us |

Container-to-container is ~4x faster than cross-VM tests because both containers share the Podman VM's network stack.

Parity tests:

| Test | Comparison | Mean Delta | Interpretation |
|------|------------|------------|----------------|
| P1 | Container vs host (internal mode) | +108.7% | Expected -- macOS Podman VM overhead |
| P2 | Container standalone vs host standalone | +145.9% | Consistent with P1 + network bridge hop |
| P3 | PR #110 container vs main container | +0.6% | No regression between branches |

The ~2x container overhead is a macOS-specific artifact from the Hypervisor VM layer. On Linux, container overhead is near-zero since containers share the host kernel's network stack.


Overall Conclusion

  • No performance regressions from PR #110 across any scenario
  • Standalone mode is slightly faster than internal mode for equivalent workloads
  • Cross-container communication works correctly across all topologies
  • One optimization shipped: Accept loop polling interval reduced from 50ms to 5ms, significantly improving concurrent client tail latency

@mcurrier2
Collaborator

PR #110 Review — New Issues Found

Branch: feat/standalone-client-server (HEAD: 15f9c92)
Reviewer: Matt Currier (AI-assisted)
Date: 2026-04-22


1. Infinite server hang when no client connects

Severity: Major
Files: src/standalone_server.rs lines 354–407, 466–507, 707–775,
819–877

In all four multi-accept paths (blocking TCP/UDS, async TCP/UDS), the
server uses a grace period that only starts when first_client_time is
set — which requires a successful accept(). If no client ever
connects, the grace timer never starts, and the loop never exits.

// standalone_server.rs:354
let mut first_client_time: Option<std::time::Instant> = None;
// ...
loop {
    // ... accept sets first_client_time on success ...
    // Line 405-406:
    let grace_elapsed = first_client_time
        .is_some_and(|t| t.elapsed() > accept_grace_period);
    if grace_elapsed && !handles.is_empty()
        && handles.iter().all(|h| h.is_finished())
    {
        break;  // never reached if no client connects
    }
}

This affects all four multi-accept functions:

  • run_multi_accept_tcp_blocking (line 354)
  • run_multi_accept_uds_blocking (line 466)
  • run_multi_accept_tcp_async (line 707)
  • run_multi_accept_uds_async (line 819)

Suggestion: Add an overall idle timeout (e.g. 60s with no
connections) or an optional --server-timeout flag so standalone
servers don't hang indefinitely when no client appears.
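The suggested exit condition can be sketched as a small std-only predicate; `idle_timeout` is a hypothetical knob (e.g. a `--server-timeout` flag), not an existing option, and the parameter names are stand-ins for the loop's locals:

```rust
use std::time::{Duration, Instant};

// Sketch of the suggested fix: keep the existing grace-period check once a
// client has connected, and add an idle timeout for the case where no
// client ever connects, so the loop always has an exit path.
fn should_exit(
    server_start: Instant,
    first_client_time: Option<Instant>,
    accept_grace_period: Duration,
    idle_timeout: Duration,
    any_handlers: bool,
    all_handlers_finished: bool,
) -> bool {
    match first_client_time {
        // Existing condition: grace period after the first client elapsed
        // and every spawned handler has finished.
        Some(t) => t.elapsed() > accept_grace_period && any_handlers && all_handlers_finished,
        // New condition: no client ever connected within the idle timeout.
        None => server_start.elapsed() > idle_timeout,
    }
}

fn main() {
    let start = Instant::now();
    std::thread::sleep(Duration::from_millis(5));
    // No client ever connected and the idle timeout has passed: exit.
    assert!(should_exit(start, None, Duration::from_secs(5), Duration::from_millis(1), false, false));
    // No client yet, but still within the idle timeout: keep waiting.
    assert!(!should_exit(Instant::now(), None, Duration::from_secs(5), Duration::from_secs(60), false, false));
}
```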


2. Swallowed canary and shutdown send errors

Severity: Major
File: src/standalone_client.rs

Canary sends (warmup discard messages) and shutdown sends silently
discard errors via let _ = transport.send_*(). If the canary fails,
measurement starts on a broken connection with no warning.

Production occurrences (non-test code):

| Line | Context |
|------|---------|
| 246 | `let _ = transport.send_blocking(&canary);` (one-way canary) |
| 307 | `let _ = transport.receive_blocking();` (round-trip canary response) |
| 395 | `let _ = transport.send_blocking(&shutdown);` (shutdown) |
| 459 | `let _ = transport.send_blocking(&canary);` (concurrent one-way canary) |
| 490 | `let _ = transport.send_blocking(&shutdown);` (concurrent shutdown) |
| 543 | `let _ = transport.receive_blocking();` (concurrent RT canary response) |
| 581 | `let _ = transport.send_blocking(&shutdown);` (concurrent RT shutdown) |
| 739 | `let _ = transport.send(&canary).await;` (async one-way canary) |
| 800 | `let _ = transport.receive().await;` (async RT canary response) |
| 888 | `let _ = transport.send(&shutdown).await;` (async shutdown) |
| 890 | `let _ = transport.close().await;` (async close; also swallowed) |
| 948 | `let _ = transport.send(&canary).await;` (async concurrent canary) |
| 979 | `let _ = transport.send(&shutdown).await;` (async concurrent shutdown) |
| 980 | `let _ = transport.close().await;` (async concurrent close) |
| 1062 | `let _ = transport.send(&shutdown).await;` (async concurrent RT shutdown) |
| 1063 | `let _ = transport.close().await;` (async concurrent RT close) |

Suggestion: At minimum, log canary failures at warn! level. For
shutdown sends, debug! is acceptable since the server will exit on
disconnect anyway. Canary failures should either warn or abort the
measurement.
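The suggested error handling can be sketched with std-only wrappers; `send` stands in for the transport's send calls, and the real code would use tracing's `warn!`/`debug!` macros rather than `eprintln!`:

```rust
// Sketch of the suggested handling: surface canary failures instead of
// discarding them with `let _ = ...`.
fn send_canary<E: std::fmt::Display>(send: impl FnOnce() -> Result<(), E>) -> Result<(), E> {
    let res = send();
    if let Err(e) = &res {
        // A failed canary means measurement would start on a broken
        // connection, so this should at least warn (or abort).
        eprintln!("warn: canary send failed: {e}");
    }
    res
}

fn send_shutdown<E: std::fmt::Display>(send: impl FnOnce() -> Result<(), E>) {
    // Shutdown failures are benign (the server exits on disconnect anyway),
    // so debug-level logging is enough and the error is not propagated.
    if let Err(e) = send() {
        eprintln!("debug: shutdown send failed: {e}");
    }
}

fn main() {
    assert!(send_canary(|| Ok::<(), String>(())).is_ok());
    assert!(send_canary(|| Err::<(), String>("broken pipe".into())).is_err());
    send_shutdown(|| Err::<(), String>("already closed".into())); // logged, not fatal
}
```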


3. Concurrent mode does not stream per-message records

Severity: Major (silent data loss for users)
File: src/standalone_client.rs lines 429–598, 920–1065

Single-connection paths call results_manager.stream_latency_record()
on every message (lines 342, 377, 835, 870). However, the concurrent
worker paths never call stream_latency_record — workers use a
bare MetricsCollector in their thread/task closures without access to
the ResultsManager.

// standalone_client.rs:449-487 (concurrent one-way worker)
std::thread::spawn(move || -> Result<PerformanceMetrics> {
    let mut metrics = MetricsCollector::new(None, percentiles)?;
    // ...
    metrics.record_message(message_size, None)?;
    // ^^^ no stream_latency_record call anywhere in worker
    // ...
    Ok(metrics.get_metrics())
})

Same gap in all four concurrent paths:

  • Blocking one-way workers (line 449–487)
  • Blocking round-trip workers (line 531–584)
  • Async one-way workers (line 935–982)
  • Async round-trip workers (line 1016–1065)

Suggestion: Either pass an Arc<Mutex<BlockingResultsManager>> to
workers, or document that --streaming is not supported with
--concurrency > 1.
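The first option can be sketched with std threads and a shared sink; `LatencyRecord` is a hypothetical stand-in for `MessageLatencyRecord`, and in the real code the sink would wrap the `BlockingResultsManager` and call `stream_latency_record` per entry:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the per-message record type.
#[derive(Debug, Clone)]
struct LatencyRecord {
    worker: usize,
    seq: u64,
    latency_ns: u64,
}

// Sketch: workers push records into a shared sink behind Arc<Mutex<...>>,
// so per-message data is no longer dropped in concurrent mode.
fn run_workers(n_workers: usize, msgs_per_worker: u64) -> Vec<LatencyRecord> {
    let sink: Arc<Mutex<Vec<LatencyRecord>>> = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..n_workers)
        .map(|worker| {
            let sink = Arc::clone(&sink);
            thread::spawn(move || {
                for seq in 0..msgs_per_worker {
                    // Stand-in for stream_latency_record(...).
                    sink.lock().unwrap().push(LatencyRecord { worker, seq, latency_ns: 1_000 });
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(sink).unwrap().into_inner().unwrap()
}

fn main() {
    let records = run_workers(4, 25);
    assert_eq!(records.len(), 100); // no records lost across workers
}
```

The lock is only contended once per message; batching records per worker and draining after `join` would avoid even that.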


4. Extra mechanisms silently ignored

Severity: Minor
Files: src/standalone_server.rs line 145,
src/standalone_client.rs line 38

// standalone_server.rs:145
let mechanism = match args.mechanisms.first() {
    Some(&m) => m,
    None => return Err(anyhow::anyhow!("No IPC mechanism specified")),
};

If a user runs --server -m tcp uds, only tcp is used and uds is
silently dropped. The -m all case is caught later with a specific
error, but -m tcp uds gives no indication that the second mechanism
was ignored.

Suggestion: Add a warning or error when mechanisms.len() > 1 in
standalone mode.
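A minimal sketch of that validation, assuming rejection rather than a warning (mechanism names are plain strings here for illustration):

```rust
// Sketch of the suggested check: reject more than one mechanism in
// standalone mode instead of silently using only the first.
fn select_mechanism(mechanisms: &[&str]) -> Result<String, String> {
    match mechanisms {
        [] => Err("No IPC mechanism specified".to_string()),
        [only] => Ok((*only).to_string()),
        many => Err(format!(
            "Standalone mode supports exactly one mechanism, got {}: {:?}",
            many.len(),
            many
        )),
    }
}

fn main() {
    assert_eq!(select_mechanism(&["tcp"]).unwrap(), "tcp");
    assert!(select_mechanism(&["tcp", "uds"]).is_err()); // no longer silently drops uds
    assert!(select_mechanism(&[]).is_err());
}
```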


5. Server receive errors silently exit the loop

Severity: Minor
File: src/standalone_server.rs lines 252, 290

// standalone_server.rs:252
while let Ok((message, receive_time_ns)) =
    transport.receive_blocking_timed()
{
    // Any Err exits this loop with no warning/info log
}

Unlike the async single-server path which logs receive failures, the
blocking handle_client_connection and handle_client_connection_multi
functions exit their receive loops silently on any error. A transport
failure is indistinguishable from a clean client disconnect in the logs.

Suggestion: Add at least a debug! log on the receive error before
exiting the loop, similar to the async paths.
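The suggested change amounts to replacing `while let Ok(...)` with an explicit match; `recv` stands in for `transport.receive_blocking_timed()`, and `eprintln!` stands in for `debug!`:

```rust
// Sketch: the terminating error is now logged before the loop exits, so a
// transport failure is distinguishable from a clean disconnect in the logs.
fn receive_loop<E: std::fmt::Display>(mut recv: impl FnMut() -> Result<(u64, u64), E>) -> u64 {
    let mut received = 0;
    loop {
        match recv() {
            Ok((_message, _receive_time_ns)) => received += 1,
            Err(e) => {
                eprintln!("debug: receive loop exiting: {e}");
                break;
            }
        }
    }
    received
}

fn main() {
    let mut n: u64 = 0;
    let count = receive_loop(move || {
        n += 1;
        if n <= 3 { Ok((n, 0)) } else { Err("peer disconnected".to_string()) }
    });
    assert_eq!(count, 3); // three messages received before the logged exit
}
```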


6. Concurrent mode skips warmup iterations

Severity: Minor (metadata mismatch)
File: src/standalone_client.rs lines 408–631, 899–1089

Single-connection paths run an explicit warmup loop using
config.warmup_iterations (blocking line 225, async line 718):

// standalone_client.rs:224-236 (single-connection blocking)
// Warmup
for _ in 0..config.warmup_iterations {
    let msg_type = if config.round_trip {
        MessageType::Request
    } else {
        MessageType::OneWay
    };
    let msg = Message::new(u64::MAX, payload.clone(), msg_type);
    transport.send_blocking(&msg)?;
    if config.round_trip {
        transport.receive_blocking()?;
    }
}
info!("Warmup complete ({} iterations)", config.warmup_iterations);

Concurrent paths (run_standalone_client_blocking_concurrent line 408,
run_standalone_client_async_concurrent line 899) never run this
warmup loop. Workers only optionally send a single canary when
!include_first_message. However, BenchmarkResults::new() still
receives config.warmup_iterations (lines 611, 1086), so the reported
metadata claims warmup was performed when it was not.

Suggestion: Either run warmup per-worker in concurrent mode, or
pass warmup_iterations: 0 to BenchmarkResults::new() for
concurrent runs.
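The first option can be sketched as a per-worker helper mirroring the single-connection warmup loop; `send`/`recv` stand in for the worker's transport calls:

```rust
// Sketch: run the same warmup inside each concurrent worker before
// measurement starts, so the reported warmup_iterations metadata is honest.
fn worker_warmup<E>(
    warmup_iterations: u32,
    round_trip: bool,
    mut send: impl FnMut() -> Result<(), E>,
    mut recv: impl FnMut() -> Result<(), E>,
) -> Result<u32, E> {
    let mut completed = 0;
    for _ in 0..warmup_iterations {
        send()?; // warmup message (sequence u64::MAX in the real code)
        if round_trip {
            recv()?; // drain the echoed response
        }
        completed += 1;
    }
    Ok(completed)
}

fn main() {
    let done = worker_warmup(5, true, || Ok::<(), ()>(()), || Ok(())).unwrap();
    assert_eq!(done, 5);
    // The alternative fix: pass warmup_iterations = 0 to
    // BenchmarkResults::new() when no warmup actually ran.
}
```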


7. Two-phase reconnection in concurrent mode

Severity: Minor (design consideration)
File: src/standalone_client.rs lines 429–598

When both one_way and round_trip are enabled with concurrency > 1,
the client runs two sequential phases: all N workers connect, run
one-way, send shutdown, and disconnect; then all N workers reconnect
for round-trip, run it, send shutdown, and disconnect again. This means:

  • 2× connection establishment overhead
  • 2× shutdown signal processing per worker
  • Server must accept 2N total connections

This is inherent to the current design and may be intentional, but
it's worth documenting for users who may not expect the double
connection churn.


8. No signal handling — resource leak on kill

Severity: Minor (operational)
File: src/standalone_server.rs (entire module)

The standalone server has no SIGINT/SIGTERM handler. Shutdown
relies solely on the client sending a Shutdown message or
disconnecting. If the server is killed (kill, Ctrl+C), the normal
cleanup paths (close_blocking(), SHM shm_unlink, PMQ mq_unlink,
UDS socket file removal) are skipped.

This can leave stale resources:

  • /dev/shm/ipc_benchmark_shm (shared memory segment)
  • /dev/mqueue/ipc_benchmark_pmq (POSIX message queue)
  • /tmp/ipc_benchmark.sock (UDS socket file)

Users must manually clean up after a killed server.

Suggestion: Document this limitation, or add a ctrlc handler
that sets an AtomicBool shutdown flag checked in the accept loop.
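The flag-based handler can be sketched with std atomics alone; the real wiring would pass `request_shutdown` to `ctrlc::set_handler`, and here the signal is simulated by calling it directly:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Sketch: a shared flag flipped on SIGINT/SIGTERM and checked every
// accept-loop iteration, so the normal cleanup paths (close_blocking,
// shm_unlink/mq_unlink, socket-file removal) still run on Ctrl+C.
fn install_flag() -> Arc<AtomicBool> {
    Arc::new(AtomicBool::new(false))
}

fn request_shutdown(flag: &AtomicBool) {
    flag.store(true, Ordering::SeqCst);
}

fn shutdown_requested(flag: &AtomicBool) -> bool {
    flag.load(Ordering::SeqCst)
}

fn main() {
    let flag = install_flag();
    assert!(!shutdown_requested(&flag)); // accept loop keeps running
    request_shutdown(&flag);             // simulated Ctrl+C
    assert!(shutdown_requested(&flag));  // loop breaks and cleanup runs
}
```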


9. PMQ has no receive_blocking_timed override

Severity: Minor (measurement methodology)
File: src/ipc/posix_message_queue_blocking.rs (unchanged file)
Related: src/ipc/mod.rs (new trait method added on this branch)

This branch adds the receive_blocking_timed() trait method to
BlockingTransport (in ipc/mod.rs) and adds overrides for TCP,
UDS, SHM, and SHM-direct — but not for POSIX Message Queue.
The PMQ file was not modified on this branch.

PMQ falls back to the trait default, which timestamps after
receive_blocking() returns (including deserialization):

// src/ipc/mod.rs:1195-1198 (trait default, new on this branch)
fn receive_blocking_timed(&mut self) -> Result<(Message, u64)> {
    let msg = self.receive_blocking()?;
    Ok((msg, get_monotonic_time_ns()))
}

The other four transports timestamp between raw I/O and
deserialization. This makes standalone server one-way latency for
PMQ include deserialization time while TCP/UDS/SHM do not.

Note: This is a gap in the scope of this branch's changes
(4 of 5 transports got overrides), not a bug in pre-existing code.

Suggestion: Add a receive_blocking_timed override to
BlockingPosixMessageQueue for consistent methodology.
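The methodology gap can be illustrated with a simulated clock; the function names here are stand-ins, not the crate's real `BlockingTransport` API:

```rust
// Simulated decode step: each clock() call represents elapsed time.
fn deserialize(raw: Vec<u8>, clock: &mut impl FnMut() -> u64) -> String {
    let _ = clock(); // decoding takes (simulated) time
    String::from_utf8(raw).unwrap_or_default()
}

// Current PMQ behavior (trait default): timestamp taken after decode.
fn receive_timed_default(clock: &mut impl FnMut() -> u64) -> (String, u64) {
    let _ = clock(); // raw mq_receive takes (simulated) time
    let msg = deserialize(b"msg".to_vec(), clock);
    (msg, clock())
}

// Suggested override: timestamp between raw I/O and decode, matching the
// TCP/UDS/SHM overrides added on this branch.
fn receive_timed_override(clock: &mut impl FnMut() -> u64) -> (String, u64) {
    let _ = clock(); // raw mq_receive
    let t = clock(); // timestamp here excludes deserialization
    (deserialize(b"msg".to_vec(), clock), t)
}

fn main() {
    let mut t1 = 0u64;
    let (_, ts_default) = receive_timed_default(&mut || { t1 += 1; t1 });
    let mut t2 = 0u64;
    let (_, ts_override) = receive_timed_override(&mut || { t2 += 1; t2 });
    // The default's timestamp lands one "tick" later: it includes decode.
    assert_eq!(ts_default, 3);
    assert_eq!(ts_override, 2);
}
```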


10. No integration tests for standalone mode

Severity: Minor (coverage gap)
File: tests/ directory

The tests/ directory contains no references to standalone,
--server, or --client. All standalone test coverage is via
in-module unit tests in standalone_server.rs and
standalone_client.rs. There are no end-to-end integration tests
that spawn separate server and client processes (the actual use case).

In particular, SHM and PMQ standalone paths have no happy-path
test coverage beyond effective_concurrency unit tests.

Suggestion: Add at least one integration test per mechanism that
spawns --server in a background process and connects with --client.


11. Test comment contradicts assertion

Severity: Nit (but confusing for reviewers)
File: src/standalone_server.rs lines 2793–2797

// standalone_server.rs:2793-2797
// Should return an error (deserialization failure) but not panic
let result = handle_client_connection(transport.as_mut(), &cfg);
// The receive loop exits on error, returning the collector
// (which may have 0 messages recorded)
assert!(result.is_ok());

The comment says "should return an error" but the assertion checks
is_ok(). The actual behavior is correct (the function returns
Ok(MetricsCollector) even when receive fails because it returns the
collector), but the comment is misleading.

Suggestion: Fix the comment to: "Receive loop exits on error and
returns an Ok(collector) with 0 messages recorded."


Summary

| # | Issue | Severity | Action |
|---|-------|----------|--------|
| 1 | Infinite server hang with no clients | Major | Add idle timeout |
| 2 | Swallowed canary/shutdown send errors | Major | Log canary failures |
| 3 | Concurrent streaming not wired | Major | Wire or document |
| 4 | Extra mechanisms silently ignored | Minor | Warn if len > 1 |
| 5 | Silent receive error exit | Minor | Add debug log |
| 6 | Concurrent mode skips warmup | Minor | Fix metadata or add warmup |
| 7 | Two-phase reconnection in concurrent | Minor | Document |
| 8 | No signal handling; resource leak on kill | Minor | Document or add handler |
| 9 | PMQ missing receive_blocking_timed | Minor | Add override for fairness |
| 10 | No integration tests for standalone | Minor | Add E2E tests |
| 11 | Test comment contradicts assertion | Nit | Fix comment |

sberg-rh and others added 5 commits April 23, 2026 13:49
- Add 60s idle timeout to prevent server hang when no client connects
- Log canary/shutdown send errors (warn for canary, debug for shutdown)
- Wire concurrent streaming: workers collect MessageLatencyRecord and
  stream through ResultsManager after completion
- Reject multiple mechanisms in standalone mode with clear error
- Add debug logging on server receive errors for diagnostics
- Add per-worker warmup in concurrent mode (fixes metadata mismatch)
- Document two-phase reconnection behavior in concurrent mode
- Add SIGINT/SIGTERM signal handler via ctrlc crate for graceful
  server shutdown and resource cleanup
- Add receive_blocking_timed override for PMQ transport to exclude
  deserialization from one-way latency measurement
- Add 11 integration tests spawning separate server/client processes
- Fix misleading test comment on garbage input handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop ctrlc "termination" feature (requires Rust 1.75+ due to nix
  static zeroed); SIGINT still handled, SIGTERM uses default OS behavior
- Remove unused BlockingTcpSocket import in canary failure test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ctrlc 3.5+ uses std::mem::zeroed() in statics which requires Rust 1.75+.
Version 3.4.5 does not have this issue and supports Rust 1.69+.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests used "tcp uds" but UDS is not a valid mechanism on Windows,
causing clap to reject the args before reaching our validation code.
Changed to "tcp shm" which exists on all platforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test hardcoded /tmp/ which doesn't exist on Windows. Use
std::env::temp_dir() for cross-platform compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

📈 Changed lines coverage: 80.40% (1042/1296)

🚨 Uncovered lines in this PR

  • src/ipc/mod.rs: 1198-1200
  • src/ipc/posix_message_queue_blocking.rs: 474-475, 477-478, 484, 486-491, 493-495, 497-498, 503, 505-506, 508-509
  • src/ipc/shared_memory_blocking.rs: 759-760, 762-763, 770, 787, 789-790, 792
  • src/ipc/shared_memory_direct.rs: 652, 657-658
  • src/ipc/tcp_socket_blocking.rs: 322-325
  • src/ipc/unix_domain_socket_blocking.rs: 363-366
  • src/main.rs: 794
  • src/standalone_client.rs: 40, 68-69, 79, 84-85, 87, 96-99, 109-111, 113, 117, 133, 154, 255, 275, 319, 360, 364, 395, 413, 496, 514, 525, 532, 597, 600-601, 609-629, 631-632, 634, 658, 665, 681, 733-735, 740-741, 829, 849, 865, 893, 896-897, 934, 938, 969, 973, 987, 994, 1065, 1083, 1094, 1101, 1104, 1161, 1164-1165, 1173-1193, 1195-1196, 1198, 1222, 1229, 1232, 1244
  • src/standalone_server.rs: 138, 161, 187, 204-205, 207, 217-219, 221, 260, 284-285, 308-310, 334-335, 417-418, 421-422, 442-443, 453-454, 457, 459, 461, 471-473, 480, 507, 536-537, 552-553, 563-564, 567, 569, 571, 581-583, 590, 640, 719-720, 738-740, 744-746, 793-795, 799-800, 803-804, 820-821, 828-830, 839, 844-845, 848, 850, 852, 862, 885, 912-914, 918-919, 931-932, 939-941, 950, 955-956, 959, 961, 963, 973
  • src/utils.rs: 239-242

📊 Code Coverage Summary

File Line Coverage Uncovered Lines
src/benchmark.rs 83.64%
(506/605)
75, 78, 89, 93, 102, 105, 107, 124, 424, 429-434, 441-446, 513-516, 621, 705, 711-713, 717-719, 739, 808-810, 815, 836, 841, 859, 965, 969-972, 974, 983-986, 988, 1064, 1095, 1098, 1100-1101, 1110-1111, 1253, 1266, 1283, 1406, 1415, 1417-1418, 1421-1422, 1428-1429, 1434-1435, 1437, 1442-1443, 1447-1449, 1454-1455, 1458-1459, 1491, 1549-1553, 1555-1561, 1563, 1566, 1723, 1738
src/benchmark_blocking.rs 73.50%
(319/434)
97, 111, 127, 263, 369, 375-377, 380-382, 402, 434, 488, 587, 600, 614, 644-647, 732-735, 754, 758, 773, 815-817, 820, 823-825, 827, 830, 832-836, 838-839, 847-851, 853-857, 860-861, 865-866, 901, 950, 1029, 1040, 1070, 1073, 1138-1143, 1145, 1200-1203, 1208, 1221, 1224-1227, 1231, 1233-1236, 1238, 1240-1241, 1243-1244, 1247, 1249-1254, 1256, 1260-1261, 1263, 1265, 1289, 1301-1306, 1308, 1328-1331
src/cli.rs 92.39%
(85/92)
691, 790, 830, 832, 853-855
src/execution_mode.rs 100.00%
(14/14)
(none)
src/ipc/mod.rs 62.67%
(47/75)
115, 425, 427-430, 740-741, 756-757, 775-776, 807, 810, 813, 818, 845-846, 860, 862, 882, 884, 1007-1009, 1198-1200
src/ipc/posix_message_queue.rs 46.09%
(59/128)
139-140, 213-215, 217, 224, 229, 332-335, 337, 345, 437, 441-442, 446, 449-452, 454-458, 539, 679, 782, 789-790, 807-808, 819-820, 831-832, 849-850, 906, 910-911, 914-919, 921-923, 927, 929-931, 933, 935-937, 941-943, 945-947, 994-995, 1017
src/ipc/posix_message_queue_blocking.rs 72.16%
(127/176)
172, 182, 221, 251-255, 274, 325, 368, 387-390, 416-418, 422-423, 425-426, 436, 455, 457-458, 460-461, 474-475, 477-478, 484, 486-491, 493-495, 497-498, 503, 505-506, 508-509
src/ipc/shared_memory.rs 69.36%
(163/235)
61, 141, 145, 246-247, 257-258, 262, 390-391, 417-419, 421, 439-441, 443-444, 446-450, 467, 474, 480, 483-484, 488, 492, 496-497, 502-503, 666-667, 670-671, 674, 676, 681-682, 709-710, 713-714, 721-723, 725, 727-732, 734-735, 738-739, 741-745, 752, 782, 784-785, 787, 791
src/ipc/shared_memory_blocking.rs 75.91%
(208/274)
175, 196-198, 200-201, 204-206, 209-210, 212, 217, 219, 223-225, 230, 238-240, 243-245, 248-249, 251, 254, 257-258, 261-262, 266-267, 269, 273-274, 276, 311-312, 378-379, 403-407, 498, 506, 556, 573, 660, 726, 759-760, 762-763, 770, 787, 789-790, 792, 825, 834, 844, 866
src/ipc/shared_memory_direct.rs 82.42%
(150/182)
372-375, 444-451, 455, 482, 506-509, 513-514, 556-557, 569, 598, 605-606, 629-630, 636, 652, 657-658
src/ipc/tcp_socket.rs 59.43%
(63/106)
31-32, 61, 96, 113-114, 118, 124-125, 129, 136-137, 141, 147-148, 152, 171-172, 175-177, 184-185, 188, 362-363, 366-367, 370-371, 376-377, 422, 429, 447-449, 478, 480-482, 484, 487
src/ipc/tcp_socket_blocking.rs 94.23%
(98/104)
145, 170, 322-325
src/ipc/unix_domain_socket.rs 59.43%
(63/106)
29-30, 58, 93, 103, 122-123, 127, 133-134, 138, 145-146, 150, 156-157, 161, 180-181, 184-186, 193-194, 197, 346-347, 350-351, 354-355, 360-361, 412-414, 443, 445-447, 449, 452, 468
src/ipc/unix_domain_socket_blocking.rs 92.06%
(116/126)
287-288, 294-296, 298, 363-366
src/logging.rs 100.00%
(13/13)
(none)
src/main.rs 45.56%
(159/349)
85-87, 89, 130-131, 141-145, 149-151, 153-154, 156-157, 177-180, 204-208, 216, 222, 225, 230-233, 238-239, 245, 251, 253-255, 257, 263-264, 270, 275, 278-279, 283, 285-286, 290-291, 293, 299, 303-304, 306-311, 313-314, 317, 326, 329-330, 333, 380-383, 390, 392-396, 399-402, 404-405, 407-408, 410, 412-418, 422, 424-427, 430, 434-436, 440, 442, 445, 449, 454-457, 463-464, 470-471, 477, 479-480, 484, 486, 491-493, 497, 500-501, 503-504, 509, 511-513, 517-518, 520, 527, 532-533, 535-540, 542-543, 547, 556, 559-560, 563, 565, 584, 591, 595-597, 599, 627-628, 636, 669, 720, 724, 727-730, 786-787, 794-795, 798, 825-826, 829, 881-882, 886-889, 911, 938, 947, 952, 957-958
src/metrics.rs 84.04%
(158/188)
455-460, 493-494, 552, 558, 579-582, 732-734, 736, 768, 833, 838, 881, 904, 952, 980, 984, 1005, 1007-1008, 1013
src/results.rs 56.85%
(253/445)
726, 735-737, 739-740, 743-744, 747, 769, 772-773, 776, 778, 781, 785-790, 800-801, 804-809, 826, 838-839, 841, 843, 846-847, 849, 853, 880, 904-906, 909-910, 914-916, 919, 945, 950, 955, 961, 980, 982-983, 985, 987-991, 993, 995-996, 1030, 1071-1072, 1075, 1081-1082, 1086, 1090-1092, 1094-1095, 1119-1123, 1126-1129, 1132-1139, 1149-1150, 1169-1170, 1172-1176, 1178, 1195-1196, 1198-1203, 1205, 1223, 1225-1230, 1248, 1251, 1267-1268, 1283-1285, 1287-1289, 1291-1292, 1294-1295, 1297-1298, 1300, 1302-1303, 1305-1308, 1310-1312, 1314-1316, 1319, 1323-1324, 1332-1337, 1339-1340, 1344-1345, 1349-1351, 1353, 1357-1358, 1367-1370, 1374-1376, 1380, 1382, 1385, 1390-1391, 1396, 1403-1407, 1409, 1607-1608, 1828-1829, 1831-1832
src/results_blocking.rs 95.48%
(296/310)
489-490, 492-493, 544, 769, 774, 779, 815, 818-819, 827-828, 886
src/standalone_client.rs 84.49%
(610/722)
40, 68-69, 79, 84-85, 87, 96-99, 109-111, 113, 117, 133, 154, 255, 275, 319, 360, 364, 395, 413, 496, 514, 525, 532, 597, 600-601, 609-629, 631-632, 634, 658, 665, 681, 733-735, 740-741, 829, 849, 865, 893, 896-897, 934, 938, 969, 973, 987, 994, 1065, 1083, 1094, 1101, 1104, 1161, 1164-1165, 1173-1193, 1195-1196, 1198, 1222, 1229, 1232, 1244
src/standalone_server.rs 80.46%
(383/476)
138, 161, 187, 204-205, 207, 217-219, 221, 260, 284-285, 308-310, 334-335, 417-418, 421-422, 442-443, 453-454, 457, 459, 461, 471-473, 480, 507, 536-537, 552-553, 563-564, 567, 569, 571, 581-583, 590, 640, 719-720, 738-740, 744-746, 793-795, 799-800, 803-804, 820-821, 828-830, 839, 844-845, 848, 850, 852, 862, 885, 912-914, 918-919, 931-932, 939-941, 950, 955-956, 959, 961, 963, 973
src/utils.rs 75.00%
(42/56)
143, 147-148, 153, 159, 198-202, 239-242
Total 75.38%
(3932/5216)

@mcurrier2
Collaborator

A few more things to look at, Shawn.

PR Review Findings: feat/standalone-client-server

#gpt-5.3-codex review

Date: 2026-04-23
Repository: /home/mcurrier/auto/work/rusty-comms

Summary

This report now contains only findings that are unique relative to
pr110-review-findings.md in the same directory.

Findings

1) Critical: Server can exit between one-way and round-trip phases

  • In concurrent standalone client mode, one-way workers complete first, then
    round-trip workers are started.
  • Multi-accept server exits when grace period elapses and all handlers are
    finished.
  • Result: server may stop before round-trip workers connect.

Impacted paths:

  • src/standalone_client.rs (sequential one-way then round-trip concurrent phases)
  • src/standalone_server.rs (grace-based shutdown in multi-accept loops)
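One possible fix is for the server to track how many connections it still expects (2N when both one-way and round-trip phases run with N workers) and refuse to exit before that count is met; how the server learns the expected count (e.g. a hypothetical flag or a protocol hint) is an open design question, and the predicate below is only a sketch:

```rust
// Sketch: exit only once the expected number of connections has actually
// been accepted, so the inter-phase gap cannot trigger a premature exit.
fn may_exit(expected_connections: usize, accepted: usize, grace_elapsed: bool, all_finished: bool) -> bool {
    accepted >= expected_connections && grace_elapsed && all_finished
}

fn main() {
    // Between phases: 4 one-way workers done, round-trip not yet connected.
    assert!(!may_exit(8, 4, true, true));
    // After the round-trip workers have also connected and finished.
    assert!(may_exit(8, 8, true, true));
}
```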

2) High: Accept-loop errors are not surfaced as function errors

  • Multi-accept server loops break on non-WouldBlock accept errors.
  • After break, functions continue to join handlers and return Ok(()).
  • Operational accept failures are effectively hidden from callers.

Impacted paths:

  • src/standalone_server.rs (blocking and async multi-accept TCP/UDS paths)
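The fix can be sketched as remembering the accept error that broke the loop and returning it after the join phase, instead of unconditionally returning `Ok(())`; `join_workers` stands in for the existing handler-join logic:

```rust
use std::io;

// Sketch: in-flight clients are still drained, but an operational accept
// failure is surfaced to the caller rather than swallowed.
fn finish_accept_loop(accept_error: Option<io::Error>, join_workers: impl FnOnce()) -> io::Result<()> {
    join_workers(); // drain clients that already connected
    match accept_error {
        Some(e) => Err(e), // propagate the non-WouldBlock accept failure
        None => Ok(()),
    }
}

fn main() {
    assert!(finish_accept_loop(None, || {}).is_ok());
    let err = io::Error::new(io::ErrorKind::Other, "accept failed: EMFILE");
    assert!(finish_accept_loop(Some(err), || {}).is_err());
}
```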

3) Medium: Duration precision loss in spawned server args

  • Server process duration is passed with duration.as_secs().
  • Sub-second portions are truncated.
  • Client/server run windows can diverge in duration mode.

Impacted path:

  • src/benchmark.rs
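The precision fix amounts to formatting the spawned server's duration argument from `as_secs_f64` instead of `as_secs`; the exact argument format the CLI accepts is an assumption here:

```rust
use std::time::Duration;

// Sketch: sub-second portions of the duration survive the round trip to
// the spawned server process.
fn duration_arg(d: Duration) -> String {
    format!("{}", d.as_secs_f64())
}

fn main() {
    // as_secs truncates 2.5s to 2s; as_secs_f64 preserves it.
    assert_eq!(Duration::from_millis(2500).as_secs(), 2);
    assert_eq!(duration_arg(Duration::from_millis(2500)), "2.5");
}
```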

4) Medium: Latency file writes ignore line-level write failures

  • Loop uses writeln!(...).ok() and discards errors.
  • Function contract says it returns write errors, but per-line errors are not
    propagated.

Impacted path:

  • src/main.rs (write_latency_buffer)
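The fix is to propagate per-line write errors with `?` instead of discarding them via `.ok()`; the record type is simplified to `u64` for illustration:

```rust
use std::io::{self, Write};

// Sketch of the corrected write_latency_buffer: any failed line write now
// returns the io::Error, honoring the function's documented contract.
fn write_latency_buffer<W: Write>(out: &mut W, records: &[u64]) -> io::Result<()> {
    for r in records {
        writeln!(out, "{r}")?; // was: writeln!(...).ok()
    }
    out.flush()
}

fn main() {
    let mut buf = Vec::new();
    write_latency_buffer(&mut buf, &[10, 20, 30]).unwrap();
    assert_eq!(String::from_utf8(buf).unwrap(), "10\n20\n30\n");
}
```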

Assumptions

  • Standalone server is expected to stay available across both benchmark phases
    when both are enabled.
  • Result<()> from server entrypoints is expected to represent runtime failure
    signals, not just startup outcomes.

Recommended Next Steps

  1. Fix server shutdown logic to avoid exiting during expected inter-phase gaps.
  2. Propagate/return accept-loop errors instead of silently returning success.
  3. Preserve full duration precision when passing -d to spawned server process.
  4. Propagate latency file write errors in write_latency_buffer.
