
feat(cli): Add standalone --server and --client modes #110

Open
sberg-rh wants to merge 25 commits into main from feat/standalone-client-server

Conversation


@sberg-rh sberg-rh commented Mar 26, 2026

Add the ability to run the benchmark server and client as independent processes, enabling cross-environment IPC testing (e.g., host and container).

  • Add --server flag to start a standalone server that listens for client connections using the specified IPC mechanism
  • Add --client flag to connect to a running server and execute the benchmark workload with retry logic (100ms backoff, 30s timeout)
  • Promote --socket-path, --shared-memory-name, and --message-queue-name from hidden internal flags to user-facing under "Standalone Mode"
  • Support both async (Tokio) and blocking (std) execution modes
  • Default transport endpoints work without extra flags for simple usage
  • Add 11 tests for CLI flag parsing and transport config building
  • Update examples and doctest to include new Args fields

Relates to #11
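The retry behavior described above (100ms backoff, 30s timeout) can be sketched against a raw TcpStream; this is illustrative only — the actual client retries through the transport abstraction, and the constant names here are assumptions:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Illustrative retry constants matching the behavior described above.
const CONNECT_RETRY_BACKOFF: Duration = Duration::from_millis(100);
const CONNECT_RETRY_TIMEOUT: Duration = Duration::from_secs(30);

// Retry connecting until success or until the timeout deadline passes.
fn connect_with_retry(addr: SocketAddr) -> std::io::Result<TcpStream> {
    let deadline = Instant::now() + CONNECT_RETRY_TIMEOUT;
    loop {
        match TcpStream::connect(addr) {
            Ok(stream) => return Ok(stream),
            // Out of time: surface the last connect error.
            Err(e) if Instant::now() >= deadline => return Err(e),
            // Otherwise back off and try again.
            Err(_) => std::thread::sleep(CONNECT_RETRY_BACKOFF),
        }
    }
}
```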


@sberg-rh sberg-rh self-assigned this Mar 26, 2026
@github-actions

This comment was marked as outdated.

@redhat-performance redhat-performance deleted a comment from github-actions Bot Mar 26, 2026
Collaborator

@dustinblack dustinblack left a comment


Good first step toward #11. The scope is right — standalone server/client as independent processes without touching transport internals. A few things to think about as this develops:

  1. No integration with existing metrics/reporting. The standalone client does its own ad-hoc stats via info! logging (mean/P50/P95/P99) rather than using ResultsManager/MetricsCollector. This means no streaming output, no JSON/CSV, no integration with the reporting pipeline or dashboard. Fine for an initial draft, but worth planning how to close that gap.

  2. Blocking/async code duplication. The blocking and async server/client paths are nearly identical except for await. Consider whether a shared structure or helper could reduce the duplication — the current pattern will be a maintenance burden as features are added.

  3. Two server modes. --server (new, user-facing) and --internal-run-as-server (existing, hidden) now coexist. The distinction is clear in code but worth documenting so users don't stumble into the internal flag, and so future contributors understand when to use which.

  4. Duration mode not supported. The standalone client only supports message-count mode (-i). If someone passes -d, it will silently fall back to the default message count. Should either support it or reject it with a clear error.

Overall the approach is clean and well-tested. Nice to see 5 integration tests covering round-trip, one-way, ping/pong, retry, and shutdown.


@mcurrier2 mcurrier2 self-requested a review March 31, 2026 17:11
@mcurrier2
Collaborator

Empty message count panic — If -i 0 or similar produces zero messages, the round-trip summary will panic: the percentile computation indexes into an empty vector (latencies[msg_count / 2]), and the mean divides by a zero message count.
Add explicit validation for msg_count >= 1 in count mode, and robust percentile guards for tiny sample sets.

main.rs:

```rust
latencies.sort();
let total: std::time::Duration = latencies.iter().sum();
let mean = total / msg_count as u32;
let p50 = latencies[msg_count / 2];
let p95 = latencies[(msg_count as f64 * 0.95) as usize];
let p99 = latencies[(msg_count as f64 * 0.99) as usize];
```
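A hedged sketch of the requested guards — an illustrative helper, not the project's code: reject empty runs up front and clamp percentile indices so tiny sample sets cannot index out of bounds.

```rust
use std::time::Duration;

// Return the p-th percentile of a sorted latency slice, or None for an
// empty run (avoiding the empty-vector panic described above).
fn percentile(latencies: &[Duration], p: f64) -> Option<Duration> {
    if latencies.is_empty() {
        return None;
    }
    // Clamp the computed index to the last element so tiny sample sets
    // (e.g. 1-2 samples) never index past the end.
    let idx = ((latencies.len() as f64 * p) as usize).min(latencies.len() - 1);
    Some(latencies[idx])
}
```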

Ignored config options — concurrency, send_delay, include_first_message, percentiles, streaming/output_file are all silently ignored. Should either be supported or rejected.

No shutdown message — Client relies on transport close for server to detect disconnect. The internal server path uses explicit MessageType::Shutdown which is more deterministic.

main.rs:

```rust
transport.close_blocking()?;
info!("Standalone client finished.");
```

Duration mode is not honored in standalone client paths:
BenchmarkConfig supports duration by setting msg_count = None, but standalone client code falls back to default count.
main.rs:

```rust
let msg_count = config
    .msg_count
    .unwrap_or(ipc_benchmark::defaults::MSG_COUNT);
```

When -d 5s is passed, BenchmarkConfig::from_args sets msg_count = None (duration takes precedence). The standalone client then falls back to the default count via unwrap_or. The config.duration field is never read anywhere in the standalone paths. So --client -d 5s -m tcp --blocking silently runs the default message count instead of a 5-second timed test.
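One way to honor duration mode, sketched with illustrative names (the real config fields and send paths differ): branch on whether a duration is set, and only fall back to the message count when it is not.

```rust
use std::time::{Duration, Instant};

// Run until the duration deadline when set; otherwise send msg_count
// messages. `send_one` stands in for the transport send call.
// Returns the number of messages actually sent.
fn run_count_or_duration(
    duration: Option<Duration>,
    msg_count: usize,
    mut send_one: impl FnMut(u64),
) -> u64 {
    match duration {
        Some(d) => {
            // Duration mode: loop until the deadline passes.
            let deadline = Instant::now() + d;
            let mut sent = 0u64;
            while Instant::now() < deadline {
                send_one(sent);
                sent += 1;
            }
            sent
        }
        None => {
            // Count mode: fixed number of messages.
            for i in 0..msg_count as u64 {
                send_one(i);
            }
            msg_count as u64
        }
    }
}
```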

Heap allocation per message in measurement loop
main.rs:

```rust
for i in 0..msg_count {
    let msg = Message::new(i as u64, payload.clone(), MessageType::OneWay);
    transport.send_blocking(&msg)?;
}
```

payload.clone() allocates a new Vec on every iteration. Per the project's own guidelines ("No allocations in measurement loops"), this adds allocation overhead to every measurement sample. The same pattern appears in the round-trip loop and both async variants. This will skew latency numbers, particularly for small messages where allocation cost is proportionally large.
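A sketch of the reuse pattern that avoids the per-iteration allocation — the Message type here is a minimal stand-in, not the crate's: allocate once, then update only the id in place inside the loop.

```rust
// Illustrative stand-in for the crate's Message type.
struct Message {
    id: u64,
    payload: Vec<u8>,
}

// Send msg_count messages while reusing a single Message value, so the
// measurement loop performs no heap allocation. `send` stands in for
// the transport send call.
fn send_all(msg_count: u64, payload: Vec<u8>, mut send: impl FnMut(&Message)) {
    let mut msg = Message { id: 0, payload }; // single allocation up front
    for i in 0..msg_count {
        msg.id = i; // in-place update; no clone inside the loop
        send(&msg);
    }
}
```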

Standalone client bypasses existing reporting pipeline (ResultsManager/streaming JSON/CSV), unlike established benchmark flows

Test coverage claim mismatch: this branch diff does not add standalone-specific CLI/integration tests; new flags/paths need explicit coverage.

  • --server / --client flag parsing
  • --server + --client conflict handling
  • build_standalone_transport_config()
  • Endpoint override defaults (socket_path, shared_memory_name, message_queue_name)
  • Duration mode in standalone client
  • connect_blocking_with_retry / connect_async_with_retry
  • Shutdown and retry-timeout behavior, including the shutdown message from the client

When I asked to compare against the C2C branch, this PR is still lacking:

  • Server-side latency file support
  • Integration with the full benchmark/results pipeline
  • The --cross-container SHM auto-detection present in C2C


@sberg-rh sberg-rh force-pushed the feat/standalone-client-server branch from f382afc to 93bfac7 Compare April 7, 2026 19:00
Author

sberg-rh commented Apr 9, 2026

Normal vs Standalone Benchmark Comparison

Manual comparison of benchmark results between normal mode (main branch) and standalone client/server mode (this PR). All tests run locally on the same machine with 1000 warmup iterations.

TCP Round-Trip (10000 msgs, 1024 bytes)

| Metric | Normal (main) | Standalone | Diff |
|------------|---------------|---------------|------|
| Mean | 18.93 us | 16.92 us | -11% |
| P95 | 33.70 us | 19.71 us | -41% |
| P99 | 51.81 us | 22.25 us | -57% |
| Min | 10.92 us | 11.75 us | +8% |
| Max | 80.58 us | 37.33 us | -54% |
| Throughput | 50,933 msg/s | 59,550 msg/s | +17% |

UDS Round-Trip (10000 msgs, 1024 bytes)

| Metric | Normal (main) | Standalone | Diff |
|------------|----------------|----------------|------|
| Mean | 8.52 us | 9.56 us | +12% |
| P95 | 10.54 us | 11.75 us | +11% |
| P99 | 11.67 us | 13.38 us | +15% |
| Min | 5.04 us | 5.00 us | -1% |
| Max | 53.83 us | 23.79 us | -56% |
| Throughput | 111,590 msg/s | 104,240 msg/s | -7% |

TCP Large Payload Round-Trip (5000 msgs, 8192 bytes)

| Metric | Normal (main) | Standalone | Diff |
|------------|----------------|----------------|------|
| Mean | 28.24 us | 25.69 us | -9% |
| P95 | 51.81 us | 31.21 us | -40% |
| P99 | 61.82 us | 36.06 us | -42% |
| Throughput | 274,100 msg/s | 315,530 msg/s | +15% |

TCP One-Way (10000 msgs, 1024 bytes)

| Metric | Normal (main) | Standalone |
|------------|----------------|----------------|
| Throughput | 126,180 msg/s | 196,060 msg/s |

Note: Standalone one-way reports throughput only on client side; server reports latency separately. Normal mode measures server-side latency via latency file.

Summary

  • Mean latencies are within ~10-15% of each other, which is expected given different process spawning models
  • Standalone mode shows better tail latency (P95/P99) in most cases, likely due to message struct reuse (no per-message heap allocation)
  • Throughput is comparable or better in standalone mode
  • Results confirm standalone mode is a valid alternative for benchmarking within reasonable tolerance

Author

sberg-rh commented Apr 9, 2026

PR Summary

This PR adds standalone --server and --client modes to enable running the benchmark server and client as independent processes, supporting cross-environment IPC testing (container-to-container, container-to-host). Relates to #11.

Features

  • --server / --client CLI flags with mutual exclusivity, under a "Standalone Mode" help heading
  • Multi-accept server for TCP and UDS (one handler thread per client connection)
  • Concurrency support (-c N) with automatic SHM/PMQ fallback to concurrency=1
  • Connection retry with 100ms backoff and 30s timeout
  • Duration (-d) and message-count (-i) modes
  • send_delay, include_first_message, and percentiles options supported
  • Explicit MessageType::Shutdown for deterministic server exit
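The SHM/PMQ concurrency fallback above can be sketched roughly as follows. The Mechanism enum and its variants are assumptions for illustration; the real helper is the effective_concurrency() mentioned later in this PR.

```rust
// Illustrative transport mechanism enum; the crate's actual type differs.
enum Mechanism {
    Tcp,
    Uds,
    Shm,
    Pmq,
}

// Shared-memory and POSIX message queue transports cannot multiplex
// independent client connections, so concurrency is forced to 1.
fn effective_concurrency(mechanism: &Mechanism, requested: usize) -> usize {
    match mechanism {
        Mechanism::Shm | Mechanism::Pmq => 1,
        Mechanism::Tcp | Mechanism::Uds => requested.max(1),
    }
}
```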

Reporting

  • Full ResultsManager/MetricsCollector integration: JSON output (-o), streaming CSV/JSON, and console summary with HDR histogram percentiles
  • Server-side one-way latency measurement using shared monotonic clock (accurate for same-host and container scenarios)
  • Per-message heap allocation eliminated in measurement loops

Transport

  • BlockingTcpSocket::from_stream() and BlockingUnixDomainSocket::from_stream() for per-client handler threads in multi-accept servers
  • Fixed non-blocking stream inheritance from listeners in both blocking and async multi-accept paths

Code Quality

  • Shared helpers: dispatch_server_message(), effective_concurrency(), handle_client_connection(), aggregate_and_print_server_metrics(), print_server_one_way_latency()
  • SERVER_ACCEPT_GRACE_PERIOD and CONNECT_RETRY_* constants
  • 40 binary tests covering: round-trip, one-way, ping/pong, duration mode, concurrency, retry, shutdown, large payload integrity, canary filtering, mixed message types, from_stream, metrics aggregation, and async multi-accept
  • OS-assigned test ports via get_free_port() to prevent conflicts
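The OS-assigned port pattern behind get_free_port() is typically implemented by binding to port 0; a minimal sketch (illustrative — the crate's helper may differ):

```rust
use std::net::TcpListener;

// Binding to port 0 asks the OS to pick any free port; reading back the
// local address reveals which port was assigned.
fn get_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}
```

Note the port is released when the listener is dropped, so there is a small race window before the test rebinds it; in practice this is acceptable for avoiding fixed-port conflicts in CI.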

Benchmark Comparison

Normal mode vs standalone mode results are within ~10-15% mean latency tolerance, with standalone often showing better tail latency (see comparison tables in PR comments).

Deferred Items

Two maintainability concerns were raised during review: the standalone logic adding ~2800 lines to src/main.rs (now ~3900 total), and blocking/async client code duplication. These are structural issues, not correctness concerns — the feature works and is well-tested. Extracting into a src/standalone.rs module and reducing client duplication (potentially using the spawn_blocking + shared handler pattern from the server side) would improve navigability, but we felt it was better to land the functional feature and iterate on structure separately rather than gate the PR on a large refactor. Open to discussion if the reviewers prefer to address these before merge.

@sberg-rh sberg-rh marked this pull request as ready for review April 9, 2026 16:56
@sberg-rh sberg-rh requested a review from ewchong April 9, 2026 16:56

Collaborator

@dustinblack dustinblack left a comment


Good progress on the implementation — CI is green, the benchmark comparison shows comparable performance, and the integration tests for blocking TCP paths are solid.

However, Matt's review from Mar 31 raised several correctness issues that still appear unaddressed. These need to be resolved before this can be approved:

  1. Panic on -i 0: latencies[msg_count / 2] will panic on an empty vector. Need either input validation rejecting msg_count < 1 or a guard before the percentile computation.

  2. Duration mode silently ignored: --client -d 5s falls back to the default message count via unwrap_or. The config.duration field is never read in the standalone paths. This should either be supported or rejected with a clear error.

  3. payload.clone() in measurement loop: Allocates a new Vec<u8> on every iteration. For a benchmarking tool, this adds measurable overhead, especially for small messages. The existing benchmark runner avoids this — standalone should too.

  4. Silently ignored config options: concurrency, send_delay, include_first_message, percentiles, streaming-output-*, and output-file are all accepted but do nothing. These should either be wired up or rejected so users aren't misled by silent no-ops.

  5. No shutdown message: Client relies on transport close for the server to detect disconnect. The internal server path uses explicit MessageType::Shutdown which is more reliable. Worth aligning these.

  6. Coverage at 7.3%: Almost all standalone code is uncovered. The integration tests exercise the transport layer directly but don't go through run_standalone_server/run_standalone_client. Consider whether the tests added actually cover the code paths being introduced.

@sberg-rh
Author

Thanks for the review! I want to make sure we're looking at the same code — these items were all addressed in commits after mcurrier2's original feedback on March 31. Could you confirm you're reviewing the latest state of the branch (HEAD at c6f99c72)?

Here's where each item was addressed:

| # | Concern | Status | Commit |
|---|---------|--------|--------|
| 1 | Panic on -i 0 | Fixed — percentile computation now uses MetricsCollector (HDR histogram); old manual percentile code removed | fd6bb272 (ResultsManager integration) |
| 2 | Duration mode silently ignored | Fixed — config.duration read and used in all standalone client paths (blocking/async, single/concurrent) | 3c9ed88 |
| 3 | payload.clone() in loop | Fixed — Message reused with in-place id/timestamp updates; only initial creation and warmup canaries clone | 80377f8b |
| 4 | Silently ignored config options | Fixed — concurrency, send_delay, include_first_message, percentiles, streaming-output-*, output-file all wired up | Various commits |
| 5 | No shutdown message | Fixed — explicit MessageType::Shutdown sent before close in all client paths | 642e6a9a |
| 6 | Coverage at 7.3% | Known limitation — tarpaulin doesn't instrument the binary crate (main.rs); 40 tests exist and pass but the coverage tool can't measure them. main.rs shows ~16% coverage across all PRs in this repo for the same reason. | — |

Happy to walk through any of these in more detail if needed.

Collaborator

@dustinblack dustinblack left a comment


First — apologies for the previous review. I reviewed against stale code and repeated issues that had already been addressed. That's on me. This review is against the current HEAD (c6f99c72).

The PR has come a long way — duration mode, concurrency, metrics integration, message reuse, shutdown messages, and streaming output are all properly wired up. 36 tests pass. Here are the remaining items:


High: Code duplication

The blocking/async client and server paths are near-identical copies — roughly 8 variants of similar measurement/server loops. Any future bug fix needs replication across all of them. This also ties into the coverage issue below.

High: Coverage and code placement

The 7.3% changed-line coverage is not purely a tarpaulin limitation — it's because ~800 lines of new logic live in main.rs (binary crate) rather than in the library. handle_client_connection, dispatch_server_message, effective_concurrency, build_standalone_transport_config, aggregate_and_print, and the measurement loops could all live in lib.rs or a new module exposed through it. This would:

  • Let tarpaulin instrument the code, making the 36 existing tests show up in coverage reports
  • Enable integration tests in tests/ to exercise the standalone logic
  • Reduce the blocking/async duplication by sharing testable core logic

Given that development on this project is heavily AI-assisted, measurable test coverage is especially important as a verification mechanism. I'd like to see the core logic moved to the library crate.


Medium items

  1. Server one-way latency includes deserialization. get_monotonic_time_ns() is called after receive_blocking() returns, so bincode deserialization time is included in the measurement. For large payloads this is non-trivial.

  2. Concurrent mode grace period bug. The 2-second SERVER_ACCEPT_GRACE_PERIOD could expire between the one-way and round-trip test phases, causing the server to exit mid-test when both tests are enabled with concurrency > 1.

  3. --quiet flag not honored. Standalone paths always init tracing to stderr.

  4. Mutex unwrap() in multi-threaded handlers. If a handler thread panics while holding the metrics lock, all other threads cascade-panic on the poisoned mutex.

  5. --shm-direct auto-blocking not applied. The automatic enabling of --blocking for --shm-direct happens after the standalone dispatch branch, so --client --shm-direct without an explicit --blocking incorrectly takes the async path.
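For the poisoned-mutex concern (medium item 4), the standard recovery pattern is to take the inner data from the PoisonError instead of unwrap()-ing; a minimal standalone sketch with an illustrative helper name:

```rust
use std::sync::{Arc, Mutex};

// If another handler thread panicked while holding the lock, the mutex
// is poisoned. Recover the inner data instead of cascade-panicking so
// surviving threads can still record their metrics.
fn push_metric(metrics: &Arc<Mutex<Vec<u64>>>, value: u64) {
    let mut guard = metrics
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.push(value);
}
```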

Low items

  1. from_stream() bypasses socket buffer tuning that normal start_server_blocking applies — could cause measurement differences.

  2. Integer division drops remainder messages in concurrent mode (msg_count / concurrency).

  3. Response messages always have empty payloads — asymmetric round-trip. Worth documenting whether this is intentional.
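The integer-division concern (low item 2) could be addressed with a remainder-preserving split; a hedged sketch with an illustrative helper name:

```rust
// Split msg_count across `concurrency` workers, giving the last worker
// any remainder so no messages are silently dropped.
fn split_messages(msg_count: usize, concurrency: usize) -> Vec<usize> {
    let base = msg_count / concurrency;
    let rem = msg_count % concurrency;
    (0..concurrency)
        .map(|i| if i == concurrency - 1 { base + rem } else { base })
        .collect()
}
```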


Add the ability to run the benchmark server and client as independent
processes, enabling cross-environment IPC testing (e.g., host and
container). Relates to #11.

Standalone mode features:
- --server flag starts a server that listens for client connections
- --client flag connects to a running server with retry logic
  (100ms backoff, 30s timeout)
- Both async (Tokio) and blocking (std) execution modes supported
- Duration (-d) and message-count (-i) modes both supported
- Default transport endpoints work without extra flags
- Endpoint flags (--socket-path, --shared-memory-name,
  --message-queue-name) promoted to user-facing

Reporting integration:
- Full ResultsManager/MetricsCollector integration for structured
  output (JSON, streaming CSV, console summary with HDR percentiles)
- Server-side one-way latency measurement using monotonic clock
  (accurate for same-host and container scenarios)
- Round-trip latency with per-message streaming support

Code quality:
- Shared helpers: dispatch_server_message(), retry constants
- 25 tests covering CLI parsing, transport config, server dispatch,
  connection retry, shutdown, duration mode, one-way, round-trip
- Explicit MessageType::Shutdown on client disconnect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sberg-rh and others added 8 commits April 14, 2026 13:44
- test_standalone_blocking_tcp_one_way: verify server received exact
  message count with correct sequential IDs, add shutdown message
- test_standalone_blocking_tcp_duration_round_trip: verify response
  IDs match requests, assert count > 10 for 200ms test, add shutdown
- test_standalone_blocking_tcp_duration_one_way: verify server received
  exact count with sequential IDs, assert count > 10 for 200ms test
- test_concurrency_forced_to_one_for_shm: test actual concurrency
  forcing logic instead of just CLI parsing
- test_standalone_concurrent_tcp_one_way: assert exact message count
  per handler instead of just "greater than zero"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Clean up garbled doc comment on async concurrent test (editing
  artifacts from multiple rewrites)
- Replace silent panic swallowing in async multi-accept servers:
  try_join_next().transpose() silently dropped JoinErrors from
  panicked handler tasks. Now logs warnings via warn!().
- Extract effective_concurrency() helper to deduplicate the
  concurrency-forcing logic (was copied in blocking client, async
  client, and test). Test now calls the actual helper instead of
  reimplementing the logic inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_standalone_large_payload_integrity: 4KB payloads with
  recognizable byte pattern, server echoes back, client verifies
  content byte-for-byte to catch corruption
- test_handle_client_connection_filters_canary: verifies warmup
  canary messages (id=u64::MAX) are excluded from one-way metrics
- test_handle_client_connection_mixed_message_types: interleaved
  OneWay and Request messages on a single connection, verifies
  correct metrics recording and response dispatch
- test_aggregate_and_print_multiple_collectors: aggregation across
  2 collectors with different latency distributions
- test_effective_concurrency_all_mechanisms: covers UDS, PMQ, SHM,
  TCP, and concurrency=1 edge case

Total binary tests: 40.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accepted TCP/UDS streams inherit non-blocking mode from the
listener (set for the accept poll loop). The handler threads need
blocking mode for the transport's read_exact/write_all operations.

This is the blocking-server equivalent of the async into_std fix
in commit 8723429. Without this fix, standalone server handlers
immediately disconnect from clients.

Applies to both run_standalone_server_blocking_multi_accept_tcp
and _uds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
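The fix this commit describes can be sketched over a std TcpListener poll loop. This is illustrative (helper name and poll interval are assumptions): the key line restores blocking mode on the accepted stream, which otherwise inherits non-blocking mode from the listener.

```rust
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

// Non-blocking accept poll loop that returns the first accepted
// connection. Accepted streams inherit non-blocking mode from the
// listener, so they must be switched back to blocking before the
// handler's read_exact/write_all style I/O.
fn accept_one(listener: &TcpListener) -> std::io::Result<TcpStream> {
    listener.set_nonblocking(true)?;
    loop {
        match listener.accept() {
            Ok((stream, _addr)) => {
                // The fix: restore blocking mode on the accepted stream.
                stream.set_nonblocking(false)?;
                return Ok(stream);
            }
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {
                // No pending connection yet; sleep for the poll interval.
                std::thread::sleep(Duration::from_millis(5));
            }
            Err(e) => return Err(e),
        }
    }
}
```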
- Fix grace period bug: reset timer on every new connection, not just
  the first. Prevents premature server exit between one-way and
  round-trip test phases when using concurrency > 1. Applied to all
  four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set,
  suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner())
  instead of unwrap() on mutex locks. If a handler thread panics while
  holding the lock, other threads can still push their metrics instead
  of cascade-panicking.
- Add defensive --shm-direct guard in standalone server and client:
  returns error if --shm-direct is used without --blocking. This is
  normally enforced by main() but the guard protects against future
  refactoring that might change the dispatch order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix grace period bug: reset timer on every new connection, not just
  the first. Prevents premature server exit between one-way and
  round-trip test phases when using concurrency > 1. Applied to all
  four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set,
  suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner())
  instead of unwrap() on mutex locks in handler threads.
- Add defensive --shm-direct guard in standalone server and client.
- Add socket buffer tuning (recv/send buffer sizes) to multi-accept
  TCP servers to match normal transport behavior.
- Fix integer division remainder: last worker now receives any extra
  messages when msg_count is not evenly divisible by concurrency.
- Document empty response payloads as intentional design matching
  existing benchmark runner behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add receive_blocking_timed() to the BlockingTransport trait that
captures a monotonic timestamp after raw bytes are read but before
bincode deserialization. This excludes deserialization overhead from
one-way latency measurements.

- Add default implementation on BlockingTransport trait (backward
  compatible -- captures timestamp after full receive)
- Override in TCP, UDS, and SHM blocking transports to place
  timestamp between raw I/O read and deserialization
- SHM-direct uses default (no bincode deserialization to exclude)
- Update handle_client_connection and standalone single-client
  server to use receive_blocking_timed

Impact is most visible with large payloads where deserialization
is non-trivial. 64KB one-way TCP test shows min latency dropped
from 41us (post-deserialize) to 14us (pre-deserialize), a ~27us
improvement representing the bincode deserialization time excluded
from measurement. Mean dropped 28% (73us to 52us) and P99 dropped
14% (132us to 113us).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
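The trait-default pattern this commit describes can be sketched as follows. The Message and transport types here are illustrative stand-ins: the default implementation stamps after the full receive (backward compatible), and transports can override it to stamp between raw I/O and deserialization.

```rust
use std::time::Instant;

// Illustrative stand-in for the crate's message type.
struct Message(u64);

trait BlockingTransport {
    fn receive_blocking(&mut self) -> std::io::Result<Message>;

    // Default: timestamp captured after the complete receive, which
    // includes deserialization time. Overriders move the timestamp
    // earlier, between the raw read and deserialization.
    fn receive_blocking_timed(&mut self) -> std::io::Result<(Message, Instant)> {
        let msg = self.receive_blocking()?;
        Ok((msg, Instant::now()))
    }
}

// Minimal implementor relying on the default timed method.
struct FakeTransport;

impl BlockingTransport for FakeTransport {
    fn receive_blocking(&mut self) -> std::io::Result<Message> {
        Ok(Message(1))
    }
}
```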
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter
to MessageLatencyRecord::new(). Update all 4 call sites in standalone
client to capture wall-clock timestamp at send time and pass it as
the 6th argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sberg-rh sberg-rh force-pushed the feat/standalone-client-server branch from f320d06 to 3e17727 Compare April 14, 2026 18:50

sberg-rh and others added 2 commits April 15, 2026 11:28
Move ~3100 lines of standalone client/server code from the binary
crate (main.rs) into two new library modules, following the existing
flat-file convention (benchmark.rs/benchmark_blocking.rs pattern).

Structure:
- standalone_server.rs (1982 lines): constants, shared helpers,
  server dispatch, multi-accept TCP/UDS, async server paths
- standalone_client.rs (1146 lines): retry helpers, client dispatch,
  single/concurrent blocking and async paths
- main.rs reduced from ~4200 to ~1120 lines (thin dispatch layer)

Additional changes:
- Promote logging.rs from binary-private to library-public module
- Move set_affinity() to utils.rs as pub function
- All standalone functions now pub for tarpaulin coverage measurement
  and integration test access

No behavioral changes. All 374 tests pass. Benchmark comparison
across 3 runs confirms no performance regression (mean latencies
within 2-5% run-to-run variance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Socket configuration failures (set_nonblocking, set_nodelay) in
  multi-accept servers now log a warning and skip the bad connection
  instead of crashing the entire server with ?
- Thread join panics in blocking multi-accept servers now logged
  with warn! instead of silently dropped with let _ =
- Streaming latency record failures in client now logged with
  debug! instead of silently swallowed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Collaborator

@dustinblack dustinblack left a comment


Thanks for the quick turnaround, Shawn — I can see the grace period, quiet flag, mutex safety, shm-direct guard, socket buffer tuning, and remainder message fixes all made it into the current code. Well done.

One remaining issue: coverage. Now that the standalone logic lives in library modules, tarpaulin can instrument it — so the 1.29% (standalone_client.rs) and 11.59% (standalone_server.rs) coverage isn't a tarpaulin limitation anymore. The helper functions are well-tested (build_standalone_transport_config, dispatch_server_message, connect_blocking_with_retry, handle_client_connection, aggregate_and_print_server_metrics, effective_concurrency), but none of the orchestration functions are called from tests:

  • run_standalone_server_blocking / _single / _multi_accept_tcp / _multi_accept_uds
  • run_standalone_client_blocking_single / _concurrent
  • The async variants of all of the above

These contain the warmup logic, duration mode, metrics finalization, results output, concurrent worker spawning, and the grace period accept loop. Adding a few integration tests that call these entry points end-to-end (e.g., spawn run_standalone_server_blocking_single in a thread, run run_standalone_client_blocking_single against it with a small message count) should bring coverage up substantially and would give us confidence in the orchestration code — especially important given the AI-assisted development workflow on this project.

One minor note: response messages always use empty payloads (Vec::new()) regardless of request size, making round-trip measurements asymmetric. If that's intentional, a brief comment would help.

Server tests (standalone_server.rs, 82.7% coverage):
- test_multi_accept_tcp_server_direct: exercises multi-accept TCP directly
- test_single_server_direct: exercises blocking single-client directly
- test_server_blocking_dispatch: exercises dispatch logic
- test_server_blocking_dispatch_uds: exercises UDS dispatch branch
- test_async_multi_accept_tcp_full: exercises async multi-accept TCP
- test_async_single_server_path: exercises async single-client
- test_async_single_server_one_way_metrics: async one-way metrics
- test_async_multi_accept_uds_full: exercises async UDS multi-accept
- test_multi_accept_server_with_delayed_client: slow sender resilience
- test_multi_accept_server_duration_one_way: duration mode with multi-accept
- test_async_multi_accept_server_duration_one_way: async duration mode
- test_handle_client_connection_send_failure: client disconnect error path
- test_single_server_client_disconnect: single server send error path
- test_multi_accept_server_survives_bad_client: garbage input resilience
- test_handle_client_connection_garbage_input: deserialization error path
- test_run_standalone_server_full_dispatch: full entry point dispatch
- test_run_standalone_server_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_server_rejects_shm_direct: shm-direct guard
- test_run_standalone_server_verbose: -vv logging level branches
- test_aggregate_server_metrics_from_handlers: real handler data
- test_print_server_one_way_latency_with_data/zero: print paths

Client tests (standalone_client.rs, 86.3% coverage):
- test_client_blocking_tcp_round_trip/one_way: single client paths
- test_client_blocking_tcp_duration_round_trip/one_way: duration mode
- test_client_blocking_tcp_concurrent_round_trip/one_way: concurrent
- test_client_async_single_round_trip/one_way: async single
- test_client_async_duration_round_trip/one_way: async duration
- test_client_async_concurrent_round_trip/one_way: async concurrent
- test_client_blocking_with_send_delay: send_delay round-trip branch
- test_client_blocking_one_way_with_send_delay: send_delay one-way branch
- test_client_blocking_with_streaming_output: JSON streaming
- test_client_blocking_combined_streaming: combined mode streaming
- test_client_blocking_csv_streaming: CSV streaming
- test_client_blocking_concurrent_duration_one_way: concurrent duration
- test_client_async_concurrent_duration_one_way: async concurrent duration
- test_run_standalone_client_full_dispatch: full entry point dispatch
- test_run_standalone_client_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_client_rejects_shm_direct: shm-direct guard
- test_connect_async_with_retry_succeeds: async retry path

Also: changed tracing .init() to .try_init() with eprintln fallback
in both server and client for test compatibility.

Coverage: standalone_server 82.7%, standalone_client 86.3%, combined 84.8%
Total lib tests: 355.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

This comment was marked as outdated.

Collaborator

@dustinblack dustinblack left a comment

Coverage looks great — 83.44% changed-line coverage, with standalone_client.rs at 86% and standalone_server.rs at 84%. Nice work addressing all the review items.

Code-wise this is ready. Before I approve, I want to validate with hands-on testing:

  1. Existing benchmark scenarios (non-standalone) still work as expected — regression check
  2. Standalone --server / --client mode on the same host
  3. Standalone mode across container boundaries (container-to-host, container-to-container)

I'll follow up once testing is complete.

Reduce the non-blocking accept loop sleep from 50ms to 5ms in both
TCP and UDS multi-accept servers. This cuts connection acceptance
latency by 10x with no portability concerns.

Discovered during hands-on validation testing of standalone concurrent
mode, where the 50ms polling interval was the primary contributor to
elevated tail latency under multi-client workloads.

Improvement with -c 4 concurrent clients:
- RT P95: -46% (65.9us -> 35.5us)
- RT P99: -49% (91.4us -> 46.9us)
- Throughput: +66% (94.9 -> 157.1 MB/s)

Single-client workloads also benefit from faster initial connection
acceptance (P99 improved 4-7% across all test modes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sberg-rh sberg-rh force-pushed the feat/standalone-client-server branch from 01e7e96 to 15f9c92 Compare April 16, 2026 19:25
@sberg-rh
Author

@dustinblack just a heads-up: I found a performance issue during some hands-on testing and pushed a small fix for it, in case you're doing your own hands-on testing :)

@github-actions

This comment was marked as outdated.

@sberg-rh
Author

I did some validation testing of my own before moving on to the more in-depth customer testing; here are my results:

PR #110 Validation Testing Summary

Scenario 1: Regression Check (PR #110 vs Main)

Method: 100 paired runs, TCP Async/UDS Async/TCP Blocking, 5,000 messages each, alternating branch order.

| Test | main | PR #110 | Delta | 95% CI | Statistically significant? |
|------|------|---------|-------|--------|----------------------------|
| TCP Async RT Mean | 23.20 us | 23.48 us | +1.2% | [+0.16, +0.39] us | Detectable but negligible (0.3 us) |
| TCP Async RT P99 | 32.07 us | 32.26 us | +0.6% | [+0.04, +0.34] us | Detectable but negligible |
| UDS Async RT Mean | 12.69 us | 12.73 us | +0.3% | [-0.11, +0.20] us | No |
| UDS Async RT P99 | 17.95 us | 17.92 us | -0.2% | [-0.22, +0.16] us | No |
| TCP Blocking RT Mean | 17.69 us | 17.69 us | -0.0% | [-0.12, +0.11] us | No |
| TCP Blocking RT P99 | 24.12 us | 24.06 us | -0.2% | [-0.44, +0.32] us | No |

Verdict: No actionable regression. Outlier rates are comparable across branches (42 vs 37 mean, 21 vs 25 P99).


Scenario 2: Standalone Same-Host

Functional tests: All 6 pass (TCP/UDS blocking, TCP async, duration mode, concurrency, output files).

Standalone c=1 vs Internal c=1 (25 runs):

| Metric | Internal | Standalone | Delta |
|--------|----------|------------|-------|
| RT Mean | 17.34 us | 16.82 us | -3.0% (standalone faster) |
| RT P99 | 23.92 us | 22.23 us | -7.1% (standalone faster) |

Standalone won 24/25 runs.

Standalone is measurably faster than internal mode due to avoiding child process spawn and pipe coordination overhead.

Concurrency note: Internal mode's -c flag is a no-op (workers run sequentially in both async and blocking paths). Standalone mode is the first real concurrency implementation in the codebase.

Accept loop optimization: Discovered and fixed a performance bottleneck -- reduced multi-accept polling from 50ms to 5ms. Impact with -c 4: RT P99 improved 49%, throughput improved 66%. Committed as 15f9c92a.


Scenario 3: Cross-Container (Podman)

Functional tests: All 3 pass.

| Test | Topology | RT Mean | RT P99 |
|------|----------|---------|--------|
| 3a | Host server, container client | 164.43 us | 225.79 us |
| 3b | Container server, host client | 155.14 us | 208.90 us |
| 3c | Container-to-container | 40.68 us | 50.53 us |

Container-to-container is ~4x faster than cross-VM tests because both containers share the Podman VM's network stack.

Parity tests:

| Test | Comparison | Mean Delta | Interpretation |
|------|------------|------------|----------------|
| P1 | Container vs host (internal mode) | +108.7% | Expected -- macOS Podman VM overhead |
| P2 | Container standalone vs host standalone | +145.9% | Consistent with P1 + network bridge hop |
| P3 | PR #110 container vs main container | +0.6% | No regression between branches |

The ~2x container overhead is a macOS-specific artifact from the Hypervisor VM layer. On Linux, container overhead is near-zero since containers share the host kernel's network stack.


Overall Conclusion

  • No performance regressions from PR #110 across any scenario
  • Standalone mode is slightly faster than internal mode for equivalent workloads
  • Cross-container communication works correctly across all topologies
  • One optimization shipped: Accept loop polling interval reduced from 50ms to 5ms, significantly improving concurrent client tail latency

@mcurrier2
Collaborator

PR #110 Review — New Issues Found

Branch: feat/standalone-client-server (HEAD: 15f9c92)
Reviewer: Matt Currier (AI-assisted)
Date: 2026-04-22


1. Infinite server hang when no client connects

Severity: Major
Files: src/standalone_server.rs lines 354–407, 466–507, 707–775,
819–877

In all four multi-accept paths (blocking TCP/UDS, async TCP/UDS), the
server uses a grace period that only starts when first_client_time is
set — which requires a successful accept(). If no client ever
connects, the grace timer never starts, and the loop never exits.

// standalone_server.rs:354
let mut first_client_time: Option<std::time::Instant> = None;
// ...
loop {
    // ... accept sets first_client_time on success ...
    // Line 405-406:
    let grace_elapsed = first_client_time
        .is_some_and(|t| t.elapsed() > accept_grace_period);
    if grace_elapsed && !handles.is_empty()
        && handles.iter().all(|h| h.is_finished())
    {
        break;  // never reached if no client connects
    }
}

This affects all four multi-accept functions:

  • run_multi_accept_tcp_blocking (line 354)
  • run_multi_accept_uds_blocking (line 466)
  • run_multi_accept_tcp_async (line 707)
  • run_multi_accept_uds_async (line 819)

Suggestion: Add an overall idle timeout (e.g. 60s with no
connections) or an optional --server-timeout flag so standalone
servers don't hang indefinitely when no client appears.
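The suggested exit condition can be sketched as a small std-only predicate; `idle_timeout` is a hypothetical knob (e.g. a `--server-timeout` flag), not an existing option, and the parameter names are stand-ins for the loop's locals:

```rust
use std::time::{Duration, Instant};

// Sketch of the suggested fix: keep the existing grace-period check once a
// client has connected, and add an idle timeout for the case where no
// client ever connects, so the loop always has an exit path.
fn should_exit(
    server_start: Instant,
    first_client_time: Option<Instant>,
    accept_grace_period: Duration,
    idle_timeout: Duration,
    any_handlers: bool,
    all_handlers_finished: bool,
) -> bool {
    match first_client_time {
        // Existing condition: grace period after the first client elapsed
        // and every spawned handler has finished.
        Some(t) => t.elapsed() > accept_grace_period && any_handlers && all_handlers_finished,
        // New condition: no client ever connected within the idle timeout.
        None => server_start.elapsed() > idle_timeout,
    }
}

fn main() {
    let start = Instant::now();
    std::thread::sleep(Duration::from_millis(5));
    // No client ever connected and the idle timeout has passed: exit.
    assert!(should_exit(start, None, Duration::from_secs(5), Duration::from_millis(1), false, false));
    // No client yet, but still within the idle timeout: keep waiting.
    assert!(!should_exit(Instant::now(), None, Duration::from_secs(5), Duration::from_secs(60), false, false));
}
```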


2. Swallowed canary and shutdown send errors

Severity: Major
File: src/standalone_client.rs

Canary sends (warmup discard messages) and shutdown sends silently
discard errors via let _ = transport.send_*(). If the canary fails,
measurement starts on a broken connection with no warning.

Production occurrences (non-test code):

| Line | Context |
|------|---------|
| 246 | `let _ = transport.send_blocking(&canary);` (one-way canary) |
| 307 | `let _ = transport.receive_blocking();` (round-trip canary response) |
| 395 | `let _ = transport.send_blocking(&shutdown);` (shutdown) |
| 459 | `let _ = transport.send_blocking(&canary);` (concurrent one-way canary) |
| 490 | `let _ = transport.send_blocking(&shutdown);` (concurrent shutdown) |
| 543 | `let _ = transport.receive_blocking();` (concurrent RT canary response) |
| 581 | `let _ = transport.send_blocking(&shutdown);` (concurrent RT shutdown) |
| 739 | `let _ = transport.send(&canary).await;` (async one-way canary) |
| 800 | `let _ = transport.receive().await;` (async RT canary response) |
| 888 | `let _ = transport.send(&shutdown).await;` (async shutdown) |
| 890 | `let _ = transport.close().await;` (async close; also swallowed) |
| 948 | `let _ = transport.send(&canary).await;` (async concurrent canary) |
| 979 | `let _ = transport.send(&shutdown).await;` (async concurrent shutdown) |
| 980 | `let _ = transport.close().await;` (async concurrent close) |
| 1062 | `let _ = transport.send(&shutdown).await;` (async concurrent RT shutdown) |
| 1063 | `let _ = transport.close().await;` (async concurrent RT close) |

Suggestion: At minimum, log canary failures at warn! level. For
shutdown sends, debug! is acceptable since the server will exit on
disconnect anyway. Canary failures should either warn or abort the
measurement.
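The suggested error handling can be sketched with std-only wrappers; `send` stands in for the transport's send calls, and the real code would use tracing's `warn!`/`debug!` macros rather than `eprintln!`:

```rust
// Sketch of the suggested handling: surface canary failures instead of
// discarding them with `let _ = ...`.
fn send_canary<E: std::fmt::Display>(send: impl FnOnce() -> Result<(), E>) -> Result<(), E> {
    let res = send();
    if let Err(e) = &res {
        // A failed canary means measurement would start on a broken
        // connection, so this should at least warn (or abort).
        eprintln!("warn: canary send failed: {e}");
    }
    res
}

fn send_shutdown<E: std::fmt::Display>(send: impl FnOnce() -> Result<(), E>) {
    // Shutdown failures are benign (the server exits on disconnect anyway),
    // so debug-level logging is enough and the error is not propagated.
    if let Err(e) = send() {
        eprintln!("debug: shutdown send failed: {e}");
    }
}

fn main() {
    assert!(send_canary(|| Ok::<(), String>(())).is_ok());
    assert!(send_canary(|| Err::<(), String>("broken pipe".into())).is_err());
    send_shutdown(|| Err::<(), String>("already closed".into())); // logged, not fatal
}
```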


3. Concurrent mode does not stream per-message records

Severity: Major (silent data loss for users)
File: src/standalone_client.rs lines 429–598, 920–1065

Single-connection paths call results_manager.stream_latency_record()
on every message (lines 342, 377, 835, 870). However, the concurrent
worker paths never call stream_latency_record — workers use a
bare MetricsCollector in their thread/task closures without access to
the ResultsManager.

// standalone_client.rs:449-487 (concurrent one-way worker)
std::thread::spawn(move || -> Result<PerformanceMetrics> {
    let mut metrics = MetricsCollector::new(None, percentiles)?;
    // ...
    metrics.record_message(message_size, None)?;
    // ^^^ no stream_latency_record call anywhere in worker
    // ...
    Ok(metrics.get_metrics())
})

Same gap in all four concurrent paths:

  • Blocking one-way workers (line 449–487)
  • Blocking round-trip workers (line 531–584)
  • Async one-way workers (line 935–982)
  • Async round-trip workers (line 1016–1065)

Suggestion: Either pass an Arc<Mutex<BlockingResultsManager>> to
workers, or document that --streaming is not supported with
--concurrency > 1.
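The first option can be sketched with std threads and a shared sink; `LatencyRecord` is a hypothetical stand-in for `MessageLatencyRecord`, and in the real code the sink would wrap the `BlockingResultsManager` and call `stream_latency_record` per entry:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the per-message record type.
#[derive(Debug, Clone)]
struct LatencyRecord {
    worker: usize,
    seq: u64,
    latency_ns: u64,
}

// Sketch: workers push records into a shared sink behind Arc<Mutex<...>>,
// so per-message data is no longer dropped in concurrent mode.
fn run_workers(n_workers: usize, msgs_per_worker: u64) -> Vec<LatencyRecord> {
    let sink: Arc<Mutex<Vec<LatencyRecord>>> = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..n_workers)
        .map(|worker| {
            let sink = Arc::clone(&sink);
            thread::spawn(move || {
                for seq in 0..msgs_per_worker {
                    // Stand-in for stream_latency_record(...).
                    sink.lock().unwrap().push(LatencyRecord { worker, seq, latency_ns: 1_000 });
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(sink).unwrap().into_inner().unwrap()
}

fn main() {
    let records = run_workers(4, 25);
    assert_eq!(records.len(), 100); // no records lost across workers
}
```

The lock is only contended once per message; batching records per worker and draining after `join` would avoid even that.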


4. Extra mechanisms silently ignored

Severity: Minor
Files: src/standalone_server.rs line 145,
src/standalone_client.rs line 38

// standalone_server.rs:145
let mechanism = match args.mechanisms.first() {
    Some(&m) => m,
    None => return Err(anyhow::anyhow!("No IPC mechanism specified")),
};

If a user runs --server -m tcp uds, only tcp is used and uds is
silently dropped. The -m all case is caught later with a specific
error, but -m tcp uds gives no indication that the second mechanism
was ignored.

Suggestion: Add a warning or error when mechanisms.len() > 1 in
standalone mode.
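A minimal sketch of that validation, assuming rejection rather than a warning (mechanism names are plain strings here for illustration):

```rust
// Sketch of the suggested check: reject more than one mechanism in
// standalone mode instead of silently using only the first.
fn select_mechanism(mechanisms: &[&str]) -> Result<String, String> {
    match mechanisms {
        [] => Err("No IPC mechanism specified".to_string()),
        [only] => Ok((*only).to_string()),
        many => Err(format!(
            "Standalone mode supports exactly one mechanism, got {}: {:?}",
            many.len(),
            many
        )),
    }
}

fn main() {
    assert_eq!(select_mechanism(&["tcp"]).unwrap(), "tcp");
    assert!(select_mechanism(&["tcp", "uds"]).is_err()); // no longer silently drops uds
    assert!(select_mechanism(&[]).is_err());
}
```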


5. Server receive errors silently exit the loop

Severity: Minor
File: src/standalone_server.rs lines 252, 290

// standalone_server.rs:252
while let Ok((message, receive_time_ns)) =
    transport.receive_blocking_timed()
{
    // Any Err exits this loop with no warning/info log
}

Unlike the async single-server path which logs receive failures, the
blocking handle_client_connection and handle_client_connection_multi
functions exit their receive loops silently on any error. A transport
failure is indistinguishable from a clean client disconnect in the logs.

Suggestion: Add at least a debug! log on the receive error before
exiting the loop, similar to the async paths.
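The suggested change amounts to replacing `while let Ok(...)` with an explicit match; `recv` stands in for `transport.receive_blocking_timed()`, and `eprintln!` stands in for `debug!`:

```rust
// Sketch: the terminating error is now logged before the loop exits, so a
// transport failure is distinguishable from a clean disconnect in the logs.
fn receive_loop<E: std::fmt::Display>(mut recv: impl FnMut() -> Result<(u64, u64), E>) -> u64 {
    let mut received = 0;
    loop {
        match recv() {
            Ok((_message, _receive_time_ns)) => received += 1,
            Err(e) => {
                eprintln!("debug: receive loop exiting: {e}");
                break;
            }
        }
    }
    received
}

fn main() {
    let mut n: u64 = 0;
    let count = receive_loop(move || {
        n += 1;
        if n <= 3 { Ok((n, 0)) } else { Err("peer disconnected".to_string()) }
    });
    assert_eq!(count, 3); // three messages received before the logged exit
}
```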


6. Concurrent mode skips warmup iterations

Severity: Minor (metadata mismatch)
File: src/standalone_client.rs lines 408–631, 899–1089

Single-connection paths run an explicit warmup loop using
config.warmup_iterations (blocking line 225, async line 718):

// standalone_client.rs:224-236 (single-connection blocking)
// Warmup
for _ in 0..config.warmup_iterations {
    let msg_type = if config.round_trip {
        MessageType::Request
    } else {
        MessageType::OneWay
    };
    let msg = Message::new(u64::MAX, payload.clone(), msg_type);
    transport.send_blocking(&msg)?;
    if config.round_trip {
        transport.receive_blocking()?;
    }
}
info!("Warmup complete ({} iterations)", config.warmup_iterations);

Concurrent paths (run_standalone_client_blocking_concurrent line 408,
run_standalone_client_async_concurrent line 899) never run this
warmup loop. Workers only optionally send a single canary when
!include_first_message. However, BenchmarkResults::new() still
receives config.warmup_iterations (lines 611, 1086), so the reported
metadata claims warmup was performed when it was not.

Suggestion: Either run warmup per-worker in concurrent mode, or
pass warmup_iterations: 0 to BenchmarkResults::new() for
concurrent runs.
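The first option can be sketched as a per-worker helper mirroring the single-connection warmup loop; `send`/`recv` stand in for the worker's transport calls:

```rust
// Sketch: run the same warmup inside each concurrent worker before
// measurement starts, so the reported warmup_iterations metadata is honest.
fn worker_warmup<E>(
    warmup_iterations: u32,
    round_trip: bool,
    mut send: impl FnMut() -> Result<(), E>,
    mut recv: impl FnMut() -> Result<(), E>,
) -> Result<u32, E> {
    let mut completed = 0;
    for _ in 0..warmup_iterations {
        send()?; // warmup message (sequence u64::MAX in the real code)
        if round_trip {
            recv()?; // drain the echoed response
        }
        completed += 1;
    }
    Ok(completed)
}

fn main() {
    let done = worker_warmup(5, true, || Ok::<(), ()>(()), || Ok(())).unwrap();
    assert_eq!(done, 5);
    // The alternative fix: pass warmup_iterations = 0 to
    // BenchmarkResults::new() when no warmup actually ran.
}
```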


7. Two-phase reconnection in concurrent mode

Severity: Minor (design consideration)
File: src/standalone_client.rs lines 429–598

When both one_way and round_trip are enabled with concurrency > 1,
the client runs two sequential phases: all N workers connect, run
one-way, send shutdown, and disconnect; then all N workers reconnect
for round-trip, run it, send shutdown, and disconnect again. This means:

  • 2× connection establishment overhead
  • 2× shutdown signal processing per worker
  • Server must accept 2N total connections

This is inherent to the current design and may be intentional, but
it's worth documenting for users who may not expect the double
connection churn.


8. No signal handling — resource leak on kill

Severity: Minor (operational)
File: src/standalone_server.rs (entire module)

The standalone server has no SIGINT/SIGTERM handler. Shutdown
relies solely on the client sending a Shutdown message or
disconnecting. If the server is killed (kill, Ctrl+C), the normal
cleanup paths (close_blocking(), SHM shm_unlink, PMQ mq_unlink,
UDS socket file removal) are skipped.

This can leave stale resources:

  • /dev/shm/ipc_benchmark_shm (shared memory segment)
  • /dev/mqueue/ipc_benchmark_pmq (POSIX message queue)
  • /tmp/ipc_benchmark.sock (UDS socket file)

Users must manually clean up after a killed server.

Suggestion: Document this limitation, or add a ctrlc handler
that sets an AtomicBool shutdown flag checked in the accept loop.
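The flag-based handler can be sketched with std atomics alone; the real wiring would pass `request_shutdown` to `ctrlc::set_handler`, and here the signal is simulated by calling it directly:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Sketch: a shared flag flipped on SIGINT/SIGTERM and checked every
// accept-loop iteration, so the normal cleanup paths (close_blocking,
// shm_unlink/mq_unlink, socket-file removal) still run on Ctrl+C.
fn install_flag() -> Arc<AtomicBool> {
    Arc::new(AtomicBool::new(false))
}

fn request_shutdown(flag: &AtomicBool) {
    flag.store(true, Ordering::SeqCst);
}

fn shutdown_requested(flag: &AtomicBool) -> bool {
    flag.load(Ordering::SeqCst)
}

fn main() {
    let flag = install_flag();
    assert!(!shutdown_requested(&flag)); // accept loop keeps running
    request_shutdown(&flag);             // simulated Ctrl+C
    assert!(shutdown_requested(&flag));  // loop breaks and cleanup runs
}
```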


9. PMQ has no receive_blocking_timed override

Severity: Minor (measurement methodology)
File: src/ipc/posix_message_queue_blocking.rs (unchanged file)
Related: src/ipc/mod.rs (new trait method added on this branch)

This branch adds the receive_blocking_timed() trait method to
BlockingTransport (in ipc/mod.rs) and adds overrides for TCP,
UDS, SHM, and SHM-direct — but not for POSIX Message Queue.
The PMQ file was not modified on this branch.

PMQ falls back to the trait default, which timestamps after
receive_blocking() returns (including deserialization):

// src/ipc/mod.rs:1195-1198 (trait default, new on this branch)
fn receive_blocking_timed(&mut self) -> Result<(Message, u64)> {
    let msg = self.receive_blocking()?;
    Ok((msg, get_monotonic_time_ns()))
}

The other four transports timestamp between raw I/O and
deserialization. This makes standalone server one-way latency for
PMQ include deserialization time while TCP/UDS/SHM do not.

Note: This is a gap in the scope of this branch's changes
(4 of 5 transports got overrides), not a bug in pre-existing code.

Suggestion: Add a receive_blocking_timed override to
BlockingPosixMessageQueue for consistent methodology.
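The methodology gap can be illustrated with a simulated clock; the function names here are stand-ins, not the crate's real `BlockingTransport` API:

```rust
// Simulated decode step: each clock() call represents elapsed time.
fn deserialize(raw: Vec<u8>, clock: &mut impl FnMut() -> u64) -> String {
    let _ = clock(); // decoding takes (simulated) time
    String::from_utf8(raw).unwrap_or_default()
}

// Current PMQ behavior (trait default): timestamp taken after decode.
fn receive_timed_default(clock: &mut impl FnMut() -> u64) -> (String, u64) {
    let _ = clock(); // raw mq_receive takes (simulated) time
    let msg = deserialize(b"msg".to_vec(), clock);
    (msg, clock())
}

// Suggested override: timestamp between raw I/O and decode, matching the
// TCP/UDS/SHM overrides added on this branch.
fn receive_timed_override(clock: &mut impl FnMut() -> u64) -> (String, u64) {
    let _ = clock(); // raw mq_receive
    let t = clock(); // timestamp here excludes deserialization
    (deserialize(b"msg".to_vec(), clock), t)
}

fn main() {
    let mut t1 = 0u64;
    let (_, ts_default) = receive_timed_default(&mut || { t1 += 1; t1 });
    let mut t2 = 0u64;
    let (_, ts_override) = receive_timed_override(&mut || { t2 += 1; t2 });
    // The default's timestamp lands one "tick" later: it includes decode.
    assert_eq!(ts_default, 3);
    assert_eq!(ts_override, 2);
}
```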


10. No integration tests for standalone mode

Severity: Minor (coverage gap)
File: tests/ directory

The tests/ directory contains no references to standalone,
--server, or --client. All standalone test coverage is via
in-module unit tests in standalone_server.rs and
standalone_client.rs. There are no end-to-end integration tests
that spawn separate server and client processes (the actual use case).

In particular, SHM and PMQ standalone paths have no happy-path
test coverage beyond effective_concurrency unit tests.

Suggestion: Add at least one integration test per mechanism that
spawns --server in a background process and connects with --client.


11. Test comment contradicts assertion

Severity: Nit (but confusing for reviewers)
File: src/standalone_server.rs lines 2793–2797

// standalone_server.rs:2793-2797
// Should return an error (deserialization failure) but not panic
let result = handle_client_connection(transport.as_mut(), &cfg);
// The receive loop exits on error, returning the collector
// (which may have 0 messages recorded)
assert!(result.is_ok());

The comment says "should return an error" but the assertion checks
is_ok(). The actual behavior is correct (the function returns
Ok(MetricsCollector) even when receive fails because it returns the
collector), but the comment is misleading.

Suggestion: Fix the comment to: "Receive loop exits on error and
returns an Ok(collector) with 0 messages recorded."


Summary

| # | Issue | Severity | Action |
|---|-------|----------|--------|
| 1 | Infinite server hang with no clients | Major | Add idle timeout |
| 2 | Swallowed canary/shutdown send errors | Major | Log canary failures |
| 3 | Concurrent streaming not wired | Major | Wire or document |
| 4 | Extra mechanisms silently ignored | Minor | Warn if len > 1 |
| 5 | Silent receive error exit | Minor | Add debug log |
| 6 | Concurrent mode skips warmup | Minor | Fix metadata or add warmup |
| 7 | Two-phase reconnection in concurrent | Minor | Document |
| 8 | No signal handling; resource leak on kill | Minor | Document or add handler |
| 9 | PMQ missing receive_blocking_timed | Minor | Add override for fairness |
| 10 | No integration tests for standalone | Minor | Add E2E tests |
| 11 | Test comment contradicts assertion | Nit | Fix comment |

sberg-rh and others added 5 commits April 23, 2026 13:49
- Add 60s idle timeout to prevent server hang when no client connects
- Log canary/shutdown send errors (warn for canary, debug for shutdown)
- Wire concurrent streaming: workers collect MessageLatencyRecord and
  stream through ResultsManager after completion
- Reject multiple mechanisms in standalone mode with clear error
- Add debug logging on server receive errors for diagnostics
- Add per-worker warmup in concurrent mode (fixes metadata mismatch)
- Document two-phase reconnection behavior in concurrent mode
- Add SIGINT/SIGTERM signal handler via ctrlc crate for graceful
  server shutdown and resource cleanup
- Add receive_blocking_timed override for PMQ transport to exclude
  deserialization from one-way latency measurement
- Add 11 integration tests spawning separate server/client processes
- Fix misleading test comment on garbage input handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop ctrlc "termination" feature (requires Rust 1.75+ due to nix
  static zeroed); SIGINT still handled, SIGTERM uses default OS behavior
- Remove unused BlockingTcpSocket import in canary failure test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ctrlc 3.5+ uses std::mem::zeroed() in statics which requires Rust 1.75+.
Version 3.4.5 does not have this issue and supports Rust 1.69+.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests used "tcp uds" but UDS is not a valid mechanism on Windows,
causing clap to reject the args before reaching our validation code.
Changed to "tcp shm" which exists on all platforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test hardcoded /tmp/ which doesn't exist on Windows. Use
std::env::temp_dir() for cross-platform compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

📈 Changed lines coverage: 80.40% (1042/1296)

🚨 Uncovered lines in this PR

  • src/ipc/mod.rs: 1198-1200
  • src/ipc/posix_message_queue_blocking.rs: 474-475, 477-478, 484, 486-491, 493-495, 497-498, 503, 505-506, 508-509
  • src/ipc/shared_memory_blocking.rs: 759-760, 762-763, 770, 787, 789-790, 792
  • src/ipc/shared_memory_direct.rs: 652, 657-658
  • src/ipc/tcp_socket_blocking.rs: 322-325
  • src/ipc/unix_domain_socket_blocking.rs: 363-366
  • src/main.rs: 794
  • src/standalone_client.rs: 40, 68-69, 79, 84-85, 87, 96-99, 109-111, 113, 117, 133, 154, 255, 275, 319, 360, 364, 395, 413, 496, 514, 525, 532, 597, 600-601, 609-629, 631-632, 634, 658, 665, 681, 733-735, 740-741, 829, 849, 865, 893, 896-897, 934, 938, 969, 973, 987, 994, 1065, 1083, 1094, 1101, 1104, 1161, 1164-1165, 1173-1193, 1195-1196, 1198, 1222, 1229, 1232, 1244
  • src/standalone_server.rs: 138, 161, 187, 204-205, 207, 217-219, 221, 260, 284-285, 308-310, 334-335, 417-418, 421-422, 442-443, 453-454, 457, 459, 461, 471-473, 480, 507, 536-537, 552-553, 563-564, 567, 569, 571, 581-583, 590, 640, 719-720, 738-740, 744-746, 793-795, 799-800, 803-804, 820-821, 828-830, 839, 844-845, 848, 850, 852, 862, 885, 912-914, 918-919, 931-932, 939-941, 950, 955-956, 959, 961, 963, 973
  • src/utils.rs: 239-242

📊 Code Coverage Summary

File Line Coverage Uncovered Lines
src/benchmark.rs 83.64%
(506/605)
75, 78, 89, 93, 102, 105, 107, 124, 424, 429-434, 441-446, 513-516, 621, 705, 711-713, 717-719, 739, 808-810, 815, 836, 841, 859, 965, 969-972, 974, 983-986, 988, 1064, 1095, 1098, 1100-1101, 1110-1111, 1253, 1266, 1283, 1406, 1415, 1417-1418, 1421-1422, 1428-1429, 1434-1435, 1437, 1442-1443, 1447-1449, 1454-1455, 1458-1459, 1491, 1549-1553, 1555-1561, 1563, 1566, 1723, 1738
src/benchmark_blocking.rs 73.50%
(319/434)
97, 111, 127, 263, 369, 375-377, 380-382, 402, 434, 488, 587, 600, 614, 644-647, 732-735, 754, 758, 773, 815-817, 820, 823-825, 827, 830, 832-836, 838-839, 847-851, 853-857, 860-861, 865-866, 901, 950, 1029, 1040, 1070, 1073, 1138-1143, 1145, 1200-1203, 1208, 1221, 1224-1227, 1231, 1233-1236, 1238, 1240-1241, 1243-1244, 1247, 1249-1254, 1256, 1260-1261, 1263, 1265, 1289, 1301-1306, 1308, 1328-1331
src/cli.rs 92.39%
(85/92)
691, 790, 830, 832, 853-855
src/execution_mode.rs 100.00%
(14/14)
(none)
src/ipc/mod.rs 62.67%
(47/75)
115, 425, 427-430, 740-741, 756-757, 775-776, 807, 810, 813, 818, 845-846, 860, 862, 882, 884, 1007-1009, 1198-1200
src/ipc/posix_message_queue.rs 46.09%
(59/128)
139-140, 213-215, 217, 224, 229, 332-335, 337, 345, 437, 441-442, 446, 449-452, 454-458, 539, 679, 782, 789-790, 807-808, 819-820, 831-832, 849-850, 906, 910-911, 914-919, 921-923, 927, 929-931, 933, 935-937, 941-943, 945-947, 994-995, 1017
src/ipc/posix_message_queue_blocking.rs 72.16%
(127/176)
172, 182, 221, 251-255, 274, 325, 368, 387-390, 416-418, 422-423, 425-426, 436, 455, 457-458, 460-461, 474-475, 477-478, 484, 486-491, 493-495, 497-498, 503, 505-506, 508-509
src/ipc/shared_memory.rs 69.36%
(163/235)
61, 141, 145, 246-247, 257-258, 262, 390-391, 417-419, 421, 439-441, 443-444, 446-450, 467, 474, 480, 483-484, 488, 492, 496-497, 502-503, 666-667, 670-671, 674, 676, 681-682, 709-710, 713-714, 721-723, 725, 727-732, 734-735, 738-739, 741-745, 752, 782, 784-785, 787, 791
src/ipc/shared_memory_blocking.rs 75.91%
(208/274)
175, 196-198, 200-201, 204-206, 209-210, 212, 217, 219, 223-225, 230, 238-240, 243-245, 248-249, 251, 254, 257-258, 261-262, 266-267, 269, 273-274, 276, 311-312, 378-379, 403-407, 498, 506, 556, 573, 660, 726, 759-760, 762-763, 770, 787, 789-790, 792, 825, 834, 844, 866
src/ipc/shared_memory_direct.rs 82.42%
(150/182)
372-375, 444-451, 455, 482, 506-509, 513-514, 556-557, 569, 598, 605-606, 629-630, 636, 652, 657-658
src/ipc/tcp_socket.rs 59.43%
(63/106)
31-32, 61, 96, 113-114, 118, 124-125, 129, 136-137, 141, 147-148, 152, 171-172, 175-177, 184-185, 188, 362-363, 366-367, 370-371, 376-377, 422, 429, 447-449, 478, 480-482, 484, 487
src/ipc/tcp_socket_blocking.rs 94.23%
(98/104)
145, 170, 322-325
src/ipc/unix_domain_socket.rs 59.43%
(63/106)
29-30, 58, 93, 103, 122-123, 127, 133-134, 138, 145-146, 150, 156-157, 161, 180-181, 184-186, 193-194, 197, 346-347, 350-351, 354-355, 360-361, 412-414, 443, 445-447, 449, 452, 468
src/ipc/unix_domain_socket_blocking.rs 92.06%
(116/126)
287-288, 294-296, 298, 363-366
src/logging.rs 100.00%
(13/13)
(none)
src/main.rs 45.56%
(159/349)
85-87, 89, 130-131, 141-145, 149-151, 153-154, 156-157, 177-180, 204-208, 216, 222, 225, 230-233, 238-239, 245, 251, 253-255, 257, 263-264, 270, 275, 278-279, 283, 285-286, 290-291, 293, 299, 303-304, 306-311, 313-314, 317, 326, 329-330, 333, 380-383, 390, 392-396, 399-402, 404-405, 407-408, 410, 412-418, 422, 424-427, 430, 434-436, 440, 442, 445, 449, 454-457, 463-464, 470-471, 477, 479-480, 484, 486, 491-493, 497, 500-501, 503-504, 509, 511-513, 517-518, 520, 527, 532-533, 535-540, 542-543, 547, 556, 559-560, 563, 565, 584, 591, 595-597, 599, 627-628, 636, 669, 720, 724, 727-730, 786-787, 794-795, 798, 825-826, 829, 881-882, 886-889, 911, 938, 947, 952, 957-958
src/metrics.rs 84.04%
(158/188)
455-460, 493-494, 552, 558, 579-582, 732-734, 736, 768, 833, 838, 881, 904, 952, 980, 984, 1005, 1007-1008, 1013
src/results.rs 56.85%
(253/445)
726, 735-737, 739-740, 743-744, 747, 769, 772-773, 776, 778, 781, 785-790, 800-801, 804-809, 826, 838-839, 841, 843, 846-847, 849, 853, 880, 904-906, 909-910, 914-916, 919, 945, 950, 955, 961, 980, 982-983, 985, 987-991, 993, 995-996, 1030, 1071-1072, 1075, 1081-1082, 1086, 1090-1092, 1094-1095, 1119-1123, 1126-1129, 1132-1139, 1149-1150, 1169-1170, 1172-1176, 1178, 1195-1196, 1198-1203, 1205, 1223, 1225-1230, 1248, 1251, 1267-1268, 1283-1285, 1287-1289, 1291-1292, 1294-1295, 1297-1298, 1300, 1302-1303, 1305-1308, 1310-1312, 1314-1316, 1319, 1323-1324, 1332-1337, 1339-1340, 1344-1345, 1349-1351, 1353, 1357-1358, 1367-1370, 1374-1376, 1380, 1382, 1385, 1390-1391, 1396, 1403-1407, 1409, 1607-1608, 1828-1829, 1831-1832
src/results_blocking.rs 95.48%
(296/310)
489-490, 492-493, 544, 769, 774, 779, 815, 818-819, 827-828, 886
src/standalone_client.rs 84.49%
(610/722)
40, 68-69, 79, 84-85, 87, 96-99, 109-111, 113, 117, 133, 154, 255, 275, 319, 360, 364, 395, 413, 496, 514, 525, 532, 597, 600-601, 609-629, 631-632, 634, 658, 665, 681, 733-735, 740-741, 829, 849, 865, 893, 896-897, 934, 938, 969, 973, 987, 994, 1065, 1083, 1094, 1101, 1104, 1161, 1164-1165, 1173-1193, 1195-1196, 1198, 1222, 1229, 1232, 1244
src/standalone_server.rs 80.46%
(383/476)
138, 161, 187, 204-205, 207, 217-219, 221, 260, 284-285, 308-310, 334-335, 417-418, 421-422, 442-443, 453-454, 457, 459, 461, 471-473, 480, 507, 536-537, 552-553, 563-564, 567, 569, 571, 581-583, 590, 640, 719-720, 738-740, 744-746, 793-795, 799-800, 803-804, 820-821, 828-830, 839, 844-845, 848, 850, 852, 862, 885, 912-914, 918-919, 931-932, 939-941, 950, 955-956, 959, 961, 963, 973
src/utils.rs 75.00%
(42/56)
143, 147-148, 153, 159, 198-202, 239-242
Total 75.38%
(3932/5216)

@mcurrier2
Collaborator

A few more things to look at, Shawn.

PR Review Findings: feat/standalone-client-server

#gpt-5.3-codex review

Date: 2026-04-23
Repository: /home/mcurrier/auto/work/rusty-comms

Summary

This report now contains only findings that are unique relative to
pr110-review-findings.md in the same directory.

Findings

1) Critical: Server can exit between one-way and round-trip phases

  • In concurrent standalone client mode, one-way workers complete first, then
    round-trip workers are started.
  • Multi-accept server exits when grace period elapses and all handlers are
    finished.
  • Result: server may stop before round-trip workers connect.

Impacted paths:

  • src/standalone_client.rs (sequential one-way then round-trip concurrent phases)
  • src/standalone_server.rs (grace-based shutdown in multi-accept loops)
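One possible fix is for the server to track how many connections it still expects (2N when both one-way and round-trip phases run with N workers) and refuse to exit before that count is met; how the server learns the expected count (e.g. a hypothetical flag or a protocol hint) is an open design question, and the predicate below is only a sketch:

```rust
// Sketch: exit only once the expected number of connections has actually
// been accepted, so the inter-phase gap cannot trigger a premature exit.
fn may_exit(expected_connections: usize, accepted: usize, grace_elapsed: bool, all_finished: bool) -> bool {
    accepted >= expected_connections && grace_elapsed && all_finished
}

fn main() {
    // Between phases: 4 one-way workers done, round-trip not yet connected.
    assert!(!may_exit(8, 4, true, true));
    // After the round-trip workers have also connected and finished.
    assert!(may_exit(8, 8, true, true));
}
```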

2) High: Accept-loop errors are not surfaced as function errors

  • Multi-accept server loops break on non-WouldBlock accept errors.
  • After break, functions continue to join handlers and return Ok(()).
  • Operational accept failures are effectively hidden from callers.

Impacted paths:

  • src/standalone_server.rs (blocking and async multi-accept TCP/UDS paths)
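The fix can be sketched as remembering the accept error that broke the loop and returning it after the join phase, instead of unconditionally returning `Ok(())`; `join_workers` stands in for the existing handler-join logic:

```rust
use std::io;

// Sketch: in-flight clients are still drained, but an operational accept
// failure is surfaced to the caller rather than swallowed.
fn finish_accept_loop(accept_error: Option<io::Error>, join_workers: impl FnOnce()) -> io::Result<()> {
    join_workers(); // drain clients that already connected
    match accept_error {
        Some(e) => Err(e), // propagate the non-WouldBlock accept failure
        None => Ok(()),
    }
}

fn main() {
    assert!(finish_accept_loop(None, || {}).is_ok());
    let err = io::Error::new(io::ErrorKind::Other, "accept failed: EMFILE");
    assert!(finish_accept_loop(Some(err), || {}).is_err());
}
```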

3) Medium: Duration precision loss in spawned server args

  • Server process duration is passed with duration.as_secs().
  • Sub-second portions are truncated.
  • Client/server run windows can diverge in duration mode.

Impacted path:

  • src/benchmark.rs
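The precision fix amounts to formatting the spawned server's duration argument from `as_secs_f64` instead of `as_secs`; the exact argument format the CLI accepts is an assumption here:

```rust
use std::time::Duration;

// Sketch: sub-second portions of the duration survive the round trip to
// the spawned server process.
fn duration_arg(d: Duration) -> String {
    format!("{}", d.as_secs_f64())
}

fn main() {
    // as_secs truncates 2.5s to 2s; as_secs_f64 preserves it.
    assert_eq!(Duration::from_millis(2500).as_secs(), 2);
    assert_eq!(duration_arg(Duration::from_millis(2500)), "2.5");
}
```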

4) Medium: Latency file writes ignore line-level write failures

  • Loop uses writeln!(...).ok() and discards errors.
  • Function contract says it returns write errors, but per-line errors are not
    propagated.

Impacted path:

  • src/main.rs (write_latency_buffer)
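The fix is to propagate per-line write errors with `?` instead of discarding them via `.ok()`; the record type is simplified to `u64` for illustration:

```rust
use std::io::{self, Write};

// Sketch of the corrected write_latency_buffer: any failed line write now
// returns the io::Error, honoring the function's documented contract.
fn write_latency_buffer<W: Write>(out: &mut W, records: &[u64]) -> io::Result<()> {
    for r in records {
        writeln!(out, "{r}")?; // was: writeln!(...).ok()
    }
    out.flush()
}

fn main() {
    let mut buf = Vec::new();
    write_latency_buffer(&mut buf, &[10, 20, 30]).unwrap();
    assert_eq!(String::from_utf8(buf).unwrap(), "10\n20\n30\n");
}
```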

Assumptions

  • Standalone server is expected to stay available across both benchmark phases
    when both are enabled.
  • Result<()> from server entrypoints is expected to represent runtime failure
    signals, not just startup outcomes.

Recommended Next Steps

  1. Fix server shutdown logic to avoid exiting during expected inter-phase gaps.
  2. Propagate/return accept-loop errors instead of silently returning success.
  3. Preserve full duration precision when passing -d to spawned server process.
  4. Propagate latency file write errors in write_latency_buffer.
