perf(sandbox): streaming SHA256 and spawn_blocking for identity resolution #555

Open
koiker wants to merge 1 commit into NVIDIA:main from koiker:perf/tracing-instrumentation

Conversation


@koiker koiker commented Mar 23, 2026

Replaces #553 (auto-closed by vouch gate)

Summary

The proxy's TOFU (Trust-On-First-Use) identity resolution performs synchronous /proc scanning and SHA256 hashing of binaries on every cold-cache network request. For large binaries like Node.js (~124 MB), this blocks the async Tokio runtime for nearly a second — stalling all concurrent connections — and allocates the entire file contents in memory just to hash them.

This PR fixes both issues:

  • Streaming SHA256: Replace std::fs::read() + Sha256::digest() (which loads the full binary into RAM) with a 64 KB buffered streaming read+hash loop. For a 124 MB binary this eliminates a 124 MB heap allocation per cold-cache check.
  • spawn_blocking wrapper: Wrap the entire evaluate_opa_tcp() call in tokio::task::spawn_blocking() in both handle_tcp_connection and handle_forward_proxy. The identity resolution does heavy synchronous I/O (/proc scanning, file hashing) that must not run on the async executor.
  • Profiling instrumentation: Add a lightweight file-based perf_log() helper that writes timestamped phase timings to /var/log/openshell-perf.log (or /tmp), providing visibility into proxy latency without depending on the tracing pipeline.

Context

Commit f88aecf ("avoid repeated TOFU rehashing for unchanged binaries") added fingerprint-based caching that made the warm path fast (0 ms TOFU, 11 ms total evaluate_opa_tcp). However:

  1. The cold path still reads the entire binary into memory before hashing — a 124 MB allocation for Node.js.
  2. The hashing and /proc I/O run synchronously on the Tokio runtime, blocking all other connections during the ~1 s cold-cache window.

Profiling Data (Node.js binary, 124 MB)

| Phase | Cold cache | Warm cache |
| --- | --- | --- |
| file_sha256 | ~890 ms | 0 ms (fingerprint hit) |
| evaluate_opa_tcp total | 1002 ms | 11 ms |
| OPA evaluation | 1 ms | 1 ms |
| DNS + TCP connect | 166–437 ms | 166–437 ms |

Files Changed

  • crates/openshell-sandbox/src/procfs.rs — streaming SHA256 in file_sha256(), phase timing in resolve_tcp_peer_identity() and find_pid_by_socket_inode()
  • crates/openshell-sandbox/src/proxy.rs — spawn_blocking wrapper around evaluate_opa_tcp() in both call sites, phase timing throughout
  • crates/openshell-sandbox/src/identity.rs — phase timing in verify_or_cache()

Test Plan

  • cargo build --release succeeds (cross-compiled for x86_64-unknown-linux-gnu)
  • Deployed to live NemoClaw sandbox and verified with curl and node requests through the proxy
  • Cold-cache: ~1 s total (dominated by SHA256 of 124 MB binary, now non-blocking)
  • Warm-cache: 11 ms total (fingerprint cache hit, unchanged from baseline)
  • No functional regressions — policy allow/deny decisions unchanged
  • cargo test -p openshell-sandbox (existing identity and procfs tests)

Signed-off-by: Rafael Koike koike.rafael@gmail.com

Made with Cursor

Key changes:
- Replace full file read + SHA256 with streaming 64KB-buffered hash
  (saves 124MB allocation for node binary)
- Wrap evaluate_opa_tcp in spawn_blocking to prevent blocking tokio
  runtime during heavy /proc I/O and SHA256 computation
- Add file-based perf logging for profiling proxy latency phases

Profiling data (node binary, 124MB):
- Cold TOFU: ~890ms (read+hash), warm: 0ms (cache hit)
- evaluate_opa_tcp: cold=1002ms, warm=11ms
- OPA evaluation: 1ms
- DNS+TCP connect: 166-437ms

Made-with: Cursor
@koiker koiker requested a review from a team as a code owner March 23, 2026 19:51

github-actions bot commented Mar 23, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.


koiker commented Mar 23, 2026

I have read the DCO document and I hereby sign the DCO.


koiker commented Mar 23, 2026

recheck


@johntmyers johntmyers left a comment

Is the perf_log something we need to actually ship, or was this more for validating the improvements?
