Skip to content

feat: run select commands while a node is running [skip-line-limit]#1446

Merged
ryardley merged 35 commits into
mainfrom
ry/1406-socket-serialize
Mar 22, 2026
Merged

feat: run select commands while a node is running [skip-line-limit]#1446
ryardley merged 35 commits into
mainfrom
ry/1406-socket-serialize

Conversation

@ryardley

@ryardley ryardley commented Mar 18, 2026

Copy link
Copy Markdown
Contributor

Closes #1406

This allows us to run select commands during enclave start.

Commands are given with this enum and are generated by using the cli with the appropriate command:

pub enum RemoteCommand {
    NetGetPeerId,
    CiphernodeStatus { chain: ChainArgs },
    NoirStatus,
    WalletGet,
    Rev,
    PrintEnv { vite: bool, chain: String },
}

Commands are serializable over json. Eg.

enclave wallet get

will send this

{"name":null,"otel":null,"quiet":false,"config":null,"verbose":0,"command":"WalletGet"}

over tcp://127.0.0.1:50505

Currently I send all the other noise over (otel,quiet etc.) despite not really doing anything with it - we might want to look into using this later.

We can add new commands as required. So far I have only added get commands as changing the database while a node is running could lead dto data corruption.

I anticipate we might want to setup a command in the future to stream events for external consumption for example to get events from the eventstore with a cursor. I exposed a shared eventlog when running ciphernodes to aid in this.

This PR also fixes a bug where quic disconnect messages were not being sent on shutdown which was causing issues when trying to reconnect to nodes.


To test:

enclave start

then in another terminal

enclave noir status
=== ZK Prover Status ===

Barretenberg (bb):
  Path: /nix/store/x099liwblzgzmsadbyxzh17iibz3dxwj-bb/bin/bb
  Installed

Circuits:
  Path: /home/user/.config/enclave/.enclave/noir/circuits
  Version: 0.1.15
  Installed

Status: Ready
image

Done

  • thread console through all commands to be able to forward output
  • setup socket server
  • share datastore

Required for dashboard.

  • share eventstore to allow for event sharing
  • convert from unix socket to TCP socket for eventual remote execution (not now)

Tidy up

  • ctrl_port configuration

Summary by CodeRabbit

  • New Features

    • Local socket client/server for remote CLI execution.
    • Async Console with streamed output handles and a log! macro for consistent CLI output.
    • Global shared caches to reuse datastore and eventstore connections across commands.
  • Refactor

    • CLI output unified to use the Console and threaded through subcommands.
    • Node startup can enable sharing store/eventstore for socket-driven commands.
  • Chores / Config

    • Per-node configurable ctrl_port added; docs warn it is local-only.

@vercel

vercel Bot commented Mar 18, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crisp Ready Ready Preview, Comment Mar 21, 2026 11:31pm
enclave-docs Ready Ready Preview, Comment Mar 21, 2026 11:31pm

Request Review

@coderabbitai

coderabbitai Bot commented Mar 18, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a local TCP control socket and console abstraction, threads Console through CLI commands, introduces global OnceLock caches for DataStore and EventStoreReader, adds builder flags to share store/eventstore across processes, updates workspace manifests and Dockerfiles, and wires socket-based remote command dispatch into start/main flows.

Changes

Cohort / File(s) Summary
Workspace & Builds
Cargo.toml, crates/Dockerfile, examples/CRISP/server/Dockerfile
Added crates/console and crates/socket-server to workspace and Docker build copies; updated workspace dependencies and references.
Console crate
crates/console/Cargo.toml, crates/console/src/lib.rs
New crate exposing Console/ConsoleHandle, async sinks (stdout/writer/channel), log! macro and flush semantics used across CLI.
Socket-server crate
crates/socket-server/Cargo.toml, crates/socket-server/src/lib.rs
New TCP helpers: connect_socket(), run_on_socket(), start_socket_server() for localhost control-plane RPC over newline-delimited JSON.
Global caches
crates/ciphernode-builder/src/global_store_cache.rs, crates/ciphernode-builder/src/global_eventstore_cache.rs, crates/ciphernode-builder/src/lib.rs
Added OnceLock-based singletons and accessors: share_store/get_cached_store, share_eventstore_reader/get_shared_eventstore; exported new modules.
Ciphernode builder & event system
crates/ciphernode-builder/src/ciphernode_builder.rs, crates/ciphernode-builder/src/event_system.rs
Added global_shared_store/global_shared_eventstore flags and builder methods; unified eventstore access via eventstore_reader(), conditionally shares reader/store when flags set.
Entrypoint helpers & start flows
crates/entrypoint/src/helpers/datastore.rs, crates/entrypoint/src/helpers/mod.rs, crates/entrypoint/src/helpers/shutdown.rs, crates/entrypoint/src/start/*.rs
Added get_in_mem_store() and get_eventstore_reader() helpers; short-circuited to cached store; removed standalone shutdown helper; start flows now use .with_shared_store()/.with_shared_eventstore() and removed explicit backend.ensure_installed() calls.
CLI core & remoting
crates/cli/src/main.rs, crates/cli/src/cli.rs, crates/cli/Cargo.toml, crates/cli/src/start.rs
Plumbed Console through CLI, added RemoteCli/RemoteCommand + conversions, main attempts socket connect and remote dispatch via run_on_socket, start launches local socket server and adjusted graceful shutdown flow.
CLI commands
crates/cli/src/** (ciphernode/, password, wallet*, net*, noir.rs, rev.rs, print_env.rs, etc.)
Threaded Console into command handlers, replaced println! with log!(out, ...), added Clone/Serde derives and updated function signatures to accept Console.
Config & templates
crates/config/src/app_config.rs, examples/*, templates/*, tests/*
Added ctrl_port to NodeDefinition with accessor and defaults (50505); updated example/template/test enclave configs to include per-node ctrl_port.
Docs & CI
docs/pages/ciphernode-operators/running.mdx, .github/workflows/ci.yml
Documented local CLI socket port warning and expanded CI matrix to run base and persist suites.
Misc small changes
crates/net/src/net_sync_manager.rs, crates/config/src/validation.rs, various CLI derive updates
Added retry config to DirectRequester, implemented ToString for ValidUrl, and added Clone derives across many command enums for remoting convenience.

Sequence Diagram(s)

sequenceDiagram
    participant User as User CLI
    participant Main as main.rs
    participant SocketClient as run_on_socket
    participant SocketServer as start_socket_server
    participant Handler as Socket Handler
    participant Builder as CiphernodeBuilder
    participant StoreCache as Global Store Cache
    participant EventCache as Global Eventstore Cache

    User->>Main: invoke command
    Main->>Main: create Console & parse CLI
    alt socket available & command convertible
        Main->>SocketClient: connect + write RemoteCli JSON
        SocketServer->>Handler: accept & deserialize RemoteCli
        Handler->>Builder: build(with_shared_store/eventstore)
        Builder->>StoreCache: get_cached_store()
        StoreCache-->>Builder: DataStore? (maybe)
        Builder->>EventCache: get_shared_eventstore()
        EventCache-->>Builder: EventStoreReader? (maybe)
        Handler->>Handler: execute command, stream output lines
        Handler-->>SocketClient: send output lines
    else local execution
        Main->>Main: run cli.execute(out) locally
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

  • #344 — introduces socket/console control-plane, local CLI socket and start/shutdown refactor — strong overlap.
  • #1153 — changes to event-system/eventstore accessors and reader routing — directly related to global eventstore reader changes.
  • #1239 — related refactors around event-store reader composition and caching — relevant to caching/sharing additions.

Suggested labels

ciphernode

Suggested reviewers

  • hmzakhalid
  • ctrlc03
  • 0xjei

Poem

🐰 I tapped a tiny socket bright,
Sent JSON whispers through the night,
Console hummed and logs took flight,
Shared caches hopped to help the byte,
Remote commands woke nodes polite.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the primary change: enabling select CLI commands to run while a node is executing via a socket interface.
Linked Issues check ✅ Passed The PR successfully addresses issue #1406 by implementing a TCP socket interface to run read-only commands against a running ciphernode, solving the database locking problem.
Out of Scope Changes check ✅ Passed All changes directly support the socket interface feature: Console threading for output, socket server infrastructure, datastore/eventstore sharing, config updates for ctrl_port, and command serialization.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ry/1406-socket-serialize

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

License Header Check Failed

Some files are missing the required SPDX license header. Please add the following header to the beginning of all .js, .jsx, .nr, .rs, .sol, .ts, and .tsx files:

// SPDX-License-Identifier: LGPL-3.0-only
//
// This file is provided WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY
// or FITNESS FOR A PARTICULAR PURPOSE.

You can run ./scripts/check-license-headers.sh --fix locally to automatically add missing headers, then commit the changes.

Or run ./scripts/check-license-headers.sh to see which files need headers.

@ryardley ryardley force-pushed the ry/1406-socket-serialize branch from 167badb to 1cb9956 Compare March 18, 2026 06:00
@ryardley ryardley changed the title feat: run select commands while a node is running feat: run select commands while a node is running [skip-line-limit] Mar 18, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/config/src/app_config.rs (1)

79-85: ⚠️ Potential issue | 🟠 Major

Fail fast on duplicate control ports.

Every node definition that omits ctrl_port now falls back to 50505. That means existing multi-node configs can still deserialize successfully, but the second node only fails later when enclave start tries to bind the control socket. Please validate ctrl_port uniqueness during config load and return a clear error.

Possible validation in AppConfig::try_from_unscoped
         // Deliberately clobber default
         config.nodes.insert("_default".to_string(), config.node);
+
+        let mut ctrl_ports = HashMap::new();
+        for (node_name, node) in &config.nodes {
+            if let Some(previous) = ctrl_ports.insert(node.ctrl_port, node_name) {
+                bail!(
+                    "ctrl_port {} is configured for both '{}' and '{}'",
+                    node.ctrl_port,
+                    previous,
+                    node_name
+                );
+            }
+        }
 
         let Some(node) = config.nodes.get(name) else {
             bail!("Could not find node definition for node '{}'. Did you forget to include it in your configuration?", name);
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/config/src/app_config.rs` around lines 79 - 85, Multiple
NodeDefinition defaults set ctrl_port to 50505 which permits duplicate control
ports until runtime; update config validation in AppConfig::try_from_unscoped to
detect and fail fast on duplicate ctrl_port values during load. Iterate over the
deserialized NodeDefinition list, collect ctrl_port values (treat None as
default 50505) and return a clear error if any port appears more than once,
referencing NodeDefinition::ctrl_port and AppConfig::try_from_unscoped to locate
the logic to modify.
♻️ Duplicate comments (1)
crates/socket-server/src/lib.rs (1)

27-43: ⚠️ Potential issue | 🟠 Major

The socket protocol still can't report command failure.

At Line 39 this helper treats EOF as success and returns Ok(()). Combined with crates/cli/src/start.rs Line 69 using ?, a failing remote command can close the stream without any explicit error frame, and callers cannot distinguish that from a successful run. The protocol needs an explicit success/error status instead of relying on EOF.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/socket-server/src/lib.rs` around lines 27 - 43, run_on_socket
currently treats EOF as success because it logs all lines until EOF and returns
Ok(()); change it to expect and parse an explicit status frame first: after
sending the payload, read the first line from the reader (use the existing lines
iterator), parse it as JSON into a small Response struct (e.g. fields status:
String and optional error: String) or serde_json::Value, and if status != "ok"
return an anyhow::Error containing the remote error message; only after a
successful status should you continue streaming remaining lines to log (and if
the first read yields None, return an error like "socket closed before status").
Ensure the function (run_on_socket) returns Err on non-OK status so callers
using ? can detect remote command failure.
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)

475-476: Update the stale matrix comment.

The comment above still says the base suite is removed, but this matrix now runs both base and persist. Keeping those in sync avoids a false lead when someone debugs CI failures later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 475 - 476, The comment above the
matrix is stale; update the comment text to reflect that the CI matrix now runs
both test suites. Edit the comment that currently reads "# TODO removed base
test for now" to a concise, accurate message such as "# running both base and
persist test suites" so it matches the matrix entry "test-suite: [base,
persist]" and avoids confusion during CI debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/cli/src/start.rs`:
- Around line 63-69: The remote socket handler is calling Cli::execute which
triggers autopassword/autowallet setup; instead add a dedicated remote execution
path that avoids lifecycle setup: change the handler (where RemoteCli is parsed)
to call a new method like RemoteCli::execute_remote(out, config_result) or
Cli::execute_readonly_remote(out, config_result) instead of Cli::execute, and
implement that method in crates/cli/src/cli.rs to only perform command
dispatch/handle the remote request (no autopassword/autowallet/autowallet
prompts, no wallet creation or other write-capable setup). Ensure the new method
uses the same dispatch logic as execute but explicitly skips the
autopassword/autowallet code paths so control-socket requests remain read-only.

In `@examples/CRISP/enclave.config.yaml`:
- Around line 43-45: Duplicate YAML key "autonetkey" in the cn2 node entry makes
the mapping invalid; remove the second occurrence so only a single autonetkey
key remains alongside the ctrl_port setting (keep ctrl_port: 50502 and a single
autonetkey: true), and then validate the file with a YAML linter to ensure the
mapping is unique and parses correctly.

---

Outside diff comments:
In `@crates/config/src/app_config.rs`:
- Around line 79-85: Multiple NodeDefinition defaults set ctrl_port to 50505
which permits duplicate control ports until runtime; update config validation in
AppConfig::try_from_unscoped to detect and fail fast on duplicate ctrl_port
values during load. Iterate over the deserialized NodeDefinition list, collect
ctrl_port values (treat None as default 50505) and return a clear error if any
port appears more than once, referencing NodeDefinition::ctrl_port and
AppConfig::try_from_unscoped to locate the logic to modify.

---

Duplicate comments:
In `@crates/socket-server/src/lib.rs`:
- Around line 27-43: run_on_socket currently treats EOF as success because it
logs all lines until EOF and returns Ok(()); change it to expect and parse an
explicit status frame first: after sending the payload, read the first line from
the reader (use the existing lines iterator), parse it as JSON into a small
Response struct (e.g. fields status: String and optional error: String) or
serde_json::Value, and if status != "ok" return an anyhow::Error containing the
remote error message; only after a successful status should you continue
streaming remaining lines to log (and if the first read yields None, return an
error like "socket closed before status"). Ensure the function (run_on_socket)
returns Err on non-OK status so callers using ? can detect remote command
failure.

---

Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 475-476: The comment above the matrix is stale; update the comment
text to reflect that the CI matrix now runs both test suites. Edit the comment
that currently reads "# TODO removed base test for now" to a concise, accurate
message such as "# running both base and persist test suites" so it matches the
matrix entry "test-suite: [base, persist]" and avoids confusion during CI
debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 882da445-dfaa-424f-a350-04413b0a7af0

📥 Commits

Reviewing files that changed from the base of the PR and between ba007de and 445917c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • .github/workflows/ci.yml
  • crates/cli/src/cli.rs
  • crates/cli/src/main.rs
  • crates/cli/src/start.rs
  • crates/config/src/app_config.rs
  • crates/socket-server/Cargo.toml
  • crates/socket-server/src/lib.rs
  • examples/CRISP/enclave.config.yaml
  • templates/default/enclave.config.yaml
  • tests/integration/enclave.config.yaml
✅ Files skipped from review due to trivial changes (2)
  • crates/socket-server/Cargo.toml
  • templates/default/enclave.config.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/cli/src/main.rs

Comment thread crates/cli/src/start.rs
Comment thread examples/CRISP/enclave.config.yaml Outdated
@ryardley ryardley enabled auto-merge (squash) March 21, 2026 05:29
@vercel vercel Bot temporarily deployed to Preview – crisp March 21, 2026 05:42 Inactive
@vercel vercel Bot temporarily deployed to Preview – enclave-docs March 21, 2026 05:42 Inactive

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/net/src/net_sync_manager.rs`:
- Around line 370-373: The DirectRequester builder currently uses
max_retries(10) and retry_timeout(Duration::from_secs(20)) which produces long
inter-retry backoff and can stall sync; update the DirectRequester configuration
(the DirectRequester::builder call where max_retries and retry_timeout are set)
to reduce retry_timeout back toward the default (e.g., Duration::from_secs(5))
and/or lower max_retries (e.g., 4) so per-aggregate backoff is bounded and
sequential loop delays don’t compound; adjust only the parameters on the builder
(max_retries and retry_timeout) to these safer values and leave the rest of the
builder chain intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a3197638-6962-44d4-a24c-5064b7cc71ef

📥 Commits

Reviewing files that changed from the base of the PR and between 445917c and 3d42079.

📒 Files selected for processing (2)
  • crates/net/src/net_sync_manager.rs
  • examples/CRISP/enclave.config.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/CRISP/enclave.config.yaml

Comment thread crates/net/src/net_sync_manager.rs
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@ryardley

Copy link
Copy Markdown
Contributor Author

I solved the issue with connections being closed here so this is ready for review.

@ctrlc03 ctrlc03 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

@ryardley ryardley merged commit 559d211 into main Mar 22, 2026
30 checks passed
@ctrlc03 ctrlc03 deleted the ry/1406-socket-serialize branch March 22, 2026 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add socket interface to ciphernodes

2 participants