Observe a running node as rosgraph_msgs/Node via ros2 nodl describe #83
lsy3 wants to merge 17 commits into
Stage one of the Observe -> Describe pipeline (ros-tooling#68): observe_node() fills a rosgraph_msgs/Node from the live graph - topics with actual QoS and RIHS type hash, services with honest *_UNKNOWN QoS (not observable externally), actions folded from their hidden _action/* constituents, and parameter descriptors/values via the target's parameter services with graceful degradation under a shared timeout ceiling. Tested in three layers: pure-builder unit tests (no executor), scenario graphs diffed against per-distro golden YAML files, and ROS-free rendering tests fed by those goldens. The goldens double as input fixtures for Describe (ros-tooling#53). Requires rosgraph_msgs >= 2.0.4 (Node.msg). Signed-off-by: Luke Sy <sylukewicent@gmail.com>
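A minimal sketch of the action-folding idea, for readers unfamiliar with how actions appear in the graph. It assumes the standard `<action>/_action/<name>` layout of hidden constituents; `fold_actions`, its return shape, and the rule that an incomplete set counts as an orphan are illustrative assumptions, not this PR's code.

```python
from collections import defaultdict

# Hidden constituents a ROS 2 action exposes under <action>/_action/.
ACTION_SERVICES = {"send_goal", "cancel_goal", "get_result"}
ACTION_TOPICS = {"feedback", "status"}
ALL_CONSTITUENTS = ACTION_SERVICES | ACTION_TOPICS
MARKER = "/_action/"


def fold_actions(names):
    """Split endpoint names into (actions, flat).

    actions maps an action name to the constituents seen for it;
    flat keeps every name that was not folded into an action.
    """
    groups = defaultdict(set)
    flat = []
    for name in names:
        base, sep, leaf = name.partition(MARKER)
        if sep and leaf in ALL_CONSTITUENTS:
            groups[base].add(leaf)
        else:
            flat.append(name)

    actions = {}
    for base, leaves in groups.items():
        if leaves == ALL_CONSTITUENTS:
            actions[base] = leaves
        else:
            # Assumption: an incomplete constituent set is left flat.
            flat.extend(f"{base}{MARKER}{leaf}" for leaf in sorted(leaves))
    return actions, sorted(flat)
```

With a complete `/fib/_action/*` set plus a lone `/x/_action/status`, only `/fib` is folded and the lone endpoint stays in the flat list.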
Observes the target node via nodl_observe, prints the rosgraph_msgs/Node as YAML to stdout (or -o FILE with the format inferred from the .yaml/.yml/.json extension), and publishes it latched (reliable, transient_local, keep_last(1)) on /nodl/observed_node. Publish-once semantics: delivery to currently-matched subscribers is confirmed via wait_for_all_acked, bounded by --timeout; the latched history dies with the publisher, so consumers must subscribe before the verb runs - locked in by the smoke test. --no-params skips the only part of observation that contacts the target. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
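The extension-based format inference for `-o FILE` could look roughly like this. `infer_format` is a hypothetical name, and rejecting unknown extensions with `ValueError` is an assumption; the commit message does not say what the verb does in that case.

```python
from pathlib import Path


def infer_format(output=None):
    """Return 'yaml' or 'json' for an -o target; no target means stdout YAML."""
    if output is None:
        return "yaml"
    suffix = Path(output).suffix.lower()
    if suffix in (".yaml", ".yml"):
        return "yaml"
    if suffix == ".json":
        return "json"
    # Assumption: anything else is treated as a usage error.
    raise ValueError(f"cannot infer output format from {output!r}")
```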
nodl_observe's tests are distro-aware: golden files live under test/expected/<ROS_DISTRO>/ (jazzy committed), and every test module skips cleanly on distros whose rosgraph_msgs predates 2.0.4 (no Node.msg), so the existing humble..rolling matrix stays green and flips tests on per distro as the rosgraph_msgs sync lands. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
Signed-off-by: Luke Sy <sylukewicent@gmail.com>
__init__ now just re-exports the public surface (observe_node, NodeNotFoundError, latched_qos); the graph polling and endpoint collection move to _observe, matching the _-prefixed private-module pattern already used for _endpoints/_parameters/_qos. No behaviour change. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
Adds rmw_cyclonedds_cpp alongside rmw_fastrtps_cpp in CI (distro x RMW matrix) and proves they observe QoS differently: cyclonedds propagates a remote endpoint's history policy and depth over discovery, fastrtps does not (history -> UNKNOWN, depth -> 0). Goldens are now resolved most-specific-first under expected/<distro>/<rmw>/, falling back to a shared expected/<distro>/ when RMWs agree, so identical sets are stored once. Commit a single YAML golden per case (the canonical, human-readable form); the JSON renderer is proven by an equivalence test instead of a duplicate JSON golden. The one RMW-divergent assertion is now keyed by RMW so each middleware's behaviour is locked in explicitly. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
Node.msg ships on the jazzy line (>= 2.0.4) but is not yet in kilted (rosgraph_msgs 2.3.1 = Clock only) or current rolling, so version numbers are not comparable across distros. The package builds regardless; the import guard skips observation tests where Node.msg is absent. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
The graph messages landed in rolling first and were backported to jazzy (the only distro shipping them in a pullable image today). The earlier 'rolling lacks them' note was an artifact of probing the EOL Ubuntu 24.04 rolling image (frozen at 2.4.4); live rolling/lyrical run on Ubuntu 26.04 images that are not published yet. Kilted's backport release simply has not been cut (2.3.1 ships Clock only). Signed-off-by: Luke Sy <sylukewicent@gmail.com>
The RMW axis is now driven entirely by the CI matrix 'rmw:' list: the install step derives each apt package name (rmw_x_cpp -> ros-<distro>-rmw-x-cpp), the golden resolver is already (distro, RMW)-keyed, and the per-RMW history-over-discovery expectation is a documented _HISTORY_OVER_DISCOVERY map. Adding an RMW is 'drop in goldens' -- the harness needs no per-RMW setup (every scenario runs in one process / one session). Also closes the silent-skip trap: observation tests importorskip when rosgraph_msgs lacks Node.msg, which on a distro that *should* support observe reads as green-having-tested-nothing. A best-effort step pulls Node.msg from ros2-testing where it leads the main index (e.g. jazzy 2.0.4), and a matrix-gated assertion (requires_node_msg) fails the leg loudly if Node.msg is still missing where it is required. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
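The package-name derivation (rmw_x_cpp -> ros-<distro>-rmw-x-cpp) is a plain string transform. A sketch in Python, with `apt_package` as a hypothetical helper; the CI step itself may well do this in shell.

```python
def apt_package(distro, rmw):
    """Map an RMW implementation name to its apt package for a distro."""
    return f"ros-{distro}-{rmw.replace('_', '-')}"
```

For example, `apt_package("jazzy", "rmw_cyclonedds_cpp")` yields `ros-jazzy-rmw-cyclonedds-cpp`.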
Zenoh slots in with no harness change -- scenarios run in one process / one session, so it discovers without a router daemon. Behaviourally it matches cyclonedds (propagates history and depth over discovery; identical endpoint set and type hashes to the other RMWs). Adds its (jazzy) goldens and its _HISTORY_OVER_DISCOVERY entry. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
Node.msg has now landed across jazzy (2.0.4), kilted (2.3.2), lyrical (2.4.5) and rolling (2.5.0) -- all via the best-effort ros2-testing install where it leads the main index -- so all four are flagged requires_node_msg and run the observation suite (humble still skips; no Node.msg yet). Empirically, every (distro, RMW) observes the same Node *except* two gaps: cyclonedds reports a KEEP_ALL queue's depth as 0 (every distro), and jazzy's older fastrtps drops history/depth entirely (kilted-onward fastrtps does propagate). Goldens are deduplicated to match: a single _base/ holds the full observation, with overrides only where a combination differs -- rmw_cyclonedds_cpp/s2 and jazzy/rmw_fastrtps_cpp/. The resolver searches <distro>/<rmw>/ -> <rmw>/ -> _base/. Nine files now cover 4 distros x 3 RMWs x 4 scenarios (verified green on noble and resolute). Drops the _HISTORY_OVER_DISCOVERY assertion map: history/depth is (distro, RMW)-specific (fastrtps differs by distro), and the per-combination golden already locks it exactly. Signed-off-by: Luke Sy <sylukewicent@gmail.com>
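A sketch of the most-specific-first lookup (<distro>/<rmw>/ -> <rmw>/ -> _base/), assuming one golden file per scenario under a common root. `resolve_golden` and the example file names are hypothetical.

```python
from pathlib import Path


def resolve_golden(root, distro, rmw, name):
    """Return the first existing golden, searching most-specific first."""
    root = Path(root)
    candidates = [
        root / distro / rmw / name,  # (distro, RMW)-specific override
        root / rmw / name,           # RMW-wide override
        root / "_base" / name,       # shared full observation
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no golden for {distro}/{rmw}/{name}")
```

Because overrides exist only where a combination differs, most lookups fall through to `_base/`, which is what keeps the committed file count small.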
Node.msg has now reached every distro including humble (rosgraph_msgs 1.2.3 via ros2-testing), so the package gets exercised on pre-Iron rclpy for the first time -- and it did not import there. Two defensive fixes make it import-safe everywhere: map the BEST_AVAILABLE QoS enum only where rclpy defines it (added in Iron), and read TopicEndpointInfo.topic_type_hash via getattr (REP-2011, Iron+) so an absent hash is simply left unset like a service's. Full observation still needs Iron+ (type hashes, BEST_AVAILABLE, and an int32-safe infinite QoS deadline that humble's builtin_interfaces overflows on), so the observation/rendering tests and the describe smoke tests are capability-gated to Iron+ (BEST_AVAILABLE presence as the proxy). Humble's CI leg now builds, runs the pure-argument and serialization tests, and skips the rest -- green instead of crashing. Full pre-Iron support is tracked as a working-group follow-up. No change on Iron+ (jazzy 154/0). Signed-off-by: Luke Sy <sylukewicent@gmail.com>
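The two defensive fixes follow a common feature-detection pattern, sketched here without rclpy so it runs anywhere. `build_policy_map` and `read_type_hash` are hypothetical names, and mapping members to their names (rather than to rosgraph_msgs constants) is a simplification.

```python
POLICY_NAMES = (
    "SYSTEM_DEFAULT",
    "RELIABLE",
    "BEST_EFFORT",
    "UNKNOWN",
    "BEST_AVAILABLE",  # only defined by rclpy from Iron onward
)


def build_policy_map(policy_enum):
    """Map only the enum members this rclpy version actually defines."""
    return {
        getattr(policy_enum, name): name
        for name in POLICY_NAMES
        if hasattr(policy_enum, name)
    }


def read_type_hash(endpoint_info):
    """Return the endpoint's type hash, or None where the attribute is absent."""
    return getattr(endpoint_info, "topic_type_hash", None)
```

Both helpers degrade to "leave the field unset" instead of raising at import or call time, which is what lets the package import on pre-Iron rclpy.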
@emersonknapp a quick note on scope. The feature is kept minimal and faithful to #68 — the observer and the I'd prefer to keep the tests alongside the implementation they validate, but I can split the multi-RMW/distro coverage into a follow-up if you'd rather a leaner first pass — the commits are already separated for it. |
WG review - 9 June 2026
@emersonknapp feel free to add if I missed anything, but below is a summary of related points raised during the WG meeting
Replace the ament_python nodl_observe with an ament_cmake C++ package: a reusable observe_node(...) library plus an `observe` executable that latch-publishes the observed rosgraph_msgs/Node on /nodl/observed_node. The Python implementation is kept locally as an untracked reference. Addresses the WG review on this PR (points ros-tooling#2-ros-tooling#5 of plan_observe_cpp.md).
- Port the pure builders 1:1 (QoS enum mapping, topic/service endpoints with REP-2011 type hash, action folding, parameter pairing, FQN split), with gtest unit tests mirroring the Python tests.
- Actions use the rcl_action C API directly (no rclcpp_action wrapper).
- Parameters via AsyncParametersClient driven by a short-lived executor, with graceful degradation on an unresponsive target (covered by a test).
- Canonicalise infinite/overflow QoS durations to {INT32_MAX, 0} uniformly on every distro -- CDR-valid for MCAP and fixes Humble's int32 overflow.
- Humble (pre-Iron) is a supported, tested runtime target: the type hash and BEST_AVAILABLE QoS enum are compiled out via ROS2_<DISTRO>; the message stays structurally identical, differences live only in unfilled fields.
- Replace the ~2k-line YAML goldens with MCAP fixtures (one per (distro, RMW), most-specific-first resolver: <distro>_<rmw> -> <rmw> -> base) plus a human-readable print/diff helper. Verified field-for-field parity with the rclpy output (only the duration sentinel differs).
- Rewire `ros2 nodl describe` to shell out to the `observe` binary and render via rosidl_runtime_py.
- CI: one job per distro, build once, re-run the RMW-sensitive integration test over fastrtps/cyclonedds/zenoh; drop requires_node_msg.
Validated in Docker on humble, jazzy, kilted, and lyrical (build + gtest unit tests + integration across each distro's RMWs). Signed-off-by: Luke Sy <sylukewicent@gmail.com>
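The duration canonicalisation can be illustrated with a short Python sketch (the PR's implementation is C++). It assumes non-negative durations given in nanoseconds and clamps anything whose seconds would not fit an int32 to {INT32_MAX, 0}; `canonical_duration` is a hypothetical name.

```python
INT32_MAX = 2**31 - 1
NS_PER_S = 1_000_000_000


def canonical_duration(total_ns):
    """Return (sec, nanosec), clamping int32-overflowing values to (INT32_MAX, 0)."""
    sec, nanosec = divmod(total_ns, NS_PER_S)
    if sec > INT32_MAX:
        # Infinite/overflow sentinel: identical on every distro and CDR-valid.
        return INT32_MAX, 0
    return sec, nanosec
```

A finite deadline such as 1.5 s passes through unchanged, while a 63-bit "infinite" value (used here as a stand-in for the rmw sentinel) collapses to the single canonical pair.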
- ARCHITECTURE.md: layered data-flow diagram, module table, and the observe_node step-by-step for contributors.
- mcap_fixtures.py: add node_to_json + 'print -f yaml|json' so the fixture viewer matches the verb's -o output (was YAML only).
Signed-off-by: Luke Sy <sylukewicent@gmail.com>
…_msgs on humble
- rosidl_runtime_py was transitive via the old Python nodl_observe; the C++ rewrite dropped it, but the integration test (mcap_fixtures.py) and the describe verb still use it -> declare it (+ rclpy, ament_index_python for the verb, which the C++ lib no longer provides transitively).
- humble's rosgraph_msgs ships Node.msg only via ros2-testing; the bridge used --only-upgrade, a no-op when the package isn't pre-installed (the rostooling image), so rosdep then pulled the main version without Node.msg and the C++ build failed. Plain install + apt-mark hold instead.
Signed-off-by: Luke Sy <sylukewicent@gmail.com>
- Bridge silently no-op'd on humble: the keyring glob ros2*archive-keyring missed the actual ros-archive-keyring.gpg, so '[ -n key ] || exit 0' bailed and rosdep then installed main rosgraph_msgs (no Node.msg). Broaden the glob and fall back to [trusted=yes] instead of skipping.
- Per-RMW integration steps passed the test but 'colcon test-result --all' defaulted --test-result-base to 'build'; the action-ros-ci workspace is 'ros_ws/build'. Point it there.
Signed-off-by: Luke Sy <sylukewicent@gmail.com>
The CI image (rostooling/setup-ros-docker) installs ros2cli but not the ros2run package, so 'ros2 run rmw_zenoh_cpp rmw_zenohd' errors with "invalid choice: 'run'"; the zenoh router never starts, cross-process discovery fails, and every observation times out (this is why jazzy/kilted/ lyrical/rolling failed their zenoh leg while humble -- which has no zenoh -- passed). Locate and exec the rmw_zenohd binary directly. Validated end-to-end in the actual rostooling images: jazzy (fastrtps/cyclonedds/zenoh all 14 passed) and humble (fastrtps/cyclonedds). Signed-off-by: Luke Sy <sylukewicent@gmail.com>
@emersonknapp this is ready for review. Recap: CI is green across all five distros — humble, jazzy, kilted, lyrical, rolling — each over its RMWs (fastrtps / cyclonedds, plus zenoh on Iron+). One open question I'd value your take on (you wrote |
Implements "Observe" (#68): observe a running node → its runtime `rosgraph_msgs/Node`, published latched on `/nodl/observed_node` and rendered by `ros2 nodl describe`. The `Node` message is the contract that "Describe" (#53) consumes. Validated across humble, jazzy, kilted, lyrical, rolling × {fastrtps, cyclonedds, zenoh} with gtest unit + pytest integration layers.

Important
Updated per the WG review: `nodl_observe` is now C++ (ament_cmake), not Python, covering review points #2–#5 (Humble support, MCAP fixtures, C++ reimpl, one-container RMW testing). The Python impl is kept locally as an untracked reference; the verb keeps its CLI and shells out to a C++ `observe` binary.

CLI
`NODE_NAME` FQN (hidden nodes work); `--timeout 5.0` (discovery + param ceiling); `--no-params` skips the only contact with the target; `--topic /nodl/observed_node`; `-o` format inferred from `.yaml`/`.yml`/`.json` (default stdout YAML). Output is the rendered `Node`, not yet a NoDL document; that is #53, behind this same verb.

Architecture
- `nodl_observe::observe_node(node, fqn, opts)` uses a caller-provided node, never spins its own; pure builders underneath, so graph-monitor can link the library directly.
- An `observe` executable wraps it (observe + latch-publish).
- `ros2 nodl describe` spawns the binary, receives the latched `Node`, renders with `rosidl_runtime_py`.
The serialized message is the only language boundary (no pybind).

Design decisions
- /rosout, param services, …); filtering is #53's call ("Describe": convert observed state to NoDL document). Schema exception: `<action>/_action/*` constituents fold into their `Action`; orphans stay flat.
- `rcl_action` C API: no `rclcpp_action` wrapper for the action graph (issue below).
- `{INT32_MAX, 0}` uniformly: the rmw sentinel overflows `Duration.sec` (int32) and won't round-trip CDR/MCAP; this is valid, cross-distro-identical, and fixes Humble.
- `*_UNKNOWN`: no info-by-service API in rclcpp/rmw; honest-unknown over plausible-wrong. @emersonknapp input welcome.

Humble (pre-Iron): supported + tested
Message-identical to Iron+; the REP-2011 type hash and `BEST_AVAILABLE` enum are compiled out (`ROS2_<DISTRO>`), so on Humble the type hash is left unset (like services); differences live only in unfilled fields. Built and observation-tested with its own fixtures, no longer gated out.

Testing
`observe` binary, compared field-by-field vs committed MCAP fixtures. The can't-observe limits are test-locked (services created with explicit QoS still read `*_UNKNOWN`).

MCAP fixtures (replacing the YAML goldens): one `.mcap` per `(distro, RMW)`, resolver `<distro>_<rmw>` → `<rmw>` → `base`, collapsing to 5 files:
- `base.mcap`
- `rmw_cyclonedds_cpp.mcap`: `KEEP_ALL` depth → 0 (every distro)
- `jazzy_rmw_fastrtps_cpp.mcap`: history/depth
- `humble_rmw_{fastrtps,cyclonedds}_cpp.mcap`
`test/mcap_fixtures.py print|diff` is a human-readable viewer; regen with `REGEN_FIXTURES=1`. Everything else (reliability, durability, deadline, type name, RIHS hash, structure, folding, service `*_UNKNOWN`) is byte-identical, so kilted/lyrical/rolling resolve to `base.mcap` with no override. zenoh is Iron+ only (no Humble package); its CI leg starts `rmw_zenohd`.

Note
In C++ a missing `rosgraph_msgs/Node.msg` is a build failure, not a silent skip, so `requires_node_msg` is gone. CI still best-effort pulls the message from `ros2-testing` and installs `mcap` so the comparison actually runs.

Upstream issues to file
- `rmw_get_*_info_by_service` on any RMW.
- `rcl_action_*_get_names_and_types[_by_node]` not wrapped in `rclcpp_action`; request parity.

Closes #68