[BUG] Collision-merged peer components return empty runtime data on primary #373

@bburda

Bug report

Steps to reproduce

  1. Run two or more gateway instances (primary + one or more peers) with aggregation enabled on the primary.
  2. Declare a shared parent Component (e.g. robot-alpha) across all manifests, with each ECU declaring its own child Component via parent_component_id. This is the topology used by the multi_ecu_aggregation demo.
  3. After discovery, issue sub-resource requests through the primary, for example:
    • GET /api/v1/components/{leaf_ecu_id}/logs
    • GET /api/v1/components/{leaf_ecu_id}/hosts
    • GET /api/v1/components/{leaf_ecu_id}/data

Expected behavior

The aggregation layer should treat Components the same way it treats Areas:

  • A leaf Component maps to exactly one ECU. When the same ID is announced by a peer, the peer owns the runtime state (data, logs, hosts, operations, faults) and the primary should transparently forward every request - detail endpoint and all sub-resources - to that peer.
  • A hierarchical parent Component (referenced as parent_component_id by any other Component in the merged set) has no runtime state of its own; it only groups its children. The primary should serve it locally with the merged view, exactly like an Area whose Components come from different peers.
  • When more than one peer legitimately or accidentally announces the same leaf ID, the primary should flag the collision to operators without taking the gateway down.
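The parent/leaf distinction described above can be sketched as a pass over the merged set. This is a minimal illustration, not the gateway's code: `MergedComponent`, `Role`, and `classify_components` are hypothetical names, and the only rule taken from this report is that a Component referenced as parent_component_id by another Component is a hierarchical parent.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified view of a merged Component; not the gateway's real type.
struct MergedComponent {
  std::string id;
  std::optional<std::string> parent_component_id;  // set on children only
};

enum class Role { HierarchicalParent, Leaf };

// A Component is a hierarchical parent if any other Component in the merged
// set names it as parent_component_id; every other Component is a leaf.
std::map<std::string, Role> classify_components(const std::vector<MergedComponent>& merged) {
  std::unordered_set<std::string> referenced_parents;
  for (const auto& c : merged) {
    assert(!c.id.empty());  // assumption: every Component carries a non-empty ID
    // A self-reference does not make a Component its own parent.
    if (c.parent_component_id && *c.parent_component_id != c.id) {
      referenced_parents.insert(*c.parent_component_id);
    }
  }
  std::map<std::string, Role> roles;
  for (const auto& c : merged) {
    roles[c.id] = referenced_parents.count(c.id) ? Role::HierarchicalParent : Role::Leaf;
  }
  return roles;
}
```

With the demo topology (robot-alpha as the shared parent and one child per ECU), robot-alpha would be classified as a hierarchical parent and each ECU Component as a leaf.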

Actual behavior

EntityMerger::merge_components previously skipped inserting a routing entry for any collision-merged Component, treating the merged entity as locally owned. That assumption is wrong for leaves: with no routing entry, sub-resource requests were handled on the primary, which has no runtime state for the peer's leaf, so /logs came back empty, /hosts missed peer apps, and /data had no resources.

The naive fix (route every collision to the peer) then broke the hierarchical case: GET /components/{parent_id} was forwarded to an arbitrary peer, so the response reflected only that peer's view and the merged view held in the primary's cache was lost. The problem was compounded when sub-components arrive from different peers, which is typical for the multi-ECU demo.
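To make the difference between the three behaviors concrete, the sketch below models them as routing policies. All names (`Policy`, `MergedComponent`, `route`) are hypothetical, a collision is assumed to mean an ID declared both on the primary and by at least one peer, and forwarding to the first announcing peer is an arbitrary choice made for the example, not something this report specifies.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration; names are not taken from the gateway.
enum class Role { HierarchicalParent, Leaf };
enum class Policy { SkipCollisions, RouteAllCollisions, ClassifyByRole };

struct MergedComponent {
  std::string id;
  Role role;
  bool declared_locally;           // assumption: also present on the primary
  std::vector<std::string> peers;  // peers that announced the same ID
};

// Returns the peer to forward to, or std::nullopt to serve on the primary.
std::optional<std::string> route(Policy policy, const MergedComponent& c) {
  if (c.peers.empty()) return std::nullopt;  // local-only: nothing to forward to
  switch (policy) {
    case Policy::SkipCollisions:
      // Reported behavior: a collision gets no routing entry, so a peer-owned
      // leaf is served by the primary, which has no runtime state for it.
      if (c.declared_locally) return std::nullopt;
      return c.peers.front();
    case Policy::RouteAllCollisions:
      // Naive fix: parents are forwarded as well and lose the merged view.
      return c.peers.front();
    case Policy::ClassifyByRole:
      // Intended behavior: parents stay local, leaves go to the owning peer.
      if (c.role == Role::HierarchicalParent) return std::nullopt;
      return c.peers.front();
  }
  return std::nullopt;
}
```

Under `ClassifyByRole`, the same decision would apply to the detail endpoint and to every sub-resource (/logs, /hosts, /data) of a Component, so a leaf is never split between the primary and its peer.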

Additionally, there was no operator-visible signal when two peers declared the same leaf ID, and the source field on the merged entity could not express that several contributors fed into a single view.
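One possible shape for the missing signal and provenance is sketched below. Everything here is an assumption made for illustration: the `Provenance` values, the `Contributors` struct, and the warning text are hypothetical, and a real implementation would more likely emit a structured log record than a plain string.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical provenance model; the report does not specify field names or values.
enum class Provenance { LocalOnly, PeerOnly, Merged };

struct Contributors {
  bool local = false;           // declared on the primary itself
  std::set<std::string> peers;  // peers that announced the entity
};

// More than one contributor of any kind yields Merged.
Provenance provenance_of(const Contributors& c) {
  const std::size_t total = c.peers.size() + (c.local ? 1 : 0);
  assert(total > 0 && "an entity needs at least one contributor");
  if (total > 1) return Provenance::Merged;
  return c.local ? Provenance::LocalOnly : Provenance::PeerOnly;
}

// Builds an operator-facing warning when several peers claim the same leaf ID;
// returns an empty string when there is no multi-peer collision.
std::string leaf_collision_warning(const std::string& leaf_id, const Contributors& c) {
  if (c.peers.size() < 2) return {};
  std::string msg = "component '" + leaf_id + "' announced by multiple peers:";
  for (const auto& p : c.peers) msg += " " + p;  // std::set iterates in sorted order
  return msg;
}
```

Returning a warning instead of throwing keeps the gateway running when a collision is detected, which matches the expectation that operators are informed without the gateway going down.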

Environment

  • ros2_medkit version: main (reproduces on current tip)
  • ROS 2 distro: Jazzy / Humble / Rolling
  • OS: Linux (Ubuntu 22.04 / 24.04)

Additional information

The aggregation design doc previously described "Remote-only Components get a routing table entry", which matched the implementation but did not reflect the ECU-ownership model the merge logic otherwise followed. The fix needs to classify merged Components into hierarchical parents (served locally with merged view) and leaves (routed to the owning peer), emit structured warnings on multi-peer leaf collisions, and expose provenance to clients so they can distinguish local-only, peer-only, and merged entities.
