feat(mosaico_integration): Mosaico demo with 3-robot fleet variant #57

Open
mfaferek93 wants to merge 1 commit into main from feat/mosaico-m0-demo

Conversation


@mfaferek93 mfaferek93 commented Apr 15, 2026

Docker Compose demo showing medkit fault snapshots flowing into mosaicod as queryable sequences.

Two variants:

  • Single-robot (docker-compose.yml) - one sensor-demo + bridge. Proves the pipeline end to end: SSE fault event, then bag download, then Arrow Flight ingest.
  • Fleet (docker-compose.fleet.yml) - three robots with different fault signatures (LiDAR noise, IMU failure, LiDAR drift) sharing one mosaicod. That is what makes cross-robot .Q queries actually interesting.

Bridge is a separate Python process talking Arrow Flight via the mosaicolabs SDK. mosaicod runs as the unmodified upstream image, no linking or patching.

Run it

cd demos/mosaico_integration
docker compose -f docker-compose.fleet.yml up -d
# wait ~30s for healthchecks + ring buffer prime
./scripts/trigger-fleet-faults.sh
# three sequences in mosaicod within ~45s

See README.md for the single-robot flow and architecture diagram.

Notes

  • Ring buffer widened to 15s pre + 10s post so snapshots have enough baseline for drift-vs-noise comparison.
  • LaserScanAdapter still pinned to Mosaico PR #368 commit 8e090cd until it lands in a release.

@mfaferek93 mfaferek93 force-pushed the feat/mosaico-m0-demo branch 3 times, most recently from 883b019 to 2dbc307 Compare April 15, 2026 17:09
@mfaferek93 mfaferek93 changed the title feat(mosaico_integration): Mosaico M0 demo + 3-robot fleet variant feat(mosaico_integration): Mosaico demo with 3-robot fleet variant Apr 15, 2026
@mfaferek93 mfaferek93 force-pushed the feat/mosaico-m0-demo branch 2 times, most recently from fc85c79 to 222d947 Compare April 15, 2026 19:22
Contributor

@bburda bburda left a comment

Nice demo - clean layout and the fleet scenario genuinely exercises the compound .Q query. Inline comments below: a few real functional bugs (SSE resume, savefig path, silent script timeout), some internal inconsistencies, and a batch of nits.

Comment thread demos/mosaico_integration/bridge/bridge.py Outdated
Comment thread demos/mosaico_integration/bridge/bridge.py
Comment thread demos/mosaico_integration/bridge/bridge.py
Comment thread demos/mosaico_integration/scripts/trigger-fault.sh
Comment thread demos/mosaico_integration/medkit_overrides/medkit_params.yaml Outdated
Comment thread demos/mosaico_integration/notebooks/mosaico_demo.ipynb Outdated
Comment thread demos/mosaico_integration/notebooks/mosaico_demo.ipynb Outdated
Comment thread demos/mosaico_integration/README.md Outdated
@mfaferek93 mfaferek93 self-assigned this Apr 17, 2026
…as queryable Mosaico sequences

A fault fires on the simulated LiDAR; medkit confirms it and flushes
its 15 s pre-fault + 10 s post-fault ring buffer to an .mcap file.
A small Python bridge picks up the SSE event, downloads the bag from
the gateway REST API, and ingests it into mosaicod over Apache Arrow
Flight using Mosaico's own Python SDK. From docker compose up to a
queryable Sequence in mosaicod takes roughly a minute.
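
The SSE pickup step could look roughly like the sketch below: a minimal
parser for the text/event-stream framing that remembers the most recent
`id:` field so a reconnect can send it back as Last-Event-ID. The helper
name `parse_sse` and the sample payload shape are hypothetical and are not
taken from the bridge.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (last_event_id, data) per event. Hypothetical helper."""
    last_id: Optional[str] = None
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends an event; dispatch only if data was seen.
            if data_parts:
                yield last_id, "\n".join(data_parts)
            data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "data":
            data_parts.append(value)
        elif field == "id":
            last_id = value  # persists across events until overwritten


# Illustrative stream; the JSON payload shape is an assumption.
stream = ["id: 7\n", 'data: {"fault_code": "LIDAR_SIM"}\n', "\n"]
events = list(parse_sse(stream))

# On reconnect, the last seen id goes back to the server as a header.
resume_headers = {"Last-Event-ID": events[-1][0]} if events else {}
```

Keeping the id outside the per-event state mirrors how SSE clients treat
it: an event without its own `id:` line inherits the previous one.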

## Two stacks

- docker-compose.yml: one sensor-demo + one bridge (single-robot).
- docker-compose.fleet.yml: three sensor-demos (warehouse-A, warehouse-B,
  outdoor-yard) each with its own bridge, all feeding one mosaicod.

## Fleet scenario exercises both query types

All three robots fire LIDAR_SIM. Robot-02 is rotating
(IMU drift_rate = 0.3 rad/s) during its fault window, so:

- Step 1 (QueryTopic by LaserScan ontology tag) matches 3 of 3.
- Step 2 (QueryOntologyCatalog, an AND over 6-axis IMU stationarity
  bounds) matches 2 of 3: robot-02 is excluded because its
  angular_velocity.z sits at 0.300 rad/s, outside between(-0.1, 0.1).
- Content pull on the two stationary matches shows noise signature
  on robot-01 (range_std spike 0.41 to 0.63) vs drift on robot-03
  (range_mean 2.3 to 3.5 m, range_std collapses to 0 as all beams
  saturate at sensor max).

## Bridge

- Subscribes to SSE at /api/v1/faults/stream and resumes via Last-Event-ID
  on reconnect.
- Resolves the SOVD entity that owns each bag by enumerating apps +
  components and HEAD-probing /bulk-data/rosbags/{fault_code}.
  A follow-up in medkit (ros2_medkit#380) can replace the probe with
  an x-medkit SSE extension or per-entity streams.
- FAULT_CODE_ALLOWLIST env var keeps the ingested catalog scoped to
  the topic the demo compares (LIDAR_SIM). The fleet compose sets
  it so the IMU DRIFTING diagnostic that robot-02 emits alongside
  its LIDAR fault does not land as a second Mosaico sequence.
- Verifies the MCAP magic header before calling RosbagInjector so a
  race with the rosbag2 finalizer is not mistaken for an ingest
  failure. Transport errors (FlightError, OSError) are caught;
  programmer errors (AttributeError, KeyError) propagate so SDK drift
  surfaces loudly.
- Sequence name includes robot_id so fleet runs cannot collide when
  two robots hit the same event_id in the same wall-clock second.
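
A magic-header check of the kind described above can be sketched in a few
lines. The 8-byte constant is the magic defined by the MCAP format; the
helper name `has_mcap_magic` and the temporary files are hypothetical and
are not the bridge's actual code.

```python
import os
import tempfile

# MCAP files begin with these 8 bytes: 0x89, "MCAP", "0", CR, LF.
MCAP_MAGIC = b"\x89MCAP0\r\n"


def has_mcap_magic(path: str) -> bool:
    """Return True if the file starts with the MCAP magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(MCAP_MAGIC)) == MCAP_MAGIC


# Illustration with throwaway files: one plausible bag, one empty download.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.mcap")
    bad = os.path.join(tmp, "bad.mcap")
    with open(good, "wb") as f:
        f.write(MCAP_MAGIC + b"payload")
    with open(bad, "wb") as f:
        f.write(b"")
    good_ok = has_mcap_magic(good)
    bad_ok = has_mcap_magic(bad)
```

The MCAP format repeats the same magic at the end of a complete file, so a
stricter variant could also compare the last 8 bytes to detect a bag that
has not been finalized yet.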

## Mosaico SDK pin

PR #368 (ROS adapters for the futures ontology) merged on 2026-04-13
as commit b3867be. The subsequent mosaicolabs==0.3.2 PyPI wheel is
missing the futures subpackage from the distributed artifact, so the
bridge Dockerfile installs from the upstream repo at b3867be until
a packaging-fixed release ships. Swap for pip install
mosaicolabs==<fixed_version> when it lands.

## Snapshot contents and storage

Four topics captured: /sensors/scan (LaserScan 10 Hz),
/sensors/imu (Imu 100 Hz), /sensors/fix (NavSatFix 1 Hz), /diagnostics.
/sensors/image_raw (30 Hz raw camera) is intentionally excluded from
snapshot capture - that single topic would dominate the bag at
~27 MB/s; drop in a CompressedImage topic when vision forensics are
needed. Bag size is ~2 MB per 25 s snapshot; 24/7 recording of the
same four topics would be ~6 GB/robot/day, so at ~5 confirmed
faults/robot/day the smart-snapshot catalog stays at ~10 MB/robot/day.
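
The storage comparison can be checked with a few lines of arithmetic. The
sketch below extrapolates the rounded ~2 MB per 25 s figure linearly,
which is an assumption; it lands near 6.9 GB/day, the same order as the
~6 GB quoted above. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SNAPSHOT_SECONDS = 15 + 10   # pre-fault + post-fault ring buffer
SNAPSHOT_MB = 2              # approximate bag size per snapshot
FAULTS_PER_DAY = 5           # approximate confirmed faults per robot

# Snapshot-only storage: one bag per confirmed fault.
snapshot_mb_per_day = FAULTS_PER_DAY * SNAPSHOT_MB

# Continuous recording of the same topics, scaled from one snapshot window.
continuous_mb_per_day = SECONDS_PER_DAY / SNAPSHOT_SECONDS * SNAPSHOT_MB

# How many times smaller the snapshot catalog is than 24/7 recording.
reduction = continuous_mb_per_day / snapshot_mb_per_day
```

Under these rounded inputs the snapshot catalog is several hundred times
smaller than continuous recording.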

## License-safe

mosaicod runs as the unmodified upstream Docker image. The bridge is
a separate Python process speaking the public Apache Arrow Flight
protocol via Mosaico's own SDK. We never link or modify mosaicod or
its Rust crates.

## Verified end-to-end

After ./scripts/trigger-fleet-faults.sh, three LIDAR_SIM sequences
land in mosaicod with distinct robot_id metadata, QueryTopic matches
3, the compound IMU .Q returns 2 (robot-02 excluded by its measured
angular_velocity.z mean of 0.300 rad/s), and the noise and drift range
statistics are visible in the pulled LaserScan data. Lint, yaml, and
nbformat validation all pass.
@mfaferek93 mfaferek93 force-pushed the feat/mosaico-m0-demo branch from 58c1e21 to 0d00085 Compare April 17, 2026 13:08
@fdicorato

This is awesome!!! 😄🔥
