Add universal visualization tool for processed datasets #27

Open
stepankonev wants to merge 2 commits into main from add-visualization-tool

@stepankonev

Owner

Summary

A dataset-agnostic tool that turns a processed-output folder (.npz frames + index.parquet) into per-scene MP4s, auto-detecting whichever modalities each frame carries — no per-dataset code. To make the output self-describing, the converter now also writes a dataset_info.yaml, and the BEV adapters record their grid params in metadata.

What's new

standard_e2e/visualization/

  • render.py: the per-frame compositor. It pairs a camera mosaic (a per-direction surround grid, or the single stitched panorama that the pano adapter emits) with a single co-registered BEV panel in meters. The BEV panel overlays the raster BEVs (hd_map_bev / lidar_bev / detections_3d_bev, color-composited by channel), lidar_pc, vector detections_3d boxes, the past_states / future_states / preference_trajectory trajectories, and ego. Camera-less datasets render BEV-only.
  • visualize_processed.py: the CLI. Scene selection is either --scene-id (repeatable) or --num-scenes N, never both (default: first scene); other flags are --max-frames and --out. --fps defaults to the rate inferred from frame timestamps so playback is real-time (e.g. ~2 Hz for nuScenes keyframes, ~10 Hz for KITScenes); low rates are encoded by duplicating frames up to a player-friendly ~10 fps without changing the real-time duration.
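
The fps inference and frame duplication described above could look roughly like the sketch below. It is not the PR's `_infer_fps`: the function names, the assumption that timestamps are in seconds, the fallback of 10 fps, and the clamp bounds are all illustrative guesses.

```python
import math
import statistics


def infer_fps_sketch(timestamps_s, fallback=10.0, lo=0.5, hi=60.0):
    """Estimate a playback rate from frame timestamps given in seconds.

    Hypothetical stand-in for the PR's _infer_fps; the fallback value and
    the clamp bounds (lo, hi) are assumptions made for this example.
    """
    ordered = sorted(timestamps_s)  # tolerate out-of-order frames
    gaps = [b - a for a, b in zip(ordered, ordered[1:]) if b > a]
    if not gaps:  # fewer than two distinct timestamps
        return fallback
    fps = 1.0 / statistics.median(gaps)  # median is robust to dropped frames
    return min(max(fps, lo), hi)  # clamp implausible rates


def duplication_factor(data_fps, min_player_fps=10.0):
    """Number of times each frame is repeated in the encoded video.

    Encoding at data_fps * factor keeps the real-time duration unchanged,
    because both the frame count and the frame rate scale by the same factor.
    The small epsilon avoids an extra repeat caused by float rounding.
    """
    return max(1, math.ceil(min_player_fps / data_fps - 1e-9))
```

Under these assumptions, a 2 Hz nuScenes scene gets each frame repeated 5 times and is encoded at 10 fps, while a 10 Hz scene is encoded as-is.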

Self-describing output (benefits every dataset)

  • The converter writes dataset_info.yaml (dataset, split, each adapter's spec) next to index.parquet; AbstractAdapter.spec exposes name + metadata.
  • HDMapBEVAdapter / Detections3DBEVAdapter / LidarBEVAdapter metadata now carries the grid (min/max x/y, pixels_per_meter) and channel order under f"{modality}_grid" / f"{modality}_channels", so .npz aux_data and the yaml are self-describing — the BEV panel renders correctly for any grid config without hard-coding it.
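
As a rough illustration of why the grid metadata is enough to place vector data on a raster, the sketch below maps ego-frame meters to pixel coordinates. The dictionary key names (`min_x`, `max_x`, `min_y`, `max_y`, `pixels_per_meter`) and the axis convention (rows grow with x, columns grow with y) are assumptions; the actual layout stored under f"{modality}_grid" may differ.

```python
def grid_shape(grid):
    """Raster (height, width) in pixels implied by the grid metadata."""
    ppm = grid["pixels_per_meter"]
    height = round((grid["max_x"] - grid["min_x"]) * ppm)
    width = round((grid["max_y"] - grid["min_y"]) * ppm)
    return height, width


def meters_to_pixels(x, y, grid):
    """Map one ego-frame point in meters to fractional (row, col).

    Assumed convention: row 0 sits at min_x and column 0 at min_y.
    """
    ppm = grid["pixels_per_meter"]
    return (x - grid["min_x"]) * ppm, (y - grid["min_y"]) * ppm


# Hypothetical 100 m x 100 m grid at 2 pixels per meter.
example_grid = {
    "min_x": -50.0, "max_x": 50.0,
    "min_y": -50.0, "max_y": 50.0,
    "pixels_per_meter": 2.0,
}
```

With this example grid, the raster is 200 x 200 pixels and the ego origin (0, 0) lands at pixel (100, 100).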

Usage

python -m standard_e2e.visualization.visualize_processed \
    /data/out/kitscenes_multimodal/val --num-scenes 2 --out /tmp/viz
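
For readers unfamiliar with how the mutually exclusive scene selectors can be wired up, here is a minimal argparse sketch. The flag names come from this PR; the argument types, the default for --out, and the helper `select_scenes` are assumptions and are not taken from visualize_processed.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="visualize_processed")
    parser.add_argument(
        "processed_dir", help="folder with .npz frames and index.parquet"
    )
    # argparse rejects the command line if both selectors are given.
    selector = parser.add_mutually_exclusive_group()
    selector.add_argument("--scene-id", action="append", default=None)
    selector.add_argument("--num-scenes", type=int, default=None)
    parser.add_argument("--max-frames", type=int, default=None)
    parser.add_argument("--fps", type=float, default=None)  # None: infer it
    parser.add_argument("--out", default="viz")  # assumed default
    return parser


def select_scenes(all_scene_ids, args):
    """Pick scenes by explicit id, else the first N (first scene by default)."""
    if args.scene_id:
        wanted = set(args.scene_id)
        return [s for s in all_scene_ids if s in wanted]
    return list(all_scene_ids)[: args.num_scenes or 1]
```

Passing both --scene-id and --num-scenes makes argparse exit with a usage error, which gives the xor behavior without extra validation code.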

Verification

Rendered videos for 8 datasets with one tool and one shared config, covering camera counts from 5 to 11, per-direction grids and stitched panoramas, both raster BEVs where shipped, vector boxes, real and SfM lidar, and trajectories: nuScenes, KITScenes Multimodal, AV2 Sensor, AV2 Lidar (camera-less → BEV-only), Waymo Perception, waymo_e2e, TruckDrive, WayveScenes. Inferred fps matched each capture rate (2 Hz for nuScenes, 10 Hz for the 10 Hz datasets).

Tests: dataset_info.yaml emission, BEV grid metadata, renderer modality auto-detection across combinations (dict + pano cameras, rasters, point clouds, vector detections, trajectories, near-empty frame), scene selection, _infer_fps, and the CLI end-to-end. Full gate green (pytest / black / isort / flake8 / mypy).

Notes / out of scope

  • Pano vs per-camera layout is handled automatically; legacy .npz files (no grid metadata) degrade gracefully: vectors render, rasters are skipped.
  • Surfaced separately, not fixed here: the WayveScenes processor emits ~10 kHz timestamps (a 0.1 ms synthetic step). The visualizer reads them and clamps the resulting rate; this is worth a follow-up in that processor.
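
The graceful degradation for legacy .npz files can be pictured as a small layer-planning step. This is a sketch under assumptions: the frame and its aux metadata are modeled as plain dicts, and `plan_bev_layers` is a hypothetical helper, not a function from render.py. Only the modality names and the f"{modality}_grid" key come from this PR.

```python
RASTER_MODALITIES = ("hd_map_bev", "lidar_bev", "detections_3d_bev")
NON_RASTER_MODALITIES = (
    "lidar_pc",
    "detections_3d",
    "past_states",
    "future_states",
    "preference_trajectory",
)


def plan_bev_layers(frame, aux_metadata):
    """Return (layers_to_draw, skipped_rasters) for one frame.

    A raster is drawn only if its grid metadata is present, since without
    the grid it cannot be co-registered with the metric BEV panel.
    Non-raster modalities are already in meters and need no grid.
    """
    layers, skipped = [], []
    for name in RASTER_MODALITIES:
        if name not in frame:
            continue  # modality not shipped with this frame
        if f"{name}_grid" in aux_metadata:
            layers.append(name)
        else:
            skipped.append(name)  # legacy frame: no grid metadata
    layers.extend(name for name in NON_RASTER_MODALITIES if name in frame)
    return layers, skipped
```

A legacy frame carrying hd_map_bev and detections_3d but no hd_map_bev_grid entry would then draw only the boxes and report the raster as skipped.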

A dataset-agnostic tool that renders processed output (.npz + index.parquet) to
per-scene MP4s, auto-detecting whichever modalities each frame carries: camera
mosaics (a per-direction surround grid, or a stitched panorama) and a single
co-registered BEV panel (hd_map_bev / lidar_bev / detections_3d_bev rasters,
lidar_pc, vector detections_3d boxes, past/future/preference trajectories, ego).

- standard_e2e/visualization/: render.py (per-frame compositor) +
  visualize_processed.py (CLI). Scene selection: --scene-id (repeatable) xor
  --num-scenes N (default: first scene); plus --max-frames / --fps / --out.
- Converter writes dataset_info.yaml (dataset, split, adapter specs) next to
  index.parquet; AbstractAdapter.spec exposes name + metadata.
- BEV adapters' metadata now carries the grid (min/max x/y, pixels_per_meter)
  and channel order under f"{modality}_grid"/f"{modality}_channels", so the
  .npz aux_data and the yaml are self-describing -- the BEV panel renders
  correctly for any grid config without hard-coding it.
- Tests: dataset_info.yaml emission, BEV grid metadata, renderer modality
  auto-detection across combinations, scene selection, and the CLI end-to-end.
- --fps now defaults to the rate inferred from each scene's frame timestamps
  (median inter-frame interval) so videos play at the data's real-world speed
  (e.g. ~2 Hz for nuScenes keyframes, ~10 Hz for KITScenes); --fps still
  overrides.
- Low data rates are encoded by duplicating frames up to a player-friendly
  ~10 fps without changing the real-time duration -- many players render
  sub-~10 fps mp4 as static/broken (nuScenes was unplayable at 2 fps).
- Tests for _infer_fps (median / ordering / fallback / clamp).
@codecov

codecov Bot commented Jun 24, 2026
