Add universal visualization tool for processed datasets #27
Open
stepankonev wants to merge 2 commits into
A dataset-agnostic tool that renders processed output (.npz + index.parquet) to
per-scene MP4s, auto-detecting whichever modalities each frame carries: camera
mosaics (a per-direction surround grid, or a stitched panorama) and a single
co-registered BEV panel (hd_map_bev / lidar_bev / detections_3d_bev rasters,
lidar_pc, vector detections_3d boxes, past/future/preference trajectories, ego).
- standard_e2e/visualization/: render.py (per-frame compositor) +
visualize_processed.py (CLI). Scene selection: --scene-id (repeatable) xor
--num-scenes N (default: first scene); plus --max-frames / --fps / --out.
- Converter writes dataset_info.yaml (dataset, split, adapter specs) next to
index.parquet; AbstractAdapter.spec exposes name + metadata.
- BEV adapters' metadata now carries the grid (min/max x/y, pixels_per_meter)
and channel order under f"{modality}_grid"/f"{modality}_channels", so the
.npz aux_data and the yaml are self-describing -- the BEV panel renders
correctly for any grid config without hard-coding it.
- Tests: dataset_info.yaml emission, BEV grid metadata, renderer modality
auto-detection across combinations, scene selection, and the CLI end-to-end.
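The scene-selection rule above (--scene-id repeatable, xor --num-scenes N, defaulting to the first scene) could be wired up roughly as follows. This is a minimal sketch and not the code in visualize_processed.py: build_parser, select_scenes and the "videos" default for --out are hypothetical, and the argument that points at the processed folder is omitted because the description does not name it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the description."""
    parser = argparse.ArgumentParser(prog="visualize_processed")
    # --scene-id and --num-scenes are mutually exclusive (the "xor" above).
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--scene-id", action="append", default=None)
    group.add_argument("--num-scenes", type=int, default=None)
    parser.add_argument("--max-frames", type=int, default=None)
    parser.add_argument("--fps", type=float, default=None)  # None = infer
    parser.add_argument("--out", default="videos")  # assumed default
    return parser


def select_scenes(all_scene_ids, scene_ids=None, num_scenes=None):
    """Pick scenes: explicit ids win, else the first N (default N = 1)."""
    all_scene_ids = list(all_scene_ids)
    if scene_ids:
        known = set(all_scene_ids)
        missing = [s for s in scene_ids if s not in known]
        if missing:
            raise ValueError(f"unknown scene ids: {missing}")
        return list(scene_ids)
    count = 1 if num_scenes is None else num_scenes
    return all_scene_ids[:count]
```

With this layout, passing both --scene-id and --num-scenes makes argparse exit with a usage error, so the exclusivity check does not need to be repeated in the selection logic.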
- --fps now defaults to the rate inferred from each scene's frame timestamps
(median inter-frame interval) so videos play at the data's real-world speed
(e.g. ~2 Hz for nuScenes keyframes, ~10 Hz for KITScenes); --fps still
overrides.
- Low data rates are encoded by duplicating frames up to a player-friendly
~10 fps without changing the real-time duration -- many players render
sub-~10 fps mp4 as static/broken (nuScenes was unplayable at 2 fps).
- Tests for _infer_fps (median / ordering / fallback / clamp).
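The fps inference and frame duplication described above can be sketched as below. This is an illustration under stated assumptions and not the PR's _infer_fps: timestamps are assumed to be in seconds, and the fallback of 10 fps, the clamp range of 1 to 60 fps, and the helper name duplication_factor are all hypothetical.

```python
import math
import statistics


def infer_fps(timestamps_s, fallback=10.0, min_fps=1.0, max_fps=60.0):
    """Estimate the capture rate from the median inter-frame interval."""
    ts = sorted(timestamps_s)  # tolerate frames listed out of order
    # Ignore zero-length gaps (duplicate timestamps).
    deltas = [b - a for a, b in zip(ts, ts[1:]) if b > a]
    if not deltas:
        return fallback  # fewer than two distinct timestamps
    fps = 1.0 / statistics.median(deltas)
    return min(max(fps, min_fps), max_fps)  # clamp to a sane range


def duplication_factor(data_fps, player_fps=10.0):
    """Number of times each frame is written so the encoded rate
    (data_fps * factor) reaches at least player_fps."""
    return max(1, math.ceil(player_fps / data_fps - 1e-9))
```

Encoding at data_fps * factor while writing each frame factor times keeps the clip's duration at n_frames / data_fps. For 2 Hz data the factor is 5, giving a 10 fps file that still plays in real time; for 10 Hz data the factor is 1 and nothing is duplicated.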
Summary
A dataset-agnostic tool that turns a processed-output folder (.npz frames + index.parquet) into per-scene MP4s, auto-detecting whichever modalities each frame carries, with no per-dataset code. To make the output self-describing, the converter now also writes a dataset_info.yaml, and the BEV adapters record their grid params in metadata.
What's new
- standard_e2e/visualization/render.py: per-frame compositor. It draws a camera mosaic (a per-direction surround grid, or a single stitched panorama as the pano adapter emits) plus a single co-registered BEV panel in meters: the raster BEVs (hd_map_bev / lidar_bev / detections_3d_bev, color-composited by channel), lidar_pc, vector detections_3d boxes, and past_states / future_states / preference_trajectory trajectories, plus ego. Camera-less datasets render BEV-only.
- visualize_processed.py: CLI. Scene selection: --scene-id (repeatable) xor --num-scenes N (default: first scene); --max-frames, --out. --fps defaults to the rate inferred from frame timestamps so playback is real-time (e.g. ~2 Hz for nuScenes keyframes, ~10 Hz for KITScenes); low rates are encoded by duplicating frames up to a player-friendly ~10 fps without changing the real-time duration.
Self-describing output (benefits every dataset)
- dataset_info.yaml (dataset, split, each adapter's spec) next to index.parquet; AbstractAdapter.spec exposes name + metadata.
- HDMapBEVAdapter / Detections3DBEVAdapter / LidarBEVAdapter metadata now carries the grid (min/max x/y, pixels_per_meter) and channel order under f"{modality}_grid" / f"{modality}_channels", so .npz aux_data and the yaml are self-describing: the BEV panel renders correctly for any grid config without hard-coding it.
Usage
Verification
Rendered videos across 8 datasets from one tool + one shared config (cameras 5→11; per-direction grids and stitched panoramas; both raster BEVs where shipped; vector boxes; real + SfM lidar; trajectories): nuScenes, KITScenes Multimodal, AV2 Sensor, AV2 Lidar (camera-less → BEV-only), Waymo Perception, waymo_e2e, TruckDrive, WayveScenes. Inferred fps matched each capture rate (2 Hz for nuScenes, 10 Hz for the 10 Hz datasets).
Tests: dataset_info.yaml emission, BEV grid metadata, renderer modality auto-detection across combinations (dict + pano cameras, rasters, point clouds, vector detections, trajectories, near-empty frame), scene selection, _infer_fps, and the CLI end-to-end. Full gate green (pytest / black / isort / flake8 / mypy).
Notes / out of scope
pano vs per-camera layout is auto-handled; legacy .npz files (no grid metadata) degrade gracefully: vectors render, rasters are skipped.
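To illustrate how grid metadata can drive the BEV panel, and why legacy files fall back to vectors only, here is a minimal sketch. It is not taken from render.py: the field names inside the grid dict (min_x, max_x, min_y, max_y), the axis convention (x to the right, y up, image rows growing downward) and both helper names are assumptions. Only the f"{modality}_grid" key pattern and pixels_per_meter come from the description.

```python
import math

RASTER_MODALITIES = ("hd_map_bev", "lidar_bev", "detections_3d_bev")


def world_to_pixel(x_m, y_m, grid):
    """Map a point in meters to (row, col) in a BEV raster.

    Assumes x increases to the right and y increases upward, so the
    top-left pixel corresponds to (min_x, max_y).
    """
    ppm = grid["pixels_per_meter"]
    col = math.floor((x_m - grid["min_x"]) * ppm)
    row = math.floor((grid["max_y"] - y_m) * ppm)
    return row, col


def drawable_rasters(frame_keys, metadata):
    """Rasters that can be co-registered: present in the frame AND
    described by grid metadata. Legacy files lack the grid entry, so
    their rasters are skipped while vector layers still render."""
    return [
        m for m in RASTER_MODALITIES
        if m in frame_keys and f"{m}_grid" in metadata
    ]
```

Because the extent and resolution are read from metadata, the same mapping places vector boxes and trajectories on top of any raster without hard-coding a grid config.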