Add KITScenes Multimodal dataset support by stepankonev · Pull Request #26 · stepankonev/StandardE2E

stepankonev · 2026-06-21T16:32:55Z

Summary

Adds support for the KITScenes Multimodal dataset (KIT / MRT, arXiv:2606.02956) — a large European urban dataset (~1000 scenes @ 10 Hz) whose headline annotation is a dense, georeferenced Lanelet2 HD map.

New package standard_e2e/caching/src_datasets/kitscenes_multimodal/ (_kitscenes_io, _kitscenes_geometry, _kitscenes_map, _kitscenes_splits, processor, converter), registered in the process_source_dataset dispatch, plus configs/kitscenes_multimodal.yaml, scripts/{extract,prepare_dataset}_kitscenes.sh, README + docs/datasets.rst rows/footnotes, a pyproj dependency, and a test suite.

Ingested modalities (→ StandardFrameData):

cameras — the 6 surround camera_ring_* views → canonical CameraDirection members (pinhole K, T_ego_from_camera extrinsics).
lidar — lidar_top xyz, de-discretized from int32 (×discretization_resolution), invalid-return sentinels dropped, in the ego (base_frame) frame.
hd_map — maps/map.osm (Lanelet2) parsed directly → unified MapElementType taxonomy (lanelet road/emergency/bicycle → LANE_CENTER; crosswalk → CROSSWALK; line_thin/line_thick/bike_marking → LANE_BOUNDARY; curbstone/road_border → ROAD_EDGE; stop_line → STOP_LINE; traffic_light* → TRAFFIC_LIGHT), ROI-cropped per frame and rasterized by HDMapBEVAdapter.
ego trajectory — poses.txt → T_maplocal_from_ego; past/future via FuturePastStatesFromMatricesAggregator.
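
The tag-to-type mapping in the hd_map entry above can be read as a lookup table. The following is a minimal sketch, not the code in _kitscenes_map: the `MapElementType` enum is a local stand-in that only reuses the member names from the description, `classify` is a hypothetical helper, and the sketch keys on the tag value alone because the description does not say which OSM tag key carries each value. The `traffic_light*` wildcard is treated as a prefix match.

```python
from enum import Enum, auto
from typing import Optional


class MapElementType(Enum):
    """Local stand-in for the unified taxonomy (member names from the PR text)."""

    LANE_CENTER = auto()
    CROSSWALK = auto()
    LANE_BOUNDARY = auto()
    ROAD_EDGE = auto()
    STOP_LINE = auto()
    TRAFFIC_LIGHT = auto()


# Exact-match table following the mapping listed in the description.
_EXACT = {
    "road": MapElementType.LANE_CENTER,
    "emergency": MapElementType.LANE_CENTER,
    "bicycle": MapElementType.LANE_CENTER,
    "crosswalk": MapElementType.CROSSWALK,
    "line_thin": MapElementType.LANE_BOUNDARY,
    "line_thick": MapElementType.LANE_BOUNDARY,
    "bike_marking": MapElementType.LANE_BOUNDARY,
    "curbstone": MapElementType.ROAD_EDGE,
    "road_border": MapElementType.ROAD_EDGE,
    "stop_line": MapElementType.STOP_LINE,
}


def classify(tag_value: str) -> Optional[MapElementType]:
    """Return the unified type for a tag value, or None if it is not emitted."""
    if tag_value.startswith("traffic_light"):
        return MapElementType.TRAFFIC_LIGHT
    return _EXACT.get(tag_value)
```

Values with no entry (for example pole or traffic_sign) return None, which matches the "not emitted" scope described under Limitations.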

One frame = one 10 Hz synced snapshot; one segment = one scene (UUID dir).
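
The lidar step in the modality list above (de-discretize, drop invalid returns, move to the ego frame) might look roughly like the sketch below. It rests on several assumptions: the sentinel value is not given in the description, so the int32 minimum is used as a placeholder; `decode_points` and `T_ego_from_lidar` are illustrative names; and the output units are whatever `discretization_resolution` encodes.

```python
import numpy as np

# Assumption: the invalid-return sentinel is the int32 minimum.
# The PR description does not state the actual value.
INVALID_SENTINEL = np.iinfo(np.int32).min


def decode_points(
    raw_xyz: np.ndarray,
    discretization_resolution: float,
    T_ego_from_lidar: np.ndarray,
) -> np.ndarray:
    """Turn discretized int32 xyz (N, 3) into float xyz in the ego frame."""
    raw_xyz = np.asarray(raw_xyz, dtype=np.int32)
    # Drop any point with a sentinel in one of its coordinates.
    valid = ~(raw_xyz == INVALID_SENTINEL).any(axis=1)
    xyz_sensor = raw_xyz[valid].astype(np.float64) * discretization_resolution
    # Apply the static 4x4 sensor-to-ego extrinsic: p_ego = R @ p_sensor + t.
    rotation = T_ego_from_lidar[:3, :3]
    translation = T_ego_from_lidar[:3, 3]
    return xyz_sensor @ rotation.T + translation
```

Only the static extrinsic is applied here, in line with the "no per-point deskew" limitation listed further down.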

Georeferencing

poses.txt is already in the Lanelet2 map-local frame (UTM zone 32N minus the maps/origin.json anchor), verified by the ego trajectory lying on the map node cloud (mean ~1.4 m to nearest node). So the map needs no GNSS reconciliation — OSM nodes are projected with pyproj and the poses placed directly. No lanelet2 runtime dependency (it pins numpy<2, like the nuScenes devkit).
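
The "ego trajectory lies on the map node cloud" check can be approximated with a nearest-node distance. The sketch below is one way to compute such a figure, assuming both inputs are already 2D arrays in the same map-local frame; the description does not say how the ~1.4 m value was actually computed, and `mean_nearest_node_distance` is a hypothetical name. It is brute force, so a spatial index would be preferable for full-size maps.

```python
import numpy as np


def mean_nearest_node_distance(ego_xy: np.ndarray, node_xy: np.ndarray) -> float:
    """Mean distance from each ego position (N, 2) to its closest map node (M, 2)."""
    # Pairwise differences via broadcasting: shape (N, M, 2).
    diff = ego_xy[:, None, :] - node_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=2)  # shape (N, M)
    return float(dist.min(axis=1).mean())
```

A small mean (a few metres) suggests the poses and the map share a frame; a mismatch in zone or origin would typically show up as an offset of hundreds of metres or more.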

Verification

Pre-commit gate green: pytest / mypy / black / isort / flake8.
Tests: geometry / IO / map units + hermetic synthetic-scene build + optional real-data checks (gated on KITSCENES_ROOT).
End-to-end conversion on the public sample (KITScenes-Multimodal-Sample, 1 val scene, 220 frames): cameras + lidar_pc + hd_map_bev + aggregated past/future all produced; map verified visually co-registered with the LiDAR across all frames.

Limitations

No 3D object detections. KITScenes ships no object boxes/tracks (the HD map is its annotation product), so detections_3d is always empty.
Cameras: ring 6 only. The three "base" cameras (the high-res front-center and the rectified stereo pair, used mainly for depth / novel-view benchmarks) are not ingested — they have no canonical CameraDirection slot and adding members was deliberately deferred. No enum changes in this PR.
LiDAR: lidar_top only. The other 6 LiDARs (4 automotive + 2 corner) are not ingested; the merged ~7-sensor cloud is left as future work.
Radar / GNSS-INS not ingested. The 3 imaging radars, the dual GNSS/INS streams and the processed/ model outputs (ground-seg, panoptic) have no StandardE2E target yet.
Map scope. Lane-graph connectivity (successor / predecessor / neighbours) is not reconstructed (the BEV rasterizer doesn't use it). pole, traffic_sign, arrow, symbol, pedestrian_marking/zebra_marking ways, and virtual standalone ways are not emitted (no clean unified target; the crossing is captured by the crosswalk lanelet).
No per-point LiDAR deskew. Points are placed by the static sensor→ego extrinsic only (the devkit additionally ego-motion-deskews per-point); reflectivity/ring/timestamp columns are dropped (StandardE2E lidar is xyz-only).
UTM zone 32N assumed for the map projection (correct for all current KITScenes cities — Karlsruhe / Frankfurt / Sindelfingen).
Splits are path-based (the HF data/<split>/<uuid> layout); a flat layout (e.g. the sample) processes every scene with --split as a passthrough output label, rather than vendoring the ~1000 official-split UUIDs.
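
The path-based split rule in the last item above can be sketched as follows. This is an illustration only: `infer_split`, `KNOWN_SPLITS` and the set of split names are assumptions (the description only mentions a val scene), and the real logic lives in _kitscenes_splits.

```python
from pathlib import PurePosixPath

# Assumption: split directory names; only "val" appears in the PR description.
KNOWN_SPLITS = {"train", "val", "test"}


def infer_split(scene_dir: str, fallback: str) -> str:
    """Read the split from a data/<split>/<uuid> path, else use the fallback label."""
    parts = PurePosixPath(scene_dir).parts
    # Expect the last three components to be: "data", <split>, <uuid>.
    if len(parts) >= 3 and parts[-3] == "data" and parts[-2] in KNOWN_SPLITS:
        return parts[-2]
    # Flat layout (e.g. the sample): pass the requested label through unchanged.
    return fallback
```

With a flat layout the fallback plays the role of the --split passthrough label, so no list of official-split UUIDs is needed.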

These mirror the scoping of the VoD / TruckDrive integrations (radar / extra sensors flagged as "no target yet") and are documented in the module docstrings, the README/docs/datasets.rst footnotes, and the config comments.

Notes

Does not bump the package version (separate release step).

codecov · 2026-06-21T16:42:45Z

Codecov Report

❌ Patch coverage is 89.72603% with 45 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...rc_datasets/kitscenes_multimodal/_kitscenes_map.py	88.82%	20 Missing ⚠️
...src_datasets/kitscenes_multimodal/_kitscenes_io.py	90.35%	11 Missing ⚠️
...ltimodal/kitscenes_multimodal_dataset_converter.py	58.82%	7 Missing ⚠️
...ltimodal/kitscenes_multimodal_dataset_processor.py	93.24%	5 Missing ⚠️
standard_e2e/caching/process_source_dataset.py	0.00%	1 Missing ⚠️
...tasets/kitscenes_multimodal/_kitscenes_geometry.py	97.67%	1 Missing ⚠️

Adds the kitscenes_multimodal dataset (KIT/MRT, arXiv:2606.02956): six surround ring cameras, the lidar_top point cloud (xyz, ego frame), the ego past/future trajectory and the Lanelet2 HD map (parsed directly to the unified MapElementType taxonomy; no lanelet2 runtime dependency). KITScenes ships no 3D boxes, so detections are empty.

The dataset key is `kitscenes_multimodal` (not the generic `kitscenes`) so the planned KITScenes-LongTail variant can be added as a sibling, mirroring the waymo_e2e/waymo_perception and av2_sensor/av2_lidar variant-naming convention.

- New package standard_e2e/caching/src_datasets/kitscenes_multimodal/ (io, geometry, map, splits, processor, converter)
- Register in process_source_dataset dispatch; configs/kitscenes_multimodal.yaml; extract (shared) + prepare scripts; README + docs/datasets.rst rows/footnotes
- Declare pyproj (used for the HD map's UTM projection)
- Tests: geometry/io/map units + hermetic synthetic-scene build + optional real-data checks gated on KITSCENES_ROOT

poses.txt is already in the Lanelet2 map-local frame (UTM 32N minus the maps/origin.json anchor), verified by the ego trajectory lying on the map node cloud, so the map needs no GNSS reconciliation.

stepankonev force-pushed the add-kitscenes-multimodal branch from 819f00d to 0753ff0 Compare June 21, 2026 16:38

stepankonev force-pushed the add-kitscenes-multimodal branch from 0753ff0 to 7a53464 Compare June 22, 2026 08:07

stepankonev merged commit 148f281 into main Jun 22, 2026
3 checks passed

stepankonev merged 1 commit into main from add-kitscenes-multimodal
