Add KITScenes Multimodal dataset support #26
Merged
Adds the kitscenes_multimodal dataset (KIT/MRT, arXiv:2606.02956): six surround ring cameras, the lidar_top point cloud (xyz, ego frame), the ego past/future trajectory and the Lanelet2 HD map (parsed directly to the unified MapElementType taxonomy; no lanelet2 runtime dependency). KITScenes ships no 3D boxes, so detections are empty.

The dataset key is `kitscenes_multimodal` (not the generic `kitscenes`) so the planned KITScenes-LongTail variant can be added as a sibling, mirroring the waymo_e2e/waymo_perception and av2_sensor/av2_lidar variant-naming convention.

- New package standard_e2e/caching/src_datasets/kitscenes_multimodal/ (io, geometry, map, splits, processor, converter)
- Register in process_source_dataset dispatch; configs/kitscenes_multimodal.yaml; extract (shared) + prepare scripts; README + docs/datasets.rst rows/footnotes
- Declare pyproj (used for the HD map's UTM projection)
- Tests: geometry/io/map units + hermetic synthetic-scene build + optional real-data checks gated on KITSCENES_ROOT

poses.txt is already in the Lanelet2 map-local frame (UTM 32N minus the maps/origin.json anchor), verified by the ego trajectory lying on the map node cloud, so the map needs no GNSS reconciliation.
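The "ego trajectory lies on the map node cloud" check can be illustrated with a small sketch. This is not the code from the PR: the function name, the toy coordinates and the brute-force search are assumptions made for illustration, and both point sets are assumed to be expressed in the same map-local frame already.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_nearest_node_distance(
    ego_xy: Sequence[Point], node_xy: Sequence[Point]
) -> float:
    """Mean distance from each ego position to its nearest map node.

    Hypothetical helper: brute force, O(len(ego_xy) * len(node_xy)).
    Inputs are assumed to share one frame and one unit (metres here).
    """
    if not ego_xy or not node_xy:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for ex, ey in ego_xy:
        # Distance to the closest node for this ego position.
        total += min(math.hypot(ex - nx, ey - ny) for nx, ny in node_xy)
    return total / len(ego_xy)


# Toy data (made up): a straight drive along y = 0 beside nodes at y = 1.
ego = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
nodes = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0), (500.0, 500.0)]

aligned = mean_nearest_node_distance(ego, nodes)

# A trajectory in the wrong frame (shifted by 1 km) ends up far from every node.
shifted = [(x + 1000.0, y) for x, y in ego]
misaligned = mean_nearest_node_distance(shifted, nodes)
```

A small mean distance suggests the poses and the map share a frame, while a frame mismatch shows up as an offset of hundreds of metres or more. For a full-size map a spatial index (for example a k-d tree) would likely replace the brute-force inner loop.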
Summary
Adds support for the KITScenes Multimodal dataset (KIT / MRT, arXiv:2606.02956) — a large European urban dataset (~1000 scenes @ 10 Hz) whose headline annotation is a dense, georeferenced Lanelet2 HD map.
New package
`standard_e2e/caching/src_datasets/kitscenes/` (`_kitscenes_io`, `_kitscenes_geometry`, `_kitscenes_map`, `_kitscenes_splits`, processor, converter), registered in the `process_source_dataset` dispatch, plus `configs/kitscenes.yaml`, `scripts/{extract,prepare_dataset}_kitscenes.sh`, README + `docs/datasets.rst` rows/footnotes, a `pyproj` dependency, and a test suite.
Ingested modalities (→ `StandardFrameData`):
- `camera_ring_*` views → canonical `CameraDirection` members (pinhole `K`, `T_ego_from_camera` extrinsics).
- `lidar_top` xyz, de-discretized from int32 (× `discretization_resolution`), invalid-return sentinels dropped, in the ego (`base_frame`) frame.
- `maps/map.osm` (Lanelet2) parsed directly → unified `MapElementType` taxonomy (lanelet road/emergency/bicycle → `LANE_CENTER`; crosswalk → `CROSSWALK`; `line_thin`/`line_thick`/`bike_marking` → `LANE_BOUNDARY`; `curbstone`/`road_border` → `ROAD_EDGE`; `stop_line` → `STOP_LINE`; `traffic_light*` → `TRAFFIC_LIGHT`), ROI-cropped per frame and rasterized by `HDMapBEVAdapter`.
- `poses.txt` → `T_maplocal_from_ego`; past/future via `FuturePastStatesFromMatricesAggregator`.

One frame = one 10 Hz synced snapshot; one segment = one scene (UUID dir).
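The Lanelet2-to-unified-taxonomy mapping listed above can be sketched as a pair of lookup tables. This is an illustrative reading of the list, not the parser from the PR: the enum below is a stand-in that only carries the members named in this description, the helper names are hypothetical, and the exact tag strings found in `map.osm` are assumed to match the spellings used in the list.

```python
from enum import Enum, auto
from typing import Optional


class MapElementType(Enum):
    """Stand-in for the project's enum; only members named in this PR."""

    LANE_CENTER = auto()
    CROSSWALK = auto()
    LANE_BOUNDARY = auto()
    ROAD_EDGE = auto()
    STOP_LINE = auto()
    TRAFFIC_LIGHT = auto()


# Lanelet subtypes -> unified type (spellings taken from the PR text).
LANELET_SUBTYPE_TO_TYPE = {
    "road": MapElementType.LANE_CENTER,
    "emergency": MapElementType.LANE_CENTER,
    "bicycle": MapElementType.LANE_CENTER,
    "crosswalk": MapElementType.CROSSWALK,
}

# Way types -> unified type.
WAY_TYPE_TO_TYPE = {
    "line_thin": MapElementType.LANE_BOUNDARY,
    "line_thick": MapElementType.LANE_BOUNDARY,
    "bike_marking": MapElementType.LANE_BOUNDARY,
    "curbstone": MapElementType.ROAD_EDGE,
    "road_border": MapElementType.ROAD_EDGE,
    "stop_line": MapElementType.STOP_LINE,
}


def classify_lanelet(subtype: str) -> Optional[MapElementType]:
    """Return the unified type for a lanelet subtype, or None if unmapped."""
    return LANELET_SUBTYPE_TO_TYPE.get(subtype)


def classify_way(way_type: str) -> Optional[MapElementType]:
    """Return the unified type for a way type, or None if it is not emitted."""
    # "traffic_light*" in the list is read here as a prefix match.
    if way_type.startswith("traffic_light"):
        return MapElementType.TRAFFIC_LIGHT
    return WAY_TYPE_TO_TYPE.get(way_type)


examples = {t: classify_way(t) for t in ("line_thin", "curbstone", "pole")}
```

Returning `None` for unmapped types keeps the "not emitted" cases explicit, so a caller can skip those elements rather than forcing them into an ill-fitting category.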
Georeferencing
`poses.txt` is already in the Lanelet2 map-local frame (UTM zone 32N minus the `maps/origin.json` anchor), verified by the ego trajectory lying on the map node cloud (mean ~1.4 m to nearest node). So the map needs no GNSS reconciliation: OSM nodes are projected with `pyproj` and the poses placed directly. No `lanelet2` runtime dependency (it pins `numpy<2`, like the nuScenes devkit).
Verification
- […] `KITSCENES_ROOT`).
- […] `KITScenes-Multimodal-Sample`, 1 val scene, 220 frames): cameras + `lidar_pc` + `hd_map_bev` + aggregated past/future all produced; map verified visually co-registered with the LiDAR across all frames.
Limitations
- `detections_3d` is always empty.
- […] `CameraDirection` slot and adding members was deliberately deferred. No enum changes in this PR.
- `lidar_top` only. The other 6 LiDARs (4 automotive + 2 corner) are not ingested; the merged ~7-sensor cloud is left as future work.
- `processed/` model outputs (ground-seg, panoptic) have no StandardE2E target yet.
- `pole`, `traffic_sign`, `arrow`, `symbol`, `pedestrian_marking`/`zebra_marking` ways, and `virtual` standalone ways are not emitted (no clean unified target; the crossing is captured by the `crosswalk` lanelet).
- […] `data/<split>/<uuid>` layout); a flat layout (e.g. the sample) processes every scene with `--split` as a passthrough output label, rather than vendoring the ~1000 official-split UUIDs.

These mirror the scoping of the VoD / TruckDrive integrations (radar / extra sensors flagged as "no target yet") and are documented in the module docstrings, the README/`docs/datasets.rst` footnotes, and the config comments.
Notes