Intelligent-Vehicles-Lab-HM/BEV-LLM

BEV-LLM: Leveraging Multimodal BEV Maps for Scene Captioning in Autonomous Driving

Overview: BEV-LLM is a lightweight 1B-parameter model for 3D scene captioning in autonomous driving. It fuses LiDAR and multi-view images using BEVFusion and introduces a novel absolute positional encoding for view-specific scene descriptions. Despite its small size, it performs competitively on the nuCaption benchmark, with the aim of improving transparency, safety, and human-AI interaction.
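
To make the idea of an absolute positional encoding over a BEV grid concrete, here is a minimal sketch. It is not the encoding used in BEV-LLM, whose details are not given in this README. It assumes the classic sinusoidal scheme and applies it separately to the row and column index of each BEV cell. The function names `sinusoidal_encoding` and `bev_positional_encoding`, the grid size, and the feature dimension are all hypothetical.

```python
import math


def sinusoidal_encoding(position, dim, base=10000.0):
    """Sinusoidal absolute encoding of one scalar position (assumed scheme)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(dim // 2):
        # Each sin/cos pair uses a progressively lower frequency.
        freq = 1.0 / (base ** (2 * i / dim))
        encoding.append(math.sin(position * freq))
        encoding.append(math.cos(position * freq))
    return encoding


def bev_positional_encoding(height, width, dim):
    """Build a height x width grid of dim-length position vectors.

    The first half of each vector encodes the row index and the second
    half encodes the column index, so every cell gets a distinct code
    tied to its absolute location in the grid.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [
        [sinusoidal_encoding(r, half) + sinusoidal_encoding(c, half)
         for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    grid = bev_positional_encoding(height=4, width=4, dim=8)
    print(len(grid), len(grid[0]), len(grid[0][0]))
```

In a real pipeline such vectors would typically be added to, or concatenated with, the fused BEV features before they reach the language model, so that a query about a particular view (for example, front left) can be tied to a fixed region of the grid. How BEV-LLM does this in practice should be taken from the paper and code, not from this sketch.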

© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

