more details about image-text pair data #8

@annopackage

Hi,
Thanks for your great work.

I have several questions about the data and method:

  1. I am curious about the pipeline used to generate the text list for the HD map. Could you share more details about how the text for the multi-view and BEV images is obtained? Is this information produced by a pretrained multi-modal model, or derived from rules based on the HD map?
  2. Is the visual encoder the same for the multi-view images and the BEV point cloud images? The paper seems to use different encoders, but in the inference code https://github.com/LLVM-AD/MAPLM/blob/main/baseline/evaluation/inference.py#L72C29-L72C44 the same image processor is used for both.
