How can I do a visual grounding (REC) task directly using GPT-4V? #41

Thank you for your work!
I would now like to give GPT-4V an image directly, together with a prompt like “This is an image. Now I need to do the visual grounding task, where you generate the coordinates [x,y,h,w] of a bounding box based on a query.”
However, the output is poor: the model even seems to output the coordinates at random. Do I need to preprocess the image first? How should I go about this? Thank you!
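
To make the setup concrete, here is a minimal sketch of the prompt and of how a reply could be checked. It does not call any model. The helper names (`build_prompt`, `parse_box`, `to_pixels`) are hypothetical, and asking for coordinates normalized to 0-1 with (x, y) as the top-left corner is my own assumption, not something confirmed to work well with GPT-4V. The idea is only to fix the output format in the prompt and to reject replies that cannot be a valid box inside the image.

```python
import re


def build_prompt(query: str) -> str:
    """Build a grounding prompt that states the expected output format."""
    return (
        "This is an image. Perform the visual grounding task: locate the "
        f'region described by the query "{query}". '
        "Answer with a single bounding box written as [x, y, h, w], where "
        "(x, y) is the top-left corner and h, w are the height and width, "
        "all normalized to the range 0-1 relative to the image size."
    )


def parse_box(reply: str):
    """Return (x, y, h, w) from the first bracketed list in the reply, or None.

    Assumption: the reply follows the normalized [x, y, h, w] format that
    build_prompt asks for. Anything else is treated as unusable.
    """
    match = re.search(r"\[([^\[\]]+)\]", reply)
    if match is None:
        return None
    parts = match.group(1).split(",")
    if len(parts) != 4:
        return None
    try:
        x, y, h, w = (float(p) for p in parts)
    except ValueError:
        return None
    # Reject boxes that are empty or fall outside the image.
    eps = 1e-9
    if not (0 <= x <= 1 and 0 <= y <= 1 and 0 < h <= 1 and 0 < w <= 1):
        return None
    if x + w > 1 + eps or y + h > 1 + eps:
        return None
    return x, y, h, w


def to_pixels(box, image_width: int, image_height: int):
    """Convert a normalized (x, y, h, w) box to (left, top, right, bottom) pixels."""
    x, y, h, w = box
    left = round(x * image_width)
    top = round(y * image_height)
    right = round((x + w) * image_width)
    bottom = round((y + h) * image_height)
    return left, top, right, bottom


if __name__ == "__main__":
    print(build_prompt("the red cup on the table"))
    # A made-up reply, used only to exercise the parser.
    box = parse_box("The box is [0.25, 0.5, 0.25, 0.5].")
    print(box, to_pixels(box, 640, 480))
```

With a check like this, a reply that is not a valid box returns `None` instead of being silently drawn on the image, which at least makes the "random coordinates" cases easy to count.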
