Description
Hi, thank you very much for your great work!
I have a question regarding the grounding results. In the paper, it is mentioned that "whenever the model references a ROI region in the image, it explicitly appends the corresponding bounding box coordinates [x1, y1, x2, y2] after the region text. This Chain-of-Box approach ensures the visual information is seamlessly integrated into the reasoning context, enabling VLMs to perform multimodal reasoning effectively."
However, I couldn’t find any grounding results (e.g., bounding boxes or coordinate information) in the file eval/logs/rec22_results_cxr_test_qwen2_5vl_7b_instruct_r1_450.json.
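One way to check a results file for such coordinates is to scan its raw text for bracketed lists of four numbers. The snippet below is only a minimal sketch: the regular expression and the helper names `find_boxes` and `scan_file` are illustrative assumptions and not part of the repository, and it assumes the boxes are written in the [x1, y1, x2, y2] form quoted from the paper.

```python
import re
from pathlib import Path

# Assumed box format: four comma-separated numbers in square brackets,
# e.g. [12, 34, 56, 78] or [0.1, 0.2, 0.3, 0.4]. Whitespace (including
# newlines from pretty-printed JSON) is allowed around each number.
_NUM = r"\s*(-?\d+(?:\.\d+)?)\s*"
BOX_PATTERN = re.compile(r"\[" + _NUM + "," + _NUM + "," + _NUM + "," + _NUM + r"\]")


def find_boxes(text):
    """Return every [x1, y1, x2, y2]-style match in text as a tuple of floats."""
    return [tuple(float(v) for v in m.groups()) for m in BOX_PATTERN.finditer(text)]


def scan_file(path):
    """Read a file as UTF-8 text and return all box-like matches found in it."""
    return find_boxes(Path(path).read_text(encoding="utf-8"))
```

Calling `scan_file` on the JSON log above and getting an empty list would suggest that no coordinates in this format are stored there, although boxes saved in a different format would not be detected by this pattern.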
Could you please check whether this is the correct file, or whether the grounding results are stored elsewhere?
Thank you for your time and help!