From the results, it seems that "coco_finetuned_mask_256_ffs" was trained without ground-truth hints, while "coco_finetuned_mask_256" was trained with ground-truth hints. The strange thing is that the results shown in the paper look more like those of the latter.
Is that true?