Screenshot-code pair misalignment: real images in screenshots vs placeholder.png in ground truth HTML #20

@banyinjushi

Hi, thanks for the great work on ScreenCoder!

While examining the Screen-10K dataset, I noticed that the screenshot-code pairs have a systematic misalignment in the image dimension:

- Screenshots are captured from the original URLs and contain real images (photos, icons, logos, etc.).
- Ground truth HTML is processed by webpage2html, which replaces all image resources (<img> tags, CSS background-image, favicon, font files, etc.) with placeholder.png, for example:

    .bg{background-image:url(placeholder.png)}

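To gauge how widespread the substitution is in a given sample, something like the following sketch could count placeholder.png references in a ground truth HTML string. It is a rough regex-based check, not a full HTML parser, and it assumes the placeholder is always named exactly placeholder.png as in the CSS example above. The function name count_placeholder_refs is hypothetical and is not part of ScreenCoder or webpage2html.

```python
import re

# Assumption: the placeholder file is always named "placeholder.png",
# as in the CSS example from the issue.
PLACEHOLDER = "placeholder.png"

# src="..." / href="..." attributes (covers <img>, favicon <link>, etc.).
_ATTR_RE = re.compile(r"""\b(src|href)\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
# CSS url(...) values, with or without quotes.
_CSS_URL_RE = re.compile(r"""url\(\s*["']?([^"')]*?)["']?\s*\)""", re.IGNORECASE)


def count_placeholder_refs(html: str) -> dict:
    """Count references to the placeholder in attributes and CSS url() values."""
    counts = {"attribute": 0, "css_url": 0}
    for _name, value in _ATTR_RE.findall(html):
        if value.strip().endswith(PLACEHOLDER):
            counts["attribute"] += 1
    for value in _CSS_URL_RE.findall(html):
        if value.strip().endswith(PLACEHOLDER):
            counts["css_url"] += 1
    return counts
```

Running this over a few ground truth files would show whether every image reference was rewritten or only some of them.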
Because placeholder.png is not bundled with the dataset, rendering the ground truth HTML produces broken images. As a result:

- The rendered ground truth code does not visually match the input screenshot.
- Any pixel-level or CLIP-based similarity metric between the screenshot and the rendered code output is penalized by this discrepancy.
- Models trained on this data learn a lossy mapping: they see real images in the input but are never expected to reproduce them in the output.

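To illustrate the metric penalty, here is a minimal sketch of a pixel-level difference that can optionally ignore image regions. It assumes grayscale images given as nested lists of the same size and assumes bounding boxes for the image regions are available from some other source. The function mean_abs_diff and the box format are hypothetical, and this is not the metric used by ScreenCoder.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def mean_abs_diff(
    a: Sequence[Sequence[int]],
    b: Sequence[Sequence[int]],
    ignore_boxes: Sequence[Box] = (),
) -> float:
    """Mean absolute pixel difference, skipping pixels inside ignore_boxes."""
    total = 0
    count = 0
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            inside = any(
                left <= x < right and top <= y < bottom
                for left, top, right, bottom in ignore_boxes
            )
            if inside:
                continue
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0
```

Comparing the score with and without the ignored boxes gives a rough sense of how much of the difference comes from image content rather than layout.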
Questions:

1. Is this intentional? If so, is the training objective purely focused on layout/structure reproduction, deliberately ignoring image content?
2. Were the filtering metrics (used to select 10K from 50K) computed against the original URL rendering or against the placeholder-ized HTML rendering?
3. Would it be possible to release placeholder.png alongside the dataset, or document the expected rendering behavior?

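As a stopgap until the actual file is available, a stand-in placeholder.png could be generated locally. The sketch below writes a solid gray PNG using only the Python standard library. The size and gray level are arbitrary assumptions, and the result is not the placeholder the authors used; make_placeholder_png is a hypothetical helper name.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    body = kind + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)


def make_placeholder_png(width: int = 64, height: int = 64, gray: int = 204) -> bytes:
    """Return the bytes of a solid gray 8-bit grayscale PNG.

    Assumption: size and gray level are arbitrary stand-ins, not the
    dataset's real placeholder.
    """
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by the samples.
    row = b"\x00" + bytes([gray]) * width
    idat = zlib.compress(row * height)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
```

Writing the returned bytes to a file named placeholder.png next to a ground truth HTML file should make the references resolve when the page is rendered, though the rendering would still differ from the original screenshot.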
Thanks!
