feat: enrich screenshot diff guidance#403

Merged
thymikee merged 8 commits into main from codex/screenshot-diff-guidance
Apr 12, 2026

@thymikee
Contributor

@thymikee thymikee commented Apr 12, 2026

Summary

Add region-aware screenshot diff output with light current-screen context, region outlines, structured changed-region metadata, best-effort Tesseract OCR text deltas, and non-text residual hints for icons, controls, separators, and backgrounds.

Wire diff screenshot --overlay-refs to capture a separate current-screen overlay guide and map changed regions to current refs, while keeping raw OCR boxes internal to the non-text masking pass. Update docs and the agent-device skill guidance for the new workflow.

> agent-device diff screenshot --baseline img1 --out diff.png

✗ 16.99% pixels differ
  Diff image: /tmp/agent-device-diff-readout/settings-diff-hints-current.png
  537177 different / 3162132 total pixels
  Hints:
    - text movement cluster: "Wi-Fi", "Bluetooth", "Battery" dx=+186px dy=-91..-47px
    - non-text controls: icon near "Battery"; chevron near "Not Connected" r3
    - non-text boundaries: separator r3; separator near "Family" r1
  Changed regions:
    1. center x=48 y=771 1110x323, 37.78% of diff, change=brighter
       size=large shape=large-area density=56.6% avgColor=#141314->#2e2b2d luminance=19->44
    2. bottom-center x=48 y=2187 1110x125, 16.77% of diff, change=brighter
       size=large shape=horizontal-band density=64.92% avgColor=#010101->#242426 luminance=1->36
    3. bottom-center x=48 y=1998 1110x162, 6.58% of diff, change=darker
       size=large shape=horizontal-band density=19.66% avgColor=#24282d->#070a0e luminance=40->10
    4. center x=48 y=1094 1110x163, 5.66% of diff, change=darker
       size=large shape=horizontal-band density=16.8% avgColor=#83807f->#35373c luminance=129->55
    5. top-center x=48 y=500 1110x141, 4.35% of diff, change=mixed
       size=large shape=horizontal-band density=14.94% avgColor=#5e5f61->#68686a luminance=95->104
  OCR text deltas (tesseract; baselineBlocks=19 currentBlocks=20; showing 8/12; px):
    item | text | movePx | sizeDeltaPx | bboxBaseline | bboxCurrent | confidence | issueHint
    1 | "Wi-Fi" | +186,-91 | -1,-17 | x=218,y=1279,w=117,h=56 | x=404,y=1188,w=116,h=39 | 90.84 | ocr-bbox-size-change
    2 | "Bluetooth" | +186,-82 | -1,0 | x=220,y=1439,w=213,h=39 | x=406,y=1357,w=212,h=39 | 53.43 | -
    3 | "Battery" | +186,-47 | -2,+1 | x=220,y=1909,w=161,h=46 | x=406,y=1862,w=159,h=47 | 90.45 | -
    4 | "Not Connected" | +3,-93 | -16,-2 | x=702,y=1284,w=338,h=38 | x=705,y=1191,w=322,h=36 | 94.77 | -
    5 | "Q Search" | -48,+60 | +2,0 | x=144,y=2441,w=235,h=49 | x=96,y=2501,w=237,h=49 | 94.04 | -
    6 | "On" | -9,-80 | -4,-3 | x=980,y=1440,w=61,h=38 | x=971,y=1360,w=57,h=35 | 96.8 | -
    7 | "General" | +9,-72 | -1,0 | x=219,y=2324,w=171,h=39 | x=228,y=2252,w=170,h=39 | 90.4 | -
    8 | "Not Connected" | +3,-33 | -16,-2 | x=702,y=2064,w=338,h=38 | x=705,y=2031,w=322,h=36 | 96.17 | -
  Non-text visual deltas (showing 4/4; px):
    item | region | slot | kind | bboxCurrent | nearestText
    1 | - | leading | icon | x=89,y=1883,w=87,h=87 | "Battery"
    2 | r3 | trailing | chevron | x=1071,y=2028,w=32,h=72 | "Not Connected"
    3 | r3 | separator | separator | x=216,y=2001,w=894,h=3 | -
    4 | r1 | separator | separator | x=462,y=771,w=648,h=3 | "Family"
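
The `movePx` and `sizeDeltaPx` columns above look like plain differences between the baseline and current OCR boxes, and the headline percentage looks like different pixels over total pixels. A minimal sketch of that arithmetic follows. The type and function names here (`BBox`, `bboxDelta`, `formatPair`, `diffPercent`) are hypothetical and chosen for illustration; they are not taken from the PR's source.

```typescript
// Hypothetical shape mirroring the bboxBaseline / bboxCurrent columns.
interface BBox {
  x: number;
  y: number;
  w: number;
  h: number;
}

// Position and size change of one OCR block between baseline and current.
function bboxDelta(baseline: BBox, current: BBox) {
  return {
    movePx: [current.x - baseline.x, current.y - baseline.y] as [number, number],
    sizeDeltaPx: [current.w - baseline.w, current.h - baseline.h] as [number, number],
  };
}

// Positive values get an explicit "+", zero and negatives are printed as-is.
function signed(n: number): string {
  return n > 0 ? `+${n}` : `${n}`;
}

function formatPair(pair: [number, number]): string {
  return `${signed(pair[0])},${signed(pair[1])}`;
}

// Share of differing pixels, rounded to two decimals.
function diffPercent(differentPixels: number, totalPixels: number): number {
  return Math.round((differentPixels / totalPixels) * 10000) / 100;
}

// Row 1 of the OCR table ("Wi-Fi").
const wifi = bboxDelta(
  { x: 218, y: 1279, w: 117, h: 56 },
  { x: 404, y: 1188, w: 116, h: 39 },
);
console.log(formatPair(wifi.movePx)); // "+186,-91"
console.log(formatPair(wifi.sizeDeltaPx)); // "-1,-17"
console.log(diffPercent(537177, 3162132)); // 16.99
```

Applied to row 1, the sketch reproduces the `+186,-91` move and `-1,-17` size change shown in the table, and the pixel counts reproduce the 16.99% headline.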

Validation

  • pnpm format
  • pnpm vitest run src/utils/__tests__/screenshot-diff-non-text.test.ts src/utils/__tests__/screenshot-diff-ocr.test.ts src/utils/__tests__/output.test.ts src/utils/__tests__/screenshot-diff.test.ts src/__tests__/cli-diff.test.ts src/utils/__tests__/cli-option-schema.test.ts
  • pnpm check:quick
  • pnpm check:unit
  • git diff --check
  • Real-image smoke run against /Users/thymikee/Downloads/IMG_8160.PNG and /Users/thymikee/Downloads/simulator_screenshot_B3AD5337-53C6-4B34-AB5B-CDB8A0864CB9.png via compareScreenshots

@github-actions

github-actions bot commented Apr 12, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-403/

Built to branch gh-pages at 2026-04-12 10:29 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e72b2664a4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +155 to +159
    differentPixels > 0
      ? summarizeNonTextDiffDeltas({
          diffMask,
          width: baseline.width,
          height: baseline.height,

[P2] Gate non-text delta generation on successful OCR analysis

This path computes nonTextDeltas for every pixel mismatch even when OCR is unavailable, so text-only changes get treated as "non-text" residuals and can be mislabeled as icons/toggles. In environments without tesseract (or when OCR fails), this produces misleading guidance rather than OCR-masked residuals; the non-text pass should be skipped unless OCR analysis is present.
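
One way to read this suggestion is as a guard that runs the non-text pass only when an OCR result exists. The sketch below illustrates that guard under assumed names: `OcrAnalysis`, `NonTextDelta`, and `nonTextDeltasIfOcrAvailable` are hypothetical, and the `summarize` callback stands in for the real `summarizeNonTextDiffDeltas`, whose full signature is not shown in this thread.

```typescript
// Hypothetical result of a successful OCR pass; the real shape may differ.
interface OcrAnalysis {
  textBoxes: { x: number; y: number; w: number; h: number }[];
}

// Hypothetical non-text residual entry (icon, chevron, separator, ...).
interface NonTextDelta {
  kind: string;
}

// Returns undefined when there is nothing to analyze or when OCR did not
// produce a result, so text changes are never reported as non-text residuals.
function nonTextDeltasIfOcrAvailable(
  differentPixels: number,
  ocr: OcrAnalysis | undefined,
  summarize: (ocr: OcrAnalysis) => NonTextDelta[],
): NonTextDelta[] | undefined {
  if (differentPixels <= 0 || ocr === undefined) {
    return undefined;
  }
  return summarize(ocr);
}
```

Returning `undefined` instead of an empty array lets a caller tell "the pass was skipped" apart from "the pass ran and found nothing", which may matter when rendering hints.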

Comment on lines +75 to +79
    if (flags.overlayRefs && !result.match && !result.dimensionMismatch) {
      const overlayResult = await client.capture.screenshot({
        path: outputPath ? deriveCurrentOverlayPath(outputPath) : undefined,
        overlayRefs: true,
      });

[P2] Delete stale current-overlay artifact on no-diff runs

The overlay guide is only captured on mismatch, but there is no cleanup when a later diff screenshot --overlay-refs --out ... run matches (or has a dimension mismatch). That leaves an old *.current-overlay.* file on disk, which can be mistaken for fresh output by users or scripts that rely on the deterministic filename.
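
A possible shape for that cleanup is sketched below, using only Node's `fs` and `path` modules. `overlayPathFor` is a hypothetical stand-in for `deriveCurrentOverlayPath`; it assumes the `name.current-overlay.ext` pattern implied by the comment, and the real naming may differ. `removeStaleOverlay` is likewise an illustrative name.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-in for deriveCurrentOverlayPath:
// "out/diff.png" -> "out/diff.current-overlay.png" (assumed naming).
function overlayPathFor(outputPath: string): string {
  const { dir, name, ext } = path.parse(outputPath);
  return path.join(dir, `${name}.current-overlay${ext}`);
}

// When this run did not capture a fresh overlay (screens matched or the
// dimensions differed), remove any overlay left behind by an earlier run.
function removeStaleOverlay(outputPath: string, overlayCaptured: boolean): void {
  if (overlayCaptured) {
    return; // the file on disk belongs to this run
  }
  // force: true makes this a no-op when the file does not exist.
  fs.rmSync(overlayPathFor(outputPath), { force: true });
}
```

Calling `removeStaleOverlay(outputPath, false)` on the match and dimension-mismatch branches would keep the deterministic filename from pointing at output from a previous run.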

@thymikee thymikee merged commit 3e0a7b5 into main Apr 12, 2026
16 checks passed
@thymikee thymikee deleted the codex/screenshot-diff-guidance branch April 12, 2026 10:45