feat(live): native screenshot-bytes fast path for the NL LLM agent (driver-agnostic) #303

Open
chinmayajha wants to merge 2 commits into mozarkai:main from chinmayajha:live-native-screenshot-bytes

@chinmayajha

Collaborator

Adds ElementSourceInterface.capture_screenshot_bytes(), an optional fast path that returns the device's natively encoded screenshot, so callers that need only bytes skip the base64 → numpy → cv2.imdecode → cv2.imencode round-trip. StrategyManager.capture_screenshot_bytes() prefers any backend that implements it and otherwise encodes the numpy frame. LiveController.screenshot_png_bytes() (the NL provider) delegates to it, so the live layer contains no driver-specific code.
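For reviewers, the preference order described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the free function, the `NativeSource` and `NumpyOnlySource` classes, and the injected `encode_frame` callable are all hypothetical, and detecting the optional method with `getattr` is an assumption about how "implements it" is decided.

```python
from typing import Any, Callable, Sequence


class NativeSource:
    """Hypothetical backend that implements the optional fast path."""

    def __init__(self, payload: bytes, fail: bool = False) -> None:
        self._payload = payload
        self._fail = fail

    def capture_screenshot_bytes(self) -> bytes:
        if self._fail:
            raise RuntimeError("native capture failed")
        return self._payload


class NumpyOnlySource:
    """Hypothetical backend that only exposes the decoded-frame path."""

    def __init__(self, frame: Any) -> None:
        self._frame = frame

    def capture_screenshot_as_numpy(self) -> Any:
        return self._frame


def capture_screenshot_bytes(
    sources: Sequence[Any],
    encode_frame: Callable[[Any], bytes],
) -> bytes:
    """Prefer natively encoded bytes; otherwise encode a decoded frame."""
    # Pass 1: any source that offers the native path wins.
    for source in sources:
        native = getattr(source, "capture_screenshot_bytes", None)
        if not callable(native):
            continue
        try:
            data = native()
        except Exception:
            continue  # a failing native source is skipped, not fatal
        if data:
            return data

    # Pass 2: fall back to the frame path and encode it ourselves.
    # In a real setup encode_frame would wrap something like
    # cv2.imencode(".png", frame); it is injected here so the sketch
    # runs without OpenCV installed.
    for source in sources:
        as_numpy = getattr(source, "capture_screenshot_as_numpy", None)
        if not callable(as_numpy):
            continue
        try:
            frame = as_numpy()
        except Exception:
            continue
        if frame is not None:
            return encode_frame(frame)

    raise RuntimeError("no source could produce a screenshot")
```

With this shape, a caller such as the live layer only ever asks for bytes and never needs to know which driver produced them.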

Implemented across all backends: Appium and Selenium (base64 endpoint with a backoff retry) and Playwright (returns page.screenshot() PNG bytes verbatim, with no decode). Each backend's capture_screenshot_as_numpy() now reuses the bytes path, so both share one fetch+retry.
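A minimal sketch of what the Appium/Selenium bytes path with backoff retry might look like. The retry count, delays, and the truncation check (a base64 decode error, or a PNG missing its IEND trailer) are assumptions for illustration and may differ from what the PR does; `FakeDriver` is a hypothetical stand-in, and only `get_screenshot_as_base64()` mirrors a real WebDriver method.

```python
import base64
import binascii
import time
from typing import Any, Callable, List, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC


def _looks_complete(data: bytes) -> bool:
    """Assumed truncation check: a PNG must end with its IEND chunk."""
    if data.startswith(PNG_SIGNATURE):
        return data.endswith(PNG_TRAILER)
    return len(data) > 0  # other formats (e.g. JPEG) are not inspected here


def capture_screenshot_bytes(
    driver: Any,
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Fetch the base64 screenshot and retry with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            data = base64.b64decode(driver.get_screenshot_as_base64())
            if _looks_complete(data):
                return data
            last_error = ValueError("screenshot payload looks truncated")
        except binascii.Error as exc:  # e.g. padding lost to truncation
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        f"screenshot capture failed after {attempts} attempts"
    ) from last_error


class FakeDriver:
    """Hypothetical driver that replays canned base64 payloads in order."""

    def __init__(self, payloads: List[str]) -> None:
        self._payloads = list(payloads)

    def get_screenshot_as_base64(self) -> str:
        return self._payloads.pop(0)
```

Under this design, capture_screenshot_as_numpy() would call the function above and decode its result, so the retry logic lives in one place.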

Add ElementSourceInterface.capture_screenshot_bytes(), an optional fast path that
returns the device's natively-encoded screenshot (PNG/JPEG) so callers needing
only bytes skip the numpy decode -> re-encode round-trip.

Implement it across every backend:
- Appium / Selenium: base64 endpoint, with a backoff retry for truncated responses.
- Playwright: returns page.screenshot()'s PNG bytes verbatim (no decode at all).
Each backend's capture_screenshot_as_numpy() now reuses capture_screenshot_bytes(),
so the bytes and numpy paths share one fetch+retry implementation.

StrategyManager.capture_screenshot_bytes() prefers any source implementing the
native path and otherwise encodes the numpy frame, so every backend keeps working.
LiveController.screenshot_png_bytes() (the NL agent's screenshot provider) now
delegates to it, keeping the live layer driver-agnostic.
…back

StrategyManager prefers native bytes, falls back to numpy encode, and skips a
failing native source. Per-backend coverage: Appium/Selenium base64 decode with
retry, Playwright raw-PNG passthrough, and capture_screenshot_as_numpy reusing the
bytes path.
chinmayajha requested a review from thakur-patel on June 15, 2026, 16:37