feat: add agent observation and SPA wait helpers#198

Open
Lhy099 wants to merge 2 commits into browser-use:main from Lhy099:feat/agent-observation-waits

Conversation

@Lhy099 Lhy099 commented Apr 25, 2026

Summary

This PR adds lightweight helpers for the agent execution loop: observe, wait, and act.
Agents often work on React, Next.js, and SPA pages where document.readyState === "complete" does not mean the UI is ready. Existing domain skills frequently compensate with fixed sleeps after wait_for_load(), which makes agent runs slower and less reliable.

This PR adds predicate-based waits so agents can wait for the actual page state they need. It also adds a compact page outline helper for agent observation. Instead of relying only on screenshots or ad hoc DOM probes, agents can request a structured summary of visible interactive elements with text, roles, labels, hrefs, disabled state, and bounding boxes.

Agent Workflow Improvements

  • wait_for_js() lets agents wait on explicit page state instead of sleeping blindly.
  • wait_for_selector() gives agents a simple primitive for waiting until a target element is mounted or visibly painted.
  • page_outline() gives agents a compact observation surface for deciding what to click, inspect, or verify next.
  • SKILL.md now guides agents to use predicate-based waits for React/SPA readiness.
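
To make the predicate-based wait idea concrete, here is a minimal sketch of the polling loop such a helper might use. It is not the PR's implementation: `poll_until_truthy` and its `evaluate`, `clock`, and `sleep` parameters are hypothetical names, and the evaluator is injected so the loop can be exercised without a browser.

```python
import time
from typing import Any, Callable


def poll_until_truthy(
    evaluate: Callable[[], Any],
    timeout: float = 10.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call evaluate() until it returns a truthy value, then return that value.

    Exceptions raised by evaluate() are deliberately not caught, so an
    evaluation failure surfaces immediately instead of looking like a timeout.
    """
    deadline = clock() + timeout
    while True:
        value = evaluate()
        if value:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)
```

In a real helper, `evaluate` would presumably wrap a call such as `js(expression)`; the point of the sketch is only that the wait ends as soon as the page state is reached rather than after a fixed sleep.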

Summary by cubic

Adds SPA-aware wait helpers and a compact page outline to make agents more reliable on React/Next.js and other SPA pages. Propagates JavaScript runtime errors to fail fast with clear messages.

  • New Features

    • wait_for_js(expression, timeout, interval): poll a JS predicate and return the first truthy value.
    • wait_for_selector(selector, visible=False): wait for a node; visible=True requires painted, styled, and in-viewport.
    • page_outline(limit): compact list of visible interactive elements (tag, text, role, aria, href, disabled, rect).
  • Bug Fixes

    • js(...): now raises on JS runtime exceptions with the CDP message; wait_for_js propagates these errors.
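
The fail-fast behaviour described for js(...) can be sketched as follows. This is an illustration under assumptions, not the code in helpers.py: it assumes the caller already holds the result payload of a CDP `Runtime.evaluate` call as a dict, and `JSEvaluationError` and `unwrap_evaluate_response` are hypothetical names.

```python
from typing import Any


class JSEvaluationError(RuntimeError):
    """Raised when a Runtime.evaluate response reports a JavaScript exception."""


def unwrap_evaluate_response(response: dict) -> Any:
    """Return the evaluated value, or raise if CDP reported an exception.

    CDP signals a thrown exception through the optional `exceptionDetails`
    field; its `exception.description` usually carries the readable message.
    """
    details = response.get("exceptionDetails")
    if details:
        exception = details.get("exception") or {}
        message = (
            exception.get("description")
            or details.get("text")
            or "JavaScript evaluation failed"
        )
        raise JSEvaluationError(message)
    # With returnByValue, the value sits under result.value; it is absent
    # for `undefined`, in which case None is returned.
    return response.get("result", {}).get("value")
```

Raising here, rather than returning a falsy value, is what lets a polling wait distinguish "not ready yet" from "the expression itself is broken".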

Written for commit b194083. Summary will update on new commits.

browser-harness-review Bot commented Apr 25, 2026

✅ Skill review passed

Reviewed 1 file(s) — no findings.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="helpers.py">

<violation number="1" location="helpers.py:207">
P2: New wait polling masks JavaScript evaluation errors as timeouts because `js()` return is treated as falsy readiness, not as execution failure.</violation>
</file>

Lhy099 force-pushed the feat/agent-observation-waits branch from b3db7cd to b194083 on April 25, 2026 10:23
@jingchang0623-crypto

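At 4:17 a.m., I spent a solid two hours staring at this SPA wait helpers PR.

As someone who has been tormented by agents on React/Next.js projects more times than I can count, I almost cried when I saw this PR.

The pain points are all too real.

When we run operations for 妙趣AI, these are the mistakes our agents make on SPA pages:

  • document.readyState === "complete" → the page is still loading
  • wait_for_load() → waited 3 seconds, but the button still hasn't rendered
  • find_element() → the element exists but isn't clickable because it's still animating
  • The screenshot looks "perfect", but the page is actually blank

Then come the hacks: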

# 1. Blind waiting
time.sleep(5)  # 5 seconds, long enough to grab a coffee

# 2. Polling
while not element.is_displayed():
    time.sleep(0.1)  # 100 iterations = 10 seconds

# 3. Brute-force clicking
element.click()  # raises ElementNotInteractableException

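The three tools in this PR are a lifesaver:

  1. wait_for_js(expression) - no more guessing; just wait until the JS condition holds
  2. wait_for_selector(selector, visible=True) - a built-in visibility check, so no more manual checks
  3. page_outline() - this one is really cool! The agent can finally "read" the page structure instead of guessing from screenshots

But I'd like to add one suggestion from practice:

For React's mixed SSR/CSR scenarios, you may also need:

  • wait_for_hydration() - wait for hydration to finish
  • wait_for_stable_state(timeout=3) - wait for the DOM to settle (for example, until animations end)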
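
A "wait until the DOM settles" helper like the one proposed above could be sketched roughly as below. Everything here is hypothetical: `wait_until_stable` is not part of the PR, and `snapshot` stands in for whatever the harness would use to fingerprint the page (for instance a node count or a hash of the rendered HTML). The helper returns once the snapshot has stayed unchanged for `quiet_period` seconds.

```python
import time
from typing import Any, Callable


def wait_until_stable(
    snapshot: Callable[[], Any],
    quiet_period: float = 0.5,
    timeout: float = 3.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll snapshot() until its value has not changed for quiet_period seconds."""
    start = clock()
    last = snapshot()
    last_change = start
    while True:
        now = clock()
        if now - last_change >= quiet_period:
            return last  # unchanged for long enough: treat the page as settled
        if now - start >= timeout:
            raise TimeoutError(f"page did not settle within {timeout}s")
        sleep(interval)
        current = snapshot()
        if current != last:
            # Something changed, so restart the quiet-period countdown.
            last = current
            last_change = clock()
```

The quiet period is a heuristic: a page with a perpetual animation or a ticking clock never settles, so such a helper would always need the timeout as a backstop.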

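Also, page_outline() would be even more powerful if it exposed React's component hierarchy too. Sometimes an agent needs to know which component a button belongs to in order to make a decision.

All in all, this PR addresses the biggest pain point agents have on SPAs. When we debugged agents in OpenClaw, this class of problem accounted for 40% of the bugs.

- 妙趣AI | https://miaoquai.com | an AI operator who specializes in hitting pitfalls on SPA pages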

@qodo-ai-reviewer

Hi, page_outline() includes el.value in extracted text, which will return user-entered form contents (including password fields) to the caller. This is a direct sensitive-data exfiltration path to the agent/LLM whenever observation is used on login/PII forms.

Severity: action required | Category: security

How to fix: Mask/exclude input element values

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

helpers.page_outline() currently includes el.value in the returned text field. This can leak credentials/PII (notably password inputs) to the agent/LLM.

Issue Context

page_outline() is intended for observation. Observation should not include raw form field contents by default.

Fix Focus Areas

  • helpers.py[248-278]

Suggested fix

  • Remove el.value from the text fallback entirely, or only include it when safe (e.g., input types excluding password, and potentially only when readonly or explicitly opted-in).
  • If you need value visibility for UX reasons, return a masked value (e.g., '***' or length only) for sensitive types (password, maybe email, tel) and consider a hard denylist for common secret-like fields (autocomplete='current-password', name contains 'password', etc.).
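
One way the suggested fix could look, sketched in Python over plain field descriptors rather than the in-page JavaScript that page_outline() actually runs. The descriptor keys (`type`, `name`, `autocomplete`, `label`, `placeholder`, `value`), the `include_values` opt-in, and the denylists are all assumptions made for illustration.

```python
SENSITIVE_TYPES = {"password"}
# Standard HTML autocomplete tokens that indicate secret-like content.
SENSITIVE_AUTOCOMPLETE = {
    "current-password", "new-password", "one-time-code", "cc-number", "cc-csc",
}
SENSITIVE_NAME_HINTS = ("password", "passwd", "secret", "token")


def is_sensitive(field: dict) -> bool:
    """Heuristically decide whether a form field may hold a secret."""
    input_type = (field.get("type") or "").lower()
    autocomplete = (field.get("autocomplete") or "").lower()
    name = (field.get("name") or "").lower()
    if input_type in SENSITIVE_TYPES:
        return True
    if any(token in SENSITIVE_AUTOCOMPLETE for token in autocomplete.split()):
        return True
    return any(hint in name for hint in SENSITIVE_NAME_HINTS)


def outline_text(field: dict, include_values: bool = False) -> str:
    """Build the text shown for a field; values are excluded unless opted in."""
    label = field.get("label") or field.get("placeholder") or ""
    value = field.get("value") or ""
    if not include_values or not value:
        return label
    shown = "***" if is_sensitive(field) else value
    return f"{label}: {shown}" if label else shown
```

Excluding values by default keeps observation safe on login and PII forms, while the opt-in path still masks anything the heuristics flag. The name-based hints are necessarily incomplete, so they complement rather than replace the type and autocomplete checks.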

