Skip to content

fix: agent fallback validator#736

Open
leo-notte wants to merge 2 commits intomainfrom
fix/agent-fallback-validator
Open

fix: agent fallback validator#736
leo-notte wants to merge 2 commits intomainfrom
fix/agent-fallback-validator

Conversation

@leo-notte
Copy link
Copy Markdown
Contributor

@leo-notte leo-notte commented Mar 11, 2026

Summary by CodeRabbit

  • New Features

    • Action execution and scrape results now include URL context so outputs show which page an action or scrape ran on.
    • Enhanced system guidance clarifying behavior across multi-page navigation and element ID resets.
  • Tests

    • Added comprehensive tests for URL tracking and multi-page navigation, plus an integration-marked placeholder test.
  • Chores

    • Test config updated to support an "integration" marker.

Greptile Summary

This PR fixes a bug where the CompletionValidator would incorrectly reject a valid agent completion after page navigation. When an agent clicked a link that navigated to a new page, the validator saw the new page's elements but the action history referenced elements from the previous page — leading to false negatives. The fix threads the page URL at time of execution through the data model (ExecutionResult.url), surfaces it in the agent's perception layer (FalcoPerception.perceive_action_result), and adds a clear multi-page context explanation to the validator system prompt.

Key changes:

  • observation.py — adds an optional url field to ExecutionResult (backward compatible, defaults to None).
  • session.py — captures self.page.url before both action execution and scrape execution, and passes it to every newly created ExecutionResult.
  • perception.py — appends (on <url>) to the formatted action-result string when a URL is available, giving the LLM provenance for each action in the history.
  • validator.py — extends the system prompt with an IMPORTANT block explaining that element IDs reset per page and that action history URLs reflect where actions were actually performed.
  • New tests cover ExecutionResult URL population, perceive_action_result output format, system prompt content, and end-to-end validation after page navigation.

Concerns:

  • test_validator_accepts_completion_after_page_navigation in tests/agent/test_validator.py is a full end-to-end test (real browser, LLM, and external network calls to example.com/iana.org) with no pytest.mark.integration guard or skip condition. It will run with the unit-test suite in any environment and fail in offline/resource-constrained CI.
  • Inside test_validator_receives_url_in_action_history, asyncio.run() is called inside a synchronous test to append to a trajectory. This can conflict with an already-running event loop in some pytest configurations.

Confidence Score: 4/5

  • Safe to merge — the core fix is correct, backward compatible, and well-tested; only the test hygiene concerns need attention.
  • The data-model change (url field with None default) is fully backward compatible. URL capture happens at exactly the right moment (before execution) in both the action and scrape code paths. The perception and validator prompt changes are additive. The only risks are a fragile end-to-end test placed outside the integration folder (no pytest.mark guard) and potential asyncio.run conflicts in the test suite — neither affects production behaviour.
  • tests/agent/test_validator.py — the new end-to-end test test_validator_accepts_completion_after_page_navigation should be guarded with pytest.mark.integration or moved to tests/integration/.

Important Files Changed

Filename Overview
packages/notte-core/src/notte_core/browser/observation.py Adds an optional `url: str
packages/notte-browser/src/notte_browser/session.py Captures self.page.url before both action execution and scrape execution, then passes it to all created ExecutionResult objects via the new url field. Correctly covers both the action path and the scrape path.
packages/notte-agent/src/notte_agent/falco/perception.py Adds (on <url>) to the success and failure action-result strings when a URL is present, providing the LLM with provenance context for each action in the history.
packages/notte-agent/src/notte_agent/common/validator.py Adds an IMPORTANT block to the validator system prompt explaining that actions were performed on previous pages, that the current observation is the result of navigation, and that element IDs reset per page.
tests/agent/test_validator.py Adds several unit tests for the new URL field and one end-to-end test (test_validator_accepts_completion_after_page_navigation) that requires a real browser, LLM, and external network access; this test is placed in the non-integration test folder without a marker and without a skip guard, making it potentially fragile in offline/CI environments.
tests/integration/sdk/test_agent_fallback.py Adds a stub test that immediately calls pytest.skip, redirecting testers to the local test in test_validator.py. Clean placeholder with a useful message.

Last reviewed commit: 3e35d68

Greptile also left 2 inline comments on this PR.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a7e5b273-b9a2-42c5-b60d-665491aa5b25

📥 Commits

Reviewing files that changed from the base of the PR and between 3e35d68 and 584fde6.

📒 Files selected for processing (3)
  • pyproject.toml
  • tests/agent/test_validator.py
  • tests/integration/sdk/test_agent_fallback.py
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/sdk/test_agent_fallback.py

Walkthrough

Added URL context propagation across the agent/browser pipeline: a new optional url field on ExecutionResult captures page URL at action execution and scrape time; session code captures and supplies the URL to execution and scrape results; perception formatting includes URL context in success/failure messages; validator text was extended with a multi-page navigation note; tests and an integration marker were added/updated to assert URL presence and related behaviors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: agent fallback validator' directly reflects the main objective of addressing a bug where CompletionValidator rejects valid completions after page navigation, and aligns with the primary changes across multiple files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/agent-fallback-validator
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

validation_msg = validator.validation_message(completion, trajectory, progress, mock_obs)

# The validation message should contain the URL context
assert "https://example.com" in validation_msg, (

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High test

The string
https://example.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI about 1 month ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

Comment thread tests/agent/test_validator.py
Comment thread tests/agent/test_validator.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/agent/test_validator.py (1)

286-329: E2E test relies on external sites.

This test depends on example.com and iana.org being available and maintaining their current link structure. While this is a reasonable assumption for example.com (it's specifically designed as a stable example domain), consider adding a brief comment noting this dependency, or ensure the test is marked appropriately for CI environments where external network access may be restricted.

💡 Optional: Add network dependency marker
+@pytest.mark.network
 def test_validator_accepts_completion_after_page_navigation():
     """
     End-to-end test with real LLM and AgentFallback: Verify that the validator

This allows CI configurations to skip network-dependent tests if needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agent/test_validator.py` around lines 286 - 329, The test
test_validator_accepts_completion_after_page_navigation relies on external sites
(example.com and iana.org); update the test to either document this dependency
with a brief inline comment near the test name or add a pytest marker so CI can
skip it when network access is restricted (e.g., add `@pytest.mark.network` or
`@pytest.mark.integration` or a `@pytest.mark.skipif` network-check before the
test), and ensure references to notte.Session and notte.AgentFallback remain
unchanged so the test behavior is preserved when it runs under network-enabled
CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/sdk/test_agent_fallback.py`:
- Around line 62-70: Docstring in
test_agent_fallback_validator_with_page_navigation references a non-existent
local test name; update the docstring to reference the actual local test name
test_validator_accepts_completion_after_page_navigation (or the correct local
test) so the comment is accurate, i.e., edit the triple-quoted string inside
test_agent_fallback_validator_with_page_navigation to replace
test_agent_fallback_validator_with_page_navigation_local with
test_validator_accepts_completion_after_page_navigation.

---

Nitpick comments:
In `@tests/agent/test_validator.py`:
- Around line 286-329: The test
test_validator_accepts_completion_after_page_navigation relies on external sites
(example.com and iana.org); update the test to either document this dependency
with a brief inline comment near the test name or add a pytest marker so CI can
skip it when network access is restricted (e.g., add `@pytest.mark.network` or
`@pytest.mark.integration` or a `@pytest.mark.skipif` network-check before the
test), and ensure references to notte.Session and notte.AgentFallback remain
unchanged so the test behavior is preserved when it runs under network-enabled
CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1ca4efc-90c1-45c6-8c81-1a5c7c1e80d8

📥 Commits

Reviewing files that changed from the base of the PR and between 2a08326 and 3e35d68.

📒 Files selected for processing (6)
  • packages/notte-agent/src/notte_agent/common/validator.py
  • packages/notte-agent/src/notte_agent/falco/perception.py
  • packages/notte-browser/src/notte_browser/session.py
  • packages/notte-core/src/notte_core/browser/observation.py
  • tests/agent/test_validator.py
  • tests/integration/sdk/test_agent_fallback.py

Comment thread tests/integration/sdk/test_agent_fallback.py
- Convert test_validator_receives_url_in_action_history to async to avoid
  asyncio.run() conflicts with pytest-asyncio
- Add @pytest.mark.integration to E2E test requiring browser/LLM/network
- Fix docstring reference to correct test name in SDK test
- Register 'integration' marker in pyproject.toml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants