fix: agent fallback validator by leo-notte · Pull Request #736 · nottelabs/notte

leo-notte · 2026-03-11T11:45:04Z

Summary by CodeRabbit

New Features
- Action execution and scrape results now include URL context so outputs show which page an action or scrape ran on.
- Enhanced system guidance clarifying behavior across multi-page navigation and element ID resets.
Tests
- Added comprehensive tests for URL tracking and multi-page navigation, plus an integration-marked placeholder test.
Chores
- Test config updated to support an "integration" marker.

Greptile Summary

This PR fixes a bug where the CompletionValidator would incorrectly reject a valid agent completion after page navigation. When an agent clicked a link that navigated to a new page, the validator saw the new page's elements but the action history referenced elements from the previous page — leading to false negatives. The fix threads the page URL at time of execution through the data model (ExecutionResult.url), surfaces it in the agent's perception layer (FalcoPerception.perceive_action_result), and adds a clear multi-page context explanation to the validator system prompt.

Key changes:

observation.py — adds an optional url field to ExecutionResult (backward compatible, defaults to None).
session.py — captures self.page.url before both action execution and scrape execution, and passes it to every newly created ExecutionResult.
perception.py — appends (on <url>) to the formatted action-result string when a URL is available, giving the LLM provenance for each action in the history.
validator.py — extends the system prompt with an IMPORTANT block explaining that element IDs reset per page and that action history URLs reflect where actions were actually performed.
New tests cover ExecutionResult URL population, perceive_action_result output format, system prompt content, and end-to-end validation after page navigation.

Concerns:

test_validator_accepts_completion_after_page_navigation in tests/agent/test_validator.py is a full end-to-end test (real browser, LLM, and external network calls to example.com/iana.org) with no pytest.mark.integration guard or skip condition. It will run with the unit-test suite in any environment and fail in offline/resource-constrained CI.
Inside test_validator_receives_url_in_action_history, asyncio.run() is called inside a synchronous test to append to a trajectory. This can conflict with an already-running event loop in some pytest configurations.

Confidence Score: 4/5

Safe to merge — the core fix is correct, backward compatible, and well-tested; only the test hygiene concerns need attention.
The data-model change (url field with None default) is fully backward compatible. URL capture happens at exactly the right moment (before execution) in both the action and scrape code paths. The perception and validator prompt changes are additive. The only risks are a fragile end-to-end test placed outside the integration folder (no pytest.mark guard) and potential asyncio.run conflicts in the test suite — neither affects production behaviour.
tests/agent/test_validator.py — the new end-to-end test test_validator_accepts_completion_after_page_navigation should be guarded with pytest.mark.integration or moved to tests/integration/.

Important Files Changed

Filename	Overview
packages/notte-core/src/notte_core/browser/observation.py	Adds an optional `url: str
packages/notte-browser/src/notte_browser/session.py	Captures `self.page.url` before both action execution and scrape execution, then passes it to all created `ExecutionResult` objects via the new `url` field. Correctly covers both the action path and the scrape path.
packages/notte-agent/src/notte_agent/falco/perception.py	Adds `(on <url>)` to the success and failure action-result strings when a URL is present, providing the LLM with provenance context for each action in the history.
packages/notte-agent/src/notte_agent/common/validator.py	Adds an IMPORTANT block to the validator system prompt explaining that actions were performed on previous pages, that the current observation is the result of navigation, and that element IDs reset per page.
tests/agent/test_validator.py	Adds several unit tests for the new URL field and one end-to-end test (`test_validator_accepts_completion_after_page_navigation`) that requires a real browser, LLM, and external network access; this test is placed in the non-integration test folder without a marker and without a skip guard, making it potentially fragile in offline/CI environments.
tests/integration/sdk/test_agent_fallback.py	Adds a stub test that immediately calls `pytest.skip`, redirecting testers to the local test in `test_validator.py`. Clean placeholder with a useful message.

_{Last reviewed commit: 3e35d68}

Greptile also left 2 inline comments on this PR.

coderabbitai · 2026-03-11T11:45:23Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a7e5b273-b9a2-42c5-b60d-665491aa5b25

📥 Commits

Reviewing files that changed from the base of the PR and between 3e35d68 and 584fde6.

📒 Files selected for processing (3)

pyproject.toml
tests/agent/test_validator.py
tests/integration/sdk/test_agent_fallback.py

✅ Files skipped from review due to trivial changes (1)

tests/integration/sdk/test_agent_fallback.py

Walkthrough

Added URL context propagation across the agent/browser pipeline: a new optional url field on ExecutionResult captures page URL at action execution and scrape time; session code captures and supplies the URL to execution and scrape results; perception formatting includes URL context in success/failure messages; validator text was extended with a multi-page navigation note; tests and an integration marker were added/updated to assert URL presence and related behaviors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: agent fallback validator' directly reflects the main objective of addressing a bug where CompletionValidator rejects valid completions after page navigation, and aligns with the primary changes across multiple files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/agent-fallback-validator

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

+    validation_msg = validator.validation_message(completion, trajectory, progress, mock_obs)
+
+    # The validation message should contain the URL context
+    assert "https://example.com" in validation_msg, (


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/agent/test_validator.py (1)
286-329: E2E test relies on external sites.

This test depends on example.com and iana.org being available and maintaining their current link structure. While this is a reasonable assumption for example.com (it's specifically designed as a stable example domain), consider adding a brief comment noting this dependency, or ensure the test is marked appropriately for CI environments where external network access may be restricted.
💡 Optional: Add network dependency marker
+@pytest.mark.network
 def test_validator_accepts_completion_after_page_navigation():
     """
     End-to-end test with real LLM and AgentFallback: Verify that the validator
This allows CI configurations to skip network-dependent tests if needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agent/test_validator.py` around lines 286 - 329, The test
test_validator_accepts_completion_after_page_navigation relies on external sites
(example.com and iana.org); update the test to either document this dependency
with a brief inline comment near the test name or add a pytest marker so CI can
skip it when network access is restricted (e.g., add `@pytest.mark.network` or
`@pytest.mark.integration` or a `@pytest.mark.skipif` network-check before the
test), and ensure references to notte.Session and notte.AgentFallback remain
unchanged so the test behavior is preserved when it runs under network-enabled
CI.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/sdk/test_agent_fallback.py`:
- Around line 62-70: Docstring in
test_agent_fallback_validator_with_page_navigation references a non-existent
local test name; update the docstring to reference the actual local test name
test_validator_accepts_completion_after_page_navigation (or the correct local
test) so the comment is accurate, i.e., edit the triple-quoted string inside
test_agent_fallback_validator_with_page_navigation to replace
test_agent_fallback_validator_with_page_navigation_local with
test_validator_accepts_completion_after_page_navigation.

---

Nitpick comments:
In `@tests/agent/test_validator.py`:
- Around line 286-329: The test
test_validator_accepts_completion_after_page_navigation relies on external sites
(example.com and iana.org); update the test to either document this dependency
with a brief inline comment near the test name or add a pytest marker so CI can
skip it when network access is restricted (e.g., add `@pytest.mark.network` or
`@pytest.mark.integration` or a `@pytest.mark.skipif` network-check before the
test), and ensure references to notte.Session and notte.AgentFallback remain
unchanged so the test behavior is preserved when it runs under network-enabled
CI.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1ca4efc-90c1-45c6-8c81-1a5c7c1e80d8

📥 Commits

Reviewing files that changed from the base of the PR and between 2a08326 and 3e35d68.

📒 Files selected for processing (6)

packages/notte-agent/src/notte_agent/common/validator.py
packages/notte-agent/src/notte_agent/falco/perception.py
packages/notte-browser/src/notte_browser/session.py
packages/notte-core/src/notte_core/browser/observation.py
tests/agent/test_validator.py
tests/integration/sdk/test_agent_fallback.py

- Convert test_validator_receives_url_in_action_history to async to avoid asyncio.run() conflicts with pytest-asyncio - Add @pytest.mark.integration to E2E test requiring browser/LLM/network - Fix docstring reference to correct test name in SDK test - Register 'integration' marker in pyproject.toml Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix: agent fallback validator

3e35d68

github-advanced-security AI found potential problems Mar 11, 2026

View reviewed changes

greptile-apps Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread tests/agent/test_validator.py

Comment thread tests/agent/test_validator.py Outdated

coderabbitai Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread tests/integration/sdk/test_agent_fallback.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: agent fallback validator#736

fix: agent fallback validator#736
leo-notte wants to merge 2 commits intomainfrom
fix/agent-fallback-validator

leo-notte commented Mar 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 11, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

Check failure

Copilot Autofix

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leo-notte commented Mar 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

coderabbitai Bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Check failure

Uh oh!

Copilot Autofix

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

leo-notte commented Mar 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 11, 2026 •

edited

Loading