πŸ›‘οΈ Sentinel: [HIGH] Fix Server-Side Request Forgery in JobParser#224

Open
anchapin wants to merge 3 commits intomainfrom
sentinel-ssrf-fix-13007455400231532099
Conversation

anchapin (Owner) commented Apr 1, 2026

🚨 Severity: HIGH
πŸ’‘ Vulnerability: Server-Side Request Forgery (SSRF) was possible because parse_from_url passed arbitrary user-supplied URLs to requests.get without prior validation.
🎯 Impact: An attacker could exploit this to force the application server to make unauthorized requests to internal services, metadata endpoints (e.g., AWS IMDS), or the local loopback interface.
πŸ”§ Fix: Implemented a new _fetch_url_safe method that parses the URL scheme, resolves the hostname to its IP address, and verifies that the destination IP does not belong to private, loopback, link-local, reserved, or unspecified (0.0.0.0) ranges before passing the request along. The method fails securely on resolution errors.
βœ… Verification: Verified by attempting to access 169.254.169.254 and 0.0.0.0 programmatically with JobParser and confirming the operations threw exceptions, along with successful execution of the full test suite.


PR created automatically by Jules for task 13007455400231532099 started by @anchapin

Summary by Sourcery

Harden JobParser URL fetching to prevent server-side request forgery when parsing jobs from remote URLs.

New Features:

  • Add a JobParser helper that safely fetches HTTP(S) URLs with hostname resolution and IP range validation before issuing requests.

Bug Fixes:

  • Fix an SSRF vulnerability where JobParser.parse_from_url previously issued requests to arbitrary user-supplied URLs without validation.

Documentation:

  • Document the JobParser SSRF vulnerability, its root cause, and recommended prevention measures in the Sentinel security log.

Tests:

  • Add simple scripts to manually exercise JobParser SSRF protections against metadata and 0.0.0.0 targets.

Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
google-labs-jules (Contributor) commented

πŸ‘‹ Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a πŸ‘€ emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

sourcery-ai bot commented Apr 1, 2026

Reviewer's Guide

Adds SSRF protection to JobParser URL fetching by introducing a safe URL fetch helper that validates scheme and resolved IP ranges, updates error handling accordingly, documents the incident in sentinel.md, and includes ad-hoc scripts for manually testing blocked metadata and 0.0.0.0 endpoints.

Sequence diagram for JobParser.parse_from_url with SSRF-safe fetching

sequenceDiagram
    actor Client
    participant JobParser
    participant _fetch_url_safe
    participant DNSResolver
    participant IPAddressValidator
    participant HttpClient

    Client->>JobParser: parse_from_url(url)
    JobParser->>_fetch_url_safe: _fetch_url_safe(url)

    _fetch_url_safe->>_fetch_url_safe: parse URL
    alt invalid_scheme
        _fetch_url_safe-->>JobParser: ValueError (invalid scheme)
        JobParser-->>Client: RuntimeError Failed to fetch URL
    end

    _fetch_url_safe->>DNSResolver: getaddrinfo(hostname)
    DNSResolver-->>_fetch_url_safe: addrinfo or socket.gaierror
    alt resolution_failed
        _fetch_url_safe-->>JobParser: ValueError could not resolve hostname
        JobParser-->>Client: RuntimeError Failed to fetch URL
    else resolution_succeeded
        loop each_resolved_address
            _fetch_url_safe->>IPAddressValidator: ip_address(ip_addr)
            IPAddressValidator-->>_fetch_url_safe: ip object or ValueError
            alt invalid_ip
                _fetch_url_safe-->>JobParser: ValueError invalid IP address
                JobParser-->>Client: RuntimeError Failed to fetch URL
            else internal_or_disallowed_range
                _fetch_url_safe-->>JobParser: RuntimeError internal IP not allowed
                JobParser-->>Client: RuntimeError Failed to fetch URL
            end
        end
        _fetch_url_safe->>HttpClient: requests.get(url, headers, timeout=30)
        HttpClient-->>_fetch_url_safe: Response or RequestException
        alt http_error
            _fetch_url_safe-->>JobParser: requests.RequestException
            JobParser-->>Client: RuntimeError Failed to fetch URL
        else http_success
            _fetch_url_safe-->>JobParser: Response
            JobParser->>JobParser: response.raise_for_status()
            JobParser->>JobParser: _parse_html(response.text)
            JobParser-->>Client: JobDetails
        end
    end

Updated class diagram for JobParser with SSRF-safe URL fetching

classDiagram
    class JobParser {
        +parse_from_file(file_path: Path, url: Optional[str]) JobDetails
        +parse_from_url(url: str) JobDetails
        -_fetch_url_safe(url: str) requests.Response
        -_parse_html(html: str) JobDetails
    }

    class Requests {
        +get(url: str, headers: dict, timeout: int) Response
    }

    class Response {
        +text: str
        +raise_for_status() None
    }

    JobParser ..> Requests: uses
    JobParser ..> Response: returns_and_consumes

Flow diagram for _fetch_url_safe SSRF validation logic

flowchart TD
    A_start["Start _fetch_url_safe(url)"] --> B_parse_url["Parse URL with urlparse"]
    B_parse_url --> C_check_scheme{"Is scheme http or https?"}
    C_check_scheme -- No --> Z1_invalid_scheme["Raise ValueError Invalid URL scheme"]
    C_check_scheme -- Yes --> D_check_hostname{"Hostname present?"}
    D_check_hostname -- No --> Z2_no_hostname["Raise ValueError Invalid URL no hostname"]
    D_check_hostname -- Yes --> E_resolve_dns["Call socket.getaddrinfo(hostname, None)"]

    E_resolve_dns --> F_dns_result{"DNS resolution successful?"}
    F_dns_result -- No (socket.gaierror) --> Z3_dns_error["Raise ValueError Could not resolve hostname"]
    F_dns_result -- Yes --> G_loop_addresses["For each addrinfo result"]

    G_loop_addresses --> H_extract_ip["Extract ip_addr from addrinfo"]
    H_extract_ip --> I_parse_ip["Parse with ipaddress.ip_address(ip_addr)"]
    I_parse_ip --> J_ip_parse_ok{"IP parsed successfully?"}
    J_ip_parse_ok -- No (ValueError) --> Z4_invalid_ip["Raise ValueError Invalid IP address resolved"]
    J_ip_parse_ok -- Yes --> K_check_range{"IP is private, loopback, link-local, reserved, or unspecified?"}
    K_check_range -- Yes --> Z5_internal_ip["Raise RuntimeError Access to internal IP address not allowed"]
    K_check_range -- No --> L_next_or_continue["Next resolved address or proceed"]

    L_next_or_continue --> M_all_checked{"All resolved addresses checked?"}
    M_all_checked -- No --> G_loop_addresses
    M_all_checked -- Yes --> N_make_request["Call requests.get(url, headers, timeout=30)"]

    N_make_request --> O_return_response["Return Response"]
    O_return_response --> P_end["End _fetch_url_safe"]

File-Level Changes

cli/integrations/job_parser.py: Introduce a safe URL fetching helper that validates schemes and blocks internal IP ranges before performing HTTP requests in JobParser.
  • Add _fetch_url_safe helper to parse URLs, enforce http/https schemes, and require a hostname
  • Resolve hostnames via socket.getaddrinfo and validate each resolved address with ipaddress to reject private, loopback, link-local, reserved, or unspecified IPs, failing closed on resolution or parsing errors
  • Use the safe helper from parse_from_url instead of calling requests.get directly, and broaden exception handling to include ValueError and RuntimeError raised by the helper

.jules/sentinel.md: Document the SSRF vulnerability and the applied mitigations in the security sentinel notes.
  • Replace prior sentinel entries with a new section describing the JobParser SSRF vulnerability, its impact, learnings, and prevention guidance

test_ssrf.py, test_ssrf_0000.py: Add standalone scripts to manually exercise SSRF protection against metadata and 0.0.0.0 targets.
  • Add a small script that imports JobParser and calls parse_from_url against the AWS metadata IP, printing and tracing any exceptions
  • Add a similar script that targets 0.0.0.0:8000 for manual verification of blocking behavior

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The current SSRF mitigation still has a TOCTOU window for DNS rebinding because you validate the resolved IP but then call requests.get with the hostname; consider either constructing the request using the validated IP (and explicitly setting the Host header) or otherwise ensuring the actual connection target cannot change after validation.
  • The new _fetch_url_safe helper imports urlparse, socket, and ipaddress on every call; moving these imports to module scope would reduce per-call overhead and make the function easier to read.
  • The standalone test_ssrf*.py scripts at the repository root look like ad-hoc helpers; consider moving them into an appropriate tests or tools directory (or gating them behind a CLI entry point) so they don’t clutter the main package surface.
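
The first point (the DNS-rebinding window) can be illustrated with a small sketch. The helper names below are hypothetical, not code from this PR; for https URLs, pinning the IP this way breaks certificate hostname verification, so a real implementation would also need a custom transport adapter or restrict pinning to http.

```python
# Hypothetical sketch of pinning the connection to the already-validated IP;
# pin_url and fetch_pinned are illustrative names, not code from this PR.
from urllib.parse import urlparse, urlunparse


def pin_url(url: str, validated_ip: str) -> tuple:
    """Rebuild the URL so the socket connects to validated_ip, and return
    the original hostname to send as the Host header."""
    parsed = urlparse(url)
    port = f":{parsed.port}" if parsed.port else ""
    pinned = parsed._replace(netloc=f"{validated_ip}{port}")
    return urlunparse(pinned), parsed.hostname


def fetch_pinned(url: str, validated_ip: str, timeout: int = 30):
    pinned_url, host = pin_url(url, validated_ip)
    import requests  # lazy, as in the PR's optional dependency handling
    # The DNS answer can no longer change between validation and connection,
    # because the request goes straight to the IP we already checked.
    return requests.get(pinned_url, headers={"Host": host}, timeout=timeout)
```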
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The current SSRF mitigation still has a TOCTOU window for DNS rebinding because you validate the resolved IP but then call `requests.get` with the hostname; consider either constructing the request using the validated IP (and explicitly setting the `Host` header) or otherwise ensuring the actual connection target cannot change after validation.
- The new `_fetch_url_safe` helper imports `urlparse`, `socket`, and `ipaddress` on every call; moving these imports to module scope would reduce per-call overhead and make the function easier to read.
- The standalone `test_ssrf*.py` scripts at the repository root look like ad-hoc helpers; consider moving them into an appropriate tests or tools directory (or gating them behind a CLI entry point) so they don’t clutter the main package surface.

## Individual Comments

### Comment 1
<location path="cli/integrations/job_parser.py" line_range="287" />
<code_context>
                 "URL fetching requires 'requests' library. Install with: pip install requests"
             )
-        except requests.RequestException as e:
+        except (requests.RequestException, ValueError, RuntimeError) as e:
             raise RuntimeError(f"Failed to fetch URL: {e}")

</code_context>
<issue_to_address>
**suggestion (bug_risk):** Catching broad `ValueError`/`RuntimeError` here may hide programming errors unrelated to URL fetching.

This `except (requests.RequestException, ValueError, RuntimeError)` will also catch `ValueError`/`RuntimeError` from `_parse_html` or other later code, incorrectly rewrapping them as fetch failures. Please restrict this handler to the concrete exceptions raised by `_fetch_url_safe`/`requests`, or split the `try` so parsing errors are not masked as network/URL-fetch errors.

Suggested implementation:

```python
            try:
                response = self._fetch_url_safe(url)
                response.raise_for_status()
            except (requests.RequestException, ValueError, RuntimeError) as e:
                # Limit this handler strictly to URL fetching / validation failures
                raise RuntimeError(f"Failed to fetch URL: {e}")

            job_details = self._parse_html(response.text)
            return job_details

```

If there are other calls to `_parse_html` wrapped in the same broad `except (requests.RequestException, ValueError, RuntimeError)` pattern elsewhere in this file, they should be refactored similarly: keep the exception handler around only the fetch/validation logic, and let parsing errors propagate normally.
</issue_to_address>
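
The split recommended in this comment can be demonstrated with a self-contained toy. `FakeParser` below is a stand-in for `JobParser`, not the PR's code; only the pattern (narrow try around the fetch, parsing errors left alone) is the point.

```python
# FakeParser is a hypothetical stand-in illustrating the split-try pattern:
# only fetch/validation failures are rewrapped as "Failed to fetch URL".
class FakeParser:
    def _fetch_url_safe(self, url):
        if "blocked" in url:
            raise RuntimeError("Access to internal IP address not allowed")
        if "no-title" in url:
            return "<html></html>"  # fetch succeeds, content is unparsable
        return "<html><title>ok</title></html>"

    def _parse_html(self, html):
        if "<title>" not in html:
            raise ValueError("no title found")  # a *parsing* error
        return "ok"

    def parse_from_url(self, url):
        try:
            body = self._fetch_url_safe(url)
        except (ValueError, RuntimeError) as e:
            # Narrow handler: only fetch/validation failures are rewrapped.
            raise RuntimeError(f"Failed to fetch URL: {e}")
        # Parsing errors propagate with their original type and are never
        # mislabelled as fetch failures.
        return self._parse_html(body)
```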

### Comment 2
<location path="test_ssrf.py" line_range="1-10" />
<code_context>
+import os
+import sys
+
+# Make sure we can import from the cli package
+sys.path.insert(0, os.path.abspath('.'))
+
+from cli.integrations.job_parser import JobParser
+
+parser = JobParser()
+try:
+    print("Testing JobParser SSRF with metadata endpoint...")
+    parser.parse_from_url("http://169.254.169.254/latest/meta-data/")
+except Exception as e:
+    print(f"Error: {e}")
+    import traceback
+    traceback.print_exc()
</code_context>
<issue_to_address>
**suggestion (testing):** Convert this script into an automated test (e.g., pytest/unittest) with concrete assertions instead of a manual runner with print/traceback output.

Right now this behaves as a manual debug script (prints output, catches all exceptions, prints tracebacks), so it won’t be picked up or enforced by the test runner/CI. Either convert it into a real test (e.g., `test_*.py` with pytest assertions / `with pytest.raises(...)` on the expected exception type/message) so SSRF regressions fail CI, or move it to a `scripts/` or `manual_tests/` directory and keep automated coverage in a dedicated test module.
</issue_to_address>
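
A pytest conversion along the lines suggested above could look like this. It is a sketch: the import fallback exists only so the example runs outside the repository, and it assumes blocked fetches surface as `RuntimeError("Failed to fetch URL: ...")`, as the PR description states.

```python
import pytest

try:
    from cli.integrations.job_parser import JobParser  # the real class, inside the repo
except ImportError:
    # Stand-in so this example is self-contained outside the repo. It mimics
    # only the documented behavior: blocked targets raise RuntimeError.
    import ipaddress
    import socket
    from urllib.parse import urlparse

    class JobParser:
        def parse_from_url(self, url):
            host = urlparse(url).hostname
            for info in socket.getaddrinfo(host, None):
                ip = ipaddress.ip_address(info[4][0])
                if (ip.is_private or ip.is_loopback or ip.is_link_local
                        or ip.is_reserved or ip.is_unspecified):
                    raise RuntimeError(f"Failed to fetch URL: internal IP {ip} not allowed")
            raise NotImplementedError("network fetch not exercised by these tests")


@pytest.mark.parametrize("blocked_url", [
    "http://169.254.169.254/latest/meta-data/",
    "http://0.0.0.0:8000/",
    "http://127.0.0.1/admin",
])
def test_parse_from_url_blocks_internal_targets(blocked_url):
    # SSRF regressions now fail CI instead of requiring a manual run.
    with pytest.raises(RuntimeError, match="Failed to fetch URL"):
        JobParser().parse_from_url(blocked_url)
```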

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click πŸ‘ or πŸ‘Ž on each comment and I'll use the feedback to improve your reviews.

google-labs-jules bot and others added 2 commits April 1, 2026 23:32
Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>