
πŸ›‘οΈ Sentinel: [HIGH] Fix SSRF vulnerability in JobParser#214

Open
anchapin wants to merge 1 commit into main from sentinel-ssrf-job-parser-3169344061033975925

Conversation

@anchapin (Owner) commented Mar 27, 2026

πŸ›‘οΈ Sentinel: [HIGH] Fix SSRF vulnerability in JobParser

Added IP address validation in cli/integrations/job_parser.py before fetching URLs to prevent Server-Side Request Forgery (SSRF) attacks against internal services or cloud metadata endpoints.

🚨 Severity: HIGH
πŸ’‘ Vulnerability: SSRF when parsing job posting from URL
🎯 Impact: An attacker could force the CLI or API to fetch internal resources (e.g., http://localhost/ or http://169.254.169.254/) and access sensitive data.
πŸ”§ Fix: Implemented _fetch_url_safe that uses socket.getaddrinfo and ipaddress to verify the resolved IP is not private, loopback, link-local, or reserved before calling requests.get(). Fails closed if resolution fails or returns an invalid format.
βœ… Verification: Ran pytest tests/test_job_parser_integration.py to ensure normal URL fetching still works.
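As an illustration, the validation described above might look like the sketch below. This is a reconstruction from the description, not the actual code in `cli/integrations/job_parser.py`; names and details may differ (the helper is shown here as a module-level function rather than a method):

```python
import ipaddress
import socket
import urllib.parse


def fetch_url_safe(url: str, timeout: int = 30) -> str:
    """Fetch a URL only after checking that it does not resolve to a
    private, loopback, link-local, reserved, or multicast address."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    hostname = parsed.hostname  # .hostname already strips IPv6 brackets
    if not hostname:
        raise ValueError("URL has no hostname")

    # Fail closed: resolution errors and unparsable addresses raise ValueError.
    try:
        addr_info = socket.getaddrinfo(hostname, None)
        ip_obj = ipaddress.ip_address(addr_info[0][4][0])
    except socket.gaierror as e:
        raise ValueError(f"Failed to resolve hostname: {hostname}") from e

    if (ip_obj.is_private or ip_obj.is_loopback or ip_obj.is_link_local
            or ip_obj.is_reserved or ip_obj.is_multicast):
        raise ValueError(f"URL resolves to a restricted IP address: {ip_obj}")

    # The requests dependency is only needed once validation has passed.
    try:
        import requests
    except ImportError:
        raise NotImplementedError(
            "URL fetching requires 'requests' library. Install with: pip install requests"
        )
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With this shape, `fetch_url_safe("http://127.0.0.1/")` and `fetch_url_safe("http://169.254.169.254/")` both fail before any HTTP request is issued.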


PR created automatically by Jules for task 3169344061033975925 started by @anchapin

Summary by Sourcery

Harden JobParser URL fetching to prevent SSRF by validating target IPs before making HTTP requests and updating security documentation accordingly.

Bug Fixes:

  • Prevent SSRF in JobParser by resolving hostnames and blocking URLs that map to private, loopback, link-local, reserved, or multicast IP addresses.
  • Treat missing or invalid hostnames and unsupported URL schemes as errors, failing closed before any HTTP request is issued.

Enhancements:

  • Refactor JobParser to centralize safe URL retrieval via a dedicated helper used by URL-based parsing.

Documentation:

  • Extend Sentinel security report with a new entry documenting the JobParser SSRF vulnerability, its impact, and mitigations.


Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
@google-labs-jules (Contributor) commented:

πŸ‘‹ Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a πŸ‘€ emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

sourcery-ai bot commented Mar 27, 2026

Reviewer's Guide

Adds a safe URL fetcher to JobParser that validates resolved IP addresses to mitigate SSRF, wires it into parse_from_url error handling, and documents the vulnerability and mitigation in the Sentinel security log.

Sequence diagram for safe URL fetching in JobParser.parse_from_url

```mermaid
sequenceDiagram
    actor User
    participant JobParser
    participant urllib
    participant socket
    participant ipaddress
    participant requests

    User->>JobParser: parse_from_url(url)
    JobParser->>JobParser: _fetch_url_safe(url)

    JobParser->>urllib: urlparse(url)
    urllib-->>JobParser: parsed_url
    JobParser->>JobParser: validate scheme and hostname

    JobParser->>socket: getaddrinfo(hostname, None)
    alt hostname resolution fails
        socket-->>JobParser: socket.gaierror
        JobParser-->>JobParser: raise ValueError
        JobParser-->>User: RuntimeError(unsafe or invalid URL)
    else hostname resolves
        socket-->>JobParser: addr_info
        JobParser->>ipaddress: ip_address(ip)
        ipaddress-->>JobParser: ip_obj
        alt ip is private or loopback or link_local or reserved or multicast
            JobParser-->>JobParser: raise ValueError
            JobParser-->>User: RuntimeError(unsafe or invalid URL)
        else ip allowed
            JobParser->>requests: get(url, headers, timeout)
            alt network error
                requests-->>JobParser: RequestException
                JobParser-->>User: RuntimeError(failed to fetch URL)
            else success
                requests-->>JobParser: response
                JobParser->>JobParser: response.raise_for_status()
                JobParser-->>JobParser: html_content
                JobParser->>JobParser: _parse_html(html_content)
                JobParser-->>JobParser: job_details
                JobParser->>JobParser: job_details.url = url
                JobParser->>JobParser: _save_to_cache(cache_key, job_details)
                JobParser-->>User: job_details
            end
        end
    end
```

Class diagram for updated JobParser URL parsing and fetching

```mermaid
classDiagram
    class JobParser {
        +parse_from_url(url: str) JobDetails
        -_parse_html(html: str) JobDetails
        -_fetch_url_safe(url: str) str
        -_is_linkedin(html: str) bool
        -_parse_generic(html: str) JobDetails
        -_save_to_cache(cache_key: str, job_details: JobDetails) None
    }

    class JobDetails {
        +url: str
        +title: str
        +company: str
        +location: str
        +description: str
    }

    JobParser --> JobDetails : parses
```

Flowchart for SSRF-safe _fetch_url_safe URL validation

```mermaid
flowchart TD
    A["Start _fetch_url_safe(url)"] --> B{"requests library available?"}
    B -- "No" --> C["Raise NotImplementedError: install requests"]
    B -- "Yes" --> D["parsed_url = urllib.parse.urlparse(url)"]
    D --> E{"Scheme in http or https?"}
    E -- "No" --> F["Raise ValueError: invalid scheme"]
    E -- "Yes" --> G{"Hostname present?"}
    G -- "No" --> H["Raise ValueError: missing hostname"]
    G -- "Yes" --> I["hostname = hostname.strip('[]')"]
    I --> J["addr_info = socket.getaddrinfo(hostname, None)"]

    J --> K{"socket.gaierror?"}
    K -- "Yes" --> L["Raise ValueError: failed to resolve hostname"]
    K -- "No" --> M["ip = addr_info[0][4][0]"]
    M --> N["ip_obj = ipaddress.ip_address(ip)"]

    N --> O{"ip is private or loopback or link_local or reserved or multicast?"}
    O -- "Yes" --> P["Raise ValueError: restricted IP"]
    O -- "No" --> Q["response = requests.get(url, headers, timeout=30)"]
    Q --> R["response.raise_for_status()"]
    R --> S["Return response.text"]

    N --> T{"ValueError from ipaddress or previous checks?"}
    T -- "Yes" --> U["Raise ValueError: invalid IP address format or restricted IP"]
```

File-Level Changes

Route JobParser URL parsing through a new SSRF-safe fetch helper instead of calling requests.get directly.
  • Replace direct requests.get call in parse_from_url with _fetch_url_safe and pass returned HTML into _parse_html
  • Set job_details.url and cache behavior unchanged after safe fetch
  • Change exception handling in parse_from_url to convert ValueError from URL validation into a RuntimeError with a clear unsafe/invalid URL message
  • Remove the ImportError special-case in parse_from_url; requests-availability check is now centralized in the helper
cli/integrations/job_parser.py
Introduce _fetch_url_safe to enforce scheme and IP-based restrictions before performing HTTP requests.
  • Add _fetch_url_safe method that validates scheme is http/https using urllib.parse.urlparse
  • Validate hostname presence and normalize IPv6 literals by stripping brackets
  • Resolve hostname via socket.getaddrinfo, pick the first IP, and parse it with ipaddress.ip_address
  • Block requests whose IP is private, loopback, link-local, reserved, or multicast, failing closed on DNS resolution errors or invalid IP formats via ValueError
  • Retain and reuse the existing User-Agent header and timeout when issuing requests.get, raising for HTTP errors before returning response.text
  • Move the requests presence check into _fetch_url_safe, raising NotImplementedError with installation guidance if requests is missing
cli/integrations/job_parser.py
Document the discovered SSRF vulnerability in JobParser and its mitigation in the Sentinel security knowledge file.
  • Add a new dated section describing the high-severity SSRF in JobParser URL fetching, its impact, and the risks of using requests.get on untrusted URLs
  • Record the IP-based pre-fetch validation using socket.getaddrinfo and ipaddress as the primary mitigation and note limitations regarding DNS rebinding and redirects
.jules/sentinel.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


sourcery-ai bot left a comment:


Hey - I've found 3 issues and left some high-level feedback:

  • In _fetch_url_safe, you only validate the first IP returned by socket.getaddrinfo; consider iterating over and validating all resolved addresses and rejecting the URL if any of them are in a restricted range to better guard against multi-A records and DNS-based tricks.
  • The current SSRF check runs only before the initial requests.get; if redirects are followed, responses could be fetched from internal addressesβ€”consider disabling redirects or re-validating the resolved IP of each redirect target.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_fetch_url_safe`, you only validate the first IP returned by `socket.getaddrinfo`; consider iterating over and validating all resolved addresses and rejecting the URL if any of them are in a restricted range to better guard against multi-A records and DNS-based tricks.
- The current SSRF check runs only before the initial `requests.get`; if redirects are followed, responses could be fetched from internal addressesβ€”consider disabling redirects or re-validating the resolved IP of each redirect target.

## Individual Comments

### Comment 1
<location path="cli/integrations/job_parser.py" line_range="228-237" />
<code_context>

         # Fetch and parse
         try:
-
-            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Consider validating all resolved IPs instead of only the first from getaddrinfo.

Relying on `addr_info[0]` assumes the first DNS record is representative. If an attacker controls DNS, they could put a public IP first and a private/internal IP later; depending on the resolver, the client might still connect to the unvalidated address. Instead, iterate over all results from `socket.getaddrinfo`, require that every resolved IP is allowed, and reject the request if any entry is private, loopback, link-local, reserved, or multicast.

Suggested implementation:

```python
        addr_info = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)

        # Validate all resolved IPs; reject if any of them is disallowed.
        # This prevents an attacker-controlled DNS server from hiding a
        # private/internal IP behind a seemingly safe first record.
        for _family, _socktype, _proto, _canonname, sockaddr in addr_info:
            ip = sockaddr[0]
            self._validate_ip(ip)

```

Because I can’t see the full implementation of `_fetch_url_safe` and `_validate_ip`, you’ll likely need to:

1. Adjust the `SEARCH` block to match the actual code around your current `socket.getaddrinfo` call and the place where `addr_info[0]` is used.
2. Ensure `_validate_ip(ip: str)` rejects all of the following:
   - `ipaddress.ip_address(ip).is_private`
   - `.is_loopback`
   - `.is_link_local`
   - `.is_reserved`
   - `.is_multicast`
   and raises an appropriate exception if any of these are true.
3. Confirm that `hostname` and `port` are the same variables currently passed to `socket.getaddrinfo` in `_fetch_url_safe`, or adjust the names in the replacement accordingly.
</issue_to_address>

### Comment 2
<location path="cli/integrations/job_parser.py" line_range="226" />
<code_context>
         try:
-
-            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
-            response = requests.get(url, headers=headers, timeout=30)
-            response.raise_for_status()
-
</code_context>
<issue_to_address>
**🚨 issue (security):** Redirect handling can bypass the SSRF protection by redirecting to internal/private IPs.

The SSRF validation only runs on the initial URL, but `requests.get` follows redirects, so an attacker could use a public URL that redirects to an internal/loopback IP and bypass the check. To keep the protection effective, either disable redirects (`allow_redirects=False` and treat 3xx as an error), or handle redirects yourself and re-validate each new URL/IP with the same `_fetch_url_safe` logic.
</issue_to_address>

### Comment 3
<location path="cli/integrations/job_parser.py" line_range="237-238" />
<code_context>
-            raise NotImplementedError(
-                "URL fetching requires 'requests' library. Install with: pip install requests"
-            )
+        except ValueError as e:
+            raise RuntimeError(f"Unsafe or invalid URL provided: {e}")
         except requests.RequestException as e:
             raise RuntimeError(f"Failed to fetch URL: {e}")
</code_context>
<issue_to_address>
**suggestion:** Preserve the original traceback when wrapping ValueError into RuntimeError.

The re-raised `RuntimeError` currently loses the original traceback, which complicates debugging. Please chain the exception (`raise RuntimeError("Unsafe or invalid URL provided") from e`) so callers retain the original stack from `_fetch_url_safe` along with the higher-level error.

```suggestion
        except ValueError as e:
            raise RuntimeError("Unsafe or invalid URL provided") from e
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click πŸ‘ or πŸ‘Ž on each comment and I'll use the feedback to improve your reviews.

Comment on lines 228 to +237:

```diff
         try:
-            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
-            response = requests.get(url, headers=headers, timeout=30)
-            response.raise_for_status()
-
-            job_details = self._parse_html(response.text)
+            html_content = self._fetch_url_safe(url)
+            job_details = self._parse_html(html_content)
             job_details.url = url

             # Save to cache
             self._save_to_cache(cache_key, job_details)

             return job_details

-        except ImportError:
-            raise NotImplementedError(
-                "URL fetching requires 'requests' library. Install with: pip install requests"
-            )
+        except ValueError as e:
```
