🛡️ Sentinel: [CRITICAL] Fix SSRF vulnerability in JobParser#227
Conversation
- Implemented `_fetch_url_safe` to validate IP addresses before fetching URLs
- Blocked private, loopback, link-local, reserved, and unspecified IPs
- Safely followed redirects to prevent bypasses
- Handled `requests` as an optional dependency securely

Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode: when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Reviewer's Guide

Refactors JobParser URL fetching to route all network requests through a new SSRF-hardened helper that validates schemes, resolves and filters IPs, and manually validates redirects, while simplifying error handling and making the `requests` dependency check explicit.

Updated class diagram for JobParser SSRF hardening:

```mermaid
classDiagram
    class SSRFError{
        <<exception>>
    }
    SSRFError --|> ValueError
    class JobParser{
        +parse_from_url(url: str) JobDetails
        -_fetch_url_safe(url: str) Any
        -_parse_html(html: str) JobDetails
    }
```
Hey - I've found 1 issue, and left some high-level feedback:

- The IP validation logic for the initial URL and for redirect targets is nearly identical; consider extracting a shared helper (e.g. `_validate_resolved_ips(hostname, context_label)`) to reduce duplication and keep the SSRF policy consistent in one place.
- Redirect handling currently relies on `response.is_redirect` and a `Location` header; you may want to explicitly check `3xx` status codes and normalize/validate the redirect URL once to avoid edge cases (e.g., 304 or malformed `Location` headers) and make the redirect logic easier to follow.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The IP validation logic for the initial URL and for redirect targets is nearly identical; consider extracting a shared helper (e.g. `_validate_resolved_ips(hostname, context_label)`) to reduce duplication and keep the SSRF policy consistent in one place.
- Redirect handling currently relies on `response.is_redirect` and a `Location` header; you may want to explicitly check `3xx` status codes and normalize/validate the redirect URL once to avoid edge cases (e.g., 304 or malformed `Location` headers) and make the redirect logic easier to follow.
## Individual Comments
### Comment 1
<location path="cli/integrations/job_parser.py" line_range="249" />
<code_context>
- "URL fetching requires 'requests' library. Install with: pip install requests"
- )
- except requests.RequestException as e:
+ except (requests.RequestException, SSRFError) as e:
raise RuntimeError(f"Failed to fetch URL: {e}")
</code_context>
<issue_to_address>
**issue (bug_risk):** Referencing requests.RequestException when requests can be None may cause import-time failures.
Since `requests` may be set to `None` on import failure, `requests.RequestException` in the `except` tuple is evaluated at function definition time and will raise `AttributeError` before the runtime `requests is None` check can run. Consider importing `RequestException` into a separate name (with a fallback) or restructuring the `except` so it doesn’t dereference `requests` when it may be `None`.
</issue_to_address>
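One way to address this comment is to bind the exception class to its own name at import time, with a local fallback when the library is absent, so the `except` clause never dereferences a `None` module. A sketch, not the PR's actual code:

```python
try:
    import requests
    from requests import RequestException
except ImportError:  # requests is an optional dependency
    requests = None

    class RequestException(Exception):
        """Fallback so `except RequestException` is always a valid clause."""


def fetch(url: str) -> str:
    # Runtime check for the optional dependency, as in the PR's error message.
    if requests is None:
        raise RuntimeError(
            "URL fetching requires 'requests' library. "
            "Install with: pip install requests"
        )
    try:
        return requests.get(url, timeout=10).text
    except RequestException as e:  # safe even when the import failed
        raise RuntimeError(f"Failed to fetch URL: {e}")
```

The key point is that `RequestException` is resolved once at import time, so the `except` tuple no longer evaluates `requests.RequestException` on a possibly-`None` module.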
🚨 Severity: CRITICAL

💡 Vulnerability: The `JobParser`'s `parse_from_url` method previously took user-supplied URLs and fetched them directly with `requests.get()`. This is a classic Server-Side Request Forgery (SSRF) vulnerability. It allowed arbitrary requests to be made from the server, potentially giving attackers access to internal networks, cloud metadata APIs, or other internal services.

🎯 Impact: An attacker could force the server to execute arbitrary GET requests.

🔧 Fix: Implemented a new `_fetch_url_safe` method that performs DNS resolution using `socket.getaddrinfo`, verifies the IP address does not belong to any internal/restricted ranges (`is_private`, `is_loopback`, `is_link_local`, `is_reserved`, `is_multicast`, `is_unspecified`), and explicitly follows redirects manually (`allow_redirects=False`) to ensure redirect targets are similarly validated.

✅ Verification: Ran `pytest tests/test_job_parser_integration.py`, which passes all checks correctly without regressing on existing valid URL fetches.

PR created automatically by Jules for task 14763629633683345882 started by @anchapin
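The manual redirect handling the fix describes can be sketched as a loop that validates every hop before requesting it. Here the HTTP call is injected as `fetch` (for requests, e.g. `lambda u: requests.get(u, allow_redirects=False, timeout=10)`) and `validate_host` stands in for the IP checks; both names are hypothetical, not the PR's API:

```python
from urllib.parse import urljoin, urlparse


def fetch_with_checked_redirects(url, fetch, validate_host, max_redirects=5):
    """Follow redirects manually, validating every hop before fetching it.

    `fetch(url)` must perform one non-redirecting request and return a
    response-like object; `validate_host(hostname)` should raise on
    restricted targets.
    """
    for _ in range(max_redirects + 1):
        validate_host(urlparse(url).hostname)  # SSRF check before every hop
        resp = fetch(url)
        location = resp.headers.get("Location")
        # Only genuine 3xx responses with a Location header are followed,
        # which sidesteps edge cases like 304 Not Modified.
        if 300 <= resp.status_code < 400 and resp.status_code != 304 and location:
            url = urljoin(url, location)  # normalize relative Location values
            continue
        return resp
    raise RuntimeError("Too many redirects")
```

Because each hop's hostname is re-validated, a safe initial URL cannot redirect the client onto a loopback or private address, which is the bypass `allow_redirects=True` would otherwise permit.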
Summary by Sourcery
Harden JobParser URL fetching to prevent SSRF when parsing jobs from remote URLs.