45 changes: 4 additions & 41 deletions .jules/sentinel.md
@@ -1,41 +1,4 @@
## 2024-05-23 - [Input Validation and Syntax Fix]
**Vulnerability:** The `create_folder` function contained a syntax error (positional arg after keyword arg) preventing execution. Additionally, `folder_url` and `profile_id` lacked validation, potentially allowing SSRF (via non-HTTPS URLs) or path traversal/injection (via crafted profile IDs).
**Learning:** Even simple scripts need robust input validation, especially when inputs are used to construct URLs or file paths. A syntax error can mask security issues by preventing the code from running in the first place.
**Prevention:**
1. Always validate external inputs against a strict allowlist (e.g., regex for IDs, protocol check for URLs).
2. Use linters/static analysis to catch syntax errors before runtime.

## 2024-12-13 - [Sensitive Data Exposure in Logs]
**Vulnerability:** The application was logging full HTTP response bodies at ERROR level when requests failed. This could expose sensitive information (secrets, PII) returned by the API during failure states.
**Learning:** Defaulting to verbose logging in error handlers (e.g., `log.error(e.response.text)`) is risky because API error responses often contain context that should not be persisted in production logs.
**Prevention:**
1. Log sensitive data (like full request/response bodies) only at DEBUG level.
2. Sanitize or truncate log messages if they must be logged at higher levels.
## 2024-12-15 - [Sensitive Data Exposure in Logs]
**Vulnerability:** The application was logging full HTTP error response bodies at `ERROR` level. API error responses can often contain sensitive data like tokens, PII, or internal debug info.
**Learning:** Default logging configurations can lead to data leaks if raw response bodies are logged without sanitization or level checks.
**Prevention:**
1. Log potentially sensitive data (like raw HTTP bodies) only at `DEBUG` level.
2. At `INFO`/`ERROR` levels, log only safe summaries or status codes.
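
A minimal sketch of that level split, assuming a hypothetical helper named `log_http_failure` (it is not part of the original code). The status code is logged at `ERROR`, while the raw body is only emitted when the logger is set to `DEBUG`.

```python
import logging

log = logging.getLogger("sketch")  # hypothetical logger name


def log_http_failure(status_code: int, body: str, logger: logging.Logger = log) -> None:
    """Log a failed request without persisting the raw body at ERROR level."""
    # Safe summary: only the status code reaches ERROR-level logs.
    logger.error("Request failed with status %s", status_code)
    # The raw body may carry tokens or PII, so it is emitted at DEBUG only.
    logger.debug("Response body: %s", body)
```

Passing arguments separately (`%s`) rather than using an f-string also defers string formatting until the record is actually emitted.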

## 2024-12-16 - [DoS via Unbounded Response Size]
**Vulnerability:** The `_gh_get` function downloaded external JSON resources without any size limit. A malicious URL or compromised server could serve a massive file (e.g., 10GB), causing the application to consume all available memory (RAM) and crash (Denial of Service).
**Learning:** When fetching data from external sources, never assume the response size is safe. `httpx.get()` (and `requests.get`) reads the entire body into memory by default.
**Prevention:**
1. Use streaming responses (`client.stream("GET", ...)`) when fetching external resources.
2. Inspect `Content-Length` headers if available.
3. Enforce a hard limit on the number of bytes read during the stream loop.
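
The three steps above could look roughly like the following. This is a sketch under stated assumptions, not the project's `_gh_get`: the names `read_capped`, `fetch_json_capped`, `ResponseTooLarge`, and the 10 MiB cap are all hypothetical.

```python
import json
from typing import Any, Iterable

MAX_RESPONSE_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


class ResponseTooLarge(Exception):
    """Raised when a body exceeds the configured byte limit."""


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_RESPONSE_BYTES) -> bytes:
    """Accumulate chunks, aborting before the running total exceeds max_bytes."""
    buf = bytearray()
    for chunk in chunks:
        if len(buf) + len(chunk) > max_bytes:
            raise ResponseTooLarge(f"body exceeded {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)


def fetch_json_capped(url: str, max_bytes: int = MAX_RESPONSE_BYTES) -> Any:
    """Stream a JSON document with httpx, refusing oversized bodies."""
    import httpx  # imported lazily so read_capped works without httpx installed

    with httpx.Client(timeout=10.0) as client:
        with client.stream("GET", url) as response:
            response.raise_for_status()
            # Content-Length is advisory: reject early when it is declared too
            # large, but still enforce the cap while reading the stream.
            declared = response.headers.get("Content-Length")
            if declared is not None and declared.isdigit() and int(declared) > max_bytes:
                raise ResponseTooLarge(f"declared size {declared} exceeds {max_bytes}")
            body = read_capped(response.iter_bytes(), max_bytes)
    return json.loads(body)
```

Keeping the byte counting in `read_capped` separates the limit logic from the HTTP client, so it can be exercised with plain byte chunks.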

## 2024-12-22 - [Sensitive Data Exposure in Logs (Headers)]
**Vulnerability:** The application's `sanitize_for_log` function was insufficient, only escaping characters but not redacting secrets. If an exception occurred that included headers (e.g. `Authorization`), the `TOKEN` could be exposed in logs.
**Learning:** Generic sanitization (like `repr()`) is not enough for secrets. Explicit redaction of known secrets is required.
**Prevention:**
1. Maintain a list of sensitive values (tokens, keys).
2. Ensure logging utilities check against this list and mask values before outputting.
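
One way to sketch explicit redaction is shown below. The real `sanitize_for_log` implementation is not shown in this diff, so `redact_secrets` and `sanitize_with_redaction` are hypothetical names used only for illustration.

```python
from typing import Iterable


def redact_secrets(message: str, secrets: Iterable[str], mask: str = "[REDACTED]") -> str:
    """Replace every known secret value in message with a fixed mask."""
    # Longest secrets first, so a secret that contains another is fully masked.
    # Empty strings are skipped because replacing "" would corrupt the message.
    for secret in sorted({s for s in secrets if s}, key=len, reverse=True):
        message = message.replace(secret, mask)
    return message


def sanitize_with_redaction(value: object, secrets: Iterable[str]) -> str:
    """Redact first, then escape control characters via repr()."""
    return repr(redact_secrets(str(value), secrets))
```

For example, `redact_secrets("Authorization: Bearer abc123", ["abc123"])` returns `"Authorization: Bearer [REDACTED]"`.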

## 2025-01-21 - [SSRF Protection and Input Limits]
**Vulnerability:** The `folder_url` validation checked for HTTPS but allowed internal IP addresses (e.g., `127.0.0.1`, `10.0.0.0/8`). This could theoretically allow Server-Side Request Forgery (SSRF) if the script is run in an environment with access to sensitive internal services. Additionally, `profile_id` had no length limit.
**Learning:** HTTPS validation alone is insufficient to prevent SSRF against internal services that might support HTTPS or use self-signed certs (if verification was disabled or bypassed). Explicitly blocking private IP ranges provides necessary defense-in-depth.
**Prevention:**
1. Parse URLs and check hostnames against `localhost` and private IP ranges using `ipaddress` module.
2. Enforce strict length limits on user inputs (e.g., profile IDs) to prevent resource exhaustion or buffer abuse.
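
A rough sketch of both checks, using hypothetical names and limits (`is_blocked_literal_host`, `is_valid_profile_id`, a 64-character cap, and the allowed character set are assumptions, not the project's actual values). It only covers IP literals and `localhost`; a domain name that resolves to an internal address needs a DNS lookup as well.

```python
import ipaddress
import re
from urllib.parse import urlparse

MAX_PROFILE_ID_LEN = 64  # hypothetical limit
PROFILE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")  # hypothetical allowed characters


def is_valid_profile_id(profile_id: str) -> bool:
    """Accept only short IDs made of allowlisted characters."""
    return (
        len(profile_id) <= MAX_PROFILE_ID_LEN
        and PROFILE_ID_RE.fullmatch(profile_id) is not None
    )


def is_blocked_literal_host(url: str) -> bool:
    """True for non-HTTPS URLs, missing hosts, localhost, or internal IP literals."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return True
    host = parsed.hostname.lower()
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a domain name: the literal check does not cover it
    return ip.is_private or ip.is_loopback or ip.is_link_local
```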
## 2024-03-25 - [SSRF Protection Gap]
**Vulnerability:** The `validate_folder_url` function checked for private IPs only if the input was an IP literal, allowing domains resolving to private IPs to bypass the check.
**Learning:** `ipaddress.ip_address()` raises `ValueError` for domains, which was caught and ignored. Validating a URL requires resolving the domain to an IP to check network-level access restrictions.
**Prevention:** Always resolve hostnames to IPs when validating against network boundaries (like private vs public networks), and handle DNS resolution failures securely (fail closed).
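
A hedged sketch of a fail-closed check follows. It differs from the change in this PR in one respect: it uses `socket.getaddrinfo` so that both IPv4 and IPv6 results are inspected, whereas `socket.gethostbyname` returns a single IPv4 address. The names `resolve_all`, `is_unsafe_ip`, and `host_is_safe` are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, List


def resolve_all(hostname: str) -> List[str]:
    """Return every address the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_unsafe_ip(address: str) -> bool:
    """True for private, loopback, or link-local addresses."""
    # Drop an IPv6 scope suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return ip.is_private or ip.is_loopback or ip.is_link_local


def host_is_safe(hostname: str, resolver: Callable[[str], List[str]] = resolve_all) -> bool:
    """Fail closed: any resolution error or unsafe address rejects the host."""
    try:
        addresses = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    if not addresses:
        return False
    try:
        return not any(is_unsafe_ip(addr) for addr in addresses)
    except ValueError:  # unparseable address: treat as unsafe
        return False
```

A check like this resolves the name once at validation time, and the HTTP client resolves it again when connecting, so on its own it does not rule out DNS rebinding between the two lookups.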
20 changes: 16 additions & 4 deletions main.py
@@ -23,6 +23,7 @@
 import concurrent.futures
 import threading
 import ipaddress
+import socket
 from urllib.parse import urlparse
 from typing import Dict, List, Optional, Any, Set, Sequence

@@ -205,12 +206,23 @@

         try:
             ip = ipaddress.ip_address(hostname)
-            if ip.is_private or ip.is_loopback:
-                log.warning(f"Skipping unsafe URL (private IP): {sanitize_for_log(url)}")
+            if ip.is_private or ip.is_loopback or ip.is_link_local:
+                log.warning(f"Skipping unsafe URL (private/local IP): {sanitize_for_log(url)}")
                 return False
         except ValueError:
-            # Not an IP literal, it's a domain.
-            pass
+            # Not an IP literal, resolve it
+            try:
+                resolved_ip = socket.gethostbyname(hostname)
+                ip = ipaddress.ip_address(resolved_ip)
+                if ip.is_private or ip.is_loopback or ip.is_link_local:
+                    log.warning(f"Skipping unsafe URL (resolved to private/local IP {resolved_ip}): {sanitize_for_log(url)}")
+                    return False
+            except socket.gaierror:
+                log.warning(f"Skipping invalid URL (DNS resolution failed): {sanitize_for_log(url)}")
+                return False
+            except ValueError:
+                # Should not happen if gethostbyname returns a valid IP
+                pass

     except Exception as e:
         log.warning(f"Failed to validate URL {sanitize_for_log(url)}: {e}")