45 changes: 5 additions & 40 deletions .jules/sentinel.md
@@ -1,41 +1,6 @@
## 2024-05-23 - [Input Validation and Syntax Fix]
**Vulnerability:** The `create_folder` function contained a syntax error (positional arg after keyword arg) preventing execution. Additionally, `folder_url` and `profile_id` lacked validation, potentially allowing SSRF (via non-HTTPS URLs) or path traversal/injection (via crafted profile IDs).
**Learning:** Even simple scripts need robust input validation, especially when inputs are used to construct URLs or file paths. A syntax error can mask security issues by preventing the code from running in the first place.
**Prevention:**
1. Always validate external inputs against a strict allowlist (e.g., regex for IDs, protocol check for URLs).
2. Use linters/static analysis to catch syntax errors before runtime.
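A minimal sketch of the allowlist idea from item 1, assuming a profile ID made of letters, digits, `_` and `-`. The pattern, the 64-character cap, and the helper names are hypothetical, not taken from the script:

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: letters, digits, underscore, hyphen; 1 to 64 characters.
PROFILE_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_profile_id(profile_id: str) -> bool:
    """Accept only IDs that match the allowlist in full."""
    return PROFILE_ID_RE.fullmatch(profile_id) is not None


def is_https_url(url: str) -> bool:
    """Accept only absolute HTTPS URLs that carry a hostname."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.hostname)
```

Using `fullmatch` rather than `match` avoids accepting an ID that merely starts with valid characters.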
# Sentinel's Journal

## 2024-12-13 - [Sensitive Data Exposure in Logs]
**Vulnerability:** The application was logging full HTTP response bodies at ERROR level when requests failed. This could expose sensitive information (secrets, PII) returned by the API during failure states.
**Learning:** Defaulting to verbose logging in error handlers (e.g., `log.error(e.response.text)`) is risky because API error responses often contain context that should not be persisted in production logs.
**Prevention:**
1. Log sensitive data (like full request/response bodies) only at DEBUG level.
2. Sanitize or truncate log messages if they must be logged at higher levels.
## 2024-12-15 - [Sensitive Data Exposure in Logs]
**Vulnerability:** The application was logging full HTTP error response bodies at `ERROR` level. API error responses can often contain sensitive data like tokens, PII, or internal debug info.
**Learning:** Default logging configurations can lead to data leaks if raw response bodies are logged without sanitization or level checks.
**Prevention:**
1. Log potentially sensitive data (like raw HTTP bodies) only at `DEBUG` level.
2. At `INFO`/`ERROR` levels, log only safe summaries or status codes.
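The split between levels might look like the sketch below. The logger name and the `log_http_failure` helper are assumptions for illustration; only the standard `logging` module is used:

```python
import logging

log = logging.getLogger("sentinel_example")  # hypothetical logger name


def log_http_failure(status_code: int, body: str) -> None:
    """Record a failed request without persisting the raw body at ERROR level."""
    # Safe summary: status code and body size only.
    log.error("Request failed: status=%d, body_length=%d", status_code, len(body))
    # The raw body is emitted only when DEBUG logging is explicitly enabled.
    log.debug("Response body: %s", body)
```

With the logger at `ERROR` or `INFO`, the body never reaches the handlers; it appears only after an operator opts in to `DEBUG`.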

## 2024-12-16 - [DoS via Unbounded Response Size]
**Vulnerability:** The `_gh_get` function downloaded external JSON resources without any size limit. A malicious URL or compromised server could serve a massive file (e.g., 10GB), causing the application to consume all available memory (RAM) and crash (Denial of Service).
**Learning:** When fetching data from external sources, never assume the response size is safe. `httpx.get()` and `requests.get()` both read the entire body into memory by default.
**Prevention:**
1. Use streaming responses (`client.stream("GET", ...)`) when fetching external resources.
2. Inspect `Content-Length` headers if available.
3. Enforce a hard limit on the number of bytes read during the stream loop.
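The three steps can be sketched with a helper that consumes any iterable of byte chunks, for example what `response.iter_bytes()` yields inside an `httpx` `client.stream("GET", ...)` block. The `read_limited` name and the 10 MiB cap are assumptions, not the values used in `_gh_get`:

```python
from typing import Iterable, Optional

MAX_BYTES = 10 * 1024 * 1024  # hypothetical cap: 10 MiB


def read_limited(chunks: Iterable[bytes],
                 content_length: Optional[int] = None,
                 limit: int = MAX_BYTES) -> bytes:
    """Join streamed chunks, raising ValueError if more than `limit` bytes arrive."""
    # Early exit: a declared size is trusted only to reject, never to accept.
    if content_length is not None and content_length > limit:
        raise ValueError("declared Content-Length exceeds limit")
    buf = bytearray()
    for chunk in chunks:
        # Check before appending so the buffer never grows past the limit.
        if len(buf) + len(chunk) > limit:
            raise ValueError("response body exceeds limit")
        buf.extend(chunk)
    return bytes(buf)
```

The byte count in the loop is the real safeguard, because a server can omit `Content-Length` or send a false value.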

## 2024-12-22 - [Sensitive Data Exposure in Logs (Headers)]
**Vulnerability:** The application's `sanitize_for_log` function was insufficient, only escaping characters but not redacting secrets. If an exception occurred that included headers (e.g. `Authorization`), the `TOKEN` could be exposed in logs.
**Learning:** Generic sanitization (like `repr()`) is not enough for secrets. Explicit redaction of known secrets is required.
**Prevention:**
1. Maintain a list of sensitive values (tokens, keys).
2. Ensure logging utilities check against this list and mask values before outputting.
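One way to mask known values before logging is sketched below. The `redact_secrets` helper and the `[REDACTED]` mask are hypothetical and are not the project's `sanitize_for_log`:

```python
from typing import Iterable


def redact_secrets(message: str, secrets: Iterable[str],
                   mask: str = "[REDACTED]") -> str:
    """Replace every occurrence of each known secret in `message` with `mask`."""
    # Longest first, so a secret that contains another is masked as a whole.
    for secret in sorted(set(secrets), key=len, reverse=True):
        if secret:  # skip empty strings; replacing "" would corrupt the message
            message = message.replace(secret, mask)
    return message
```

For instance, `redact_secrets("Authorization: Bearer abc123", ["abc123"])` returns `"Authorization: Bearer [REDACTED]"`.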

## 2025-01-21 - [SSRF Protection and Input Limits]
**Vulnerability:** The `folder_url` validation checked for HTTPS but allowed internal IP addresses (e.g., `127.0.0.1`, `10.0.0.0/8`). This could theoretically allow Server-Side Request Forgery (SSRF) if the script is run in an environment with access to sensitive internal services. Additionally, `profile_id` had no length limit.
**Learning:** An HTTPS-only check does not prevent SSRF: internal services may also serve HTTPS, and self-signed certificates are no barrier if verification is disabled or bypassed. Explicitly blocking private IP ranges adds necessary defense-in-depth.
**Prevention:**
1. Parse URLs and check hostnames against `localhost` and private IP ranges using `ipaddress` module.
2. Enforce strict length limits on user inputs (e.g., profile IDs) to prevent resource exhaustion or buffer abuse.
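A sketch of the literal-IP check from item 1, using only the standard library. The function name is an assumption, and this check covers only `localhost` and IP literals; a domain name passes through unjudged:

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_host_literal(url: str) -> bool:
    """Return True if the URL host is `localhost` or a non-public IP literal."""
    hostname = urlparse(url).hostname
    if hostname is None:
        return True  # no usable host: treat as unsafe
    if hostname.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(hostname)
    except ValueError:
        return False  # a domain name; this check cannot judge it
    return ip.is_private or ip.is_loopback or ip.is_link_local
```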
## 2025-01-20 - SSRF Vulnerability in URL Validation
**Vulnerability:** The `validate_folder_url` function checked for explicit localhost strings and IP literals but failed to resolve domain names. This allowed attackers to bypass SSRF protection by using a domain name that resolves to a private IP (e.g., `local.example.com` -> `127.0.0.1`).
**Learning:** Checking hostnames against a blocklist is insufficient because DNS resolution decouples the name from the IP. The `ipaddress` module only parses IP literals; it never resolves hostnames.
**Prevention:** Always resolve the hostname to an IP address and check the resolved IP against private ranges (`is_private`, `is_loopback`) before making a request. Be aware of TOCTOU (Time-of-Check Time-of-Use) issues like DNS rebinding, though basic resolution is a good first line of defense.
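A hedged sketch of the resolve-then-check step. It swaps in `socket.getaddrinfo` so that IPv6 results and multiple records are inspected, whereas the change in this PR calls `socket.gethostbyname`, which returns a single IPv4 address. The function name is hypothetical:

```python
import ipaddress
import socket


def resolves_to_public_ips(hostname: str) -> bool:
    """Return True only if every address the name resolves to is public."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # strict: an unresolvable name is rejected
    for info in infos:
        address = info[4][0]
        # Strip any IPv6 zone id (e.g. "fe80::1%eth0") before parsing.
        ip = ipaddress.ip_address(address.split("%")[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return bool(infos)
```

This does not close the DNS rebinding gap: the name can resolve differently when the HTTP client connects, so the check is a first line of defense only.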
19 changes: 18 additions & 1 deletion main.py
@@ -23,6 +23,7 @@
import concurrent.futures
import threading
import ipaddress
import socket
from urllib.parse import urlparse
from typing import Dict, List, Optional, Any, Set, Sequence

@@ -200,7 +201,7 @@

        # Check for potentially malicious hostnames
        if hostname.lower() in ('localhost', '127.0.0.1', '::1'):
            log.warning(f"Skipping unsafe URL (localhost detected): {sanitize_for_log(url)}")
            return False

        try:
@@ -210,7 +211,23 @@
                return False
        except ValueError:
            # Not an IP literal, it's a domain.
            pass
            try:
                # Resolve domain to IP to check for private addresses (SSRF protection)
                resolved_ip_str = socket.gethostbyname(hostname)
                resolved_ip = ipaddress.ip_address(resolved_ip_str)
                if resolved_ip.is_private or resolved_ip.is_loopback:
                    log.warning(f"Skipping unsafe URL (domain resolves to private IP {resolved_ip_str}): {sanitize_for_log(url)}")
                    return False
            except socket.gaierror:
                # DNS resolution failed, which might happen if the domain is invalid
                # We'll allow it to proceed to HTTP request which will likely fail,
                # or we could be strict and reject it.
                # Being strict is safer for security.
                log.warning(f"Skipping unsafe URL (DNS resolution failed): {sanitize_for_log(url)}")
                return False
            except Exception as e:
                log.warning(f"Failed to resolve domain for URL {sanitize_for_log(url)}: {e}")
                return False

    except Exception as e:
        log.warning(f"Failed to validate URL {sanitize_for_log(url)}: {e}")
@@ -373,7 +390,7 @@
        log.error(f"Failed to get existing rules: {sanitize_for_log(e)}")
        return set()

def fetch_folder_data(url: str) -> Dict[str, Any]:
    js = _gh_get(url)
    if not validate_folder_data(js, url):
        raise KeyError(f"Invalid folder data from {sanitize_for_log(url)}")
@@ -662,7 +679,7 @@

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_folder = {
            executor.submit(
                _process_single_folder,
                folder_data,
                profile_id,