A security scanner that can't secure itself is a contradiction. This document describes how we protect the scanner, its users, and what to do if you find a vulnerability.
- Reporting a Vulnerability
- Threat Model
- Security Architecture
- Data Handling & Privacy
- Gate-Before-Scan Model
- Access Control
- Backup & Disaster Recovery
- Supply Chain Security
- Scope
If you discover a security issue in the scanner itself, please report it privately. Do not open a public GitHub issue and do not post it to the HuggingFace Space community tab — both are public and would expose the bug to attackers before a fix ships.
👉 GitHub Security Advisories (private): https://github.com/UnlimitedEdition/security-scanner/security/advisories/new
This channel is private between you and the maintainers, supports embargo coordination, and issues a CVE if warranted.
| Stage | Target |
|---|---|
| Acknowledgment | 72 hours |
| Initial assessment | 5 business days |
| Fix development | 14 days (critical), 30 days (high) |
| Public disclosure | After fix is deployed |
- Affected component — file, endpoint, or check module
- Reproduction steps — minimal proof-of-concept preferred
- Impact assessment — what an attacker gains (data access, code execution, DoS)
- Suggested fix — if you have one
- Your preferred credit name — we credit responsible reporters in release notes (unless you prefer anonymity)
We don't currently offer monetary rewards, but we credit reporters in
the CHANGELOG.md and AUTHORS file, and provide early access to
pre-release versions.
| Asset | Sensitivity | Protection |
|---|---|---|
| Scan results | Medium | Per-scan isolation, no cross-tenant access |
| User PII (IP, email, UA) | High | SHA-256 hashed with server salt, never stored raw |
| Database credentials | Critical | Environment variables, never in git |
| Backup encryption key | Critical | Supabase Vault + operator password manager |
| Pro license keys | Medium | Stored hashed, validated via Lemon Squeezy API |
| Target website data | Low | Transient in memory, not persisted beyond findings |
| Actor | Goal | Mitigations |
|---|---|---|
| Malicious target | Exploit SSRF to reach internal services | SSRF guard on every redirect hop, DNS rebinding check |
| Reconnaissance attacker | Use scanner as proxy for recon | Ownership verification gate, rate limits, distinct-target tracking |
| Data exfiltrator | Extract PII from database | PII hashing, RLS, append-only audit log |
| Credential thief | Steal API keys or DB credentials | No secrets in git, Vault for runtime secrets |
| DoS attacker | Overwhelm scanner or target | Scan deadline (180s), rate limit (5/30min), concurrent scan cap (3) |
| Supply chain attacker | Compromise via dependencies | Pinned deps, Dependabot, pip-audit |
┌─────────────────────────┐
│ Public Internet │
└────────────┬────────────┘
│
┌────────────────────────┴────────────────────────┐
│ Attack Surface │
│ │
│ POST /scan ← rate limited, SSRF guard │
│ POST /scan/request ← consent + verification │
│ POST /abuse-report ← rate limited │
│ GET /health ← no auth, returns "ok" │
│ GET /api/gallery ← public read-only │
│ │
│ All other endpoints: auth or intent-gated │
└──────────────────────────────────────────────────┘
Every outbound HTTP request flows through safe_get(), which:
- Scheme validation — only
http://andhttps://allowed - Hostname blocklist —
localhost,.local,.internal,.corp, etc. - DNS resolution — resolves hostname, checks ALL returned IPs
- IP range validation — blocks private, loopback, link-local (169.254.x.x / AWS metadata), CGNAT, multicast, reserved
- IPv4-mapped IPv6 unwrapping —
::ffff:127.0.0.1is caught - Redirect following — each hop re-validated (blocks
302 → localhost) - Redirect cap — max 5 hops to prevent redirect loops
# NEVER do this:
response = requests.get(url) # ❌ No SSRF protection
# ALWAYS do this:
from security_utils import safe_get
response = safe_get(session, url) # ✅ Every hop validatedSSL certificate validation is always on. We do not disable cert verification as a fallback or convenience measure. If SSL fails, the scan reports it as a finding — it does not bypass it.
Every scan has a hard 180-second wall-clock cap (SCAN_DEADLINE_SECONDS).
A malicious target that responds slowly on each check cannot keep the
scanner busy indefinitely. When the deadline fires, remaining checks are
skipped and a truncation notice is recorded.
Sensitive file detection (.env, .git/config) validates response
bodies for expected content patterns, not just HTTP status codes. This
prevents false positives from custom 404 pages returning 200 OK and
false negatives from .env files containing the word "error" in comments.
- URL format validation via Pydantic with regex + SSRF check
- Strictness profile whitelist validation
- Abuse report field length limits (4000 chars description, 320 email)
- Scan ID format validation (
^[a-f0-9]{8}$)
User request
│
▼
┌──────────────────┐
│ Extract PII │ IP, User-Agent, email (if provided)
└────────┬─────────┘
│
▼
┌──────────────────┐
│ SHA-256 + salt │ PII_HASH_SALT from environment
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Store hash only │ Original value discarded immediately
└────────┬─────────┘
│
▼
┌──────────────────┐
│ 90-day retention │ Auto-pruned via pg_cron
└──────────────────┘
| Control | Implementation |
|---|---|
| Row-Level Security | Enabled on every table, default deny |
| Append-only audit log | UPDATE/DELETE revoked at DB role level |
| Encryption at rest | AES-256, enforced by Supabase |
| TLS in transit | TLS 1.2+, enforced by Supabase for all connections |
| Credential isolation | Service key in env vars, application key for frontend |
- No telemetry. The scanner does not phone home.
- No third-party scan APIs. All analysis happens locally.
- No data sharing. Scan results are not sent to analytics services.
- Frontend analytics (GA4, AdSense) are cookie-gated and run only after explicit user consent. They have zero access to scan data.
Introduced in v4.0 (2026-04-12) to prevent scanner abuse for unauthorized reconnaissance.
| Mode | Trigger | Checks | Probes private surface? |
|---|---|---|---|
safe |
Default, no verification | 20 passive + 3 redacted | No — zero requests to /.env, /wp-admin/, ports, etc. |
full |
After ownership verification | All 33 checks | Yes — sensitive files, ports, admin panels, subdomains |
- Meta tag — place
<meta name="security-scanner-verify" content="TOKEN">in<head> - File upload — place
TOKENin/.well-known/security-scanner-verify.txt - DNS TXT record — add
TXT security-scanner-verify=TOKENto the domain
- Legacy
/scanhardcoded tomode='safe' - 7 server-side precondition checks on
/execute - 6-condition atomic
WHEREclause on DB state transition - IP binding on verification tokens
- Server-side consent logging with version tracking
- 3-second anti-reflex delay on recap screen
- One-time-use verification tokens
- 30-day verification cache per (domain, ip_hash)
- Rate limiting per IP hash
- Distinct-target-count tracking per IP
The wizard uses created_date DATE (no timestamp component) for consent
records. Even a complete database leak cannot reveal what time of day the
user clicked which checkbox.
Dual enforcement:
- Database-backed (primary): sliding window counter in
ip_rate_limits - In-memory (fallback):
defaultdict(list)with TTL
Both must agree — a database outage cannot disable rate limiting.
Limits:
- 5 scans per 30 minutes per IP hash (configurable via env)
- Applies to both
/scanand/abuse-reportendpoints
Backend enforces a strict origin allowlist. Access-Control-Allow-Origin
is not set to * in production — only authorized frontend origins are
permitted.
default-src 'self';
script-src 'self' 'unsafe-inline' [Google Ads stack];
style-src 'self' 'unsafe-inline' fonts.googleapis.com;
font-src fonts.gstatic.com;
img-src 'self' data: https:;
connect-src 'self' [API endpoints] [Google Ads stack];
frame-src [Google Ads stack];
frame-ancestors 'self' huggingface.co *.hf.space;
Advertising stack uses wildcard subdomains (*.googlesyndication.com,
*.g.doubleclick.net) because Google's ad delivery infrastructure uses
dynamic subdomains.
pg_cron (daily 04:00 UTC)
│
▼
Supabase Edge Function (/supabase/functions/backup)
│
├── Dump critical tables as JSON
├── gzip compress
├── Encrypt with AES-256-GCM (random 12-byte IV per blob)
│
▼
Cloudflare R2 bucket (security-scanner-backups)
│
Object key: backups/YYYY/MM/DD/backup-YYYYMMDDTHHMMSSZ.json.gz.enc
| Table | Purpose | Backed up? |
|---|---|---|
audit_log |
Forensic trail | ✅ |
abuse_reports |
User complaints | ✅ |
scans |
Scan history | ✅ |
verified_domains |
Verification cache | ✅ |
verification_tokens |
Ephemeral | ❌ |
scan_requests |
Ephemeral wizard state | ❌ |
rate_limits |
Regenerable | ❌ |
| Control | Implementation |
|---|---|
| Encryption key | 256-bit, stored in Supabase Vault + operator password manager |
| R2 credentials | Supabase Vault only, loaded at runtime via SECURITY DEFINER RPC |
| R2 token scope | Write-only to single bucket — leaked token cannot read backups |
| Webhook auth | Shared secret in X-Webhook-Secret header |
| No key material in git | Migrations reference secret names, never values |
-- Last 7 days of backup attempts
SELECT started_at, status, bytes_written, rows_exported
FROM backup_log
WHERE started_at > NOW() - INTERVAL '7 days'
ORDER BY started_at DESC;
-- Alert: any failures in last 2 days
SELECT COUNT(*) AS failures FROM backup_log
WHERE status = 'error' AND started_at > NOW() - INTERVAL '2 days';pip install boto3 cryptography psycopg python-dotenv
# List available backups
python scripts/restore_backup.py --list
# Inspect without touching DB
python scripts/restore_backup.py --inspect backups/2026/04/10/backup-20260410T040012Z.json.gz.enc
# Dry-run restore
python scripts/restore_backup.py --latest
# Apply restore (idempotent via ON CONFLICT DO NOTHING)
python scripts/restore_backup.py --latest --applyAt least quarterly, run a restore against a disposable Supabase project or local Postgres instance. A backup that has never been tested is not a backup.
- Pinned versions in
requirements.txtfor reproducible builds - Dependabot configured for weekly Python, Docker, and GitHub Actions updates
- Major version bumps get individual PRs for manual review
- pip-audit recommended before releases
The scanner uses 10 direct dependencies:
| Package | Purpose | Risk profile |
|---|---|---|
fastapi |
Web framework | Well-maintained, large community |
uvicorn |
ASGI server | Part of FastAPI ecosystem |
requests |
HTTP client | Most-used Python package |
dnspython |
DNS queries | Mature, security-focused |
pydantic |
Input validation | Type-safe, well-audited |
beautifulsoup4 |
HTML parsing | Mature, stable API |
certifi |
CA certificates | Mozilla-backed |
supabase |
Database client | Official SDK |
psycopg |
PostgreSQL driver | C-extension, well-audited |
fpdf2 |
PDF generation | Pure Python, no system deps |
No dependency requires native compilation beyond psycopg[binary],
which uses the standard libpq.
- FastAPI backend (
api.py,scanner.py,security_utils.py) - All check modules (
checks/*.py) - Frontend (
index.html, blog pages) - Database schema and migrations
- SSRF / auth / crypto / data handling bugs
- Rate limit bypass
- Ownership verification bypass
- Findings on third-party websites (report to the site owner)
- DoS via legitimate scanning traffic (use rate limit path)
- Issues in upstream dependencies (report to the dependency maintainer)
- Social engineering attacks against maintainers
- Physical attacks against hosting infrastructure
- Source code — MIT License
- Editorial content — CC BY-NC 4.0