Insight is an open-source passive web threat scanner. Submit any URL and it fetches all public resources — HTML, scripts, headers, certificates — and analyses them entirely on content alone, with no reliance on reputation databases or external threat intelligence APIs. The result is a prioritised findings report covering JavaScript threats, phishing indicators, domain intelligence, security misconfigurations, and the full detected technology stack.
Because detection is content-based, Insight catches zero-day campaigns, freshly registered phishing domains, and newly injected skimmers that reputation feeds haven't yet indexed.
A companion tool to vault1337.com. Shares the same design system and mirrors the stack.
| Layer | Technology |
|---|---|
| Backend | Python 3.11 / Django 5.2 / Django REST Framework |
| Task Queue | Celery + Redis |
| Frontend | React 19 / TypeScript / Vite / Tailwind CSS 4 |
| Database | PostgreSQL (production) / SQLite (development) |
| Cache / Broker | Redis |
| Visual Renderer | Carapace (Rust) |
Insight uses Carapace as its visual rendering engine. When a scan runs, Carapace fetches and renders the target URL in a hardened Chromium headless environment with JavaScript fully disabled and all outbound network requests blocked. The result is returned to Insight alongside additional threat signals.
What Carapace adds to every scan:
- A screenshot of the page as a visitor would see it, captured without executing any JavaScript
- Additional threat flags from static JS analysis (eval chains, obfuscation, exfiltration calls, sandbox evasion probes, drive-by download detection)
- An extended technology stack — detected from the pre-sanitisation DOM before framework-specific attributes are stripped, giving higher accuracy than header-only detection
- Risk scoring that feeds into the overall scan verdict
Carapace runs as a sandboxed Docker sidecar (--cap-drop=ALL, non-root, network kill-switch). The screenshot is displayed on the results page and collapsed by default.
- Remote code execution:
fetch()+eval()async chains — the compromised WordPress staging pattern - Eval-based obfuscation:
eval(atob(...)),eval(unescape(...)), nested decode chains - Decrypt-then-execute: WebCrypto API (
crypto.subtle) used to decrypt and run a payload at runtime - Payment card skimmers (Magecart-style): DOM queries targeting card/CVV fields + exfiltration
- Keyloggers: keyboard event listeners reading key values with outbound network calls
- Cookie and session exfiltration
- Form hijacking and credential harvesting
- Web3 wallet drainers (Inferno/Angel Drainer pattern)
- HTML smuggling via the Blob API
- Malicious and external service worker registrations
- Crypto miners (CoinHive, CryptoLoot, WebWorker + WASM)
- Shell droppers embedded in JS: Unix (
base64 -d | bash) and PowerShell (irm | iex) - Dynamic
import()loading ES modules from external unknown URLs - Living off Trusted Sites (LoTS): exfiltration routed through Telegram, Discord, Slack, Google Apps Script, and similar platforms to bypass domain-reputation blocklists
- Obfuscation fingerprints: obfuscator.io
_0xarrays,String.fromCharCodechains, high Shannon entropy strings - Anti-analysis: DevTools detection, right-click disable, auto-redirects
- Phishing forms with cross-domain
actiontargets - Hidden iframes, base tag hijacking, meta-refresh redirects
- Fake browser update pages (SocGholish / ClearFake signature)
- Fake CAPTCHA / ClickFix social engineering (Win+R execution instructions)
- Clickjacking overlay elements
- IPFS-hosted resources (takedown-resistant phishing and drainer hosting)
- External script preload/prefetch hints — a common WordPress malware injection staging pattern
- Executable download links, inline script anomalies, sensitive HTML comments
- Security misconfigurations: missing SRI, password fields without autocomplete, login forms over HTTP
- Subdomain and SLD typosquatting via Levenshtein edit distance against a brand watchlist
- Exact brand impersonation in subdomain tokens
- IDN / homograph attacks (Cyrillic and mixed-script lookalikes)
- DGA probability scoring (consonant ratio, entropy, English subword absence)
- High-risk TLDs (
.xyz,.top,.click,.loan,.zip,.cyou, and 20+ more) - Digit substitution (
g00gle,faceb00k) - Abuse-prone free hosting platforms (Cloudflare R2, Pages.dev, Firebase) with random subdomains
Missing CSP, X-Frame-Options, HSTS, X-Content-Type-Options, Referrer-Policy, Permissions-Policy; server/version disclosure; deprecated software versions; insecure cookie flags; CORS wildcard with credentials.
Certificate expiry, self-signed certificates, hostname mismatch, Let's Encrypt on brand-impersonating domains, deprecated TLS versions, newly issued certificates on suspicious domains.
| Verdict | Condition |
|---|---|
| MALICIOUS | Any CRITICAL finding |
| SUSPICIOUS | Any HIGH finding, or 2+ MEDIUM findings |
| CLEAN | LOW and INFO findings only |
| UNKNOWN | No findings |
Context collapse rules fire additional synthetic findings when signal combinations indicate coordinated attack infrastructure (e.g. DGA domain + hidden iframe + obfuscated JS → CRITICAL "drive-by malware delivery").
Identifies CMS, JS frameworks, build tools, libraries, CSS frameworks, backend runtime, web server, CDN, hosting platform, analytics, security tools, and payment providers — displayed as colour-coded badges with logos on the results page.
- Python 3.11+
- Node.js 18+
- Redis 7+ (running locally or via Docker)
# Docker (any OS)
docker run -d -p 6379:6379 redis:7-alpinegit clone https://github.com/DanDreadless/insight_vault1337.git
cd insight_vault1337/backend
pip install -r requirements.txt
cp ../.env.sample ../.env
# Edit ../.env — set a SECRET_KEY value at minimum
python manage.py migrate
python manage.py runservercd backend
celery -A insight worker -l infocd frontend
npm install
npm run devOpen http://localhost:5173. The Vite dev server proxies all /api/ requests to Django on :8000.
Copy .env.sample to .env in the repo root. The only required change for local development is setting a SECRET_KEY.
| Variable | Default | Notes |
|---|---|---|
SECRET_KEY |
(insecure sample) | Change before running |
DEBUG |
True |
Set False in production |
REDIS_URL |
redis://localhost:6379/0 |
|
DATABASE_URL |
sqlite:///db.sqlite3 |
Use PostgreSQL in production |
CORS_ALLOWED_ORIGINS |
http://localhost:5173 |
|
RATE_LIMIT_SCANS_PER_HOUR |
5 |
Per IP |
MAX_SCAN_RESOURCES |
50 |
External scripts analysed per scan |
SCAN_TIMEOUT_SECONDS |
60 |
Hard Celery task limit |
CARAPACE_URL |
(unset) | URL of the Carapace API (http://carapace:8080 when using Docker Compose). Screenshots and additional threat flags are skipped if unset. |
If you prefer not to install Python and Node locally:
cp .env.sample .env # edit SECRET_KEY
docker-compose up --buildStops all services and removes volumes:
docker-compose down -v| Method | Endpoint | Description |
|---|---|---|
| POST | /api/scan/ |
Submit a URL for scanning |
| GET | /api/scan/{id}/ |
Poll results |
| GET | /api/scan/{id}/stream/ |
Server-Sent Events progress stream |
| GET | /api/health/ |
Health check |
| GET | /api/schema/swagger-ui/ |
Interactive API docs |
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This ensures that:
- You are free to use, modify, and share this software under the terms of the AGPL-3.0.
- If you deploy this software as a hosted service, you must make the source code — including any modifications — available to your users under the same licence.
The full licence text is in the LICENSE file.
Insight is open-source, but organisations that need to deploy it privately without the AGPL's copyleft requirements can obtain a commercial licence.
Benefits of a commercial licence:
- Deploy in proprietary environments without open-sourcing modifications.
- Support continued development of the project.
To enquire: contact via LinkedIn — www.linkedin.com/in/dan-pickering
If Insight is useful to you, consider supporting it through sponsorship or donations. Your contributions help keep it free and actively maintained.
Thank you for using Insight.