"Trace Every Thread. Unravel Every Connection." A full-stack OSINT cyber intelligence platform with dual-mode scanning: hackathon-safe local analysis + deep research with live OSINT sources.
ThreadLine lets a user enter one identifier (email, username, domain, or IP) and get a deep investigation report. It operates in two modes:
- Standard Scan — 100% local, zero external calls. Uses local datasets, DNS (system-level), direct HTTP checks, and pattern analysis. Hackathon-safe.
- Deep Research Mode — Unlocks live OSINT sources (crt.sh, ip-api.com, WHOIS, Shodan InternetDB, OpenPhish). Produces richer, real-world intelligence. Toggle-activated.
┌─────────────────────────────────────────────────────────────────┐
│                          SEARCH INPUT                           │
│               ┌─────────────────────────────────┐               │
│               │ [ email / user / domain / IP ]  │               │
│               └────────────────┬────────────────┘               │
│                                │                                │
│               ┌────────────────▼────────────────┐               │
│               │    ⚡ Standard      🔬 Deep     │ ← Mode Toggle │
│               └────────────────┬────────────────┘               │
│                                │                                │
└────────────────────────────────┼────────────────────────────────┘
                                 │
                 ┌───────────────┴───────────────┐
                 │                               │
         ┌───────▼────────┐             ┌────────▼─────────┐
         │ STANDARD SCAN  │             │  DEEP RESEARCH   │
         │ (No ext. API)  │             │  (Live OSINT)    │
         │                │             │                  │
         │ ✅ DNS Recon   │             │ Everything in    │
         │ ✅ HTTP Finger │             │ Standard PLUS:   │
         │ ✅ Username    │             │                  │
         │    Enum        │             │ 🔬 crt.sh        │
         │ ✅ Email Intel │             │    Subdomains    │
         │ ✅ Pattern     │             │ 🔬 WHOIS         │
         │    Analysis    │             │    (whoiser)     │
         │ ✅ Dataset     │             │ 🔬 ip-api.com    │
         │    Matching    │             │    Geolocation   │
         │ ✅ Local IP    │             │ 🔬 Shodan        │
         │    Lookup      │             │    InternetDB    │
         │                │             │ 🔬 OpenPhish     │
         └───────┬────────┘             │    Live Feed     │
                 │                      └────────┬─────────┘
                 │                               │
                 └───────────────┬───────────────┘
                                 │
                      ┌──────────▼──────────┐
                      │ Correlation Engine  │
                      └──────────┬──────────┘
                      ┌──────────▼──────────┐
                      │   Scoring Engine    │
                      └──────────┬──────────┘
                      ┌──────────▼──────────┐
                      │ Explanation Engine  │
                      └──────────┬──────────┘
                      ┌──────────▼──────────┐
                      │   SSE → Dashboard   │
                      └─────────────────────┘
| Module | Standard ⚡ | Deep 🔬 | How |
|---|---|---|---|
| DNS Reconnaissance | ✅ | ✅ | Node.js built-in dns.promises (system-level, not an API) |
| HTTP Tech Fingerprinting | ✅ | ✅ | Direct fetch() to target domain (visiting the website) |
| Username Enumeration | ✅ | ✅ | Direct HEAD/GET to profile URLs (visiting pages) |
| Email Intelligence | ✅ | ✅ | MX lookup (DNS) + local pattern analysis |
| Pattern Analysis | ✅ | ✅ | Pure algorithmic — regex, heuristics, no network |
| Dataset Matching | ✅ | ✅ | Local JSON file lookups, zero network |
| Local IP GeoIP | ✅ | ✅ | Offline DB-IP Lite CSV bundled in project |
| Subdomain Discovery | ❌ | ✅ | crt.sh Certificate Transparency API |
| WHOIS Domain Lookup | ❌ | ✅ | whoiser npm — TCP port 43 to WHOIS servers |
| IP Geolocation (full) | ❌ | ✅ | ip-api.com free JSON endpoint |
| Shodan InternetDB | ❌ | ✅ | internetdb.shodan.io/{ip} — ports, vulns |
| OpenPhish Live Feed | ❌ | ✅ | openphish.com/feed.txt — active phishing URLs |
| Correlation Engine | ✅ | ✅ | In-memory graph building (pure logic) |
| Scoring Engine | ✅ | ✅ | Weighted heuristic calculation (pure logic) |
| Explanation Engine | ✅ | ✅ | Template-based narrative (pure logic) |
Important
Standard mode alone is already powerful. It runs 7 local modules, builds a full correlation graph, generates a risk score, and produces a professional-looking report — all without a single external API call. Deep Research just makes it richer with real-world data.
Important
Tech Stack: Using Next.js App Router with JavaScript (not TypeScript) as the unified full-stack framework. API routes handle all backend logic. Single deployment, single language. Confirm this is good.
Important
App Name: Updated to ThreadLine as requested. All branding, logos, and UI copy will use this name.
Warning
Deep Research Mode UI: The mode toggle will be clearly labeled so judges understand:
- Standard = "Local Intelligence Only — No External APIs"
- Deep Research = "Live OSINT Sources — External Queries Enabled"
This way you can demo Standard mode for hackathon compliance, and switch to Deep Research to show the full power.
Initialize Next.js 15 App Router with Tailwind CSS, JavaScript, and required dependencies.
Dependencies:
| Package | Purpose | Used In |
|---|---|---|
| next + react + react-dom | Framework | Both modes |
| tailwindcss + @tailwindcss/postcss | Styling | Both modes |
| framer-motion | Animations & page transitions | Both modes |
| d3 + d3-force | Force-directed graph | Both modes |
| jspdf + html2canvas | Client-side PDF report | Both modes |
| lucide-react | Icons | Both modes |
| whoiser | WHOIS lookups (TCP port 43) | Deep mode only |
All datasets are JSON files shipped inside the project. Zero runtime downloads.
~300 synthetic emails in simulated breach records. Includes breach metadata.
[
  {
    "email": "aryan.dev@gmail.com",
    "username": "aryan_dev",
    "breaches": [
      {
        "name": "SocialVault 2023",
        "date": "2023-06-15",
        "dataExposed": ["email", "password_hash", "username"],
        "recordCount": 14200000,
        "severity": "high"
      }
    ]
  }
]
Source: Create ourselves. Include common name patterns so demo queries always get hits.
~200 domains flagged as suspicious with categories and risk levels. Source: Create ourselves based on public phishing research patterns.
~150 synthetic usernames in simulated leak records. Source: Create ourselves.
~300 domains with reputation scores, categories, and flags (including known-good like google.com, github.com). Source: Create ourselves.
Regex patterns + keyword lists for detecting phishing-style naming. Source: Public research — OWASP, MITRE ATT&CK, common phishing pattern knowledge.
50+ platform profile URL templates for username enumeration. Source: Sherlock's data.json — MIT licensed, 400+ sites.
3000+ disposable email provider domains. Source: disposable-email-domains — MIT licensed.
Offline IP-to-country/ISP mapping for common ranges (~500-1000 entries). Source: DB-IP Lite — free, Creative Commons licensed. Download CSV, convert to JSON.
MITRE ATT&CK-based indicator patterns for threat scoring context. Source: MITRE ATT&CK — public knowledge base.
All backend logic lives in Next.js API routes and library files.
The main SSE endpoint. Accepts ?q=<input>&mode=standard|deep
Flow:
1. Parse query: input value + scan mode
2. Detect input type (email / username / domain / IP)
3. Initialize SSE stream
4. Select modules based on type + mode
5. Dispatch modules in parallel (async)
6. Stream each module result as SSE event
7. Run correlation engine on all results
8. Run scoring engine
9. Generate AI explanation
10. Stream final summary + graph data
11. Close stream
SSE Event Types:
| Event | Payload |
|---|---|
| status | { message: "Scanning...", module: "dns" } |
| module_result | { module: "dns", data: {...}, nodes: [...], edges: [...] } |
| graph_update | { nodes: [...], edges: [...] } |
| score | { score: 72, label: "High Risk", breakdown: [...] } |
| explanation | { text: "This domain..." } |
| complete | { summary: {...} } |
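As a rough illustration of how the events in this table might be written to the stream, here is a minimal sketch that assumes the standard text/event-stream framing (an `event:` line, a `data:` line, then a blank line). The helper name `formatSseEvent` is hypothetical and not part of the plan.

```javascript
// Hypothetical helper: serialize one named SSE event with a JSON payload.
function formatSseEvent(event, payload) {
  // JSON.stringify never emits raw newlines, so one data: line is sufficient.
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const frame = formatSseEvent('score', { score: 72, label: 'High Risk' });
```

In a route handler, each frame would typically be encoded with `TextEncoder` and enqueued on a `ReadableStream` that is returned with `Content-Type: text/event-stream`.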
Input type detection + mode-aware module routing.
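The detection rules listed in the comments that follow could be sketched roughly like this. The function name `detectInputType` and both regular expressions are assumptions made for illustration; the plan itself only states the rules informally.

```javascript
// Hypothetical detector: IP first, then email, then domain, else username.
const IPV4_RE = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
const DOMAIN_RE = /^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/i;

function detectInputType(raw) {
  const value = raw.trim();
  const ip = value.match(IPV4_RE);
  // Each octet must be in 0..255, not just "up to three digits".
  if (ip && ip.slice(1).every((octet) => Number(octet) <= 255)) return 'ip';
  const at = value.lastIndexOf('@');
  if (at > 0 && DOMAIN_RE.test(value.slice(at + 1))) return 'email';
  if (!value.includes('@') && DOMAIN_RE.test(value)) return 'domain';
  return 'username';
}
```

One consequence of the "has TLD" rule is that a dotted handle such as `john.doe` is classified as a domain; whether that is acceptable is a product decision.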
// Detection:
//   Email: contains '@' and valid domain
//   Domain: has TLD, no @, not IP
//   IP: matches x.x.x.x
//   Username: everything else
// Routing (mode-aware):
function getModules(inputType, mode) {
  const standard = {
    domain: ['dns', 'techFingerprint', 'patternAnalysis', 'datasetMatch'],
    email: ['emailIntel', 'dns', 'patternAnalysis', 'datasetMatch'],
    username: ['usernameEnum', 'patternAnalysis', 'datasetMatch'],
    ip: ['localIpLookup', 'dns', 'patternAnalysis', 'datasetMatch']
  };
  const deep = {
    domain: ['subdomain', 'whois', 'dns', 'techFingerprint', 'patternAnalysis', 'datasetMatch', 'openphish'],
    email: ['emailIntel', 'whois', 'dns', 'usernameEnum', 'patternAnalysis', 'datasetMatch'],
    username: ['usernameEnum', 'patternAnalysis', 'datasetMatch'],
    ip: ['ipGeoApi', 'shodanInternetDB', 'localIpLookup', 'dns', 'patternAnalysis', 'datasetMatch']
  };
  return mode === 'deep' ? deep[inputType] : standard[inputType];
}
- Uses Node.js built-in dns.promises (system-level DNS resolver)
- Queries: A, AAAA, MX, NS, TXT, CNAME, SOA records
- Why it's not an API: Uses the OS DNS resolver, same as any app that connects to the internet
- Each record type → graph node
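A minimal sketch of the DNS step, assuming each resolved record becomes one graph node linked to the domain with a `has_record` edge. The function name `dnsRecon`, the node id scheme and the injectable `resolver` parameter are illustrative assumptions; `dns.promises.resolve(hostname, rrtype)` is the real Node.js call.

```javascript
const RECORD_TYPES = ['A', 'AAAA', 'MX', 'NS', 'TXT', 'CNAME', 'SOA'];

// `resolver` is injectable for testing and defaults to Node's dns.promises.
async function dnsRecon(domain, resolver) {
  const dns = resolver ?? (await import('node:dns')).promises;
  // A missing record type (e.g. ENODATA) rejects; allSettled keeps the rest.
  const settled = await Promise.allSettled(
    RECORD_TYPES.map((type) => dns.resolve(domain, type))
  );
  const nodes = [];
  const edges = [];
  settled.forEach((result, i) => {
    if (result.status !== 'fulfilled') return;
    const type = RECORD_TYPES[i];
    // SOA resolves to a single object; the other types resolve to arrays.
    const values = Array.isArray(result.value) ? result.value : [result.value];
    values.forEach((value, n) => {
      const id = `dns:${type}:${n}`;
      nodes.push({ id, label: `${type} ${JSON.stringify(value)}`, type: 'dns_record' });
      edges.push({ source: domain, target: id, rel: 'has_record' });
    });
  });
  return { nodes, edges };
}
```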
- Direct fetch() to https://{domain}, like opening the website in a browser
- Reads response headers: Server, X-Powered-By, X-Generator, Via
- Checks security headers: HSTS, CSP, X-Frame-Options, X-Content-Type-Options
- Flags missing security headers as risk factors
- Why it's not an API: You're visiting the target website, not calling a third-party service
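The header checks described here could look roughly like the sketch below. It works on a plain object of response headers; the function name `auditHeaders` and the return shape are assumptions for illustration.

```javascript
// Header name -> label used in findings (names are the standard HTTP ones).
const SECURITY_HEADERS = {
  'strict-transport-security': 'HSTS',
  'content-security-policy': 'CSP',
  'x-frame-options': 'X-Frame-Options',
  'x-content-type-options': 'X-Content-Type-Options',
};
const TECH_HEADERS = ['server', 'x-powered-by', 'x-generator', 'via'];

function auditHeaders(headers) {
  // Header names are case-insensitive, so normalize before comparing.
  const lower = Object.fromEntries(
    Object.entries(headers).map(([name, value]) => [name.toLowerCase(), value])
  );
  const tech = {};
  for (const name of TECH_HEADERS) {
    if (lower[name]) tech[name] = lower[name];
  }
  const missing = Object.entries(SECURITY_HEADERS)
    .filter(([name]) => !(name in lower))
    .map(([, label]) => label);
  return { tech, missing };
}
```

With `fetch()`, the input can be produced with `Object.fromEntries(response.headers)`, since a `Headers` object iterates as name/value pairs.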
- Loads platform URLs from data/platform_urls.json
- HTTP HEAD/GET to github.com/{user}, instagram.com/{user}, etc.
- Check status: 200 = exists, 404 = not found
- Rate-limited: 500ms delay, User-Agent rotation
- Standard mode: Top 25 platforms (~10-12 sec)
- Deep Research mode: 50+ platforms (~25 sec)
- Why it's not an API: You're visiting profile pages, same as typing URLs in a browser
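A sketch of the enumeration loop under stated assumptions: platform entries are taken to have the shape `{ name, url }` with a `{user}` placeholder, and `enumerateUsername`, `limit`, `delayMs` and `fetchFn` are hypothetical names. User-Agent rotation is left out to keep the example short.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// platforms: [{ name, url }] where url contains a {user} placeholder (assumed shape).
async function enumerateUsername(username, platforms, options = {}) {
  const { limit = 25, delayMs = 500, fetchFn = fetch } = options;
  const found = [];
  for (const platform of platforms.slice(0, limit)) {
    const url = platform.url.replace('{user}', encodeURIComponent(username));
    try {
      const res = await fetchFn(url, { method: 'HEAD', redirect: 'manual' });
      if (res.status === 200) found.push({ platform: platform.name, url });
    } catch {
      // Network errors and timeouts count as "unknown", not as a miss.
    }
    // Fixed delay between requests, matching the 500ms rate limit above.
    if (delayMs > 0) await sleep(delayMs);
  }
  return found;
}
```

Status codes alone are an approximation: some sites answer 200 for profiles that do not exist, so per-platform checks may be needed for reliable results.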
- Extracts domain from email → DNS MX lookup
- Detects provider (Gmail, Outlook, ProtonMail, custom)
- Checks disposable email list (local JSON)
- Format validation + pattern analysis
- Extracts username part for cross-referencing
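The local part of this module (everything except the MX lookup) might be sketched as follows. The provider map is a small illustrative sample, and `analyzeEmail` is a hypothetical name; `disposableDomains` stands in for a `Set` built from the bundled disposable-domain list.

```javascript
// Illustrative sample only; a fuller provider map would live in the module.
const PROVIDERS = {
  'gmail.com': 'Gmail',
  'outlook.com': 'Outlook',
  'hotmail.com': 'Outlook',
  'protonmail.com': 'ProtonMail',
  'proton.me': 'ProtonMail',
};

function analyzeEmail(email, disposableDomains) {
  const at = email.lastIndexOf('@');
  if (at < 1 || at === email.length - 1) return { valid: false };
  const localPart = email.slice(0, at);
  const domain = email.slice(at + 1).toLowerCase();
  return {
    valid: true,
    domain,
    provider: PROVIDERS[domain] ?? 'custom',
    disposable: disposableDomains.has(domain),
    // Strip a +tag so "user+news" cross-references as "user".
    username: localPart.split('+')[0],
  };
}
```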
- Pure algorithmic analysis on the input string
- Checks: phishing keywords, suspicious TLDs, excessive hyphens, digit patterns, domain age indicators, typosquatting patterns, leetspeak detection
- Loads rules from data/phishing_patterns.json
- Works on ALL input types
- Zero network calls
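To make the checks concrete, here is a small sketch covering four of them (keywords, leetspeak, suspicious TLDs, hyphen and digit patterns). The rule values, the thresholds and the name `analyzePatterns` are illustrative assumptions; in the plan the rules come from data/phishing_patterns.json.

```javascript
// Illustrative rules only; the real lists would be loaded from the JSON file.
const RULES = {
  keywords: ['login', 'verify', 'secure', 'account', 'update'],
  suspiciousTlds: ['.xyz', '.top', '.tk'],
  leet: { 0: 'o', 1: 'l', 3: 'e', 4: 'a', 5: 's', 7: 't' },
};

function analyzePatterns(input, rules = RULES) {
  const value = input.toLowerCase();
  const flags = [];
  // Undo common digit-for-letter swaps so "l0gin" is compared as "login".
  const deleet = value.replace(/[013457]/g, (d) => rules.leet[d]);
  for (const word of rules.keywords) {
    if (value.includes(word)) flags.push(`keyword:${word}`);
    else if (deleet.includes(word)) flags.push(`leetspeak:${word}`);
  }
  if (rules.suspiciousTlds.some((tld) => value.endsWith(tld))) flags.push('suspicious_tld');
  if ((value.match(/-/g) ?? []).length >= 3) flags.push('excessive_hyphens');
  if (/\d{4,}/.test(value)) flags.push('digit_run');
  return flags;
}
```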
- Searches local JSON datasets for matches:
  - breach_emails.json: email/username breach matches
  - suspicious_domains.json: domain reputation
  - known_usernames.json: username leak records
  - domain_reputation.json: domain scoring
  - threat_indicators.json: MITRE pattern matches
- Returns matched records with metadata
- Zero network calls
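For the breach dataset, a lookup could be as simple as the sketch below. It assumes records in the breach_emails.json shape shown earlier in this plan; `matchBreaches` and the returned field names are hypothetical.

```javascript
// `records` follows the breach_emails.json shape: { email, username, breaches: [...] }.
function matchBreaches(query, records) {
  const needle = query.trim().toLowerCase();
  return records
    .filter((r) => r.email.toLowerCase() === needle || r.username.toLowerCase() === needle)
    .flatMap((r) =>
      r.breaches.map((b) => ({
        matchedOn: r.email,
        breach: b.name,
        date: b.date,
        severity: b.severity,
      }))
    );
}
```

A linear scan is fine for a few hundred records; for larger files, building a `Map` keyed by email and username once at startup would avoid rescanning on every query.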
- Searches data/ip_ranges.json for IP range matches
- Returns country, ISP, organization from offline dataset
- Reverse DNS lookup via dns.promises.reverse()
- Zero external calls: uses bundled dataset + system DNS
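The offline range lookup can be sketched as a binary search over sorted IPv4 ranges. The entry shape `{ start, end, country, isp }` is an assumption about how the converted DB-IP data might be stored, and `ipToInt` and `lookupIp` are hypothetical names.

```javascript
// Convert dotted IPv4 to an unsigned 32-bit integer (multiplication avoids
// the sign issues of bitwise shifts above 2^31).
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// ranges: [{ start, end, country, isp }] sorted by start, non-overlapping (assumed).
function lookupIp(ip, ranges) {
  const target = ipToInt(ip);
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const range = ranges[mid];
    if (target < ipToInt(range.start)) hi = mid - 1;
    else if (target > ipToInt(range.end)) lo = mid + 1;
    else return range;
  }
  return null;
}
```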
Given the offline, no-API architecture, here is exactly what Standard Mode covers and what requires Deep Mode:
✅ What Standard Mode HAS (Runs Offline):
- Homograph / Punycode Detection: Plain JS checks each label for the xn-- prefix and uses a regex to spot mixed scripts (Latin + Cyrillic). Needs zero external data.
- Domain Reputation: Cross-references the target against the local top-1m-domains.csv to surface "newly registered/unknown" signals.
- Known Bad Matches: Exact string lookups against the local security.csv (Maligna entries) and phisetank-dataset-phising.csv.
- IP Geo-Mismatch: Uses offline dbip-country-lite to detect when an IP claiming to be a US service actually resolves to a high-risk country.
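The homograph check in the first bullet could be sketched with Unicode script property escapes, which JavaScript regexes support under the `u` flag. The function name `detectHomograph` and the return shape are assumptions.

```javascript
function detectHomograph(domain) {
  const labels = domain.toLowerCase().split('.');
  // Punycode-encoded labels always start with the ACE prefix "xn--".
  const punycode = labels.some((label) => label.startsWith('xn--'));
  const hasLatin = /\p{Script=Latin}/u.test(domain);
  const hasCyrillic = /\p{Script=Cyrillic}/u.test(domain);
  return { punycode, mixedScript: hasLatin && hasCyrillic };
}
```

Note that a punycode label must be decoded before the mixed-script test can say anything about it; this sketch only flags the prefix and tests the string as given.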
❌ What Standard Mode LACKS (Requires Deep Mode):
- Live ASN / Network Ownership Validation: We cannot mathematically prove an IP belongs to AS15169 (Google) without a live ASN API request or a massive 10GB BGP routing table offline file.
- SSL/TLS Certificate Extraction (Port 443): Node.js tls.connect can extract live certificates to check if they are 2-day old Let's Encrypt certs, but this requires making a live external cryptographic request to the target domain.
- JARM Fingerprinting: Sending active malformed TLS packets to a server to hash its response (to detect Cobalt Strike vs Nginx) requires live external interaction.
- Live Reverse DNS Verification: Accurately tracing a spoofed IP back to its true PTR record requires pinging authoritative DNS servers.
- 🔬 Open direct cryptographic socket via Node.js native tls.connect(443, domain)
- Extract exact valid_from and issuer fields from socket.getPeerCertificate()
- Flags Let's Encrypt certificates under 3 days old as critical phishing risks.
- External: Port 443 to Target Domain
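Once the certificate object is in hand, the age rule can be evaluated without any further network access. The sketch below assumes the issuer organization field (`issuer.O`) reads exactly "Let's Encrypt" and uses the 3-day threshold stated above; `assessCertificate` is a hypothetical name.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// `cert` mirrors the object returned by socket.getPeerCertificate().
function assessCertificate(cert, now = new Date()) {
  // valid_from is a date string such as "Mar 10 00:00:00 2024 GMT".
  const issuedAt = new Date(cert.valid_from);
  const ageDays = (now - issuedAt) / DAY_MS;
  const issuerOrg = cert.issuer?.O ?? '';
  const fresh = ageDays < 3;
  // Assumption: the issuer organization string is exactly "Let's Encrypt".
  return { ageDays, issuerOrg, critical: fresh && issuerOrg === "Let's Encrypt" };
}
```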
- 🔬 Live Reverse DNS (PTR) Checks using Node.js dns.promises.reverse(ip)
- Accurately traces a spoofed IP back to its true registered PTR domain.
- External: Authoritative DNS Servers
- 🔬 Fetches http://ip-api.com/json/{ip}?fields=status,isp,org,as
- Validates data center ownership (e.g., checking if it's hosted on DigitalOcean vs AS15169 Google).
- External: IP-API
- 🔬 Queries Shodan InternetDB or runs a native Python child process exec('python3 jarm.py').
- Hashes active TLS Server Responses to fingerprint exact backend software (Cobalt Strike vs Nginx).
- External: Shodan API or Direct Port 443 malformed packets.
- 🔬 Fetches https://crt.sh/?q=%25.{domain}&output=json
- Parses Certificate Transparency logs
- Deduplicates, removes wildcards
- Returns subdomains as graph nodes
- External: crt.sh (free, no key)
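The parsing and deduplication steps might look like this. The sketch relies on the `name_value` field of crt.sh JSON entries, which can hold several newline-separated names; `extractSubdomains` is a hypothetical name.

```javascript
// `entries` is the parsed JSON array returned by crt.sh.
function extractSubdomains(entries, domain) {
  const suffix = `.${domain.toLowerCase()}`;
  const names = new Set();
  for (const entry of entries) {
    for (const raw of String(entry.name_value ?? '').split('\n')) {
      // Drop the wildcard prefix so "*.example.com" does not become a node.
      const name = raw.trim().toLowerCase().replace(/^\*\./, '');
      // Keep true subdomains only; the apex and unrelated names are skipped.
      if (name.endsWith(suffix)) names.add(name);
    }
  }
  return [...names].sort();
}
```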
- 🔬 Uses whoiser npm package (TCP port 43)
- Extracts: registrar, creation date, expiry, name servers, registrant org
- Flags young domains (<90 days) as suspicious
- External: WHOIS protocol servers
- 🔬 Queries http://ip-api.com/json/{ip} (full geolocation)
- Returns: country, city, ISP, lat/long, timezone, org, AS
- External: ip-api.com (free, 45 req/min, HTTP only)
- 🔬 Queries https://internetdb.shodan.io/{ip}
- Returns: open ports, known CVEs, hostnames, tags
- External: Shodan (free, no key)
- 🔬 Fetches https://openphish.com/feed.txt
- Checks if target domain appears in active phishing feed
- Caches feed for 1 hour to minimize requests
- External: OpenPhish (free, non-commercial)
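One way to implement the one-hour cache is a closure that refetches only when the stored copy has expired. In this sketch `createFeedCache`, `fetchFeedText` and the injectable `now` clock are hypothetical names, and the feed is assumed to be plain text with one URL per line.

```javascript
const TTL_MS = 60 * 60 * 1000; // one hour, as stated above

function createFeedCache(fetchFeedText, now = Date.now) {
  let hosts = null;
  let fetchedAt = 0;
  return async function isListed(domain) {
    if (!hosts || now() - fetchedAt > TTL_MS) {
      const text = await fetchFeedText();
      hosts = new Set();
      for (const line of text.split('\n')) {
        try {
          // The URL parser lowercases hostnames for us.
          hosts.add(new URL(line.trim()).hostname);
        } catch {
          // Skip blank or malformed lines.
        }
      }
      fetchedAt = now();
    }
    return hosts.has(domain.toLowerCase());
  };
}
```

In the module itself the fetcher would be something like `() => fetch('https://openphish.com/feed.txt').then((r) => r.text())`.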
Connects all discovered entities into a unified graph:
email → domain ("registered_on")
domain → subdomain ("parent_of")
domain → IP ("resolves_to")
IP → geolocation ("located_in")
username → platform ("active_on")
email → username ("derived_from")
entity → dataset ("matched_in")
entity → pattern ("flagged_by")
domain → tech ("powered_by")
domain → DNS record ("has_record")
domain → breach ("exposed_in")
IP → open_port ("exposes")
domain → phishing_feed ("flagged_in")
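A minimal sketch of the merge step, assuming every module returns `nodes` and `edges` arrays in the shape used elsewhere in this plan. Nodes are deduplicated by `id` and edges by their (source, rel, target) triple; `correlate` is a hypothetical name.

```javascript
function correlate(moduleResults) {
  const nodes = new Map();
  const edges = new Map();
  for (const result of moduleResults) {
    for (const node of result.nodes ?? []) {
      // First writer wins; later modules cannot overwrite an existing node.
      if (!nodes.has(node.id)) nodes.set(node.id, node);
    }
    for (const edge of result.edges ?? []) {
      const key = `${edge.source}|${edge.rel}|${edge.target}`;
      if (!edges.has(key)) edges.set(key, edge);
    }
  }
  return { nodes: [...nodes.values()], edges: [...edges.values()] };
}
```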
Heuristic threat scoring (0-100):
Standard mode factors:
+15 per breach dataset match
+10 per suspicious pattern detected
+8 per missing security header
+5 per exposed social profile
+10 matches phishing pattern rule
+3 disposable email provider
+7 suspicious DNS configuration
+5 email format anomaly
Deep mode bonus factors:
+20 domain age < 90 days (WHOIS)
+12 no SSL / HTTPS redirect fails
+8 resolves to known-bad IP range
+5 per open port (Shodan)
+15 found in OpenPhish feed
+7 excessive subdomains (>50, crt.sh)
Labels:
0-30: "Low Risk" (green)
31-60: "Moderate" (amber)
61-80: "High Risk" (red)
81-100: "Critical" (pulsing red)
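The Standard mode factors and the label bands above can be combined into a small scoring function. This is a sketch under assumptions: the factor key names and `scoreFindings` are hypothetical, and clamping the sum to 100 is inferred from the stated 0-100 range rather than spelled out in the plan. Deep mode factors would extend the same weight table.

```javascript
// Weights copied from the Standard mode factor list; key names are illustrative.
const WEIGHTS = {
  breachMatch: 15,
  suspiciousPattern: 10,
  missingSecurityHeader: 8,
  exposedProfile: 5,
  phishingRule: 10,
  disposableEmail: 3,
  suspiciousDns: 7,
  emailAnomaly: 5,
};

function labelFor(score) {
  if (score <= 30) return 'Low Risk';
  if (score <= 60) return 'Moderate';
  if (score <= 80) return 'High Risk';
  return 'Critical';
}

// counts: { factorName: occurrences }
function scoreFindings(counts) {
  const breakdown = Object.entries(counts)
    .filter(([factor, n]) => WEIGHTS[factor] && n > 0)
    .map(([factor, n]) => ({ factor, points: WEIGHTS[factor] * n }));
  const raw = breakdown.reduce((sum, item) => sum + item.points, 0);
  const score = Math.min(100, raw); // assumed clamp to keep the 0-100 range
  return { score, label: labelFor(score), breakdown };
}
```

For example, two breach matches, three missing security headers and one disposable provider give 2×15 + 3×8 + 3 = 57, which falls in the "Moderate" band.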
AI-style narrative generator. Produces polished text based on findings:
"This identity was matched against 2 simulated breach datasets, suggesting credential exposure through data leaks. The associated domain uses a disposable email provider and lacks critical security headers. Pattern analysis detected phishing-style naming conventions. Combined signals indicate a High Risk of compromise."
src/
├── app/
│   ├── layout.js                    # Root layout, fonts, metadata
│   ├── page.js                      # Landing screen (Step 1)
│   ├── investigate/
│   │   └── page.js                  # Search + Results (Steps 2-4)
│   ├── api/
│   │   └── investigate/
│   │       └── route.js             # SSE endpoint
│   └── globals.css                  # Design system + Tailwind
├── components/
│   ├── landing/
│   │   ├── Hero.jsx                 # Hero with CTA
│   │   └── ParticleField.jsx        # Neural network background
│   ├── search/
│   │   ├── SearchConsole.jsx        # Input field + mode toggle
│   │   ├── InputTypeIndicator.jsx   # Type detection badge
│   │   └── ModeToggle.jsx           # Standard ⚡ / Deep 🔬 switch
│   ├── scan/
│   │   ├── ScanAnimation.jsx        # Full-screen loading overlay
│   │   └── ScanMessages.jsx         # Rotating status messages
│   ├── dashboard/
│   │   ├── DashboardLayout.jsx      # Main grid layout
│   │   ├── SummaryPanel.jsx         # Identity + risk score card
│   │   ├── EvidencePanel.jsx        # Matched data + pattern flags
│   │   ├── ThreatMeter.jsx          # Animated circular gauge (0-100)
│   │   ├── TimelinePanel.jsx        # Chronological evidence
│   │   ├── ExplanationPanel.jsx     # AI narrative (typewriter)
│   │   ├── ModuleStatusGrid.jsx     # Shows which modules ran
│   │   └── ExportButton.jsx         # PDF download
│   ├── graph/
│   │   ├── IntelGraph.jsx           # D3 force-directed graph
│   │   └── NodeDetailPanel.jsx      # Side panel on node click
│   ├── terminal/
│   │   └── LiveTerminal.jsx         # Green-on-black scan log
│   └── ui/
│       ├── GlassCard.jsx            # Glassmorphism wrapper
│       ├── GlowBorder.jsx           # Neon border effect
│       ├── Logo.jsx                 # ThreadLine logo
│       └── UserBadge.jsx            # Top-left name badge
├── hooks/
│   ├── useInvestigation.js          # SSE + state management
│   └── useLocalStorage.js           # Name + prefs persistence
├── lib/                             # (modules described above)
└── data/                            # (datasets described above)
Step 1: Landing (/)
- Dark cyber hero, particle field background
- ThreadLine logo with glow
- "Begin Investigation" CTA
- First visit: name prompt modal → saved to localStorage
- Top-left: icon + user name
Step 2: Search Console (/investigate)
- Single large input field
- Auto-detects type → shows badge (Email / Domain / Username / IP)
- Mode Toggle: ⚡ Standard (default) | 🔬 Deep Research
- Mode descriptions:
- Standard: "Local analysis, no external queries"
- Deep: "Full OSINT — live external sources"
- "Investigate" button
Step 3: Scan Animation (overlay)
- Full-screen Framer Motion transition
- Radar pulse + progress ring
- Real SSE status messages replace placeholders
- Module completion checkmarks appear
Step 4: Dashboard (reveals below search)
- Summary + Threat Meter + Evidence + Graph + Timeline + AI Explanation + Terminal
- Module Status Grid shows which modules ran (Standard vs Deep)
- Export Report button → client-side PDF
/* ═══ ThreadLine Design Tokens ═══ */
/* Backgrounds */
--bg-void: #030305;
--bg-primary: #06060a;
--bg-secondary: #0d0d14;
--bg-elevated: #12121e;
--glass-bg: rgba(13,13,25,0.6);
--glass-border: rgba(0,212,255,0.12);
/* Accents */
--cyan: #00d4ff;
--green: #00ff88;
--red: #ff3366;
--amber: #ffaa00;
--purple: #a855f7;
/* Text */
--text-primary: #e2e8f0;
--text-secondary: #64748b;
/* Typography */
--font-ui: 'Inter', system-ui, sans-serif;
--font-mono: 'JetBrains Mono', monospace;
/* Effects */
--blur-glass: blur(20px);
--glow-cyan: 0 0 20px rgba(0,212,255,0.3);
D3.js force-directed graph on HTML5 Canvas.
Node Types:
| Type | Color | Shape |
|---|---|---|
| Input (query) | Cyan | Large pulsing circle |
| Domain | Purple | Circle |
| Subdomain | Purple-dim | Small circle |
| Email | Green | Circle |
| Username | Amber | Circle |
| IP Address | Red | Circle |
| Platform | White | Square |
| Dataset Match | Red | Triangle |
| Pattern Flag | Amber | Diamond |
Interactions: Hover highlights, click opens detail panel, drag nodes, zoom, auto-animate on load.
Dynamic import: next/dynamic with { ssr: false }.
Client-side with jsPDF + html2canvas.
Sections:
- Header — ThreadLine logo, title, date
- Executive Summary — risk score, verdict
- Evidence Table — all findings
- AI Explanation — full narrative
- Graph Snapshot — canvas capture
- Timeline — chronological events
- Footer — "Generated by ThreadLine"
Each module follows a standard interface and runs as an independent worker:
// Every module exports:
export async function run(input, inputType, options) {
  return {
    module: 'module_name',
    mode: 'standard',       // 'standard' | 'deep': which mode this belongs to
    status: 'success',      // 'success' | 'partial' | 'error'
    data: {},               // raw findings
    nodes: [],              // graph nodes: { id, label, type }
    edges: [],              // graph edges: { source, target, rel }
    riskContribution: 0,    // score contribution, 0-25
    timeline: []            // timeline entries: { time, event, detail }
  };
}
The orchestrator dispatches all relevant modules in parallel via Promise.allSettled(), streaming results as they complete.
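The dispatch pattern described here could be sketched as follows. Each task emits its own result as soon as it resolves, while `Promise.allSettled()` guarantees that one failing module cannot abort the others. The names `runInvestigation`, `modules` and `emit` are hypothetical; `emit` stands in for whatever writes SSE events.

```javascript
// modules: { name: async (input, inputType, options) => result }
// emit: (event, payload) => void
async function runInvestigation(input, inputType, moduleNames, modules, emit) {
  const tasks = moduleNames.map(async (name) => {
    emit('status', { message: 'Scanning...', module: name });
    const result = await modules[name](input, inputType, {});
    emit('module_result', result); // streamed as soon as this module finishes
    return result;
  });
  const settled = await Promise.allSettled(tasks);
  // A rejected module becomes an error record instead of aborting the scan.
  return settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? outcome.value
      : { module: moduleNames[i], status: 'error', data: {}, nodes: [], edges: [] }
  );
}
```

The returned array keeps the order of `moduleNames`, which makes it straightforward to hand to the correlation and scoring steps afterwards.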
- Mode Toggle: Simple on/off toggle next to the search bar. Off = Standard (default), On = Deep Research.
- Username Enumeration: 25 platforms in Standard mode, 50+ platforms in Deep Research mode.
npm run build # Verify no errors
npm run dev # Start dev server
- Landing page renders with particle background
- Search console detects input types
- Mode toggle switches between Standard/Deep
- SSE stream connects and delivers events
- Scan animation plays with real status messages
- Dashboard panels populate with data
- Graph renders with interactive nodes
- Threat meter animates to correct score
- AI explanation renders with typewriter effect
- PDF export generates and downloads
- localStorage persists name and preferences
- Invalid input handling
- Network timeout for Deep mode modules
- Empty results graceful handling
- Mode switch mid-investigation
| Phase | Scope | Time |
|---|---|---|
| 1. Foundation | Next.js + Tailwind + design system | 1-2h |
| 2. Datasets | Create all 9 JSON datasets | 1-2h |
| 3. Standard Modules | 7 local modules | 3-4h |
| 4. Deep Modules | 5 external modules | 2-3h |
| 5. Orchestrator + SSE | Streaming pipeline | 2h |
| 6. Correlation + Scoring | Intelligence layer | 2h |
| 7. Landing + Search | Landing, console, mode toggle | 2-3h |
| 8. Scan Animation | Full-screen dramatic loading | 1-2h |
| 9. Dashboard | 7 panels + layout | 3-4h |
| 10. Graph | D3 force-directed + interactions | 3-4h |
| 11. PDF | Report generation | 1-2h |
| 12. Polish | Animations, edge cases, demo prep | 2-3h |
| Total | | ~23-33h |