🧠 Ultimate Web Intelligence & Reverse Engineering Lab

🎯 Mission

Perform a full-spectrum technical, behavioral, and security analysis of:

Work out of this GitHub repo. All logs, scripts, captures, and reports are stored here.

You will behave as an autonomous:

OSINT analyst
Network traffic analyst
Reverse engineer
Backend inference engine

Your goal is to capture, analyze, reconstruct, and classify every observable behavior, data source, and communication channel used by the target system.

⚠️ Operating Principles

Operate autonomously — minimize manual intervention
Log ALL observable data — nothing is irrelevant until proven otherwise
Focus on observation, inference, reconstruction
Prefer evidence over assumptions — cite specific requests, headers, payloads
Remain low-noise during probing — no aggressive scanning, no brute-force
Document every finding as you go, not after the fact
If a phase produces no results, log that explicitly (absence of evidence is data)

📁 PROJECT STRUCTURE

Maintain this directory layout in the repo root:

worldviewosint/
├── README.md
├── captures/
│   ├── traffic.har              # HAR capture from Playwright
│   ├── traffic_dump.mitm        # Raw mitmproxy dump
│   └── network.json             # Parsed request/response log
├── scripts/
│   ├── capture.js               # Playwright automation script
│   ├── replay.sh                # API replay script
│   └── analyze.py               # Log analysis / correlation
├── logs/
│   ├── endpoints.json           # Discovered endpoints + status codes
│   ├── telegram.json            # Telegram-specific traffic
│   ├── cookies.json             # Cookie inventory
│   └── console.json             # Browser console output
├── recon/
│   ├── dns.txt                  # WHOIS + DNS records
│   ├── tls.txt                  # TLS certificate details
│   ├── subdomains.txt           # Subdomain enumeration results
│   └── source-analysis.md       # JavaScript / source map findings
├── reports/
│   ├── architecture.md          # Architecture diagram
│   ├── api-map.md               # Full endpoint map
│   ├── telegram-report.md       # Telegram forensics report
│   ├── data-authenticity.md     # Signal validation report
│   ├── backend-fingerprint.md   # Stack identification
│   ├── security-assessment.md   # Risk assessment
│   └── final-classification.md  # System intent + verdict
└── tools/
    └── setup.sh                 # Automated toolchain installer

🧰 TOOLCHAIN SETUP

Install all tools before beginning. Verify each installation before proceeding.

Core

Node.js (latest LTS) — node --version
Python 3.11+ — python3 --version
jq — JSON processing — jq --version
curl — HTTP requests — curl --version
httpie — human-readable HTTP — http --version

Browser Automation

Playwright — npx playwright --version
Chromium — installed via Playwright: npx playwright install chromium

Network Interception

mitmproxy (v10+) — mitmproxy --version

Note: -w is the short form of --save-stream-file, so mitmdump -w FILE and mitmdump --save-stream-file FILE are equivalent. The commands in this lab use the long form for readability.

DNS / Domain Recon

whois — domain registration lookup
dig or nslookup — DNS record queries
subfinder (optional) — subdomain enumeration
openssl — TLS certificate inspection

Optional

tcpdump / Wireshark — low-level packet capture
Wappalyzer CLI or webanalyze — technology fingerprinting

Verification

Run the following to confirm the toolchain is ready:

node --version && python3 --version && jq --version && curl --version | head -1 && mitmproxy --version && npx playwright --version

All tools must return valid version numbers before proceeding to Phase 1.

🌐 ARCHITECTURE STACK

[Agent]
  ↓
[Playwright Automation]
  ↓
[Chromium Browser (headless or headed)]
  ↓
[mitmproxy Interception Layer — 127.0.0.1:8080]
  ↓
[Target Site + APIs + Third-Party Endpoints (Telegram, CDNs, etc.)]

Data flows downward. Every layer captures and logs. mitmproxy sees all HTTPS traffic after certificate trust is established.

🧪 PHASE 0 — ENVIRONMENT PREPARATION

Before touching the target, prepare a clean analysis environment.

Operational Security

Use a VPN or isolated network — your real IP will be logged by the target (especially if it phones home to Telegram)
Use a clean browser profile — no cookies, no extensions, no saved logins
Dedicated workspace — clone this repo, work exclusively inside it
No authentication — do NOT log into anything on the target; observe as an anonymous visitor

Environment Checklist

VPN active and verified (check IP at https://ifconfig.me)
Toolchain installed and verified (see above)
Repo cloned and directory structure created
mitmproxy CA certificate NOT yet trusted (do that in Phase 2)
Browser profile is clean / fresh

🧪 PHASE 1 — DNS & DOMAIN RECONNAISSANCE

Before intercepting traffic, gather passive intelligence about the target's infrastructure.

1.1 — WHOIS Lookup

whois worldviewosint.com > recon/dns.txt

Extract:

Registrar
Registration / expiration dates
Registrant info (or privacy service)
Name servers

1.2 — DNS Records

dig worldviewosint.com ANY +noall +answer >> recon/dns.txt
dig worldviewosint.com A >> recon/dns.txt
dig worldviewosint.com AAAA >> recon/dns.txt
dig worldviewosint.com CNAME >> recon/dns.txt
dig worldviewosint.com MX >> recon/dns.txt
dig worldviewosint.com TXT >> recon/dns.txt
dig worldviewosint.com NS >> recon/dns.txt

Look for:

CNAME to Vercel/Netlify/Cloudflare → hosting provider
TXT records → SPF, DKIM, domain verification tokens (Google, etc.)
MX records → email infrastructure (or lack thereof)

1.3 — Subdomain Enumeration

subfinder -d worldviewosint.com -o recon/subdomains.txt

Or manually check common subdomains:

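for sub in www api app admin dev staging mail; do
  # Prefix each result with the hostname so empty answers remain attributable
  echo "$sub.worldviewosint.com: $(dig +short "$sub.worldviewosint.com" | paste -sd' ' -)" >> recon/subdomains.txt
done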

1.4 — TLS Certificate Analysis

echo | openssl s_client -connect worldviewosint.com:443 -servername worldviewosint.com 2>/dev/null | openssl x509 -noout -text > recon/tls.txt

Extract:

Issuer (Let's Encrypt → likely automated; Cloudflare → CDN-proxied)
Subject Alternative Names (SANs) — may reveal related domains
Validity period
Certificate chain (add -showcerts to the s_client call; the command above records only the leaf certificate)

1.5 — Passive Observations

Note:

Does the domain resolve to a CDN (Cloudflare, Vercel, AWS CloudFront)?
Are there any CAA records restricting certificate issuance? (dig worldviewosint.com CAA; section 1.2 does not query this record type explicitly)
Does the IP belong to a known hosting provider? (whois <IP>)

Save all output to recon/dns.txt and recon/tls.txt.

🧪 PHASE 2 — NETWORK INTERCEPTION SETUP

2.1 — Start mitmproxy

mitmdump --save-stream-file captures/traffic_dump.mitm

This starts the proxy on 127.0.0.1:8080 by default and writes all traffic to the dump file.

For interactive inspection during capture, use mitmweb instead of mitmdump — it provides a browser-based UI at http://127.0.0.1:8081.

2.2 — Install and Trust the mitmproxy CA Certificate

With mitmproxy running, open a browser configured to use the proxy (127.0.0.1:8080) and navigate to: http://mitm.it
Download the certificate for your OS
Trust the certificate:
- macOS: Add to Keychain Access → System → "Always Trust"
- Windows: Install to "Trusted Root Certification Authorities"
- Linux (Debian/Ubuntu): Copy to /usr/local/share/ca-certificates/ with a .crt extension and run sudo update-ca-certificates

2.3 — Verify Interception

curl --proxy http://127.0.0.1:8080 https://worldviewosint.com/ -o /dev/null -w "%{http_code}" -s

Expected output: 200 (or another valid HTTP status, such as a redirect code). If curl prints 000 or reports a certificate error, the CA certificate is not trusted correctly. Fix this before proceeding.

🧪 PHASE 3 — FULL TRAFFIC CAPTURE

Launch Chromium via Playwright with the mitmproxy proxy enabled. Capture everything.

What to Capture

XHR / fetch requests
WebSocket connections and frames
Script loads (JS bundles, chunks)
Stylesheet and font loads
Image / media requests
Third-party API calls (analytics, CDNs, Telegram, etc.)
Service Worker registrations and fetch events

What to Log for Each Request/Response

Field	Description
`url`	Full request URL
`method`	HTTP method (GET, POST, etc.)
`headers`	All request and response headers
`request_body`	POST/PUT body (if any)
`response_body`	Response content
`status`	HTTP status code
`content_type`	Response content type
`timestamp`	Unix timestamp (ms)
`duration`	Time from request to response (ms)
`initiator`	What triggered the request (script, user action, etc.)

Output Files

captures/network.json — structured request/response log
captures/traffic.har — HAR format for browser devtools import
logs/console.json — browser console output (errors, warnings, logs)
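
The fields in the table above map fairly directly onto standard HAR 1.2 entries. The following is a minimal sketch, assuming captures/traffic.har follows that layout; the function name har_to_records and the output key names (chosen to match the example entry in Phase 5) are illustrative, not prescribed by this document. The initiator field is not part of the HAR standard, so it is left out here.

```python
from datetime import datetime


def har_to_records(har: dict) -> list[dict]:
    """Flatten HAR 1.2 entries into one dict per request/response pair."""
    records = []
    for entry in har["log"]["entries"]:
        req, resp = entry["request"], entry["response"]
        # HAR timestamps are ISO 8601; normalise a trailing "Z" for fromisoformat.
        started = datetime.fromisoformat(
            entry["startedDateTime"].replace("Z", "+00:00")
        )
        content = resp.get("content", {})
        records.append({
            "url": req["url"],
            "method": req["method"],
            "request_headers": {
                h["name"].lower(): h["value"] for h in req.get("headers", [])
            },
            "response_headers": {
                h["name"].lower(): h["value"] for h in resp.get("headers", [])
            },
            "request_body": req.get("postData", {}).get("text"),
            "response_body": content.get("text"),
            "status": resp["status"],
            "content_type": content.get("mimeType"),
            "response_size": content.get("size"),
            "timestamp": int(started.timestamp() * 1000),  # Unix time in ms
            "duration_ms": round(entry.get("time", 0)),
        })
    return records
```

To build captures/network.json, load the HAR with json.load, pass the result to har_to_records, and write the returned list with json.dump.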

🧪 PHASE 4 — BEHAVIOR SIMULATION

Simulate realistic user behavior to trigger all network activity the site is capable of producing.

Interaction Sequence

Initial page load — observe all requests fired on first visit
Wait 30 seconds — watch for background polling, WebSocket connections, delayed XHR calls
Scroll the full page — some content lazy-loads or triggers on scroll
Click all interactive elements:
- Buttons, toggles, tabs, dropdowns
- Map interactions (zoom, pan, click markers)
- Any navigation links / route changes
Hover over interactive elements — tooltips, popups, info panels
Resize the browser window — responsive breakpoints may load different assets or APIs
Navigate to all visible routes/pages — capture each page's network activity independently
Wait another 30 seconds after all interactions — catch any delayed or periodic calls
Repeat the full sequence once — compare traffic patterns for consistency

What to Watch For

Requests that fire on a timer (polling intervals)
Requests that fire on specific user actions only
WebSocket messages that arrive without user interaction
Differences between first-load and subsequent-load traffic
Requests to domains other than the target (third-party calls)
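
One way to spot timer-driven requests is to look at the gaps between successive calls to the same endpoint. This is a rough sketch only: the function name polling_profile and the 250 ms tolerance are assumptions to tune against the actual capture.

```python
from statistics import median


def polling_profile(timestamps_ms: list[int], tolerance_ms: int = 250) -> dict:
    """Estimate whether calls to one endpoint arrive on a fixed interval."""
    ts = sorted(timestamps_ms)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        # Too few samples to say anything about periodicity.
        return {"interval_ms": None, "periodic": False, "samples": len(gaps)}
    typical = median(gaps)
    # Treat the endpoint as polled when every gap sits close to the median gap.
    periodic = all(abs(gap - typical) <= tolerance_ms for gap in gaps)
    return {"interval_ms": typical, "periodic": periodic, "samples": len(gaps)}
```

Feed it the timestamp values of all records that share a URL and method. A periodic result with a stable interval suggests a polling loop rather than user-triggered traffic.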

🧪 PHASE 5 — DATA LOGGING & ORGANIZATION

Structure all captured data for analysis.

Structured Outputs

Unique endpoints list — deduplicated, sorted by domain, then path
Grouped responses by endpoint — all responses for each endpoint in chronological order
Frequency tracking — how often each endpoint is called per minute
Payload size tracking — request and response sizes per endpoint
Timeline view — all requests plotted on a timeline (use timestamps)

Deduplication Rules

Same URL + same method + same request body = duplicate (keep first occurrence, count total)
Same URL + different query params = separate entries
Same URL + different response body = track as "dynamic" endpoint
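
The three rules above can be sketched as a single pass over the captured records. This assumes each record is a dict with url, method, request_body, and response_body keys (as in the Phase 3 field table); the dynamic flag and the function name deduplicate are illustrative additions.

```python
import hashlib
from urllib.parse import urlsplit


def deduplicate(records: list[dict]) -> list[dict]:
    """Collapse duplicates, count occurrences, and flag dynamic endpoints."""
    merged: dict[tuple, dict] = {}
    digests: dict[tuple, set] = {}
    for rec in records:
        # The full URL (including query string) is part of the key, so
        # different query params naturally stay as separate entries.
        key = (rec["url"], rec["method"], rec.get("request_body") or "")
        body = rec.get("response_body") or ""
        if key not in merged:
            # Keep the first occurrence as the representative entry.
            merged[key] = {**rec, "occurrence_count": 0, "dynamic": False}
            digests[key] = set()
        merged[key]["occurrence_count"] += 1
        digests[key].add(hashlib.sha256(body.encode()).hexdigest())
        # More than one distinct response body marks the endpoint as dynamic.
        merged[key]["dynamic"] = len(digests[key]) > 1

    def sort_key(rec: dict) -> tuple:
        parts = urlsplit(rec["url"])
        return (parts.netloc, parts.path, parts.query)

    return sorted(merged.values(), key=sort_key)
```

Hashing response bodies keeps memory use low on large captures while still detecting any change in content.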

File Format

All logs as JSON arrays. Example entry:

{
  "url": "https://worldviewosint.com/api/data",
  "method": "GET",
  "status": 200,
  "content_type": "application/json",
  "request_headers": {},
  "response_size": 4523,
  "timestamp": 1716300000000,
  "duration_ms": 142,
  "occurrence_count": 5,
  "classification": "REAL-TIME API"
}

🧪 PHASE 6 — TELEGRAM FORENSICS (CRITICAL)

This is the highest-priority analysis phase. The target may be exfiltrating visitor data to Telegram.

Detection

Filter all captured traffic for any requests to:

api.telegram.org

Also check for:

Obfuscated Telegram calls (base64-encoded URLs, proxied through the target's own backend)
References to t.me or telegram in JavaScript source code

For EACH Telegram Request, Extract:

URL Structure

Telegram Bot API URLs follow this pattern:

https://api.telegram.org/bot<BOT_TOKEN>/<METHOD>

Extract:

Bot token — the string between bot and the next / (e.g., 123456:ABC-DEF... in /bot123456:ABC-DEF.../sendMessage)
Method — the API method called (e.g., sendMessage, sendPhoto, sendDocument)

Request Payload

chat_id — the target chat/channel/group
- Positive number = individual user
- Negative number = group chat
- -100 prefix = supergroup or channel
text — message content (may contain visitor data)
parse_mode — formatting mode (HTML, Markdown, or MarkdownV2)
Any file attachments or media

Response

ok field (boolean)
result.message_id — confirms message was sent
Error messages if failed

Bot Identity Verification

Using the extracted bot token, query:

curl https://api.telegram.org/bot<TOKEN>/getMe

This reveals:

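Bot username (username)
Bot display name (first_name)
Whether it can join groups (can_join_groups)
Whether it can read all group messages, i.e. privacy mode is disabled (can_read_all_group_messages)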

Determine for Each Call

Question	How to Answer
What triggers it?	Correlate timestamp with user action timeline from Phase 4
How often?	Count occurrences, check for interval patterns
What data is sent?	Parse the `text` field and any attached data
Does it include visitor IP?	Check text content and any headers forwarded
Does it include user agent?	Check text content
Does it include interaction data?	Check if click/scroll events are referenced
Is it one-way logging or bidirectional?	Check for any response-dependent behavior

Classification

Classify the Telegram integration as:

Visitor logging — sends data about each visitor (IP, UA, etc.)
Alert system — sends notifications on specific events
Analytics/tracking — aggregates and reports interaction data
C2 (command and control) — receives instructions from Telegram (check for polling of getUpdates)
Exfiltration — sends sensitive or identifiable data without user consent

🧪 PHASE 7 — ENDPOINT DISCOVERY

Go beyond observed endpoints. Probe for hidden or undocumented paths.

Step 1 — Harvest from Observed Traffic

From the Phases 3–5 captures, extract all unique URL paths. These are your known endpoints.

Step 2 — Check Standard Discovery Files

curl -s https://worldviewosint.com/robots.txt
curl -s https://worldviewosint.com/sitemap.xml
curl -s https://worldviewosint.com/.well-known/security.txt
curl -s https://worldviewosint.com/.well-known/openapi.json

Step 3 — Probe Common Framework Paths

Based on the stack inferred from Phase 1 (DNS) and from the response headers already captured in Phases 3–5 (analyzed in detail in Phase 10), probe paths relevant to the detected framework:

General

/api/
/api/v1/
/api/v2/
/internal/
/debug/
/admin/
/health
/healthcheck
/status
/metrics
/graphql
/.env
/config.json

Next.js Specific

/_next/data/
/api/hello
/_next/static/
/__nextjs_original-stack-frame

Python/FastAPI Specific

/docs
/redoc
/openapi.json

Vercel Specific

/.vercel/
/_vercel/insights/script.js

Step 4 — Log Results

For each probed path, record:

Field	Value
Path	URL path
Status Code	HTTP response code
Response Size	Content-Length or body length
Content-Type	Response type
Classification	`valid` / `hidden` / `redirect` / `error` / `dead`
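
This document does not define the five classification labels precisely, so the mapping below is one possible reading and should be adjusted to the lab's conventions. It assumes that "hidden" means a path that answers successfully but never appeared in the captured traffic; classify_probe is a hypothetical helper name.

```python
def classify_probe(status: int, seen_in_traffic: bool) -> str:
    """Assign one of the Step 4 labels from a status code (assumed mapping)."""
    if 300 <= status < 400:
        return "redirect"
    if status in (404, 410):
        return "dead"  # assumed: the path does not exist
    if 200 <= status < 300:
        # Assumed: a successful path that the site itself never requested
        # counts as hidden; one seen during Phases 3-5 counts as valid.
        return "valid" if seen_in_traffic else "hidden"
    return "error"  # remaining 4xx/5xx, including 403 and 429
```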

Rate Limiting

Wait 500ms between requests — do not hammer the server
If you receive 429 Too Many Requests, back off and note the rate limit headers
If you receive 403 Forbidden, log it but do not retry with bypass techniques

Save results to logs/endpoints.json.

🧪 PHASE 8 — CLIENT-SIDE CODE ANALYSIS

Analyze the JavaScript bundles, source maps, and client-side storage used by the target.

8.1 — JavaScript Bundle Analysis

From the captured traffic (Phase 3), extract all .js files loaded by the site.

For each bundle:

Note the filename pattern (hashed chunks = Webpack/Vite/Next.js build)
Check file size — large bundles may contain embedded data or logic
Search for readable strings: API keys, endpoints, tokens, config objects

# Extract all JS URLs from the HAR file
jq -r '.log.entries[].request.url' captures/traffic.har | grep -E '\.js(\?|$)' | sort -u

8.2 — Source Map Recovery

Check if source maps are exposed:

# For each JS bundle URL, check for .map suffix
curl -s -o /dev/null -w "%{http_code}" https://worldviewosint.com/_next/static/chunks/main-abc123.js.map

If source maps are available:

Download them
Reconstruct the original source tree
This reveals the full application source code, component structure, and build configuration

8.3 — Hardcoded Secrets Search

In all captured JavaScript, search for:

/api[_-]?key/i
/token/i
/secret/i
/password/i
/telegram/i
/bot[0-9]/i
/chat_id/i
/firebase/i
/supabase/i
/\.env/i

Any hardcoded API keys or tokens are critical findings.
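
The pattern list above can be run over each downloaded bundle with a short script. This sketch returns every hit with some surrounding context so that a human can review it; the function name scan_source and the 40-character context window are arbitrary choices. Broad patterns such as token will produce many false positives in minified code.

```python
import re

# Same patterns as listed above, compiled case-insensitively.
PATTERNS = [
    r"api[_-]?key", r"token", r"secret", r"password", r"telegram",
    r"bot[0-9]", r"chat_id", r"firebase", r"supabase", r"\.env",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def scan_source(text: str, context: int = 40) -> list[dict]:
    """Return every pattern hit in the text with surrounding context."""
    hits = []
    for regex in COMPILED:
        for match in regex.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            hits.append({
                "pattern": regex.pattern,
                "offset": match.start(),
                "snippet": text[start:end],
            })
    # Order hits by position so related findings appear together.
    return sorted(hits, key=lambda hit: hit["offset"])
```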

8.4 — Client-Side Storage

During the Playwright session, dump:

// Local Storage
JSON.stringify(Object.entries(localStorage));

// Session Storage
JSON.stringify(Object.entries(sessionStorage));

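// Cookies visible to scripts (HttpOnly cookies are omitted; Playwright's context.cookies() returns the full set)
document.cookie;

// IndexedDB databases (returns a Promise, so await it)
await indexedDB.databases();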

Look for:

Session tokens or identifiers
Cached API responses
User tracking identifiers (fingerprint hashes, UUIDs)
Analytics IDs

8.5 — Service Worker Analysis

Check if a Service Worker is registered:

await navigator.serviceWorker.getRegistrations();

If present:

Download the SW script
Analyze what it caches
Check if it intercepts or modifies requests
Check if it sends background sync / push notification requests

Save findings to recon/source-analysis.md.

🧪 PHASE 9 — DATA SOURCE CLASSIFICATION

For ALL observed data streams (API responses, WebSocket messages, embedded data), classify each one.

Classification Labels

Label	Definition	How to Identify
STATIC	Hardcoded in the page or JS bundles	Same content on every load, present in source code
REAL-TIME API	Fetched live from an external data source	Changes between requests, has timestamps near current time
CACHED	Server-side cached version of live data	Identical responses within a time window, then changes
AGGREGATED	Compiled from multiple sources (RSS, scraping)	Mixed formatting, inconsistent structure, attribution markers
PROXIED	Fetched server-side from another API, served to client	Target domain URL but data structure matches known external API
SYNTHETIC	Fabricated or procedurally generated	Unrealistic values, perfect distributions, no external source match

For Each Data Stream, Document:

Endpoint URL
Data type (JSON, XML, plaintext, binary)
Update frequency (if polled)
Classification label
Confidence level (HIGH / MED / LOW)
Evidence supporting the classification

🧪 PHASE 10 — BACKEND FINGERPRINTING

Identify the server-side technology stack through observable signals.

10.1 — Response Header Analysis

Extract and analyze these headers from ALL responses:

Header	What It Reveals
`server`	Web server software (nginx, Apache, etc.)
`x-powered-by`	Framework (Express, Next.js, PHP, etc.)
`x-vercel-id`	Vercel deployment (confirms serverless hosting)
`x-vercel-cache`	Vercel edge cache status
`x-nextjs-cache`	Next.js ISR/SSR cache status
`cf-ray`	Cloudflare proxy (confirms CDN)
`x-request-id`	Request tracing (common in production systems)
`set-cookie`	Session management, tracking cookies
`content-security-policy`	CSP rules — reveals allowed script/connect sources
`access-control-allow-origin`	CORS configuration — reveals allowed origins
`strict-transport-security`	HSTS configuration
`x-frame-options`	Clickjacking protection
`x-content-type-options`	MIME sniffing protection

10.2 — Inference Table

Observed Signal	Inference
`x-vercel-id` present	Hosted on Vercel (serverless)
`x-powered-by: Next.js`	React SSR/SSG frontend
`server: uvicorn`	Python ASGI backend (likely FastAPI)
`server: nginx`	Reverse proxy or direct server
`cf-ray` present	Behind Cloudflare CDN
`/_next/` paths in URLs	Confirmed Next.js application
`/api/` routes returning JSON	Backend API layer present
WebSocket upgrade headers	Real-time data push capability
`set-cookie: __cf_bm`	Cloudflare bot management active
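
The header-based rows of the inference table can be applied mechanically to each captured response. This is a sketch only: infer_stack is a hypothetical helper, matching is done with simple substring checks, and each returned string is a hint to verify rather than a conclusion.

```python
def infer_stack(headers: dict[str, str], url: str = "") -> list[str]:
    """Apply the inference table to one response's headers and URL."""
    h = {name.lower(): value.lower() for name, value in headers.items()}
    server = h.get("server", "")
    findings = []
    if "x-vercel-id" in h:
        findings.append("Hosted on Vercel (serverless)")
    if "next.js" in h.get("x-powered-by", ""):
        findings.append("React SSR/SSG frontend (Next.js)")
    if "uvicorn" in server:
        findings.append("Python ASGI backend (likely FastAPI)")
    if "nginx" in server:
        findings.append("Reverse proxy or direct server (nginx)")
    if "cf-ray" in h:
        findings.append("Behind Cloudflare CDN")
    if "__cf_bm" in h.get("set-cookie", ""):
        findings.append("Cloudflare bot management active")
    if "/_next/" in url:
        findings.append("Next.js application paths")
    return findings
```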

10.3 — Error Page Fingerprinting

Deliberately trigger error responses to reveal framework information:

curl -s https://worldviewosint.com/nonexistent-page-abc123
curl -s https://worldviewosint.com/api/nonexistent
curl -s -X POST https://worldviewosint.com/

Custom error pages often reveal the framework (Next.js default 404, FastAPI validation errors, etc.).

10.4 — Technology Fingerprinting

If webanalyze or Wappalyzer CLI is available:

webanalyze -host worldviewosint.com -crawl 2

Summary Determination

Document:

Hosting: serverless vs. traditional vs. containerized
Frontend framework: React, Vue, vanilla, etc.
Backend framework: Next.js API routes, FastAPI, Express, etc.
CDN/proxy: Cloudflare, Vercel Edge, none
Caching behavior: edge-cached, stale-while-revalidate, no-cache
Polling frequency: interval between repeated API calls

Save to reports/backend-fingerprint.md.

🧪 PHASE 11 — API REPLAY

Replay observed API calls outside the browser to determine authentication requirements, response stability, and rate limits.

Method

For each discovered API endpoint, replay using curl:

# Basic replay
curl -s "https://worldviewosint.com/api/endpoint" \
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
  | jq . > replay_response.json

# Compare with captured response
diff <(jq -S . captured_response.json) <(jq -S . replay_response.json)
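
When the responses are large, a list of the JSON paths that differ is often easier to read than a textual diff. The helper below is a sketch of that idea; diff_paths and the $-rooted path notation are illustrative choices, and lists of different lengths are reported as a single difference at the list's own path.

```python
def diff_paths(a, b, path: str = "$") -> list[str]:
    """List the JSON paths at which two parsed responses differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                out.append(f"{path}.{key}")  # key present on one side only
            else:
                out.extend(diff_paths(a[key], b[key], f"{path}.{key}"))
        return out
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        out = []
        for index, (left, right) in enumerate(zip(a, b)):
            out.extend(diff_paths(left, right, f"{path}[{index}]"))
        return out
    return [] if a == b else [path]
```

If the only differing paths are timestamp-like fields, the endpoint is probably serving the same underlying data on each call.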

For Each Endpoint, Determine:

Test	Method	What It Reveals
No headers	`curl URL`	Does it require auth at all?
With browser UA	`curl -H "User-Agent: ..." URL`	UA-based filtering?
With cookies	`curl -b "cookie=value" URL`	Session-dependent?
Without referer	Omit Referer header	Referer checking?
Repeated calls	10 calls in 60 seconds	Rate limited? At what threshold?
Different IP	Via VPN/proxy change	IP-based rate limiting?
After time delay	Same call 5 minutes later	Response changes? (dynamic vs cached)
Parameter manipulation	Change query params	Input validation? Different data?

Comparison Matrix

For each endpoint, record:

Endpoint	Auth Required	Static/Dynamic	Rate Limited	Cache TTL
`/api/...`	yes/no	S/D	yes/no/unknown	seconds

Save to reports/api-map.md.

🧪 PHASE 12 — SIGNAL VALIDATION

Assess whether the data displayed by the target is real, transformed, or fabricated.

Detection Methods

Temporal Analysis

Compare data timestamps to current wall-clock time
Are timestamps plausible? (within expected range for the data type)
Do timestamps progress naturally, or do they repeat/reset?
Is there a fixed offset between "live" data and actual time?

Content Analysis

Are there repeated entries with identical content but different timestamps?
Do severity/priority values follow a realistic distribution, or are they uniform/random?
Do geographic coordinates correspond to real locations?
Do entity names (ships, aircraft, people) resolve to real-world entities?

Behavioral Analysis

Does data update on a fixed interval regardless of world events?
Does "real-time" data change when the page is backgrounded (tab not visible)?
Does refreshing the page show the same "live" data or different data?
Are update intervals suspiciously regular (exactly every N seconds)?

Cross-Reference

Compare claimed data against known public sources:
- Earthquakes → USGS API (earthquake.usgs.gov)
- Aircraft → ADS-B Exchange, FlightRadar24
- Maritime → MarineTraffic, AIS data
- News → Original RSS source articles
- Weather → NOAA, OpenWeatherMap

Statistical Methods

Calculate entropy of data fields — low entropy suggests synthetic/templated data
Check the distribution of numeric values — a uniform distribution is suspicious for quantities that are naturally skewed (e.g., earthquake magnitudes)
Look for copy-paste artifacts (identical phrasing, formatting)
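
The entropy check can be made concrete with Shannon entropy over the observed values of a single field, measured in bits. This is a minimal sketch; what counts as "low" depends on the field and has to be judged against a known real source.

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )
```

A field whose value never changes has entropy 0, while a field with four equally frequent values has entropy 2 bits. Comparing the entropy of, say, event titles on the target against the same field from the claimed upstream source gives a rough signal of templating.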

Confidence Assignment

Level	Criteria
HIGH (real)	Data matches external sources, timestamps are current, content is unique
MED (transformed)	Data resembles real sources but is reformatted, delayed, or aggregated
LOW (synthetic)	Data has no external source match, shows statistical anomalies, or is clearly fabricated

Save to reports/data-authenticity.md.

🧪 PHASE 13 — BEHAVIOR MODELING

Reconstruct the complete system logic from observed evidence.

Data Lifecycle Model

Map the full path of data through the system:

[External Source] → [Ingestion Method] → [Server-Side Processing] → [API Response] → [Client-Side Rendering] → [User Display]

For each data type, document:

Source — where does the data originate?
Ingestion — how does the server get it? (API call, scrape, WebSocket, RSS)
Transformation — is it modified? (reformatted, filtered, enriched, delayed)
Delivery — how does it reach the browser? (REST API, WebSocket, SSR, embedded in HTML)
Rendering — what UI component displays it?

UI → Network Relationship Map

For each user-facing feature:

UI Element	User Action	Network Request Triggered	Endpoint
Globe	Click marker	GET /api/...	...
Feed panel	Scroll	GET /api/...	...
Auto-refresh	None (timer)	GET /api/... every Ns	...

System Flow Diagram

Produce a text-based or Mermaid diagram:

graph TD
    A[User visits site] --> B[Initial page load]
    B --> C[JS bundles loaded]
    C --> D[API calls fire]
    D --> E[Data rendered in UI]
    D --> F[Telegram notification sent?]
    E --> G[Polling loop begins]
    G --> D

Purpose Classification

Based on all evidence, classify the system's purpose:

Visualization tool — displays data for informational purposes
Monitoring dashboard — actively watches for events/thresholds
Tracking/logging tool — records visitor behavior
Honeypot — designed to attract and monitor visitors
Portfolio/demo — non-functional showcase

Save to reports/architecture.md.

🧪 PHASE 14 — SECURITY ANALYSIS

Evaluate the security posture and risk profile of the target system.

14.1 — Data Exfiltration Assessment

Vector	Check	Finding
Telegram API calls	Phase 6 results	What data is sent?
Third-party analytics	Google Analytics, Mixpanel, etc.	What is tracked?
Pixel tracking	1x1 images, tracking pixels	Present? To where?
WebSocket exfiltration	Data sent via WS to non-target origins	Present?

14.2 — Visitor Tracking Assessment

Technique	How to Detect
IP logging	Check Telegram payloads for IP addresses
Browser fingerprinting	Look for canvas, WebGL, AudioContext, font enumeration in JS
Cookie tracking	Unique identifiers in cookies that persist across sessions
Local storage tracking	UUIDs or fingerprint hashes stored client-side
Session recording	Look for FullStory, Hotjar, LogRocket scripts

14.3 — Application Security

Check	Method
Content Security Policy	Read CSP header — is it restrictive or permissive?
CORS configuration	Is `Access-Control-Allow-Origin: *` returned on endpoints that serve non-public data? (overly permissive)
Mixed content	Any HTTP resources loaded on HTTPS page?
Exposed secrets	API keys, tokens in JS source (Phase 8)
Information disclosure	Verbose error messages, stack traces, debug headers
Exposed source maps	`.js.map` files accessible (Phase 8)
Open redirects	URL parameters that control redirects
Clickjacking protection	X-Frame-Options or CSP frame-ancestors set?
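
The header-based checks in this table (CSP, CORS, clickjacking protection) can be automated per response. This is a sketch under simple assumptions: audit_headers is a hypothetical name, the finding strings are illustrative, and a flagged header still needs manual review in context.

```python
def audit_headers(headers: dict[str, str]) -> list[str]:
    """Flag missing or permissive security headers on one response."""
    h = {name.lower(): value for name, value in headers.items()}
    csp = h.get("content-security-policy", "")
    findings = []
    if not csp:
        findings.append("missing Content-Security-Policy")
    if h.get("access-control-allow-origin", "").strip() == "*":
        findings.append("CORS allows any origin")
    # Either X-Frame-Options or a CSP frame-ancestors directive is enough.
    if "x-frame-options" not in h and "frame-ancestors" not in csp.lower():
        findings.append("no clickjacking protection")
    return findings
```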

14.4 — Privacy Compliance

Does the site display a cookie consent banner?
Does it have a privacy policy?
Does the privacy policy disclose Telegram data transmission?
Are tracking cookies set before user consent?
Does it comply with GDPR/CCPA requirements?

14.5 — Risk Classification

For each finding, assign a risk level:

Level	Criteria	Examples
LOW	Informational, no direct user harm	Technology disclosure via headers
MEDIUM	Privacy concern or minor security issue	Tracking without disclosure, permissive CORS
HIGH	Active data exfiltration or serious security flaw	Sending visitor IPs to Telegram, exposed API keys, no CSP
CRITICAL	Malicious intent or severe vulnerability	C2 communication, credential harvesting, malware delivery

Provide specific justification for each rating — evidence, not opinion.

Save to reports/security-assessment.md.

🧬 PHASE 15 — ADVANCED CORRELATION

Cross-analyze findings from all previous phases to detect patterns invisible in isolation.

Correlation Targets

Payload Similarity

Do different endpoints return data with shared structures or field names?
Are the same data objects referenced across multiple API responses?
Do Telegram payloads contain data from specific API responses?

Timing Correlation

Do API calls and Telegram calls happen at the same time? (piggyback exfiltration)
Is there a fixed delay between data ingestion and display?
Do polling intervals match any external data source's update frequency?
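
The first timing question can be approached by pairing each Telegram call with the most recent API call that preceded it. The sketch below assumes millisecond timestamps from the capture log; piggyback_pairs and the 2000 ms window are assumptions, and temporal proximity alone does not prove a causal link.

```python
from bisect import bisect_right


def piggyback_pairs(
    api_ts: list[int], telegram_ts: list[int], window_ms: int = 2000
) -> list[tuple[int, int, int]]:
    """Pair each Telegram call with the latest API call shortly before it."""
    api_sorted = sorted(api_ts)
    pairs = []
    for t in sorted(telegram_ts):
        # Index of the first API timestamp strictly greater than t.
        i = bisect_right(api_sorted, t)
        if i and t - api_sorted[i - 1] <= window_ms:
            previous = api_sorted[i - 1]
            pairs.append((previous, t, t - previous))
    return pairs
```

If most Telegram calls pair up with a consistently small delay, that supports the piggyback hypothesis; unpaired Telegram calls point to a separate trigger.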

Behavioral Loops

Does the system exhibit cycles? (e.g., every 60s: fetch data → display → report to Telegram)
Do "live" updates follow a deterministic pattern? (same sequence repeating)
Is "randomness" in the data actually pseudorandom with a detectable seed?

Synthetic Behavior Detection

Signal	Indicates
Data updates at exact intervals (e.g., every 30.0s)	Polling, not real-time push
Data changes but structure stays identical	Template-based generation
"Breaking" events appear on a schedule	Scripted, not organic
All data sources update simultaneously	Single orchestrator, not independent feeds

Output

Produce a correlation matrix documenting:

Which data streams are related
Which events are causally linked
Any evidence of fabrication or simulation

🧬 PHASE 16 — SYSTEM INTENT INFERENCE

Based on ALL evidence gathered across Phases 0–15, determine what this system truly is.

Decision Framework

Classification	Evidence Required
Production OSINT platform	Real data sources, original analysis, functional backend, no synthetic data
Monitoring dashboard	Live data feeds, threshold alerts, operational indicators
Portfolio / demo	Synthetic or cached data, no real backend processing, impressive UI with shallow depth
Tracking / surveillance tool	Primary function is collecting visitor data (Telegram exfil, fingerprinting)
Honeypot	Deliberately attracts security researchers or specific audiences, logs all visitors
Prototype / MVP	Partial functionality, mix of real and placeholder data, incomplete features

Key Questions

Does it do what it claims to do? (Does the OSINT data represent real intelligence?)
Who is the intended audience? (Analysts, clients, the public, or no one?)
Is visitor tracking a side effect or the primary purpose?
How much engineering effort is behind it? (Sophisticated backend or API-wrapper frontend?)
Is it commercially operated or a personal project?

Confidence Rating

Assign an overall confidence level to your classification:

HIGH — multiple independent evidence streams support the conclusion
MED — evidence supports it but alternative explanations exist
LOW — insufficient evidence for a definitive conclusion

Save to reports/final-classification.md.

🧪 PHASE 17 — OPEN-SOURCE INTELLIGENCE TOOL RECON (GITHUB + ECOSYSTEM)

Identify and correlate the tools, frameworks, and data sources used by the target system by searching open-source ecosystems.

🎯 Objective

Determine whether the application is built from:

Known open-source OSINT platforms
Cloned or forked dashboards
Publicly available data pipelines
Common frontend visualization stacks

🔍 Search Targets

GitHub

Search for repositories related to:

"osint dashboard"
"worldview osint"
"osint globe"
"3d globe intelligence"
"cesium osint"
"react osint dashboard"
"next.js osint"
"threat intelligence dashboard"

Package Registries

npm: Search for OSINT, globe, geospatial, cesium packages
PyPI: Search for OSINT, intelligence, data aggregation libraries

Other Sources

Developer blogs and tutorials
OSINT tool directories (OSINT Framework, IntelTechniques)
GitHub Awesome lists (awesome-osint, awesome-threat-intelligence)

🔎 Keyword Sets

Use variations of:

"OSINT platform"
"global intelligence dashboard"
"C4ISR dashboard"
"threat monitoring dashboard"
"real-time geospatial intelligence"
"incident monitoring globe"
"cyber threat map"

🧪 Component Identification

From observed behavior + code patterns in Phase 8, match against known tools:

Frontend Libraries

Component	Look For
CesiumJS	3D globe rendering, `cesium.com` in network traffic
Mapbox GL	`mapbox.com` tokens or API calls
Leaflet	`leafletjs.com` references, L.map() calls
Three.js / Globe.gl	WebGL globe rendering
Deck.gl	Geospatial data layers
D3.js	Custom visualizations, SVG elements
React / Next.js	`_next/` paths, React devtools markers

Backend / Data

Component	Look For
FastAPI	`/docs` or `/redoc` endpoints, `uvicorn` header
Express	`x-powered-by: Express` header
Vercel Functions	`x-vercel-id` header, `/api/` routes
Supabase	`supabase.co` in network traffic
Firebase	`firebase.googleapis.com` calls

Data Sources

Source	Data Type	Verification URL
USGS	Earthquakes	`earthquake.usgs.gov/fdsnws/`
ADS-B Exchange	Aircraft tracking	`adsbexchange.com`
MarineTraffic / AIS	Ship tracking	`marinetraffic.com`
GDELT	Global events	`api.gdeltproject.org`
RSS feeds	News aggregation	Various
ACLED	Conflict data	`acleddata.com`

🔗 Correlation Process

For each discovered open-source project, compare with the target system:

UI similarity — visual layout, color scheme, component arrangement
Data structures — API response shapes, field names, nesting patterns
Endpoint naming — /api/events, /api/threats, etc.
Visualization style — globe type, marker styles, panel layouts
Feature overlap — same feature set or subset?
Code structure — if source maps are available, compare component trees
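
One way to make the "Data structures" comparison above less subjective is to reduce each JSON response to its set of key paths and compute a Jaccard similarity. This is only a sketch: `keyPaths` and `shapeSimilarity` are hypothetical helpers, the sample payloads are invented, and the score ignores value types and semantics.

```javascript
// Collect dotted key paths from a JSON value; all array elements share "[]".
function keyPaths(value, prefix = '', out = new Set()) {
  if (Array.isArray(value)) {
    value.forEach(v => keyPaths(v, prefix + '[]', out));
  } else if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? `${prefix}.${key}` : key;
      out.add(path);
      keyPaths(child, path, out);
    }
  }
  return out;
}

// Jaccard similarity of the two key-path sets: |A ∩ B| / |A ∪ B|.
// Two shapes with no keys at all score 0 to avoid a spurious "match".
function shapeSimilarity(a, b) {
  const pa = keyPaths(a);
  const pb = keyPaths(b);
  const union = new Set([...pa, ...pb]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const p of pa) if (pb.has(p)) shared++;
  return shared / union.size;
}

// Invented payloads standing in for a target response and a candidate repo's response.
const targetShape = { events: [{ id: 1, lat: 0, lon: 0, severity: 'low' }] };
const candidateShape = { events: [{ id: 9, lat: 1, lon: 2, type: 'quake' }] };
console.log(shapeSimilarity(targetShape, candidateShape).toFixed(2));
```

Field names carry most of the signal here, so a high score on a generic shape (for example `id`, `lat`, `lon`) should be weighed less than a match on unusual or misspelled keys.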

🧬 Clone / Template Detection

Determine if the system is:

| Classification | Criteria |
| --- | --- |
| Direct clone | Identical or near-identical to a known repo, minimal changes |
| Fork/modification | Based on a known project but customized significantly |
| Inspired by | Similar concept but different implementation |
| Original | No significant matches found in open-source ecosystem |

🧠 Confidence Scoring

For each match:

| Level | Criteria |
| --- | --- |
| HIGH | Nearly identical structure, shared code, same API calls |
| MED | Similar architecture and components, different implementation |
| LOW | General conceptual similarity only |
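
To keep scoring consistent across matches, the criteria above can be written down as an explicit rule. The sketch below is one possible encoding, not a prescribed method: the `scoreMatch` function, its input fields, and the 0.9 / 0.5 thresholds are all assumptions chosen for illustration and should be tuned against real findings.

```javascript
// signals (all hypothetical field names):
//   sharedCode          - boolean, identical code fragments were found
//   sameApiCalls        - boolean, the same endpoints / upstream APIs are called
//   similarArchitecture - boolean, comparable components and layout
//   structureSimilarity - number in [0, 1], e.g. a response-shape similarity score
function scoreMatch({ sharedCode, sameApiCalls, similarArchitecture, structureSimilarity }) {
  // HIGH: nearly identical structure, shared code, same API calls
  if (sharedCode && sameApiCalls && structureSimilarity >= 0.9) return 'HIGH';
  // MED: similar architecture and components, different implementation
  if (similarArchitecture || structureSimilarity >= 0.5) return 'MED';
  // LOW: general conceptual similarity only
  return 'LOW';
}

console.log(scoreMatch({ sharedCode: true, sameApiCalls: true, similarArchitecture: true, structureSimilarity: 0.95 }));
console.log(scoreMatch({ sharedCode: false, sameApiCalls: false, similarArchitecture: true, structureSimilarity: 0.4 }));
console.log(scoreMatch({ sharedCode: false, sameApiCalls: false, similarArchitecture: false, structureSimilarity: 0.1 }));
```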

📊 Output

Produce:

List of matching repositories/tools with URLs
Description of similarities for each match
Likelihood of shared origin (HIGH / MED / LOW)
Identified tech stack components with evidence
Assessment: original work vs. assembled from existing tools

⚠️ Rules

Prioritize high-similarity matches — don't pad the report with weak matches
Ignore generic tools unless clearly relevant to the target
Focus on structural and functional overlap, not surface-level appearance
Remain evidence-based — link every claim to a specific observation

📊 OUTPUT REQUIREMENTS

Produce a structured final report covering all findings. Each section should be a standalone document in the reports/ directory.

1. Architecture Diagram (`reports/architecture.md`)

Text-based and/or Mermaid diagram
Shows all system components and data flows
Includes external dependencies and third-party services
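
A first draft of the Mermaid diagram can be generated from the captured URLs and then refined by hand. The sketch below assumes a flat list of URLs (for example the `url` fields in `captures/network.json`); `toMermaid` is a hypothetical helper, and it only shows browser-to-host edges, not server-side data flows, which still have to be inferred.

```javascript
// Build a Mermaid flowchart with one node per contacted host.
function toMermaid(targetHost, urls) {
  const counts = new Map();
  for (const u of urls) {
    let host;
    try { host = new URL(u).hostname; } catch { continue; } // skip malformed URLs
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  const lines = ['graph LR', '  browser["Browser"]'];
  let i = 0;
  for (const [host, n] of counts) {
    const id = `n${i++}`;
    const label = host === targetHost ? `${host} (target)` : host;
    lines.push(`  ${id}["${label}"]`);
    lines.push(`  browser -->|${n} req| ${id}`);
  }
  return lines.join('\n');
}

// Made-up URLs for illustration.
const diagramUrls = [
  'https://example.test/',
  'https://example.test/api/events',
  'https://cdn.example.net/lib.js',
  'not a url',
];
console.log(toMermaid('example.test', diagramUrls));
```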

2. API Map (`reports/api-map.md`)

Every discovered endpoint
Method, auth requirements, response type
Classification (public, hidden, dead)
Rate limit behavior
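
The method, status, and response-type columns of the API map can be pre-filled from `captures/network.json`. The sketch below assumes the entry shape written by `scripts/capture.js` (separate `request` and `response` entries); `buildApiMap` and the `rateLimitHint` field are hypothetical. Because response entries carry no method, it pairs each response with the most recent request to the same URL, which is an approximation. Auth requirements and the public/hidden/dead classification still need manual review.

```javascript
// Group responses by "METHOD origin+path" (query strings are dropped).
function buildApiMap(entries) {
  const lastMethod = new Map(); // url -> method of the latest request seen
  const rows = new Map();
  for (const e of entries) {
    if (e.type === 'request') { lastMethod.set(e.url, e.method); continue; }
    if (e.type !== 'response') continue;
    const { origin, pathname } = new URL(e.url);
    const method = lastMethod.get(e.url) || 'UNKNOWN';
    const key = `${method} ${origin}${pathname}`;
    const row = rows.get(key) || {
      method, endpoint: origin + pathname,
      statuses: new Set(), contentTypes: new Set(), rateLimitHint: false,
    };
    row.statuses.add(e.status);
    if (e.content_type) row.contentTypes.add(e.content_type.split(';')[0].trim());
    // 429 or a Retry-After header suggests throttling; treat it as a hint only.
    if (e.status === 429 || 'retry-after' in (e.headers || {})) row.rateLimitHint = true;
    rows.set(key, row);
  }
  return [...rows.values()].map(r => ({
    ...r, statuses: [...r.statuses], contentTypes: [...r.contentTypes],
  }));
}

// Made-up entries for illustration.
const apiEntries = [
  { type: 'request', url: 'https://example.test/api/events?page=1', method: 'GET' },
  { type: 'response', url: 'https://example.test/api/events?page=1', status: 200,
    headers: {}, content_type: 'application/json; charset=utf-8' },
  { type: 'request', url: 'https://example.test/api/events?page=2', method: 'GET' },
  { type: 'response', url: 'https://example.test/api/events?page=2', status: 429,
    headers: { 'retry-after': '30' }, content_type: 'application/json' },
];
const apiMap = buildApiMap(apiEntries);
console.log(JSON.stringify(apiMap, null, 2));
```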

3. Telegram Analysis (`reports/telegram-report.md`)

All Telegram endpoints called
Bot identity (token, username)
Payload contents and frequency
Purpose classification
Risk assessment specific to Telegram usage

4. Data Authenticity Report (`reports/data-authenticity.md`)

Classification of each data stream
Confidence levels with evidence
Cross-reference results against public sources
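
For geolocated event streams, cross-referencing can be sketched as matching each event shown by the target against a public reference feed within a time and distance tolerance. Everything below is an assumption for illustration: the `time` (epoch milliseconds), `lat`, and `lon` field names, the `crossReference` helper, and the default tolerances of 50 km and 10 minutes. Real feeds will need their own field mapping.

```javascript
// Great-circle distance in kilometres (haversine formula).
function haversineKm(a, b) {
  const R = 6371; // mean Earth radius in km
  const toRad = deg => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Mark each target event as corroborated if any reference event is close
// enough in both time and space.
function crossReference(targetEvents, referenceEvents, { maxKm = 50, maxMs = 10 * 60 * 1000 } = {}) {
  return targetEvents.map(t => {
    const match = referenceEvents.find(r =>
      Math.abs(r.time - t.time) <= maxMs && haversineKm(t, r) <= maxKm);
    return { event: t, corroborated: Boolean(match), match: match || null };
  });
}

// Invented events: the first has a nearby reference event, the second does not.
const shownEvents = [
  { time: 1700000000000, lat: 35.0, lon: 139.0 },
  { time: 1700000000000, lat: -10.0, lon: 20.0 },
];
const publicEvents = [{ time: 1700000060000, lat: 35.1, lon: 139.1 }];
console.log(crossReference(shownEvents, publicEvents).map(r => r.corroborated));
```

An uncorroborated event is not proof of fabrication, since the reference feed may lag or have coverage gaps, so the result is best recorded as evidence toward a confidence level rather than a verdict.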

5. Backend Fingerprint (`reports/backend-fingerprint.md`)

Confirmed technology stack
Hosting provider and infrastructure
Caching and performance characteristics

6. Security Risk Assessment (`reports/security-assessment.md`)

All findings with risk levels
Privacy compliance assessment
Visitor tracking inventory
Recommendations (if applicable)

7. Final Classification (`reports/final-classification.md`)

System type: demo / prototype / operational / surveillance / other
Confidence level with justification
Summary of all supporting evidence
Open questions or areas needing further investigation

🧠 AUTOMATION SCRIPT (Playwright + Proxy)

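Save this as `scripts/capture.js` and run it with `node scripts/capture.js`. The script routes the browser through the `PROXY` address defined below, so mitmproxy must already be listening there.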

const { chromium } = require('playwright');
const fs = require('fs');
const path = require('path');

const TARGET = 'https://worldviewosint.com/';
const PROXY = 'http://127.0.0.1:8080';
const CAPTURE_DIR = path.join(__dirname, '..', 'captures');
const LOG_DIR = path.join(__dirname, '..', 'logs');

// Ensure output directories exist
[CAPTURE_DIR, LOG_DIR].forEach(dir => {
  if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
});

(async () => {
  console.log('[*] Launching browser with proxy:', PROXY);

  const browser = await chromium.launch({
    headless: false,
    proxy: { server: PROXY }
  });

  const context = await browser.newContext({
    recordHar: { path: path.join(CAPTURE_DIR, 'traffic.har') },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  });

  const page = await context.newPage();
  const networkLogs = [];
  const consoleLogs = [];
  const telegramLogs = [];

  // Capture all network requests
  page.on('request', req => {
    const entry = {
      type: 'request',
      timestamp: Date.now(),
      url: req.url(),
      method: req.method(),
      headers: req.headers(),
      body: req.postData() || null,
      resource_type: req.resourceType()
    };
    networkLogs.push(entry);

    if (req.url().includes('api.telegram.org')) {
      telegramLogs.push({ ...entry, direction: 'outbound' });
      console.log('[!] TELEGRAM REQUEST:', req.method(), req.url());
    }
  });

  // Capture all network responses
  page.on('response', async res => {
    let body = null;
    try { body = await res.text(); } catch {}

    const entry = {
      type: 'response',
      timestamp: Date.now(),
      url: res.url(),
      status: res.status(),
      headers: res.headers(),
      body: body,
      content_type: res.headers()['content-type'] || null
    };
    networkLogs.push(entry);

    if (res.url().includes('api.telegram.org')) {
      telegramLogs.push({ ...entry, direction: 'inbound' });
      console.log('[!] TELEGRAM RESPONSE:', res.status(), res.url());
    }
  });

  // Capture browser console output
  page.on('console', msg => {
    consoleLogs.push({
      timestamp: Date.now(),
      type: msg.type(),
      text: msg.text(),
      location: msg.location()
    });
  });

  // Capture page errors
  page.on('pageerror', err => {
    consoleLogs.push({
      timestamp: Date.now(),
      type: 'error',
      text: err.message,
      stack: err.stack
    });
  });

  // --- Phase 1: Initial load ---
  console.log('[*] Loading target:', TARGET);
  await page.goto(TARGET, { waitUntil: 'networkidle' });
  console.log('[*] Page loaded. Waiting 30s for background activity...');
  await page.waitForTimeout(30000);

  // --- Phase 2: Scroll the page ---
  console.log('[*] Scrolling page...');
  await page.evaluate(() => {
    return new Promise(resolve => {
      let distance = 0;
      const step = 300;
      const interval = setInterval(() => {
        window.scrollBy(0, step);
        distance += step;
        if (distance >= document.body.scrollHeight) {
          clearInterval(interval);
          window.scrollTo(0, 0);
          resolve();
        }
      }, 200);
    });
  });
  await page.waitForTimeout(3000);

  // --- Phase 3: Click interactive elements ---
  console.log('[*] Clicking interactive elements...');
  const clickable = await page.$$('button, [role="button"], a[href], .clickable, [onclick]');
  for (const el of clickable.slice(0, 20)) {
    try {
      await el.click({ timeout: 2000 });
      await page.waitForTimeout(1500);
    } catch {}
  }

  // --- Phase 4: Wait for more background activity ---
  console.log('[*] Waiting 30s for additional background activity...');
  await page.waitForTimeout(30000);

  // --- Phase 5: Dump client-side storage ---
  console.log('[*] Dumping client-side storage...');
  const storage = await page.evaluate(() => {
    const ls = {};
    for (let i = 0; i < localStorage.length; i++) {
      const key = localStorage.key(i);
      ls[key] = localStorage.getItem(key);
    }
    const ss = {};
    for (let i = 0; i < sessionStorage.length; i++) {
      const key = sessionStorage.key(i);
      ss[key] = sessionStorage.getItem(key);
    }
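    return { localStorage: ls, sessionStorage: ss };
  });
  // document.cookie cannot see HttpOnly cookies, so read the full cookie jar
  // (including flags such as httpOnly, secure, sameSite) from the browser context.
  storage.cookies = await context.cookies();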

  // --- Save all outputs ---
  console.log('[*] Saving captures...');

  // Close context first to finalize HAR
  await context.close();

  fs.writeFileSync(
    path.join(CAPTURE_DIR, 'network.json'),
    JSON.stringify(networkLogs, null, 2)
  );
  fs.writeFileSync(
    path.join(LOG_DIR, 'console.json'),
    JSON.stringify(consoleLogs, null, 2)
  );
  fs.writeFileSync(
    path.join(LOG_DIR, 'telegram.json'),
    JSON.stringify(telegramLogs, null, 2)
  );
  fs.writeFileSync(
    path.join(LOG_DIR, 'cookies.json'),
    JSON.stringify(storage, null, 2)
  );

  await browser.close();

  // --- Summary ---
  console.log('\n[*] Capture complete.');
  console.log(`    Total network entries: ${networkLogs.length}`);
  console.log(`    Console entries: ${consoleLogs.length}`);
  console.log(`    Telegram entries: ${telegramLogs.length}`);
  console.log(`    Files saved to: ${CAPTURE_DIR} and ${LOG_DIR}`);

  if (telegramLogs.length > 0) {
    console.log('\n[!] WARNING: Telegram API traffic detected! Review logs/telegram.json immediately.');
  }
})();

🔧 QUICK REFERENCE — COMMAND CHEAT SHEET

# Start mitmproxy (run in a separate terminal)
mitmdump --save-stream-file captures/traffic_dump.mitm

# Run the capture script
node scripts/capture.js

# Extract unique endpoints from captured traffic
cat captures/network.json | jq -r '.[].url' | sort -u > logs/endpoints.txt

# Filter Telegram traffic only
cat captures/network.json | jq '[.[] | select(.url | contains("telegram"))]' > logs/telegram_filtered.json

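# Check for exposed source maps (query strings are stripped so "app.js?v=1" becomes "app.js.map")
cat captures/traffic.har | jq -r '.log.entries[].request.url' | sed 's/[?#].*$//' | grep '\.js$' | sort -u | while read -r url; do
  curl -s -o /dev/null -w "%{http_code} ${url}.map\n" "${url}.map"
done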

# WHOIS lookup
whois worldviewosint.com > recon/dns.txt

# TLS certificate check
echo | openssl s_client -connect worldviewosint.com:443 -servername worldviewosint.com 2>/dev/null | openssl x509 -noout -text > recon/tls.txt

# Replay an API endpoint
curl -s "https://worldviewosint.com/api/ENDPOINT" -H "User-Agent: Mozilla/5.0" | jq .

# Count requests per domain
cat captures/network.json | jq -r '.[].url' | awk -F/ '{print $3}' | sort | uniq -c | sort -rn

Legal & Ethical Disclaimer

This framework is provided for authorized security research and educational purposes only. Only test, scan, intercept, or analyze systems that you own or for which you have explicit written permission. Unauthorized reconnaissance or interception of systems you do not own may violate computer-misuse, wiretap, and other laws. You are solely responsible for ensuring your use complies with all applicable laws and regulations. The author provides this software as-is, with no warranty, and accepts no liability for any misuse or damage.
