Project Requirements Document (PRD): Scrybe – Browser Behavior Intelligence System
Vision Statement Scrybe is a high-fidelity, Rust-powered browser observation system designed to detect and understand automation with forensic granularity. It is equal parts data collector, behavior profiler, and session fingerprint historian—engineered to act as a sophisticated anti-bot detection engine and training ground for resilient bot defenses. More than a passive observer, Scrybe is a vigilant system that watches browsers with contextual memory and scientific rigor. Its mission is not just to block bots—it’s to understand them, adapt to them, and learn from every interaction.
Mascot & Persona: Scrybe
Name: Scrybe
Species: Autonomous Rust Intelligence
Personality: Scholarly, curious, and unflinchingly meticulous. Scrybe documents all who visit its domain—not to judge, but to remember. Every movement, header, and anomaly becomes a piece of a broader behavioral mosaic. Humans find Scrybe charming. Bots find it uncanny.
Greeting: “Welcome, traveler. I am Scrybe. You have just gifted me a fingerprint. My task is to remember it, enrich it, and test its truth.”
Techniques & Approach (Embedded from Market Leaders & Research) Scrybe draws its strategic depth from the world-class practices observed in Cloudflare, HUMAN Security, and academic literature on bot mitigation:
Multi-layer fingerprinting: Inspired by JA3/JA4 and HTTP2/3 signature strategies, Scrybe captures signals from TLS, ALPN, HTTP version/frame order, and header permutations.
JS surface entropy: Scrybe uses browser-level entropy tests similar to those seen in Cloudflare’s canvas/audio/DOM fingerprinting.
Behavioral modeling: Every interaction is timestamped and charted—keystroke rhythm, scroll deltas, cursor entropy—feeding into a behavioral signature graph.
Per-client anomaly detection: Like Cloudflare’s per-zone ML models, Scrybe learns behavioral baselines per session ID and flags deviation vectors.
Cryptographic identity layering: A future module supports signed client tokens and authenticated automation (like Cloudflare's Verified Bots protocol).
Privacy-aware collection: Scrybe uses salted hashes and does not collect PII or input field data. Privacy and transparency are native to its architecture.
Core Functional Requirements
Session Initialization & Identity Tracking
On first contact, Scrybe assigns a deterministic UUID tied to a fingerprint hash.
Subsequent sessions are associated via cookie/localStorage + passive fingerprint correlation.
Multi-Layer Signal Harvesting
Network/Transport Layer:
TLS JA3/JA4, cipher suite, ALPN, TCP options.
IP, ASN, reverse DNS, proxy/VPN suspicion levels.
HTTP Layer:
Header ordering, Accept string divergence, version negotiation.
Cache behavior, compression negotiation quirks.
Browser & JS Environment:
DOM features, timezone/language drift, font fingerprint.
Canvas/WebGL/audio hash, JS function hooks (e.g. navigator.webdriver).
Behavioral Telemetry:
Scroll cadence, pointer velocity, click density maps.
Visibility toggling, refocus/blur jitter, idle/resume curves.
Data Enrichment Pipeline
Resolve IPs to geo/ASN.
Compute composite device hash (with pluggable fingerprint engine).
Apply fingerprint similarity scoring and clustering (e.g., MinHash or cosine distance).
Model Training + Threat Intelligence Hooks
Feed session data into behavioral clustering models (HDBSCAN, Isolation Forests).
Optional integration into supervised models trained on known bot corpora.
Exportable anomalies (e.g., timestamp gaps, velocity spikes, repeat fingerprint collisions).
Data Infrastructure
Primary: Clickhouse for immutable session telemetry.
Session cache: Redis or SQLite for fast local correlation.
Optional: Kafka pipeline for real-time session stream enrichment.
Dashboard & Analyst Interface
React-based dashboard to visualize session replays, behavior trees, entropy maps.
Filter and compare across sessions by entropy score, origin, anomaly type.
Security & Ethics
Opt-out header respected (Do Not Track or specific header flag).
Fingerprinting limited to non-invasive surfaces; no DOM scraping or field logging.
Logs are rotated, and hash salts refreshed periodically.
Architectural Blueprint (System View)
+------------+ +---------------+ +------------------+ +---------------+ | Browser | --> | Ingestion | --> | Enrichment & ML | --> | Clickhouse | | (JS SDK) | | Gateway/API | | Fingerprinting | | Storage | +------------+ +---------------+ +------------------+ +---------------+ | | v v +----------------+ +----------------+ | Session Cache | | Analyst UI/API | | (Redis/LS) | | Dashboard | +----------------+ +----------------+
Naming Justification Scrybe evokes both “scribe” (recorder of truth) and “scrying” (the act of divining hidden meaning). It reinforces the system’s goal: to record, analyze, and interpret browser identity and intent in real time.
Let me know if you'd like this PRD split into RFCs—for JS collection agent, Rust core ingestion service, or the enrichment ML layer. We can architect each module like a command center inside Scrybe's mind.