Welcome to EVERYTHING, a sophisticated, lightweight, and entirely open-source search architecture. Unlike black-box commercial engines, EVERYTHING prioritizes transparency by utilizing foundational information retrieval algorithms (specifically BM25 relevance scoring and PageRank authority analysis) to deliver precise results without the overhead of data tracking or invasive telemetry.
Experience the engine live at EVERYTHING.
- Architectural Efficiency: Engineered for high-velocity performance, ensuring a "fast and furious" user experience.
- Privacy Preserving: A zero-data-retention policy ensures maximum security and user anonymity.
- Academic Transparency: Fully open-sourced to empower developers and researchers exploring information retrieval.
- Collaborative Growth: Supported by a robust and expanding community of open-source enthusiasts.
- Authentic Web Exploration: An automated crawler that traverses the live web starting from curated seed URLs.
- Extensive Domain Indexing: Currently tracking 357+ unique domains and their complex inter-link relationships.
- Modern Relevance Ranking: Implementation of the Okapi BM25 algorithm, the industry standard for full-text search.
- Authority-Based Ranking: Integration of the PageRank algorithm to evaluate page importance based on link topology.
- Synergistic Hybrid Model: A refined ranking formula that synthesizes 80% content relevance with 20% link authority.
- Serendipitous Discovery: A "Click Me!" feature that generates a random query for exploring the index.
BM25 is an improved TF-IDF algorithm that:
- Tokenizes queries and removes stop words
- Calculates document frequency and term frequency
- Normalizes by document length to prevent bias
- Uses saturation parameters (k1=1.5, b=0.75) for natural ranking
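The steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stop-word list, the function names `tokenize` and `bm25Scores`, and the exact IDF variant are assumptions, since the source only writes "IDF" without defining it.

```javascript
// Hypothetical stop-word list; the project's real list is not shown in this README.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "is", "for"]);

// Lowercase the text, split on non-alphanumeric characters, drop stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// Return one BM25 score per document for the given query.
function bm25Scores(query, docs, k1 = 1.5, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  // Average document length in tokens; fall back to 1 to avoid dividing by zero.
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / N || 1;
  const queryTerms = [...new Set(tokenize(query))];

  // Document frequency: how many documents contain each query term.
  const df = new Map();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((d) => d.includes(term)).length);
  }

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const f = doc.filter((t) => t === term).length; // term frequency in this doc
      if (f === 0) continue;
      const n = df.get(term);
      // Non-negative IDF variant (an assumption, see the note above).
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + b * (doc.length / avgLen)));
    }
    return score;
  });
}
```

Because of the saturation parameter `k1`, a document that repeats a term four times scores higher than one that mentions it once, but not four times higher.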
score = IDF * (f_q * (k1 + 1)) / (f_q + k1 * (1 - b + b * (docLen / avgLen)))
Ranks pages by link authority:
- Models the web as a directed graph
- Runs a fixed 20 iterations to approximate convergence
- Uses damping factor (d=0.85) for random surfer model
- Distributes rank through outgoing links
- Handles sink pages (dead ends)
PR(A) = (1-d)/N + d * Σ(PR(T)/C(T))
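As a rough sketch of the formula above, the iteration might look like the following. The function name `pageRank` and the graph shape (an object mapping each page to the pages it links to) are assumptions for illustration. The README does not say how sink pages are handled; redistributing their rank evenly across all pages is one common choice and is what this sketch assumes.

```javascript
// graph: { page: [pages it links to] }. Returns { page: rank }.
function pageRank(graph, d = 0.85, iterations = 20) {
  // Collect every node, including pages that only appear as link targets.
  const nodes = new Set(Object.keys(graph));
  for (const targets of Object.values(graph)) {
    for (const t of targets) nodes.add(t);
  }
  const N = nodes.size;
  if (N === 0) return {};

  // Start from a uniform distribution.
  let rank = {};
  for (const n of nodes) rank[n] = 1 / N;

  for (let i = 0; i < iterations; i++) {
    // Every page starts each round with the random-jump share (1 - d) / N.
    const next = {};
    for (const n of nodes) next[n] = (1 - d) / N;

    let sinkMass = 0;
    for (const n of nodes) {
      const out = [...new Set(graph[n] || [])]; // de-duplicate outgoing links
      if (out.length === 0) {
        sinkMass += rank[n]; // dead end: collect its rank for redistribution
        continue;
      }
      const share = rank[n] / out.length; // PR(T) / C(T)
      for (const t of out) next[t] += d * share;
    }

    // Assumed sink handling: spread the collected rank evenly over all pages.
    for (const n of nodes) next[n] += (d * sinkMass) / N;
    rank = next;
  }
  return rank;
}
```

With this sink handling the ranks keep summing to 1 after every iteration, which makes them easier to normalize later.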
Combines both signals:
- 80% BM25 Score (Content Relevance)
- 20% PageRank Score (Link Authority)
- Normalizes both to 0-1 scale for fair weighting
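A minimal sketch of this combination step is shown below. The README does not say which normalization is used, so min-max scaling is an assumption here, as are the function names `normalize` and `combineScores` and the `bm25` / `pageRank` field names on each page object.

```javascript
// Min-max normalize a list of numbers to [0, 1].
// If every value is identical, all entries map to 0.
function normalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0);
  return values.map((v) => (v - min) / (max - min));
}

// pages: [{ domain, bm25, pageRank }] (hypothetical shape).
// Returns a new array sorted by combined score, highest first.
function combineScores(pages, wContent = 0.8, wAuthority = 0.2) {
  const normBM25 = normalize(pages.map((p) => p.bm25));
  const normPR = normalize(pages.map((p) => p.pageRank));
  return pages
    .map((p, i) => ({
      ...p,
      combinedScore: normBM25[i] * wContent + normPR[i] * wAuthority,
    }))
    .sort((x, y) => y.combinedScore - x.combinedScore);
}
```

Normalizing first matters because raw BM25 scores and raw PageRank values live on very different scales; without it, the 80/20 weights would not mean what they appear to mean.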
The crawler automatically discovers and indexes websites:
node crawler.js
What it does:
- Starts from seed URLs (GitHub, Stack Overflow, Wikipedia, etc.)
- Extracts links from each page using Cheerio
- Crawls up to 500 pages with concurrency limit of 15
- Extracts page metadata (title, description)
- Builds link graph relationships
- Saves to data/crawled-web.json
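The crawl loop described above might be sketched as follows. To keep the example dependency-free, a regular expression stands in for Cheerio and the page fetcher is an injected function (`fetchPage`, a hypothetical name) rather than Axios, so this is an illustration of the idea, not the contents of crawler.js. Metadata extraction and writing the JSON file are omitted.

```javascript
// Pull absolute http(s) links out of raw HTML. The regex is a stand-in for
// Cheerio and is not a robust HTML parser.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      const url = new URL(match[1], baseUrl); // resolves relative links
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = ""; // ignore fragments so /x and /x#top count as one page
        links.add(url.href);
      }
    } catch {
      // Skip malformed URLs.
    }
  }
  return [...links];
}

// Breadth-first crawl in batches of at most `concurrency` pages.
// fetchPage(url) is assumed to resolve to an HTML string.
async function crawl(seeds, fetchPage, { maxPages = 500, concurrency = 15 } = {}) {
  const queue = [...seeds];
  const seen = new Set(seeds);
  const graph = {}; // url -> list of outgoing links
  let fetched = 0;

  while (queue.length > 0 && fetched < maxPages) {
    const batch = queue.splice(0, Math.min(concurrency, maxPages - fetched));
    fetched += batch.length;

    const results = await Promise.all(
      batch.map(async (url) => {
        try {
          return { url, links: extractLinks(await fetchPage(url), url) };
        } catch {
          return { url, links: [] }; // a failed fetch is recorded with no links
        }
      })
    );

    for (const { url, links } of results) {
      graph[url] = links;
      for (const link of links) {
        if (!seen.has(link)) {
          seen.add(link);
          queue.push(link);
        }
      }
    }
  }
  return graph;
}
```

Injecting `fetchPage` keeps the traversal logic testable against an in-memory set of pages; a real run would pass a function that performs an HTTP GET and returns the response body.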
Output:
- 357 unique domains
- 960+ link relationships
- Full page metadata
- ~116KB JSON file
- Frontend: Vanilla JavaScript (no dependencies)
- Crawler: Node.js + Axios + Cheerio
- Algorithms: BM25, PageRank from scratch
- Data: JSON-based index
- Hosting: GitHub Pages
- Type a query in the search box
- Click the search button
- View ranked results with:
  - Page title and link
  - Description
  - Relevant tags
Click the "Click Me!" button to:
- Generate a random search query
- Automatically search
- Discover new pages across the index
- Help (struct/help.html): Usage guide
- Stats (struct/stats.html): Search statistics
- Updates (struct/updates.html): Project updates
Index Stats:
- Domains: 2,500
- Total Links: 1000+
- Index Size: ~373 KB
- Search Time: ~15ms (with BM25 + PageRank)
Ranking Weights:
- Content Relevance (BM25): 80%
- Link Authority (PageRank): 20%
Edit search weights in script/main.js:
// Adjust ranking formula (currently 80/20)
combinedScore: (normBM25 * 0.8) + (normPR * 0.2)
PageRank parameters:
const d = 0.85; // Damping factor
const iterations = 20; // Convergence iterations
BM25 parameters:
const k1 = 1.5; // Saturation parameter
const b = 0.75; // Length normalization
search(query) → Array<Result>
Returns ranked results sorted by combined score.
{
domain: "github.com",
title: "GitHub",
description: "Where the world builds software",
tags: ["code", "development", ...],
content: "full text content",
combinedScore: 0.87,
relevance: 0.92,
pageRankScore: 0.65
}
loadWebData() // Loads crawled-web.json
loadDemoData() // Fallback demo data
calculateBM25(query) // BM25 scoring
calculatePageRank() // PageRank calculation
npm install axios cheerio
node crawler.js
npx http-server
Then visit http://localhost:8080
This project implements real search engine algorithms:
- BM25: Okapi BM25 - industry standard relevance ranking
- PageRank: Google's original link-based ranking algorithm
- Web Crawling: DOM parsing and graph building
Great for learning:
- How search engines work
- Information retrieval concepts
- Web crawling techniques
- Graph algorithms
MIT License. Feel free to use, clone, and modify!
- Query expansion and synonyms
- Personalization based on history
- Advanced filtering (date, domain, type)
- Search analytics and trending
- Multi-language support
- Snippet generation
- Real-time index updates
- Distributed crawling
This README.md was written by AI due to time limitations.