EVERYTHING: A Transparent, Algorithmic Web Discovery Engine

Welcome to EVERYTHING, a sophisticated, lightweight, and entirely open-source search architecture. Unlike black-box commercial engines, EVERYTHING prioritizes transparency by utilizing foundational information retrieval algorithms—specifically BM25 relevance scoring and PageRank authority analysis—to deliver precise results without the overhead of data tracking or invasive telemetry.

Experience the engine live at EVERYTHING.

🎯 Strategic Objectives

Architectural Efficiency: Engineered for high-velocity performance, ensuring a "fast and furious" user experience.
Privacy Preserving: A zero-data-retention policy ensures maximum security and user anonymity.
Academic Transparency: Fully open-sourced to empower developers and researchers exploring information retrieval.
Collaborative Growth: Supported by a robust and expanding community of open-source enthusiasts.

🚀 Technical Features

Authentic Web Exploration: An automated crawler that traverses the live web starting from curated seed URLs.
Extensive Domain Indexing: Currently tracking 357+ unique domains and their complex inter-link relationships.
Modern Relevance Ranking: Implementation of the Okapi BM25 algorithm, the industry standard for full-text search.
Authority-Based Ranking: Integration of the PageRank algorithm to evaluate page importance based on link topology.
Synergistic Hybrid Model: A refined ranking formula that synthesizes 80% content relevance with 20% link authority.
Serendipitous Discovery: A "Click Me!" feature for random query generation and serendipitous web exploration.

For DEVS:

🔍 How It Works

1. BM25 Relevance Scoring

Improved TF-IDF algorithm that:

Tokenizes queries and removes stop words
Calculates document frequency and term frequency
Normalizes by document length to prevent bias
Uses saturation parameters (k1=1.5, b=0.75) for natural ranking

score = IDF * (f_q * (k1 + 1)) / (f_q + k1 * (1 - b + b * (docLen / avgLen)))

2. PageRank Algorithm

Ranks pages by link authority:

Models the web as a directed graph
Iterates 20 times to convergence
Uses damping factor (d=0.85) for random surfer model
Distributes rank through outgoing links
Handles sink pages (dead ends)

PR(A) = (1-d)/N + d * Σ(PR(T)/C(T))

3. Hybrid Ranking

Combines both signals:

80% BM25 Score (Content Relevance)
20% PageRank Score (Link Authority)
Normalizes both to 0-1 scale for fair weighting

🕷️ Web Crawler

The crawler automatically discovers and indexes websites:

node crawler.js

What it does:

Starts from seed URLs (GitHub, Stack Overflow, Wikipedia, etc.)
Extracts links from each page using Cheerio
Crawls up to 500 pages with concurrency limit of 15
Extracts page metadata (title, description)
Builds link graph relationships
Saves to data/crawled-web.json

Output:

357 unique domains
960+ link relationships
Full page metadata
~116KB JSON file

💻 Technology Stack

Frontend: Vanilla JavaScript (no dependencies)
Crawler: Node.js + Axios + Cheerio
Algorithms: BM25, PageRank from scratch
Data: JSON-based index
Hosting: GitHub Pages

🎮 Usage

Search

Type a query in the search box
Click the search button (🔍)
View ranked results with:
- Page title and link
- Description
- Relevant tags

Random Discovery

Click the "Click Me!" button to:

Generate a random search query
Automatically search
Discover new pages across the index

Available Pages

Help (struct/help.html) - Usage guide
Stats (struct/stats.html) - Search statistics
Updates (struct/updates.html) - Project updates

📊 Performance

Index Stats:

Domains: 2,500
Total Links: 1000+
Index Size: ~373 KB
Search Time: ~15ms (with BM25 + PageRank)

Ranking Weights:

Content Relevance (BM25): 80%
Link Authority (PageRank): 20%

🔧 Configuration

Edit search weights in script/main.js:

// Adjust ranking formula (currently 80/20)
combinedScore: (normBM25 * 0.8) + (normPR * 0.2)

PageRank parameters:

const d = 0.85;           // Damping factor
const iterations = 20;    // Convergence iterations

BM25 parameters:

const k1 = 1.5;           // Saturation parameter
const b = 0.75;           // Length normalization

📝 API Reference

Search Function

search(query) → Array<Result>

Returns ranked results sorted by combined score.

Result Object

{
  domain: "github.com",
  title: "GitHub",
  description: "Where the world builds software",
  tags: ["code", "development", ...],
  content: "full text content",
  combinedScore: 0.87,
  relevance: 0.92,
  pageRankScore: 0.65
}

Data Loading

loadWebData()          // Loads crawled-web.json
loadDemoData()         // Fallback demo data
calculateBM25(query)   // BM25 scoring
calculatePageRank()    // PageRank calculation

🚀 Development

Install Dependencies (for crawler)

npm install axios cheerio

Run Crawler

node crawler.js

Serve Locally

npx http-server

Then visit http://localhost:8080

📚 Learning Resources

This project implements real search engine algorithms:

BM25: Okapi BM25 - industry standard relevance ranking
PageRank: Google's original link-based ranking algorithm
Web Crawling: DOM parsing and graph building

Great for learning:

How search engines work
Information retrieval concepts
Web crawling techniques
Graph algorithms

🔗 Links

📄 License

MIT -- feel free to use, clone, pull and modify!

🎯 Future Enhancements

Disclamer

The README.md was made by Ai due to time limatations

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
data		data
node_modules		node_modules
script		script
struct		struct
style		style
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
crawler.js		crawler.js
edges.ndjson		edges.ndjson
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
settings.json		settings.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EVERYTHING: A Transparent, Algorithmic Web Discovery Engine

🎯 Strategic Objectives

🚀 Technical Features

For DEVS:

🔍 How It Works

1. BM25 Relevance Scoring

2. PageRank Algorithm

3. Hybrid Ranking

🕷️ Web Crawler

💻 Technology Stack

🎮 Usage

Search

Random Discovery

Available Pages

📊 Performance

🔧 Configuration

📝 API Reference

Search Function

Result Object

Data Loading

🚀 Development

Install Dependencies (for crawler)

Run Crawler

Serve Locally

📚 Learning Resources

🔗 Links

📄 License

🎯 Future Enhancements

Disclamer

About

Uh oh!

Releases

Packages

Languages

License

JustaNormalComputer-Guy/JustaNormalComputer-Guy.github.io

Folders and files

Latest commit

History

Repository files navigation

EVERYTHING: A Transparent, Algorithmic Web Discovery Engine

🎯 Strategic Objectives

🚀 Technical Features

For DEVS:

🔍 How It Works

1. BM25 Relevance Scoring

2. PageRank Algorithm

3. Hybrid Ranking

🕷️ Web Crawler

💻 Technology Stack

🎮 Usage

Search

Random Discovery

Available Pages

📊 Performance

🔧 Configuration

📝 API Reference

Search Function

Result Object

Data Loading

🚀 Development

Install Dependencies (for crawler)

Run Crawler

Serve Locally

📚 Learning Resources

🔗 Links

📄 License

🎯 Future Enhancements

Disclamer

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages