
Everything is an ultra-fast, lightweight, open-source search engine designed for instant results, often returning matches in about 0.025 seconds (this varies with your internet connection, up to a maximum of 0.25 seconds).


JustaNormalComputer-Guy/JustaNormalComputer-Guy.github.io


EVERYTHING: A Transparent, Algorithmic Web Discovery Engine

Welcome to EVERYTHING, a lightweight and fully open-source search architecture. Unlike black-box commercial engines, EVERYTHING prioritizes transparency by using foundational information retrieval algorithms (specifically BM25 relevance scoring and PageRank authority analysis) to deliver precise results without data tracking or invasive telemetry.

Experience the engine live at EVERYTHING.

🎯 Strategic Objectives

  • Architectural Efficiency: Engineered for high-velocity performance, ensuring a "fast and furious" user experience.
  • Privacy-Preserving: A zero-data-retention policy maximizes security and user anonymity.
  • Academic Transparency: Fully open-sourced to empower developers and researchers exploring information retrieval.
  • Collaborative Growth: Supported by a robust and expanding community of open-source enthusiasts.

🚀 Technical Features

  • Authentic Web Exploration: An automated crawler that traverses the live web starting from curated seed URLs.
  • Extensive Domain Indexing: Currently tracking 357+ unique domains and their complex inter-link relationships.
  • Modern Relevance Ranking: Implementation of the Okapi BM25 algorithm, the industry standard for full-text search.
  • Authority-Based Ranking: Integration of the PageRank algorithm to evaluate page importance based on link topology.
  • Hybrid Ranking Model: A ranking formula that combines 80% content relevance with 20% link authority.
  • Random Discovery: A "Click Me!" feature that generates a random query for serendipitous web exploration.

For DEVS:

๐Ÿ” How It Works

1. BM25 Relevance Scoring

An improved variant of TF-IDF that:

  • Tokenizes queries and removes stop words
  • Calculates document frequency and term frequency
  • Normalizes by document length to prevent bias
  • Uses saturation parameters (k1=1.5, b=0.75) for natural ranking
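score = Σ over query terms q of IDF(q) * (f_q * (k1 + 1)) / (f_q + k1 * (1 - b + b * (docLen / avgLen)))
where f_q is the frequency of q in the document, docLen is the document's length, and avgLen is the average document length across the index.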
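The per-term formula, summed over the query terms, can be sketched in plain JavaScript. This is a minimal illustration, not the code in `script/main.js`: it assumes documents are plain strings, uses a simple lowercase tokenizer with a small hypothetical stop-word list, and picks the Lucene-style IDF `ln(1 + (N - n + 0.5) / (n + 0.5))`, which may differ from the IDF variant the project uses.

```javascript
// Hypothetical stop-word list; the project's real list may differ.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "is"]);

// Lowercase, split on non-alphanumeric characters, and drop stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// Score every document against the query with Okapi BM25 (k1 = 1.5, b = 0.75).
function bm25Scores(query, docs, k1 = 1.5, b = 0.75) {
  const docTokens = docs.map(tokenize);
  const N = docTokens.length;
  // Average document length; falls back to 1 if the index is empty or all docs are empty.
  const avgLen = docTokens.reduce((sum, t) => sum + t.length, 0) / N || 1;
  const queryTerms = [...new Set(tokenize(query))];

  // Document frequency: how many documents contain each query term.
  const df = new Map();
  for (const term of queryTerms) {
    df.set(term, docTokens.filter((tokens) => tokens.includes(term)).length);
  }

  return docTokens.map((tokens) => {
    const docLen = tokens.length;
    let score = 0;
    for (const term of queryTerms) {
      const fq = tokens.filter((t) => t === term).length; // term frequency f_q
      if (fq === 0) continue;
      const n = df.get(term);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5)); // assumed IDF variant
      // Saturating term frequency, normalized by document length.
      score += idf * (fq * (k1 + 1)) / (fq + k1 * (1 - b + b * (docLen / avgLen)));
    }
    return score;
  });
}
```

For example, `bm25Scores("open source search", pages.map((p) => p.content))` would score each page's `content` field; with equal term frequency, a shorter document scores higher than a longer one because of the `b` length normalization.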

2. PageRank Algorithm

Ranks pages by link authority:

  • Models the web as a directed graph
  • Runs a fixed 20 iterations to approximate convergence
  • Uses damping factor (d=0.85) for random surfer model
  • Distributes rank through outgoing links
  • Handles sink pages (dead ends)
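PR(A) = (1-d)/N + d * Σ(PR(T)/C(T))
where N is the number of pages, the sum runs over every page T that links to A, and C(T) is the number of outgoing links on T.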
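A minimal JavaScript sketch of this iteration, with d = 0.85 and 20 iterations as above. The README says sink pages are handled but not how; this sketch assumes the common approach of spreading a sink page's rank evenly across all pages, and it also assumes links pointing outside the crawled set are ignored. The `links` object shape (page URL mapped to an array of linked URLs) is hypothetical.

```javascript
// links: object mapping each page URL to the array of URLs it links to.
function pageRank(links, d = 0.85, iterations = 20) {
  const pages = Object.keys(links);
  const N = pages.length;
  const pageSet = new Set(pages);
  // Ignore links that point outside the crawled set (an assumption).
  const outLinks = Object.fromEntries(
    pages.map((p) => [p, links[p].filter((t) => pageSet.has(t))])
  );
  // Start from a uniform distribution.
  let rank = Object.fromEntries(pages.map((p) => [p, 1 / N]));

  for (let i = 0; i < iterations; i++) {
    // Sink pages (no outgoing links) share their rank evenly with every page.
    let sinkRank = 0;
    for (const p of pages) {
      if (outLinks[p].length === 0) sinkRank += rank[p];
    }
    // Every page starts with the (1-d)/N term plus its share of sink rank.
    const next = Object.fromEntries(
      pages.map((p) => [p, (1 - d) / N + (d * sinkRank) / N])
    );
    // Each page T passes d * PR(T) / C(T) to every page it links to.
    for (const p of pages) {
      const out = outLinks[p];
      for (const target of out) {
        next[target] += (d * rank[p]) / out.length;
      }
    }
    rank = next;
  }
  return rank;
}
```

Redistributing sink rank keeps the scores summing to 1 on every iteration; without it, rank would leak out of the graph at each dead end.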

3. Hybrid Ranking

Combines both signals:

  • 80% BM25 Score (Content Relevance)
  • 20% PageRank Score (Link Authority)
  • Normalizes both to 0-1 scale for fair weighting
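The blend can be sketched in a few lines of JavaScript. The README does not specify the normalization; this sketch assumes each score list is divided by its maximum, which keeps zero scores at zero. `normalize` and `hybridScores` are hypothetical names; the 0.8 / 0.2 weights match the formula shown under Configuration.

```javascript
// Scale non-negative scores to the 0-1 range by dividing by the maximum (assumed method).
function normalize(scores) {
  const max = Math.max(...scores, 0);
  return scores.map((s) => (max > 0 ? s / max : 0));
}

// Blend content relevance (BM25) and link authority (PageRank) with 80/20 weights.
function hybridScores(bm25, pageRank, wBM25 = 0.8, wPR = 0.2) {
  const normBM25 = normalize(bm25);
  const normPR = normalize(pageRank);
  return normBM25.map((s, i) => s * wBM25 + normPR[i] * wPR);
}
```

Note that a page with a BM25 score of 0 can still receive up to 0.2 from PageRank alone, so an implementation would likely rank only pages that actually match the query.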

๐Ÿ•ท๏ธ Web Crawler

The crawler automatically discovers and indexes websites:

node crawler.js

What it does:

  • Starts from seed URLs (GitHub, Stack Overflow, Wikipedia, etc.)
  • Extracts links from each page using Cheerio
  • Crawls up to 500 pages with a concurrency limit of 15
  • Extracts page metadata (title, description)
  • Builds link graph relationships
  • Saves to data/crawled-web.json
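The crawl loop can be sketched as a breadth-first traversal that fetches at most 15 pages at a time and stops after 500 fetch attempts. To keep the sketch self-contained, downloading and parsing are delegated to a hypothetical `fetchPage(url)` function returning `{ title, description, links }`; in the real `crawler.js` that role is played by Axios and Cheerio, and the actual batching strategy, domain de-duplication, and output format may differ.

```javascript
// Breadth-first crawl with a page cap and a concurrency limit.
// fetchPage(url) is hypothetical: an async function resolving to
// { title, description, links: [absoluteUrl, ...] }.
async function crawl(seeds, fetchPage, maxPages = 500, concurrency = 15) {
  const queue = [...new Set(seeds)];
  const seen = new Set(queue);
  const pages = {}; // url -> { title, description, links } (the link graph)
  let started = 0;

  while (queue.length > 0 && started < maxPages) {
    // Take at most `concurrency` URLs without exceeding maxPages fetch attempts.
    const batch = queue.splice(0, Math.min(concurrency, maxPages - started));
    started += batch.length;

    const results = await Promise.allSettled(batch.map((url) => fetchPage(url)));
    results.forEach((result, i) => {
      if (result.status !== "fulfilled") return; // skip pages that failed to load
      const { title, description, links } = result.value;
      pages[batch[i]] = { title, description, links };
      // Queue every newly discovered link exactly once.
      for (const link of links) {
        if (!seen.has(link)) {
          seen.add(link);
          queue.push(link);
        }
      }
    });
  }
  return pages;
}
```

The resulting `pages` object could then be written with `JSON.stringify` to a file such as `data/crawled-web.json`; processing in fixed-size batches is the simplest way to cap concurrency, at the cost of waiting for the slowest request in each batch.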

Output:

  • 357 unique domains
  • 960+ link relationships
  • Full page metadata
  • ~116 KB JSON file

💻 Technology Stack

  • Frontend: Vanilla JavaScript (no dependencies)
  • Crawler: Node.js + Axios + Cheerio
  • Algorithms: BM25 and PageRank, implemented from scratch
  • Data: JSON-based index
  • Hosting: GitHub Pages

🎮 Usage

Search

  1. Type a query in the search box
  2. Click the search button (🔍)
  3. View ranked results with:
    • Page title and link
    • Description
    • Relevant tags

Random Discovery

Click the "Click Me!" button to:

  • Generate a random search query
  • Automatically search
  • Discover new pages across the index
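The README does not say how the random query is chosen. One plausible approach, shown here as an assumption rather than the project's method, is to sample a tag from the indexed pages (the Result Object in the API Reference carries a `tags` array). `randomQuery` and its injectable `rng` parameter are hypothetical.

```javascript
// Pick a random tag from the indexed pages to use as a search query.
// rng defaults to Math.random but can be replaced for deterministic tests.
function randomQuery(pages, rng = Math.random) {
  // Collect unique tags; pages without a tags array contribute nothing.
  const tags = [...new Set(pages.flatMap((p) => p.tags ?? []))];
  if (tags.length === 0) return "";
  return tags[Math.floor(rng() * tags.length)];
}
```

De-duplicating tags first gives each distinct topic an equal chance, rather than favoring tags that appear on many pages.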

Available Pages

  • Help (struct/help.html) - Usage guide
  • Stats (struct/stats.html) - Search statistics
  • Updates (struct/updates.html) - Project updates

📊 Performance

Index Stats:

  • Domains: 2,500
  • Total Links: 1000+
  • Index Size: ~373 KB
  • Search Time: ~15ms (with BM25 + PageRank)

Ranking Weights:

  • Content Relevance (BM25): 80%
  • Link Authority (PageRank): 20%

🔧 Configuration

Edit search weights in script/main.js:

// Adjust ranking formula (currently 80/20)
combinedScore: (normBM25 * 0.8) + (normPR * 0.2)

PageRank parameters:

const d = 0.85;           // Damping factor
const iterations = 20;    // Convergence iterations

BM25 parameters:

const k1 = 1.5;           // Saturation parameter
const b = 0.75;           // Length normalization

๐Ÿ“ API Reference

Search Function

search(query) → Array<Result>

Returns ranked results sorted by combined score.

Result Object

{
  domain: "github.com",
  title: "GitHub",
  description: "Where the world builds software",
  tags: ["code", "development", ...],
  content: "full text content",
  combinedScore: 0.87,
  relevance: 0.92,
  pageRankScore: 0.65
}

Data Loading

loadWebData()          // Loads crawled-web.json
loadDemoData()         // Fallback demo data
calculateBM25(query)   // BM25 scoring
calculatePageRank()    // PageRank calculation
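`loadWebData()` and `loadDemoData()` are listed without their bodies. The load-with-fallback pattern they describe might look roughly like this sketch; `loadIndex` and its `fetchFn` and `demoData` parameters are hypothetical, and the `data/crawled-web.json` path is taken from the crawler section.

```javascript
// Load the crawled index, falling back to demo data if the request fails.
// fetchFn defaults to the global fetch (browsers, Node 18+) and can be swapped in tests.
async function loadIndex(demoData, fetchFn = globalThis.fetch) {
  try {
    const res = await fetchFn("data/crawled-web.json");
    // fetch only rejects on network errors, so HTTP errors must be checked explicitly.
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } catch (err) {
    console.warn("Falling back to demo data:", err);
    return demoData;
  }
}
```

Checking `res.ok` matters because a missing JSON file on GitHub Pages returns a 404 response rather than a rejected promise.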

🚀 Development

Install Dependencies (for crawler)

npm install axios cheerio

Run Crawler

node crawler.js

Serve Locally

npx http-server

Then visit http://localhost:8080

📚 Learning Resources

This project implements real search engine algorithms:

  • BM25: Okapi BM25 - industry standard relevance ranking
  • PageRank: Google's original link-based ranking algorithm
  • Web Crawling: DOM parsing and graph building

Great for learning:

  • How search engines work
  • Information retrieval concepts
  • Web crawling techniques
  • Graph algorithms

🔗 Links

📄 License

MIT: feel free to use, clone, pull, and modify!

🎯 Future Enhancements

  • Query expansion and synonyms
  • Personalization based on history
  • Advanced filtering (date, domain, type)
  • Search analytics and trending
  • Multi-language support
  • Snippet generation
  • Real-time index updates
  • Distributed crawling

Disclaimer

This README.md was generated by AI due to time limitations.
