Real-time phishing detection using ML trained on 651k real URLs. Runs 100% locally, blocks malicious sites before they load.
Stats:
- ✅ 86.84% ML accuracy (RandomForest on Kaggle dataset)
- ✅ <2s analysis time
- ✅ 10k phishing URL blacklist
- ✅ Zero external API calls (privacy-first)
📖 Full Technical Documentation — Architecture, API, troubleshooting
pip install -r requirements.txt
Download the dataset from Kaggle:
🔗 Dataset: https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset
Option 1: Download via Browser
- Visit the Kaggle dataset link above
- Click Download (you may need to create a free Kaggle account)
- Extract the downloaded zip file
- Copy malicious_phish.csv to backend/data/malicious_phish.csv
Option 2: Download via Kaggle CLI
- Get API key: https://www.kaggle.com/settings → API → Create Token
- Place kaggle.json at: C:\Users\<You>\.kaggle\kaggle.json (Windows) or ~/.kaggle/ (Mac/Linux)
- Run:
kaggle datasets download -d sid321axn/malicious-urls-dataset -p backend/data --unzip
Verify setup:
# Should exist: backend/data/malicious_phish.csv (651k URLs)
python generate_icons.py
cd backend
python extract_phishing_urls.py
python train_model.py
Creates the phishing URL blacklist (10k URLs) and trains the model (~60 seconds, 86.84% accuracy).
python main.py
Keep this running. Server: http://127.0.0.1:8000
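To confirm the backend is actually listening before loading the extension, a small check like the one below may help. It is only a sketch: it tests whether anything accepts TCP connections on 127.0.0.1:8000 and makes no assumptions about the server's API routes. The function name `backend_is_up` is illustrative and not part of the project.

```python
import socket


def backend_is_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only proves a listener exists; it does not verify that the
    listener is the phishing-detection backend.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("backend reachable" if backend_is_up() else "backend not reachable")
```

If this reports that the backend is not reachable, check that `python main.py` is still running in its terminal.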
- Go to chrome://extensions
- Enable Developer mode
- Click Load unpacked → Select the extension/ folder
- Safe: https://github.com, https://microsoft.com, https://google.com
- Dangerous: https://meganmacylesolutions.com/secure/login.onlinebanking.suntrust.com/online.htm
- Suspicious: http://account-verify-secure.example.com/login, http://paypal-support-center.tk/verify
URL → Blacklist (10k phishing URLs)
→ Domain Age (WHOIS)
→ ML Model (14 features, RandomForest)
→ Heuristics (IP usage, keywords)
→ Score: 0-100
0-49: Safe ✅
50-69: Suspicious 🟡 (warning banner)
70-100: Dangerous 🔴 (blocked)
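The score bands above can be expressed as a small function. This is a minimal sketch of the thresholds only. The name `classify` and the returned labels are assumptions for illustration, and how scorer.py combines the individual signals into a score is not shown here.

```python
def classify(score: int) -> str:
    """Map a 0-100 risk score to a verdict using the documented bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 70:
        return "dangerous"   # 70-100: page is blocked
    if score >= 50:
        return "suspicious"  # 50-69: warning banner
    return "safe"            # 0-49


if __name__ == "__main__":
    for s in (12, 55, 91):
        print(s, classify(s))
```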
Detection speed: Cache hit <10ms | New URL ~1.2s avg
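The gap between a cache hit and a new URL comes from skipping the full analysis for URLs that were scored recently. The sketch below shows one way such a cache could work. The class `ScoreCache`, its time-to-live parameter, and the `score_fn` callback are hypothetical and may not match how the backend caches results.

```python
import time
from typing import Callable, Dict, Tuple


class ScoreCache:
    """In-memory cache of URL scores with a time-to-live (illustrative only)."""

    def __init__(self, score_fn: Callable[[str], int], ttl_seconds: float = 3600.0):
        self._score_fn = score_fn      # the slow path: full URL analysis
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[int, float]] = {}

    def get(self, url: str) -> int:
        now = time.monotonic()
        hit = self._entries.get(url)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]              # fast path: reuse the stored score
        score = self._score_fn(url)    # slow path: analyse and remember
        self._entries[url] = (score, now)
        return score
```

A monotonic clock is used so that system clock adjustments cannot make entries expire early or live forever.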
backend/
├── main.py FastAPI server
├── train_model.py ML trainer (Kaggle dataset)
├── scorer.py Multi-signal scoring
├── feature_extractor.py 14 URL features
├── domain_checker.py WHOIS lookup
├── reputation.py Blacklist manager
├── phishing_urls.csv 10k known phishing URLs
└── data/malicious_phish.csv 651k URLs from Kaggle
extension/
├── background.js Navigation interceptor
├── content.js UI injection
├── popup.html/js Extension popup
├── warning.html/js Block page
└── manifest.json Chrome MV3 config
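To give a feel for what feature_extractor.py does, here is a sketch of lexical URL feature extraction using only the standard library. The six features and the keyword list below are illustrative assumptions. They are not the project's actual 14 features, which are defined in feature_extractor.py.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical keyword list, chosen for illustration only.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "bank")


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ip": int(host_is_ip(host)),
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


if __name__ == "__main__":
    print(extract_features("http://paypal-support-center.tk/verify"))
```

Features like these are numeric, so they can be fed directly to a tree-based classifier such as RandomForest without scaling.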
Backend: Python, FastAPI, scikit-learn, pandas
Frontend: Chrome Extension (Manifest V3), Vanilla JS
Dataset: Kaggle Malicious URLs (651k URLs, CC0 license)
ML Model: RandomForest (200 trees, 86.84% accuracy)
| Issue | Fix |
|---|---|
| Backend offline | Run python backend/main.py |
| model.pkl not found | Run python backend/train_model.py |
| Dataset missing | Download from Kaggle (see step 2) |
| phishing_urls.csv not found | Run python backend/extract_phishing_urls.py |
| Extension not working | Check chrome://extensions for errors |
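The file-related rows in the table can be checked in one pass with a short script run from the repository root. This is a sketch under stated assumptions: the dataset and blacklist paths come from the project layout, but the location backend/model.pkl is a guess, and the helper name `missing_files` is illustrative.

```python
from pathlib import Path

# Maps each required file to the fix suggested in the troubleshooting table.
REQUIRED_FILES = {
    "backend/data/malicious_phish.csv": "Download from Kaggle (see step 2)",
    "backend/phishing_urls.csv": "Run python backend/extract_phishing_urls.py",
    # Assumption: the trained model is saved next to the backend scripts.
    "backend/model.pkl": "Run python backend/train_model.py",
}


def missing_files(root: Path) -> dict:
    """Return {relative_path: suggested_fix} for every required file that is absent."""
    return {
        rel: fix
        for rel, fix in REQUIRED_FILES.items()
        if not (root / rel).is_file()
    }


if __name__ == "__main__":
    problems = missing_files(Path("."))
    if not problems:
        print("All required files are present.")
    for rel, fix in problems.items():
        print(f"Missing {rel}: {fix}")
```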
See DOCUMENTATION.md for API reference, advanced config, and detailed guides.