A production-grade, ML-powered NIDS with real-time packet capture, SHAP explainability, and a live threat intelligence dashboard.
The Sentinel is a complete, end-to-end Network Intrusion Detection System that:
- Captures live network packets using Scapy and assembles them into bidirectional flows
- Extracts 52 CICIDS2017-compatible features per flow (IATs, packet lengths, TCP flags, etc.)
- Classifies flows using an ML ensemble (9 models trained, best saved automatically)
- Explains every prediction using SHAP values: the top 5 most influential features per alert
- Streams alerts in real time to a React dashboard via WebSocket
- Simulates attack traffic for demo and testing without a real adversary
Built for hackathons, research, and production prototyping. No black box: every prediction is explainable.
┌────────────────────────────────────────────────────────────
│ LIVE NETWORK
│ (Wi-Fi / Ethernet traffic)
└──────┬─────────────────────────────────────────────────────
       │ Raw packets
       ▼
┌────────────────────────────────────────────────────────────
│ NetworkSniffer (Scapy)
│   • Captures IP/TCP/UDP/ICMP packets on auto-detected interface
│   • Groups packets into flows via 5-tuple key
│     (src_ip, dst_ip, src_port, dst_port, protocol)
│   • Closes flows on TCP FIN/RST or after 30s timeout
└──────┬─────────────────────────────────────────────────────
       │ Flow packet dicts
       ▼
┌────────────────────────────────────────────────────────────
│ FlowExtractor (CICIDS2017)
│   • Computes 52 exact CICIDS2017 features per flow
│   • Statistical: packet length mean/std/min/max
│   • Temporal: IATs, flow duration, active/idle periods
│   • Protocol: TCP flags (FIN/PSH/ACK), window sizes
│   • Rates: Flow Bytes/s, Flow Packets/s, Fwd/Bwd Packets/s
└──────┬─────────────────────────────────────────────────────
       │ Feature dict (52 keys)
       ▼
┌────────────────────────────────────────────────────────────
│ FastAPI Backend (port 8000)
│
│   POST /api/predict ──► StandardScaler ──► ML Model
│                     ├──► SHAP TreeExplainer
│                     └──► SQLite / PostgreSQL DB
│
│   GET  /api/stats           → Total flows, attack counts by type
│   GET  /api/alerts          → Paginated alert history
│   GET  /api/ip-leaderboard  → Top attacker IPs
│   WS   /ws/live             → Real-time alert stream
│   POST /api/sniffer/start   → Start packet capture
│   POST /api/sniffer/stop    → Stop packet capture
│   GET  /health              → System health (DB, model, sniffer)
└──────┬─────────────────────────────────────────────────────
       │ WebSocket / REST
       ▼
┌────────────────────────────────────────────────────────────
│ React Dashboard (port 5173)
│
│   • KPI Cards: total flows, attacks, detection rate, uptime
│   • Live Traffic Chart: alerts per minute over time
│   • Attack Pie Chart: distribution by attack type
│   • Alert Feed: scrollable list of real-time alerts
│   • IP Leaderboard: top source IPs by attack count
│   • Attack Timeline: temporal heat map of attack events
│   • SHAP Explainer: per-alert feature importance bars
└────────────────────────────────────────────────────────────
The model is trained on the CICIDS2017 dataset, a widely used benchmark containing normal traffic and 14 attack categories including DDoS, DoS, PortScan, Brute Force, Bot, and Web Attacks.
| Step | Description |
|---|---|
| A. Load | Reads all CSVs from data/raw/ (up to 400,000 rows via load_data(RAW_DIR, n_samples=400_000)) |
| B. Clean | Removes NaN, Inf, and duplicate rows |
| C. Feature Engineering | Adds 7 domain-specific ratio features (flow bytes/packet, fwd/bwd ratios, IAT jitter, etc.) |
| D. Split | Separates features and label column (Attack Type) |
| E. Encode | LabelEncoder → numeric class indices, saved as label_encoder.pkl |
| F. Stratified Split | 80/20 train/test with stratify=y |
| G. Class Balancing | SMOTE disabled (USE_SMOTE = False); class imbalance handled natively via class_weight="balanced" on supported classifiers |
| H. Dual Scalers | Fits StandardScaler and RobustScaler; best scaler saved as scaler.pkl |
| I. PCA Analysis | Experimental comparison at 90/95/99% variance (not used in production) |
| J. Train 9 Models | See table below |
| K. Cross-Validation | 5-fold stratified CV on top 2 models, run on the full standard-scaled training dataset (X_tr_std) |
| L. Compare | Sorted leaderboard by Macro F1 |
| M. Save Best | Best model saved as model.pkl |
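Step G leans on class_weight="balanced" rather than resampling. As a rough sketch of what that setting computes, scikit-learn documents the per-class weight as n_samples / (n_classes * class_count). The helper name balanced_class_weights and the toy label counts below are illustrative assumptions, not code from train.py.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Mirror the "balanced" heuristic: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Toy, made-up label distribution that is heavily skewed toward BENIGN.
labels = ["BENIGN"] * 900 + ["DDoS"] * 90 + ["Bot"] * 10
weights = balanced_class_weights(labels)

for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls:8s} weight = {w:.3f}")
```

Rare classes receive proportionally larger weights, so a misclassified Bot flow costs the classifier far more than a misclassified BENIGN flow, without generating any synthetic samples.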
| # | Model | Search Strategy | Notes |
|---|---|---|---|
| 1 | Logistic Regression | Fixed params | Baseline; class_weight="balanced" |
| 2 | Decision Tree | GridSearchCV | max_depth, criterion; class_weight="balanced" (both Standard & Robust scaler instances) |
| 3 | Random Forest | RandomizedSearchCV | n_estimators, max_features; class_weight="balanced" (both Standard & Robust scaler instances) |
| 4 | XGBoost | RandomizedSearchCV | learning_rate, subsample |
| 5 | LightGBM | Fixed params | Standard dependency (no longer optional fallback) |
| 6 | SVM (RBF kernel) | Fixed params | Deterministic first-15k slice (X_tr_std[:15000]); no CV; class_weight="balanced" |
| 7 | Neural Network (MLP) | Fixed params | 256→128→64 ReLU, Adam, early stopping |
| 8 | Voting Ensemble | Soft voting | RF + XGB + LGBM/MLP |
| 9 | Stacking Ensemble | LR meta-learner | RF + XGB + LGBM/MLP → LR |
Both StandardScaler and RobustScaler are compared for tree models. RobustScaler handles DDoS-induced outliers (e.g. 10⁶ pkt/s) better because it centers on the median and scales by the IQR instead of using the mean and standard deviation.
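To make the outlier argument concrete, here is a small standard-library sketch of the two scaling formulas applied to one feature column. The packet rates are made up, and the helper names are illustrative; the project itself uses the scikit-learn scaler classes.

```python
import statistics


def standard_scale(values):
    """(x - mean) / std, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """(x - median) / IQR, with linearly interpolated quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Made-up packet rates: four ordinary flows plus one flood-like outlier.
rates = [100.0, 120.0, 90.0, 110.0, 1_000_000.0]
std_scaled = standard_scale(rates)
rob_scaled = robust_scale(rates)

print("standard:", [round(v, 5) for v in std_scaled[:4]])
print("robust:  ", [round(v, 5) for v in rob_scaled[:4]])
```

With the mean and standard deviation dominated by the single outlier, the four ordinary flows collapse to nearly the same standardized value, while the median/IQR version keeps them clearly separated.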
Input (52 features)
↓
Dense(256, relu)
↓
Dense(128, relu)
↓
Dense(64, relu)
↓
Dense(n_classes, softmax)
Optimizer: Adam (lr=1e-3, adaptive)
L2 Reg: α = 1e-4
Batch: 512
Early Stop: 15 no-improve epochs on 10% validation split
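As a quick arithmetic check on the size of this network, the sketch below counts weights and biases for the layer widths listed above. The value N_CLASSES = 15 is an assumption (14 attack categories plus BENIGN); the real figure depends on which labels are present in the loaded CSVs.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


N_CLASSES = 15  # assumption: 14 attack categories + BENIGN
total = dense_param_count([52, 256, 128, 64, N_CLASSES])
print(total)  # 55695
```

At roughly 56k parameters the MLP is small enough to train on CPU, which fits the batch size of 512 and the early-stopping budget described above.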
A TreeExplainer is cached once at server startup. For every non-benign prediction, the top 5 features by absolute SHAP value are returned alongside the prediction. The React SHAPExplainer component visualizes these as a horizontal bar chart per alert.
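The top-5 selection can be sketched without any SHAP dependency: given one flow's per-feature SHAP values, sort by absolute value and keep the first five. The function name, the "feature" key, and the numbers below are illustrative assumptions; only the `value` field name comes from this document.

```python
def top_k_by_abs_shap(feature_names, shap_values, k=5):
    """Return the k features with the largest |SHAP value|, strongest first."""
    ranked = sorted(zip(feature_names, shap_values),
                    key=lambda pair: abs(pair[1]),
                    reverse=True)
    return [{"feature": name, "value": value} for name, value in ranked[:k]]


# Made-up SHAP values for a single flow (real ones come from the explainer).
names = ["Flow Duration", "Flow Bytes/s", "PSH Flag Count", "Fwd IAT Mean",
         "Init_Win_bytes_forward", "ACK Flag Count", "Destination Port"]
values = [0.02, 0.41, -0.33, 0.05, -0.27, 0.01, 0.12]

top5 = top_k_by_abs_shap(names, values)
for item in top5:
    print(f'{item["feature"]:24s} {item["value"]:+.2f}')
```

Ranking on the absolute value keeps features that push strongly toward benign (negative values) alongside those that push toward the attack class, which is what a signed bar chart needs.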
The FlowExtractor (src/features/extractor.py) converts raw Scapy packet dicts into the exact 52-feature CICIDS2017 vector expected by the model.
| Category | Features |
|---|---|
| Basic | Destination Port, Flow Duration, Total Fwd/Bwd Packets |
| Packet Length | Fwd/Bwd Packet Length (Max, Min, Mean, Std), Min/Max/Mean/Std/Variance overall |
| Flow Rates | Flow Bytes/s, Flow Packets/s, Fwd Packets/s, Bwd Packets/s |
| IAT (Inter-Arrival Time) | Flow IAT (Mean/Std/Max/Min), Fwd IAT (Total/Mean/Std/Max/Min), Bwd IAT (same) |
| Headers | Fwd Header Length, Bwd Header Length |
| TCP Flags | FIN Flag Count, PSH Flag Count, ACK Flag Count |
| Window / Segment | Init_Win_bytes_forward, Init_Win_bytes_backward, min_seg_size_forward |
| Subflow | Subflow Fwd Bytes, act_data_pkt_fwd, Average Packet Size |
| Active / Idle | Active Mean/Max/Min, Idle Mean/Max/Min |
IATs are computed in microseconds (matching CICIDS2017 scale). Active/Idle periods are classified by a 5-second inter-packet gap threshold.
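A minimal sketch of both computations, assuming packet timestamps arrive as floats in seconds and that a gap strictly greater than the threshold ends an active period. The function names are hypothetical and are not taken from extractor.py.

```python
ACTIVE_TIMEOUT_S = 5.0  # inter-packet gap threshold described above


def inter_arrival_times_us(timestamps):
    """Gaps between consecutive packets, converted to microseconds."""
    return [(b - a) * 1e6 for a, b in zip(timestamps, timestamps[1:])]


def active_idle_periods(timestamps, threshold_s=ACTIVE_TIMEOUT_S):
    """Split a flow into active bursts and the idle gaps between them (in µs)."""
    if not timestamps:
        return [], []
    active, idle = [], []
    burst_start = prev = timestamps[0]
    for ts in timestamps[1:]:
        if ts - prev > threshold_s:
            # The gap is too long: close the current burst, record the idle time.
            active.append((prev - burst_start) * 1e6)
            idle.append((ts - prev) * 1e6)
            burst_start = ts
        prev = ts
    active.append((prev - burst_start) * 1e6)
    return active, idle


ts = [0.0, 0.5, 1.0, 9.0, 9.25]  # made-up capture times in seconds
iats = inter_arrival_times_us(ts)
active, idle = active_idle_periods(ts)
print(iats, active, idle)
```

Features such as Active Mean/Max/Min and Idle Mean/Max/Min would then be simple aggregates over the two returned lists.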
1. NetworkSniffer captures packet on interface 'eth0' / 'Wi-Fi'
   ↓
2. Packet parsed → 12-field dict (src_ip, dst_ip, ports, protocol,
   size, payload_len, header_len, time, tcp_flags, window_size, ttl)
   ↓
3. Packet added to flow bucket (keyed by 5-tuple)
   Flow closes on: TCP FIN/RST | 30s timeout | 500-packet cap
   ↓
4. FlowExtractor.extract_from_dicts() → {52 CICIDS features}
   ↓
5. POST http://localhost:8000/api/predict (non-blocking thread)
   ↓
6. Backend:
   a. Strips metadata (_source_ip, _dst_port, ...)
   b. Scales 52 features with StandardScaler
   c. model.predict() → class index
   d. model.predict_proba() → confidence score
   e. LabelEncoder.inverse_transform() → human-readable label
   f. SHAP TreeExplainer → top-5 feature importances
   g. Severity mapping (CRITICAL/HIGH/MEDIUM/LOW/NONE)
   h. Saves Alert to database
   i. If not BENIGN → broadcast via WebSocket
   ↓
7. Dashboard WebSocket client receives alert JSON
   → Updates KPICards, AlertFeed, TrafficChart, PieChart in real-time
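Step 3 is the least obvious part of this flow, so here is a minimal sketch of a flow table that applies the three closing conditions. It assumes tcp_flags is an integer bitmask and that both directions of a connection share one direction-independent key; the FlowTable class and its methods are hypothetical, not the actual NetworkSniffer code.

```python
FLOW_TIMEOUT_SECONDS = 30
MAX_PACKETS_PER_FLOW = 500
TCP_FIN, TCP_RST = 0x01, 0x04  # standard TCP flag bits


def flow_key(pkt):
    """Direction-independent 5-tuple so both directions land in one bucket."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["protocol"])


class FlowTable:
    def __init__(self):
        self.flows = {}  # key -> list of packet dicts, in arrival order

    def add(self, pkt):
        """Store a packet; return the flow's packets if this packet closed it."""
        key = flow_key(pkt)
        bucket = self.flows.setdefault(key, [])
        bucket.append(pkt)
        closing = pkt.get("tcp_flags", 0) & (TCP_FIN | TCP_RST)
        if closing or len(bucket) >= MAX_PACKETS_PER_FLOW:
            return self.flows.pop(key)
        return None

    def expire(self, now):
        """Close every flow whose last packet is older than the timeout."""
        stale = [key for key, pkts in self.flows.items()
                 if now - pkts[-1]["time"] > FLOW_TIMEOUT_SECONDS]
        return [self.flows.pop(key) for key in stale]
```

Each closed flow (a list of packet dicts) is what would be handed to the feature extractor in step 4. Calling expire() periodically from the capture loop covers UDP and ICMP flows, which never send FIN or RST.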
Built with React 18 + TypeScript + Vite + Tailwind CSS + shadcn/ui + Recharts.
| Component | Description |
|---|---|
| StatusBar | WS connection indicator, live clock, system status |
| KPICards | Total flows, attacks detected, detection rate, uptime |
| TrafficChart | Recharts LineChart: attacks per minute, last 30 points |
| AttackPieChart | Recharts PieChart: attack type distribution |
| AlertFeed | Real-time scrollable alert list with severity color coding |
| IPLeaderboard | Top 10 most aggressive source IPs |
| AttackTimeline | Temporal bar chart of attack events over time |
| SHAPExplainer | Per-alert SHAP feature importance bar chart |
| Sidebar | Navigation rail with system overview |
- Connects to ws://localhost:8000/ws/live
- On connect: receives a batch of the last 50 alerts for history seeding
- Auto-reconnects on disconnect (3s delay)
- Normalizes both new (value) and legacy (impact) SHAP field names
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/predict | Submit a 52-feature flow dict for classification |
| GET | /api/alerts | Paginated alert history |
| GET | /api/stats | Total flows, attacks by type/severity, uptime |
| GET | /api/ip-leaderboard | Top N attacker IPs |
| POST | /api/sniffer/start | Start live packet capture |
| POST | /api/sniffer/stop | Stop live packet capture |
| GET | /api/sniffer/stats | Sniffer stats (packets, flows, alerts) |
| GET | /health | System health (DB, model, sniffer, WS clients) |
| WS | /ws/live | Real-time alert stream (WebSocket) |
Interactive API docs: http://localhost:8000/docs
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Backend & ML |
| Node.js | 18+ | Frontend |
| Npcap | Latest | Packet capture on Windows |
| Git | Any | Clone repo |
Windows note: Install Npcap with "WinPcap API-compatible mode" enabled. Run the backend as Administrator for packet capture.
git clone https://github.com/yourusername/nids.git
cd nids
cd nids-backend
# Create a virtual environment (recommended)
python -m venv venv
# Activate it
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# Install all dependencies
pip install -r requirements.txt
Optional: PostgreSQL
Create a .env file if you want to use PostgreSQL instead of the default SQLite:
DATABASE_URL=postgresql://nids:password@localhost:5432/nids_db
Download the CICIDS2017 dataset CSVs from https://www.unb.ca/cic/datasets/ids-2017.html and place them directly in nids-backend/data/raw/.
# From inside nids-backend/
python src/model/train.py
This will:
- Load and clean the CSV data (up to 400,000 samples)
- Engineer 7 additional ratio features
- Handle class imbalance natively via class_weight="balanced" (SMOTE disabled)
- Train and compare 9 ML models (takes ~10-30 minutes depending on hardware)
- Save the best model, scaler, and label encoder:
  - nids-backend/model.pkl
  - nids-backend/scaler.pkl
  - nids-backend/label_encoder.pkl
Shortcut: If you have pre-trained artifacts, place them in nids-backend/ and skip this step.
# From inside nids-backend/
# Standard mode (no auto-start of packet capture):
uvicorn src.api.main:app --reload --port 8000
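# With automatic packet capture on startup:
# Windows (keep comments off the set line; cmd would store them in the value):
set NIDS_CAPTURE=1
# Linux/Mac:
export NIDS_CAPTURE=1
uvicorn src.api.main:app --port 8000
Verify it's running:
curl http://localhost:8000/health
Expected response: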
{
"status": "ok",
"db": "ok",
"model": "ok",
"sniffer": "stopped",
"uptime_seconds": 3.2,
"ws_clients": 0
}
cd nids-frontend
npm install
npm run dev
Open http://localhost:5173 in your browser.
Option A: Via the API
curl -X POST http://localhost:8000/api/sniffer/start
Option B: Environment variable (set NIDS_CAPTURE=1 before starting the server, see Step 4).
Option C: Standalone sniffer
# From inside nids-backend/
python src/capture/sniffer.py --interface auto
No real adversary? Use the built-in simulators to generate attack traffic for demonstration.
cd nids-backend
# Simulate a DDoS UDP flood (300 packets, ~1000 pps)
python src/simulation/sim_ddos.py --target 127.0.0.1 --count 300
# Simulate a port scan
python src/simulation/sim_portscan.py --target 127.0.0.1
# Simulate a brute force attack
python src/simulation/sim_bruteforce.py --target 127.0.0.1
# Simulate a mixed attack scenario
python src/simulation/sim_mixed.py
The sniffer will pick up the generated packets, extract features, and send them to the prediction API; alerts will appear on the dashboard in real time.
cd nids-backend
# Feature extraction sanity check
python test_pipeline.py
# API integration tests (requires server running)
python test_api.py
# ML prediction test
python check.py
# Run the full pytest suite
pytest tests/
nids/
├── nids-backend/
│   ├── src/
│   │   ├── api/
│   │   │   ├── main.py              # FastAPI app, WebSocket, sniffer control
│   │   │   ├── database.py          # SQLAlchemy engine + session factory
│   │   │   ├── models.py            # Alert ORM model
│   │   │   ├── schemas.py           # Pydantic request/response schemas
│   │   │   └── routes/
│   │   │       ├── predict.py       # POST /api/predict: ML inference endpoint
│   │   │       ├── alerts.py        # GET /api/alerts: alert history
│   │   │       └── stats.py         # GET /api/stats, /api/ip-leaderboard
│   │   ├── capture/
│   │   │   └── sniffer.py           # Live packet capture, flow assembly
│   │   ├── features/
│   │   │   └── extractor.py         # 52-feature CICIDS2017 extractor
│   │   ├── model/
│   │   │   ├── train.py             # Full ML training pipeline (9 models)
│   │   │   ├── predict.py           # Inference wrapper + SHAP
│   │   │   └── evaluate.py          # Model evaluation utilities
│   │   └── simulation/
│   │       ├── sim_ddos.py          # UDP DDoS flood simulator
│   │       ├── sim_portscan.py      # Port scan simulator
│   │       ├── sim_bruteforce.py    # Brute force simulator
│   │       └── sim_mixed.py         # Mixed attack scenario
│   ├── notebooks/
│   │   ├── 01_eda.ipynb             # Exploratory Data Analysis
│   │   ├── 02_training.ipynb        # Training walkthrough
│   │   └── 02_training_with_nn.ipynb # Neural Network training
│   ├── model.pkl                    # ← Trained model (generated by train.py)
│   ├── scaler.pkl                   # ← StandardScaler (generated by train.py)
│   ├── label_encoder.pkl            # ← LabelEncoder (generated by train.py)
│   └── requirements.txt
│
└── nids-frontend/
    └── src/
        ├── pages/
        │   └── Index.tsx            # Main dashboard page layout
        ├── components/
        │   ├── StatusBar.tsx        # Connection & system status bar
        │   ├── KPICards.tsx         # Key performance indicator cards
        │   ├── TrafficChart.tsx     # Live traffic line chart
        │   ├── AttackPieChart.tsx   # Attack type pie chart
        │   ├── AlertFeed.tsx        # Real-time alert list
        │   ├── IPLeaderboard.tsx    # Top attacker IP table
        │   ├── AttackTimeline.tsx   # Temporal attack timeline
        │   ├── SHAPExplainer.tsx    # SHAP feature importance bars
        │   └── Sidebar.tsx          # Navigation sidebar
        └── hooks/
            └── useWebSocket.ts      # WebSocket connection + alert normalization
| Technology | Version | Role |
|---|---|---|
| Python | 3.10+ | Core language |
| FastAPI | 0.110 | REST API + WebSocket server |
| Uvicorn | 0.29 | ASGI web server |
| SQLAlchemy | 2.0 | ORM + database layer |
| scikit-learn | 1.4 | ML models, scalers, class weighting |
| XGBoost | 2.0 | Gradient-boosted tree classifier |
| LightGBM | 4.x | Fast gradient boosting |
| SHAP | 0.45 | Model explainability |
| Scapy | 2.5 | Live packet capture |
| pandas / numpy | 2.2 / 1.26 | Data processing |
| matplotlib / seaborn | - | Evaluation plots & notebook visualizations |
| requests | - | API connection testing & sniffer scripts |
| SQLite / PostgreSQL | - | Alert persistence |
| Technology | Version | Role |
|---|---|---|
| React | 18.3 | UI framework |
| TypeScript | 5.8 | Type-safe JavaScript |
| Vite | 7.3.2 | Build tool + dev server |
| Tailwind CSS | 3.4 | Utility-first styling |
| shadcn/ui | - | Accessible component primitives |
| Recharts | 2.15 | Charts (line, pie, bar) |
| TanStack Query | 5 | Server state management |
| React Router | 6 | Client-side routing |
| Lucide React | - | Icon library |
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | sqlite:///./nids.db | Database connection string |
| NIDS_CAPTURE | 0 | Set to 1 to auto-start sniffer on startup |
| Constant | Default | Description |
|---|---|---|
| FLOW_TIMEOUT_SECONDS | 30 | Close inactive flows after N seconds |
| MAX_PACKETS_PER_FLOW | 500 | Safety cap per flow before early processing |
| ACTIVE_TIMEOUT | 5.0s | IAT threshold for active/idle classification |
| Level | Attack Types |
|---|---|
| 🔴 CRITICAL | DDoS, DoS Hulk, DoS GoldenEye, DoS Slowloris, DoS SlowHTTPTest, Heartbleed |
| 🟠 HIGH | Bot, FTP-Patator, SSH-Patator, Infiltration |
| 🟡 MEDIUM | PortScan, Web Attack (Brute Force, XSS, SQL Injection) |
| 🟢 LOW | Brute Force (generic), unknown attack types |
| ⚪ NONE | BENIGN (normal traffic) |
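The table translates naturally into a lookup with a LOW fallback for anything unrecognized. This is a sketch only: the label spellings follow the table, the strings emitted by the trained LabelEncoder may differ, and the function name severity_for is hypothetical.

```python
# Label spellings follow the severity table; the trained LabelEncoder's
# actual class strings may differ (assumption).
CRITICAL = {"DDoS", "DoS Hulk", "DoS GoldenEye", "DoS Slowloris",
            "DoS SlowHTTPTest", "Heartbleed"}
HIGH = {"Bot", "FTP-Patator", "SSH-Patator", "Infiltration"}


def severity_for(label):
    """Map a predicted class label to a dashboard severity level."""
    if label == "BENIGN":
        return "NONE"
    if label in CRITICAL:
        return "CRITICAL"
    if label in HIGH:
        return "HIGH"
    if label == "PortScan" or label.startswith("Web Attack"):
        return "MEDIUM"
    return "LOW"  # generic brute force and unknown attack types


for label in ["DDoS", "Bot", "Web Attack XSS", "Something New", "BENIGN"]:
    print(f"{label:16s} -> {severity_for(label)}")
```

Defaulting unknown labels to LOW rather than NONE means a newly added attack class still raises an alert even before the mapping is updated.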
Exact metrics will vary depending on the CSV files and sample sizes used. Run python src/model/train.py to reproduce.
| Model | Accuracy | Macro F1 | Notes |
|---|---|---|---|
| XGBoost | ~99%+ | ~97%+ | Usually best single model |
| Random Forest | ~99% | ~96% | Very close to XGBoost |
| Voting Ensemble | ~99%+ | ~97%+ | RF + XGB + LGBM |
| Neural Network | ~98% | ~93% | 256→128→64 MLP |
| Decision Tree | ~98% | ~90% | After GridSearchCV |
| Logistic Regression | ~95% | ~75% | Baseline |
| SVM (RBF) | ~97% | ~85% | 15k subset only |
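The gap between the Accuracy and Macro F1 columns comes from class imbalance: Macro F1 averages per-class F1 scores with equal weight, so a rare class that is mostly missed drags the score down even when overall accuracy stays high. The sketch below uses made-up predictions, not results from this project, to show the effect.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 95 benign flows, 5 Bot flows, and only 1 Bot flow is caught.
y_true = ["BENIGN"] * 95 + ["Bot"] * 5
y_pred = ["BENIGN"] * 99 + ["Bot"] * 1

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}, macro F1 = {score:.3f}")
```

This is why the leaderboard is sorted by Macro F1: it rewards models that detect the rare attack classes, not just the dominant benign traffic.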
Open 3 separate terminals. Terminals 2 & 3 must be run as Administrator.
cd nids-frontend
npm install # first time only
npm run dev
Dashboard live at http://localhost:5173 (keep this terminal open)
cd nids-backend
venv\Scripts\activate # first time: python -m venv venv
pip install -r requirements.txt # first time only
uvicorn src.api.main:app --reload --port 8000
Model is already trained; do not re-run train.py. (keep this terminal open)
cd nids-backend
# Start the packet sniffer
curl -X POST http://localhost:8000/api/sniffer/start
# Expected: { "running": true }
# Send attack traffic
python send_attacks.py
Live alerts will appear on the dashboard in real time.
Edit the skiprows value inside send_attacks.py to sample different attack traffic from the dataset:
| Attack Type | skiprows value |
|---|---|
| Brute Force | 500_000 |
| DoS | 100_000 |
| Port Scan | 2_000_000 |
| DDoS / Web Attack | 1_500_000 |
- Place send_attacks.py inside the nids-backend/ folder
- Terminals 2 and 3 must be run in Administrator mode (packet capture requires elevated privileges)
- Do not close any terminal while the demo is running
- The sniffer requires administrator / root privileges to capture raw packets
- On Windows, Npcap must be installed (download from https://npcap.com)
- CORS is configured for localhost dev servers; update allow_origins for production
- The simulation scripts send real packets on your network; use 127.0.0.1 as target in demos
Class Imbalance Strategy
- SMOTE has been disabled (USE_SMOTE = False) to eliminate synthetic sample noise and excessive memory overhead during training
- class_weight="balanced" introduced natively on LogisticRegression, DecisionTreeClassifier (both Standard and Robust Scaler GridSearchCV instances), RandomForestClassifier (both Standard and Robust Scaler RandomizedSearchCV instances), and SVC
SVM Optimization
- Replaced randomized sampling index with a deterministic first-15k slice (X_tr_std[:15000], y_tr[:15000]) for fully reproducible SVM runs
- Cross-validation and Grid Search removed from SVM to keep its computational footprint minimal given its O(n²) complexity
Data Loading & Validation
- Base sample size increased to 400,000 (load_data(RAW_DIR, n_samples=400_000)) for a richer training distribution
- cross_validate_top() now runs 5-Fold Stratified CV on the full standard-scaled dataset (X_tr_std) instead of a random 30k subsample, producing more robust evaluation metrics
Backend (requirements.txt)
- Added matplotlib: used in evaluation scripts and notebooks
- Added seaborn: used for notebook visualizations
- Added lightgbm: promoted from optional fallback to standard required dependency
- Added requests: used in API connection testing and sniffer scripts
- Removed imbalanced-learn: no longer needed following SMOTE removal
Frontend (package-lock.json)
- axios: 1.13.6 → 1.15.0
- proxy-from-env: 1.1.0 → 2.1.0
- lodash: 4.17.23 → 1.18.1
- vite: 7.3.1 → 7.3.2
This project is licensed under the MIT License; see the LICENSE file for details.
Built with ❤️ for network security research and hackathon competition.
The Sentinel: See every packet. Understand every threat.