Fraudrix is an intelligent message-risk classifier built for practical fraud defense across SMS, email, and URL-driven social engineering content. The system combines contextual text preprocessing, high-signal feature extraction, and a tuned classification pipeline to separate safe, spam, and scam traffic with high reliability.
Fraudrix follows a data-first architecture:
- Raw datasets are normalized into a unified format with two columns: message and label.
- URL-based phishing records are transformed into contextual text so the model learns intent, not only links.
- Curated scam phrases are injected to improve linguistic diversity for fraud patterns.
- Balanced sampling stabilizes decision boundaries across safe, spam, and scam classes.
- Training uses n-gram TF-IDF plus Logistic Regression for fast and robust inference.
- Runtime prediction adds lightweight explainability signals for trust and auditability.
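
As a rough illustration of two of the preparation steps above, the sketch below turns a phishing URL into contextual text and then samples rows up to a per-class cap. The function names, the output wording of `url_to_context`, and the `"scam"` label assigned to the example URL are all assumptions made for illustration; the actual logic in `fixurl.py` and the merge script may differ. The per-class targets are taken from the training set composition chart.

```python
import random
import re
from collections import defaultdict
from urllib.parse import urlparse


def url_to_context(url: str) -> str:
    """Describe a URL in words (hypothetical output format)."""
    # urlparse only recognises the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    # Split the path into lowercase alphanumeric words and drop empty pieces.
    path_words = [w for w in re.split(r"[^a-z0-9]+", parsed.path.lower()) if w]
    text = f"link to {host}"
    if path_words:
        text += " mentioning " + " ".join(path_words)
    return text


def sample_per_class(rows, targets, seed=42):
    """Sample (message, label) rows, taking at most targets[label] per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for message, label in rows:
        by_label[label].append((message, label))
    sampled = []
    for label, cap in targets.items():
        pool = by_label.get(label, [])
        # Take the whole class when it is smaller than its cap.
        sampled.extend(rng.sample(pool, min(cap, len(pool))))
    rng.shuffle(sampled)
    return sampled


# Per-class targets from the composition chart in this README.
TARGETS = {"safe": 7000, "spam": 3000, "scam": 3000}

if __name__ == "__main__":
    # Assumption: phishing URL records are labelled "scam".
    url_row = (url_to_context("https://secure-login.example.com/verify/account"), "scam")
    print(url_row)
    print(len(sample_per_class([url_row], TARGETS)))
```

A fixed seed keeps the sampled set reproducible between training runs, which makes benchmark numbers easier to compare.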
flowchart LR
A[Incoming Message] --> B[Text Normalization]
B --> C[Reason Signal Extraction]
C --> D[TF-IDF Vectorization]
D --> E[Logistic Regression Classifier]
E --> F[Predicted Class]
C --> G[Reason Summary]
F --> H[Final Output]
G --> H
pie title Balanced Training Set Composition
"safe" : 7000
"spam" : 3000
"scam" : 3000
- Unified schema: message, label
- URL phishing enrichment for stronger contextual understanding
- Supplemental curated scam corpus for linguistic coverage
- Deduplication and null filtering before training
- Cleaner preserves lexical meaning while stripping noise
- TF-IDF with ngram_range=(1,2)
- Maximum features tuned for practical latency and stable quality
- Algorithm: Logistic Regression
- Multiclass target: safe, spam, scam
- Trained on balanced data for stronger minority-class recall
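
To make the data and feature settings above more concrete, here is a minimal standard-library sketch of null filtering, deduplication, a simple cleaner, and the unigram plus bigram terms that `ngram_range=(1,2)` implies. The parameter name matches scikit-learn's `TfidfVectorizer`, but this README does not name the library, so treat that as an assumption. The cleaning rules and function names are hypothetical and simpler than a real tokenizer.

```python
import re


def clean(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def dedupe(rows):
    """Drop rows with a missing message or label, then drop repeated cleaned messages."""
    seen = set()
    kept = []
    for message, label in rows:
        if not message or not label:
            continue  # null filtering
        cleaned = clean(message)
        if not cleaned or cleaned in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add(cleaned)
        kept.append((cleaned, label))
    return kept


def ngrams_1_2(text: str):
    """Return unigrams followed by bigrams, the term set ngram_range=(1,2) describes."""
    tokens = text.split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


if __name__ == "__main__":
    rows = [("Verify your account!", "scam"), ("verify your account", "scam"), (None, "spam")]
    for message, label in dedupe(rows):
        print(label, ngrams_1_2(message))
```

Bigrams such as `verify your` let a linear model weight short phrases, not only single words, which is one reason the (1,2) range is a common choice for short messages.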
At inference time, Fraudrix reports why a message looks suspicious through compact reason tags such as:
- sensitive financial keywords
- promotional or spam keywords
- contains link
This keeps outputs auditable without adding heavyweight model overhead.
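
One way such reason tags could be produced is with plain keyword sets and a link pattern, as in the sketch below. The tag strings come from the list above; the keyword sets, the regular expression, and the `reason_tags` name are illustrative assumptions, not the project's actual rules.

```python
import re

# Hypothetical keyword sets; the real lists are not given in this README.
FINANCIAL_TERMS = {"bank", "otp", "account", "password", "card"}
PROMO_TERMS = {"free", "offer", "win", "discount", "prize"}
LINK_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def reason_tags(message: str):
    """Return the reason tags that apply to a message, in a fixed order."""
    tokens = set(re.findall(r"[a-z0-9]+", message.lower()))
    tags = []
    if tokens & FINANCIAL_TERMS:
        tags.append("sensitive financial keywords")
    if tokens & PROMO_TERMS:
        tags.append("promotional or spam keywords")
    if LINK_PATTERN.search(message):
        tags.append("contains link")
    return tags


if __name__ == "__main__":
    print(reason_tags("Your bank account is locked, verify at http://example.com"))
```

Because the tags are computed from the raw message and not from model weights, they add almost no latency and remain stable when the classifier is retrained.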
Latest benchmark from the current training pipeline:
| Metric | Value |
|---|---|
| Overall Accuracy | 99.19% |
| Safe Precision | 0.99 |
| Safe Recall | 0.99 |
| Spam Precision | 0.98 |
| Spam Recall | 0.99 |
| Scam Precision | 1.00 |
| Scam Recall | 0.99 |
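
For readers who want to check figures like these on their own evaluation split, the sketch below computes overall accuracy and per-class precision and recall from true and predicted labels. The function names are hypothetical, and the labels in the usage example are toy values, not the benchmark data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=("safe", "spam", "scam")):
    """Return {label: (precision, recall)} using one-vs-rest counts."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Report 0.0 when a class has no predictions or no true examples.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall)
    return metrics


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["safe", "safe", "spam", "scam"]
    y_pred = ["safe", "spam", "spam", "scam"]
    print(accuracy(y_true, y_pred))
    print(per_class_metrics(y_true, y_pred))
```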
Install dependencies:

    pip install -r requirements.txt

Prepare and train:

    python data/raw/fixurl.py
    python src/marge_all.py
    python train.py

Run interactive classifier:

    python main.py

Run API service:

    uvicorn api.app:app --reload

Designed and maintained by Notookk.
Fraudrix targets production-grade fraud intelligence where precision, recall, speed, and explainability must coexist in one practical pipeline.