Fraudrix

Fraudrix is a message-risk classifier built for practical fraud defense across SMS, email, and URL-driven social engineering content. It combines contextual text preprocessing, feature extraction, and a tuned classification pipeline to label each message as safe, spam, or scam.

Core Design

Fraudrix follows a data-first architecture:

  1. Raw datasets are normalized into a unified format with two columns: message and label.
  2. URL-based phishing records are transformed into contextual text so the model learns intent, not only links.
  3. Curated scam phrases are injected to improve linguistic diversity for fraud patterns.
  4. Balanced sampling stabilizes decision boundaries across safe, spam, and scam classes.
  5. Training uses n-gram TF-IDF plus Logistic Regression for fast and robust inference.
  6. Runtime prediction adds lightweight explainability signals for trust and auditability.
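Steps 2 and 4 can be sketched with the standard library alone. The snippet below is a minimal illustration, not the project's actual code: `url_to_context`, `sample_per_class`, the "visit link" phrasing, and the per-class caps are all hypothetical names and choices made for this example.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def url_to_context(url: str) -> str:
    """Turn a bare URL into a short phrase so a text model sees words, not just a link."""
    # Assumption: URLs without a scheme are treated as http:// so urlparse finds the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc.lower()
    # Split the path into word-like pieces, e.g. "/verify-account" -> "verify account".
    path_words = " ".join(
        part
        for part in parsed.path.replace("-", " ").replace("_", " ").split("/")
        if part
    )
    return f"visit link {host} {path_words}".strip()


def sample_per_class(rows, caps, seed=42):
    """Downsample each label to at most caps[label] rows (hypothetical balancing rule)."""
    by_label = defaultdict(list)
    for message, label in rows:
        by_label[label].append((message, label))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sampled = []
    for label, items in by_label.items():
        rng.shuffle(items)
        sampled.extend(items[: caps.get(label, len(items))])
    return sampled
```

Calling `sample_per_class(rows, {"safe": 7000, "spam": 3000, "scam": 3000})` would mirror the counts in the composition chart, though how Fraudrix actually draws its sample is not stated here.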

Inference Graph

flowchart LR
      A[Incoming Message] --> B[Text Normalization]
      B --> C[Reason Signal Extraction]
      C --> D[TF-IDF Vectorization]
      D --> E[Logistic Regression Classifier]
      E --> F[Predicted Class]
      C --> G[Reason Summary]
      F --> H[Final Output]
      G --> H

Label Composition Graph

pie title Balanced Training Set Composition
      "safe" : 7000
      "spam" : 3000
      "scam" : 3000

Model Details

Data Engineering

  • Unified schema: message, label
  • URL phishing enrichment for stronger contextual understanding
  • Supplemental curated scam corpus for linguistic coverage
  • Deduplication and null filtering before training
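As a rough sketch of the deduplication and null-filtering step, the function below works on plain `(message, label)` tuples. The function name and the case-insensitive duplicate rule are assumptions for illustration; the project's own scripts may define duplicates differently.

```python
def clean_rows(rows):
    """Keep (message, label) pairs with non-empty text and a label, dropping repeats."""
    seen = set()
    kept = []
    for message, label in rows:
        # Null filtering: skip rows missing either field.
        if message is None or label is None:
            continue
        text = str(message).strip()
        if not text:
            continue
        # Assumption: duplicates are compared case-insensitively within a label.
        key = (text.lower(), label)
        if key in seen:
            continue
        seen.add(key)
        kept.append((text, label))
    return kept
```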

Feature Logic

  • The text cleaner strips noise while preserving lexical meaning
  • TF-IDF with ngram_range=(1,2)
  • Maximum features tuned for practical latency and stable quality
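To make the feature logic concrete, here is a hedged, standard-library sketch of a cleaner and of the terms a (1, 2) n-gram range would consider. `clean_text`, `unigrams_and_bigrams`, and the `urltoken` placeholder are hypothetical; scikit-learn's own tokenizer applies its own rules and may produce slightly different terms.

```python
import re

URL_RE = re.compile(r"(https?://\S+|www\.\S+)", re.IGNORECASE)


def clean_text(text: str) -> str:
    """Lowercase, replace links with a placeholder token, and drop punctuation noise."""
    text = URL_RE.sub(" urltoken ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()


def unigrams_and_bigrams(text: str) -> list[str]:
    """List single words followed by adjacent word pairs from the cleaned text."""
    words = clean_text(text).split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams
```

Bigrams let the vectorizer treat phrases such as "verify your" as one feature, which single words alone would miss.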

Classifier

  • Algorithm: Logistic Regression
  • Multiclass target: safe, spam, scam
  • Trained on balanced data for stronger minority-class recall
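The classifier setup described above could look roughly like the following scikit-learn pipeline. This is a sketch under assumptions: the toy messages are invented for illustration, and `max_features=5000` and `max_iter=1000` are placeholder values, not the project's tuned settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; not drawn from the Fraudrix datasets.
messages = [
    "Are we still meeting for lunch tomorrow",
    "Your package was delivered this morning",
    "Call me when you get home",
    "Huge sale this weekend, get 50 percent off",
    "Win a free gift card, limited offer",
    "Exclusive discount for subscribers today",
    "Your bank account is locked, verify your password now",
    "Send your OTP to confirm the refund to your card",
    "Urgent: confirm your PIN to avoid account suspension",
]
labels = ["safe"] * 3 + ["spam"] * 3 + ["scam"] * 3

model = make_pipeline(
    # Unigrams and bigrams, as stated in Feature Logic; max_features is a guess.
    TfidfVectorizer(ngram_range=(1, 2), max_features=5000),
    LogisticRegression(max_iter=1000),
)
model.fit(messages, labels)

query = ["Verify your bank account now"]
prediction = model.predict(query)[0]
# Map each class name to its predicted probability for the query message.
classes = model.named_steps["logisticregression"].classes_
probabilities = dict(zip(classes, model.predict_proba(query)[0]))
```

With only nine training messages the prediction is not meaningful; the point is the shape of the pipeline, where vectorizer and classifier are fitted and applied as one object.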

Explainability Layer

At inference time, Fraudrix reports why a message looks suspicious through compact reason tags such as:

  • sensitive financial keywords
  • promotional or spam keywords
  • contains link

This keeps outputs auditable without adding heavyweight model overhead.
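A rule-based tagger of this kind can be sketched in a few lines. The tag strings come from the list above, but the keyword sets, the link pattern, and the `extract_reasons` name are assumptions; the project's real lists are not shown in this README.

```python
import re

# Hypothetical keyword lists, chosen only for this example.
FINANCIAL_WORDS = {"bank", "account", "otp", "password", "pin", "card"}
PROMO_WORDS = {"free", "offer", "win", "prize", "discount", "sale"}
LINK_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def extract_reasons(message: str) -> list[str]:
    """Return reason tags for a message, independent of the classifier's output."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    reasons = []
    if words & FINANCIAL_WORDS:
        reasons.append("sensitive financial keywords")
    if words & PROMO_WORDS:
        reasons.append("promotional or spam keywords")
    if LINK_RE.search(message):
        reasons.append("contains link")
    return reasons
```

Because the tags come from simple set lookups and one regular expression, they add negligible latency on top of the model call.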

Performance Snapshot

Latest benchmark from the current training pipeline:

Metric            Value
Overall Accuracy  99.19%
Safe Precision    0.99
Safe Recall       0.99
Spam Precision    0.98
Spam Recall       0.99
Scam Precision    1.00
Scam Recall       0.99
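For readers who want to recompute figures like these on their own held-out split, the following standard-library sketch derives accuracy and per-class precision and recall from true and predicted labels. `per_class_metrics` is a hypothetical helper, not part of the repository, and the evaluation split used for the numbers above is not described here.

```python
def per_class_metrics(y_true, y_pred):
    """Return (accuracy, {label: {"precision": ..., "recall": ...}})."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = {"precision": precision, "recall": recall}
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, report
```

Precision for a class is the share of its predictions that were correct; recall is the share of its true messages that were found.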

Training and Runtime

Install dependencies:

pip install -r requirements.txt

Prepare and train:

python data/raw/fixurl.py
python src/marge_all.py
python train.py

Run interactive classifier:

python main.py

Run API service:

uvicorn api.app:app --reload

Ownership and Credit

Designed and maintained by Notookk.

Vision

Fraudrix targets production-grade fraud intelligence where precision, recall, speed, and explainability must coexist in one practical pipeline.

About

An AI-powered scam and spam detection system that classifies messages as safe, spam, or scam using machine learning and intelligent pattern analysis. Built on a multi-source dataset, it provides accurate, real-time detection of phishing attempts, fraudulent links, and promotional spam.
