Fraudrix

Fraudrix is an intelligent message-risk classifier built for practical fraud defense across SMS, email, and URL-driven social engineering content. It combines contextual text preprocessing, high-signal feature extraction, and a tuned classification pipeline to label each message as safe, spam, or scam.

Core Design

Fraudrix follows a data-first architecture:

Raw datasets are normalized into a unified format with two columns: message and label.
URL-based phishing records are transformed into contextual text so the model learns intent, not only links.
Curated scam phrases are injected to improve linguistic diversity for fraud patterns.
Rebalanced sampling (7,000 safe, 3,000 spam, and 3,000 scam training messages) keeps the safe class from dominating and stabilizes decision boundaries across the three classes.
Training uses n-gram TF-IDF plus Logistic Regression for fast and robust inference.
Runtime prediction adds lightweight explainability signals for trust and auditability.
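The URL step is the least obvious part of this flow: a bare link carries little language for a text model to learn from, so each URL record is wrapped in a sentence that expresses intent. The sketch below illustrates the idea under loud assumptions. The source label names (`phishing`, `malicious`, `benign`), the `LABEL_MAP`, the `TEMPLATES`, and the helper `url_to_context` are all hypothetical; the actual wording produced by `data/raw/fixurl.py` is not documented here.

```python
# Hypothetical mapping from a URL dataset's labels to Fraudrix classes.
LABEL_MAP = {"phishing": "scam", "malicious": "scam", "benign": "safe"}

# Hypothetical templates that wrap a bare link in intent-bearing words.
TEMPLATES = {
    "scam": "Your account has been locked. Verify your details at {url} within 24 hours.",
    "safe": "Here is the page we talked about: {url}",
}


def url_to_context(url: str, raw_label: str) -> dict:
    """Turn one URL record into a row of the unified (message, label) schema."""
    # Unknown source labels raise KeyError instead of silently defaulting to a class.
    label = LABEL_MAP[raw_label.strip().lower()]
    return {"message": TEMPLATES[label].format(url=url.strip()), "label": label}


row = url_to_context("secure-login.example.com/verify", "Phishing")
print(row["label"])  # scam
```

A single template per class can teach the model the template rather than the intent, so a real pipeline would likely rotate several phrasings per class.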

Inference Graph

flowchart LR
      A[Incoming Message] --> B[Text Normalization]
      B --> C[Reason Signal Extraction]
      C --> D[TF-IDF Vectorization]
      D --> E[Logistic Regression Classifier]
      E --> F[Predicted Class]
      C --> G[Reason Summary]
      F --> H[Final Output]
      G --> H
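The graph can be read as a small function: normalize once, then feed the same text to the classifier branch (D, E, F) and the reason branch (G), and merge both into the final output (H). In the sketch below, `normalize`, `run_inference`, and the `classify`/`explain` callables are hypothetical stand-ins, not the project's actual functions. The graph draws reason extraction in sequence before vectorization; since reason tags do not change the text, the sketch simply calls both branches on the normalized message.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase, then collapse runs of whitespace (A -> B).
    return re.sub(r"\s+", " ", text.lower()).strip()


def run_inference(
    message: str,
    classify: Callable[[str], str],        # stand-in for TF-IDF + Logistic Regression (D -> E -> F)
    explain: Callable[[str], list[str]],   # stand-in for reason signal extraction (C -> G)
) -> dict:
    text = normalize(message)
    label = classify(text)
    reasons = explain(text)
    # F and G merge into the final output (H).
    return {"label": label, "reasons": reasons}


result = run_inference(
    "  WIN a FREE prize   now ",
    classify=lambda t: "spam" if "free" in t else "safe",
    explain=lambda t: ["promotional or spam keywords"] if "free" in t else [],
)
print(result)  # {'label': 'spam', 'reasons': ['promotional or spam keywords']}
```

Passing the two branches in as callables keeps the wiring testable without a trained model.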

Label Composition Graph

pie title Balanced Training Set Composition
      "safe" : 7000
      "spam" : 3000
      "scam" : 3000

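The chart's counts sum to 13,000, with safe at roughly 54% and spam and scam at roughly 23% each, so "balanced" here means the classes were rebalanced rather than made equal in size. A minimal way to reach such targets is per-class downsampling with a fixed seed, sketched below. The `rebalance` helper is hypothetical, and whether Fraudrix downsamples, oversamples, or does both is an assumption; only the target counts come from the chart.

```python
import random
from collections import defaultdict

# Per-class targets taken from the composition chart above.
TARGETS = {"safe": 7000, "spam": 3000, "scam": 3000}


def rebalance(rows, targets, seed=42):
    """Downsample each class to at most its target count; never oversample."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    rng = random.Random(seed)  # fixed seed so the training set is reproducible
    out = []
    for label, cap in targets.items():
        pool = by_label.get(label, [])
        out.extend(rng.sample(pool, min(cap, len(pool))))
    rng.shuffle(out)  # avoid class-ordered rows
    return out


rows = (
    [{"message": f"safe {i}", "label": "safe"} for i in range(9000)]
    + [{"message": f"spam {i}", "label": "spam"} for i in range(4000)]
    + [{"message": f"scam {i}", "label": "scam"} for i in range(3500)]
)
balanced = rebalance(rows, TARGETS)
print(len(balanced))  # 13000
```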
Model Details

Data Engineering

Unified schema: message, label
URL phishing enrichment for stronger contextual understanding
Supplemental curated scam corpus for linguistic coverage
Deduplication and null filtering before training
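Each raw source has its own column names and label vocabulary, so the unified schema step amounts to a column and label mapping followed by null filtering and deduplication. The sketch below is a hypothetical `to_unified` helper; the source column names, the label maps, and the choice to deduplicate on case-folded message text (keeping the first occurrence) are assumptions, not details confirmed by the project.

```python
def to_unified(records, text_key, label_key, label_map):
    """Map source-specific fields to (message, label), drop nulls, and deduplicate."""
    seen = set()
    out = []
    for rec in records:
        msg, raw = rec.get(text_key), rec.get(label_key)
        if msg is None or raw is None:
            continue  # null filtering
        msg = msg.strip()
        if not msg:
            continue  # whitespace-only messages count as empty
        label = label_map.get(str(raw).strip().lower())
        if label is None:
            continue  # unknown source label: skip rather than guess
        key = msg.lower()  # assumed dedup key: case-folded text, first occurrence wins
        if key in seen:
            continue
        seen.add(key)
        out.append({"message": msg, "label": label})
    return out


raw = [
    {"text": "Hi, lunch at 1?", "category": "ham"},
    {"text": "Hi, lunch at 1?", "category": "ham"},
    {"text": None, "category": "spam"},
    {"text": "WIN cash now", "category": "spam"},
    {"text": "   ", "category": "ham"},
]
print(to_unified(raw, "text", "category", {"ham": "safe", "spam": "spam"}))
```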

Feature Logic

The text cleaner strips noise while preserving lexical meaning
TF-IDF with ngram_range=(1,2), i.e. unigrams and bigrams
max_features tuned to balance inference latency against classification quality
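To make `ngram_range=(1,2)` concrete, the pure-Python sketch below builds unigram and bigram TF-IDF vectors. It mirrors what we understand to be scikit-learn's `TfidfVectorizer` defaults (lowercasing, a token pattern of two or more word characters, smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2-normalized rows); it is an illustration of the technique, not Fraudrix's code. In scikit-learn, `max_features` additionally keeps only the terms with the highest frequency across the corpus, which the sketch omits.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # tokens of two or more word characters


def ngrams(text, n_min=1, n_max=2):
    tokens = TOKEN.findall(text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_min=1, n_max=2):
    doc_grams = [Counter(ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each key once, so this counts documents per n-gram.
    df = Counter(g for grams in doc_grams for g in grams)
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    vectors = []
    for grams in doc_grams:
        weights = {g: tf * idf[g] for g, tf in grams.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})  # L2-normalize
    return vectors, idf


print(ngrams("Verify your account now"))
```

Bigrams such as "verify your" let the model separate phrases like "verify your account" from the same words used innocently in isolation.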

Classifier

Algorithm: Logistic Regression
Multiclass target: safe, spam, scam
Trained on the rebalanced set for stronger recall on the smaller spam and scam classes
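The feature and classifier settings above map directly onto a scikit-learn pipeline. The sketch below is a minimal version under assumptions: the `build_model` helper, the six toy messages, `max_features=50000`, and `max_iter=1000` are placeholders chosen for illustration, since the tuned values used by `train.py` are not stated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_model(max_features: int = 50000):
    # Placeholder max_features; the tuned Fraudrix value is not documented.
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), max_features=max_features),
        LogisticRegression(max_iter=1000),  # multiclass over safe, spam, scam
    )


# Tiny illustrative corpus; real training uses the rebalanced dataset.
messages = [
    "Hey, are we still on for lunch tomorrow?",
    "Meeting moved to 3pm, see you there",
    "WIN a FREE cruise! Reply YES to claim your prize",
    "Exclusive discount offer, buy now and save 50%",
    "Your bank account is suspended, verify your password here",
    "Urgent: confirm your card PIN to avoid account closure",
]
labels = ["safe", "safe", "spam", "spam", "scam", "scam"]

model = build_model()
model.fit(messages, labels)
print(model.predict(["Please verify your bank password"]))
```

Logistic Regression over sparse TF-IDF features is cheap to score, which fits the stated goal of fast inference.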

Explainability Layer

At inference time, Fraudrix reports why a message looks suspicious through compact reason tags such as:

sensitive financial keywords
promotional or spam keywords
contains link

This keeps outputs auditable without adding heavyweight model overhead.
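Reason tags like these can come from simple rules that run alongside the model. The sketch below produces the three tags listed above; the keyword lists, the link pattern, and the `reason_tags` helper are hypothetical, as the actual rules Fraudrix applies are not published in this README.

```python
import re

# Hypothetical keyword lists; the real Fraudrix lists may differ.
FINANCIAL = {"bank", "account", "password", "otp", "pin", "card", "verify"}
PROMO = {"free", "win", "winner", "offer", "prize", "discount"}
LINK = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def reason_tags(message: str) -> list[str]:
    """Return human-readable reasons, in a fixed order, for why a message looks risky."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    tags = []
    if words & FINANCIAL:
        tags.append("sensitive financial keywords")
    if words & PROMO:
        tags.append("promotional or spam keywords")
    if LINK.search(message):
        tags.append("contains link")
    return tags


print(reason_tags("Verify your bank account at http://x.example"))
# ['sensitive financial keywords', 'contains link']
```

Because the tags are independent of the classifier, they explain which surface signals were present, not how much each one moved the prediction.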

Performance Snapshot

Latest benchmark from the current training pipeline:

Overall accuracy	99.19%

Class	Precision	Recall
Safe	0.99	0.99
Spam	0.98	0.99
Scam	1.00	0.99
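For reference, per-class precision is TP / (TP + FP) and recall is TP / (TP + FN), where a prediction counts as positive for the class being scored. The stdlib sketch below computes both plus overall accuracy; `per_class_metrics` is an illustrative helper, not part of the project, and scikit-learn's `classification_report` reports the same quantities. Since the table rounds to two decimals, a precision of 1.00 means at least 0.995, not necessarily zero false positives.

```python
def per_class_metrics(y_true, y_pred, labels=("safe", "spam", "scam")):
    """Return overall accuracy and a {label: (precision, recall)} dict."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, recall)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, metrics


acc, m = per_class_metrics(["safe", "safe", "spam", "scam"], ["safe", "spam", "spam", "scam"])
print(acc)  # 0.75
```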

Training and Runtime

Install dependencies:

pip install -r requirements.txt

Prepare and train:

python data/raw/fixurl.py
python src/marge_all.py
python train.py

Run interactive classifier:

python main.py

Run API service:

uvicorn api.app:app --reload

Ownership and Credit

Designed and maintained by Notookk.

Vision

Fraudrix targets production-grade fraud intelligence where precision, recall, speed, and explainability must coexist in one practical pipeline.
