Deadline: April 18, 2026
OS Note: Validated on native Ubuntu 22.04 (dual boot, not WSL)
Steganalysis classifiers that detect hidden messages concealed within ordinary text — using traditional ML, Bi-LSTM, and RoBERTa — without access to the encoding method or secret key.
This project implements three linguistic steganography encoding methods and three families of detectors to perform steganalysis.
| Method | Description |
|---|---|
| Synonym Substitution | Embeds bits by swapping words with WordNet synonyms ranked by Brown corpus frequency |
| LM-Guided Substitution | Uses GPT-2 top-k token predictions to select semantically consistent word swaps |
| Geometric (Qwen) | Encodes bits via hyperplane thresholding in Qwen 2.5-1.5B hidden states using LoRA fine-tuning |
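As a rough illustration of the synonym-substitution idea in the table above, the sketch below embeds one bit per substitutable word by choosing between two interchangeable words. The table `SYNONYM_PAIRS` and the helpers `embed_bits` and `extract_bits` are hypothetical stand-ins. The project's encoder draws candidates from WordNet and ranks them by Brown corpus frequency, which is not reproduced here.

```python
# Toy synonym table (hypothetical): each slot lists two interchangeable words.
# Index 0 encodes bit 0, index 1 encodes bit 1.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]

# Map every known word to its (pair, bit) so lookup works in both directions.
WORD_TO_SLOT = {
    word: (pair, bit) for pair in SYNONYM_PAIRS for bit, word in enumerate(pair)
}


def embed_bits(text: str, bits: list[int]) -> str:
    """Replace each substitutable word with the variant that encodes the next bit."""
    remaining = list(bits)
    out = []
    for word in text.split():
        slot = WORD_TO_SLOT.get(word.lower())
        if slot is not None and remaining:
            pair, _ = slot
            out.append(pair[remaining.pop(0)])
        else:
            out.append(word)
    if remaining:
        raise ValueError("cover text has too few substitutable words")
    return " ".join(out)


def extract_bits(text: str) -> list[int]:
    """Read one bit from every word that appears in the synonym table."""
    return [
        WORD_TO_SLOT[word.lower()][1]
        for word in text.split()
        if word.lower() in WORD_TO_SLOT
    ]


cover = "the big dog made a quick run to begin the race"
stego = embed_bits(cover, [1, 0, 1])
print(stego)                # the large dog made a quick run to start the race
print(extract_bits(stego))  # [1, 0, 1]
```

The stego sentence stays fluent, which is why the detectors have to rely on subtle shifts in word choice instead of obvious artefacts.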
| Detector | Description |
|---|---|
| Traditional ML | SGDClassifier and SVM (RBF) on TF-IDF + n-gram + statistical features (~3030 features total) |
| Bi-LSTM | Bidirectional LSTM sequence classifier with GloVe/Word2Vec embeddings, hosted on HuggingFace |
| RoBERTa / BERT | Fine-tuned transformer classifiers (RoBERTa-base, BERT-base, DistilBERT), hosted on HuggingFace |
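The statistical features mentioned in the Traditional ML row are not itemised in this README. The sketch below shows the kind of lexical statistics such a feature set might include. The function name `lexical_stats` and the specific features are assumptions for illustration, not the contents of `features.py`.

```python
import re
from collections import Counter


def lexical_stats(text: str) -> dict[str, float]:
    """Compute a few simple lexical statistics for one paragraph."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return {
            "n_tokens": 0.0,
            "type_token_ratio": 0.0,
            "mean_word_len": 0.0,
            "hapax_ratio": 0.0,
        }
    counts = Counter(tokens)
    n = len(tokens)
    return {
        "n_tokens": float(n),
        # Distinct words divided by total words.
        "type_token_ratio": len(counts) / n,
        "mean_word_len": sum(len(t) for t in tokens) / n,
        # Share of tokens whose word occurs exactly once.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n,
    }


features = lexical_stats("The cat sat on the mat")
print(features)
```

In a pipeline like the one described, values of this kind would typically be concatenated with the TF-IDF and n-gram vectors before being passed to the classifier.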
| Model | Method | Accuracy | F1 |
|---|---|---|---|
| SGDClassifier (8 epochs) | Synonym | 72.5% | 0.776 |
| SGDClassifier (8 epochs) | LM | 85.0% | 0.864 |
| SVM (RBF) | Synonym | 77.5% | 0.816 |
| SVM (RBF) | LM | 80.0% | 0.818 |
| RoBERTa | Synonym | 99.15% | — |
| RoBERTa | LM | 99.49% | — |
| RoBERTa | Geometric | 95.90% | — |
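For readers comparing the Accuracy and F1 columns, the sketch below shows how both are derived from confusion-matrix counts. The counts are hypothetical: they are one combination that is arithmetically consistent with 72.5% accuracy and an F1 of 0.776 on a 40-sample test set, and the actual confusion matrices are not given in this README.

```python
def accuracy_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, F1) for the positive (stego) class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts, not taken from the project's logs.
acc, f1 = accuracy_and_f1(tp=19, fp=10, tn=10, fn=1)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")  # accuracy=0.725, F1=0.776
```

With counts like these, F1 exceeds accuracy because recall on the stego class is high while many clean samples are misclassified as stego.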
EE6405-FinalProject/
├── app.py  # Gradio web UI (encode / metrics / detect tabs)
├── pyproject.toml  # Project metadata and dependencies
├── requirements.txt  # pip-installable packages
│
├── dataset/
│   ├── dataset.py  # StegoDataset + StegoDataModule (PyTorch Lightning)
│   ├── synonym_substitution/
│   │   ├── encoder.py  # WordNet-based encoder / decoder
│   │   ├── generate_dataset.py  # Generate 25K-row training CSV from WikiText-103
│   │   ├── create_test_samples.py  # Generate 40-sample test set from AG News
│   │   └── data/  # dataset.csv, test_dataset.csv
│   └── lm_substitution/
│       ├── encoder.py  # GPT-2-based encoder / decoder
│       ├── generate_dataset.py  # Generate 25K-row training CSV
│       ├── create_test_samples.py  # Generate 40-sample test set
│       └── data/  # dataset1.csv, test_dataset1.csv
│
├── models/
│   ├── mechanistic_probes.py  # LinearProbe + NonLinearProbe for geometric method
│   ├── lstm/
│   │   └── lstm.py  # StegoLSTM (Bi-LSTM, PyTorch Lightning)
│   ├── bert/
│   │   ├── run_all_bert.py  # Batch RoBERTa fine-tuning on all 3 datasets
│   │   ├── train_roberta.py  # RoBERTa fine-tuning (single dataset)
│   │   ├── train_distilbert.py  # DistilBERT fine-tuning variant
│   │   ├── train_bert_base.py  # BERT-base fine-tuning variant
│   │   ├── train_hybrid.py  # Experimental GPT-2 + AutoModel hybrid
│   │   ├── cross_test.py  # Cross-dataset RoBERTa evaluation
│   │   ├── attention_viz.py  # BERTViz attention head visualisation
│   │   ├── shap_viz.py  # SHAP token-level explainability
│   │   └── validate.py  # Model validation script
│   └── traditional/
│       ├── main.py  # CLI entry point: train SGD/SVM, evaluate, predict
│       ├── train.py  # SGDClassifier (8 epochs) + SVM (RBF) training
│       ├── features.py  # Feature engineering (TF-IDF, n-grams, lexical stats)
│       ├── evaluate.py  # Metrics computation (Acc, F1, P, R, confusion matrix)
│       ├── predict.py  # Single-sentence inference
│       ├── data_loader.py  # CSV loading + stratified train-test split
│       ├── cross_dataset_eval.py  # Cross-method generalisation evaluation
│       ├── save_all_models.py  # Train + serialise all 3 model bundles as .pkl
│       ├── results_logger.py  # CSV-based experiment logging
│       ├── wandb_logger.py  # Optional Weights & Biases integration
│       └── saved_models/
│           ├── trad_syn.pkl  # SGDClassifier + TF-IDF + scaler (Synonym)
│           ├── trad_lm.pkl  # SGDClassifier + TF-IDF + scaler (LM)
│           └── trad_qwen.pkl  # SGDClassifier + TF-IDF + scaler (Geometric)
│
├── trainer/
│   ├── train.py  # Bi-LSTM trainer (PyTorch Lightning + WandB)
│   ├── train_all_bilstm.py  # Train all 3 Bi-LSTM variants in sequence
│   ├── test_all_bilstm.py  # Test all 3 Bi-LSTM models, report metrics
│   ├── cross_evaluate_all_bilstm.py  # Cross-dataset Bi-LSTM evaluation
│   ├── inference.py  # Bi-LSTM inference and evaluation
│   ├── train_stego_qwen.py  # Qwen 2.5-1.5B LoRA fine-tuning (SFTTrainer)
│   ├── train_geometric_stego_detector.py  # RoBERTa fine-tuning for geometric detection
│   └── extract_hidden_states.py  # Train LinearProbe/NonLinearProbe on Qwen activations
│
└── latent_space_dataset/
    └── generate_stego_dataset.py  # Generate geometric stego text via Qwen hyperplane masking
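The geometric method referenced in the tree (hyperplane masking and the linear probes) rests on one simple operation: reading a bit from the side of a hyperplane on which a hidden-state vector falls. The sketch below shows only that thresholding step on tiny hand-written vectors. The function `hyperplane_bit`, the normal vector, and the states are hypothetical. How the project chooses its hyperplane, and how LoRA fine-tuning steers Qwen's hidden states, is not described in this README.

```python
def hyperplane_bit(hidden: list[float], normal: list[float], bias: float = 0.0) -> int:
    """Return 1 if the vector lies on the positive side of the hyperplane, else 0."""
    score = sum(h * w for h, w in zip(hidden, normal)) + bias
    return 1 if score > 0 else 0


# Hypothetical 3-dimensional example; real hidden states have far more dimensions.
normal = [0.5, -1.0, 0.25]
states = [
    [2.0, 0.5, 0.0],   # score  0.50 -> bit 1
    [0.0, 1.0, 1.0],   # score -0.75 -> bit 0
    [1.0, 0.0, -1.0],  # score  0.25 -> bit 1
]
bits = [hyperplane_bit(h, normal) for h in states]
print(bits)  # [1, 0, 1]
```

A linear probe of the kind named in `mechanistic_probes.py` would learn a weight vector and bias from labelled activations. Here they are fixed by hand.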
- OS: Ubuntu 22.04 (native installation, not WSL)
- GPU: NVIDIA GPU (tested on RTX 4060)
- RAM: 8 GB recommended
- Storage: ~50 GB free disk space
- Python: 3.10
ubuntu-drivers devices
sudo apt install nvidia-driver-590-open
sudo reboot
nvidia-smi
Also install the CUDA Toolkit appropriate for your driver version.
curl -Ls https://astral.sh/uv/install.sh | sh
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
git clone --recursive https://<username>:<PAT>@github.com/ma7hu3/EE6405-FinalProject.git
cd EE6405-FinalProject/
uv sync
uv pip install -e .
export WANDB_API_KEY="your_key"  # Weights & Biases experiment tracking
uv run python app.py
Opens at http://localhost:7860 with three tabs:
- Encoder Playground — encode a 2-character ASCII secret into any paragraph using synonym substitution or LM-guided substitution
- Model Performance — pre-computed metrics table across all models and encoding methods
- Live Detection — classify any paragraph simultaneously with Traditional ML, Bi-LSTM, and RoBERTa
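The Encoder Playground accepts a 2-character ASCII secret, which amounts to 16 bits of payload if each character is stored as one byte. The sketch below shows one way to convert such a secret to a bit list and back. The 8-bits-per-character, most-significant-bit-first layout and the helper names are assumptions; the encoders in this repository may pack bits differently.

```python
def secret_to_bits(secret: str) -> list[int]:
    """Convert an ASCII string to a flat list of bits, MSB first."""
    data = secret.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_secret(bits: list[int]) -> str:
    """Inverse of secret_to_bits."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    chars = []
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        chars.append(chr(value))
    return "".join(chars)


payload = secret_to_bits("Hi")
print(len(payload))             # 16
print(bits_to_secret(payload))  # Hi
```

Under this assumption, a cover paragraph needs at least 16 usable substitution positions to carry the full secret.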
# Interactive: train SGD/SVM, evaluate, manual prediction CLI
uv run python models/traditional/main.py
# Batch: train and serialise all 3 .pkl bundles (syn, lm, geometric)
uv run python models/traditional/save_all_models.py
# Single variant
uv run python -m trainer.train --embed glove
uv run python -m trainer.train --embed word2vec
# All 3 variants (synonym, lm, geometric) in sequence
uv run python trainer/train_all_bilstm.py
# Test all trained Bi-LSTM models
uv run python trainer/test_all_bilstm.py
# Cross-dataset evaluation
uv run python trainer/cross_evaluate_all_bilstm.py
# Fine-tune RoBERTa on a single dataset
uv run python models/bert/train_roberta.py
# Fine-tune BERT-base
uv run python models/bert/train_bert_base.py
# Fine-tune DistilBERT
uv run python models/bert/train_distilbert.py
# Batch: fine-tune RoBERTa on all 3 datasets
uv run python models/bert/run_all_bert.py
# Cross-dataset RoBERTa evaluation
uv run python models/bert/cross_test.py
# Fine-tune RoBERTa specifically for geometric method
uv run python trainer/train_geometric_stego_detector.py
# Fine-tune Qwen 2.5-1.5B with LoRA
uv run python trainer/train_stego_qwen.py
# Generate geometric stego dataset
uv run python latent_space_dataset/generate_stego_dataset.py
# Train LinearProbe / NonLinearProbe on Qwen hidden states
uv run python trainer/extract_hidden_states.py
# Synonym substitution (25K rows from WikiText-103)
uv run python dataset/synonym_substitution/generate_dataset.py
uv run python dataset/synonym_substitution/create_test_samples.py
# LM-guided substitution (25K rows from WikiText-103)
uv run python dataset/lm_substitution/generate_dataset.py
uv run python dataset/lm_substitution/create_test_samples.py
# BERTViz attention head visualisation
uv run python models/bert/attention_viz.py
# SHAP token-level attribution
uv run python models/bert/shap_viz.py
Training data is sourced from WikiText-103 and AG News via HuggingFace datasets, downloaded automatically on first run. All CSV files share a common schema:
| Column | Type | Description |
|---|---|---|
| `paragraph` | str | Text sample (clean or stego) |
| `label` | int | 0 = clean, 1 = stego |
| `hidden_message` | str | Embedded secret (empty for clean samples) |
| `encoding_method` | str | `synonym_substitution` or `lm_substitution` |
| `synonym_map` | str (JSON) | Serialised encoding key mapping |
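A minimal sketch of reading a file with this schema using only the standard library is shown below. The two sample rows, the contents of `synonym_map`, and the helper `load_rows` are invented for illustration. The project's own loader lives in `data_loader.py` and may differ.

```python
import csv
import io
import json

# Hypothetical two-row sample that follows the column layout above.
SAMPLE_CSV = '''paragraph,label,hidden_message,encoding_method,synonym_map
"The large dog ran home.",1,Hi,synonym_substitution,"{""big"": ""large""}"
"The big dog ran home.",0,,synonym_substitution,{}
'''


def load_rows(handle) -> list[dict]:
    """Parse rows, converting label to int and synonym_map from JSON."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "paragraph": raw["paragraph"],
            "label": int(raw["label"]),
            "hidden_message": raw["hidden_message"],
            "encoding_method": raw["encoding_method"],
            # Treat an empty cell as an empty mapping.
            "synonym_map": json.loads(raw["synonym_map"] or "{}"),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
stego_rows = [r for r in rows if r["label"] == 1]
print(len(rows), len(stego_rows))  # 2 1
```

For a detector that must work without the key, only `paragraph` and `label` would be used as model inputs. The remaining columns describe how each sample was produced.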
Pre-generated dataset files:
| File | Description |
|---|---|
| `dataset/synonym_substitution/data/dataset.csv` | Synonym training set (25K rows) |
| `dataset/lm_substitution/data/dataset1.csv` | LM-guided training set (25K rows) |
| `dataset/synonym_substitution/data/test_dataset.csv` | Synonym test set (40 rows) |
| `dataset/lm_substitution/data/test_dataset1.csv` | LM test set (40 rows) |
| `dataset/qwen_finetuned_test_dataset.csv` | Geometric method test set |
Bi-LSTM (loaded in app.py on demand):
- jovima/bilstm-stego-synonym
- jovima/bilstm-stego-lm
- jovima/bilstm-stego-geometric
RoBERTa (loaded in app.py on demand):
- narensnp/roberta-stego-synonym
- narensnp/roberta-stego-lm
- narensnp/roberta-stego-geometric
All models are loaded lazily and cached in memory after the first inference call.
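One common way to get this load-once behaviour is to memoise the loader. The sketch below uses `functools.lru_cache` with a hypothetical stand-in, `_load_from_hub`, in place of a real download. It illustrates the pattern only and is not necessarily how `app.py` implements it.

```python
from functools import lru_cache

LOAD_CALLS: list[str] = []  # records how often the expensive path runs


def _load_from_hub(repo_id: str) -> dict:
    """Hypothetical stand-in for an expensive model download and initialisation."""
    LOAD_CALLS.append(repo_id)
    return {"repo_id": repo_id}


@lru_cache(maxsize=None)
def get_model(repo_id: str) -> dict:
    """Load the model on first request and reuse the same object afterwards."""
    return _load_from_hub(repo_id)


first = get_model("narensnp/roberta-stego-synonym")
second = get_model("narensnp/roberta-stego-synonym")
print(first is second)  # True
print(len(LOAD_CALLS))  # 1
```

With this pattern the first inference call pays the loading cost, and later calls for the same repository reuse the cached object.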
| Category | Libraries |
|---|---|
| Deep Learning | PyTorch ≥2.5.1, PyTorch Lightning ≥2.4.0, TorchMetrics ≥1.0.0 |
| Transformers | HuggingFace Transformers ==4.43.0, PEFT ≥0.18.1 (LoRA), TRL ==0.9.6 (SFTTrainer), Accelerate ≥1.1.0, Evaluate ≥0.4.6 |
| Datasets | HuggingFace Datasets ≥4.8.4 (WikiText-103, AG News) |
| Traditional ML | scikit-learn ≥1.3.0 (SGD, SVM, TF-IDF, metrics), SciPy ≥1.11 |
| NLP | NLTK ≥3.7 (WordNet, Brown corpus, POS), spaCy ≥3.7, Gensim ≥4.3.2 (GloVe/Word2Vec), lemminflect ≥0.2 |
| Explainability | SHAP ≥0.43, BERTViz ≥1.4.0 |
| Visualisation | Matplotlib ≥3.5.0, Seaborn ≥0.11.0 |
| Web UI | Gradio ≥4.0.0 |
| Experiment Tracking | Weights & Biases ≥0.16.0 (optional) |
| Serialisation | joblib ≥1.1.0 |
| Utilities | tqdm ≥4.66.0, python-dotenv ≥1.0.0 |