PepiPetrov/MalwareNet


MalwareNet

A compact, high-performance neural network for PE malware classification using the EMBER feature set. MalwareNet applies a Hierarchical Gated Architecture across five semantically distinct feature branches. Its AUC-ROC stays at or above 0.9910 under FGSM and PGD attacks at ε=0.1, and at 273,452 parameters the model is small enough to score a file in about 0.041 ms on commodity CPU hardware.


Performance at a Glance

| Metric | Value |
| --- | --- |
| Parameters | 273,452 |
| AUC-ROC (test set) | 0.9911 |
| Expected Calibration Error (ECE) | 0.0079 |
| AUC under PGD attack (ε=0.1) | ≥ 0.9910 |
| AUC under FGSM attack (ε=0.1) | ≥ 0.9910 |
| Best LTH ticket (val AUC / sparsity) | 0.9904 @ 48.18% sparse |
| Inference latency (CPU, single file) | 0.041 ms |
| Throughput (CPU) | ~24,400 files/sec |

Architecture Summary

MalwareNet splits the 2,568-dimensional EMBER v3 feature vector into five semantically meaningful groups, each processed by an independent GatedFeatureBlock. Gated representations are fused via a FusionBlock that assigns learnable per-group importance weights before projecting to a binary logit. A Platt Scaling post-processor is baked into the exported ONNX graph so every inference call returns a calibrated probability.

flowchart TD

    A[PE File: .exe / .dll] --> B[EMBER2024 Feature Extraction]
    B --> C[2568-d EMBER Feature Vector]

    C --> G[global features<br/>156-d]
    C --> BY[byte histogram/features<br/>512-d]
    C --> S[strings features<br/>177-d]
    C --> SE[sections features<br/>224-d]
    C --> I[imports features<br/>1411-d]

    G --> G1[GatedFeatureBlock<br/>16-d]
    BY --> B1[GatedFeatureBlock<br/>32-d]
    S --> S1[GatedFeatureBlock<br/>32-d]
    SE --> SE1[GatedFeatureBlock<br/>32-d]
    I --> I1[GatedFeatureBlock<br/>64-d]

    G1 --> F[FusionBlock<br/>learnable cross-group weighting]
    B1 --> F
    S1 --> F
    SE1 --> F
    I1 --> F

    F --> L[Linear Projection<br/>96 → 1]
    L --> P[PlattScaler + Sigmoid]
    P --> O[Calibrated Malware Probability<br/>0.0 = benign / 1.0 = malware]

    O --> R[Rust Desktop App<br/>0.041ms CPU inference<br/>no Python runtime]


See ARCHITECTURE.md for a full technical description.
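The gating and fusion idea can be pictured with a small pure-Python sketch. The exact layer definitions live in model/model_utils.py and are not reproduced here, so the gate form below (ReLU hidden units multiplied by a temperature-scaled sigmoid gate) and the softmax fusion weights are assumptions for illustration only, and the group sizes are toy values rather than the real EMBER dimensions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_block(x, w_h, b_h, w_g, b_g, temperature=2.079):
    """ASSUMED gate form: ReLU(hidden) * sigmoid(gate_logits / temperature)."""
    hidden = [max(0.0, h) for h in linear(x, w_h, b_h)]
    gates = [sigmoid(g / temperature) for g in linear(x, w_g, b_g)]
    return [h * g for h, g in zip(hidden, gates)], gates


def fuse(group_outputs, group_scores):
    """ASSUMED fusion: softmax over per-group scores, then scaled concatenation."""
    peak = max(group_scores)
    exps = [math.exp(s - peak) for s in group_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [w * v for w, out in zip(weights, group_outputs) for v in out]
    return fused, weights


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Toy (input_dim, output_dim) pairs; the real groups are far wider.
group_shapes = [(6, 2), (8, 4), (5, 4)]
outputs, all_gates = [], []
for in_dim, out_dim in group_shapes:
    x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
    out, gates = gated_block(
        x,
        random_matrix(rng, out_dim, in_dim), [0.0] * out_dim,
        random_matrix(rng, out_dim, in_dim), [0.0] * out_dim,
    )
    outputs.append(out)
    all_gates.extend(gates)

# Hypothetical learnable scores, one per feature group.
fused, group_weights = fuse(outputs, [0.0, 0.5, 1.0])
print(len(fused), [round(w, 3) for w in group_weights])
```

Dividing the gate logits by a temperature above 1 flattens the sigmoid, so in this sketch gates start closer to 0.5 and saturate more slowly.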


Quick Start — ONNX Inference

The exported ONNX model (model_artifacts/malware_model.onnx) already includes sigmoid activation and Platt scaling. It accepts a raw float32 feature vector of dimension 2,568 and returns a calibrated malware probability.

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_artifacts/malware_model.onnx",
    providers=["CPUExecutionProvider"],
)

# Placeholder input: replace with a real EMBER feature vector.
# Expected: np.ndarray of shape (1, 2568), dtype=float32
features = np.random.randn(1, 2568).astype(np.float32)

probability = session.run(
    ["probability"],
    {"features": features},
)[0]          # shape (1, 1)

print(f"Malware probability: {probability[0, 0]:.4f}")
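For readers unfamiliar with Platt scaling, the calibration step that the ONNX graph applies amounts to passing the raw logit through `sigmoid(a * logit + b)` with two fitted scalars. The sketch below fits `a` and `b` with plain gradient descent on made-up validation logits. The repository's calibration.py uses LBFGS instead, and none of the numbers here come from the real model.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def platt_probability(logit, a, b):
    """Calibrated probability from a raw logit."""
    return sigmoid(a * logit + b)


def nll(logits, labels, a, b):
    """Mean negative log-likelihood of the calibrated probabilities."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(platt_probability(z, a, b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_platt(logits, labels, lr=0.1, steps=2000):
    """Fit a and b by gradient descent (a stand-in for LBFGS)."""
    a, b = 1.0, 0.0
    n = len(logits)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z, y in zip(logits, labels):
            err = platt_probability(z, a, b) - y
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up validation logits and labels (1 = malware), deliberately overconfident.
val_logits = [-6.0, -4.0, -3.0, -1.0, 2.0, 1.0, 3.0, 5.0, 6.0, -2.0]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

a, b = fit_platt(val_logits, val_labels)
print(f"a={a:.3f} b={b:.3f}")
print(f"NLL before: {nll(val_logits, val_labels, 1.0, 0.0):.4f}")
print(f"NLL after:  {nll(val_logits, val_labels, a, b):.4f}")
```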

Desktop App

A native egui desktop application lives in desktop_app/. It embeds the model binary at compile time, extracts EMBER features directly from a user-selected PE file, and displays a calibrated risk score with a visual progress bar. No Python or other external runtime is required at run time.

Prerequisites

  • Rust toolchain (stable): https://rustup.rs
  • The trained ONNX model must be present at model_artifacts/malware_model.onnx before building — it is compiled into the binary via include_bytes!.

Build

cd desktop_app
cargo build --release

The binary is written to desktop_app/target/release/desktop_app (Linux/macOS) or desktop_app\target\release\desktop_app.exe (Windows).

Run

# Linux / macOS
./desktop_app/target/release/desktop_app

# Windows
desktop_app\target\release\desktop_app.exe

The app opens a file picker. Select any PE (.exe, .dll) and the model scores it instantly.

Pre-built binaries

CI builds artifacts for all three platforms on every push to main. Download from the latest Actions run:

| Platform | Artifact |
| --- | --- |
| Linux x64 | desktop_app-linux-x64 |
| macOS x64 | desktop_app-macos-x64 |
| Windows x64 | desktop_app-windows-x64 |

Training Pipeline

Run the scripts in this order:

Data: Download the EMBER 2024 dataset from FutureComputing4AI/EMBER2024. Only the PE files (Win32, Win64, .NET) are required. After installing thrember, run:

import thrember
thrember.download_dataset("./ember2024data/", file_type="PE")
thrember.create_vectorized_features("./ember2024data/")

Pass the data directory via --data-dir to any script that requires it.

Step 1 — Hyperparameter Search (optional)

python hyperparameter_tune.py
python hyperparameter_tune.py --data-dir /path/to/data --n-trials 50

Uses Optuna to search over learning rate, focal loss parameters, gate temperature, dropout, and weight decay. Results are persisted to tuning.db so searches can be resumed across sessions. Best parameters are printed at the end and can be passed directly to train_export.py.

Step 2 — Train, Calibrate, Export

python train_export.py
python train_export.py --data-dir ./ember2024data/ --output-dir ./model_artifacts/
python train_export.py --lr 4e-3 --max-epochs 10

Trains the model, fits Platt scaling on the validation set, and exports a calibrated ONNX model to model_artifacts/. Key defaults (from the Optuna search):

| Hyperparameter | Value |
| --- | --- |
| Learning rate | 2.055e-3 |
| Focal loss γ | 3.761 |
| Focal loss α | 0.199 |
| Gate temperature | 2.079 |
| Dropout | 0.058 |
| Weight decay | 3.675e-5 |
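To see what the focal loss parameters do, here is a small sketch using the standard binary focal loss, `-α_t (1 - p_t)^γ log(p_t)`. It assumes the common convention where α weights the positive (malware) class and 1 - α the negative class; the FocalLoss class in model/model_utils.py may differ in detail.

```python
import math


def focal_loss(prob, label, gamma=3.761, alpha=0.199):
    """Binary focal loss for one sample; `prob` is the predicted P(malware)."""
    eps = 1e-12
    p_t = prob if label == 1 else 1.0 - prob          # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha    # ASSUMED class-weight convention
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confidently correct) versus a hard positive.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)

# Plain cross-entropy on the same two samples, for comparison.
bce_easy = -math.log(0.95)
bce_hard = -math.log(0.30)

print(f"focal easy/hard ratio: {easy / hard:.2e}")
print(f"BCE   easy/hard ratio: {bce_easy / bce_hard:.2e}")
```

With γ near 3.8, well-classified samples contribute far less to the loss than they would under plain cross-entropy, so training focuses on the hard cases.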

The learning rate schedule uses a linear warmup followed by cosine annealing with η_min = 1e-5. Training checkpoints the best val_auc epoch.
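The shape of that schedule can be sketched as a single function of the step index. The peak learning rate and η_min come from the text above; the warmup length, the total step count, and the choice to start the warmup at `peak_lr / warmup_steps` are assumptions made for illustration.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr=2.055e-3, eta_min=1e-5):
    """Linear warmup to peak_lr, then cosine annealing down to eta_min."""
    if step < warmup_steps:
        # ASSUMED warmup shape: ramp linearly from peak_lr / warmup_steps.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return eta_min + 0.5 * (peak_lr - eta_min) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000   # hypothetical
WARMUP_STEPS = 100   # hypothetical

for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, TOTAL_STEPS, WARMUP_STEPS):.3e}")
```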

Step 3 — Test Set Evaluation

python eval.py
python eval.py --state-dict ./model_artifacts/malware_net_calibrated_state_dict.pt
python eval.py --no-plots

Loads the calibrated state dict, runs inference on the held-out test set, and prints ECE, AUC-ROC, TPR at fixed FPR thresholds, and a full classification report. Saves evaluation plots to --output-dir (default: model_artifacts/) unless --no-plots is passed.
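Two of those metrics are easy to misread, so a rough sketch of how they can be computed may help. It assumes equal-width bins over the predicted malware probability for ECE and a strict-threshold rule for TPR at a fixed FPR; the binning and tie handling in model/evaluation.py may differ, and the scores below are made up.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted probability and observed positive rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_prob - frac_pos)
    return ece


def tpr_at_fpr(probs, labels, target_fpr):
    """Fraction of malware flagged when at most target_fpr of benign files are."""
    negatives = sorted((p for p, y in zip(probs, labels) if y == 0), reverse=True)
    positives = [p for p, y in zip(probs, labels) if y == 1]
    allowed_fp = int(target_fpr * len(negatives))
    # Flag only scores strictly above the first benign score we may not exceed.
    threshold = negatives[allowed_fp] if allowed_fp < len(negatives) else float("-inf")
    return sum(p > threshold for p in positives) / len(positives)


# Made-up scores: label 1 = malware, 0 = benign.
probs = [0.03, 0.12, 0.22, 0.35, 0.61, 0.72, 0.81, 0.88, 0.95, 0.99]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

ece = expected_calibration_error(probs, labels)
print(f"ECE: {ece:.4f}")
print(f"TPR @ FPR=0.0: {tpr_at_fpr(probs, labels, 0.0):.2f}")
print(f"TPR @ FPR=0.2: {tpr_at_fpr(probs, labels, 0.2):.2f}")
```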

Step 4 — Adversarial Robustness Evaluation

python attack.py
python attack.py --state-dict ./model_artifacts/malware_net_calibrated_state_dict.pt \
                 --data-dir ./ember2024data/ --num-samples 10000

Evaluates against FGSM and PGD attacks at ε ∈ {0.001, 0.005, 0.01, 0.05, 0.1} using the Adversarial Robustness Toolbox. Results are written to model_artifacts/adversarial_robustness_report.txt. Under PGD (10 steps, ε=0.1), AUC degradation is < 0.02% relative to the clean baseline. See ARCHITECTURE.md for why the gating mechanism structurally limits adversarial transferability.
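For intuition about what the two attacks do, the sketch below applies FGSM and PGD to a toy logistic-regression scorer in pure Python. It does not use the Adversarial Robustness Toolbox or the real model; the weights, the sample, and the PGD step size of ε/4 are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Toy scorer: P(malware) from a linear model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """One step of size eps along the sign of the loss gradient (L-inf attack)."""
    p = predict(x, w, b)
    # For binary cross-entropy on a linear model, dLoss/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def pgd(x, y, w, b, eps, steps=10):
    """Repeated small FGSM steps, projected back into the eps-ball around x."""
    step_size = eps / 4   # hypothetical step size
    x_adv = list(x)
    for _ in range(steps):
        x_adv = fgsm(x_adv, y, w, b, step_size)
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [0.8, -0.5, 0.3], 0.1          # hypothetical model
x, y = [1.0, -1.0, 0.5], 1            # hypothetical malware sample
EPS = 0.1

p_clean = predict(x, w, b)
x_fgsm = fgsm(x, y, w, b, EPS)
x_pgd = pgd(x, y, w, b, EPS)
p_fgsm = predict(x_fgsm, w, b)
p_pgd = predict(x_pgd, w, b)
print(f"clean={p_clean:.4f} fgsm={p_fgsm:.4f} pgd={p_pgd:.4f}")
```

Both attacks perturb the feature vector, not the PE file itself, which is why Step 7 looks at problem-space mutations separately.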

To compare the best dense model against the best Lottery Ticket sparse model with the same FGSM/PGD protocol:

python attack_best_models.py
python attack_best_models.py --regular-state-dict ./model_artifacts/malware_net_calibrated_state_dict.pt \
                             --lth-state-dict ./model_artifacts/lth_best_ticket_iter04_auc0.9904_sp0.48_calibrated_state_dict.pt

This script auto-discovers the best dense export and the highest-AUC LTH export if explicit paths are not provided. It writes:

  • model_artifacts/adversarial_robustness_best_vs_lth_report.txt
  • model_artifacts/adversarial_robustness_best_vs_lth_curves.png
  • model_artifacts/adversarial_robustness_best_vs_lth_deltas.png

Step 5 — Inference Latency Benchmark

python benchmark.py

No arguments. Runs against model_artifacts/malware_model.onnx with 2,000 warmup iterations followed by 20,000 timed single-file inferences, then reports mean, P95, and P99 latency and throughput.
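The measurement pattern (warm up, time each call, then summarize) can be sketched without the ONNX model. The workload below is a dummy function, the iteration counts are scaled down so the example runs quickly, and the nearest-rank percentile is an assumption about how P95 and P99 might be computed.

```python
import math
import statistics
import time


def benchmark(fn, warmup=2_000, iterations=20_000):
    """Time single calls to fn and report mean, P95, P99 (ms) and throughput."""
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1_000.0)
    timings_ms.sort()

    def percentile(q):
        # Nearest-rank percentile on the sorted timings.
        rank = max(1, math.ceil(q / 100.0 * len(timings_ms)))
        return timings_ms[rank - 1]

    mean_ms = statistics.fmean(timings_ms)
    return {
        "mean_ms": mean_ms,
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
        "throughput_per_s": 1_000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def dummy_inference():
    # Stand-in for session.run(...); replace with the real call.
    return sum(i * i for i in range(200))


stats = benchmark(dummy_inference, warmup=200, iterations=2_000)
print({k: round(v, 5) for k, v in stats.items()})

# Sanity check on the headline numbers: 0.041 ms per file implies this rate.
implied_throughput = 1_000.0 / 0.041
print(f"{implied_throughput:,.0f} files/sec")
```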

Step 6 — Lottery Ticket Hypothesis Pruning

python lth_malwarenet.py

Runs iterative magnitude pruning starting from the best dense checkpoint and rewinds surviving weights between rounds. The 2026-04-13 run used 10 pruning iterations with 5 fine-tuning epochs per iteration, pruning 20% of MLP/fusion weights and 10% of gate/head weights per round.
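The prune-and-rewind loop can be illustrated on a flat list of weights. This is a minimal sketch: it assumes the per-round rate applies to the weights still alive, that rewinding restores survivors to their values at the starting checkpoint, and it replaces fine-tuning with random noise. The real lth_malwarenet.py may differ on all three points.

```python
import random


def prune_smallest(weights, mask, fraction):
    """Mask out the smallest-magnitude `fraction` of the weights still alive."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(alive) * fraction)
    alive.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = False
    return new_mask


def fine_tune(weights, mask, rng):
    """Stand-in for training: nudge surviving weights, keep pruned ones at zero."""
    return [w + rng.gauss(0.0, 0.01) if keep else 0.0 for w, keep in zip(weights, mask)]


rng = random.Random(0)
rewind_point = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # hypothetical checkpoint
mask = [True] * len(rewind_point)
weights = list(rewind_point)
sparsity_by_round = []

for round_idx in range(1, 5):
    weights = fine_tune(weights, mask, rng)
    mask = prune_smallest(weights, mask, fraction=0.20)
    # Rewind: survivors return to their checkpoint values, pruned weights stay zero.
    weights = [w0 if keep else 0.0 for w0, keep in zip(rewind_point, mask)]
    sparsity_by_round.append(1.0 - sum(mask) / len(mask))

print([f"{s:.1%}" for s in sparsity_by_round])
```

Because each round removes a share of what is left, sparsity compounds across rounds instead of growing linearly.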

Best result from the latest run:

| Iteration | Val AUC | Sparsity |
| --- | --- | --- |
| 4 | 0.9904 | 48.18% |

The best sparse ticket slightly exceeds the dense baseline (0.9899) while removing nearly half of all weights. Artifacts are written to lth_artifacts/, and the exported best checkpoint is model_artifacts/lth_best_ticket_iter04_auc0.9904_sp0.48_calibrated_state_dict.pt.

Step 7 — Problem-Space PE Mutation PoC

python pe_mutate_lief.py input.exe output.exe

This script is a small proof of concept for problem-space malware-ML experiments. It uses LIEF to mutate a PE directly, re-extracts EMBER features with thrember, scores each candidate with the exported ONNX model, and greedily keeps mutations that improve the requested objective. The current candidate families are intentionally simple:

  • add a section
  • rename a section
  • add or edit CodeView/PDB debug metadata

By default it tries to decrease the malware probability. This is not a full feasible-attack framework. A real problem-space attack still needs stronger semantics-preserving mutation policies and a functionality oracle.
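The greedy search itself is simple and can be sketched with toy stand-ins. Nothing below touches a real PE file, LIEF, thrember, or the ONNX model: the sample is a plain dict, the three mutation functions only edit that dict, and `toy_score` is an invented scoring rule used purely to exercise the loop.

```python
def add_section(sample):
    return {**sample, "n_sections": sample["n_sections"] + 1}


def rename_section(sample):
    # In this toy, renaming changes nothing the scorer looks at.
    return dict(sample)


def add_pdb(sample):
    return {**sample, "has_pdb": True}


def toy_score(sample):
    """Invented stand-in for 'extract features, then run the model'."""
    return max(0.0, 0.9 - 0.1 * sample["has_pdb"] - 0.02 * min(sample["n_sections"], 5))


def greedy_search(sample, mutations, score, rounds=5):
    """Each round, keep the single mutation that lowers the score the most."""
    best, best_score = sample, score(sample)
    history = [best_score]
    for _ in range(rounds):
        scored = [(score(candidate), candidate)
                  for candidate in (mutate(best) for mutate in mutations)]
        cand_score, candidate = min(scored, key=lambda pair: pair[0])
        if cand_score >= best_score:
            break   # no mutation improves the objective; stop early
        best, best_score = candidate, cand_score
        history.append(best_score)
    return best, best_score, history


start = {"n_sections": 3, "has_pdb": False}
best, best_score, history = greedy_search(
    start, [add_section, rename_section, add_pdb], toy_score
)
print(best, [round(s, 2) for s in history])
```

A greedy loop like this can stall in a local optimum, and it says nothing about whether the mutated file still runs, which is the gap the functionality oracle mentioned above would have to fill.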


Repository Layout

.
├── model/
│   ├── model.py          # MalwareNet + MalwareNetLightning
│   ├── model_utils.py    # GatedFeatureBlock, FusionBlock, FocalLoss
│   ├── dims.py           # EMBER v3 feature dimensions and group mapping
│   ├── dataset.py        # MemmapDataset, EmberMemmapDataModule
│   ├── calibration.py    # Platt scaling (LBFGS)
│   ├── export.py         # ONNX export + validation utilities
│   ├── attacks.py        # ART-based FGSM/PGD evaluation
│   ├── evaluation.py     # ROC, calibration, classification metrics
│   ├── train.py          # Callback and Trainer factory functions
│   └── seed.py           # RNG seeding utilities
├── model_artifacts/      # Saved checkpoints, ONNX model, eval outputs
├── desktop_app/          # Rust egui desktop application
├── train_export.py       # Train → calibrate → export pipeline
├── eval.py               # Test-set evaluation
├── attack.py             # Adversarial robustness evaluation
├── attack_best_models.py # Dense vs LTH adversarial robustness comparison
├── benchmark.py          # ONNX inference speed benchmark
├── lth_malwarenet.py     # Lottery Ticket Hypothesis pruning experiment
├── lth_artifacts/        # Per-iteration sparse checkpoints and logs
├── hyperparameter_tune.py # Optuna hyperparameter search
└── ARCHITECTURE.md       # Full technical architecture document

Dependencies

Core ML stack: torch, pytorch-lightning, torchmetrics, onnx, onnxruntime, optuna, adversarial-robustness-toolbox.

Feature extraction: thrember (EMBER v3 feature extractor — installed from FutureComputing4AI/EMBER2024).

Install Python dependencies:

pip install -r requirements.txt
