reuAC/reFlow

reFlow

A Metal Soul In My Hand — A feature-decoupled Transformer architecture with native interpretability.

reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a Recipe Matrix $W_{recipe} \in \mathbb{R}^{V \times S}$ and a Signal Basis Matrix $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same factored product $W_{recipe} \times W_{basis}$ serves as both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
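
The factored, tied embedding can be sketched in a few lines of PyTorch. The class and method names below are illustrative, not the repository's actual API:

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Sketch of reFlow's factored embedding: E = W_recipe @ W_basis.

    V: vocab size, S: number of signals, d: model dimension.
    Illustrative names, not the repository's actual API.
    """
    def __init__(self, V: int, S: int, d: int):
        super().__init__()
        self.W_recipe = nn.Parameter(torch.randn(V, S) * 0.02)  # per-token recipes
        self.W_basis = nn.Parameter(torch.randn(S, d) * 0.02)   # shared signal bases

    def embed(self, idx: torch.Tensor) -> torch.Tensor:
        # Input embedding: look up each token's recipe row, mix the signal bases.
        return self.W_recipe[idx] @ self.W_basis            # (..., d)

    def logits(self, h: torch.Tensor) -> torch.Tensor:
        # Output projection reuses the same two factors (no separate LM head).
        return h @ self.W_basis.T @ self.W_recipe.T         # (..., V)
```

Because both directions share $W_{recipe}$ and $W_{basis}$, gradients from the output side shape the same signal bases used at the input, which is what closes the signal-manifold loop.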

Key Results

Convergence. At matched depth and scale (36 layers, ~515M parameters), reFlow-1-Big achieves a validation loss within ~1% of GPT-2-New (514M). Three scale points — Small (46.47M), reFlow-1 (463.67M), and Big (515.06M) — follow the expected scaling law (val loss: 3.55 → 3.01 → 2.92).

Emergent Interpretable Structure (pure language modeling objective, no auxiliary loss):

  • Recipe-space semantic algebra: king + woman − man → queen (rank #1), 3/3 tests passed
  • Natural sparsity: each token activates ~11% of signals (mean 117/1024), Gini coefficient 0.085
  • Causal traceability: single-signal ablation collapses target probability from 8.31% to 0.03%
  • Information crystallization boundary: semantic interventions are effective at L0–L12 but inert beyond L18
  • Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
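
The recipe-space algebra test above amounts to a cosine nearest-neighbor query over recipe rows. A minimal sketch with a toy recipe matrix (an illustrative helper, not the code in experiment.py):

```python
import numpy as np

def analogy(recipes, vocab, a, b, c, topk=1):
    """Nearest neighbors of recipe(a) - recipe(b) + recipe(c) by cosine similarity.

    recipes: (V, S) recipe matrix; vocab: list of token strings.
    Illustrative helper, not the repository's experiment code.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    q = recipes[idx[a]] - recipes[idx[b]] + recipes[idx[c]]
    sims = recipes @ q / (np.linalg.norm(recipes, axis=1) * np.linalg.norm(q) + 1e-8)
    for w in (a, b, c):
        sims[idx[w]] = -np.inf  # exclude the query words themselves
    return [vocab[i] for i in np.argsort(-sims)[:topk]]

# Toy recipes where the gender/royalty offsets are consistent:
vocab = ["king", "queen", "man", "woman", "apple"]
recipes = np.array([
    [1.0, 1.0, 0.0],   # king   = royal + male
    [1.0, 0.0, 1.0],   # queen  = royal + female
    [0.0, 1.0, 0.0],   # man    = male
    [0.0, 0.0, 1.0],   # woman  = female
    [0.3, 0.1, 0.1],   # apple  = unrelated
])
print(analogy(recipes, vocab, "king", "man", "woman"))  # -> ['queen']
```

In reFlow the same query runs over the learned $W_{recipe}$ rows rather than a hand-built toy matrix.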

Paper: English (PDF) | Chinese (PDF) — theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.

Pretrained Weights: HuggingFace

Project Structure

reFlow/
├── train.py              # Training script (single GPU / DDP)
├── sample.py             # Text generation from trained models
├── experiment.py         # 12-experiment interpretability suite (Chinese)
├── experiment_en.py      # 12-experiment interpretability suite (English)
├── check.py              # Checkpoint parameter inspector
├── bench.py              # Performance benchmarking
├── models/
│   ├── gpt2.py           # Standard GPT-2 baseline
│   ├── gpt2-new.py       # Modernized GPT-2 (RoPE + SwiGLU + RMSNorm)
│   ├── reflow.py         # reFlow base architecture
│   ├── reflow-topk.py    # reFlow with ReLU + Top-K hard sparsity
│   └── reflow-lite.py    # reFlow with GQA + reduced MLP
├── config/               # Training / sampling / eval configurations
├── data/
│   ├── openwebtext/      # OpenWebText dataset preparation
│   └── sft-lima/         # LIMA SFT dataset preparation
└── out/                  # Checkpoints and experiment reports
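
The hard-sparsity variant (models/reflow-topk.py) keeps only the k largest ReLU-activated recipe coefficients per token. A minimal sketch of that operation (illustrative; see models/reflow-topk.py for the actual implementation):

```python
import torch

def topk_recipe(recipe: torch.Tensor, k: int = 64) -> torch.Tensor:
    """ReLU + Top-K hard sparsity: zero all but the k largest
    non-negative signal coefficients along the last dimension.
    Illustrative sketch, not the repository's exact code."""
    r = torch.relu(recipe)                     # non-negative coefficients
    vals, idx = torch.topk(r, k, dim=-1)       # k largest per token
    return torch.zeros_like(r).scatter(-1, idx, vals)
```

Per the ablation results above, this hard constraint is exactly what destroys the emergent recipe-space semantics, so it serves as a negative control rather than a recommended setting.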

Installation

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (tested on 4× Tesla T4)

1. PyTorch (CUDA 12.8)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

Adjust the CUDA version in the URL to match your driver. See PyTorch Get Started.

2. Core Dependencies

pip install datasets tiktoken wandb tqdm

3. Experiment Suite Dependencies

The interpretability experiments (experiment.py) require additional packages:

pip install numpy matplotlib seaborn scikit-learn scipy adjustText

Quick Install (All-in-One)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install datasets tiktoken wandb tqdm numpy matplotlib seaborn scikit-learn scipy adjustText

Data Preparation

OpenWebText

python data/openwebtext/prepare.py

This downloads the OpenWebText corpus (~54 GB) and tokenizes it with the GPT-2 BPE tokenizer. Output: data/openwebtext/train.bin (~17 GB, ~9B tokens) and val.bin.

Training

All configurations are in config/. No CLI overrides — all hyperparameters must be set in the config file.

Single GPU

python train.py config/train_reflow_1.py

Multi-GPU (DDP)

torchrun --standalone --nproc_per_node=4 train.py config/train_reflow_1.py

Available Training Configs

Config                      Architecture  Layers  Params    Notes
train_gpt2.py               GPT-2         36      505.62M   Standard baseline
train_gpt2_new.py           GPT-2-New     36      514.01M   + RoPE, SwiGLU, RMSNorm
train_reflow_1.py           reFlow        32      463.67M   Base reFlow, constant lr
train_reflow_1_big.py       reFlow        36      515.06M   lr decay, for interpretability
train_reflow_1_topk_big.py  reFlow-TopK   36      515.06M   + ReLU + Top-64 sparsity
train_reflow_1_lite.py      reFlow-Lite   32      413.34M   + GQA, reduced MLP
train_reflow_1_small.py     reFlow        6       46.47M    Small-scale validation

Resume Training

Append _resume to the config name (e.g., train_reflow_1_big_resume.py).

Text Generation

python sample.py config/sample_reflow_1.py

Edit the config file to change the prompt, temperature, top-k, etc.

Interpretability Experiments

The experiment suite runs 12 analyses on a trained reFlow model. Both Chinese and English versions are available:

python experiment_en.py config/train_reflow_1_big.py   # English
python experiment.py config/train_reflow_1_big.py      # Chinese

An interactive menu will appear:

#    Experiment                                                         Group
1    Recipe Atlas — recipe-space nearest neighbors                      A. Signal Identity
2    Sparsity Profile — activation sparsity analysis                    A. Signal Identity
3    Basis Geometry — singular value & effective rank                   A. Signal Identity
4    Semantic Galaxy — PCA clustering visualization                     B. Semantic Properties
5    Semantic Algebra — vector arithmetic (king − man + woman = queen)  B. Semantic Properties
6    Typo Resilience — robustness to spelling errors                    B. Semantic Properties
7    Layer Evolution — per-layer probability crystallization            C. Mechanistic Analysis
8    Signal Flow — signal activation heatmaps across layers             C. Mechanistic Analysis
9    Causal Ablation — progressive signal knockout curves               C. Mechanistic Analysis
10   Emotion Surgery — sentiment steering via signal injection          D. Control & Steering
11   Concept Inception — binary-search concept implantation             D. Control & Steering
12   Genetic Hijack — global recipe matrix manipulation                 D. Control & Steering

Enter all to run all experiments, or specific numbers (e.g., 1 3 5). Reports are saved to out/<model>/audit_reports/.

Checkpoint Inspection

python check.py config/train_reflow_1.py out/reflow-1/ckpt.pt

License

MIT License. Based on nanoGPT by Andrej Karpathy.
