Self-Developed Algorithm · Iterative Refinement Architecture · Spiral Memory Mechanism
Apex is a fully self-developed iterative refinement reasoning framework. Its core innovation lies in formalizing the Proposal → Review → Refinement multi-round self-correction pipeline as a trainable neural network loop. By leveraging self-critique and verification feedback, the model continuously refines its output during inference, overcoming the single-pass forward limitation of standard Transformer architectures.
Despite their scale, major LLMs (GPT, Claude, LLaMA, etc.) share fundamental architectural weaknesses:
| Issue | Description |
|---|---|
| Single-Pass Inference | Each token is processed once; no built-in self-correction |
| No Feedback Loop | Output does not feed back into input for revision |
| Error Accumulation | Early-token errors compound along the autoregressive chain |
| Linear Reasoning | Chain-of-Thought strategies are unidirectional, lacking divergent review |
Rather than scaling up parameters and data, Apex innovates at the reasoning architecture level:
```
Standard LLM: Input → [Transformer × N] → Output (single pass)

Apex: Input → Prelude Encoding → [Refinement Loop × K]:
  ├─ Proposal Head
  ├─ Review Head
  ├─ Refinement Head
  ├─ Scoring Verifier
  └─ Spiral Memory Update
  → Decode Output (self-corrective reasoning)
```
A differentiable self-correction loop that generates three distinct representations per step: Proposal (candidate generation), Review (defect detection), and Refinement (fusion and improvement). All three heads share the same Transformer backbone and project the same hidden state into different subspaces, forming a self-adversarial and convergent refinement loop. The number of loop steps K is configurable via `loop_steps` (default 3).
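The three-head projection can be sketched as follows. All module and parameter names here are illustrative assumptions, not the actual API of `apex/model/heads.py`:

```python
import torch
import torch.nn as nn

class ReasoningHeads(nn.Module):
    """Sketch: three heads projecting one shared hidden state into
    proposal, review, and refinement subspaces."""

    def __init__(self, dim: int):
        super().__init__()
        self.proposal = nn.Linear(dim, dim)   # candidate generation
        self.review = nn.Linear(dim, dim)     # defect detection
        # Refinement fuses the hidden state with the other two outputs.
        self.refine = nn.Linear(3 * dim, dim)

    def forward(self, h: torch.Tensor):
        p = self.proposal(h)
        c = self.review(h)
        r = self.refine(torch.cat([h, p, c], dim=-1))
        return p, c, r

h = torch.randn(2, 16, 512)          # (batch, seq, dim)
p, c, r = ReasoningHeads(512)(h)
print(p.shape, c.shape, r.shape)     # all torch.Size([2, 16, 512])
```

Because the heads read the same hidden state, the review representation can contradict the proposal, and the refinement head learns to reconcile the two.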
Unlike standard RNNs, which update state from only the previous step, Spiral Memory compresses five information streams (proposal, review, refinement, verifier score, and a task invariant) into a unified memory state:

    M_{t+1} = SpiralUpdate(M_t, P_t, C_t, R_t, s_t, I)

where M_t is the memory state at step t, P_t, C_t, and R_t are the proposal, review, and refinement representations, s_t is the verifier score, and I is the invariant component.
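A minimal sketch of such a five-stream gated update, assuming a GRU-style overwrite gate; all names are hypothetical, and the project's actual cell lives in `apex/model/memory.py` and `apex/model/recurrent.py`:

```python
import torch
import torch.nn as nn

class SpiralMemory(nn.Module):
    """Sketch: gate-controlled compression of five input streams
    (proposal, review, refinement, score, invariant) into one state."""

    def __init__(self, dim: int):
        super().__init__()
        # 5 feature streams of width dim, plus a scalar score per example.
        self.gate = nn.Linear(5 * dim + 1, dim)
        self.cand = nn.Linear(5 * dim + 1, dim)

    def forward(self, m, p, c, r, score, invariant):
        x = torch.cat([m, p, c, r, invariant, score], dim=-1)
        g = torch.sigmoid(self.gate(x))                  # how much to overwrite
        return g * torch.tanh(self.cand(x)) + (1 - g) * m

mem = SpiralMemory(64)
m = torch.zeros(2, 64)
new_m = mem(m, torch.randn(2, 64), torch.randn(2, 64),
            torch.randn(2, 64), torch.rand(2, 1), torch.randn(2, 64))
print(new_m.shape)  # torch.Size([2, 64])
```

The gate lets low-quality steps (low score in the concatenated input) leave the memory mostly unchanged, while high-information steps overwrite it.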
Verifier(refinement) → score ∈ [0, 1]

A trainable scoring network that assesses the quality of each round's refinement output. The score is fed back into the Spiral Memory, driving the model to improve low-scoring outputs in subsequent steps, forming a closed-loop feedback system for reasoning quality.
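One plausible shape for such a scoring network, sketched as a mean-pooled MLP with a sigmoid output; this is an assumption for illustration, not the actual interface of `apex/runtime/verifier.py`:

```python
import torch
import torch.nn as nn

class Verifier(nn.Module):
    """Sketch: pool the refinement representation over the sequence,
    then map it to a quality score in [0, 1]."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.GELU(),
            nn.Linear(dim // 2, 1),
        )

    def forward(self, refinement: torch.Tensor) -> torch.Tensor:
        pooled = refinement.mean(dim=1)          # (batch, dim)
        return torch.sigmoid(self.net(pooled))   # (batch, 1), in [0, 1]

score = Verifier(512)(torch.randn(4, 16, 512))
print(score.shape)  # torch.Size([4, 1])
```

The sigmoid guarantees the [0, 1] range the text describes, so the score can be consumed directly by the memory update and the training loss.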
| Mechanism | Function |
|---|---|
| GQA (Grouped Query Attention) | Reduces KV cache footprint, accelerates inference |
| Sliding Window Attention | Local attention with O(wn) complexity |
| Full Attention (every 4 layers) | Global attention every 4th layer, balancing local efficiency with global context |
| Memory Cross-Attention | Cross-attends to Spiral Memory at every layer, injecting historical reasoning context |
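The GQA-plus-sliding-window combination in the table can be sketched with a grouped KV expansion and a windowed causal mask. This is a simplified illustration, not the repository's `apex/model/attention.py` implementation:

```python
import torch
import torch.nn.functional as F

def gqa_sliding_attention(q, k, v, window: int):
    """Sketch: grouped-query attention under a causal sliding-window mask.
    q: (B, Hq, T, d); k, v: (B, Hkv, T, d), with Hq a multiple of Hkv."""
    groups = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(groups, dim=1)   # share each KV head across a group
    v = v.repeat_interleave(groups, dim=1)
    t = q.shape[2]
    i = torch.arange(t)
    # attend only to positions j with j <= i and i - j < window
    mask = (i[:, None] >= i[None, :]) & (i[:, None] - i[None, :] < window)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = torch.randn(1, 8, 32, 64)    # 8 query heads
k = torch.randn(1, 2, 32, 64)    # 2 KV heads (4-way grouping, as in the defaults)
v = torch.randn(1, 2, 32, 64)
out = gqa_sliding_attention(q, k, v, window=8)
print(out.shape)  # torch.Size([1, 8, 32, 64])
```

With window size w, each query touches at most w keys, which is where the O(wn) complexity in the table comes from; the KV cache only stores the 2 KV heads rather than all 8.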
Replaces the standard FFN with SwiGLU for enhanced non-linear expressiveness:

    FFN_SwiGLU(x) = (SiLU(x W1) ⊙ x W3) W2
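A minimal SwiGLU feed-forward block, sketched with an arbitrary hidden width:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Sketch of a SwiGLU feed-forward block: (SiLU(x W1) * x W3) W2."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value branch
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

y = SwiGLU(512, 2048)(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```

The SiLU-gated product replaces the single ReLU/GELU branch of a standard FFN, at the cost of one extra projection matrix.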
Self-implemented RoPE with zero external dependencies, supporting extrapolation to sequence lengths unseen during training.
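A self-contained sketch of the rotary rotation itself (in practice it is applied to attention queries and keys); the interface is illustrative, not that of `apex/model/rope.py`:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Sketch of rotary position embedding: rotate each feature pair by a
    position-dependent angle. x: (batch, seq, dim) with even dim."""
    b, t, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2).float() / d)   # (d/2,)
    angles = torch.arange(t).float()[:, None] * inv_freq      # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                      # 2-D rotation
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(1, 8, 64)
print(rope(x).shape)  # torch.Size([1, 8, 64])
```

Because positions enter only through the rotation angle, the same formula evaluates at any position index, which is what makes extrapolation beyond training lengths possible.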
```
                 ┌──────────────────────┐
Input ──────────►│   SimpleTokenizer    │
                 └──────────┬───────────┘
                            │
                 ┌──────────▼───────────┐
                 │   Embedding Table    │
                 └──────────┬───────────┘
                            │
          ┌─────────────────▼─────────────────┐
          │        Prelude Transformer        │
          │    (GQA + SWA + Cross-Attn) × N   │
          └─────────────────┬─────────────────┘
                            │
   ┌────────────────────────▼─────────────────────────┐
   │               Refinement Loop × K                │
   │                                                  │
   │      ┌────────────────────────────────────┐      │
   │      │   Shared Core Transformer Blocks   │      │
   │      │  (GQA + SWA/Full + Mem Cross-Attn) │      │
   │      └─────────────────┬──────────────────┘      │
   │                        │                         │
   │        ┌───────────────┼───────────────┐         │
   │        ▼               ▼               ▼         │
   │    Proposal         Review        Refinement     │
   │        │               │               │         │
   │        └───────────────┼───────────────┘         │
   │                        ▼                         │
   │               ┌─────────────────┐                │
   │               │    Verifier     │──► Score       │
   │               └────────┬────────┘                │
   │                        │                         │
   │       ┌────────────────▼─────────────────┐       │
   │       │       Spiral Memory Update       │       │
   │       │    (P⊕C⊕R⊕Score⊕Invariant)       │       │
   │       └────────────────┬─────────────────┘       │
   │                        │                         │
   │                        ▼ (next step)             │
   └────────────────────────┬─────────────────────────┘
                            │
                 ┌──────────▼───────────┐
                 │    Linear Decoder    │
                 └──────────┬───────────┘
                            │
                 ┌──────────▼───────────┐
                 │     Output Text      │
                 └──────────────────────┘
```
| Feature | Standard Transformer | Chain-of-Thought | Tree-of-Thought | Apex (Self-Dev) |
|---|---|---|---|---|
| Inference passes | 1 | 1 + prompt | N tree searches | K loop steps |
| Self-correction | ✗ | ✗ (prompt-dependent) | Partial | ✓ Built-in |
| Quality feedback | ✗ | ✗ | External | ✓ Built-in scorer |
| Cross-step memory | ✗ | ✗ | ✗ | ✓ Spiral Memory |
| Compute overhead | Baseline | +prompt length | +tree nodes | +K × shared layers |
| End-to-end differentiable | ✓ | ✗ (no special design) | ✗ | ✓ Full pipeline |
```
Apex/
├── README.md              # This file (English)
├── README-CN.md           # Chinese documentation
├── LICENSE                # MIT License
├── pyproject.toml         # Python package config
├── requirements.txt       # Dependencies (torch>=2.0.0)
├── configs/               # Hyperparameter configs
│
├── apex/                  # Core package
│   ├── model/             # Model components
│   │   ├── rope.py            # Rotary Position Embedding (self-dev)
│   │   ├── attention.py       # GQA + Sliding Window + SwiGLU (self-dev)
│   │   ├── transformer.py     # Shared Transformer Block (self-dev)
│   │   ├── memory.py          # Spiral Memory compression (core innovation)
│   │   ├── heads.py           # Three-way reasoning heads + decoder (core innovation)
│   │   ├── dialectic.py       # Refinement step + ApexMVP model (core innovation)
│   │   └── recurrent.py       # Gated recurrent state cell
│   │
│   ├── runtime/           # Runtime control
│   │   ├── loop.py            # Training / validation loop
│   │   ├── verifier.py        # Scoring verifier interface (core innovation)
│   │   ├── scheduler.py       # Loop-step / LR scheduler (self-dev)
│   │   └── controller.py      # Inference controller
│   │
│   ├── data/              # Data pipeline
│   │   ├── tokenizer.py       # Character-level tokenizer (self-dev, zero deps)
│   │   ├── dataset.py         # Dataset loaders
│   │   └── preprocess.py      # Preprocessing utilities
│   │
│   ├── train/             # Training system
│   │   ├── trainer.py         # Trainer
│   │   ├── losses.py          # Combined loss (verification + consistency)
│   │   └── optim.py           # Optimizer factory
│   │
│   └── utils/             # Utility functions
│
├── scripts/               # Run scripts
│   ├── train.sh           # Training entrypoint
│   ├── eval.sh            # Evaluation entrypoint
│   └── benchmark.sh       # Performance benchmark
│
├── examples/              # Usage examples
│   ├── code_repair.py     # Code repair example
│   ├── math_reasoning.py  # Math reasoning example
│   └── verifier_loop.py   # Verifier loop analysis
│
├── docs/                  # Detailed docs
│   ├── architecture.md    # Architecture design doc
│   ├── runtime.md         # Runtime mechanics
│   └── training.md        # Training guide
│
├── benchmarks/            # Evaluation benchmarks
├── experiments/           # Experiment configs
├── checkpoints/           # Model checkpoints
├── outputs/               # Output directory
└── datasets/              # Dataset directory
```
- Python >= 3.10
- PyTorch >= 2.0.0
```bash
git clone <repo-url>
cd Apex
pip install -r requirements.txt
```

```bash
# Code repair
python examples/code_repair.py

# Math reasoning
python examples/math_reasoning.py

# Detailed verifier loop analysis
python examples/verifier_loop.py
```

```bash
bash scripts/train.sh
```

Or in Python:
```python
from apex import ApexMVP
from apex.data import make_toy_dataset
from apex.train import Trainer

model = ApexMVP(
    vocab_size=32000,
    dim=512,
    prelude_layers=2,
    shared_layers=4,
    num_heads=8,
    num_kv_heads=2,
    window_size=128,
    loop_steps=3,
)

dataset = make_toy_dataset()
trainer = Trainer(model, device="cuda", lr=1e-4)
trainer.fit(dataset, epochs=50)
```

```python
from apex import ApexMVP
from apex.runtime import RuntimeController

model = ApexMVP(dim=512, loop_steps=3)
controller = RuntimeController(model, device="cuda")

result, scores, history = controller.run("Your question...")
print(f"Result: {result}")
print(f"Verification scores: {[round(s.item(), 4) for s in scores]}")
```

Given input text x, inference proceeds in four steps:
Step 1: Tokenization & Embedding

    h_0 = Embed(Tokenize(x))

Step 2: Prelude Encoding

    h = Prelude(h_0)

Step 3: Iterative Refinement Loop (repeated K times)

    P_t = ProposalHead(h_t),  C_t = ReviewHead(h_t),  R_t = RefinementHead(h_t, P_t, C_t)
    s_t = Verifier(R_t)
    M_{t+1} = SpiralUpdate(M_t, P_t, C_t, R_t, s_t, I)

where h_t is the hidden state at loop step t, P_t, C_t, and R_t are the proposal, review, and refinement representations, s_t ∈ [0, 1] is the verifier score, M_t is the spiral memory state, and I is the invariant component carried across steps.

Step 4: Final Decoding

The final hidden state is fused with the spiral memory via residual addition before the linear decoder generates the output.
Apex uses a three-component loss:

| Term | Formula | Purpose |
|---|---|---|
| Cross-entropy | CrossEntropy(logits, targets) | Standard next-token prediction |
| Verification | - | Encourages high verifier scores on refinements |
| Consistency | MSE(Proposal, Refinement) + MSE(Review, Refinement) | Keeps representations semantically consistent |
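The three terms can be combined as in this sketch; the weights `w_verify` and `w_consist` and the exact form of the verification term are assumptions, not the contents of `apex/train/losses.py`:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, targets, proposal, review, refinement, scores,
                  w_verify: float = 0.1, w_consist: float = 0.1):
    """Sketch of a three-term objective: next-token cross-entropy,
    a verifier-score term, and head-consistency MSE."""
    ce = F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                         targets.reshape(-1))
    verify = (1.0 - scores).mean()                   # push scores toward 1
    consist = (F.mse_loss(proposal, refinement)
               + F.mse_loss(review, refinement))
    return ce + w_verify * verify + w_consist * consist

loss = combined_loss(torch.randn(2, 8, 100), torch.randint(0, 100, (2, 8)),
                     torch.randn(2, 8, 64), torch.randn(2, 8, 64),
                     torch.randn(2, 8, 64), torch.rand(2, 1))
print(loss.item() > 0)  # True
```

Since the verifier is itself a network inside the loop, all three terms backpropagate through the refinement steps, which is what the "end-to-end differentiable" row in the comparison table refers to.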
| Parameter | Default | Description |
|---|---|---|
dim |
512 | Hidden dimension |
prelude_layers |
2 | Number of prelude Transformer layers |
shared_layers |
4 | Number of shared core layers |
num_heads |
8 | Number of query attention heads |
num_kv_heads |
2 | Number of KV heads (GQA groups) |
window_size |
128 | Sliding window size |
loop_steps |
3 | Number of refinement loop steps |
vocab_size |
32000 | Vocabulary size |
| Domain | Apex Advantage |
|---|---|
| Code Repair | Multi-round self-review detects and fixes defects |
| Math Reasoning | Verifier scores intermediate conclusions, selects correct paths |
| Logical Reasoning | Review head identifies inconsistencies in reasoning chains |
| Text Quality | Iteratively corrects grammar, logic, and style |
| Multi-Step Planning | Spiral Memory stores intermediate planning states |
```bibtex
@misc{apex2025,
  title={Apex: Self-Refining Reasoning via Iterative Refinement Loops and Spiral Memory},
  author={Apex Contributors},
  year={2025},
  note={Self-developed innovative algorithm},
}
```

- MVP core architecture (iterative refinement loop + spiral memory + scoring verifier)
- GQA + sliding window attention + SwiGLU activation
- Complete train / eval / inference pipeline
- Real dataset training (CodeNet, GSM8K)
- Dynamic loop-step scheduling
- Multi-task fine-tuning support
- Distributed training (FSDP)
- ONNX / TensorRT inference acceleration
- Open-weight release
MIT License. See LICENSE.