A rigorous benchmarking study comparing classical simulated annealing against D-Wave quantum annealing on combinatorial optimization problems. The central question: does quantum annealing find better solutions, escape local minima that classical SA misses, or reach comparable solutions faster?
Classical SA uses thermal fluctuations to escape local minima. Quantum annealing uses quantum tunneling. They attack the same optimization landscape with fundamentally different physics. This project provides a fair, reproducible comparison using identical problem instances and a standardized solver interface.
Prior work: This project extends my classical simulated annealing optimizer (Thermal Budget Annealing) for constrained ML deployment — see Constrained-ML-Deployment. The TBA paper is available on arXiv.
| Phase | Status | Description |
|---|---|---|
| Phase 1 | ✅ Complete | Classical baselines (Custom SA, Neal SA, Tabu Search) across Max-Cut, planted partition, and spin glass problems up to n=500 |
| Phase 2 | ⏳ Pending | D-Wave QPU experiments via LaunchPad access. QPU solver is stubbed out — a one-line swap once access arrives |
```mermaid
graph TD
A[Problem Generation] --> B[QUBO Formulation]
B --> C{Solver}
C --> D[Custom SA<br/>Pure Python, TBA-inspired]
C --> E[Neal SA<br/>D-Wave C-optimized]
C --> F[Tabu Search<br/>D-Wave TabuSampler]
C --> G[Exact Solver<br/>Brute force, n≤20]
C --> H[D-Wave QPU<br/>Phase 2]
C --> I[D-Wave Hybrid<br/>Phase 2]
D --> J[Results Collection]
E --> J
F --> J
G --> J
H --> J
I --> J
J --> K[Metrics Computation<br/>Energy, Time, Approx Ratio]
K --> L[Plots & Analysis]
style H fill:#f9f,stroke:#333,stroke-dasharray: 5 5
style I fill:#f9f,stroke:#333,stroke-dasharray: 5 5
```
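The metrics node above includes an approximation ratio against the exact solver's ground truth (available for n ≤ 20). A minimal sketch under an assumed convention (found energy over optimal energy, both negative, so 1.0 means the ground state was reached); the repo's `metrics.py` may define it differently:

```python
def approximation_ratio(found_energy, optimal_energy):
    """found/optimal for minimization problems with negative optima:
    1.0 means the exact ground state was found; smaller values mean a
    gap remains. (Assumed convention, for illustration only.)"""
    return found_energy / optimal_energy

# If the exact solver reports -17.8 on an n=10 instance, a heuristic
# that also reaches -17.8 scores a ratio of exactly 1.0.
ratio = approximation_ratio(-17.8, -17.8)
```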
Given an NP-hard combinatorial optimization problem encoded as a QUBO (Quadratic Unconstrained Binary Optimization), how do different annealing-based solvers compare in terms of:
- Solution quality — does one solver consistently find lower-energy solutions?
- Time to solution — how much wall-clock time does each solver need?
- Scaling behavior — how does performance change from n=10 to n=500?
- Robustness — how consistent are solutions across independent runs?
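For concreteness, here is a pure-Python sketch of the standard edge-based Max-Cut QUBO encoding (the project itself builds dimod BQMs; this toy version only shows the math, with a brute-force minimizer for tiny instances):

```python
from itertools import product

def maxcut_qubo(edges):
    """Build a QUBO whose minimum energy equals minus the maximum cut.

    Each edge (i, j) contributes x_i + x_j - 2*x_i*x_j to the cut, so
    setting Q[i,i] = -deg(i) and Q[i,j] = +2 makes minimizing
    E(x) = sum Q[i,i]*x_i + sum Q[i,j]*x_i*x_j equivalent to maximizing
    the cut.
    """
    Q = {}
    for i, j in edges:
        Q[(i, i)] = Q.get((i, i), 0) - 1
        Q[(j, j)] = Q.get((j, j), 0) - 1
        Q[(i, j)] = Q.get((i, j), 0) + 2
    return Q

def qubo_energy(Q, x):
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

def brute_force(Q, n):
    """Exhaustive minimum over all 2^n assignments (tiny n only)."""
    return min((qubo_energy(Q, x), x) for x in product((0, 1), repeat=n))

# Triangle graph: any bipartition cuts at most 2 of its 3 edges,
# so the QUBO minimum is -2.
Q = maxcut_qubo([(0, 1), (0, 2), (1, 2)])
best_energy, best_x = brute_force(Q, 3)
```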
We test three problem classes that span different difficulty regimes:
| Problem | Description | Hardness Source |
|---|---|---|
| Max-Cut (Erdos-Renyi) | Partition random graph nodes to maximize crossing edges | Random structure, easy for SA at moderate sizes |
| Planted Partition | Stochastic block model with known community structure | Designed to trap solvers in local minima (but see findings below) |
| Spin Glass | Random ±1 Ising couplings on degree-6 regular graphs | Frustrated interactions create rugged energy landscapes |
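The "frustrated interactions" in the spin glass row show up even in miniature: on a triangle with all-antiferromagnetic couplings, no spin assignment can satisfy all three bonds at once. A stdlib-only sketch (not the repo's instance generator, which uses degree-6 regular graphs):

```python
from itertools import product

def ising_energy(J, s):
    """E(s) = sum over couplings J[i,j] * s_i * s_j, spins in {-1, +1}."""
    return sum(coeff * s[i] * s[j] for (i, j), coeff in J.items())

# Frustrated triangle: all couplings +1 (antiferromagnetic). No state
# can make all three products s_i * s_j negative simultaneously.
J = {(0, 1): 1, (0, 2): 1, (1, 2): 1}
energies = sorted(ising_energy(J, s) for s in product((-1, 1), repeat=3))
# Ground energy is -1 (one bond always stays unsatisfied), reached by
# 6 of the 8 states: a degenerate, rugged landscape in miniature.
```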
Every solver implements a standardized interface for fair comparison:
```python
from abc import ABC, abstractmethod

class Solver(ABC):
    name: str

    @abstractmethod
    def solve(self, bqm, num_reads=100, time_limit_s=None, **kwargs) -> dict:
        """
        Returns:
            best_energy: float — lowest energy found
            best_sample: dict — variable assignments for best solution
            all_energies: list — energies from all reads
            wall_time_s: float — total wall-clock time
            solver_info: dict — solver-specific metadata
        """
```

All solvers receive identical BQM instances, identical `num_reads`, and identical random seeds where applicable.
| Solver | Implementation | Description |
|---|---|---|
| Custom SA | Pure Python | TBA-inspired adaptive SA with geometric temperature schedule and multiple restarts |
| Neal SA | D-Wave `dwave-neal` (C-optimized) | Production-grade SA from D-Wave's Ocean SDK |
| Tabu Search | D-Wave `dwave-tabu` | Local search with tabu list to avoid revisiting states |
| Exact Solver | `dimod.ExactSolver` | Brute-force enumeration for ground truth (n ≤ 20 only) |
| D-Wave QPU | `DWaveSampler` + `EmbeddingComposite` | Phase 2 — pending LaunchPad access |
| D-Wave Hybrid | `LeapHybridSampler` | Phase 2 — quantum-classical hybrid for large problems |
- Max-Cut: Erdos-Renyi graphs G(n, 0.5) converted to QUBO via standard edge-based formulation
- Planted Partition: Stochastic block model with 60/40 community split, dense intra-community edges (p_in=0.8), sparse inter-community edges (p_out=0.1)
- Spin Glass: Random ±1 Ising couplings on random regular graphs (degree 6)
All instances are seeded for reproducibility. Each experiment uses 5 random instances per size.
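The seeding protocol can be sketched with the standard library alone (the `erdos_renyi_edges` helper and the `BASE_SEED + k` offset scheme are illustrative assumptions, not the repo's exact code):

```python
import random
from itertools import combinations

def erdos_renyi_edges(n, p, seed):
    """Seeded G(n, p): include each of the n*(n-1)/2 possible edges
    independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

# One base seed, 5 reproducible instances per size.
BASE_SEED = 42
instances = {
    n: [erdos_renyi_edges(n, 0.5, BASE_SEED + k) for k in range(5)]
    for n in (10, 20, 50)
}
# Re-running with the same seeds reproduces every edge list exactly.
```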
| Size | Custom SA | Neal SA | Tabu | Divergence | Custom SA Time | Neal SA Time | Speedup |
|---|---|---|---|---|---|---|---|
| 10 | -17.8 | -17.8 | -17.8 | 0.00% | 1.05s | 0.006s | 175x |
| 20 | -61.8 | -61.8 | -61.8 | 0.00% | 2.70s | 0.013s | 208x |
| 50 | -372.2 | -372.2 | -372.2 | 0.00% | 12.3s | 0.042s | 293x |
| 100 | -1421.2 | -1421.2 | -1421.2 | 0.00% | 42s | 0.10s | 420x |
| 200 | -5528.8 | -5528.8 | -5528.8 | 0.00% | 165s | 0.32s | 516x |
| 500 | -33330.8 | -33332.2 | -33332.2 | 0.004% | 607s | 1.5s | 405x |
At n=500, Neal SA and Tabu find solutions that are 1.4 energy units lower on average, out of a total energy magnitude of roughly 33,000 — effectively identical.
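The Divergence column can be reproduced as a relative gap between best energies (this exact normalization is an assumption; the repo may define it slightly differently):

```python
def divergence_pct(custom_best, reference_best):
    """Relative gap between two best-found energies, in percent:
    |E_custom - E_ref| / |E_ref| * 100. (Assumed definition.)"""
    return abs(custom_best - reference_best) / abs(reference_best) * 100

# n=500 Max-Cut row from the table above:
gap = divergence_pct(-33330.8, -33332.2)
```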
| Size | Custom SA | Neal SA | Tabu | Divergence |
|---|---|---|---|---|
| 20 | -34.40 | -34.40 | -34.40 | 0.00% |
| 50 | -84.40 | -84.40 | -84.40 | 0.00% |
| 100 | -175.20 | -175.20 | -175.20 | 0.00% |
| 200 | -355.20 | -355.20 | -355.20 | 0.00% |
| 500 | -901.60 | -901.60 | -901.60 | 0.00% |
Zero quality divergence across all sizes and all problem classes.
All solvers found solutions 8-10x better than the planted community cut. The planted partition is not the Max-Cut optimum — the true Max-Cut bisects within communities, not between them. This is a methodological finding: planted partition graphs do not create hard Max-Cut instances despite having deep community structure.
- **All classical solvers converge to identical solution quality up to n=500.** Whether implemented in pure Python or C-optimized, simulated annealing and tabu search find the same best energies on every problem instance we tested. The algorithm matters more than the implementation.
- **The only classical differentiator is speed.** Neal SA (C-optimized) is 400x faster than our pure Python SA implementation. Same solutions, dramatically less compute. At n=500, Custom SA takes 607 seconds while Neal SA takes 1.5 seconds.
- **This establishes a clean baseline for quantum annealing.** If all classical solvers agree on the answer, the question becomes: can D-Wave's quantum annealer find these solutions faster, find better solutions on problems that are intractable for classical SA, or does it fall short due to embedding overhead and connectivity constraints?
- **Planted partition graphs are not hard for Max-Cut.** Despite having designed community structure, the optimal Max-Cut bisects within communities rather than between them. Frustrated spin glasses (random ±1 couplings) are the correct hard problem class for annealing benchmarks.
- **Energy distribution spread differs between solvers.** While best-found energies are identical, Tabu Search shows the lowest variance across reads, and Custom SA shows the widest spread. This hints at different landscape exploration strategies that may behave differently under quantum annealing.
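The spread observation can be quantified per solver with a standard deviation over the per-read energies. A stdlib sketch (the read energies below are invented for illustration, not measured values):

```python
from statistics import pstdev

def read_spread(all_energies):
    """Population standard deviation of per-read energies: a simple
    proxy for how widely a solver's reads scatter around its best."""
    return pstdev(all_energies)

# Hypothetical per-read energies for two solvers on one instance:
tabu_reads = [-84.4, -84.4, -84.2, -84.4, -84.0]
custom_reads = [-84.4, -82.0, -79.6, -84.4, -76.8]
# Identical best energy (-84.4), very different spread.
```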
When D-Wave LaunchPad access arrives, Phase 2 will:
- Run identical problem instances on the Advantage QPU (5000+ qubits)
- Study embedding overhead (logical-to-physical qubit mapping via `EmbeddingComposite`)
- Study chain strength effects on solution quality
- Sweep anneal time parameters
- Test the hybrid solver (`LeapHybridSampler`) for large problems
- Identify the crossover point (if any) where QPU outperforms classical SA
The QPU solver swap is one line:

```python
# Phase 1 (current):
sampler = SimulatedAnnealingSampler()

# Phase 2 (when access arrives):
from dwave.system import DWaveSampler, EmbeddingComposite
sampler = EmbeddingComposite(DWaveSampler())
```

To set up and run the benchmarks:

```bash
git clone https://github.com/Chrislysen/dwave-benchmark.git
cd dwave-benchmark
pip install dwave-ocean-sdk networkx numpy scipy matplotlib pyyaml

# Verify dependencies
python -c "import dimod; import dwave.samplers; import networkx; print('All deps OK')"

# Core benchmark, large-scale runs, and spin glass study
python experiments/run_maxcut_benchmark.py --sizes 10 20 50 --n-instances 5 --num-reads 50 --num-sweeps 1000 --seed 42
python experiments/run_large_maxcut.py
python experiments/run_spin_glass_fast.py

# Tests
python tests/test_problems.py
python tests/test_solvers.py
```

- All random instances use seeded generators (`seed=42` by default)
- Each experiment generates 5 random instances per problem size
- Solver reads: 50 per instance (configurable via `--num-reads`)
- SA sweeps: 1000 per read (configurable via `--num-sweeps`)
- Results saved as JSON in `results/` with full configuration metadata
- Figures saved as PNG in `figures/`
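A sketch of the JSON-with-metadata pattern used for results (the file naming and payload schema here are assumptions for illustration, not the repo's actual layout):

```python
import json
from pathlib import Path

def save_result(result, config, out_dir="results"):
    """Write one experiment's result next to the full configuration
    that produced it, so every JSON file is self-describing."""
    Path(out_dir).mkdir(exist_ok=True)
    payload = {"config": config, "result": result}
    path = Path(out_dir) / f"maxcut_n{config['n']}_seed{config['seed']}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path

p = save_result({"best_energy": -17.8}, {"n": 10, "seed": 42, "num_reads": 50})
# Round-tripping the file recovers the same payload.
```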
```
dwave-benchmark/
├── pyproject.toml
├── README.md
├── src/
│   ├── problems/
│   │   ├── maxcut.py            # Max-Cut: graph generation, QUBO formulation, exact solver
│   │   ├── number_partition.py  # Number partitioning: instance generation, QUBO
│   │   ├── spin_glass.py        # Frustrated spin glass: random ±1 couplings
│   │   └── utils.py             # Shared utilities (seeded RNG)
│   ├── solvers/
│   │   ├── base.py              # Abstract solver interface
│   │   ├── custom_sa.py         # TBA-inspired SA (pure Python)
│   │   ├── neal_sa.py           # D-Wave Neal SA wrapper (C-optimized)
│   │   ├── exact.py             # Exact solver wrapper (n ≤ 20)
│   │   ├── tabu.py              # Tabu search wrapper
│   │   ├── dwave_qpu.py         # D-Wave QPU (Phase 2 stub)
│   │   └── hybrid.py            # D-Wave Hybrid (Phase 2 stub)
│   ├── evaluation/
│   │   ├── metrics.py           # Approximation ratio, success probability
│   │   ├── scaling.py           # Power-law scaling fits
│   │   └── plotting.py          # Publication-quality figures
│   └── configs/
│       └── experiment_config.yaml  # Experiment parameters
├── experiments/
│   ├── run_maxcut_benchmark.py  # Core Max-Cut benchmark (n=10-50)
│   ├── run_large_maxcut.py      # Large-scale Max-Cut (n=100-500)
│   ├── run_hardness_study.py    # Planted partition study
│   ├── run_spin_glass_fast.py   # Spin glass hardness study
│   └── run_qpu_validation.py    # Phase 2: QPU experiments
├── results/                     # JSON results (generated)
├── figures/                     # PNG plots (generated)
├── paper/
└── tests/
    ├── test_problems.py
    └── test_solvers.py
```
Christian Lysenstøen — UC Berkeley EECS
- GitHub: Chrislysen
- Related projects: TBA Deployment Optimizer | Deploy Agent