A rigorous benchmarking study comparing classical simulated annealing against D-Wave quantum annealing on combinatorial optimization problems. The central question: does quantum annealing find better solutions, escape local minima that classical SA misses, or reach comparable solutions faster?
Classical SA uses thermal fluctuations to escape local minima. Quantum annealing uses quantum tunneling. They attack the same optimization landscape with fundamentally different physics. This project provides a fair, reproducible comparison using identical problem instances and a standardized solver interface.
Prior work: This project extends my classical simulated annealing optimizer (Thermal Budget Annealing) for constrained ML deployment — see Constrained-ML-Deployment. The TBA paper is available on arXiv.
| Phase | Status | Description |
|---|---|---|
| Phase 1 | ✅ Complete | Classical baselines (Custom SA, Neal SA, Tabu Search) across Max-Cut, planted partition, and spin glass problems up to n=500 |
| Phase 2 | ⏳ Pending | D-Wave QPU experiments via LaunchPad access. QPU solver is stubbed out — a one-line swap once access arrives |
```mermaid
graph TD
A[Problem Generation] --> B[QUBO Formulation]
B --> C{Solver}
C --> D[Custom SA<br/>Pure Python, TBA-inspired]
C --> E[Neal SA<br/>D-Wave C-optimized]
C --> F[Tabu Search<br/>D-Wave TabuSampler]
C --> G[Exact Solver<br/>Brute force, n≤20]
C --> H[D-Wave QPU<br/>Phase 2]
C --> I[D-Wave Hybrid<br/>Phase 2]
D --> J[Results Collection]
E --> J
F --> J
G --> J
H --> J
I --> J
J --> K[Metrics Computation<br/>Energy, Time, Approx Ratio]
K --> L[Plots & Analysis]
style H fill:#f9f,stroke:#333,stroke-dasharray: 5 5
style I fill:#f9f,stroke:#333,stroke-dasharray: 5 5
```
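The metrics node above includes an approximation ratio against the exact solver's ground truth (available for n ≤ 20). A minimal sketch under an assumed convention (found energy over optimal energy, both negative, so 1.0 means the ground state was reached); the repo's `metrics.py` may define it differently:

```python
def approximation_ratio(found_energy, optimal_energy):
    """found/optimal for minimization problems with negative optima:
    1.0 means the exact ground state was found; smaller values mean a
    gap remains. (Assumed convention, for illustration only.)"""
    return found_energy / optimal_energy

# If the exact solver reports -17.8 on an n=10 instance, a heuristic
# that also reaches -17.8 scores a ratio of exactly 1.0.
ratio = approximation_ratio(-17.8, -17.8)
```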
Given an NP-hard combinatorial optimization problem encoded as a QUBO (Quadratic Unconstrained Binary Optimization), how do different annealing-based solvers compare in terms of:
- Solution quality — does one solver consistently find lower-energy solutions?
- Time to solution — how much wall-clock time does each solver need?
- Scaling behavior — how does performance change from n=10 to n=500?
- Robustness — how consistent are solutions across independent runs?
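For concreteness, here is a pure-Python sketch of the standard edge-based Max-Cut QUBO encoding (the project itself builds dimod BQMs; this toy version only shows the math, with a brute-force minimizer for tiny instances):

```python
from itertools import product

def maxcut_qubo(edges):
    """Build a QUBO whose minimum energy equals minus the maximum cut.

    Each edge (i, j) contributes x_i + x_j - 2*x_i*x_j to the cut, so
    setting Q[i,i] = -deg(i) and Q[i,j] = +2 makes minimizing
    E(x) = sum Q[i,i]*x_i + sum Q[i,j]*x_i*x_j equivalent to maximizing
    the cut.
    """
    Q = {}
    for i, j in edges:
        Q[(i, i)] = Q.get((i, i), 0) - 1
        Q[(j, j)] = Q.get((j, j), 0) - 1
        Q[(i, j)] = Q.get((i, j), 0) + 2
    return Q

def qubo_energy(Q, x):
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

def brute_force(Q, n):
    """Exhaustive minimum over all 2^n assignments (tiny n only)."""
    return min((qubo_energy(Q, x), x) for x in product((0, 1), repeat=n))

# Triangle graph: any bipartition cuts at most 2 of its 3 edges,
# so the QUBO minimum is -2.
Q = maxcut_qubo([(0, 1), (0, 2), (1, 2)])
best_energy, best_x = brute_force(Q, 3)
```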
We test three problem classes that span different difficulty regimes:
| Problem | Description | Hardness Source |
|---|---|---|
| Max-Cut (Erdos-Renyi) | Partition random graph nodes to maximize crossing edges | Random structure, easy for SA at moderate sizes |
| Planted Partition | Stochastic block model with known community structure | Designed to trap solvers in local minima (but see findings below) |
| Spin Glass | Random ±1 Ising couplings on degree-6 regular graphs | Frustrated interactions create rugged energy landscapes |
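The "frustrated interactions" in the spin glass row show up even in miniature: on a triangle with all-antiferromagnetic couplings, no spin assignment can satisfy all three bonds at once. A stdlib-only sketch (not the repo's instance generator, which uses degree-6 regular graphs):

```python
from itertools import product

def ising_energy(J, s):
    """E(s) = sum over couplings J[i,j] * s_i * s_j, spins in {-1, +1}."""
    return sum(coeff * s[i] * s[j] for (i, j), coeff in J.items())

# Frustrated triangle: all couplings +1 (antiferromagnetic). No state
# can make all three products s_i * s_j negative simultaneously.
J = {(0, 1): 1, (0, 2): 1, (1, 2): 1}
energies = sorted(ising_energy(J, s) for s in product((-1, 1), repeat=3))
# Ground energy is -1 (one bond always stays unsatisfied), reached by
# 6 of the 8 states: a degenerate, rugged landscape in miniature.
```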
Every solver implements a standardized interface for fair comparison:
```python
from abc import ABC, abstractmethod

class Solver(ABC):
    name: str

    @abstractmethod
    def solve(self, bqm, num_reads=100, time_limit_s=None, **kwargs) -> dict:
        """
        Returns:
            best_energy: float — lowest energy found
            best_sample: dict — variable assignments for best solution
            all_energies: list — energies from all reads
            wall_time_s: float — total wall-clock time
            solver_info: dict — solver-specific metadata
        """
```

All solvers receive identical BQM instances, identical `num_reads`, and identical random seeds where applicable.
| Solver | Implementation | Description |
|---|---|---|
| Custom SA | Pure Python | TBA-inspired adaptive SA with geometric temperature schedule and multiple restarts |
| Neal SA | D-Wave `dwave-neal` (C-optimized) | Production-grade SA from D-Wave's Ocean SDK |
| Tabu Search | D-Wave `dwave-tabu` | Local search with tabu list to avoid revisiting states |
| Exact Solver | `dimod.ExactSolver` | Brute-force enumeration for ground truth (n ≤ 20 only) |
| D-Wave QPU | `DWaveSampler` + `EmbeddingComposite` | Phase 2 — pending LaunchPad access |
| D-Wave Hybrid | `LeapHybridSampler` | Phase 2 — quantum-classical hybrid for large problems |
- Max-Cut: Erdos-Renyi graphs G(n, 0.5) converted to QUBO via standard edge-based formulation
- Planted Partition: Stochastic block model with 60/40 community split, dense intra-community edges (p_in=0.8), sparse inter-community edges (p_out=0.1)
- Spin Glass: Random ±1 Ising couplings on random regular graphs (degree 6)
All instances are seeded for reproducibility. Each experiment uses 5 random instances per size.
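The seeding protocol can be sketched with the standard library alone (the `erdos_renyi_edges` helper and the `BASE_SEED + k` offset scheme are illustrative assumptions, not the repo's exact code):

```python
import random
from itertools import combinations

def erdos_renyi_edges(n, p, seed):
    """Seeded G(n, p): include each of the n*(n-1)/2 possible edges
    independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

# One base seed, 5 reproducible instances per size.
BASE_SEED = 42
instances = {
    n: [erdos_renyi_edges(n, 0.5, BASE_SEED + k) for k in range(5)]
    for n in (10, 20, 50)
}
# Re-running with the same seeds reproduces every edge list exactly.
```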
| Size | Custom SA | Neal SA | Tabu | Divergence | Custom SA Time | Neal SA Time | Speedup |
|---|---|---|---|---|---|---|---|
| 10 | -17.8 | -17.8 | -17.8 | 0.00% | 1.05s | 0.006s | 175x |
| 20 | -61.8 | -61.8 | -61.8 | 0.00% | 2.70s | 0.013s | 208x |
| 50 | -372.2 | -372.2 | -372.2 | 0.00% | 12.3s | 0.042s | 293x |
| 100 | -1421.2 | -1421.2 | -1421.2 | 0.00% | 42s | 0.10s | 420x |
| 200 | -5528.8 | -5528.8 | -5528.8 | 0.00% | 165s | 0.32s | 516x |
| 500 | -33330.8 | -33332.2 | -33332.2 | 0.004% | 607s | 1.5s | 405x |
At n=500, Neal SA and Tabu find solutions that are 1.4 energy units lower on average, out of a total energy magnitude of roughly 33,000 — effectively identical.
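The Divergence column can be reproduced as a relative gap between best energies (this exact normalization is an assumption; the repo may define it slightly differently):

```python
def divergence_pct(custom_best, reference_best):
    """Relative gap between two best-found energies, in percent:
    |E_custom - E_ref| / |E_ref| * 100. (Assumed definition.)"""
    return abs(custom_best - reference_best) / abs(reference_best) * 100

# n=500 Max-Cut row from the table above:
gap = divergence_pct(-33330.8, -33332.2)
```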
| Size | Custom SA | Neal SA | Tabu | Divergence |
|---|---|---|---|---|
| 20 | -34.40 | -34.40 | -34.40 | 0.00% |
| 50 | -84.40 | -84.40 | -84.40 | 0.00% |
| 100 | -175.20 | -175.20 | -175.20 | 0.00% |
| 200 | -355.20 | -355.20 | -355.20 | 0.00% |
| 500 | -901.60 | -901.60 | -901.60 | 0.00% |
Zero quality divergence across all sizes and all problem classes.
All solvers found solutions 8-10x better than the planted community cut. The planted partition is not the Max-Cut optimum — the true Max-Cut bisects within communities, not between them. This is a methodological finding: planted partition graphs do not create hard Max-Cut instances despite having deep community structure.
- **All classical solvers converge to identical solution quality up to n=500.** Whether implemented in pure Python or C-optimized, simulated annealing and tabu search find the same best energies on every problem instance we tested. The algorithm matters more than the implementation.
- **The only classical differentiator is speed.** Neal SA (C-optimized) is 400x faster than our pure Python SA implementation. Same solutions, dramatically less compute. At n=500, Custom SA takes 607 seconds while Neal SA takes 1.5 seconds.
- **This establishes a clean baseline for quantum annealing.** If all classical solvers agree on the answer, the question becomes: can D-Wave's quantum annealer find these solutions faster, find better solutions on problems that are intractable for classical SA, or does it fall short due to embedding overhead and connectivity constraints?
- **Planted partition graphs are not hard for Max-Cut.** Despite having designed community structure, the optimal Max-Cut bisects within communities rather than between them. Frustrated spin glasses (random ±1 couplings) are the correct hard problem class for annealing benchmarks.
- **Energy distribution spread differs between solvers.** While best-found energies are identical, Tabu Search shows the lowest variance across reads, and Custom SA shows the widest spread. This hints at different landscape exploration strategies that may behave differently under quantum annealing.
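The spread observation can be quantified per solver with a standard deviation over the per-read energies. A stdlib sketch (the read energies below are invented for illustration, not measured values):

```python
from statistics import pstdev

def read_spread(all_energies):
    """Population standard deviation of per-read energies: a simple
    proxy for how widely a solver's reads scatter around its best."""
    return pstdev(all_energies)

# Hypothetical per-read energies for two solvers on one instance:
tabu_reads = [-84.4, -84.4, -84.2, -84.4, -84.0]
custom_reads = [-84.4, -82.0, -79.6, -84.4, -76.8]
# Identical best energy (-84.4), very different spread.
```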
When D-Wave LaunchPad access arrives, Phase 2 will:
- Run identical problem instances on the Advantage QPU (5000+ qubits)
- Study embedding overhead (logical-to-physical qubit mapping via `EmbeddingComposite`)
- Study chain strength effects on solution quality
- Sweep anneal time parameters
- Test the hybrid solver (`LeapHybridSampler`) for large problems
- Identify the crossover point (if any) where QPU outperforms classical SA
The QPU solver swap is one line:

```python
# Phase 1 (current):
sampler = SimulatedAnnealingSampler()

# Phase 2 (when access arrives):
from dwave.system import DWaveSampler, EmbeddingComposite
sampler = EmbeddingComposite(DWaveSampler())
```

To set up and run the benchmarks:

```bash
git clone https://github.com/Chrislysen/dwave-benchmark.git
cd dwave-benchmark
pip install dwave-ocean-sdk networkx numpy scipy matplotlib pyyaml

# Verify dependencies
python -c "import dimod; import dwave.samplers; import networkx; print('All deps OK')"

# Core benchmark, large-scale runs, and spin glass study
python experiments/run_maxcut_benchmark.py --sizes 10 20 50 --n-instances 5 --num-reads 50 --num-sweeps 1000 --seed 42
python experiments/run_large_maxcut.py
python experiments/run_spin_glass_fast.py

# Tests
python tests/test_problems.py
python tests/test_solvers.py
```

- All random instances use seeded generators (`seed=42` by default)
- Each experiment generates 5 random instances per problem size
- Solver reads: 50 per instance (configurable via `--num-reads`)
- SA sweeps: 1000 per read (configurable via `--num-sweeps`)
- Results saved as JSON in `results/` with full configuration metadata
- Figures saved as PNG in `figures/`
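A sketch of the JSON-with-metadata pattern used for results (the file naming and payload schema here are assumptions for illustration, not the repo's actual layout):

```python
import json
from pathlib import Path

def save_result(result, config, out_dir="results"):
    """Write one experiment's result next to the full configuration
    that produced it, so every JSON file is self-describing."""
    Path(out_dir).mkdir(exist_ok=True)
    payload = {"config": config, "result": result}
    path = Path(out_dir) / f"maxcut_n{config['n']}_seed{config['seed']}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path

p = save_result({"best_energy": -17.8}, {"n": 10, "seed": 42, "num_reads": 50})
# Round-tripping the file recovers the same payload.
```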
```
dwave-benchmark/
├── pyproject.toml
├── README.md
├── src/
│   ├── problems/
│   │   ├── maxcut.py            # Max-Cut: graph generation, QUBO formulation, exact solver
│   │   ├── number_partition.py  # Number partitioning: instance generation, QUBO
│   │   ├── spin_glass.py        # Frustrated spin glass: random ±1 couplings
│   │   └── utils.py             # Shared utilities (seeded RNG)
│   ├── solvers/
│   │   ├── base.py              # Abstract solver interface
│   │   ├── custom_sa.py         # TBA-inspired SA (pure Python)
│   │   ├── neal_sa.py           # D-Wave Neal SA wrapper (C-optimized)
│   │   ├── exact.py             # Exact solver wrapper (n ≤ 20)
│   │   ├── tabu.py              # Tabu search wrapper
│   │   ├── dwave_qpu.py         # D-Wave QPU (Phase 2 stub)
│   │   └── hybrid.py            # D-Wave Hybrid (Phase 2 stub)
│   ├── evaluation/
│   │   ├── metrics.py           # Approximation ratio, success probability
│   │   ├── scaling.py           # Power-law scaling fits
│   │   └── plotting.py          # Publication-quality figures
│   └── configs/
│       └── experiment_config.yaml  # Experiment parameters
├── experiments/
│   ├── run_maxcut_benchmark.py  # Core Max-Cut benchmark (n=10-50)
│   ├── run_large_maxcut.py      # Large-scale Max-Cut (n=100-500)
│   ├── run_hardness_study.py    # Planted partition study
│   ├── run_spin_glass_fast.py   # Spin glass hardness study
│   └── run_qpu_validation.py    # Phase 2: QPU experiments
├── results/                     # JSON results (generated)
├── figures/                     # PNG plots (generated)
├── paper/
└── tests/
    ├── test_problems.py
    └── test_solvers.py
```
Christian Lysenstøen — UC Berkeley EECS
- GitHub: Chrislysen
- Related projects: TBA Deployment Optimizer | Deploy Agent