Automated discovery of boolean formulas with high average-case query complexity using LLM-guided evolution.
LongshotEvolve combines the Longshot library for boolean circuit analysis with the AlphaEvolve framework to automatically evolve DNF (Disjunctive Normal Form) formulas that approach theoretical bounds for average-case deterministic query complexity (avgQ).
Given a boolean function f: {0,1}^n → {0,1}, the average-case deterministic query complexity (avgQ) measures how many input bits a decision tree must examine on average to compute f(x) for a uniformly random input x.
The Challenge: Can we automatically discover DNF formulas with width constraint w that achieve avgQ close to the theoretical upper bound of n(1 - log(n/w)/Θ(w))?
Our Approach: Use LLM-guided evolutionary search to iteratively improve formula constructions, leveraging:
- The Longshot library's efficient avgQ computation (C++ backend)
- AlphaEvolve's multi-island evolution with cross-pollination
- Mathematical insights about function composition and query complexity
The average-case deterministic query complexity is defined as:
avgQ(f) = min_T E_{x ~ {0,1}^n} [cost(T, x)]
where T ranges over all deterministic decision trees that compute f with zero error, and cost(T, x) is the number of bits queried on input x.
| Function | avgQ | Notes |
|---|---|---|
| AND_n, OR_n | ~2 | Constant complexity |
| XOR_n | n | Maximum possible |
| MAJ_n | n - Θ(√n) | Majority function |
| TRIBES_{w,s} | Θ(s) or Θ(2^w) | Depends on parameters |
| CNF/DNF of width w | ≤ n(1 - log(n/w)/O(w)) | Theoretical upper bound |
Goal: Discover formulas that match or exceed known constructions.
- Python 3.8+
- C++ compiler with C++17 support
- OpenMP support (for parallel computation)
- Git
cd library
pip install -e .This compiles the C++ backend and installs the Python package in editable mode.
cd library
pytest test/test_boolean.pyTest a formula construction on various (n, w) test cases:
cd src
python evaluate.py --max_num_vars 16 --program_path initial.py --results_dir resultsThis evaluates the formula in initial.py and outputs:
results/metrics.json: Scores for each test caseresults/correct.json: Validation results
Start the evolutionary search:
cd src
python run_evo.pyThe evolution system will:
- Start with the initial formula in
initial.py - Use LLMs to generate mutations (diff patches, full rewrites, crossovers)
- Evaluate each candidate formula on test cases
- Select high-scoring formulas as parents for the next generation
- Store results in
evolution_db.sqlite
Progress is logged to console. Evolution runs for the configured number of generations (default: 30).
longshot-evolve/
├── library/ # Longshot boolean circuit library
│ ├── longshot/
│ │ ├── boolean/ # Python Circuit API
│ │ ├── core/ # C++ avgQ computation
│ │ └── _core.so # Compiled extension (after build)
│ └── test/ # Library tests
├── src/ # Evolution system
│ ├── initial.py # Starting formula (EVOLVED)
│ ├── evaluate.py # Formula evaluation
│ └── run_evo.py # Evolution configuration
├── CLAUDE.md # Documentation for AI assistants
└── README.md # This file
1. Longshot Library (library/)
- Circuit class: Represents boolean functions as truth tables
- avgQ computation: C++ implementation using dynamic programming
- Operations: AND, OR, XOR, NOT, composition
- Constraints: Maximum 26 variables (due to 31-bit depth field)
2. Evolution System (src/)
- initial.py: Contains the evolving
construct_formula(n, w)function - evaluate.py: Scores formulas using a three-component metric
- run_evo.py: Configures LLMs, evolution strategy, database settings
-
Initialization: Start with a seed formula in
initial.py -
Mutation: LLMs generate three types of patches:
- Diff patches (60%): Incremental modifications
- Full rewrites (30%): Complete reimplementations
- Crossovers (10%): Combine features from two parents
-
Evaluation: Each candidate is tested on multiple (n, w) pairs:
- n ranges from 3 to
max_num_vars - w ranges from
n - floor(log n)tonfor each n - Validation checks width constraints and variable usage
- avgQ is computed for valid formulas
- n ranges from 3 to
-
Selection: Parent selection strategies:
- Uniform: Random selection
- Hill climbing: Always select best
- Weighted: Probability proportional to score
- Power law: Emphasize top performers
- Beam search: Keep top-k candidates
-
Multi-Island Evolution: Multiple populations evolve independently with periodic migration of top formulas
Key parameters in run_evo.py:
evo_config = EvolutionConfig(
num_generations=30, # Evolution iterations
max_parallel_jobs=16, # Parallel evaluations
llm_models=["o4-mini", ...], # LLM models for mutations
patch_types=["diff", "full", "cross"],
patch_type_probs=[0.6, 0.3, 0.1],
)
db_config = DatabaseConfig(
num_islands=2, # Independent populations
archive_size=5, # Top formulas per island
elite_selection_ratio=0.6, # Elite parent selection
)from longshot import VAR_factory, AND, OR
def construct_formula(n, w):
"""Create a DNF with all w-sized AND terms."""
from itertools import combinations
VAR = VAR_factory(n)
VAR = [VAR(i) for i in range(n)]
# Generate all combinations of w variables
terms = [AND(list(combo)) for combo in combinations(VAR, w)]
return OR(terms)from longshot import VAR_factory, AND, OR, XOR, avgQ
# Create variables for n=5
VAR = VAR_factory(5)
x0, x1, x2, x3, x4 = VAR(0), VAR(1), VAR(2), VAR(3), VAR(4)
# Build a circuit
majority = (x0 & x1) | (x0 & x2) | (x1 & x2)
# Compute average query complexity
complexity = avgQ(majority)
print(f"avgQ = {complexity}")
# Check circuit properties
print(f"Width: {majority.width}") # Number of variables used
print(f"Variables: {majority.vars}") # Which variables usedfrom longshot import avgQ_with_tree
circuit = x0 & x1 # AND of two variables
complexity, tree = avgQ_with_tree(circuit, build_tree=True)
print(f"avgQ = {complexity}")
# Can analyze the decision tree structure