Skip to content

MendozaLab/math-morphism-atlas

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mathematical Problem Morphism Atlas

The first cross-corpus structural classification of open mathematical problems at scale.

DOI License: CC BY 4.0 ORCID

Overview

A unified dataset of 4,957 open mathematical problems from 5 sources, each classified with structural invariants (morphism vectors) and embedded in a shared vector space. Includes:

  • An Erdős-like classifier that detects structural similarity to Erdős problems across the entire corpus
  • A morphism graph connecting problems that share proof techniques
  • Lean 4 formalizations of 21+ theorems across 4 Erdős problems
  • Tools for mathematical problem discovery — find tractable problems nobody else is working on

Quick Start

# Clone
git clone https://github.com/MendozaLab/math-morphism-atlas.git
cd math-morphism-atlas

# Install dependencies
pip install -r requirements.txt

# Run the classifier on the full corpus
python scripts/embed_and_classify.py --all

# Predict attack strategy for a specific Erdős problem
python scripts/erdos_morphism_engine.py --predict 500

# Find the top 50 "Erdős-like" non-Erdős problems
python scripts/embed_and_classify.py --top-erdos-like 50

Dataset

Sources

Source Problems Format Description
Erdős Problems 1,179 Structured DB Complete catalog with prizes, domains, formal statements
DeepMind formal-conjectures 2,026 Lean 4 Formalized conjectures with AMS classifications
MathOverflow 593 Q&A Community-curated open problems, voted by quality
OEIS 437 Sequence data Integer sequences with attached conjectures
Open Problem Garden 722 Wiki Cross-discipline open problems
Total 4,957

Unified Schema

Each problem in training_data/unified_corpus.jsonl:

{
  "id": "erdos-233",
  "source": "erdos",
  "label": 1,
  "text": "Problem statement...",
  "domain": "number theory",
  "tags": "product sets, prime factors"
}

Morphism Vector (12 dimensions)

For each problem, a structural fingerprint:

Dim Feature Values
0 Problem type extremal=0, density=1, growth=2, coloring=3, construction=4
1 Object type set=0, graph=1, hypergraph=2, sequence=3, coloring=4
2 Constraint type cardinality=0, sum=1, product=2, containment=3, density=4
3 Parameter count 1-5
4-5 Upper/lower exponents Float (growth rate)
6 Gap ratio upper/lower bound ratio
7 Log-factor count Integer
8-11 Proof methods Binary: probabilistic, algebraic, analytic, flag algebra

Erdős-Like Classifier

Trained on 1,179 Erdős problems (positive) vs. 3,778 non-Erdős problems (negative).

A problem is "Erdős-like" if it shares structural DNA with problems Erdős posed — extremal questions about combinatorial objects with sharp thresholds and asymptotic growth rates.

Tools

erdos_morphism_engine.py

Classify problems, embed into ChromaDB, discover morphism edges.

python scripts/erdos_morphism_engine.py --classify     # Classify all problems
python scripts/erdos_morphism_engine.py --embed        # Embed into ChromaDB
python scripts/erdos_morphism_engine.py --morphisms    # Find technique-sharing edges
python scripts/erdos_morphism_engine.py --predict 500  # Attack prediction for problem #500

embed_and_classify.py

Train and run the Erdős-like classifier.

python scripts/embed_and_classify.py --train           # Train XGBoost classifier
python scripts/embed_and_classify.py --top-erdos-like 50  # Rank non-Erdős problems
python scripts/embed_and_classify.py --evaluate        # Embedding quality tests

Lean 4 Proofs

Machine-verified proofs (0 sorry) in Lean/:

File Problem Theorems
Erdos1_SubsetSums_MDL_Solved.lean #1 Distinct subset sums 3
Sunflower_MDL_Solved.lean #20 Sunflower lemma 3
SidonCoding_v2_Solved.lean #30 Sidon sets 2
Erdos233.lean Product set bounds 2
DDC_Aristotle_Solved_v3.lean Divisor counting chain 2

Verified against Lean 4.24.0 / Mathlib f897ebcf via Aristotle (Harmonic AI).

HS(p) Framework

The classification uses the HS(p) Stability Sextet — a domain-independent fingerprint originally developed for physical systems (materials science, climate, medical diagnostics), here extended to combinatorial mathematics as the 10th domain spoke.

Sextet Element Combinatorial Meaning
I_det Growth rate exponent
θ_holo Extremal density / threshold
κ_curv Log-factor complexity
R_kin Upper/lower bound gap ratio
Q_takens Formalization depth (0=none → 3=Lean verified)
S_vn Structural entropy of extremal configuration

Citation

If you use this dataset or tools, please cite:

@misc{mendoza2026morphism,
  author = {Mendoza, Kenneth A.},
  title = {Mathematical Problem Morphism Atlas: Structural Classification of
           Open Mathematical Problems},
  year = {2026},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.XXXXXXX},
  url = {https://github.com/MendozaLab/math-morphism-atlas},
  note = {ORCID: 0009-0000-9475-5938}
}

ORCID: 0009-0000-9475-5938

License

Individual source data retains original licensing:

AI Disclosure

This dataset was constructed with assistance from AI tools (Claude, Perplexity). All structural classifications have been reviewed by the human author. The Lean 4 proofs were generated with Aristotle (Harmonic AI) and verified by machine compilation. In accordance with the erdosproblems.com AI usage policy, AI assistance is disclosed and all mathematical content has been independently verified.

Acknowledgments

Author

Kenneth A. Mendoza Independent Researcher, Waldport, Oregon, USA ken@kenmendoza.com | ORCID 0009-0000-9475-5938

About

4,957 open math problems classified by structural morphism — Erdős-like classifier, 12-dim vectors, Lean 4 proofs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors