Sanity

A Lean 4 tactic that looks for counterexamples to your goal before you commit to proving it.

Design sketch. Nothing is implemented yet.

Idea

Pause at the current goal, generate concrete instances of its quantified variables, and try to refute it. The tactic never closes a goal. It only reports ok up to <bound>, counterexample: <term>, or inconclusive: <reason>.

Feel

example (G : SimpleGraph (Fin n)) (h : G.IsConnected) : G.chromaticNumber ≤ 4 := by
  sanity            -- searches small n, returns "no counterexample up to n = 7"
  sorry

example (n : ℕ) (h : n > 2) : ∀ a b c : ℕ, a^n + b^n ≠ c^n := by
  sanity (bound := 50)   -- fuzzes; finds nothing; that is information, not a proof
  sorry

example (p q : Prop) : (p → q) → (¬q → ¬p) := by
  sanity            -- dispatches to SMT, reports valid
  tauto

Backends

Lean tactics. A free pre-tier of in-process decision procedures: decide and native_decide for anything reducible to a Decidable instance, norm_num for ground numerical equalities and inequalities, linarith and polyrith for linear and polynomial arithmetic over ordered fields, positivity for sign goals. No external round-trip, no parsing, no IPC. Always try these before paying any cross-process cost.
Enumeration. For finite or finitely-bounded types, walk inhabitants up to a size bound and evaluate the conclusion using Lean's own reduction (Lean.Meta.reduce, Lean.Meta.whnf) or the compiled code via Lean.Elab.Term.evalTerm. Cheapest possible refutation strategy: no translation, no foreign solver, just decide-style brute force. Catches the embarrassing cases — off-by-one indices, swapped argument orders, identities that are false at n = 0 — in milliseconds.
Property-based testing. For non-finite or large types where exhaustive enumeration runs out fast, switch to randomised generators with shrinking: QuickCheck in the Haskell tradition, Hypothesis in Python. Strictly stronger than the bounded enumerator: the random sampler covers a much wider region of the input space than any size-k enumeration, and the shrinker minimises whatever counterexample it finds, so the user sees [1, 0] rather than a 200-element list.
SMT. For goals in a decidable theory — linear arithmetic over integers and rationals, bitvectors, uninterpreted functions, arrays, propositional logic — translate to SMT-LIB and shell out to z3 or cvc5. Parse the returned model back into Lean syntax: the user should see n = 3, xs = [0, 0, 1], not (define-fun n () Int 3) (define-fun xs () (Array Int Int) ...). The translator is the hard part; once it is honest, the solver does the work.
Nonlinear arithmetic. Polynomial inequalities and equalities over the reals fall outside the linear-arithmetic fragment but are still decidable via real-closed-field decision procedures. Z3 NLSAT is the most accessible option; MathSAT is a strong alternative; QEPCAD implements cylindrical algebraic decomposition for the harder cases; RedLog handles quantifier elimination over real-closed and discretely-valued fields.
Interval numerics. Analytic inequalities (claims about sin, exp, log, definite integrals) are out of reach of every symbolic procedure but trivially refutable by evaluating at sampled points using rigorous interval arithmetic. Arb and FLINT provide ball arithmetic with certified error bounds; mpmath is the Python equivalent. When the interval evaluation excludes the claimed bound, you have a sound refutation, not a heuristic one.
First-order ATPs. When the goal is logical but does not fit any decidable fragment — typed first-order logic with quantifier alternation, equality, axiom schemata — hand it to a saturation-based prover: Vampire, E, or iProver. They speak a common input format (TPTP), so one translator targets all three. Useful as a "Sledgehammer-lite" oracle for goals SMT cannot encode.
Computer algebra. Closed-form claims about sums, integrals, identities, special functions, and ODE solutions are the bread and butter of a CAS: SageMath, Mathematica, or Maple. SageMath is the natural unified front-end — under one Python interface it bundles GAP (finite groups, characters, permutation actions), Singular (commutative algebra, Gröbner bases), Macaulay2 (algebraic geometry), and PARI/GP (number theory, elliptic curves, modular forms). One adapter, most of the algebra-system territory.
Graphs. Brute enumeration of SimpleGraph (Fin n) is grossly wasteful — the number of graphs grows like 2^(n choose 2), but the number of isomorphism classes grows much more slowly. Nauty's geng produces one representative per isomorphism class up to n ≤ 10 exhaustively; genrang samples uniformly at larger n. Same machinery covers SimpleGraph V, Multigraph, and Digraph with appropriate flags.
Constraint solvers. For combinatorial existence statements that are not naturally graph-shaped — Latin squares, combinatorial designs, schedulings, set systems — encode the problem in a high-level constraint language and let a solver enumerate or refute models. MiniZinc is the standard front-end; Google's OR-Tools ships CP-SAT and is extremely fast on industrial problem sizes.
Database lookup. Some counterexamples already have names, and the cheapest possible "search" is a query against a curated list. π-Base catalogues topological spaces and their properties; House of Graphs catalogues interesting graphs; GAP's SmallGroups library enumerates every finite group up to order 2000; OEIS catalogues integer sequences. When this tier hits, it returns a named counterexample (the Sorgenfrey line, the Petersen graph, the Klein four-group) — vastly more useful to a human reader than a raw model.

Cost profiles differ wildly; don't treat backends as interchangeable. Fuzzing is shallow and free, SMT is a heavy hammer that succeeds or fails quickly, Nauty is the right answer for one shape of question and useless for everything else.

The IR

structure SanityProblem where
  binders    : Array (Name × Expr)        -- ∀-bound variables and their types
  hypotheses : Array Expr                 -- premises (also evaluated)
  goal       : Expr                       -- conclusion to refute
  bounds     : Bounds                     -- per-type size limits
  budget     : Budget                     -- time / memory caps

Bounds is where user-facing knobs live: bound := 50, graphSize := 8, smtTimeout := 2s. Defaults should be aggressive enough that by sanity returns in under a second on a typical goal.

Datatypes, in priority order

Bool, Fin n, Nat (bounded). Smoke test.
Prop / decidable propositions. Reuse Decidable.
Int, Rat, linear arithmetic. First SMT.
List α, Array α, Vector α n.
SimpleGraph V. First nauty integration.
BitVec n. SMT bitvector logic.
User-defined inductives. Derive SmallEnum α from constructors.
Functions α → β with finite α. Tabulate.

Skip for v1: real numbers, quotient types, anything classical.

Backend registration

class SanityBackend (β : Type) where
  name      : String
  applies   : SanityProblem → MetaM Bool
  run       : SanityProblem → MetaM SanityResult

Backends register through an attribute-driven table; see Lean.Meta.registerSimp* for the pattern. The frontend asks each registered backend applies?, runs the cheap ones first, and stops on the first definitive answer. This is the seam that lets a user drop in a backend for their own domain without forking the tactic.

References

[deMoura&Ullrich21]	L. de Moura and S. Ullrich. The Lean 4 Theorem Prover and Programming Language. In: A. Platzer, G. Sutcliffe (eds.), Automated Deduction – CADE 28, Lecture Notes in Computer Science, vol. 12699, pp. 625–635. Springer, Cham, 2021.
[Ayers+22]	E. Ayers, W. Nawrocki, J. Limperg, H. Böving, et al. (eds.). Metaprogramming in Lean 4. Online community textbook, Lean Prover Community, 2022–present. https://leanprover-community.github.io/lean4-metaprogramming-book. Accessed April 2026.
[Ullrich23]	S. Ullrich. An Extensible Theorem Proving Frontend: Designing the Lean Elaborator. Doctoral dissertation, Karlsruher Institut für Technologie (KIT), Department of Informatics, Karlsruhe, Germany, 2023.
[mathlib20]	The mathlib Community. The Lean Mathematical Library. In: J. Blanchette, C. Hriţcu (eds.), Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP 2020), New Orleans, LA, USA, January 20–21, 2020, pp. 367–381. Association for Computing Machinery, New York, NY, USA, 2020. ISBN 978-1-4503-7097-4.
[lean-smt]	UFMG SMITE Lab and contributors. lean-smt: SMT-LIB Integration for Lean 4. Open-source software repository, Universidade Federal de Minas Gerais, 2022–present. https://github.com/ufmg-smite/lean-smt. Accessed April 2026.
[deMoura&Bjørner08]	L. de Moura and N. Bjørner. Z3: An Efficient SMT Solver. In: C.R. Ramakrishnan, J. Rehof (eds.), Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2008), Lecture Notes in Computer Science, vol. 4963, pp. 337–340. Springer, Berlin, Heidelberg, 2008.
[Barbosa+22]	H. Barbosa, C. Barrett, M. Brain, G. Kremer, H. Lachnitt, M. Mann, A. Mohamed, M. Mohamed, A. Niemetz, A. Nötzli, A. Ozdemir, M. Preiner, A. Reynolds, Y. Sheng, C. Tinelli, Y. Zohar. cvc5: A Versatile and Industrial-Strength SMT Solver. In: D. Fisman, G. Rosu (eds.), Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2022), Lecture Notes in Computer Science, vol. 13243, pp. 415–442. Springer, Cham, 2022.
[Jovanović&deMoura12]	D. Jovanović and L. de Moura. Solving Non-linear Arithmetic. In: B. Gramlich, D. Miller, U. Sattler (eds.), Automated Reasoning (IJCAR 2012), Lecture Notes in Computer Science, vol. 7364, pp. 339–354. Springer, Berlin, Heidelberg, 2012.
[Cimatti+13]	A. Cimatti, A. Griggio, B.J. Schaafsma, R. Sebastiani. The MathSAT5 SMT Solver. In: N. Piterman, S.A. Smolka (eds.), Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2013), Lecture Notes in Computer Science, vol. 7795, pp. 93–107. Springer, Berlin, Heidelberg, 2013.
[Kovács&Voronkov13]	L. Kovács and A. Voronkov. First-Order Theorem Proving and Vampire. In: N. Sharygina, H. Veith (eds.), Computer Aided Verification (CAV 2013), Lecture Notes in Computer Science, vol. 8044, pp. 1–35. Springer, Berlin, Heidelberg, 2013.
[Schulz13]	S. Schulz. System Description: E 1.8. In: K. McMillan, A. Middeldorp, A. Voronkov (eds.), Logic for Programming, Artificial Intelligence, and Reasoning (LPAR-19), Lecture Notes in Computer Science, vol. 8312, pp. 735–743. Springer, Berlin, Heidelberg, 2013.
[Sutcliffe17]	G. Sutcliffe. The TPTP Problem Library and Associated Infrastructure: From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning, 59(4):483–502, 2017. Springer Netherlands, Dordrecht. ISSN 0168-7433.
[Claessen&Hughes00]	K. Claessen and J. Hughes. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In: Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming (ICFP '00), Montreal, Canada, September 18–20, 2000, ACM SIGPLAN Notices, 35(9):268–279. Association for Computing Machinery, New York, NY, USA, 2000. ISBN 1-58113-202-6.
[MacIver+19]	D.R. MacIver, Z. Hatfield-Dodds, and many contributors. Hypothesis: A New Approach to Property-Based Testing. Journal of Open Source Software, 4(43):1891, 2019. The Open Journal. ISSN 2475-9066.
[McKay&Piperno14]	B.D. McKay and A. Piperno. Practical Graph Isomorphism, II. Journal of Symbolic Computation, 60:94–112, 2014. Elsevier (Academic Press), London. ISSN 0747-7171.
[Brinkmann+13]	G. Brinkmann, K. Coolsaet, J. Goedgebeur, H. Mélot. House of Graphs: A Database of Interesting Graphs. Discrete Applied Mathematics, 161(1–2):311–314, 2013. Elsevier (North-Holland), Amsterdam. ISSN 0166-218X.
[piBase]	J. Dabbs and contributors. π-Base: A Community Database of Topological Spaces. Online database, 2014–present. https://topology.pi-base.org. Accessed April 2026.
[OEIS]	N.J.A. Sloane (founder and editor-in-chief). The On-Line Encyclopedia of Integer Sequences® (OEIS®). Published electronically by the OEIS Foundation Inc., Highland Park, NJ, USA, 1964–present. https://oeis.org. Accessed April 2026.
[Sage]	The Sage Developers. SageMath, the Sage Mathematics Software System (Version 10.4). Manual and software distribution, SageMath project, 2024. https://www.sagemath.org. Distributed under the GNU General Public License v3.
[GAP]	The GAP Group. GAP — Groups, Algorithms, and Programming, Version 4.13.0. Manual and software distribution, 2024. https://www.gap-system.org. Distributed under the GNU General Public License v2.
[Singular]	W. Decker, G.-M. Greuel, G. Pfister, H. Schönemann. Singular 4-3-2 — A Computer Algebra System for Polynomial Computations. Manual and software distribution, University of Kaiserslautern (Rheinland-Pfälzische Technische Universität Kaiserslautern–Landau), Department of Mathematics, Kaiserslautern, Germany, 2023. https://www.singular.uni-kl.de.
[PARI/GP]	The PARI Group. PARI/GP, Version 2.15.5. Manual and software distribution, Université de Bordeaux, Institut de Mathématiques de Bordeaux, Bordeaux, France, 2024. https://pari.math.u-bordeaux.fr.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sanity

Idea

Feel

Backends

The IR

Datatypes, in priority order

Backend registration

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Sanity

Idea

Feel

Backends

The IR

Datatypes, in priority order

Backend registration

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages