From 7053f3ba55ddb7042e70f19b4a41206b8d54c882 Mon Sep 17 00:00:00 2001 From: Samuel Schlesinger Date: Tue, 14 Apr 2026 19:17:51 -0400 Subject: [PATCH 1/3] feat(MachineLearning/PACLearning): definitions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Define the PAC learning model [Valiant1984] generalized to an arbitrary label type `β` and parameterized by a distribution family `𝒟` on `α × β`. The unified definition `IsPACLearnerFor` captures the realizable, agnostic [Haussler1992], and noise-tolerant [AngluinLaird1988] settings by varying `𝒟`. Accuracy and confidence parameters `ε, δ` use the subtype `Set.Ioo (0 : ℝ≥0) 1` to ensure non-vacuity. Main contributions: - Core: `error`, `optimalError`, `IsPACLearnerFor`, `IsRPACLearnerFor` - Learnability: `IsPACLearnable`, `IsRPACLearnable` - Sample complexity: `sampleComplexity`, `rsampleComplexity` - API: `toIsRPACLearnerFor`, `mono`, `IsPACLearnable.toIsRPACLearnable` - Binary classification: `hypothesisError`, `hypothesisError_eq_add`, `error_map_eq_hypothesisError` --- Cslib.lean | 1 + Cslib/MachineLearning/PACLearning/Defs.lean | 300 ++++++++++++++++++++ references.bib | 60 ++++ 3 files changed, 361 insertions(+) create mode 100644 Cslib/MachineLearning/PACLearning/Defs.lean diff --git a/Cslib.lean b/Cslib.lean index 3e5977af2..43d44b938 100644 --- a/Cslib.lean +++ b/Cslib.lean @@ -120,3 +120,4 @@ public import Cslib.Logics.LinearLogic.CLL.CutElimination public import Cslib.Logics.LinearLogic.CLL.EtaExpansion public import Cslib.Logics.LinearLogic.CLL.PhaseSemantics.Basic public import Cslib.Logics.Propositional.Defs +public import Cslib.MachineLearning.PACLearning.Defs diff --git a/Cslib/MachineLearning/PACLearning/Defs.lean b/Cslib/MachineLearning/PACLearning/Defs.lean new file mode 100644 index 000000000..28a1eef87 --- /dev/null +++ b/Cslib/MachineLearning/PACLearning/Defs.lean @@ -0,0 +1,300 @@ +/- +Copyright (c) 2026 Samuel Schlesinger. All rights reserved. 
+Released under Apache 2.0 license as described in the file LICENSE. +Authors: Samuel Schlesinger +-/ + +module + +public import Cslib.Init +public import Mathlib.MeasureTheory.Measure.MeasureSpace +public import Mathlib.MeasureTheory.Constructions.Pi +public import Mathlib.Order.SymmDiff + +@[expose] public section + +/-! # PAC Learning + +This file defines the Probably Approximately Correct (PAC) learning model +introduced by Valiant [Valiant1984], generalized to an arbitrary label type `β` +and parameterized by a family of distributions `𝒟` on `α × β`. + +A concept class `C` over domain `α` with labels in `β` is a collection of +functions `α → β`. A learning algorithm receives a labeled sample drawn i.i.d. +from an unknown joint distribution `D` on `α × β` and must produce a hypothesis +whose 0-1 error is within `ε` of the best concept in `C`, with probability at +least `1 - δ`. + +The single definition `IsPACLearnerFor` captures the realizable, agnostic, and +noise-tolerant settings by varying the distribution family `𝒟`: + +- **Agnostic** [Haussler1992]: `𝒟 = Set.univ` — the learner must work for all distributions. +- **Realizable**: `𝒟` consists of pushforwards of arbitrary probability measures + `P` on `α` along the graph `x ↦ (x, c x)` of some concept `c ∈ C`, so that + `optimalError D C = 0`. +- **Noise-tolerant** [AngluinLaird1988]: `𝒟` consists of noisy versions of realizable + distributions, where each label is corrupted independently with some probability `η`. + +The accuracy and confidence parameters `ε` and `δ` are elements of the subtype +`Set.Ioo (0 : ℝ≥0) 1`, which bundles the value together with the proof that it +lies in the open interval `(0, 1)`, ensuring the learning condition is non-vacuous. + +## Main definitions + +- `ConceptClass`: a set of functions `α → β` (classifiers). +- `LabeledSample`: a finite sequence of `(point, label)` pairs. +- `Learner`: a function from labeled samples to hypotheses. 
+- `error`: the 0-1 error of a hypothesis under a joint distribution. +- `optimalError`: the infimum of `error` over a concept class. +- `IsPACLearnerFor`: deterministic `(ε, δ)`-PAC learner over a distribution family. +- `IsRPACLearnerFor`: randomized variant of `IsPACLearnerFor`. +- `IsPACLearnable`: a concept class is PAC learnable if `IsPACLearnerFor` holds for + all `ε, δ : Set.Ioo (0 : ℝ≥0) 1` with some sample size `m`. +- `IsRPACLearnable`: randomized variant of `IsPACLearnable`. +- `sampleComplexity`, `rsampleComplexity`: deterministic/randomized sample complexity. + +## Binary classification + +When `β = Bool`, concepts correspond to subsets of `α`. The section +*Binary Classification* provides: + +- `hypothesisError`: the symmetric-difference error `P(h ∆ c)`. +- `falsePositiveError`, `falseNegativeError`: its decomposition. +- `hypothesisError_eq_add`: the decomposition theorem. +- `error_map_eq_hypothesisError`: bridge between the general `error` and + the binary `hypothesisError` under a realizable distribution. + +## Main statements + +- `IsPACLearnerFor.toIsRPACLearnerFor`: every deterministic PAC learner is a + randomized one (via the trivial randomness space `PUnit`). +- `IsPACLearnerFor.mono`: monotonicity in the distribution family `𝒟` — a learner + that works for `𝒟'` also works for any `𝒟 ⊆ 𝒟'`. +- `IsPACLearnable.toIsRPACLearnable`: deterministic learnability implies randomized. +- `hypothesisError_eq_add`: total error = false positive + false negative. + +## References + +* [L. G. Valiant, *A Theory of the Learnable*][Valiant1984] +* [A. Ehrenfeucht, D. Haussler, M. Kearns, L. Valiant, + *A General Lower Bound on the Number of Examples Needed for Learning*][EHKV1989] +* [M. J. Kearns, U. V. Vazirani, + *An Introduction to Computational Learning Theory*][KearnsVazirani1994] +* [D. Haussler, *Decision Theoretic Generalizations of the PAC Model for Neural Net + and Other Learning Applications*][Haussler1992] +* [D. Angluin, P. 
Laird, *Learning from Noisy Examples*][AngluinLaird1988] +-/ + +open MeasureTheory Set +open scoped ENNReal NNReal + +namespace Cslib.MachineLearning + +/-! ### Core Definitions -/ + +/-- A *concept class* over domain `α` with label type `β` is a set of functions `α → β`. +For binary classification (`β = Bool`), this is equivalent to a collection of subsets of `α` +via the characteristic function. -/ +abbrev ConceptClass (α β : Type*) := Set (α → β) + +/-- A *labeled sample* of size `m` over domain `α` with label type `β` is a finite sequence +of `(point, label)` pairs. -/ +abbrev LabeledSample (α β : Type*) (m : ℕ) := Fin m → (α × β) + +/-- A *learner* using `m` samples is a function that takes a labeled sample and produces +a hypothesis (a function from the domain to the label type). -/ +abbrev Learner (α β : Type*) (m : ℕ) := LabeledSample α β m → (α → β) + +/-- The *prediction error* (0-1 loss) of a hypothesis `h` under a joint distribution `D` +on `α × β`, defined as the probability that the prediction disagrees with the label: +`D({(x, y) | h(x) ≠ y})`. -/ +noncomputable def error {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + (D : Measure (α × β)) (h : α → β) : ℝ≥0∞ := + D {p : α × β | h p.1 ≠ p.2} + +/-- The *optimal error* of a concept class `C` under a joint distribution `D`, defined as the +infimum of `error D c` over all concepts `c ∈ C`. When `C` is empty this is `⊤`, making the +PAC learning condition vacuously true. -/ +noncomputable def optimalError {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + (D : Measure (α × β)) (C : ConceptClass α β) : ℝ≥0∞ := + ⨅ c ∈ C, error D c + +variable {α : Type*} {β : Type*} [MeasurableSpace α] [MeasurableSpace β] + +/-! ### PAC Learners -/ + +/-- `IsPACLearnerFor m ε δ C 𝒟` asserts that there exists a learner using `m` samples +that is `(ε, δ)`-correct for the concept class `C` over the distribution family `𝒟`: for every +probability measure `D ∈ 𝒟` on `α × β`, the probability (over i.i.d. 
samples from `D`) that +the learner's hypothesis has error exceeding `opt_C(D) + ε` is at most `δ`. + +The parameters `ε` and `δ` are elements of `Set.Ioo (0 : ℝ≥0) 1`, bundling the value with +the proof that it lies in `(0, 1)`. This ensures the condition is non-vacuous: +`ε < 1` prevents the error threshold from exceeding the maximum possible error under a +probability measure, and `δ < 1` prevents the confidence bound from being trivially +satisfied. -/ +def IsPACLearnerFor (m : ℕ) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∃ A : Learner α β m, + ∀ (D : Measure (α × β)) [IsProbabilityMeasure D], D ∈ 𝒟 → + (Measure.pi (fun _ : Fin m => D)) + {S : LabeledSample α β m | + error D (A S) > optimalError D C + ↑ε.val} ≤ ↑δ.val + +/-- `IsRPACLearnerFor m ε δ C 𝒟` asserts that there exists a *randomized* learner using +`m` samples that is `(ε, δ)`-correct for the concept class `C` over the distribution family +`𝒟`. A randomized learner draws internal randomness `ω` from a probability space `(Ω, Q)` and +acts as the deterministic learner `A(ω)`. + +For every probability measure `D ∈ 𝒟`, the failure probability function +`ω ↦ D^m{S | error(A(ω)(S)) > opt_C(D) + ε}` must be `Q`-a.e. measurable, and its +expectation over `ω` must be at most `δ`. + +The parameters `ε` and `δ` are elements of `Set.Ioo (0 : ℝ≥0) 1`. + +A deterministic learner (`IsPACLearnerFor`) is the special case `Ω = PUnit`; +see `IsPACLearnerFor.toIsRPACLearnerFor`. 
-/ +def IsRPACLearnerFor (m : ℕ) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∃ (Ω : Type*) (_ : MeasurableSpace Ω) (Q : Measure Ω) (_ : IsProbabilityMeasure Q) + (A : Ω → Learner α β m), + ∀ (D : Measure (α × β)) [IsProbabilityMeasure D], D ∈ 𝒟 → + AEMeasurable (fun ω => (Measure.pi (fun _ : Fin m => D)) + {S : LabeledSample α β m | + error D ((A ω) S) > optimalError D C + ↑ε.val}) Q ∧ + ∫⁻ ω, (Measure.pi (fun _ : Fin m => D)) + {S : LabeledSample α β m | + error D ((A ω) S) > optimalError D C + ↑ε.val} ∂Q ≤ ↑δ.val + +/-- Every deterministic PAC learner is in particular a randomized PAC learner +(with the trivial one-point randomness space `PUnit`). -/ +theorem IsPACLearnerFor.toIsRPACLearnerFor {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsPACLearnerFor m ε δ C 𝒟) : + IsRPACLearnerFor m ε δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨PUnit, inferInstance, Measure.dirac PUnit.unit, inferInstance, fun _ => A, ?_⟩ + intro D _ hD + exact ⟨measurable_const.aemeasurable, by + simp only [gt_iff_lt, lintegral_const, measure_univ, mul_one]; exact hA D hD⟩ + +/-- A PAC learner for a larger distribution family `𝒟'` is also a PAC learner for any +subfamily `𝒟 ⊆ 𝒟'`. -/ +theorem IsPACLearnerFor.mono {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsPACLearnerFor m ε δ C 𝒟') : + IsPACLearnerFor m ε δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + exact ⟨A, fun D inst hD => @hA D inst (h𝒟 hD)⟩ + +/-! ### PAC Learnability -/ + +/-- A concept class `C` is *PAC learnable* over the distribution family `𝒟` if for every +accuracy `ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)`, there exists a sample size `m` admitting +a deterministic `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the subtype +`Set.Ioo (0 : ℝ≥0) 1`. 
-/ +def IsPACLearnable (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∀ (ε δ : Set.Ioo (0 : ℝ≥0) 1), + ∃ m, IsPACLearnerFor m ε δ C 𝒟 + +/-- A concept class `C` is *randomized PAC learnable* over the distribution family `𝒟` if for +every accuracy `ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)`, there exists a sample size `m` +admitting a randomized `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the +subtype `Set.Ioo (0 : ℝ≥0) 1`. -/ +def IsRPACLearnable (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∀ (ε δ : Set.Ioo (0 : ℝ≥0) 1), + ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟 + +/-- Deterministic PAC learnability implies randomized PAC learnability. -/ +theorem IsPACLearnable.toIsRPACLearnable {C : ConceptClass α β} + {𝒟 : Set (Measure (α × β))} (h : IsPACLearnable C 𝒟) : + IsRPACLearnable C 𝒟 := by + intro ε δ + obtain ⟨m, hm⟩ := h ε δ + exact ⟨m, hm.toIsRPACLearnerFor⟩ + +/-! ### Sample Complexity -/ + +/-- The *deterministic sample complexity* of a concept class `C` at accuracy `ε ∈ (0, 1)` and +confidence `δ ∈ (0, 1)` over distribution family `𝒟` is the smallest sample size `m` admitting +a deterministic `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the subtype +`Set.Ioo (0 : ℝ≥0) 1`. + +**Caveat**: because `sInf` on `ℕ` returns `0` for the empty set, this definition returns `0` +when no deterministic learner exists (e.g., when `C` has infinite VC dimension). It is only +meaningful when the defining set `{m | IsPACLearnerFor m ε δ C 𝒟}` is nonempty. -/ +noncomputable def sampleComplexity (C : ConceptClass α β) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (𝒟 : Set (Measure (α × β))) : ℕ := + sInf {m : ℕ | IsPACLearnerFor m ε δ C 𝒟} + +/-- The *randomized sample complexity* of a concept class `C` at accuracy `ε ∈ (0, 1)` and +confidence `δ ∈ (0, 1)` over distribution family `𝒟` is the smallest sample size `m` admitting +a randomized `(ε, δ)`-PAC learner for `C`. 
Here `ε` and `δ` are elements of the subtype +`Set.Ioo (0 : ℝ≥0) 1`. + +The universe of the randomness space `Ω` is pinned to `Type 0` (via `.{_, _, 0}`) so that the +`sInf` is taken over a definite set; without the pin the existential quantifier over `Ω : Type*` +would range over all universe levels, making the set ill-defined. + +**Caveat**: because `sInf` on `ℕ` returns `0` for the empty set, this definition returns `0` +when no randomized learner exists. It is only meaningful when the defining set is nonempty. -/ +noncomputable def rsampleComplexity (C : ConceptClass α β) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (𝒟 : Set (Measure (α × β))) : ℕ := + sInf {m : ℕ | IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟} + +/-! ### Binary Classification + +When `β = Bool`, concepts correspond to subsets of `α` via the characteristic function. +The symmetric-difference error `P(h ∆ c)` is the natural error metric, and it decomposes +into false positive and false negative components. + +The bridge lemma `error_map_eq_hypothesisError` connects the general `error` on `α × Bool` +to the binary `hypothesisError` on `α`, showing they coincide for realizable distributions. -/ + +/-- The *symmetric-difference error* of a hypothesis `h` with respect to a target concept `c` +(both viewed as subsets of `α`) under distribution `P`, defined as `P(h ∆ c)`. -/ +noncomputable def hypothesisError {α : Type*} [MeasurableSpace α] (P : Measure α) + (h c : Set α) : ℝ≥0∞ := + P (symmDiff h c) + +/-- The *false positive error* `P(h \ c)` — points classified positive but not in the +concept. -/ +noncomputable def falsePositiveError {α : Type*} [MeasurableSpace α] (P : Measure α) + (h c : Set α) : ℝ≥0∞ := + P (h \ c) + +/-- The *false negative error* `P(c \ h)` — points in the concept but classified negative. 
-/ +noncomputable def falseNegativeError {α : Type*} [MeasurableSpace α] (P : Measure α) + (h c : Set α) : ℝ≥0∞ := + P (c \ h) + +/-- The total hypothesis error decomposes as the sum of false positive and false negative +errors, since `h ∆ c = (h \ c) ∪ (c \ h)` is a disjoint union. -/ +theorem hypothesisError_eq_add {α : Type*} [MeasurableSpace α] {P : Measure α} + {h c : Set α} (hh : MeasurableSet h) (hc : MeasurableSet c) : + hypothesisError P h c = falsePositiveError P h c + falseNegativeError P h c := by + simp only [hypothesisError, falsePositiveError, falseNegativeError, symmDiff_def, sup_eq_union] + exact measure_union disjoint_sdiff_sdiff (hc.diff hh) + +open Classical in +/-- Under a realizable distribution `P.map (x ↦ (x, c(x)))`, the general 0-1 `error` +coincides with the binary `hypothesisError P h c`, where `h` and `c` are viewed as subsets +of `α` via the characteristic function `decide (· ∈ ·)`. -/ +theorem error_map_eq_hypothesisError {α : Type*} [MeasurableSpace α] (P : Measure α) + (h c : Set α) (hh : MeasurableSet h) (hc : MeasurableSet c) : + error (P.map (fun x => (x, decide (x ∈ c)))) (fun x => decide (x ∈ h)) = + hypothesisError P h c := by + simp only [error, hypothesisError] + have hf : Measurable (fun x => (x, decide (x ∈ c))) := + Measurable.prodMk measurable_id + (measurable_to_bool (by convert hc using 1; ext x; simp [decide_eq_true_eq])) + rw [Measure.map_apply_of_aemeasurable hf.aemeasurable] + · congr 1; ext x + simp only [Set.mem_preimage, Set.mem_setOf_eq, symmDiff_def, sup_eq_union, + Set.mem_union, Set.mem_diff] + by_cases hx : x ∈ h <;> by_cases hcx : x ∈ c <;> simp_all + · convert (hh.prod (measurableSet_singleton false)).union + (hh.compl.prod (measurableSet_singleton true)) using 1 + ext ⟨x, b⟩; cases b <;> simp + +end Cslib.MachineLearning diff --git a/references.bib b/references.bib index 2cccb928f..3e8c26190 100644 --- a/references.bib +++ b/references.bib @@ -19,6 +19,17 @@ @inproceedings{Aceto1999 bibsource = {dblp 
computer science bibliography, https://dblp.org} } +@article{AngluinLaird1988, + author = {Angluin, Dana and Laird, Philip}, + title = {Learning from Noisy Examples}, + journal = {Machine Learning}, + volume = {2}, + number = {4}, + pages = {343--370}, + year = {1988}, + doi = {10.1007/BF00116829} +} + @book{Baader1998, author = {Baader, Franz and Nipkow, Tobias}, title = {Term rewriting and all that}, @@ -107,6 +118,18 @@ @inbook{ Girard1995 collection={London Mathematical Society Lecture Note Series} } +@article{Haussler1992, + author = {Haussler, David}, + title = {Decision Theoretic Generalizations of the {PAC} Model for Neural Net and Other Learning Applications}, + journal = {Information and Computation}, + volume = {100}, + number = {1}, + pages = {78--150}, + year = {1992}, + issn = {0890-5401}, + doi = {10.1016/0890-5401(92)90010-D} +} + @article{ Hennessy1985, author = {Matthew Hennessy and Robin Milner}, @@ -265,6 +288,43 @@ @article{ ShepherdsonSturgis1963 address = {New York, NY, USA} } +@inproceedings{Valiant1984, + author = {Valiant, L. G.}, + title = {A Theory of the Learnable}, + year = {1984}, + isbn = {0-89791-133-4}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + url = {https://doi.org/10.1145/800057.808710}, + doi = {10.1145/800057.808710}, + booktitle = {Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing}, + pages = {436--445}, + series = {STOC '84} +} + +@article{EHKV1989, + author = {Ehrenfeucht, Andrzej and Haussler, David and Kearns, Michael and Valiant, Leslie}, + title = {A General Lower Bound on the Number of Examples Needed for Learning}, + journal = {Information and Computation}, + volume = {82}, + number = {3}, + pages = {247--261}, + year = {1989}, + issn = {0890-5401}, + url = {https://doi.org/10.1016/0890-5401(89)90002-3}, + doi = {10.1016/0890-5401(89)90002-3}, + publisher = {Academic Press} +} + +@book{KearnsVazirani1994, + author = {Kearns, Michael J. 
and Vazirani, Umesh V.}, + title = {An Introduction to Computational Learning Theory}, + year = {1994}, + isbn = {978-0-262-11193-5}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA} +} + @incollection{WinskelNielsen1995, author = {Winskel, Glynn and Nielsen, Mogens}, isbn = {9780198537809}, From 0dfdbe53b7f0482aafaf305def04d034d6e979d1 Mon Sep 17 00:00:00 2001 From: Samuel Schlesinger Date: Fri, 17 Apr 2026 21:21:47 -0400 Subject: [PATCH 2/3] refactor(MachineLearning/PACLearning): mathlib-style polish MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Move all declarations under `Cslib.MachineLearning.PACLearning` to keep generic names like `error`, `optimalError` out of the parent namespace. - Rename predicate-level `.mono` lemmas that flip implication direction against `⊆` to `.antitone_family` / `.antitone_C` (`IsPACLearnerFor`, `IsRPACLearnerFor`, `IsPACLearnable`, `IsRPACLearnable`). Keep `.mono_δ` / `.mono_ε` on the predicates and `_mono_family` / `_mono_C` on sample complexity, where the direction actually is monotone. - Restore `Ω : Type*` on `IsRPACLearnerFor` so the randomness space is universe-polymorphic; downstream `IsRPACLearnerFor.*` theorems thread the universe via the `.{_, _, u}` pattern. `IsRPACLearnable` and `rsampleComplexity` pin `u := 0` (documented). - Wrap the `Binary` subsection in `section Binary` with a local `variable {α : Type*} [MeasurableSpace α]`. - Simplify `optimalError` monotonicity in `IsPACLearnerFor.antitone_C` via `iInf_le_iInf_of_subset`. - Add convenience variants on `IsPACLearnable` / `IsRPACLearnable` that discharge the `∃ m, IsPACLearnerFor m …` existence hypothesis from a learnability witness: `sampleComplexity_antitone_δ`, `_antitone_ε`, `_mono_family`, `_mono_C`, and the RPAC analogues. Axioms remain standard (`propext`, `Classical.choice`, `Quot.sound`). 
--- Cslib/MachineLearning/PACLearning/Defs.lean | 336 +++++++++++++++++--- 1 file changed, 285 insertions(+), 51 deletions(-) diff --git a/Cslib/MachineLearning/PACLearning/Defs.lean b/Cslib/MachineLearning/PACLearning/Defs.lean index 28a1eef87..81eeedf65 100644 --- a/Cslib/MachineLearning/PACLearning/Defs.lean +++ b/Cslib/MachineLearning/PACLearning/Defs.lean @@ -39,6 +39,9 @@ The accuracy and confidence parameters `ε` and `δ` are elements of the subtype `Set.Ioo (0 : ℝ≥0) 1`, which bundles the value together with the proof that it lies in the open interval `(0, 1)`, ensuring the learning condition is non-vacuous. +All declarations live under the `Cslib.MachineLearning.PACLearning` namespace so that +generic names like `error` and `optimalError` do not pollute the parent namespace. + ## Main definitions - `ConceptClass`: a set of functions `α → β` (classifiers). @@ -47,11 +50,16 @@ lies in the open interval `(0, 1)`, ensuring the learning condition is non-vacuo - `error`: the 0-1 error of a hypothesis under a joint distribution. - `optimalError`: the infimum of `error` over a concept class. - `IsPACLearnerFor`: deterministic `(ε, δ)`-PAC learner over a distribution family. -- `IsRPACLearnerFor`: randomized variant of `IsPACLearnerFor`. +- `IsRPACLearnerFor`: randomized variant of `IsPACLearnerFor`. Universe-polymorphic in the + randomness space `Ω : Type*`. - `IsPACLearnable`: a concept class is PAC learnable if `IsPACLearnerFor` holds for all `ε, δ : Set.Ioo (0 : ℝ≥0) 1` with some sample size `m`. -- `IsRPACLearnable`: randomized variant of `IsPACLearnable`. -- `sampleComplexity`, `rsampleComplexity`: deterministic/randomized sample complexity. +- `IsRPACLearnable`: randomized variant of `IsPACLearnable`. Pins the randomness space to + `Type 0`; `IsRPACLearnerFor` itself remains universe-polymorphic for users who need it. 
+- `LearnerModel`: the common predicate shape `ℕ → ε → δ → C → 𝒟 → Prop` abstracting both + the deterministic and randomized learners so sample-complexity lemmas can be shared. +- `sampleComplexity`: sample complexity of a generic learner model. +- `rsampleComplexity`: randomized sample complexity, i.e. `sampleComplexity IsRPACLearnerFor`. ## Binary classification @@ -68,9 +76,24 @@ When `β = Bool`, concepts correspond to subsets of `α`. The section - `IsPACLearnerFor.toIsRPACLearnerFor`: every deterministic PAC learner is a randomized one (via the trivial randomness space `PUnit`). -- `IsPACLearnerFor.mono`: monotonicity in the distribution family `𝒟` — a learner - that works for `𝒟'` also works for any `𝒟 ⊆ 𝒟'`. +- `IsPACLearnerFor.antitone_family`, `.antitone_C`: the deterministic PAC learner + predicate is antitone in the distribution family and concept class. +- `IsPACLearnerFor.mono_δ`, `.mono_ε`: the predicate is monotone in the confidence and + accuracy parameters (a weaker bound still holds). +- `IsRPACLearnerFor.antitone_family`, `.mono_δ`: analogues for the randomized predicate. + (`mono_ε` and `antitone_C` are not provided because they change the integrand and + would require an extra measurability assumption.) - `IsPACLearnable.toIsRPACLearnable`: deterministic learnability implies randomized. +- `IsPACLearnable.antitone_family`, `.antitone_C`, `IsRPACLearnable.antitone_family`: + PAC learnability is antitone in the distribution family and concept class. +- `sampleComplexity_antitone_δ`, `_antitone_ε`, `_mono_family`, `_mono_C`: variation of + deterministic sample complexity in confidence, accuracy, distribution family, and concept + class (antitone in the numeric parameters, monotone under `⊆` in the set parameters). The + randomized analogues `rsampleComplexity_antitone_δ` and `_mono_family` are provided. 
+- `IsPACLearnable.sampleComplexity_*`, `IsRPACLearnable.rsampleComplexity_*`: the same + monotonicity facts phrased with a learnability hypothesis in place of the ad-hoc + `∃ m, IsPACLearnerFor m …` existence witness, so callers who already know the class is + learnable need not thread it through. - `hypothesisError_eq_add`: total error = false positive + false negative. ## References @@ -88,7 +111,7 @@ When `β = Bool`, concepts correspond to subsets of `α`. The section open MeasureTheory Set open scoped ENNReal NNReal -namespace Cslib.MachineLearning +namespace Cslib.MachineLearning.PACLearning /-! ### Core Definitions -/ @@ -105,22 +128,21 @@ abbrev LabeledSample (α β : Type*) (m : ℕ) := Fin m → (α × β) a hypothesis (a function from the domain to the label type). -/ abbrev Learner (α β : Type*) (m : ℕ) := LabeledSample α β m → (α → β) +section +variable {α : Type*} {β : Type*} [MeasurableSpace α] [MeasurableSpace β] + /-- The *prediction error* (0-1 loss) of a hypothesis `h` under a joint distribution `D` on `α × β`, defined as the probability that the prediction disagrees with the label: `D({(x, y) | h(x) ≠ y})`. -/ -noncomputable def error {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] - (D : Measure (α × β)) (h : α → β) : ℝ≥0∞ := +noncomputable def error (D : Measure (α × β)) (h : α → β) : ℝ≥0∞ := D {p : α × β | h p.1 ≠ p.2} /-- The *optimal error* of a concept class `C` under a joint distribution `D`, defined as the infimum of `error D c` over all concepts `c ∈ C`. When `C` is empty this is `⊤`, making the PAC learning condition vacuously true. -/ -noncomputable def optimalError {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] - (D : Measure (α × β)) (C : ConceptClass α β) : ℝ≥0∞ := +noncomputable def optimalError (D : Measure (α × β)) (C : ConceptClass α β) : ℝ≥0∞ := ⨅ c ∈ C, error D c -variable {α : Type*} {β : Type*} [MeasurableSpace α] [MeasurableSpace β] - /-! 
### PAC Learners -/ /-- `IsPACLearnerFor m ε δ C 𝒟` asserts that there exists a learner using `m` samples @@ -150,7 +172,9 @@ For every probability measure `D ∈ 𝒟`, the failure probability function `ω ↦ D^m{S | error(A(ω)(S)) > opt_C(D) + ε}` must be `Q`-a.e. measurable, and its expectation over `ω` must be at most `δ`. -The parameters `ε` and `δ` are elements of `Set.Ioo (0 : ℝ≥0) 1`. +The randomness space `Ω : Type*` is universe-polymorphic; the universe is an implicit +parameter of `IsRPACLearnerFor`, and downstream statements reference it via the pattern +`IsRPACLearnerFor.{_, _, u}`. Fix `u := 0` for the usual case of a standard randomness space. A deterministic learner (`IsPACLearnerFor`) is the special case `Ω = PUnit`; see `IsPACLearnerFor.toIsRPACLearnerFor`. -/ @@ -175,18 +199,83 @@ theorem IsPACLearnerFor.toIsRPACLearnerFor {m : ℕ} {ε δ : Set.Ioo (0 : ℝ obtain ⟨A, hA⟩ := h refine ⟨PUnit, inferInstance, Measure.dirac PUnit.unit, inferInstance, fun _ => A, ?_⟩ intro D _ hD - exact ⟨measurable_const.aemeasurable, by - simp only [gt_iff_lt, lintegral_const, measure_univ, mul_one]; exact hA D hD⟩ + refine ⟨measurable_const.aemeasurable, ?_⟩ + simp only [gt_iff_lt, lintegral_const, measure_univ, mul_one] + exact hA D hD -/-- A PAC learner for a larger distribution family `𝒟'` is also a PAC learner for any -subfamily `𝒟 ⊆ 𝒟'`. -/ -theorem IsPACLearnerFor.mono {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} +/-- The deterministic PAC learner predicate is antitone in the distribution family: a +learner for a larger family `𝒟'` is also a learner for any subfamily `𝒟 ⊆ 𝒟'`. 
-/ +theorem IsPACLearnerFor.antitone_family {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsPACLearnerFor m ε δ C 𝒟') : IsPACLearnerFor m ε δ C 𝒟 := by obtain ⟨A, hA⟩ := h exact ⟨A, fun D inst hD => @hA D inst (h𝒟 hD)⟩ +/-- A PAC learner with confidence `δ₁` is also a PAC learner with any weaker confidence +`δ₂ ≥ δ₁`: the failure-probability bound only gets looser. -/ +theorem IsPACLearnerFor.mono_δ {m : ℕ} {ε : Set.Ioo (0 : ℝ≥0) 1} + {δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsPACLearnerFor m ε δ₁ C 𝒟) : + IsPACLearnerFor m ε δ₂ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨A, fun D inst hD => le_trans (@hA D inst hD) ?_⟩ + exact_mod_cast hδ + +/-- A PAC learner with accuracy `ε₁` is also a PAC learner with any weaker accuracy +`ε₂ ≥ ε₁`: the bad event `{error > opt + ε}` only shrinks. -/ +theorem IsPACLearnerFor.mono_ε {m : ℕ} {δ : Set.Ioo (0 : ℝ≥0) 1} + {ε₁ ε₂ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsPACLearnerFor m ε₁ δ C 𝒟) : + IsPACLearnerFor m ε₂ δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨A, fun D inst hD => le_trans (measure_mono ?_) (@hA D inst hD)⟩ + intro S hS + have hε' : (↑ε₁.val : ℝ≥0∞) ≤ ↑ε₂.val := by exact_mod_cast hε + calc optimalError D C + (↑ε₁.val : ℝ≥0∞) + ≤ optimalError D C + ↑ε₂.val := by gcongr + _ < error D (A S) := hS + +/-- The deterministic PAC learner predicate is antitone in the concept class: a learner +for a larger class `C'` is also a learner for any subclass `C ⊆ C'`, since the agnostic +benchmark `optimalError _ C ≥ optimalError _ C'` makes the error requirement easier. 
-/ +theorem IsPACLearnerFor.antitone_C {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C C' : ConceptClass α β} (hC : C ⊆ C') + {𝒟 : Set (Measure (α × β))} (h : IsPACLearnerFor m ε δ C' 𝒟) : + IsPACLearnerFor m ε δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨A, fun D inst hD => le_trans (measure_mono ?_) (@hA D inst hD)⟩ + intro S hS + have h_opt : optimalError D C' ≤ optimalError D C := iInf_le_iInf_of_subset hC + calc optimalError D C' + (↑ε.val : ℝ≥0∞) + ≤ optimalError D C + ↑ε.val := by gcongr + _ < error D (A S) := hS + +/-- The randomized PAC learner predicate is antitone in the distribution family. The +universe of the randomness space `Ω` is pinned so the hypothesis and conclusion share it. -/ +theorem IsRPACLearnerFor.antitone_family.{u} {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsRPACLearnerFor.{_, _, u} m ε δ C 𝒟') : + IsRPACLearnerFor.{_, _, u} m ε δ C 𝒟 := by + obtain ⟨Ω, mΩ, Q, hQ, A, hA⟩ := h + exact ⟨Ω, mΩ, Q, hQ, A, fun D inst hD => @hA D inst (h𝒟 hD)⟩ + +/-- A randomized PAC learner with confidence `δ₁` is also a randomized PAC learner with +any weaker confidence `δ₂ ≥ δ₁`. Unlike `mono_ε` or `antitone_C`, this does not touch the +integrand, so it carries the `AEMeasurable` part through unchanged. -/ +theorem IsRPACLearnerFor.mono_δ.{u} {m : ℕ} {ε : Set.Ioo (0 : ℝ≥0) 1} + {δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsRPACLearnerFor.{_, _, u} m ε δ₁ C 𝒟) : + IsRPACLearnerFor.{_, _, u} m ε δ₂ C 𝒟 := by + obtain ⟨Ω, mΩ, Q, hQ, A, hA⟩ := h + refine ⟨Ω, mΩ, Q, hQ, A, fun D inst hD => ?_⟩ + obtain ⟨hmeas, hint⟩ := @hA D inst hD + refine ⟨hmeas, le_trans hint ?_⟩ + exact_mod_cast hδ + /-! 
### PAC Learnability -/ /-- A concept class `C` is *PAC learnable* over the distribution family `𝒟` if for every @@ -199,8 +288,8 @@ def IsPACLearnable (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : /-- A concept class `C` is *randomized PAC learnable* over the distribution family `𝒟` if for every accuracy `ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)`, there exists a sample size `m` -admitting a randomized `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the -subtype `Set.Ioo (0 : ℝ≥0) 1`. -/ +admitting a randomized `(ε, δ)`-PAC learner for `C`. The randomness space is pinned to +`Type 0` at the learnability level; `IsRPACLearnerFor` itself remains universe-polymorphic. -/ def IsRPACLearnable (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := ∀ (ε δ : Set.Ioo (0 : ℝ≥0) 1), ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟 @@ -213,34 +302,177 @@ theorem IsPACLearnable.toIsRPACLearnable {C : ConceptClass α β} obtain ⟨m, hm⟩ := h ε δ exact ⟨m, hm.toIsRPACLearnerFor⟩ -/-! ### Sample Complexity -/ +/-- PAC learnability is antitone in the distribution family: a subfamily of a learnable +family is learnable. -/ +theorem IsPACLearnable.antitone_family {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsPACLearnable C 𝒟') : IsPACLearnable C 𝒟 := + fun ε δ => (h ε δ).imp fun _ hm => hm.antitone_family h𝒟 -/-- The *deterministic sample complexity* of a concept class `C` at accuracy `ε ∈ (0, 1)` and -confidence `δ ∈ (0, 1)` over distribution family `𝒟` is the smallest sample size `m` admitting -a deterministic `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the subtype -`Set.Ioo (0 : ℝ≥0) 1`. +/-- PAC learnability is antitone in the concept class: a subclass of a learnable class is +learnable. 
-/ +theorem IsPACLearnable.antitone_C {C C' : ConceptClass α β} (hC : C ⊆ C') + {𝒟 : Set (Measure (α × β))} (h : IsPACLearnable C' 𝒟) : IsPACLearnable C 𝒟 := + fun ε δ => (h ε δ).imp fun _ hm => hm.antitone_C hC -**Caveat**: because `sInf` on `ℕ` returns `0` for the empty set, this definition returns `0` -when no deterministic learner exists (e.g., when `C` has infinite VC dimension). It is only -meaningful when the defining set `{m | IsPACLearnerFor m ε δ C 𝒟}` is nonempty. -/ -noncomputable def sampleComplexity (C : ConceptClass α β) (ε δ : Set.Ioo (0 : ℝ≥0) 1) - (𝒟 : Set (Measure (α × β))) : ℕ := - sInf {m : ℕ | IsPACLearnerFor m ε δ C 𝒟} +/-- Randomized PAC learnability is antitone in the distribution family. -/ +theorem IsRPACLearnable.antitone_family {C : ConceptClass α β} + {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') + (h : IsRPACLearnable C 𝒟') : IsRPACLearnable C 𝒟 := + fun ε δ => (h ε δ).imp fun _ hm => hm.antitone_family h𝒟 -/-- The *randomized sample complexity* of a concept class `C` at accuracy `ε ∈ (0, 1)` and -confidence `δ ∈ (0, 1)` over distribution family `𝒟` is the smallest sample size `m` admitting -a randomized `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the subtype -`Set.Ioo (0 : ℝ≥0) 1`. +/-! ### Sample Complexity -/ + +/-- A *learner model* is a predicate on (sample size, accuracy, confidence, concept class, +distribution family) that classifies which sample sizes admit a learner of the given kind. +Instantiating with `IsPACLearnerFor` gives the deterministic model; with `IsRPACLearnerFor` +gives the randomized one. 
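Concretely, both learner predicates in this file inhabit this type (a sketch; the binders for `α`, `β` and their measurable-space instances are the surrounding section variables):

```lean
-- The deterministic learner predicate is a `LearnerModel`.
example : LearnerModel α β := fun m ε δ C 𝒟 => IsPACLearnerFor m ε δ C 𝒟

-- So is the randomized one, once the randomness universe is pinned to `Type 0`.
example : LearnerModel α β := fun m ε δ C 𝒟 => IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟
```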
-/ +abbrev LearnerModel (α β : Type*) [MeasurableSpace α] [MeasurableSpace β] := + ℕ → Set.Ioo (0 : ℝ≥0) 1 → Set.Ioo (0 : ℝ≥0) 1 → + ConceptClass α β → Set (Measure (α × β)) → Prop -The universe of the randomness space `Ω` is pinned to `Type 0` (via `.{_, _, 0}`) so that the -`sInf` is taken over a definite set; without the pin the existential quantifier over `Ω : Type*` -would range over all universe levels, making the set ill-defined. +/-- The *sample complexity* of a concept class `C` under a learner model `L`, at accuracy +`ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)` over distribution family `𝒟`, is the smallest sample +size `m` with `L m ε δ C 𝒟`. Specialize with `L := IsPACLearnerFor` for the deterministic model +and `L := IsRPACLearnerFor` for the randomized one. **Caveat**: because `sInf` on `ℕ` returns `0` for the empty set, this definition returns `0` -when no randomized learner exists. It is only meaningful when the defining set is nonempty. -/ +when no learner exists (e.g., a concept class of infinite VC dimension). It is only meaningful +when the defining set `{m | L m ε δ C 𝒟}` is nonempty. The `IsPACLearnable.sampleComplexity_*` +variants below discharge this nonemptiness from a learnability hypothesis. -/ +noncomputable def sampleComplexity (L : LearnerModel α β) (C : ConceptClass α β) + (ε δ : Set.Ioo (0 : ℝ≥0) 1) (𝒟 : Set (Measure (α × β))) : ℕ := + sInf {m : ℕ | L m ε δ C 𝒟} + +/-- The *randomized sample complexity* of `C`, i.e. `sampleComplexity` instantiated at the +randomized learner model `IsRPACLearnerFor`. The randomness space is pinned to `Type 0`. -/ noncomputable def rsampleComplexity (C : ConceptClass α β) (ε δ : Set.Ioo (0 : ℝ≥0) 1) (𝒟 : Set (Measure (α × β))) : ℕ := - sInf {m : ℕ | IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟} + sampleComplexity IsRPACLearnerFor.{_, _, 0} C ε δ 𝒟 + +/-! 
### Monotonicity of Sample Complexity + +These lemmas are all special cases of the following observation: if `{m | L₁ m ε₁ δ₁ C₁ 𝒟₁} ⊆ +{m | L₂ m ε₂ δ₂ C₂ 𝒟₂}` and the first set is nonempty, then the sample complexity under +`(L₂, ε₂, δ₂, C₂, 𝒟₂)` is at most the sample complexity under `(L₁, ε₁, δ₁, C₁, 𝒟₁)`. The +nonemptiness hypothesis is essential: `sInf` on `ℕ` returns `0` for an empty set, so without +it the inequality can fail at the degenerate boundary. The `IsPACLearnable`-flavoured variants +at the end of this section discharge that witness from a learnability hypothesis. -/ + +/-- General pointwise monotonicity of `sampleComplexity`: if every witness sample size for +`(L₁, ε₁, δ₁, C₁, 𝒟₁)` is also a witness for `(L₂, ε₂, δ₂, C₂, 𝒟₂)`, then the latter's +sample complexity is at most the former's (provided the former is attained). -/ +theorem sampleComplexity_le_of_forall {L₁ L₂ : LearnerModel α β} + {ε₁ δ₁ ε₂ δ₂ : Set.Ioo (0 : ℝ≥0) 1} {C₁ C₂ : ConceptClass α β} + {𝒟₁ 𝒟₂ : Set (Measure (α × β))} + (hL : ∀ {m : ℕ}, L₁ m ε₁ δ₁ C₁ 𝒟₁ → L₂ m ε₂ δ₂ C₂ 𝒟₂) + (h : ∃ m, L₁ m ε₁ δ₁ C₁ 𝒟₁) : + sampleComplexity L₂ C₂ ε₂ δ₂ 𝒟₂ ≤ sampleComplexity L₁ C₁ ε₁ δ₁ 𝒟₁ := + Nat.sInf_le (hL (Nat.sInf_mem h)) + +/-- Deterministic sample complexity is antitone in the confidence parameter `δ`: weaker +confidence requires no more samples. -/ +theorem sampleComplexity_antitone_δ {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsPACLearnerFor m ε δ₁ C 𝒟) : + sampleComplexity IsPACLearnerFor C ε δ₂ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ₁ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) h + +/-- Deterministic sample complexity is antitone in the accuracy parameter `ε`: weaker +accuracy requires no more samples. 
-/ +theorem sampleComplexity_antitone_ε {ε₁ ε₂ δ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsPACLearnerFor m ε₁ δ C 𝒟) : + sampleComplexity IsPACLearnerFor C ε₂ δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε₁ δ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_ε hε) h + +/-- Deterministic sample complexity is monotone in the distribution family under `⊆`: a +smaller family (fewer distributions to cover) requires no more samples. -/ +theorem sampleComplexity_mono_family {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') + (h : ∃ m, IsPACLearnerFor m ε δ C 𝒟') : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ 𝒟' := + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) h + +/-- Deterministic sample complexity is monotone in the concept class under `⊆`: a smaller +class (weaker agnostic benchmark) requires no more samples. -/ +theorem sampleComplexity_mono_C {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C C' : ConceptClass α β} (hC : C ⊆ C') {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsPACLearnerFor m ε δ C' 𝒟) : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C' ε δ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.antitone_C hC) h + +/-- Randomized sample complexity is antitone in the confidence parameter `δ`. -/ +theorem rsampleComplexity_antitone_δ {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} + (hδ : δ₁.val ≤ δ₂.val) {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ₁ C 𝒟) : + rsampleComplexity C ε δ₂ 𝒟 ≤ rsampleComplexity C ε δ₁ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) h + +/-- Randomized sample complexity is monotone in the distribution family under `⊆`. 
-/ +theorem rsampleComplexity_mono_family {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') + (h : ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟') : + rsampleComplexity C ε δ 𝒟 ≤ rsampleComplexity C ε δ 𝒟' := + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) h + +end + +/-! Convenience variants conditional on learnability, which discharge the nonemptiness +hypothesis `(∃ m, IsPACLearnerFor m …)` from an `IsPACLearnable` / `IsRPACLearnable` witness. +These are declared outside the main `variable`-bound section so that cross-references to the +top-level `sampleComplexity_*` lemmas are not resolved as self-recursion. -/ + +/-- `sampleComplexity_antitone_δ` for a learnable class: the nonemptiness hypothesis comes +for free from `IsPACLearnable`. -/ +theorem IsPACLearnable.sampleComplexity_antitone_δ + {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟) + {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) : + sampleComplexity IsPACLearnerFor C ε δ₂ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ₁ 𝒟 := + _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_antitone_δ hδ (hL ε δ₁) + +/-- `sampleComplexity_antitone_ε` for a learnable class. -/ +theorem IsPACLearnable.sampleComplexity_antitone_ε + {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟) + {ε₁ ε₂ δ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) : + sampleComplexity IsPACLearnerFor C ε₂ δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε₁ δ 𝒟 := + _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_antitone_ε hε (hL ε₁ δ) + +/-- `sampleComplexity_mono_family` for a learnable class (learnability at the *larger* +family `𝒟'` is the hypothesis). 
-/ +theorem IsPACLearnable.sampleComplexity_mono_family + {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (hL : IsPACLearnable C 𝒟') (h𝒟 : 𝒟 ⊆ 𝒟') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ 𝒟' := + _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_mono_family h𝒟 (hL ε δ) + +/-- `sampleComplexity_mono_C` for a learnable class (learnability at the *larger* class +`C'` is the hypothesis). -/ +theorem IsPACLearnable.sampleComplexity_mono_C + {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + {C C' : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (hL : IsPACLearnable C' 𝒟) (hC : C ⊆ C') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C' ε δ 𝒟 := + _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_mono_C hC (hL ε δ) + +/-- `rsampleComplexity_antitone_δ` for a randomized-learnable class. -/ +theorem IsRPACLearnable.rsampleComplexity_antitone_δ + {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (hL : IsRPACLearnable C 𝒟) + {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) : + rsampleComplexity C ε δ₂ 𝒟 ≤ rsampleComplexity C ε δ₁ 𝒟 := + _root_.Cslib.MachineLearning.PACLearning.rsampleComplexity_antitone_δ hδ (hL ε δ₁) + +/-- `rsampleComplexity_mono_family` for a randomized-learnable class. -/ +theorem IsRPACLearnable.rsampleComplexity_mono_family + {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (hL : IsRPACLearnable C 𝒟') (h𝒟 : 𝒟 ⊆ 𝒟') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : + rsampleComplexity C ε δ 𝒟 ≤ rsampleComplexity C ε δ 𝒟' := + _root_.Cslib.MachineLearning.PACLearning.rsampleComplexity_mono_family h𝒟 (hL ε δ) /-! ### Binary Classification @@ -251,27 +483,27 @@ into false positive and false negative components. 
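For a concrete instance over `ℝ`, take nested intervals as hypothesis and concept (a sketch; `P` is an arbitrary measure supplied as a hypothesis):

```lean
example (P : Measure ℝ) :
    hypothesisError P (Set.Iio 1) (Set.Iio 2)
      = falsePositiveError P (Set.Iio 1) (Set.Iio 2)
        + falseNegativeError P (Set.Iio 1) (Set.Iio 2) :=
  hypothesisError_eq_add measurableSet_Iio measurableSet_Iio
```

Since `Set.Iio 1 ⊆ Set.Iio 2`, the false positive term vanishes and the false negative term is `P (Set.Ico 1 2)`, so the whole symmetric-difference error concentrates on `[1, 2)`.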
The bridge lemma `error_map_eq_hypothesisError` connects the general `error` on `α × Bool` to the binary `hypothesisError` on `α`, showing they coincide for realizable distributions. -/ +section Binary +variable {α : Type*} [MeasurableSpace α] + /-- The *symmetric-difference error* of a hypothesis `h` with respect to a target concept `c` (both viewed as subsets of `α`) under distribution `P`, defined as `P(h ∆ c)`. -/ -noncomputable def hypothesisError {α : Type*} [MeasurableSpace α] (P : Measure α) - (h c : Set α) : ℝ≥0∞ := +noncomputable def hypothesisError (P : Measure α) (h c : Set α) : ℝ≥0∞ := P (symmDiff h c) /-- The *false positive error* `P(h \ c)` — points classified positive but not in the concept. -/ -noncomputable def falsePositiveError {α : Type*} [MeasurableSpace α] (P : Measure α) - (h c : Set α) : ℝ≥0∞ := +noncomputable def falsePositiveError (P : Measure α) (h c : Set α) : ℝ≥0∞ := P (h \ c) /-- The *false negative error* `P(c \ h)` — points in the concept but classified negative. -/ -noncomputable def falseNegativeError {α : Type*} [MeasurableSpace α] (P : Measure α) - (h c : Set α) : ℝ≥0∞ := +noncomputable def falseNegativeError (P : Measure α) (h c : Set α) : ℝ≥0∞ := P (c \ h) /-- The total hypothesis error decomposes as the sum of false positive and false negative errors, since `h ∆ c = (h \ c) ∪ (c \ h)` is a disjoint union. 
-/ -theorem hypothesisError_eq_add {α : Type*} [MeasurableSpace α] {P : Measure α} - {h c : Set α} (hh : MeasurableSet h) (hc : MeasurableSet c) : +theorem hypothesisError_eq_add {P : Measure α} {h c : Set α} + (hh : MeasurableSet h) (hc : MeasurableSet c) : hypothesisError P h c = falsePositiveError P h c + falseNegativeError P h c := by simp only [hypothesisError, falsePositiveError, falseNegativeError, symmDiff_def, sup_eq_union] exact measure_union disjoint_sdiff_sdiff (hc.diff hh) @@ -280,8 +512,8 @@ open Classical in /-- Under a realizable distribution `P.map (x ↦ (x, c(x)))`, the general 0-1 `error` coincides with the binary `hypothesisError P h c`, where `h` and `c` are viewed as subsets of `α` via the characteristic function `decide (· ∈ ·)`. -/ -theorem error_map_eq_hypothesisError {α : Type*} [MeasurableSpace α] (P : Measure α) - (h c : Set α) (hh : MeasurableSet h) (hc : MeasurableSet c) : +theorem error_map_eq_hypothesisError (P : Measure α) (h c : Set α) + (hh : MeasurableSet h) (hc : MeasurableSet c) : error (P.map (fun x => (x, decide (x ∈ c)))) (fun x => decide (x ∈ h)) = hypothesisError P h c := by simp only [error, hypothesisError] @@ -297,4 +529,6 @@ theorem error_map_eq_hypothesisError {α : Type*} [MeasurableSpace α] (P : Meas (hh.compl.prod (measurableSet_singleton true)) using 1 ext ⟨x, b⟩; cases b <;> simp -end Cslib.MachineLearning +end Binary + +end Cslib.MachineLearning.PACLearning From 11ae458ccdb8ae4e4474786695afa8ec56c4a909 Mon Sep 17 00:00:00 2001 From: Samuel Schlesinger Date: Fri, 17 Apr 2026 21:31:21 -0400 Subject: [PATCH 3/3] refactor(MachineLearning/PACLearning): drop `_root_.` qualifiers in convenience lemmas MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `IsPACLearnable.sampleComplexity_*` and `IsRPACLearnable.rsampleComplexity_*` convenience lemmas now compose with the general `sampleComplexity_le_of_forall` directly instead of calling the top-level `sampleComplexity_*` lemmas 
whose unqualified names collide with the surrounding `IsPACLearnable.` / `IsRPACLearnable.` namespaces. This removes six `_root_.Cslib.MachineLearning.PACLearning.*` qualifiers, lets the convenience lemmas live inside the main `variable {α β}` section again, and drops their now-redundant type binders. --- Cslib/MachineLearning/PACLearning/Defs.lean | 30 +++++++++------------ 1 file changed, 12 insertions(+), 18 deletions(-) diff --git a/Cslib/MachineLearning/PACLearning/Defs.lean b/Cslib/MachineLearning/PACLearning/Defs.lean index 81eeedf65..da8536a3e 100644 --- a/Cslib/MachineLearning/PACLearning/Defs.lean +++ b/Cslib/MachineLearning/PACLearning/Defs.lean @@ -415,64 +415,58 @@ theorem rsampleComplexity_mono_family {ε δ : Set.Ioo (0 : ℝ≥0) 1} rsampleComplexity C ε δ 𝒟 ≤ rsampleComplexity C ε δ 𝒟' := sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) h -end - /-! Convenience variants conditional on learnability, which discharge the nonemptiness hypothesis `(∃ m, IsPACLearnerFor m …)` from an `IsPACLearnable` / `IsRPACLearnable` witness. -These are declared outside the main `variable`-bound section so that cross-references to the -top-level `sampleComplexity_*` lemmas are not resolved as self-recursion. -/ +Bodies go through `sampleComplexity_le_of_forall` directly rather than the top-level +`sampleComplexity_*` lemmas, whose unqualified names would resolve as self-recursion inside +these theorems' `IsPACLearnable.*` / `IsRPACLearnable.*` namespaces. -/ /-- `sampleComplexity_antitone_δ` for a learnable class: the nonemptiness hypothesis comes for free from `IsPACLearnable`.
-/ theorem IsPACLearnable.sampleComplexity_antitone_δ - {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟) {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) : sampleComplexity IsPACLearnerFor C ε δ₂ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ₁ 𝒟 := - _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_antitone_δ hδ (hL ε δ₁) + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) (hL ε δ₁) /-- `sampleComplexity_antitone_ε` for a learnable class. -/ theorem IsPACLearnable.sampleComplexity_antitone_ε - {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟) {ε₁ ε₂ δ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) : sampleComplexity IsPACLearnerFor C ε₂ δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε₁ δ 𝒟 := - _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_antitone_ε hε (hL ε₁ δ) + sampleComplexity_le_of_forall (fun h' => h'.mono_ε hε) (hL ε₁ δ) /-- `sampleComplexity_mono_family` for a learnable class (learnability at the *larger* family `𝒟'` is the hypothesis). -/ theorem IsPACLearnable.sampleComplexity_mono_family - {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟') (h𝒟 : 𝒟 ⊆ 𝒟') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ 𝒟' := - _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_mono_family h𝒟 (hL ε δ) + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) (hL ε δ) /-- `sampleComplexity_mono_C` for a learnable class (learnability at the *larger* class `C'` is the hypothesis). 
-/ theorem IsPACLearnable.sampleComplexity_mono_C - {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] {C C' : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C' 𝒟) (hC : C ⊆ C') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C' ε δ 𝒟 := - _root_.Cslib.MachineLearning.PACLearning.sampleComplexity_mono_C hC (hL ε δ) + sampleComplexity_le_of_forall (fun h' => h'.antitone_C hC) (hL ε δ) /-- `rsampleComplexity_antitone_δ` for a randomized-learnable class. -/ theorem IsRPACLearnable.rsampleComplexity_antitone_δ - {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] - {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} - (hL : IsRPACLearnable C 𝒟) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsRPACLearnable C 𝒟) {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) : rsampleComplexity C ε δ₂ 𝒟 ≤ rsampleComplexity C ε δ₁ 𝒟 := - _root_.Cslib.MachineLearning.PACLearning.rsampleComplexity_antitone_δ hδ (hL ε δ₁) + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) (hL ε δ₁) /-- `rsampleComplexity_mono_family` for a randomized-learnable class. -/ theorem IsRPACLearnable.rsampleComplexity_mono_family - {α β : Type*} [MeasurableSpace α] [MeasurableSpace β] {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (hL : IsRPACLearnable C 𝒟') (h𝒟 : 𝒟 ⊆ 𝒟') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : rsampleComplexity C ε δ 𝒟 ≤ rsampleComplexity C ε δ 𝒟' := - _root_.Cslib.MachineLearning.PACLearning.rsampleComplexity_mono_family h𝒟 (hL ε δ) + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) (hL ε δ) + +end /-! ### Binary Classification