diff --git a/Cslib.lean b/Cslib.lean index 3e5977af2..43d44b938 100644 --- a/Cslib.lean +++ b/Cslib.lean @@ -120,3 +120,4 @@ public import Cslib.Logics.LinearLogic.CLL.CutElimination public import Cslib.Logics.LinearLogic.CLL.EtaExpansion public import Cslib.Logics.LinearLogic.CLL.PhaseSemantics.Basic public import Cslib.Logics.Propositional.Defs +public import Cslib.MachineLearning.PACLearning.Defs diff --git a/Cslib/MachineLearning/PACLearning/Defs.lean b/Cslib/MachineLearning/PACLearning/Defs.lean new file mode 100644 index 000000000..da8536a3e --- /dev/null +++ b/Cslib/MachineLearning/PACLearning/Defs.lean @@ -0,0 +1,528 @@ +/- +Copyright (c) 2026 Samuel Schlesinger. All rights reserved. +Released under Apache 2.0 license as described in the file LICENSE. +Authors: Samuel Schlesinger +-/ + +module + +public import Cslib.Init +public import Mathlib.MeasureTheory.Measure.MeasureSpace +public import Mathlib.MeasureTheory.Constructions.Pi +public import Mathlib.Order.SymmDiff + +@[expose] public section + +/-! # PAC Learning + +This file defines the Probably Approximately Correct (PAC) learning model +introduced by Valiant [Valiant1984], generalized to an arbitrary label type `β` +and parameterized by a family of distributions `𝒟` on `α × β`. + +A concept class `C` over domain `α` with labels in `β` is a collection of +functions `α → β`. A learning algorithm receives a labeled sample drawn i.i.d. +from an unknown joint distribution `D` on `α × β` and must produce a hypothesis +whose 0-1 error is within `ε` of the best concept in `C`, with probability at +least `1 - δ`. + +The single definition `IsPACLearnerFor` captures the realizable, agnostic, and +noise-tolerant settings by varying the distribution family `𝒟`: + +- **Agnostic** [Haussler1992]: `𝒟 = Set.univ` — the learner must work for all distributions. +- **Realizable**: `𝒟` consists of pushforwards of arbitrary probability measures + `P` on `α` along the graph `x ↦ (x, c x)` of some concept `c ∈ C`, so that + `optimalError D C = 0`. +- **Noise-tolerant** [AngluinLaird1988]: `𝒟` consists of noisy versions of realizable + distributions, where each label is corrupted independently with some probability `η`. + +The accuracy and confidence parameters `ε` and `δ` are elements of the subtype +`Set.Ioo (0 : ℝ≥0) 1`, which bundles the value together with the proof that it +lies in the open interval `(0, 1)`, ensuring the learning condition is non-vacuous. + +All declarations live under the `Cslib.MachineLearning.PACLearning` namespace so that +generic names like `error` and `optimalError` do not pollute the parent namespace. + +## Main definitions + +- `ConceptClass`: a set of functions `α → β` (classifiers). +- `LabeledSample`: a finite sequence of `(point, label)` pairs. +- `Learner`: a function from labeled samples to hypotheses. +- `error`: the 0-1 error of a hypothesis under a joint distribution. +- `optimalError`: the infimum of `error` over a concept class. +- `IsPACLearnerFor`: deterministic `(ε, δ)`-PAC learner over a distribution family. +- `IsRPACLearnerFor`: randomized variant of `IsPACLearnerFor`. Universe-polymorphic in the + randomness space `Ω : Type*`. +- `IsPACLearnable`: a concept class is PAC learnable if `IsPACLearnerFor` holds for + all `ε, δ : Set.Ioo (0 : ℝ≥0) 1` with some sample size `m`. +- `IsRPACLearnable`: randomized variant of `IsPACLearnable`. Pins the randomness space to + `Type 0`; `IsRPACLearnerFor` itself remains universe-polymorphic for users who need it. 
+- `LearnerModel`: the common predicate shape `ℕ → ε → δ → C → 𝒟 → Prop` abstracting both + the deterministic and randomized learners so sample-complexity lemmas can be shared. +- `sampleComplexity`: sample complexity of a generic learner model. +- `rsampleComplexity`: randomized sample complexity, i.e. `sampleComplexity IsRPACLearnerFor`. + +## Binary classification + +When `β = Bool`, concepts correspond to subsets of `α`. The section +*Binary Classification* provides: + +- `hypothesisError`: the symmetric-difference error `P(h ∆ c)`. +- `falsePositiveError`, `falseNegativeError`: its decomposition. +- `hypothesisError_eq_add`: the decomposition theorem. +- `error_map_eq_hypothesisError`: bridge between the general `error` and + the binary `hypothesisError` under a realizable distribution. + +## Main statements + +- `IsPACLearnerFor.toIsRPACLearnerFor`: every deterministic PAC learner is a + randomized one (via the trivial randomness space `PUnit`). +- `IsPACLearnerFor.antitone_family`, `.antitone_C`: the deterministic PAC learner + predicate is antitone in the distribution family and concept class. +- `IsPACLearnerFor.mono_δ`, `.mono_ε`: the predicate is monotone in the confidence and + accuracy parameters (a weaker bound still holds). +- `IsRPACLearnerFor.antitone_family`, `.mono_δ`: analogues for the randomized predicate. + (`mono_ε` and `antitone_C` are not provided because they change the integrand and + would require an extra measurability assumption.) +- `IsPACLearnable.toIsRPACLearnable`: deterministic learnability implies randomized. +- `IsPACLearnable.antitone_family`, `.antitone_C`, `IsRPACLearnable.antitone_family`: + PAC learnability is antitone in the distribution family and concept class. +- `sampleComplexity_antitone_δ`, `_antitone_ε`, `_mono_family`, `_mono_C`: variation of + deterministic sample complexity in confidence, accuracy, distribution family, and concept + class (antitone in the numeric parameters, monotone under `⊆` in the set parameters). The + randomized analogues `rsampleComplexity_antitone_δ` and `_mono_family` are provided. +- `IsPACLearnable.sampleComplexity_*`, `IsRPACLearnable.rsampleComplexity_*`: the same + monotonicity facts phrased with a learnability hypothesis in place of the ad-hoc + `∃ m, IsPACLearnerFor m …` existence witness, so callers who already know the class is + learnable need not thread it through. +- `hypothesisError_eq_add`: total error = false positive + false negative. + +## References + +* [L. G. Valiant, *A Theory of the Learnable*][Valiant1984] +* [A. Ehrenfeucht, D. Haussler, M. Kearns, L. Valiant, + *A General Lower Bound on the Number of Examples Needed for Learning*][EHKV1989] +* [M. J. Kearns, U. V. Vazirani, + *An Introduction to Computational Learning Theory*][KearnsVazirani1994] +* [D. Haussler, *Decision Theoretic Generalizations of the PAC Model for Neural Net + and Other Learning Applications*][Haussler1992] +* [D. Angluin, P. Laird, *Learning from Noisy Examples*][AngluinLaird1988] +-/ + +open MeasureTheory Set +open scoped ENNReal NNReal + +namespace Cslib.MachineLearning.PACLearning + +/-! ### Core Definitions -/ + +/-- A *concept class* over domain `α` with label type `β` is a set of functions `α → β`. +For binary classification (`β = Bool`), this is equivalent to a collection of subsets of `α` +via the characteristic function. -/ +abbrev ConceptClass (α β : Type*) := Set (α → β) + +/-- A *labeled sample* of size `m` over domain `α` with label type `β` is a finite sequence +of `(point, label)` pairs. 
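+
+For example, using Mathlib's `![…]` vector notation, `![(0, true), (1, false)]` is a
+`LabeledSample ℕ Bool 2`: two points of `ℕ` together with their boolean labels.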
-/ +abbrev LabeledSample (α β : Type*) (m : ℕ) := Fin m → (α × β) + +/-- A *learner* using `m` samples is a function that takes a labeled sample and produces +a hypothesis (a function from the domain to the label type). -/ +abbrev Learner (α β : Type*) (m : ℕ) := LabeledSample α β m → (α → β) + +section +variable {α : Type*} {β : Type*} [MeasurableSpace α] [MeasurableSpace β] + +/-- The *prediction error* (0-1 loss) of a hypothesis `h` under a joint distribution `D` +on `α × β`, defined as the probability that the prediction disagrees with the label: +`D({(x, y) | h(x) ≠ y})`. -/ +noncomputable def error (D : Measure (α × β)) (h : α → β) : ℝ≥0∞ := + D {p : α × β | h p.1 ≠ p.2} + +/-- The *optimal error* of a concept class `C` under a joint distribution `D`, defined as the +infimum of `error D c` over all concepts `c ∈ C`. When `C` is empty this is `⊤`, making the +PAC learning condition vacuously true. -/ +noncomputable def optimalError (D : Measure (α × β)) (C : ConceptClass α β) : ℝ≥0∞ := + ⨅ c ∈ C, error D c + +/-! ### PAC Learners -/ + +/-- `IsPACLearnerFor m ε δ C 𝒟` asserts that there exists a learner using `m` samples +that is `(ε, δ)`-correct for the concept class `C` over the distribution family `𝒟`: for every +probability measure `D ∈ 𝒟` on `α × β`, the probability (over i.i.d. samples from `D`) that +the learner's hypothesis has error exceeding `opt_C(D) + ε` is at most `δ`. + +The parameters `ε` and `δ` are elements of `Set.Ioo (0 : ℝ≥0) 1`, bundling the value with +the proof that it lies in `(0, 1)`. This ensures the condition is non-vacuous: +`ε < 1` prevents the error threshold from exceeding the maximum possible error under a +probability measure, and `δ < 1` prevents the confidence bound from being trivially +satisfied. -/ +def IsPACLearnerFor (m : ℕ) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∃ A : Learner α β m, + ∀ (D : Measure (α × β)) [IsProbabilityMeasure D], D ∈ 𝒟 → + (Measure.pi (fun _ : Fin m => D)) + {S : LabeledSample α β m | + error D (A S) > optimalError D C + ↑ε.val} ≤ ↑δ.val + +/-- `IsRPACLearnerFor m ε δ C 𝒟` asserts that there exists a *randomized* learner using +`m` samples that is `(ε, δ)`-correct for the concept class `C` over the distribution family +`𝒟`. A randomized learner draws internal randomness `ω` from a probability space `(Ω, Q)` and +acts as the deterministic learner `A(ω)`. + +For every probability measure `D ∈ 𝒟`, the failure probability function +`ω ↦ D^m{S | error(A(ω)(S)) > opt_C(D) + ε}` must be `Q`-a.e. measurable, and its +expectation over `ω` must be at most `δ`. + +The randomness space `Ω : Type*` is universe-polymorphic; the universe is an implicit +parameter of `IsRPACLearnerFor`, and downstream statements reference it via the pattern +`IsRPACLearnerFor.{_, _, u}`. Fix `u := 0` for the usual case of a standard randomness space. + +A deterministic learner (`IsPACLearnerFor`) is the special case `Ω = PUnit`; +see `IsPACLearnerFor.toIsRPACLearnerFor`. 
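+
+As an illustration (no such construction appears in this file): a learner that flips one
+fair coin and then runs one of two deterministic learners can be modelled by taking
+`Ω := Bool` with `Q := (PMF.uniformOfFintype Bool).toMeasure`; the integral condition then
+asks that the *average* of the two deterministic failure probabilities be at most `δ`.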
-/ +def IsRPACLearnerFor (m : ℕ) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∃ (Ω : Type*) (_ : MeasurableSpace Ω) (Q : Measure Ω) (_ : IsProbabilityMeasure Q) + (A : Ω → Learner α β m), + ∀ (D : Measure (α × β)) [IsProbabilityMeasure D], D ∈ 𝒟 → + AEMeasurable (fun ω => (Measure.pi (fun _ : Fin m => D)) + {S : LabeledSample α β m | + error D ((A ω) S) > optimalError D C + ↑ε.val}) Q ∧ + ∫⁻ ω, (Measure.pi (fun _ : Fin m => D)) + {S : LabeledSample α β m | + error D ((A ω) S) > optimalError D C + ↑ε.val} ∂Q ≤ ↑δ.val + +/-- Every deterministic PAC learner is in particular a randomized PAC learner +(with the trivial one-point randomness space `PUnit`). -/ +theorem IsPACLearnerFor.toIsRPACLearnerFor {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsPACLearnerFor m ε δ C 𝒟) : + IsRPACLearnerFor m ε δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨PUnit, inferInstance, Measure.dirac PUnit.unit, inferInstance, fun _ => A, ?_⟩ + intro D _ hD + refine ⟨measurable_const.aemeasurable, ?_⟩ + simp only [gt_iff_lt, lintegral_const, measure_univ, mul_one] + exact hA D hD + +/-- The deterministic PAC learner predicate is antitone in the distribution family: a +learner for a larger family `𝒟'` is also a learner for any subfamily `𝒟 ⊆ 𝒟'`. -/ +theorem IsPACLearnerFor.antitone_family {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsPACLearnerFor m ε δ C 𝒟') : + IsPACLearnerFor m ε δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + exact ⟨A, fun D inst hD => @hA D inst (h𝒟 hD)⟩ + +/-- A PAC learner with confidence `δ₁` is also a PAC learner with any weaker confidence +`δ₂ ≥ δ₁`: the failure-probability bound only gets looser. -/ +theorem IsPACLearnerFor.mono_δ {m : ℕ} {ε : Set.Ioo (0 : ℝ≥0) 1} + {δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsPACLearnerFor m ε δ₁ C 𝒟) : + IsPACLearnerFor m ε δ₂ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨A, fun D inst hD => le_trans (@hA D inst hD) ?_⟩ + exact_mod_cast hδ + +/-- A PAC learner with accuracy `ε₁` is also a PAC learner with any weaker accuracy +`ε₂ ≥ ε₁`: the bad event `{error > opt + ε}` only shrinks. -/ +theorem IsPACLearnerFor.mono_ε {m : ℕ} {δ : Set.Ioo (0 : ℝ≥0) 1} + {ε₁ ε₂ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsPACLearnerFor m ε₁ δ C 𝒟) : + IsPACLearnerFor m ε₂ δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨A, fun D inst hD => le_trans (measure_mono ?_) (@hA D inst hD)⟩ + intro S hS + have hε' : (↑ε₁.val : ℝ≥0∞) ≤ ↑ε₂.val := by exact_mod_cast hε + calc optimalError D C + (↑ε₁.val : ℝ≥0∞) + ≤ optimalError D C + ↑ε₂.val := by gcongr + _ < error D (A S) := hS + +/-- The deterministic PAC learner predicate is antitone in the concept class: a learner +for a larger class `C'` is also a learner for any subclass `C ⊆ C'`, since the agnostic +benchmark `optimalError _ C ≥ optimalError _ C'` makes the error requirement easier. 
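+
+Concretely, `C ⊆ C'` gives `optimalError D C' ≤ optimalError D C`, so every sample on
+which the error exceeds `optimalError D C + ε` also exceeds `optimalError D C' + ε`;
+the bad event for `C` is contained in the bad event for `C'`, and monotonicity of the
+measure does the rest.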
-/ +theorem IsPACLearnerFor.antitone_C {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C C' : ConceptClass α β} (hC : C ⊆ C') + {𝒟 : Set (Measure (α × β))} (h : IsPACLearnerFor m ε δ C' 𝒟) : + IsPACLearnerFor m ε δ C 𝒟 := by + obtain ⟨A, hA⟩ := h + refine ⟨A, fun D inst hD => le_trans (measure_mono ?_) (@hA D inst hD)⟩ + intro S hS + have h_opt : optimalError D C' ≤ optimalError D C := iInf_le_iInf_of_subset hC + calc optimalError D C' + (↑ε.val : ℝ≥0∞) + ≤ optimalError D C + ↑ε.val := by gcongr + _ < error D (A S) := hS + +/-- The randomized PAC learner predicate is antitone in the distribution family. The +universe of the randomness space `Ω` is pinned so the hypothesis and conclusion share it. -/ +theorem IsRPACLearnerFor.antitone_family.{u} {m : ℕ} {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsRPACLearnerFor.{_, _, u} m ε δ C 𝒟') : + IsRPACLearnerFor.{_, _, u} m ε δ C 𝒟 := by + obtain ⟨Ω, mΩ, Q, hQ, A, hA⟩ := h + exact ⟨Ω, mΩ, Q, hQ, A, fun D inst hD => @hA D inst (h𝒟 hD)⟩ + +/-- A randomized PAC learner with confidence `δ₁` is also a randomized PAC learner with +any weaker confidence `δ₂ ≥ δ₁`. Unlike `mono_ε` or `antitone_C`, this does not touch the +integrand, so it carries the `AEMeasurable` part through unchanged. -/ +theorem IsRPACLearnerFor.mono_δ.{u} {m : ℕ} {ε : Set.Ioo (0 : ℝ≥0) 1} + {δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : IsRPACLearnerFor.{_, _, u} m ε δ₁ C 𝒟) : + IsRPACLearnerFor.{_, _, u} m ε δ₂ C 𝒟 := by + obtain ⟨Ω, mΩ, Q, hQ, A, hA⟩ := h + refine ⟨Ω, mΩ, Q, hQ, A, fun D inst hD => ?_⟩ + obtain ⟨hmeas, hint⟩ := @hA D inst hD + refine ⟨hmeas, le_trans hint ?_⟩ + exact_mod_cast hδ + +/-! ### PAC Learnability -/ + +/-- A concept class `C` is *PAC learnable* over the distribution family `𝒟` if for every +accuracy `ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)`, there exists a sample size `m` admitting +a deterministic `(ε, δ)`-PAC learner for `C`. Here `ε` and `δ` are elements of the subtype +`Set.Ioo (0 : ℝ≥0) 1`. -/ +def IsPACLearnable (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∀ (ε δ : Set.Ioo (0 : ℝ≥0) 1), + ∃ m, IsPACLearnerFor m ε δ C 𝒟 + +/-- A concept class `C` is *randomized PAC learnable* over the distribution family `𝒟` if for +every accuracy `ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)`, there exists a sample size `m` +admitting a randomized `(ε, δ)`-PAC learner for `C`. The randomness space is pinned to +`Type 0` at the learnability level; `IsRPACLearnerFor` itself remains universe-polymorphic. -/ +def IsRPACLearnable (C : ConceptClass α β) (𝒟 : Set (Measure (α × β))) : Prop := + ∀ (ε δ : Set.Ioo (0 : ℝ≥0) 1), + ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟 + +/-- Deterministic PAC learnability implies randomized PAC learnability. -/ +theorem IsPACLearnable.toIsRPACLearnable {C : ConceptClass α β} + {𝒟 : Set (Measure (α × β))} (h : IsPACLearnable C 𝒟) : + IsRPACLearnable C 𝒟 := by + intro ε δ + obtain ⟨m, hm⟩ := h ε δ + exact ⟨m, hm.toIsRPACLearnerFor⟩ + +/-- PAC learnability is antitone in the distribution family: a subfamily of a learnable +family is learnable. -/ +theorem IsPACLearnable.antitone_family {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (h𝒟 : 𝒟 ⊆ 𝒟') (h : IsPACLearnable C 𝒟') : IsPACLearnable C 𝒟 := + fun ε δ => (h ε δ).imp fun _ hm => hm.antitone_family h𝒟 + +/-- PAC learnability is antitone in the concept class: a subclass of a learnable class is +learnable. 
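+
+For example, if the class of indicators of axis-aligned rectangles in `ℝ × ℝ` is PAC
+learnable, then so is the subclass of indicators of squares, with the same learners as
+witnesses.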
-/ +theorem IsPACLearnable.antitone_C {C C' : ConceptClass α β} (hC : C ⊆ C') + {𝒟 : Set (Measure (α × β))} (h : IsPACLearnable C' 𝒟) : IsPACLearnable C 𝒟 := + fun ε δ => (h ε δ).imp fun _ hm => hm.antitone_C hC + +/-- Randomized PAC learnability is antitone in the distribution family. -/ +theorem IsRPACLearnable.antitone_family {C : ConceptClass α β} + {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') + (h : IsRPACLearnable C 𝒟') : IsRPACLearnable C 𝒟 := + fun ε δ => (h ε δ).imp fun _ hm => hm.antitone_family h𝒟 + +/-! ### Sample Complexity -/ + +/-- A *learner model* is a predicate on (sample size, accuracy, confidence, concept class, +distribution family) that classifies which sample sizes admit a learner of the given kind. +Instantiating with `IsPACLearnerFor` gives the deterministic model; with `IsRPACLearnerFor` +gives the randomized one. -/ +abbrev LearnerModel (α β : Type*) [MeasurableSpace α] [MeasurableSpace β] := + ℕ → Set.Ioo (0 : ℝ≥0) 1 → Set.Ioo (0 : ℝ≥0) 1 → + ConceptClass α β → Set (Measure (α × β)) → Prop + +/-- The *sample complexity* of a concept class `C` under a learner model `L`, at accuracy +`ε ∈ (0, 1)` and confidence `δ ∈ (0, 1)` over distribution family `𝒟`, is the smallest sample +size `m` with `L m ε δ C 𝒟`. Specialize with `L := IsPACLearnerFor` for the deterministic model +and `L := IsRPACLearnerFor` for the randomized one. + +**Caveat**: because `sInf` on `ℕ` returns `0` for the empty set, this definition returns `0` +when no learner exists (e.g., a concept class of infinite VC dimension). It is only meaningful +when the defining set `{m | L m ε δ C 𝒟}` is nonempty. The `IsPACLearnable.sampleComplexity_*` +variants below discharge this nonemptiness from a learnability hypothesis. -/ +noncomputable def sampleComplexity (L : LearnerModel α β) (C : ConceptClass α β) + (ε δ : Set.Ioo (0 : ℝ≥0) 1) (𝒟 : Set (Measure (α × β))) : ℕ := + sInf {m : ℕ | L m ε δ C 𝒟} + +/-- The *randomized sample complexity* of `C`, i.e. `sampleComplexity` instantiated at the +randomized learner model `IsRPACLearnerFor`. The randomness space is pinned to `Type 0`. -/ +noncomputable def rsampleComplexity (C : ConceptClass α β) (ε δ : Set.Ioo (0 : ℝ≥0) 1) + (𝒟 : Set (Measure (α × β))) : ℕ := + sampleComplexity IsRPACLearnerFor.{_, _, 0} C ε δ 𝒟 + +/-! ### Monotonicity of Sample Complexity + +These lemmas are all special cases of the following observation: if `{m | L₁ m ε₁ δ₁ C₁ 𝒟₁} ⊆ +{m | L₂ m ε₂ δ₂ C₂ 𝒟₂}` and the first set is nonempty, then the sample complexity under +`(L₂, ε₂, δ₂, C₂, 𝒟₂)` is at most the sample complexity under `(L₁, ε₁, δ₁, C₁, 𝒟₁)`. The +nonemptiness hypothesis is essential: `sInf` on `ℕ` returns `0` for an empty set, so without +it the inequality can fail at the degenerate boundary. The `IsPACLearnable`-flavoured variants +at the end of this section discharge that witness from a learnability hypothesis. -/ + +/-- General pointwise monotonicity of `sampleComplexity`: if every witness sample size for +`(L₁, ε₁, δ₁, C₁, 𝒟₁)` is also a witness for `(L₂, ε₂, δ₂, C₂, 𝒟₂)`, then the latter's +sample complexity is at most the former's (provided the former is attained). 
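+
+All of the concrete monotonicity lemmas below are instances of this statement: the least
+witness `Nat.sInf_mem h` for the first model is transported along `hL` to a witness for
+the second, and `Nat.sInf_le` concludes.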
-/ +theorem sampleComplexity_le_of_forall {L₁ L₂ : LearnerModel α β} + {ε₁ δ₁ ε₂ δ₂ : Set.Ioo (0 : ℝ≥0) 1} {C₁ C₂ : ConceptClass α β} + {𝒟₁ 𝒟₂ : Set (Measure (α × β))} + (hL : ∀ {m : ℕ}, L₁ m ε₁ δ₁ C₁ 𝒟₁ → L₂ m ε₂ δ₂ C₂ 𝒟₂) + (h : ∃ m, L₁ m ε₁ δ₁ C₁ 𝒟₁) : + sampleComplexity L₂ C₂ ε₂ δ₂ 𝒟₂ ≤ sampleComplexity L₁ C₁ ε₁ δ₁ 𝒟₁ := + Nat.sInf_le (hL (Nat.sInf_mem h)) + +/-- Deterministic sample complexity is antitone in the confidence parameter `δ`: weaker +confidence requires no more samples. -/ +theorem sampleComplexity_antitone_δ {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsPACLearnerFor m ε δ₁ C 𝒟) : + sampleComplexity IsPACLearnerFor C ε δ₂ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ₁ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) h + +/-- Deterministic sample complexity is antitone in the accuracy parameter `ε`: weaker +accuracy requires no more samples. -/ +theorem sampleComplexity_antitone_ε {ε₁ ε₂ δ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsPACLearnerFor m ε₁ δ C 𝒟) : + sampleComplexity IsPACLearnerFor C ε₂ δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε₁ δ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_ε hε) h + +/-- Deterministic sample complexity is monotone in the distribution family under `⊆`: a +smaller family (fewer distributions to cover) requires no more samples. -/ +theorem sampleComplexity_mono_family {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') + (h : ∃ m, IsPACLearnerFor m ε δ C 𝒟') : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ 𝒟' := + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) h + +/-- Deterministic sample complexity is monotone in the concept class under `⊆`: a smaller +class (weaker agnostic benchmark) requires no more samples. -/ +theorem sampleComplexity_mono_C {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C C' : ConceptClass α β} (hC : C ⊆ C') {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsPACLearnerFor m ε δ C' 𝒟) : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C' ε δ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.antitone_C hC) h + +/-- Randomized sample complexity is antitone in the confidence parameter `δ`. -/ +theorem rsampleComplexity_antitone_δ {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} + (hδ : δ₁.val ≤ δ₂.val) {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (h : ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ₁ C 𝒟) : + rsampleComplexity C ε δ₂ 𝒟 ≤ rsampleComplexity C ε δ₁ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) h + +/-- Randomized sample complexity is monotone in the distribution family under `⊆`. -/ +theorem rsampleComplexity_mono_family {ε δ : Set.Ioo (0 : ℝ≥0) 1} + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} (h𝒟 : 𝒟 ⊆ 𝒟') + (h : ∃ m, IsRPACLearnerFor.{_, _, 0} m ε δ C 𝒟') : + rsampleComplexity C ε δ 𝒟 ≤ rsampleComplexity C ε δ 𝒟' := + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) h + +/-! Convenience variants conditional on learnability, which discharge the nonemptiness +hypothesis `(∃ m, IsPACLearnerFor m …)` from an `IsPACLearnable` / `IsRPACLearnable` witness. +Bodies go through `sampleComplexity_le_of_forall` directly rather than the top-level +`sampleComplexity_*` lemmas, whose unqualified names would resolve as self-recursion inside +these theorems' `IsPACLearnable.*` / `IsRPACLearnable.*` namespaces. 
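+
+For example, given `hL : IsPACLearnable C 𝒟` and `hδ : δ₁.val ≤ δ₂.val`, dot notation
+yields `hL.sampleComplexity_antitone_δ hδ` directly, with no separate `∃ m, …` witness
+to supply.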
-/ + +/-- `sampleComplexity_antitone_δ` for a learnable class: the nonemptiness hypothesis comes +for free from `IsPACLearnable`. -/ +theorem IsPACLearnable.sampleComplexity_antitone_δ + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟) + {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) : + sampleComplexity IsPACLearnerFor C ε δ₂ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ₁ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) (hL ε δ₁) + +/-- `sampleComplexity_antitone_ε` for a learnable class. -/ +theorem IsPACLearnable.sampleComplexity_antitone_ε + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsPACLearnable C 𝒟) + {ε₁ ε₂ δ : Set.Ioo (0 : ℝ≥0) 1} (hε : ε₁.val ≤ ε₂.val) : + sampleComplexity IsPACLearnerFor C ε₂ δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε₁ δ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_ε hε) (hL ε₁ δ) + +/-- `sampleComplexity_mono_family` for a learnable class (learnability at the *larger* +family `𝒟'` is the hypothesis). -/ +theorem IsPACLearnable.sampleComplexity_mono_family + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (hL : IsPACLearnable C 𝒟') (h𝒟 : 𝒟 ⊆ 𝒟') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C ε δ 𝒟' := + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) (hL ε δ) + +/-- `sampleComplexity_mono_C` for a learnable class (learnability at the *larger* class +`C'` is the hypothesis). -/ +theorem IsPACLearnable.sampleComplexity_mono_C + {C C' : ConceptClass α β} {𝒟 : Set (Measure (α × β))} + (hL : IsPACLearnable C' 𝒟) (hC : C ⊆ C') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : + sampleComplexity IsPACLearnerFor C ε δ 𝒟 ≤ sampleComplexity IsPACLearnerFor C' ε δ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.antitone_C hC) (hL ε δ) + +/-- `rsampleComplexity_antitone_δ` for a randomized-learnable class. -/ +theorem IsRPACLearnable.rsampleComplexity_antitone_δ + {C : ConceptClass α β} {𝒟 : Set (Measure (α × β))} (hL : IsRPACLearnable C 𝒟) + {ε δ₁ δ₂ : Set.Ioo (0 : ℝ≥0) 1} (hδ : δ₁.val ≤ δ₂.val) : + rsampleComplexity C ε δ₂ 𝒟 ≤ rsampleComplexity C ε δ₁ 𝒟 := + sampleComplexity_le_of_forall (fun h' => h'.mono_δ hδ) (hL ε δ₁) + +/-- `rsampleComplexity_mono_family` for a randomized-learnable class. -/ +theorem IsRPACLearnable.rsampleComplexity_mono_family + {C : ConceptClass α β} {𝒟 𝒟' : Set (Measure (α × β))} + (hL : IsRPACLearnable C 𝒟') (h𝒟 : 𝒟 ⊆ 𝒟') {ε δ : Set.Ioo (0 : ℝ≥0) 1} : + rsampleComplexity C ε δ 𝒟 ≤ rsampleComplexity C ε δ 𝒟' := + sampleComplexity_le_of_forall (fun h' => h'.antitone_family h𝒟) (hL ε δ) + +end + +/-! ### Binary Classification + +When `β = Bool`, concepts correspond to subsets of `α` via the characteristic function. +The symmetric-difference error `P(h ∆ c)` is the natural error metric, and it decomposes +into false positive and false negative components. + +The bridge lemma `error_map_eq_hypothesisError` connects the general `error` on `α × Bool` +to the binary `hypothesisError` on `α`, showing they coincide for realizable distributions. -/ + +section Binary +variable {α : Type*} [MeasurableSpace α] + +/-- The *symmetric-difference error* of a hypothesis `h` with respect to a target concept `c` +(both viewed as subsets of `α`) under distribution `P`, defined as `P(h ∆ c)`. -/ +noncomputable def hypothesisError (P : Measure α) (h c : Set α) : ℝ≥0∞ := + P (symmDiff h c) + +/-- The *false positive error* `P(h \ c)` — points classified positive but not in the +concept. 
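+
+Under the encoding of `error_map_eq_hypothesisError`, these are exactly the points `x`
+where the hypothesis outputs `decide (x ∈ h) = true` but the realizable label is
+`decide (x ∈ c) = false`.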
-/ +noncomputable def falsePositiveError (P : Measure α) (h c : Set α) : ℝ≥0∞ := + P (h \ c) + +/-- The *false negative error* `P(c \ h)` — points in the concept but classified negative. -/ +noncomputable def falseNegativeError (P : Measure α) (h c : Set α) : ℝ≥0∞ := + P (c \ h) + +/-- The total hypothesis error decomposes as the sum of false positive and false negative +errors, since `h ∆ c = (h \ c) ∪ (c \ h)` is a disjoint union. -/ +theorem hypothesisError_eq_add {P : Measure α} {h c : Set α} + (hh : MeasurableSet h) (hc : MeasurableSet c) : + hypothesisError P h c = falsePositiveError P h c + falseNegativeError P h c := by + simp only [hypothesisError, falsePositiveError, falseNegativeError, symmDiff_def, sup_eq_union] + exact measure_union disjoint_sdiff_sdiff (hc.diff hh) + +open Classical in +/-- Under a realizable distribution `P.map (x ↦ (x, c(x)))`, the general 0-1 `error` +coincides with the binary `hypothesisError P h c`, where `h` and `c` are viewed as subsets +of `α` via the characteristic function `decide (· ∈ ·)`. -/ +theorem error_map_eq_hypothesisError (P : Measure α) (h c : Set α) + (hh : MeasurableSet h) (hc : MeasurableSet c) : + error (P.map (fun x => (x, decide (x ∈ c)))) (fun x => decide (x ∈ h)) = + hypothesisError P h c := by + simp only [error, hypothesisError] + have hf : Measurable (fun x => (x, decide (x ∈ c))) := + Measurable.prodMk measurable_id + (measurable_to_bool (by convert hc using 1; ext x; simp [decide_eq_true_eq])) + rw [Measure.map_apply_of_aemeasurable hf.aemeasurable] + · congr 1; ext x + simp only [Set.mem_preimage, Set.mem_setOf_eq, symmDiff_def, sup_eq_union, + Set.mem_union, Set.mem_diff] + by_cases hx : x ∈ h <;> by_cases hcx : x ∈ c <;> simp_all + · convert (hh.prod (measurableSet_singleton false)).union + (hh.compl.prod (measurableSet_singleton true)) using 1 + ext ⟨x, b⟩; cases b <;> simp + +end Binary + +end Cslib.MachineLearning.PACLearning diff --git a/references.bib b/references.bib index 2cccb928f..3e8c26190 100644 --- a/references.bib +++ b/references.bib @@ -19,6 +19,17 @@ @inproceedings{Aceto1999 bibsource = {dblp computer science bibliography, https://dblp.org} } +@article{AngluinLaird1988, + author = {Angluin, Dana and Laird, Philip}, + title = {Learning from Noisy Examples}, + journal = {Machine Learning}, + volume = {2}, + number = {4}, + pages = {343--370}, + year = {1988}, + doi = {10.1007/BF00116829} +} + @book{Baader1998, author = {Baader, Franz and Nipkow, Tobias}, title = {Term rewriting and all that}, @@ -107,6 +118,18 @@ @inbook{ Girard1995 collection={London Mathematical Society Lecture Note Series} } +@article{Haussler1992, + author = {Haussler, David}, + title = {Decision Theoretic Generalizations of the {PAC} Model for Neural Net and Other Learning Applications}, + journal = {Information and Computation}, + volume = {100}, + number = {1}, + pages = {78--150}, + year = {1992}, + issn = {0890-5401}, + doi = {10.1016/0890-5401(92)90010-D} +} + @article{ Hennessy1985, author = {Matthew Hennessy and Robin Milner}, @@ -265,6 +288,43 @@ @article{ ShepherdsonSturgis1963 address = {New York, NY, USA} } +@inproceedings{Valiant1984, + author = {Valiant, L. 
G.}, + title = {A Theory of the Learnable}, + year = {1984}, + isbn = {0-89791-133-4}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + url = {https://doi.org/10.1145/800057.808710}, + doi = {10.1145/800057.808710}, + booktitle = {Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing}, + pages = {436--445}, + series = {STOC '84} +} + +@article{EHKV1989, + author = {Ehrenfeucht, Andrzej and Haussler, David and Kearns, Michael and Valiant, Leslie}, + title = {A General Lower Bound on the Number of Examples Needed for Learning}, + journal = {Information and Computation}, + volume = {82}, + number = {3}, + pages = {247--261}, + year = {1989}, + issn = {0890-5401}, + url = {https://doi.org/10.1016/0890-5401(89)90002-3}, + doi = {10.1016/0890-5401(89)90002-3}, + publisher = {Academic Press} +} + +@book{KearnsVazirani1994, + author = {Kearns, Michael J. and Vazirani, Umesh V.}, + title = {An Introduction to Computational Learning Theory}, + year = {1994}, + isbn = {978-0-262-11193-5}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA} +} + @incollection{WinskelNielsen1995, author = {Winskel, Glynn and Nielsen, Mogens}, isbn = {9780198537809},