chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder by ooples · Pull Request #1432 · ooples/AiDotNet

ooples · 2026-05-23T03:25:47Z

Summary

Foundation PR for the Phase 2a AiModelBuilder DI refactor (audit finding #12). Establishes the extraction pattern, ships the first of ~12 concern-component splits, and documents the full migration plan that subsequent slices will follow.

Scope is intentionally minimal so the pattern can be reviewed in isolation before the larger slices land. The other 11 slices each follow this same template.

Hard invariant

Every public AiModelBuilder API surface that exists on master remains identical (signatures, return types, observable behaviour) at the end of this PR. Existing tests stay green. Only the internal composition changes.

What this slice extracts

The data-pipeline concern: preprocessing, postprocessing, data loading, data preparation, and augmentation. 9 Configure-method bodies (~700 LoC of inline logic) migrated into a separately-testable component.

File	Purpose
`src/Configuration/IAiModelDataPipeline.cs`	Public interface — 11 mutating methods + 6 read-only properties
`src/Configuration/AiModelDataPipeline.cs`	Default implementation — mirrors pre-refactor inline logic verbatim
`src/AiModelBuilder.cs`	9 Configure methods reduced to 3-line delegations + legacy-field sync
`tests/AiDotNet.Tests/UnitTests/Configuration/AiModelDataPipelineTests.cs`	14 unit tests exercising the component in isolation
`docs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.md`	Full ~12-slice migration plan

The migration plan (full document in `docs/internal/`)

#	Component	Configure methods	LoC est.
1	DataPipeline (this PR)	Preprocessing × 3, Postprocessing × 3, DataLoader, DataPreparation, Augmentation × 2, SetPostprocessingFitMaxRows	~700
2	TrainingCore	Model, Optimizer, Regularization, FitnessCalculator, FitDetector, TrainingPipeline, TrainingMonitor, CheckpointManager, MemoryManagement	~900
3	CrossValidation	CrossValidation	~150
4	Compliance	BiasDetector, FairnessEvaluator, AdversarialRobustness, Safety, Interpretability	~600
5	Performance	MixedPrecision, InferenceOptimizations, JitCompilation, PlanCaching, GpuAcceleration, Quantization, Compression	~500
6	WorkflowOrchestration	FederatedLearning, DistributedTraining, PipelineParallelism, ReinforcementLearning, AutoML, HyperparameterOptimizer, CurriculumLearning, MetaLearning	~1,400
7	AdvancedLearning	KnowledgeDistillation, LoRA, FineTuning, SelfSupervisedLearning, ProgramSynthesis	~800
8	RagAndKnowledge	RetrievalAugmentedGeneration, KnowledgeGraph	~500
9	Storage	ExperimentTracker, ModelRegistry, DataVersionControl, Versioning, Caching, ABTesting	~400
10	Observability	Benchmarking, Profiling, Telemetry, GpuDiagnostics	~300
11	AgentAndExport	AgentAssistance, AskAgentAsync, Reasoning, Export, WeightStreaming	~600
12	LicenseAndCompat	LicenseKey resolution	~200

After all 12 slices land, AiModelBuilder.cs shrinks from ~9.5K LoC to ~1.5K (constructor + facade delegation + BuildAsync orchestration).

The double-write pattern (slice-safe migration)

To keep blast radius minimal on this 9,511-line file, each migrated Configure* method writes to BOTH the component AND the existing private field. BuildAsync and partial-class siblings continue to read from the legacy fields unchanged. Slice 2 is the one that replaces those callsites with property reads on _dataPipeline — once that happens for each concern's slice, the legacy fields can be deleted.

This means slice 1 carries near-zero risk of behaviour regression (the component does exactly what the inline code did, and the fields stay in sync) while still proving the pattern works for the remaining 11 slices.

Why no `[Obsolete]` annotations during the refactor

The previous audit cycle marked unfinished VLA stubs [Obsolete] and the maintainer rejected it as a lazy hand-wave. Same principle: the public Configure methods are not "obsolete" — they are the supported way to configure the builder and they will remain so indefinitely. The refactor changes implementation, not contract.

Test coverage

14 xUnit tests in AiModelDataPipelineTests exercise the component in isolation (no AiModelBuilder instance involved), covering:

Initial state — all slots null
ConfigurePreprocessing with null args (Action / transformer / pipeline overloads) → AutoML defaults applied
ConfigurePreprocessing with explicit pipeline → exact instance retained
ConfigurePostprocessing with null action → empty pipeline (no universal defaults)
SetPostprocessingFitMaxRows positive value stored
SetPostprocessingFitMaxRows zero / negative / null → clears to null
ConfigureDataLoader null → ArgumentNullException
ConfigureDataPreparation null → ArgumentNullException
ConfigureDataPreparation valid → pipeline built + stored + registry side-effect
ConfigureAugmentation null → modality auto-detected default (verified with Matrix → Tabular)
ConfigureAugmentation explicit config → exact instance retained
Interface-typed reference works correctly

All 14 pass in 51 ms.

Verification

dotnet build src/AiDotNet.csproj -c Release -f net10.0 — 0 errors
dotnet build src/AiDotNet.csproj -c Release -f net471 — 0 errors
dotnet test tests/AiDotNet.Tests/AiDotNetTests.csproj --filter Configuration.AiModelDataPipelineTests — 14/14 pass
(Follow-up gate) AiModelBuilder* integration regression suite (slice 2 will explicitly re-run these as part of its critical-path verification)

Honest scope notes

This is slice 1 of 12 — AiModelBuilder.cs is still 9,500+ lines. The remaining 11 concerns are tracked as follow-up PRs, each ~500-1,500 LoC, following this same template.
Slice ordering and dependencies are documented in docs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.md § "Sequencing dependencies between slices". Slices 1, 5, 8, 12 are independent; the rest depend on slice 2 (TrainingCore).
DataPreparationRegistry<T>.Current global side-effect is preserved verbatim. Subsequent slices may refactor this into an explicit dependency, but during slice 1 we keep the exact pre-refactor behaviour.

🤖 Generated with Claude Code

…iModelBuilder Foundation PR for the Phase 2a AiModelBuilder DI refactor (audit finding #12). Establishes the extraction pattern and ships the first of ~12 concern-component splits that will collectively reduce src/AiModelBuilder.cs from 9,511 LoC to ~1,500 LoC. Full plan documented in docs/internal/audit-2026-05-phase2a- aimodelbuilder-refactor.md. Hard invariant maintained: every public AiModelBuilder API surface remains identical (signatures, return types, observable behaviour) — only the internal composition changes. Existing tests stay green. Changes: * New interface IAiModelDataPipeline<T, TInput, TOutput> in src/Configuration/. Exposes ConfigurePreprocessing (3 overloads), ConfigurePostprocessing (3 overloads), SetPostprocessingFitMaxRows, ConfigureDataLoader, ConfigureDataPreparation, and ConfigureAugmentation (2 overloads) plus get-only readout properties for the configured state. * Default implementation AiModelDataPipeline<T, TInput, TOutput> in src/Configuration/. Mirrors the pre-refactor inline logic verbatim: AutoML defaults (SimpleImputer mean + StandardScaler) when no preprocessing args supplied, empty postprocessing pipeline when no args, DataPreparationRegistry side-effect preserved, modality-auto- detected augmentation defaults (image / tabular / audio / text / video). * AiModelBuilder refactor: 9 Configure-method bodies (preprocessing x3, postprocessing x3, SetPostprocessingFitMaxRows, ConfigureDataLoader, ConfigureDataPreparation, ConfigureAugmentation) reduced to 3-line delegations to _dataPipeline + legacy-field sync. Legacy private fields (_preprocessingPipeline, etc.) stay in place as synced caches so BuildAsync and partial-class siblings continue to read from them unchanged. Slice 2 will migrate those callsites to the component's properties directly. * 14 xUnit tests in tests/AiDotNet.Tests/UnitTests/Configuration/ AiModelDataPipelineTests.cs exercise the component in isolation (initial state, null-arg defaults for every overload, positive / zero / negative SetPostprocessingFitMaxRows, ConfigureDataLoader null guard, ConfigureDataPreparation null guard + builder invocation + registry side-effect, ConfigureAugmentation modality auto-detect + explicit-config pass-through, interface implementation). All 14 pass. * docs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.md documents the full ~12-slice plan: concern groupings, per-PR shape, sequencing dependencies, backward-compat contract, why composition not inheritance, why no [Obsolete] annotations during the refactor, testing strategy, open risks. This document is the source of truth that subsequent slices reference. Verified: * Core builds clean on net10.0 (0 errors) * Core builds clean on net471 (0 errors) * 14 new unit tests pass (51 ms)

coderabbitai · 2026-05-23T03:25:53Z

Warning

Review limit reached

@ooples, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 7 minutes and 33 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa5f9a8a-733e-45a9-9bb2-1fc244ddffca

📥 Commits

Reviewing files that changed from the base of the PR and between 6218734 and 7e7b6a4.

📒 Files selected for processing (5)

docs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.md
src/AiModelBuilder.cs
src/Configuration/AiModelDataPipeline.cs
src/Configuration/IAiModelDataPipeline.cs
tests/AiDotNet.Tests/UnitTests/Configuration/AiModelDataPipelineTests.cs

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch audit/2026-05-phase2a-aimodelbuilder-di

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

vercel · 2026-05-23T03:25:54Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
aidotnet_website	Ready	Preview, Comment	May 23, 2026 3:25am
aidotnet-playground-api	Ready	Preview, Comment	May 23, 2026 3:25am

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions · 2026-05-23T04:25:00Z

🤖 PR Title Auto-Fixed

Your PR title was automatically updated to follow Conventional Commits format.

Original title:
audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder

New title:
chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder

Detected type: chore: (default type)
Version impact: No release

Valid types and their effects:

feat: - New feature (MINOR bump: 0.1.0 → 0.2.0)
fix: - Bug fix (MINOR bump)
docs: - Documentation (MINOR bump)
refactor: - Code refactoring (MINOR bump)
perf: - Performance improvement (MINOR bump)
test: - Tests only (no release)
chore: - Build/tooling (no release)
ci: - CI/CD changes (no release)
style: - Code formatting (no release)
deps: - Dependency update (no release)

If the detected type is incorrect, you can manually edit the PR title.

Copilot AI review requested due to automatic review settings May 23, 2026 03:25

Copilot started reviewing on behalf of ooples May 23, 2026 03:25 View session

ooples mentioned this pull request May 23, 2026

chore: audit(2026-05) phase 2a slice 2: extract training-core concern from AiModelBuilder #1433

Open

3 tasks

Copilot AI reviewed May 23, 2026

View reviewed changes

github-actions Bot changed the title ~~audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder~~ chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder May 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder#1432

chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder#1432
ooples wants to merge 1 commit into
masterfrom
audit/2026-05-phase2a-aimodelbuilder-di

ooples commented May 23, 2026

Uh oh!

coderabbitai Bot commented May 23, 2026

Review limit reached

Uh oh!

vercel Bot commented May 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ooples commented May 23, 2026

Summary

Hard invariant

What this slice extracts

The migration plan (full document in docs/internal/)

The double-write pattern (slice-safe migration)

Why no [Obsolete] annotations during the refactor

Test coverage

Verification

Honest scope notes

Uh oh!

coderabbitai Bot commented May 23, 2026

Review limit reached

Uh oh!

vercel Bot commented May 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

The migration plan (full document in `docs/internal/`)

Why no `[Obsolete]` annotations during the refactor