chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder#1432
chore: audit(2026-05) phase 2a slice 1: extract data-pipeline concern from AiModelBuilder#1432ooples wants to merge 1 commit into
Conversation
…iModelBuilder Foundation PR for the Phase 2a AiModelBuilder DI refactor (audit finding #12). Establishes the extraction pattern and ships the first of ~12 concern-component splits that will collectively reduce src/AiModelBuilder.cs from 9,511 LoC to ~1,500 LoC. Full plan documented in docs/internal/audit-2026-05-phase2a- aimodelbuilder-refactor.md. Hard invariant maintained: every public AiModelBuilder API surface remains identical (signatures, return types, observable behaviour) — only the internal composition changes. Existing tests stay green. Changes: * New interface IAiModelDataPipeline<T, TInput, TOutput> in src/Configuration/. Exposes ConfigurePreprocessing (3 overloads), ConfigurePostprocessing (3 overloads), SetPostprocessingFitMaxRows, ConfigureDataLoader, ConfigureDataPreparation, and ConfigureAugmentation (2 overloads) plus get-only readout properties for the configured state. * Default implementation AiModelDataPipeline<T, TInput, TOutput> in src/Configuration/. Mirrors the pre-refactor inline logic verbatim: AutoML defaults (SimpleImputer mean + StandardScaler) when no preprocessing args supplied, empty postprocessing pipeline when no args, DataPreparationRegistry side-effect preserved, modality-auto- detected augmentation defaults (image / tabular / audio / text / video). * AiModelBuilder refactor: 9 Configure-method bodies (preprocessing x3, postprocessing x3, SetPostprocessingFitMaxRows, ConfigureDataLoader, ConfigureDataPreparation, ConfigureAugmentation) reduced to 3-line delegations to _dataPipeline + legacy-field sync. Legacy private fields (_preprocessingPipeline, etc.) stay in place as synced caches so BuildAsync and partial-class siblings continue to read from them unchanged. Slice 2 will migrate those callsites to the component's properties directly. * 14 xUnit tests in tests/AiDotNet.Tests/UnitTests/Configuration/ AiModelDataPipelineTests.cs exercise the component in isolation (initial state, null-arg defaults for every overload, positive / zero / negative SetPostprocessingFitMaxRows, ConfigureDataLoader null guard, ConfigureDataPreparation null guard + builder invocation + registry side-effect, ConfigureAugmentation modality auto-detect + explicit-config pass-through, interface implementation). All 14 pass. * docs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.md documents the full ~12-slice plan: concern groupings, per-PR shape, sequencing dependencies, backward-compat contract, why composition not inheritance, why no [Obsolete] annotations during the refactor, testing strategy, open risks. This document is the source of truth that subsequent slices reference. Verified: * Core builds clean on net10.0 (0 errors) * Core builds clean on net471 (0 errors) * 14 new unit tests pass (51 ms)
|
Warning Review limit reached
Your plan currently allows 2 reviews/hour. Refill in 7 minutes and 33 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
🤖 PR Title Auto-Fixed Your PR title was automatically updated to follow Conventional Commits format. Original title: New title: Detected type: Valid types and their effects:
If the detected type is incorrect, you can manually edit the PR title. |
Summary
Foundation PR for the Phase 2a
AiModelBuilderDI refactor (audit finding #12). Establishes the extraction pattern, ships the first of ~12 concern-component splits, and documents the full migration plan that subsequent slices will follow.Scope is intentionally minimal so the pattern can be reviewed in isolation before the larger slices land. The other 11 slices each follow this same template.
Hard invariant
Every public
AiModelBuilderAPI surface that exists on master remains identical (signatures, return types, observable behaviour) at the end of this PR. Existing tests stay green. Only the internal composition changes.What this slice extracts
The data-pipeline concern: preprocessing, postprocessing, data loading, data preparation, and augmentation. 9 Configure-method bodies (~700 LoC of inline logic) migrated into a separately-testable component.
src/Configuration/IAiModelDataPipeline.cssrc/Configuration/AiModelDataPipeline.cssrc/AiModelBuilder.cstests/AiDotNet.Tests/UnitTests/Configuration/AiModelDataPipelineTests.csdocs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.mdThe migration plan (full document in
docs/internal/)After all 12 slices land,
AiModelBuilder.csshrinks from ~9.5K LoC to ~1.5K (constructor + facade delegation +BuildAsyncorchestration).The double-write pattern (slice-safe migration)
To keep blast radius minimal on this 9,511-line file, each migrated
Configure*method writes to BOTH the component AND the existing private field.BuildAsyncand partial-class siblings continue to read from the legacy fields unchanged. Slice 2 is the one that replaces those callsites with property reads on_dataPipeline— once that happens for each concern's slice, the legacy fields can be deleted.This means slice 1 carries near-zero risk of behaviour regression (the component does exactly what the inline code did, and the fields stay in sync) while still proving the pattern works for the remaining 11 slices.
Why no
[Obsolete]annotations during the refactorThe previous audit cycle marked unfinished VLA stubs
[Obsolete]and the maintainer rejected it as a lazy hand-wave. Same principle: the public Configure methods are not "obsolete" — they are the supported way to configure the builder and they will remain so indefinitely. The refactor changes implementation, not contract.Test coverage
14 xUnit tests in
AiModelDataPipelineTestsexercise the component in isolation (noAiModelBuilderinstance involved), covering:ConfigurePreprocessingwith null args (Action / transformer / pipeline overloads) → AutoML defaults appliedConfigurePreprocessingwith explicit pipeline → exact instance retainedConfigurePostprocessingwith null action → empty pipeline (no universal defaults)SetPostprocessingFitMaxRowspositive value storedSetPostprocessingFitMaxRowszero / negative / null → clears to nullConfigureDataLoadernull →ArgumentNullExceptionConfigureDataPreparationnull →ArgumentNullExceptionConfigureDataPreparationvalid → pipeline built + stored + registry side-effectConfigureAugmentationnull → modality auto-detected default (verified with Matrix → Tabular)ConfigureAugmentationexplicit config → exact instance retainedAll 14 pass in 51 ms.
Verification
dotnet build src/AiDotNet.csproj -c Release -f net10.0— 0 errorsdotnet build src/AiDotNet.csproj -c Release -f net471— 0 errorsdotnet test tests/AiDotNet.Tests/AiDotNetTests.csproj --filter Configuration.AiModelDataPipelineTests— 14/14 passAiModelBuilder*integration regression suite (slice 2 will explicitly re-run these as part of its critical-path verification)Honest scope notes
AiModelBuilder.csis still 9,500+ lines. The remaining 11 concerns are tracked as follow-up PRs, each ~500-1,500 LoC, following this same template.docs/internal/audit-2026-05-phase2a-aimodelbuilder-refactor.md§ "Sequencing dependencies between slices". Slices 1, 5, 8, 12 are independent; the rest depend on slice 2 (TrainingCore).DataPreparationRegistry<T>.Currentglobal side-effect is preserved verbatim. Subsequent slices may refactor this into an explicit dependency, but during slice 1 we keep the exact pre-refactor behaviour.🤖 Generated with Claude Code