This repository is not a library.
It is a real production case study showing how system architecture determines whether AI can solve complex problems reliably.
OCR output from book tables of contents is highly inconsistent:
- Page numbers appear before or after titles
- Mixed casing styles
- Numbered vs non-numbered headings
- Multilingual text fragments
- Nested structures without clear delimiters
Raw OCR text is not usable directly.
Manual cleanup is slow and error-prone.
Naive AI prompts fail on edge cases.
The challenge: design a deterministic processing architecture where:
- AI is controlled, not trusted blindly
- Rules scale without conflict
- Edge cases do not break the pipeline
- Output format is stable across books
Most AI failures are not model failures.
They are architecture failures.
The solution was not "better prompting" but a layered system with strict contracts:
- Layered processing pipeline
- One function = one responsibility
- Strict input/output contracts
- Error-tolerant parsing
- Deterministic normalization before AI reasoning
- Manual override allowed for rare cases
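The principles above can be sketched as a pipeline of single-purpose layers: deterministic normalization runs first, a human-maintained override map wins over every rule, and only then would an AI step see the input. All names here are illustrative assumptions, not the repo's code:

```python
import re

OVERRIDES = {"Capitulo Uno": "Chapter One"}  # rare cases fixed by hand

def strip_artifacts(line: str) -> str:
    """Layer 1: collapse dot leaders and stray whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"\.{2,}", " ", line)).strip()

def normalize_case(line: str) -> str:
    """Layer 2: fold SHOUTED OCR headings to title case."""
    return line.title() if line.isupper() else line

def apply_override(line: str) -> str:
    """Layer 3: the manual override map always wins over the rules."""
    return OVERRIDES.get(line, line)

def pipeline(raw: str) -> str:
    # One function = one responsibility; strict ordering means adding a
    # new layer cannot silently change what earlier layers produce.
    for layer in (strip_artifacts, normalize_case, apply_override):
        raw = layer(raw)
    return raw
```

Each layer can be unit-tested in isolation, so new rules extend the chain instead of conflicting with existing ones.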
After the architecture redesign:
- Rule count increased safely without conflict
- Edge cases stopped breaking the pipeline
- AI outputs became predictable
- Processing time dropped dramatically
This is a demonstration that:
Correct architecture enables AI to solve hard problems reliably.
Not every system needs a bigger model.
Many need a better structure.
| File | Purpose |
|---|---|
| ARCHITECTURE.md | System layer design |
| RULES.md | Rule design philosophy |
| AI_GOVERNANCE.md | How AI is constrained |
| before_after.md | Real input/output examples |
Who this is for: CTOs, system architects, and technical founders.
This repo is about thinking, not tooling.
Shared for learning and discussion.