Description
The CodebaseToText class has grown to 600+ lines and handles multiple responsibilities (file traversal, pattern matching, content processing, output generation). This violates the Single Responsibility Principle and makes the code hard to maintain and test.
Current Issues
- Single class with 25+ methods
- Mixed concerns: file I/O, pattern matching, formatting, CLI handling
- Difficult to unit test individual components
- Hard to extend with new output formats or processing modes
Proposed Architecture
CodebaseToText (orchestrator)
├── FileTreeGenerator (directory traversal and tree generation)
├── PatternMatcher (exclusion pattern logic)
├── ContentProcessor (file reading and processing)
├── OutputFormatter (text/docx generation)
└── GitHubHandler (repository cloning and cleanup)
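As a rough illustration of the orchestrator role, the sketch below shows how CodebaseToText could receive its collaborators through the constructor instead of implementing their logic itself. The method names (`is_excluded`, `render`, `get_text`) and the stand-in classes `SuffixMatcher` and `PlainTextFormatter` are assumptions made for this example, not the project's existing API.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Protocol


class Matcher(Protocol):
    """Hypothetical interface for the PatternMatcher component."""

    def is_excluded(self, relative_path: str) -> bool: ...


class Formatter(Protocol):
    """Hypothetical interface for the OutputFormatter component."""

    def render(self, files: dict[str, str]) -> str: ...


class CodebaseToText:
    """Orchestrator: wires collaborators together and delegates to them."""

    def __init__(self, root: str, matcher: Matcher, formatter: Formatter) -> None:
        self.root = Path(root)
        self.matcher = matcher
        self.formatter = formatter

    def get_text(self) -> str:
        files: dict[str, str] = {}
        for path in sorted(self.root.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(self.root).as_posix()
            if self.matcher.is_excluded(rel):
                continue
            files[rel] = path.read_text(encoding="utf-8", errors="replace")
        return self.formatter.render(files)


# Tiny stand-ins, present only to show the wiring.
class SuffixMatcher:
    def __init__(self, suffixes: Iterable[str]) -> None:
        self.suffixes = tuple(suffixes)

    def is_excluded(self, relative_path: str) -> bool:
        return relative_path.endswith(self.suffixes)


class PlainTextFormatter:
    def render(self, files: dict[str, str]) -> str:
        return "\n\n".join(f"{name}\n{body}" for name, body in files.items())
```

With this shape, a test can pass in a fake matcher or formatter, for example `CodebaseToText("some/dir", SuffixMatcher([".log"]), PlainTextFormatter()).get_text()`, without touching the other components.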
Implementation Plan
- Extract PatternMatcher class first (lowest risk)
- Extract OutputFormatter for text/docx generation
- Extract FileTreeGenerator for directory operations
- Extract GitHubHandler for Git operations
- Refactor main class to use composition
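The first step of the plan could look something like the sketch below: a standalone PatternMatcher that holds only exclusion logic and can be unit tested without any file I/O. The matching rules shown here (glob patterns via `fnmatch`, a trailing `/` meaning "any path component with this name", `#` comment lines ignored) are assumptions for illustration and may differ from what the current class does.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


class PatternMatcher:
    """Exclusion pattern logic, isolated from traversal and output."""

    def __init__(self, patterns):
        # Drop blank entries and comment lines (assumed convention).
        cleaned = (p.strip() for p in patterns)
        self.patterns = [p for p in cleaned if p and not p.startswith("#")]

    def is_excluded(self, relative_path: str) -> bool:
        path = PurePosixPath(relative_path)
        for pattern in self.patterns:
            if pattern.endswith("/"):
                # Directory-style pattern: match any path component by name.
                if pattern.rstrip("/") in path.parts:
                    return True
            elif fnmatch(path.as_posix(), pattern) or fnmatch(path.name, pattern):
                return True
        return False


# Usage: the class can be exercised with plain strings, no filesystem needed.
matcher = PatternMatcher(["*.log", "node_modules/", "# comment", ""])
assert matcher.is_excluded("app/debug.log")
assert matcher.is_excluded("node_modules/pkg/index.js")
assert not matcher.is_excluded("src/main.py")
```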
Files Affected
- codebase_to_text/codebase_to_text.py (entire file)
- New files to create:
  - pattern_matcher.py
  - output_formatter.py
  - file_tree_generator.py
  - github_handler.py
Acceptance Criteria
Benefits
- Easier to add new output formats (PDF, HTML, etc.)
- Simpler unit testing of individual components
- Better separation of concerns
- Easier to modify pattern matching logic
- Cleaner code organization
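To make the first benefit concrete, here is one possible shape for an extensible OutputFormatter: a small base class plus a lookup table, so a new format is added by writing one subclass and registering it. The names `render`, `FORMATTERS`, and `get_formatter`, and the HTML formatter itself, are hypothetical and used only to illustrate the idea.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from html import escape


class OutputFormatter(ABC):
    """Base class: turns a mapping of file name -> content into one document."""

    @abstractmethod
    def render(self, files: dict[str, str]) -> str: ...


class TextFormatter(OutputFormatter):
    def render(self, files: dict[str, str]) -> str:
        return "\n\n".join(f"{name}\n{body}" for name, body in files.items())


class HtmlFormatter(OutputFormatter):
    def render(self, files: dict[str, str]) -> str:
        parts = []
        for name, body in files.items():
            # Escape both name and content so source code displays literally.
            parts.append(f"<h2>{escape(name)}</h2>\n<pre>{escape(body)}</pre>")
        return "\n".join(parts)


# Adding a format means adding one entry here (hypothetical registry).
FORMATTERS: dict[str, type[OutputFormatter]] = {
    "txt": TextFormatter,
    "html": HtmlFormatter,
}


def get_formatter(output_type: str) -> OutputFormatter:
    try:
        return FORMATTERS[output_type]()
    except KeyError:
        raise ValueError(f"Unsupported output type: {output_type!r}") from None
```

The orchestrator would then only call `get_formatter(output_type).render(files)` and would not need to change when a format is added.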