Improve architecture, code quality, and test coverage for article.md generation pipeline (aggregate-analysis → article.md) #2277

@pethers

Description

📋 Issue Type

Architecture / Code Quality / Testing

🎯 Objective

Refactor the article.md generation pipeline into a well-architected bounded-context system with strict typing, clear interfaces, comprehensive tests, and proper separation of concerns. This covers everything from analysis artifacts to the canonical article.md output.

📊 Current State

The article.md generation pipeline involves:

  • scripts/aggregate-analysis.ts: main orchestrator (turns analysis folder into article.md)
  • scripts/render-lib/aggregator/: 9 sub-modules (aggregate, cleaning/, frontmatter, order, per-document, reader-guide, reader-guide-i18n, seo/, sources-appendix)
  • scripts/analysis-reader.ts: reads analysis artifacts
  • scripts/analysis-references.ts: cross-reference utilities
  • scripts/validate-article.ts: post-generation validation
  • scripts/validate-methodology-reflection.ts: methodology validation
  • scripts/populate-analysis-data.ts: pre-population of analysis data
  • scripts/statistical-claims-detector.ts: detects statistical claims for verification

Issues:

  • Bounded-context boundaries between aggregator sub-modules are unclear
  • Some modules may have overlapping responsibilities (e.g., cleaning/ vs frontmatter.ts)
  • Type definitions are spread across files rather than centralized in shared interfaces
  • Test coverage for aggregator sub-modules (render-lib-architecture.test.ts, render-lib-leaf-modules.test.ts) may not cover all edge cases
  • Error-handling patterns are inconsistent across pipeline stages
  • No clear pipeline abstraction connects the stages (reader → validator → aggregator → writer)

🚀 Desired State

Architecture (Bounded Contexts)

scripts/article-pipeline/
├── interfaces.ts          # All shared types/interfaces for pipeline
├── pipeline.ts            # Pipeline orchestrator (compose stages)
├── stages/
│   ├── read/              # Read analysis artifacts from filesystem
│   │   ├── artifact-reader.ts
│   │   ├── artifact-inventory.ts
│   │   └── index.ts
│   ├── validate/          # Validate completeness and quality
│   │   ├── gate-checker.ts
│   │   ├── methodology-validator.ts
│   │   ├── statistical-claims.ts
│   │   └── index.ts
│   ├── aggregate/         # Transform artifacts into article sections
│   │   ├── section-ordering.ts
│   │   ├── frontmatter-generator.ts
│   │   ├── content-cleaner.ts
│   │   ├── per-document-synthesis.ts
│   │   ├── reader-guide.ts
│   │   ├── sources-appendix.ts
│   │   └── index.ts
│   ├── enrich/            # Add cross-references, SEO, metadata
│   │   ├── cross-references.ts
│   │   ├── seo-metadata.ts
│   │   └── index.ts
│   └── write/             # Write final article.md
│       ├── markdown-writer.ts
│       └── index.ts
└── index.ts               # Public API
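
To make the layout above concrete, here is a minimal sketch of how interfaces.ts and pipeline.ts might fit together. Every name in it (Stage, compose, Artifact, ArticleSection, aggregateStage, writeStage) is a hypothetical placeholder, not an existing symbol in the repository, and the stage bodies are deliberately trivial.

```typescript
// Hypothetical stage contract; none of these names exist in the repository today.
interface Stage<In, Out> {
  readonly name: string;
  run(input: In): Out;
}

// Compose two stages into one. The compiler rejects a pairing whose
// output type does not match the next stage's input type.
function compose<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return {
    name: `${first.name} -> ${second.name}`,
    run: (input: A): C => second.run(first.run(input)),
  };
}

// Assumed minimal data shapes, for illustration only.
interface Artifact {
  readonly path: string;
  readonly body: string;
}
interface ArticleSection {
  readonly heading: string;
  readonly markdown: string;
}

// Toy aggregate stage: one section per artifact, whitespace trimmed.
const aggregateStage: Stage<readonly Artifact[], readonly ArticleSection[]> = {
  name: "aggregate",
  run: (artifacts) =>
    artifacts.map((a) => ({ heading: a.path, markdown: a.body.trim() })),
};

// Toy write stage: render sections to a Markdown string (no filesystem access).
const writeStage: Stage<readonly ArticleSection[], string> = {
  name: "write",
  run: (sections) =>
    sections.map((s) => `## ${s.heading}\n\n${s.markdown}\n`).join("\n"),
};

const articlePipeline = compose(aggregateStage, writeStage);
const articleText = articlePipeline.run([
  { path: "example.md", body: "  Key findings.  " },
]);
```

Because each stage is a plain value with typed input and output, a test can call aggregateStage.run or writeStage.run directly without touching the rest of the pipeline.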

Code Quality

  • Each stage has a clear input/output interface (typed)
  • Pipeline is composable: stages can be tested independently
  • Error propagation uses Result types or typed exceptions
  • No any types β€” full strict TypeScript
  • Maximum 200 lines per module (single responsibility)
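
One possible shape for the Result-based error propagation mentioned above is sketched below. The PipelineError variants and the parseArtifact helper are assumptions made for illustration; the actual error taxonomy would come out of mapping the existing stages.

```typescript
// A discriminated-union Result type: callers must check `ok` before using the value.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Hypothetical error taxonomy for the pipeline (illustrative only).
type PipelineError =
  | { readonly kind: "missing-artifact"; readonly path: string }
  | { readonly kind: "malformed-artifact"; readonly path: string; readonly reason: string };

const success = <T>(value: T): Result<T, never> => ({ ok: true, value });
const failure = (error: PipelineError): Result<never, PipelineError> => ({
  ok: false,
  error,
});

// Hypothetical read-stage helper: `raw` is undefined when the file was not found.
function parseArtifact(
  path: string,
  raw: string | undefined,
): Result<string, PipelineError> {
  if (raw === undefined) return failure({ kind: "missing-artifact", path });
  const body = raw.trim();
  if (body.length === 0) {
    return failure({ kind: "malformed-artifact", path, reason: "empty file" });
  }
  return success(body);
}

// Exhaustive handling: adding a new error kind makes this function fail to compile
// until the new case is handled.
function summarize(result: Result<string, PipelineError>): string {
  if (result.ok) return `ok (${result.value.length} chars)`;
  const e = result.error;
  switch (e.kind) {
    case "missing-artifact":
      return `missing: ${e.path}`;
    case "malformed-artifact":
      return `malformed: ${e.path} (${e.reason})`;
  }
}
```

Typed exceptions would work as well; the union form is shown here only because it keeps failure cases visible in each stage's signature.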

Test Quality

  • Each stage has dedicated test file with:
    • Happy path tests with real fixture data
    • Edge cases (empty analysis folders, missing artifacts, malformed files)
    • Error handling tests (filesystem errors, validation failures)
    • Integration tests for full pipeline
  • Test fixtures in tests/fixtures/ for reproducible article generation
  • Snapshot tests for generated article.md output stability
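
As a rough illustration of the happy-path and edge-case tests listed above, the sketch below builds a throwaway fixture folder and checks a hypothetical read-stage helper against it. listArtifacts and withFixture are invented for this example, and plain node:assert is used because this issue does not state which test runner the repository uses.

```typescript
import { deepStrictEqual } from "node:assert";
import { mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical read-stage helper: list Markdown artifacts in a stable order.
function listArtifacts(dir: string): string[] {
  return readdirSync(dir)
    .filter((file) => file.endsWith(".md"))
    .sort();
}

// Create a temporary fixture folder, run a check against it, and always clean up.
function withFixture(
  files: Record<string, string>,
  check: (dir: string) => void,
): void {
  const dir = mkdtempSync(join(tmpdir(), "article-fixture-"));
  try {
    for (const [fileName, body] of Object.entries(files)) {
      writeFileSync(join(dir, fileName), body);
    }
    check(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Edge case: an empty analysis folder yields an empty inventory.
withFixture({}, (dir) => deepStrictEqual(listArtifacts(dir), []));

// Happy path: non-Markdown files are ignored and the order is deterministic.
withFixture({ "b.md": "B", "a.md": "A", "notes.txt": "x" }, (dir) =>
  deepStrictEqual(listArtifacts(dir), ["a.md", "b.md"]),
);
```

Checked-in fixtures under tests/fixtures/ would replace the temporary folder for the reproducible-generation cases; the temporary folder is mainly useful for malformed or missing inputs that should not live in the repository.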

🔧 Implementation Approach

  1. Map current architecture: Document current data flow through aggregate-analysis.ts → render-lib/aggregator/
  2. Define interfaces: Create shared type definitions for pipeline stages
  3. Extract and refactor: Move aggregator logic into bounded-context stages
  4. Add pipeline orchestrator: Composable pipeline with typed stage inputs/outputs
  5. Write tests: Comprehensive unit tests for each stage + integration tests
  6. Validate: Ensure generated article.md output is identical before/after refactoring
  7. Update docs: Update Article-Generation.md §"How article.md Is Generated"

📚 Key Files

| File | Purpose |
| --- | --- |
| scripts/aggregate-analysis.ts | Main aggregation orchestrator |
| scripts/render-lib/aggregator/ (9 files) | Aggregator sub-modules |
| scripts/analysis-reader.ts | Analysis artifact reader |
| scripts/analysis-references.ts | Cross-reference utilities |
| scripts/validate-article.ts | Post-generation validation |
| scripts/validate-methodology-reflection.ts | Methodology validation |
| scripts/populate-analysis-data.ts | Analysis data pre-population |
| scripts/statistical-claims-detector.ts | Statistical claims detection |
| tests/render-lib-architecture.test.ts | Existing architecture tests |
| tests/render-lib-leaf-modules.test.ts | Existing leaf module tests |
| tests/validate-article.test.ts | Existing validation tests |
| tests/validate-methodology-reflection.test.ts | Existing methodology tests |
| Article-Generation.md | System documentation |
| .github/prompts/04-analysis-pipeline.md | Analysis pipeline contract |
| .github/prompts/05-analysis-gate.md | Gate validation rules |

✅ Acceptance Criteria

  • Pipeline stages have explicit TypeScript interfaces for inputs/outputs
  • Each bounded context (read/validate/aggregate/enrich/write) is independently testable
  • No module exceeds 200 lines (split large modules)
  • Zero any types in pipeline code
  • Each stage has >90% line coverage in unit tests
  • Integration test validates full pipeline produces correct article.md
  • Snapshot tests ensure output stability across refactoring
  • Article-Generation.md updated to reflect new architecture
  • All existing tests pass (npm test)
  • Generated article.md output is byte-identical before/after refactoring (regression-free)
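
The byte-identical criterion can be checked mechanically. The sketch below compares two outputs as raw buffers and reports a SHA-256 digest for each side so a mismatch is easy to log; compareOutputs and RegressionReport are hypothetical names, and how the before and after files are produced is left to the refactoring workflow.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the given bytes.
function sha256(content: Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical report shape for a before/after comparison.
interface RegressionReport {
  readonly identical: boolean;
  readonly beforeDigest: string;
  readonly afterDigest: string;
}

// Compare raw bytes rather than decoded strings, so differences in line
// endings or a byte-order mark are not hidden by text normalization.
function compareOutputs(before: Buffer, after: Buffer): RegressionReport {
  return {
    identical: before.equals(after),
    beforeDigest: sha256(before),
    afterDigest: sha256(after),
  };
}

// Example: a change from LF to CRLF counts as a regression.
const sameReport = compareOutputs(Buffer.from("# Title\n"), Buffer.from("# Title\n"));
const crlfReport = compareOutputs(Buffer.from("# Title\n"), Buffer.from("# Title\r\n"));
```

In practice the two buffers would be read from an article.md generated on the pre-refactoring commit and one generated on the refactored branch, using the same fixture input.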

🤖 Recommended Agent

code-quality-engineer: Architecture refactoring, bounded contexts, TypeScript interfaces, test coverage

🏷️ Labels

enhancement, refactor, testing, component:content-generation, news-generation, priority-high, size-xl
