Improve architecture, code quality, and test coverage for HTML/XML/RSS/sitemap generation from article.md (render-articles pipeline) #2278

@pethers

📋 Issue Type

Architecture / Code Quality / Testing

🎯 Objective

Refactor the HTML rendering pipeline (article.md → news/*.html) and all related static output generation (sitemap XML, sitemap HTML, RSS feed, news indexes) into well-architected bounded contexts with strict typing, comprehensive tests, and clear separation of concerns.

📊 Current State

The rendering/publishing pipeline involves multiple scripts with overlapping responsibilities:

HTML Rendering:

  • scripts/render-articles.ts: Main renderer (article.md → news/{date}-{type}-{lang}.html)
  • scripts/render-lib/article.ts: Article document model
  • scripts/render-lib/chrome.ts: HTML chrome/template wrapping
  • scripts/render-lib/chrome-i18n.ts: Internationalized chrome strings
  • scripts/render-lib/constants.ts: Shared constants
  • scripts/render-lib/jsonld.ts: JSON-LD structured data
  • scripts/render-lib/url-helpers.ts: URL construction
  • scripts/render-lib/article-types.ts: Article type registry
  • scripts/render-lib/markdown/ (7 files): Markdown → HTML pipeline (rehype/remark plugins, Mermaid preprocessing, sanitization)

Supporting Generators:

  • scripts/generate-news-indexes/: News index pages
  • scripts/generate-sitemap.ts + scripts/sitemap-xml/: XML sitemap
  • scripts/generate-sitemap-html.ts + scripts/sitemap-html/: Human-readable sitemap
  • scripts/generate-rss.ts + scripts/rss/: RSS feed
  • scripts/generate-political-intelligence.ts: Political intelligence page
  • scripts/normalize-static-html-chrome.ts: Chrome normalization
  • scripts/backfill-translated-chrome.ts: Translation backfill
  • scripts/strip-legacy-chrome-script-tags.ts: Legacy cleanup
  • scripts/extract-news-metadata.ts: News metadata extraction
  • scripts/html-utils.ts: Shared HTML utilities

Issues:

  • Render pipeline mixes concerns: Markdown parsing, HTML templating, i18n, SEO, and sanitization in render-lib/
  • Multiple independent scripts for sitemap/RSS/indexes with duplicated HTML generation patterns
  • chrome.ts is likely large and handles too many responsibilities (header, footer, nav, SEO meta, language switcher)
  • Markdown pipeline in render-lib/markdown/ tightly coupled to rendering rather than being a standalone transformation
  • Inconsistent error handling between generators
  • Limited test coverage for HTML output correctness (structure, accessibility, valid HTML5)
  • No shared "static output writer" abstraction: each generator handles file I/O independently
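
To make the duplication point concrete, here is a minimal sketch of a serialization helper that the sitemap and RSS generators could share. The names escapeXml, renderSitemapXml, and renderRssItem are hypothetical and do not reflect the current contents of scripts/html-utils.ts; the sketch only illustrates the kind of consolidation meant here.

```typescript
// Hypothetical shared helpers; none of these names exist in the repository.

/** Escapes the five characters that are special in XML text and attributes. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface SitemapEntry {
  readonly loc: string;
  readonly lastmod?: string; // W3C date, e.g. "2025-01-01"
}

/** Renders a sitemaps.org <urlset> document from a list of entries. */
function renderSitemapXml(entries: readonly SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const lastmod = entry.lastmod
      ? `<lastmod>${escapeXml(entry.lastmod)}</lastmod>`
      : "";
    return `  <url><loc>${escapeXml(entry.loc)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
    "",
  ].join("\n");
}

interface RssItem {
  readonly title: string;
  readonly link: string;
  readonly pubDate: Date;
}

/** Renders one RSS 2.0 <item>; reuses the same escaping as the sitemap. */
function renderRssItem(item: RssItem): string {
  return (
    `<item><title>${escapeXml(item.title)}</title>` +
    `<link>${escapeXml(item.link)}</link>` +
    `<pubDate>${item.pubDate.toUTCString()}</pubDate></item>`
  );
}
```

Because both generators go through one escaping function, a fix to escaping behavior would apply to every XML output at once.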

🚀 Desired State

Architecture (Bounded Contexts)

scripts/rendering/
├── interfaces.ts              # Shared types for all rendering
├── markdown-to-html/          # Pure Markdown → HTML transformation
│   ├── pipeline.ts            # rehype/remark pipeline (compose plugins)
│   ├── plugins/
│   │   ├── mermaid-preprocess.ts
│   │   ├── slug-prefixed.ts
│   │   ├── wrap-tables.ts
│   │   └── sanitize.ts
│   └── index.ts
├── article-chrome/            # HTML page chrome (header/footer/nav)
│   ├── template.ts            # Base HTML5 template
│   ├── header.ts              # Site header with nav
│   ├── footer.ts              # Site footer with sources
│   ├── language-switcher.ts   # hreflang + UI switcher
│   ├── seo.ts                 # Meta tags, OG, Twitter Cards
│   ├── jsonld.ts              # JSON-LD structured data
│   └── index.ts
├── i18n/                      # Internationalization for chrome
│   ├── strings.ts             # UI string registry (14 langs)
│   ├── rtl.ts                 # RTL layout handling
│   └── index.ts
├── article-renderer/          # Compose markdown + chrome → full page
│   ├── renderer.ts            # Main article rendering
│   └── index.ts
└── index.ts                   # Public API

scripts/static-outputs/
├── interfaces.ts              # Shared types for generators
├── sitemap-xml/               # XML sitemap generation
├── sitemap-html/              # Human-readable sitemap
├── rss/                       # RSS feed generation
├── news-indexes/              # News index pages
├── political-intelligence/    # Political intelligence page
├── writer.ts                  # Shared file writer with validation
└── index.ts
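
One way the article-renderer context could compose the other two contexts is sketched below. All type and function names (ArticleContext, MarkdownToHtml, ChromeRenderer, createArticleRenderer) are assumptions for illustration, not existing code. The sketch shows the Markdown transformation and the chrome as independent functions that only meet in the renderer.

```typescript
// Hypothetical shapes for scripts/rendering/interfaces.ts; names are illustrative.
type TextDirection = "ltr" | "rtl";

interface ArticleContext {
  readonly title: string;
  readonly lang: string; // e.g. "en" or "ar"
  readonly dir: TextDirection;
  readonly canonicalUrl: string;
}

/** Pure Markdown -> HTML fragment transformation; knows nothing about chrome. */
type MarkdownToHtml = (markdown: string) => string;

/** Wraps an already-rendered body fragment in a full HTML5 page. */
type ChromeRenderer = (ctx: ArticleContext, bodyHtml: string) => string;

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Deliberately tiny chrome; the real one would add header, footer, and SEO tags. */
const renderChrome: ChromeRenderer = (ctx, bodyHtml) =>
  [
    "<!DOCTYPE html>",
    `<html lang="${escapeHtml(ctx.lang)}" dir="${ctx.dir}">`,
    "<head>",
    '<meta charset="utf-8">',
    `<title>${escapeHtml(ctx.title)}</title>`,
    `<link rel="canonical" href="${escapeHtml(ctx.canonicalUrl)}">`,
    "</head>",
    `<body><main>${bodyHtml}</main></body>`,
    "</html>",
  ].join("\n");

/** The renderer is the only place where Markdown output and chrome meet. */
function createArticleRenderer(toHtml: MarkdownToHtml, chrome: ChromeRenderer) {
  return (ctx: ArticleContext, markdown: string): string =>
    chrome(ctx, toHtml(markdown));
}
```

Injecting the Markdown function means chrome tests can pass a trivial stub, and pipeline tests never need a page template.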

Code Quality

  • Clear separation: Markdown transformation is independent of HTML chrome
  • Each chrome component (header, footer, nav, SEO) is independently testable
  • i18n strings centralized: no scattered translations
  • Shared file writer handles I/O, encoding, and validation consistently
  • All HTML output validated against HTML5 spec in tests
  • No any types; use typed interfaces for article metadata, page context, and i18n strings
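
As a rough illustration of the shared file writer, the sketch below separates validation from I/O behind a small sink interface. Everything here (OutputFile, Validator, FileSink, createWriter, and the three example validators) is hypothetical. The validators are deliberately shallow stand-ins for the real HTML5 and XML checks the issue asks for.

```typescript
// Hypothetical sketch of scripts/static-outputs/writer.ts; all names are illustrative.
interface OutputFile {
  readonly path: string;
  readonly content: string;
}

/** Returns a list of problems; an empty list means the file is acceptable. */
type Validator = (file: OutputFile) => string[];

/** Abstracts the actual I/O so generators and tests share one code path. */
interface FileSink {
  write(path: string, content: string): void;
}

const notEmpty: Validator = (file) =>
  file.content.trim() === "" ? ["content is empty"] : [];

const htmlHasDoctype: Validator = (file) =>
  /\.html$/.test(file.path) && !/^<!doctype html>/i.test(file.content)
    ? ["HTML file does not start with <!DOCTYPE html>"]
    : [];

const xmlHasDeclaration: Validator = (file) =>
  /\.xml$/.test(file.path) && !/^<\?xml /.test(file.content)
    ? ["XML file does not start with an XML declaration"]
    : [];

/** Builds a writer that runs every validator and refuses to write on failure. */
function createWriter(sink: FileSink, validators: readonly Validator[]) {
  return function writeValidated(file: OutputFile): void {
    const problems: string[] = [];
    for (const validate of validators) problems.push(...validate(file));
    if (problems.length > 0) {
      throw new Error(`Refusing to write ${file.path}: ${problems.join("; ")}`);
    }
    sink.write(file.path, file.content);
  };
}

/** In-memory sink for tests; a real sink would wrap the file system. */
function createMemorySink(): FileSink & { readonly files: Record<string, string> } {
  const files: Record<string, string> = {};
  return {
    files,
    write: (path, content) => {
      files[path] = content;
    },
  };
}
```

The writer passes content through unchanged, which keeps it compatible with the byte-identical output requirement.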

Test Quality

  • Each component has unit tests verifying:
    • Correct HTML5 structure output
    • WCAG 2.1 AA compliance (ARIA, contrast, keyboard nav)
    • All 14 language outputs (including RTL for AR/HE)
    • JSON-LD correctness (Schema.org NewsArticle)
    • hreflang alternate links
    • RSS/sitemap XML validity
  • Snapshot tests for rendered HTML stability
  • Integration tests for full render pipeline
  • Link integrity tests for generated cross-references
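
A JSON-LD correctness test might look roughly like the sketch below. It builds a Schema.org NewsArticle object, embeds it in a script tag, and parses it back out of the HTML. The function names and the chosen subset of properties are assumptions, and the test runner is left out since the issue only specifies npm test.

```typescript
// Hypothetical builder and extractor; not the current scripts/render-lib/jsonld.ts API.
interface NewsArticleInput {
  readonly headline: string;
  readonly url: string;
  readonly datePublished: string; // ISO 8601 date or date-time
  readonly inLanguage: string;
}

function buildNewsArticleJsonLd(input: NewsArticleInput): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    headline: input.headline,
    datePublished: input.datePublished,
    inLanguage: input.inLanguage,
    mainEntityOfPage: { "@type": "WebPage", "@id": input.url },
  };
}

function toJsonLdScriptTag(data: Record<string, unknown>): string {
  // Escape "<" so a headline containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

/** Test helper: pulls the first JSON-LD block back out of rendered HTML. */
function extractJsonLd(html: string): unknown {
  const match = /<script type="application\/ld\+json">([\s\S]*?)<\/script>/.exec(html);
  if (!match) throw new Error("no JSON-LD script tag found");
  return JSON.parse(match[1]);
}

// Round trip: what the page embeds should parse back to the same headline.
const exampleTag = toJsonLdScriptTag(
  buildNewsArticleJsonLd({
    headline: "Budget </script> vote",
    url: "https://example.com/news/2025-01-01-analysis-en.html",
    datePublished: "2025-01-01",
    inLanguage: "en",
  }),
);
const exampleParsed = extractJsonLd(exampleTag) as Record<string, unknown>;
```

Asserting on the parsed object rather than on raw strings keeps the test stable if property order or whitespace changes.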

🔧 Implementation Approach

  1. Map current architecture: Document data flow through render-articles.ts → render-lib/ → file output
  2. Define interfaces: Type definitions for article context, page metadata, chrome options, i18n strings
  3. Separate Markdown pipeline: Extract render-lib/markdown/ into standalone bounded context
  4. Decompose chrome: Split chrome.ts into focused components (header, footer, nav, SEO, jsonld)
  5. Centralize i18n: Single source of truth for all UI strings across 14 languages
  6. Unify static generators: Shared writer + validation for sitemap/RSS/indexes
  7. Write comprehensive tests: HTML structure, accessibility, i18n, SEO validation
  8. Validate: Ensure all generated HTML/XML/RSS output is identical before/after refactoring
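
The validation in step 8 could be sketched as a comparison of two output snapshots, one captured before the refactoring and one after. The snapshot shape (a map from relative path to raw bytes) and the function names are assumptions. How the snapshots get captured, for example by reading the output directories, is intentionally left out.

```typescript
// Hypothetical snapshot comparison; paths map to the raw bytes of each generated file.
type OutputSnapshot = Readonly<Record<string, Uint8Array>>;

interface SnapshotDiff {
  readonly added: string[];
  readonly removed: string[];
  readonly changed: string[];
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

function hasPath(snapshot: OutputSnapshot, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, path);
}

/** Lists files that appeared, disappeared, or changed between two runs. */
function diffSnapshots(before: OutputSnapshot, after: OutputSnapshot): SnapshotDiff {
  const diff: SnapshotDiff = { added: [], removed: [], changed: [] };
  for (const path of Object.keys(before).sort()) {
    if (!hasPath(after, path)) diff.removed.push(path);
    else if (!bytesEqual(before[path], after[path])) diff.changed.push(path);
  }
  for (const path of Object.keys(after).sort()) {
    if (!hasPath(before, path)) diff.added.push(path);
  }
  return diff;
}

/** True only when both runs produced exactly the same files with the same bytes. */
function isByteIdentical(diff: SnapshotDiff): boolean {
  return diff.added.length + diff.removed.length + diff.changed.length === 0;
}
```

Comparing raw bytes rather than decoded strings avoids masking encoding or line-ending differences.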

📚 Key Files

| File | Purpose |
| --- | --- |
| scripts/render-articles.ts | Main article renderer |
| scripts/render-lib/ (11+ files) | Rendering library |
| scripts/render-lib/markdown/ (7 files) | Markdown pipeline |
| scripts/render-lib/aggregator/ (9 files) | Content aggregation |
| scripts/generate-news-indexes/ | News index generation |
| scripts/generate-sitemap.ts + scripts/sitemap-xml/ | XML sitemap |
| scripts/generate-sitemap-html.ts + scripts/sitemap-html/ | HTML sitemap |
| scripts/generate-rss.ts + scripts/rss/ | RSS feed |
| scripts/generate-political-intelligence.ts | Intelligence page |
| scripts/html-utils.ts | Shared HTML utilities |
| scripts/normalize-static-html-chrome.ts | Chrome normalization |
| scripts/backfill-translated-chrome.ts | Translation backfill |
| tests/render-lib.test.ts | Existing render tests |
| tests/render-lib-architecture.test.ts | Architecture tests |
| tests/generate-rss.test.ts | RSS generation tests |
| tests/generate-sitemap.test.ts | Sitemap tests |
| tests/generate-sitemap-html.test.ts | HTML sitemap tests |
| tests/generate-news-indexes.test.ts | News index tests |
| Article-Generation.md | System documentation |

✅ Acceptance Criteria

  • Markdown-to-HTML pipeline is independently testable (no chrome dependency)
  • Chrome decomposed into focused components (<200 lines each)
  • i18n strings centralized with type-safe access for all 14 languages
  • Shared file writer validates output before writing
  • All HTML output passes HTML5 validation in tests
  • JSON-LD output validates against Schema.org NewsArticle
  • RSS output validates against RSS 2.0 spec
  • Sitemap XML validates against sitemap.org schema
  • Each component has >90% line coverage
  • RTL languages (AR, HE) correctly handled in all chrome components
  • No any types in rendering code
  • Generated output byte-identical before/after refactoring
  • All existing tests pass (npm test)
  • Article-Generation.md updated to reflect new rendering architecture
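
The type-safe i18n and RTL criteria could be approached as in the sketch below. The language list is an illustrative four-language subset because the issue does not enumerate all 14, and the string values are placeholders. The hreflang helper follows the news/{date}-{type}-{lang}.html naming described under Current State. All identifiers are hypothetical.

```typescript
// Illustrative subset only; the real registry would list all 14 languages.
const LANGS = ["en", "sv", "ar", "he"] as const;
type Lang = (typeof LANGS)[number];

interface ChromeStrings {
  readonly home: string;
  readonly news: string;
}

// Record<Lang, ...> makes the compiler reject a registry that misses a language.
// The translations are placeholder values for this sketch.
const CHROME_STRINGS: Record<Lang, ChromeStrings> = {
  en: { home: "Home", news: "News" },
  sv: { home: "Hem", news: "Nyheter" },
  ar: { home: "الرئيسية", news: "الأخبار" },
  he: { home: "בית", news: "חדשות" },
};

// The issue names AR and HE as the RTL languages.
const RTL_LANGS: readonly Lang[] = ["ar", "he"];

function directionOf(lang: Lang): "ltr" | "rtl" {
  return RTL_LANGS.indexOf(lang) !== -1 ? "rtl" : "ltr";
}

/** Type-safe lookup: both the language and the key are checked at compile time. */
function chromeString(lang: Lang, key: keyof ChromeStrings): string {
  return CHROME_STRINGS[lang][key];
}

/** One alternate link per language, using the news/{date}-{type}-{lang}.html pattern. */
function hreflangLinks(baseUrl: string, date: string, type: string): string[] {
  return LANGS.map(
    (lang) =>
      `<link rel="alternate" hreflang="${lang}" ` +
      `href="${baseUrl}/news/${date}-${type}-${lang}.html">`,
  );
}
```

With this shape, adding a language to LANGS without adding its strings becomes a compile error rather than a runtime gap.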

🤖 Recommended Agent

code-quality-engineer: Architecture refactoring, HTML/CSS quality, bounded contexts, test coverage

🏷️ Labels

enhancement, refactor, testing, component:content-generation, news-generation, html-css, i18n, priority-high, size-xl
