
Design: generalize Content_Parser to support multiple standard.site content formats #45

@kraftbj

Summary

The current Content_Parser interface and atmosphere_content_parser filter assume a single parser produces a single content record per post. site.standard.document is a union field that accepts multiple lexicon types — long-term we likely want to support more than just at.markpub.markdown (e.g. HTML, plain text, or whatever standard.site adds next), and possibly emit more than one representation per post.

This issue tracks the design discussion so the interface is settled before a second parser implementation lands and forces a breaking interface change.

Current Shape

  • In includes/content-parser/interface-content-parser.php, parse() returns a single ?array shaped for one lexicon type, and get_type() returns one NSID.
  • includes/class-atmosphere.php:50 registers a single Markpub parser via the atmosphere_content_parser filter.
  • includes/transformer/class-document.php:172 fetches one parser, calls parse(), and sets content to the single result.
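For reference, the current shape roughly corresponds to the sketch below. It is reconstructed from the description above, not copied from the plugin. Assumptions: parse() takes a plain array standing in for WP_Post (so the sketch runs outside WordPress), and the Markpub_Parser class name and the text field in the returned record are hypothetical.

```php
<?php
// Rough reconstruction of the current single-parser shape, not the plugin's code.
// Assumption: parse() takes a plain array instead of WP_Post so this runs standalone.
interface Content_Parser {
    /** Returns one content record for one lexicon type, or null. */
    public function parse( array $post ): ?array;

    /** Returns the single NSID this parser produces. */
    public function get_type(): string;
}

// Hypothetical stand-in for the Markpub parser; real Markdown conversion is omitted.
final class Markpub_Parser implements Content_Parser {
    public function parse( array $post ): ?array {
        if ( '' === trim( $post['content'] ?? '' ) ) {
            return null; // Nothing to publish.
        }
        return array(
            '$type' => $this->get_type(),
            'text'  => $post['content'], // Assumed field name.
        );
    }

    public function get_type(): string {
        return 'at.markpub.markdown';
    }
}

$record = ( new Markpub_Parser() )->parse( array( 'content' => 'Hello' ) );
echo $record['$type'], PHP_EOL; // at.markpub.markdown
```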

Constraints this imposes:

  1. Only one parser can win the filter (last-wins). Two third-party parsers can't coexist.
  2. Document can't pick a parser based on the post (e.g. classic content → HTML parser; block content → Markpub).
  3. No path to emit multiple representations even though the union allows it.
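Constraint 1 follows from how WordPress filters compose: each callback receives the previous value and returns a replacement, so at equal priority the last callback registered decides which parser Document gets. The sketch below models that with hypothetical add_filter_sketch() and apply_filters_sketch() stand-ins (not the real WordPress functions, and without priorities), using strings in place of parser objects.

```php
<?php
// Simplified model of WordPress filter chaining: callbacks run in registration
// order and each receives the previous return value. Priorities and extra
// arguments are left out. These helpers are hypothetical, not WordPress APIs.
function add_filter_sketch( array &$filters, string $hook, callable $callback ): void {
    $filters[ $hook ][] = $callback;
}

function apply_filters_sketch( array $filters, string $hook, $value ) {
    foreach ( $filters[ $hook ] ?? array() as $callback ) {
        $value = $callback( $value );
    }
    return $value;
}

$filters = array();

// The plugin registers Markpub; two hypothetical third-party parsers follow.
// Strings stand in for parser objects.
add_filter_sketch( $filters, 'atmosphere_content_parser', fn( $parser ) => 'markpub' );
add_filter_sketch( $filters, 'atmosphere_content_parser', fn( $parser ) => 'html_parser' );
add_filter_sketch( $filters, 'atmosphere_content_parser', fn( $parser ) => 'plaintext_parser' );

// Document only ever sees the last replacement.
echo apply_filters_sketch( $filters, 'atmosphere_content_parser', null ), PHP_EOL; // plaintext_parser
```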

Discussion Points

Sketching a few directions — not picking one yet:

A. Registry of parsers by NSID
Replace the single-parser filter with a registry. Parsers register against their get_type(). Document picks the preferred one per post (config or filter hook), or iterates and emits all that produce non-null output.
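A minimal sketch of Option A, assuming a registry keyed by NSID. Parser_Registry, its method names, the array $post argument, the two example parser classes, and the com.example.html NSID are all hypothetical. The two methods mirror the choices named above: a preferred parser per post, or every parser that produces non-null output.

```php
<?php
// Mirrors the current interface; the array $post argument is an assumption.
interface Content_Parser {
    public function parse( array $post ): ?array;
    public function get_type(): string;
}

// Hypothetical registry keyed by the NSID each parser reports via get_type().
final class Parser_Registry {
    /** @var array<string, Content_Parser> */
    private array $parsers = array();

    public function register( Content_Parser $parser ): void {
        // Parsers for different NSIDs coexist; a second parser for the same NSID replaces the first.
        $this->parsers[ $parser->get_type() ] = $parser;
    }

    /** Preferred mode: config or a filter hook picks one NSID for this post. */
    public function parse_preferred( string $nsid, array $post ): ?array {
        return isset( $this->parsers[ $nsid ] ) ? $this->parsers[ $nsid ]->parse( $post ) : null;
    }

    /** Emit-all mode: every registered parser with non-null output, keyed by NSID. */
    public function parse_all( array $post ): array {
        $records = array();
        foreach ( $this->parsers as $nsid => $parser ) {
            $record = $parser->parse( $post );
            if ( null !== $record ) {
                $records[ $nsid ] = $record;
            }
        }
        return $records;
    }
}

// Hypothetical parser that only handles block content.
final class Block_Only_Parser implements Content_Parser {
    public function parse( array $post ): ?array {
        return $post['has_blocks'] ? array( '$type' => $this->get_type() ) : null;
    }
    public function get_type(): string {
        return 'at.markpub.markdown';
    }
}

// Hypothetical HTML parser; the NSID is a placeholder.
final class Html_Parser implements Content_Parser {
    public function parse( array $post ): ?array {
        return array( '$type' => $this->get_type() );
    }
    public function get_type(): string {
        return 'com.example.html';
    }
}

$registry = new Parser_Registry();
$registry->register( new Block_Only_Parser() );
$registry->register( new Html_Parser() );

// A classic (non-block) post only yields the HTML representation.
echo implode( ', ', array_keys( $registry->parse_all( array( 'has_blocks' => false ) ) ) ), PHP_EOL; // com.example.html
```

Keying by NSID resolves constraint 1 for parsers of different formats, but two parsers for the same NSID would still collide, so the registry would need a policy for that case.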

B. Per-NSID filter pattern
Keep the filter shape but namespace it: atmosphere_content_parser_at_markpub_markdown, atmosphere_content_parser_at_html (or similar). Document iterates across known NSIDs.
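Option B needs a deterministic mapping from NSID to filter name. One possible rule, assumed here, is to prefix atmosphere_content_parser_ and replace dots with underscores, which reproduces the atmosphere_content_parser_at_markpub_markdown example. The helper name and the com.example.html NSID are hypothetical, and a plain array stands in for apply_filters().

```php
<?php
// Hypothetical helper: one filter name per NSID, dots swapped for underscores.
function atmosphere_parser_filter_name( string $nsid ): string {
    return 'atmosphere_content_parser_' . str_replace( '.', '_', $nsid );
}

// Document iterates the NSIDs it knows about and asks each hook for a parser.
// In WordPress this lookup would go through apply_filters( $hook, null ); here a
// plain hook => parser-label array stands in for the registered callbacks.
$known_nsids = array( 'at.markpub.markdown', 'com.example.html' );
$registered  = array( 'atmosphere_content_parser_at_markpub_markdown' => 'markpub' );

foreach ( $known_nsids as $nsid ) {
    $hook   = atmosphere_parser_filter_name( $nsid );
    $parser = $registered[ $hook ] ?? null;
    echo $hook, ' => ', $parser ?? '(none)', PHP_EOL;
}
```

One trade-off this makes visible: Document has to know the list of NSIDs up front, whereas a registry discovers them from whatever registers.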

C. Parser produces canonical IR; Document projects
Parser returns a structured intermediate (e.g. block tree) and a list of supported output types. Document negotiates and projects. Most flexible, biggest refactor.
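A rough sketch of the Option C split, assuming the IR is a flat list of heading and paragraph blocks. Content_IR, negotiate_type(), project(), the text field, and the com.example.html NSID are hypothetical, and both projections are deliberately naive (no nested blocks, no Markdown escaping).

```php
<?php
// Hypothetical IR: a flat block list plus the output NSIDs it can be projected to.
final class Content_IR {
    /** @var array<int, array{type: string, text: string}> */
    public array $blocks;
    /** @var string[] */
    public array $supported_types;

    public function __construct( array $blocks, array $supported_types ) {
        $this->blocks          = $blocks;
        $this->supported_types = $supported_types;
    }
}

// Document side: pick the first preferred NSID the IR supports, or null.
function negotiate_type( Content_IR $ir, array $preferred ): ?string {
    foreach ( $preferred as $nsid ) {
        if ( in_array( $nsid, $ir->supported_types, true ) ) {
            return $nsid;
        }
    }
    return null;
}

// Document side: project the IR into a record for the negotiated NSID.
function project( Content_IR $ir, string $nsid ): array {
    $is_html = 'com.example.html' === $nsid;
    $out     = array();
    foreach ( $ir->blocks as $block ) {
        $is_heading = 'heading' === $block['type'];
        if ( $is_html ) {
            $tag   = $is_heading ? 'h2' : 'p';
            $out[] = "<{$tag}>" . htmlspecialchars( $block['text'] ) . "</{$tag}>";
        } else {
            $out[] = ( $is_heading ? '## ' : '' ) . $block['text'];
        }
    }
    return array( '$type' => $nsid, 'text' => implode( $is_html ? '' : "\n\n", $out ) );
}

$ir = new Content_IR(
    array(
        array( 'type' => 'heading', 'text' => 'Hello' ),
        array( 'type' => 'paragraph', 'text' => 'World' ),
    ),
    array( 'at.markpub.markdown', 'com.example.html' )
);

$nsid = negotiate_type( $ir, array( 'com.example.html', 'at.markpub.markdown' ) );
echo project( $ir, $nsid )['text'], PHP_EOL; // <h2>Hello</h2><p>World</p>
```

Because the parser only builds the IR, adding an output format means adding a projection rather than another parser, which is where the extra flexibility (and the larger refactor) comes from.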

Open Questions

  • Does standard.site already define more than one content format today, or is this purely future-proofing?
  • Should the post author be able to override which format publishes (per-post meta), or is this a site-wide config?
  • Does the union field accept an array (multiple representations) or strictly one of N?
  • Where does language/locale or accessibility metadata live — per-parser or shared?

Out of Scope

  • Implementing a second parser. This issue is interface-design only; the next parser PR consumes whatever shape lands here.
  • Changing the Markpub parser's behavior. Markpub keeps producing at.markpub.markdown regardless of the chosen registry shape.

Context

Raised by @pfefferle in a comment on #9 ("we should maybe find a more generic way, to also transform into the other standard.site content formats").

Markpub (#9) is shipping with the current single-parser interface; this issue captures the follow-up so the design is settled before a second parser implementation forces our hand.
