Design: generalize Content_Parser to support multiple standard.site content formats

## Summary

The current `Content_Parser` interface and `atmosphere_content_parser` filter assume that a single parser produces a single content record per post. The `content` field of `site.standard.document` is a union that accepts multiple lexicon types. Long-term we likely want to support more than just `at.markpub.markdown` (e.g. HTML, plain text, or whatever standard.site adds next), and possibly emit more than one representation per post.

This issue tracks the design discussion, so that the interface is settled before a second parser implementation lands and forces a breaking change.

## Current Shape

- `includes/content-parser/interface-content-parser.php` — `parse()` returns a single `?array` shaped for one lexicon type. `get_type()` returns one NSID.
- `includes/class-atmosphere.php:50` — registers a single `Markpub` parser via `atmosphere_content_parser` filter.
- `includes/transformer/class-document.php:172` — fetches one parser, calls `parse()`, sets `content` to the single result.

Constraints this imposes:

1. Only one parser can win the filter (last-wins). Two third-party parsers can't coexist.
2. Document can't pick a parser based on the post (e.g. classic content → HTML parser; block content → Markpub).
3. There is no path to emit multiple representations, even if the union turns out to allow it (see Open Questions).

## Discussion Points

Sketching a few directions — not picking one yet:

**A. Registry of parsers by NSID**
Replace the single-parser filter with a registry. Parsers register against their `get_type()`. Document picks the preferred one per post (config or filter hook), or iterates and emits all that produce non-null output.
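
To make direction A concrete for discussion, here is a minimal sketch of such a registry. Every name in it (`Sketch_Content_Parser`, `Sketch_Parser_Registry`, `parse_preferred()`, `parse_all()`) is hypothetical, and the `parse( string $content )` signature is an assumption rather than the plugin's current interface. It only illustrates the two selection modes described above: pick one preferred parser per post, or collect every non-null output.

```php
<?php
// Hypothetical sketch; none of these names exist in the plugin today.

interface Sketch_Content_Parser {
	/** NSID of the lexicon type this parser emits. */
	public function get_type(): string;

	/** Content record for one post, or null if there is nothing to emit. */
	public function parse( string $content ): ?array;
}

final class Sketch_Parser_Registry {
	/** @var array<string, Sketch_Content_Parser> Parsers keyed by NSID. */
	private array $parsers = array();

	/** Register a parser under its own NSID; a later parser for the same NSID replaces it. */
	public function register( Sketch_Content_Parser $parser ): void {
		$this->parsers[ $parser->get_type() ] = $parser;
	}

	/** Return the output of the first preferred NSID whose parser yields non-null output. */
	public function parse_preferred( string $content, array $preferred_nsids ): ?array {
		foreach ( $preferred_nsids as $nsid ) {
			if ( ! isset( $this->parsers[ $nsid ] ) ) {
				continue;
			}
			$record = $this->parsers[ $nsid ]->parse( $content );
			if ( null !== $record ) {
				return $record;
			}
		}
		return null;
	}

	/** Run every registered parser and keep all non-null outputs, keyed by NSID. */
	public function parse_all( string $content ): array {
		$records = array();
		foreach ( $this->parsers as $nsid => $parser ) {
			$record = $parser->parse( $content );
			if ( null !== $record ) {
				$records[ $nsid ] = $record;
			}
		}
		return $records;
	}
}
```

With this shape, two third-party parsers coexist as long as they register different NSIDs. The preference list passed to `parse_preferred()` is where a site-wide config or per-post override would plug in.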

**B. Per-NSID filter pattern**
Keep the filter shape but namespace it: `atmosphere_content_parser_at_markpub_markdown`, `atmosphere_content_parser_at_html` (or similar). Document iterates across known NSIDs.
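
For direction B, the hook name would have to be derived from the NSID in a predictable way. The helper below is a sketch of one possible mapping. `sketch_parser_hook_name()` is a hypothetical name, and lowercasing is an assumption that could cause collisions if two NSIDs differ only by case.

```php
<?php
/**
 * Hypothetical helper: derive a per-NSID hook name, so that
 * at.markpub.markdown maps to atmosphere_content_parser_at_markpub_markdown.
 */
function sketch_parser_hook_name( string $nsid ): string {
	// Lowercase, then collapse every run of characters outside [a-z0-9] into one underscore.
	$suffix = preg_replace( '/[^a-z0-9]+/', '_', strtolower( $nsid ) );

	return 'atmosphere_content_parser_' . trim( $suffix, '_' );
}
```

Inside WordPress, `Document` could then call `apply_filters( sketch_parser_hook_name( $nsid ), null )` for each NSID it knows about and keep the parsers that come back non-null.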

**C. Parser produces canonical IR; Document projects**
Parser returns a structured intermediate (e.g. block tree) and a list of supported output types. Document negotiates and projects. Most flexible, biggest refactor.
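
A rough sketch of direction C, to show where the negotiation step would sit. The intermediate representation here is a flat list of blocks with `kind` and `text` keys, which is purely illustrative. `Sketch_Document_Projector` and `sketch_blocks_to_markdown()` are hypothetical names, and the returned array is a placeholder rather than the actual `at.markpub.markdown` record shape.

```php
<?php
// Hypothetical sketch; the IR and the record shape are placeholders.

final class Sketch_Document_Projector {
	/** @var array<string, callable> Projection callbacks keyed by output NSID. */
	private array $projections = array();

	/** Declare that the IR can be projected to the given NSID. */
	public function add_projection( string $nsid, callable $project ): void {
		$this->projections[ $nsid ] = $project;
	}

	/** Negotiate: use the first wanted NSID that has a projection, or null if none match. */
	public function project( array $blocks, array $wanted_nsids ): ?array {
		foreach ( $wanted_nsids as $nsid ) {
			if ( isset( $this->projections[ $nsid ] ) ) {
				return ( $this->projections[ $nsid ] )( $blocks );
			}
		}
		return null;
	}
}

/** Project the block list to Markdown text, one blank line between blocks. */
function sketch_blocks_to_markdown( array $blocks ): array {
	$parts = array();
	foreach ( $blocks as $block ) {
		$parts[] = 'heading' === $block['kind'] ? '# ' . $block['text'] : $block['text'];
	}

	return array(
		'nsid' => 'at.markpub.markdown',
		'text' => implode( "\n\n", $parts ),
	);
}
```

The parsing step produces the block list once, and each additional output format costs one more projection callback. That is the flexibility this direction buys in exchange for the larger refactor.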

## Open Questions

- Does standard.site already define more than one content format today, or is this purely future-proofing?
- Should the post author be able to override which format publishes (per-post meta), or is this a site-wide config?
- Does the union field accept an array (multiple representations) or strictly one of N?
- Where does language/locale or accessibility metadata live — per-parser or shared?

## Out of Scope

- Implementing a second parser. This issue is interface-design only; the next parser PR consumes whatever shape lands here.
- Changing the Markpub parser's behavior. Markpub keeps producing `at.markpub.markdown` regardless of the chosen registry shape.

## Context

Raised by @pfefferle in https://github.com/Automattic/wordpress-atmosphere/pull/9#issuecomment-4302689260 ("we should maybe find a more generic way, to also transform into the other standard.site content formats").

Markpub (#9) is shipping with the current single-parser interface; this issue captures the follow-up so the design is settled before a second parser implementation forces our hand.

