Add Markpub content parser by pfefferle · Pull Request #9 · Automattic/wordpress-atmosphere

pfefferle · 2026-03-23T17:04:00Z

Proposed changes:

Add Markpub parser (at.markpub.markdown) that walks Gutenberg blocks via parse_blocks() and converts each to GFM markdown.
Register Markpub as the default content parser via the atmosphere_content_parser filter.
Add tests for all supported block types and filter integration.

Stacked on #8. Merge that first.

Other information:

Have you written new tests for your changes, if applicable?

Testing instructions:

Verify ?atproto preview on a published post returns valid JSON with a content field containing at.markpub.markdown data.
Verify block content converts to markdown correctly (headings, links, images, lists, code blocks).
Verify classic editor content falls back to inline HTML-to-markdown conversion.
Verify atmosphere_content_parser filter with __return_null disables the content field.
Run npm run env-test — all tests pass.

Changelog entry

Automatically create a changelog entry from the details below.

Changelog Entry Details

Significance

Patch
Minor
Major

Type

Added - for new features
Changed - for changes in existing functionality
Deprecated - for soon-to-be removed features
Removed - for now removed features
Fixed - for any bug fixes
Security - in case of vulnerabilities

Message

Add rich content support for standard.site documents using the Markpub format.

…rule

Ship a Markpub parser (at.markpub.markdown) as the default content parser, registered via the atmosphere_content_parser filter. - Add Markpub class that walks Gutenberg blocks via parse_blocks() and converts each to GFM markdown. - Register Markpub as the default parser in Atmosphere::init(). - Add tests for all supported block types and filter integration.

- Use a separate counter for ordered lists so numbering stays sequential when empty items are skipped. - Percent-encode parentheses in link URLs to prevent breaking markdown link syntax.

kraftbj

Thanks for this — the parser structure is clean and the namespaced-prefix conventions look good throughout. Left a few inline notes on some edge cases I spotted while reading; none of them block the overall approach.

- Widen Content_Parser::parse return type to ?array; Markpub returns null when the trimmed markdown is empty so Document omits the content field instead of shipping an empty record. Document skips the atmosphere_document_content filter on null to keep the filter contract strictly array-typed. - Drop 'table' from advertised extensions; core/table currently falls through to fallback and is stripped by wp_strip_all_tags, so the advertised capability didn't match emitted output. - Reorder the figcaption extraction so the closing-tag strip runs before wp_strip_all_tags. Otherwise sibling content after </figcaption> (e.g. a trailing <p> inside the same <figure>) would bleed into the caption text. - Wrap preg_replace / preg_replace_callback calls in safe_replace helpers that fall back to the original input on PCRE failure and emit a wp_trigger_error. Without this, a pathological input could erase the post body silently when preg_replace returns null and that null cascades through subsequent string ops. - Fix the phpcs:ignore rule on unused $post parameters to reference the sniff that actually fires (Generic.CodeAnalysis.UnusedFunction Parameter alongside VariableAnalysis), and document why $post is declared but unused.

pfefferle · 2026-04-22T06:25:31Z

Thanks for the review! All five points addressed in 8991043:

L36 phpcs:ignore rule + $post comment — both Generic.CodeAnalysis.UnusedFunctionParameter and VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable actually fire on this parameter, so the ignore now lists both. Added a docblock note explaining that $post is required by the Content_Parser contract even though Markpub doesn't use it.

L65 advertised table extension — dropped from the extensions array. No core/table handler yet; the advertised capability should match what's actually emitted. Can re-add when a table transform lands.

L66 empty-markdown null sentinel — widened Content_Parser::parse return type to ?array. Markpub returns null when the trimmed markdown is empty; Document::get_content bails early on null and skips the atmosphere_document_content filter (keeping callbacks' array $content signature honest). New tests cover a post of only core/spacer blocks, literal empty post content, and the Document-side null-handling path (asserting the filter is not invoked).

L164 figcaption sibling-bleed — took your suggested order swap: preg_replace the surrounding figcaption markers first, then wp_strip_all_tags. Added a regression test with a trailing <p> inside the same <figure>; without the fix it fails with "Real captionShould not appear in caption" concatenated.

L367 preg_replace PCRE guard — added safe_replace / safe_replace_callback helpers that fall back to the unchanged subject and wp_trigger_error when preg_replace returns null. Applied to all inline conversions and the figcaption chain. Didn't TDD this one directly — reliably tripping PCRE limits in a unit test is awkward — but the new call sites are uniform and the helper is small enough to read at a glance.

paragraph(), fallback(), and the classic-content branch checked empty( $html ) on raw innerHTML. For content like <p> </p> the wrapper is non-empty, so these handlers fell through to inline_html_to_markdown() which trims to "". The empty string survived into the parts array and produced a spurious "\n\n" separator before the next block in mixed-content posts. Match the pattern heading() already uses: convert first, return null when the result is empty.

pfefferle · 2026-04-23T08:04:21Z

@kraftbj we should maybe find a more generic way, to also transform into the other standard.site content formats.

kraftbj · 2026-04-29T16:29:23Z

Agreed — the single-parser filter and one-NSID-per-implementation shape will get awkward as soon as a second standard.site format lands. Filed #45 to nail down the interface before that happens (registry vs per-NSID filter vs canonical-IR projection — open for discussion).

For this PR, let's move forward with the current shape and iterate. Markpub is the only parser today, so the registry refactor doesn't block anything user-facing, and #45 keeps the design conversation in its own thread.

pfefferle added the enhancement New feature label Mar 23, 2026

pfefferle self-assigned this Mar 23, 2026

pfefferle mentioned this pull request Mar 23, 2026

Add extensible content parser with Markpub support #3

Closed

6 tasks

github-actions Bot added [Feature] Content Parser Content parser for AT Protocol [Tests] Includes Tests PR includes test changes labels Mar 23, 2026

pfefferle force-pushed the add/markpub-parser branch 3 times, most recently from f3d5710 to 71a30f5 Compare March 23, 2026 17:15

Base automatically changed from add/content-parser-interface to trunk March 23, 2026 17:34

github-actions Bot added the [Feature] Transformer AT Protocol record transformers label Mar 23, 2026

pfefferle added 7 commits March 23, 2026 18:46

Add Document transformer tests for content parser and site fallback

f028c97

Move stub parser to its own file to satisfy PHPCS one-class-per-file …

c5576d1

…rule

Declare accepted args on content parser filter registration

6c6065d

Fix ordered list counter and markdown link parentheses

693f374

- Use a separate counter for ordered lists so numbering stays sequential when empty items are skipped. - Percent-encode parentheses in link URLs to prevent breaking markdown link syntax.

Fix PHPCS warnings for unused closure parameters

9f74de8

Fix Document test to account for default Markpub parser registration

920dbea

pfefferle force-pushed the add/markpub-parser branch from 6a2aa7d to 920dbea Compare March 23, 2026 17:46

pfefferle added 2 commits March 23, 2026 20:56

Merge branch 'trunk' into add/markpub-parser

c8be06e

Merge branch 'trunk' into add/markpub-parser

de454d6

kraftbj reviewed Apr 21, 2026

View reviewed changes

pfefferle requested a review from kraftbj April 22, 2026 06:35

kraftbj previously approved these changes Apr 22, 2026

View reviewed changes

kraftbj mentioned this pull request Apr 22, 2026

Expand Markpub parser test coverage for untested block handlers #30

Open

22 tasks

kraftbj dismissed their stale review via d90575d April 22, 2026 14:53

kraftbj mentioned this pull request Apr 22, 2026

Expand Markpub parser test coverage #31

Merged

2 tasks

kraftbj previously approved these changes Apr 22, 2026

View reviewed changes

pfefferle and others added 2 commits April 22, 2026 17:02

Merge branch 'trunk' into add/markpub-parser

1ea88b3

Expand Markpub parser test coverage (#31)

5610f98

pfefferle dismissed kraftbj’s stale review via 5610f98 April 23, 2026 08:03

Merge branch 'trunk' into add/markpub-parser

acfa7a5

kraftbj mentioned this pull request Apr 29, 2026

Design: generalize Content_Parser to support multiple standard.site content formats #45

Open

Merge branch 'trunk' into add/markpub-parser

05a623e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Markpub content parser#9

Add Markpub content parser#9
pfefferle wants to merge 15 commits intotrunkfrom
add/markpub-parser

pfefferle commented Mar 23, 2026

Uh oh!

kraftbj left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pfefferle commented Apr 22, 2026

Uh oh!

pfefferle commented Apr 23, 2026

Uh oh!

kraftbj commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pfefferle commented Mar 23, 2026

Proposed changes:

Other information:

Testing instructions:

Changelog entry

Significance

Type

Message

Uh oh!

kraftbj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pfefferle commented Apr 22, 2026

Uh oh!

pfefferle commented Apr 23, 2026

Uh oh!

kraftbj commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants