Conversation
f3d5710 to
71a30f5
Compare
Ship a Markpub parser (at.markpub.markdown) as the default content parser, registered via the atmosphere_content_parser filter. - Add Markpub class that walks Gutenberg blocks via parse_blocks() and converts each to GFM markdown. - Register Markpub as the default parser in Atmosphere::init(). - Add tests for all supported block types and filter integration.
- Use a separate counter for ordered lists so numbering stays sequential when empty items are skipped. - Percent-encode parentheses in link URLs to prevent breaking markdown link syntax.
6a2aa7d to
920dbea
Compare
kraftbj
left a comment
There was a problem hiding this comment.
Thanks for this — the parser structure is clean and the namespaced-prefix conventions look good throughout. Left a few inline notes on some edge cases I spotted while reading; none of them block the overall approach.
- Widen Content_Parser::parse return type to ?array; Markpub returns null when the trimmed markdown is empty so Document omits the content field instead of shipping an empty record. Document skips the atmosphere_document_content filter on null to keep the filter contract strictly array-typed. - Drop 'table' from advertised extensions; core/table currently falls through to fallback and is stripped by wp_strip_all_tags, so the advertised capability didn't match emitted output. - Reorder the figcaption extraction so the closing-tag strip runs before wp_strip_all_tags. Otherwise sibling content after </figcaption> (e.g. a trailing <p> inside the same <figure>) would bleed into the caption text. - Wrap preg_replace / preg_replace_callback calls in safe_replace helpers that fall back to the original input on PCRE failure and emit a wp_trigger_error. Without this, a pathological input could erase the post body silently when preg_replace returns null and that null cascades through subsequent string ops. - Fix the phpcs:ignore rule on unused $post parameters to reference the sniff that actually fires (Generic.CodeAnalysis.UnusedFunction Parameter alongside VariableAnalysis), and document why $post is declared but unused.
|
Thanks for the review! All five points addressed in 8991043: L36 L65 advertised L66 empty-markdown null sentinel — widened L164 figcaption sibling-bleed — took your suggested order swap: L367 preg_replace PCRE guard — added |
paragraph(), fallback(), and the classic-content branch checked empty( $html ) on raw innerHTML. For content like <p> </p> the wrapper is non-empty, so these handlers fell through to inline_html_to_markdown() which trims to "". The empty string survived into the parts array and produced a spurious "\n\n" separator before the next block in mixed-content posts. Match the pattern heading() already uses: convert first, return null when the result is empty.
|
@kraftbj we should maybe find a more generic way, to also transform into the other standard.site content formats. |
|
Agreed — the single-parser filter and one-NSID-per-implementation shape will get awkward as soon as a second standard.site format lands. Filed #45 to nail down the interface before that happens (registry vs per-NSID filter vs canonical-IR projection — open for discussion). For this PR, let's move forward with the current shape and iterate. Markpub is the only parser today, so the registry refactor doesn't block anything user-facing, and #45 keeps the design conversation in its own thread. |
Proposed changes:
at.markpub.markdown) that walks Gutenberg blocks viaparse_blocks()and converts each to GFM markdown.atmosphere_content_parserfilter.Stacked on #8. Merge that first.
Other information:
Testing instructions:
?atprotopreview on a published post returns valid JSON with acontentfield containingat.markpub.markdowndata.atmosphere_content_parserfilter with__return_nulldisables the content field.npm run env-test— all tests pass.Changelog entry
Changelog Entry Details
Significance
Type
Message
Add rich content support for standard.site documents using the Markpub format.