Skip to content

Long-form teaser-thread strategy + atmosphere_long_form_composition filter#34

Merged
pfefferle merged 26 commits intotrunkfrom
add/long-form-teaser-thread
Apr 30, 2026
Merged

Long-form teaser-thread strategy + atmosphere_long_form_composition filter#34
pfefferle merged 26 commits intotrunkfrom
add/long-form-teaser-thread

Conversation

@kraftbj
Copy link
Copy Markdown
Contributor

@kraftbj kraftbj commented Apr 24, 2026

Fixes DOTCOM-16886
See DOTCOM-16810 (parent epic)

Proposed changes:

Introduces long-form composition strategies for Bluesky posts, with the default (link-card) preserved so existing sites see no change.

  • New atmosphere_long_form_composition filter selects the long-form composition strategy. Accepts 'link-card' (default, unchanged behavior), 'truncate-link' (single post, body-as-text + permalink), or 'teaser-thread' (2-post thread: hook + CTA-with-link).
  • New atmosphere_teaser_thread_posts filter lets downstream override the default 2-post composition (e.g. expand to 3 posts, customize CTA copy).
  • New _atmosphere_bsky_thread_records post meta: ordered array of { uri, cid, tid } triples tracking every bsky post in a thread. Existing _atmosphere_bsky_uri / _atmosphere_bsky_tid / _atmosphere_bsky_cid continue to mirror the root post for backwards compatibility.
  • New _atmosphere_bsky_orphan_records post meta: populated only when a thread publish fails and the compensating-delete rollback also fails, so operators have a durable manifest of records alive on the PDS but no longer tracked locally.
  • New atmosphere_pre_apply_writes filter: short-circuits API::apply_writes when a non-null value is returned. Used by tests and external harnesses (FOSSE e2e) to mock/inspect the write batch.
  • Publisher::publish/update/delete reworked to handle multi-record threads via sequential writes with partial-meta writes and compensating-delete rollback on failure.
  • Atmosphere::on_before_delete now reads META_THREAD_RECORDS and schedules a delete covering every bsky tid, so force-deleting a thread-strategy post cleans up the whole thread instead of leaking the non-root replies.

Why sequential writes for thread publishes instead of one atomic applyWrites? Reply records need the parent's CID. CIDs are deterministic hashes of canonical DAG-CBOR — clients can compute them ahead of time (the TypeScript SDK does this to batch threads atomically) — but there is no DAG-CBOR encoder in our PHP dependencies yet, so we rely on the PDS to return the CID in its create response. We mitigate the partial-failure window with partial-meta writes after each successful create (so crash recovery can surface any orphans).

See FOSSE SDD for full design context, alternatives considered, and open questions.

Default behavior is unchanged. atmosphere_long_form_composition defaults to 'link-card'. Existing sites behave exactly as today unless they opt into a different strategy (or install a downstream like FOSSE that projects the value).

Publisher::update() semantics for thread-strategy posts:

  • Same record count → in-place update. When the new composition has the same number of records as the stored thread, every record (root + replies + doc) is rewritten via applyWrites#update in one atomic batch. TIDs and URIs are preserved, so external replies remain valid and Bluesky URLs stay stable. Each record gets a new CID; reply refs are built from the pre-update CIDs (self-consistent at write time, and clients treat any post-update CID mismatch as "parent was edited" rather than broken). `createdAt` is pinned to `post_date_gmt`.
  • Different record count or strategy change → delete + republish. When the shape changes (link-card ↔ teaser-thread, or thread length changes), every existing record is deleted and the post is republished fresh. In this path, external replies to the original thread are orphaned (their `reply.root` points at a deleted AT URI), and the replaced posts surface to followers with a fresh `createdAt`.

Other information:

  • Have you written new tests for your changes, if applicable?

Testing instructions:

  • `composer test` — unit tests cover composition branches, sequential-write happy path, rollback on mid-thread failure, rollback-of-rollback with orphan-manifest persistence, in-place thread update, strategy-change rewrite, delete covering the whole thread, and legacy single-record fallback.
  • Manual: install with `fosse-long-form-strategy` downstream (or call `add_filter( 'atmosphere_long_form_composition', fn() => 'teaser-thread' )`), publish a titled post of 500+ words, verify two `app.bsky.feed.post` records land with `reply` refs. Edit the post content and verify the two records are updated in place (same TIDs/URIs, new content).

Changelog entry

  • Automatically create a changelog entry from the details below.
Changelog Entry Details

Significance

  • Minor

Type

  • Added - for new features

Message

Long-form posts can now be published as a Bluesky thread. A new filter lets sites choose between a single link-card post (default, unchanged from today), a single post that combines body text with the permalink, or a 2-post teaser thread that leads with a hook and follows with a "continue reading" link. Post edits update every record in place when the shape of the post hasn't changed, so Bluesky URLs stay stable and people's replies don't get orphaned. If the shape of the post changes (for example, switching between a single post and a thread), the old records are replaced with fresh ones.

This branch introduces Atmosphere-side support for multi-post thread
composition for long-form posts. Tracks the FOSSE SDD at
https://github.com/Automattic/fosse/tree/trunk/sdd/long-form-bluesky-strategy.

Opening as draft to start the async review window. Implementation
lands in follow-up commits on this branch.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
kraftbj added 5 commits April 23, 2026 20:33
…osition methods

New on Atmosphere\Transformer\Post:

- `META_THREAD_RECORDS` constant for the per-WP-post ordered array
  of `{ uri, cid, tid }` triples that Publisher will populate once it
  knows about threads (next commit).
- `build_long_form_records()` as the long-form counterpart to
  `transform()`. Branches on `atmosphere_long_form_composition`
  (default `'link-card'`, i.e. today's behavior unchanged) to produce
  one record for `'link-card'` / `'truncate-link'`, or 2+ records for
  `'teaser-thread'`. `transform()` is unchanged — legacy callers keep
  today's contract.
- `build_teaser_thread()` (private) composes the default hook + CTA
  pair, filterable via `atmosphere_teaser_thread_posts`. Hook uses the
  post's excerpt when set, else the first ~280 chars of the body cut
  at a sentence boundary so the hook never ends mid-sentence right
  before the CTA.
- `build_truncate_link_text()` (private) for the single-post
  body-plus-permalink strategy.
- `truncate_to_budget()` (private) — budget-aware truncation that
  prefers a sentence break, falls back to word boundary, then hard-caps
  with an ellipsis. Uses mb_strlen, matching the existing
  `truncate_text()` helper.
- `is_short_form_post()` (public) wraps the existing
  `atmosphere_is_short_form_post` filter + private `is_short_form()`
  discriminator so Publisher can branch on short vs. long without
  duplicating the filter call inside its own code.
- Empty-body guard: `'teaser-thread'` and `'truncate-link'` silently
  degrade to `'link-card'` when the post has neither a 10+-char
  excerpt nor a 10+-char body. Logs a notice so ops can tell the
  fallback from an intentional link-card configuration.

New on Atmosphere\API:

- `atmosphere_pre_apply_writes` filter. Returning a non-null value
  short-circuits the PDS call, so PHPUnit and the FOSSE e2e harness
  can observe the write batch without a real DPoP round-trip
  (`pre_http_request` fires inside `wp_remote_request`, too late to
  mock past the DPoP-proof step in test environments without a real
  JWK).

Default behavior is unchanged: `atmosphere_long_form_composition`
defaults to `'link-card'`. Existing sites see exactly today's output
unless a downstream projects a different value.

Tests cover these additions in the next commit.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
Adds 16 unit tests under Atmosphere\Tests\Transformer\Test_Post:

- 6 tests for `truncate_to_budget()` via reflection — sentence-preferred
  cut, trailing close-punctuation, word-boundary fallback, word-only
  when `prefer_sentence` is false, and hard-cap for a single long
  token.
- 10 tests for `build_long_form_records()` — default (link-card)
  matches `transform()` output, `atmosphere_transform_bsky_post` fires
  per entry for threads, `truncate-link` branch, `teaser-thread`
  default 2-entry shape with sentence-cut hook and link-faceted CTA,
  word-boundary fallback when no sentence break exists, excerpt
  precedence, empty-body downgrade to link-card, 3-entry extension
  via `atmosphere_teaser_thread_posts`, `langs` consistency,
  per-record facet extraction, and unknown-strategy fallback.

Also adds a small observability hook on the downgrade path:
`atmosphere_long_form_strategy_downgraded` fires with the post,
requested strategy, and effective strategy, so ops and the
downgrade test can distinguish the fallback from an intentional
`'link-card'` configuration.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
…s and rollback

Reshapes Publisher around the idea that one WordPress post may map to
N Bluesky feed posts (a thread) plus exactly one site.standard.document.

**publish()**
- Branches on `Post::is_short_form_post()`. Short-form stays on today's
  single-record path via `transform()`.
- Long-form calls the new `build_long_form_records()`. 1-entry output
  (link-card / truncate-link) goes through `publish_single()`; N-entry
  output goes through `publish_thread()`.
- `publish_single()` preserves today's atomic applyWrites (bsky + doc)
  and additionally populates `META_THREAD_RECORDS` as a 1-entry array
  so the new key is always present after a successful publish.
- `publish_thread()` writes root + doc atomically, persists partial
  meta immediately, then writes each reply on its own `applyWrites`.
  Reply refs (`reply.root` / `reply.parent`) are filled from the
  already-persisted thread records since CIDs only arrive via the PDS
  response to the prior create — we can't assemble one atomic batch
  for the whole thread.

**Rollback**
On a mid-thread failure, `rollback_thread()` issues compensating
deletes tail-first + a doc delete, clears all meta, and returns the
original error. If rollback itself fails, the returned WP_Error wraps
both errors and carries `partial_records` so an operator can clean up
by hand.

**update()**
- Reads `META_THREAD_RECORDS` with legacy fallback to single-record meta.
- Single stored + single new composition → in-place applyWrites#update
  on bsky + doc, matching today's behavior.
- Any other shape (single ↔ thread, thread → thread with different
  lengths) → delete all existing records atomically, then republish.
  Documented side effect: followers see the thread updates as a fresh
  publish (new `createdAt`); external replies to the original thread
  are orphaned.

**delete()**
- Reads `META_THREAD_RECORDS` with legacy fallback; batches one
  applyWrites with N bsky deletes + 1 doc delete. Clears every meta
  key on success; leaves meta intact on failure so a retry can
  complete.

**Meta shape**
- New `META_THREAD_RECORDS` is the canonical list of ordered
  `{ uri, cid, tid }` triples.
- Legacy `META_URI` / `META_TID` / `META_CID` continue to mirror the
  root record for backwards compatibility and because
  `Document::transform()` still reads them to compose the bskyPostRef.
- `clear_all_record_meta()` helper keeps the six keys in sync on
  delete / rollback paths.

**Not touched**
- `delete_by_tids()` signature preserved for queued-cron backwards
  compatibility. `on_before_delete` in class-atmosphere.php still
  captures a single TID; force-deleting a thread-published post
  leaves thread tails on the PDS. Follow-up task will update the
  on_before_delete / cron hook to capture the full thread.
- Short-form behavior is unchanged — `transform()`'s `createdAt`
  (from `post_date_gmt`) still flows through `publish_single()`.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
Adds 10 unit tests covering the new long-form / thread flows, using the
`atmosphere_pre_apply_writes` filter as the test seam (DPoP-dependent
`pre_http_request` mocking can't reach the wp_remote_request layer in
the test environment's auth setup).

Publish path:
- link-card writes one atomic applyWrites with 2 creates and mirrors
  the root into `META_THREAD_RECORDS` as a 1-entry array.
- teaser-thread writes root + doc atomically, then the CTA reply as a
  second applyWrites with `reply.root` / `reply.parent` pointing at the
  root's {uri, cid}.
- partial meta is persisted between the root write and the reply
  write — asserted via a meta snapshot taken inside the capture
  filter when the reply call fires.
- happy-path final meta is an ordered 2-entry `META_THREAD_RECORDS`
  array, with legacy single-value keys mirroring the root.

Rollback path:
- second-write failure triggers compensating deletes (root bsky +
  doc), clears every meta key, and returns the original WP_Error.
- when the rollback applyWrites itself also fails, the returned
  `WP_Error` has code `atmosphere_thread_rollback_failed` and data
  containing `original_error`, `rollback_error`, and
  `partial_records`.

Update / delete paths:
- single stored + single new composition uses in-place
  `applyWrites#update` on both bsky and doc (one applyWrites call),
  matching today's link-card update behavior.
- strategy change (stored single → composed thread) issues a delete
  of the old record + doc, then publishes fresh as a thread.
- thread delete batches every bsky delete + the doc delete into one
  atomic applyWrites and clears all meta.
- legacy single-record meta (no `META_THREAD_RECORDS`) still deletes
  correctly via the fallback in `stored_thread_records()`.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
@pfefferle
Copy link
Copy Markdown
Member

Early review — read through the branch, a few things worth flagging before this gets much further. Ordered by impact.

1. Default behavior is changed: createdAt shifts from post date to publish-run time

The description says "Default behavior is unchanged — atmosphere_long_form_composition defaults to 'link-card', existing sites behave exactly as today." That isn't quite right.

  • Today: Post::transform() sets createdAt from $post->post_date_gmt (line 99).
  • This PR: for long-form (including the default link-card), the record goes through build_long_form_records()record_for_link_card(), which omits createdAt. publish_single then stamps \wp_date( 'c' ) — the current time.

Consequences:

  • A normal long-form publish shows up on bsky.app with a timestamp different from the WP publish date.
  • Backfill is much worse: every re-synced post gets stamped with the backfill-run timestamp, torpedoing the chronological ordering of the account's timeline.

Fix: propagate $post->post_date_gmt through build_long_form_records / record_for_link_card so the default strategy produces byte-identical output to transform(). Thread replies that are genuinely fresh posts can still use wp_date('c').

2. Thread-aware force-delete is missing on the WP side

The new comment in delete_by_tids says "force-deletion of thread-strategy posts is handled by the caller by reading META_THREAD_RECORDS pre-deletion and scheduling a separate cron per thread record." But the only caller — Atmosphere::on_before_delete — hasn't been updated:

$bsky_tid = \get_post_meta( $post_id, Transformer\Post::META_TID, true );
$doc_tid  = \get_post_meta( $post_id, Transformer\Document::META_TID, true );
\wp_schedule_single_event( \time(), 'atmosphere_delete_records', array( $bsky_tid, $doc_tid ) );

It only captures the root + doc. Force-deleting a thread-strategy post (empty-trash, wp post delete --force, before_delete_post from another plugin) leaks every non-root thread reply to Bluesky. This is a real data-consistency bug, not just a doc caveat.

3. Rollback's partial_records payload is effectively swallowed

rollback_thread returns a WP_Error with partial_records in the error data when rollback itself fails. But the cron closures that call Publisher::publish only read get_error_code() + get_error_message(). Nobody inspects $error->get_error_data()['partial_records'], so the orphan-record manifest is lost the moment the cron event returns. Either log the partial records alongside the error message, or persist them in post meta (e.g. _atmosphere_bsky_orphan_records) so an operator or recovery worker can find them later.

4. Thread update is delete-and-republish; the alternative isn't addressed

The PR calls out the follower-feed churn and orphaned-external-replies consequences and routes affected callers to "pin to a single-post strategy". But applyWrites#update on each existing thread record would do an in-place rewrite — same TIDs, same URIs, replies remain valid, createdAt preserved. The only cases where delete-and-republish is strictly required are record-count changes (thread length shrinks / grows) or strategy changes (thread ↔ single). For equal-length thread → thread, an N-record applyWrites#update batch would work.

Not a blocker, but worth a line in the description: "We considered in-place thread updates and rejected them because X", or the code could prefer them when the shape matches.

5. Minor

  • $bsky_transformer->get_rkey(); called as a side-effect-only statement at the top of publish() then again inside publish_single/publish_thread. Pick one.
  • store_results() and mirror_thread_records_meta() both write Post::META_URI / META_TID / META_CID for the root. Two code paths → easy to drift. Mirror-at-the-end only, or drop the mirroring from store_results.
  • record_for_thread_entry() fires atmosphere_transform_bsky_post once per thread entry. The existing filter contract was one-call-per-post. Listeners that accumulate state (rate-limit counters, lint checks) will double-count. Worth a heads-up note in the filter docblock, or a new filter name for thread entries.
  • The description says reply CIDs can only come from the PDS response, "a protocol constraint". It's actually a PHP-library constraint — CIDs are deterministic hashes of canonical DAG-CBOR and can be computed client-side (the TypeScript SDK does this to batch threads atomically). Just no DAG-CBOR encoder in the PHP codebase yet. Worth a one-line correction so the sequential-write choice reads as a pragmatic trade-off rather than a protocol requirement.

Things that look good

  • Keeping transform() byte-compatible for the short-form path and routing everything new through build_long_form_records() is the right shape.
  • The partial-meta-per-step pattern in publish_thread (so crash recovery has a pointer to the root before step 2 starts) is the right discipline.
  • The empty-body guard + atmosphere_long_form_strategy_downgraded observability hook is a nice touch.

kraftbj and others added 8 commits April 24, 2026 11:30
…-place thread update

Five changes informed by the early review feedback on #34:

1. Preserve the post's publish date as `createdAt` on every record
   representing the post itself (short-form transform(), link-card,
   truncate-link, and thread root). Previously the long-form paths
   fell through to `wp_date('c')`, which stamped the backfill-run
   time on re-synced posts and collapsed timeline ordering.

2. Thread-aware force-delete. `Atmosphere::on_before_delete` now reads
   `Post::META_THREAD_RECORDS` and schedules a delete covering every
   bsky tid plus the doc; `Publisher::delete_by_tids` accepts an array
   so one applyWrites covers the full thread. Legacy single-TID
   callers still work unchanged.

3. Persist rollback orphans to post meta. When a thread publish fails
   and the compensating rollback itself fails, the partial record list
   now lands in `Post::META_ORPHAN_RECORDS` (and error_log) instead of
   disappearing with the cron closure's WP_Error.

4. In-place applyWrites#update for thread posts when the record count
   is unchanged. Preserves TIDs/URIs and external replies; only CIDs
   change. Falls back to delete+republish when the thread shape
   changes (single ↔ thread, 2-post ↔ 3-post).

5. Cleanups: drop redundant get_rkey() side-effect calls from
   publish(); split legacy store_results() into a document-only
   helper so the root's single-record meta is mirrored from one
   place; document that `atmosphere_transform_bsky_post` fires per
   record (not per post) when the thread strategy is active.
The assertion `assertDoesNotMatch('~\S$~', $hook)` required the hook to
end with whitespace, which contradicts
`test_truncate_to_budget_falls_back_to_word_boundary_when_no_sentence`
(expects 'The quick brown fox' with no trailing space). The test name
and comment both say "not mid-word" — that's the actual intent. Since
the corpus is built of 8-char words, a partial trailing word is any
`\s\S{1,7}$` tail.
Consolidated findings from a 3-agent parallel review pass:

- Unify reply `createdAt` across publish_thread and update_thread_in_place.
  Both now stamp every thread record with `post_date_gmt`, so backfill,
  fresh publish, and in-place update all agree on the post's publish
  date. Previously the publish path stamped replies with `wp_date('c')`,
  which would stamp replies with the backfill-run time.

- Validate `atmosphere_pre_apply_writes` return. A filter that returns
  a scalar or object would fatal on the `array|\WP_Error` return type.
  Now normalized to a `WP_Error` with a clear code.

- Harden `atmosphere_teaser_thread_posts`. Non-iterable or non-string
  entries fall back to the default pair; the list is capped at 5 to
  bound PDS rate-limit blast radius if a downstream returns something
  unreasonable.

- Carry post tags on the thread root (index 0), matching the
  link-card and short-form record shape. Replies remain conversational
  and omit tags.

- Index reply URIs for reaction sync. New `META_URI_INDEX` multi-row
  meta key mirrors every thread record URI; `find_post_by_bsky_uri`
  in Reaction_Sync now falls back to the index so a like or repost on
  a reply post resolves back to the originating WP post.

- Cap `META_ORPHAN_RECORDS` at 10 entries so a crash-looping rollback
  can't grow the meta row past MySQL limits.

- Align `has_composable_body` with `build_teaser_thread`: both now
  require an excerpt of ≥ 10 chars before preferring it over the body.

- Memoize `render_post_content_plain()` per instance so thread
  composition doesn't re-run `the_content` multiple times per pass.

- Tests: `delete_by_tids` with array/string/empty; `on_before_delete`
  thread-aware vs legacy vs no-meta scheduling; `META_URI_INDEX` is
  populated; `atmosphere_pre_apply_writes` malformed return surfaces
  as WP_Error; stale docblock references updated.
…sponse

Previously publish_thread returned a bare WP_Error on this path,
leaving the root + doc records live on the PDS while the local TID
was still stored from get_rkey(). A retry reused the same TID and
the PDS rejected the create, stuck state. Route through
rollback_thread so retry starts from a clean slate; on rollback
failure the orphan manifest surfaces as for any other thread
rollback.
…overy

Two real findings from an adversarial pass that Pass 1/Pass 2 missed:

1. rewrite_thread deleted remote records before proving the replacement
   publish would succeed. If the republish failed (transient PDS error,
   rate limit, malformed payload), the old thread was gone from Bluesky,
   the new one was absent, and local meta was already cleared. The
   deleted records themselves are unrecoverable, but we now persist a
   rewrite-phase manifest to META_ORPHAN_RECORDS so operators have an
   audit trail, and retry of update() self-heals via publish() (stored
   records are cleared, so new TIDs are generated).

2. update()'s "missing document URI" repair branch called publish()
   directly, which reuses the existing bsky TID via get_rkey() — the
   PDS then rejected the create as "already exists." Self-heal was
   broken. Now routes through rewrite_thread with an empty doc_tid,
   which deletes the orphan bsky records and republishes fresh.

rewrite_thread now also guards against empty tids and empty doc_tid
so it can handle the partial-state recovery shape.

Tests cover both paths.
…ordpress-atmosphere into add/long-form-teaser-thread
A TID without a URI means Transformer::get_rkey() reserved the rkey
locally but the create never landed on the PDS — most commonly after
a failed publish or a rewrite_thread republish that clears meta
mid-step. stored_thread_records() previously returned such a bare TID
as a recoverable record, which caused the next update() to attempt a
delete of a non-existent PDS record and the retry to loop.

Now the legacy fallback requires a non-empty URI, matching how
publish() actually persists state. The reserved ghost TID is harmless:
the next publish() reuses it (get_rkey returns the stored value) and
the PDS accepts the create normally.
@kraftbj
Copy link
Copy Markdown
Contributor Author

kraftbj commented Apr 24, 2026

Thanks for the careful read — all 5 addressed in 72ac474 (with extra review passes landing in follow-up commits). Walking through in the same order:

1. createdAt drift. Real regression. Fixed by centralising the fallback in Publisher::publish_single / update_single / publish_thread's root to stamp to_iso8601( $post->post_date_gmt ) instead of wp_date( 'c' ). Short-form transform() already set it; the long-form paths now match, so backfill and normal publishes both land on the WP publish date. Later passes also unified thread replies on the same timestamp (they were stamping wp_date('c') in the publish path while the update path used post_date_gmt — in-place edits would've silently rewound reply createdAt after any edit).

2. Thread-aware force-delete. Real bug. Atmosphere::on_before_delete now reads Post::META_THREAD_RECORDS pre-deletion and schedules a single atmosphere_delete_records cron with the full tid array. Publisher::delete_by_tids accepts string|array, so one applyWrites covers the whole thread (root + replies + doc). Legacy single-tid callers keep working; the doc-comment on delete_by_tids now reflects the new contract rather than apologizing for not matching it.

3. Orphan manifest disappearing. New post meta Post::META_ORPHAN_RECORDSrollback_thread writes the partial state there when the compensating delete fails, in addition to error_log and the WP_Error data. Value is an append-only list (capped at 10 entries to bound meta_value growth) of { stamp, bsky_records, doc_rkey, original_error, rollback_error } entries so repeated failures on the same post all survive for manual cleanup. Test coverage added.

4. Delete-and-republish for thread updates. Implemented in-place applyWrites#update — when count($stored) === count($new_records) and both > 1, update_thread_in_place rewrites every record + doc in one atomic batch. TIDs and URIs are preserved, external replies stay valid, createdAt is pinned to post_date_gmt. Reply refs are built from the pre-update CIDs (structurally self-consistent at write time; clients treat any post-update CID mismatch as "parent was edited"). After the call, META_THREAD_RECORDS is refreshed with the new CIDs from the response so subsequent updates chain from current state. rewrite_thread (delete + republish) is now only taken on a true shape change — strategy swap or different record count — and on failure now persists a rewrite-phase manifest to META_ORPHAN_RECORDS so operators have an audit trail of what was deleted.

5. Minors.

  • Dropped the side-effect-only get_rkey() calls at the top of publish()publish_single/publish_thread generate lazily when building writes, so the pre-call was just noise.
  • store_results() renamed to store_document_meta() and reduced to owning doc meta only. mirror_thread_records_meta() is the single writer for Post::META_URI / META_TID / META_CID.
  • The atmosphere_transform_bsky_post docblock now calls out that the filter fires per-record under teaser-thread, with a pointer for accumulating listeners on how to branch per-entry.
  • You're right about the CID constraint — it's a PHP-ecosystem limitation (no DAG-CBOR encoder in our deps yet), not a protocol one. Updated the PR description.

Also fixed 4 failing tests on the way: one real truncate_to_budget bug (the preg_replace word-boundary fallback fired even when no whitespace existed, bypassing the hard-cap ellipsis), and three tests that didn't set 'post_excerpt' => '' and got bitten by the WP test factory's auto-generated excerpt.

Follow-up review passes surfaced a few more things that landed as bonus fixes: reaction-sync now resolves reply URIs via a new META_URI_INDEX (previously a like/repost on a reply post wouldn't resolve back to the WP post); atmosphere_pre_apply_writes guards against malformed filter returns that would otherwise fatal on the typed return; atmosphere_teaser_thread_posts is sanitised and capped; and the legacy stored_thread_records fallback now requires a URI so a reserved-but-unpublished TID doesn't loop retry through non-existent PDS records.

CI is green across the full phpunit matrix. Flipping to Ready for Review.

@kraftbj kraftbj marked this pull request as ready for review April 24, 2026 20:50
@kraftbj kraftbj requested review from Copilot and pfefferle April 24, 2026 21:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds opt-in long-form composition strategies for publishing WordPress posts to Bluesky, including a new “teaser thread” mode that publishes multi-post threads while preserving existing default behavior (link-card) for current installs.

Changes:

  • Added long-form composition strategy selection via filters and implemented thread-capable record composition in the post transformer.
  • Reworked publishing/update/delete flows to support multi-record threads with sequential writes, partial-meta persistence, and compensating-delete rollback on failure.
  • Extended reaction sync and before-delete cleanup to resolve and delete non-root thread replies via a per-URI index and thread record manifest.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
includes/transformer/class-post.php Adds long-form composition strategies, thread record builders, and new thread/orphan meta key constants.
includes/transformer/class-base.php Memoizes plain-text rendering to avoid repeated the_content filter work during composition.
includes/class-publisher.php Implements thread-aware publish/update/delete, thread meta mirroring/indexing, rollback, and rewrite flows.
includes/class-api.php Adds atmosphere_pre_apply_writes short-circuit filter to mock/inspect applyWrites in tests/harnesses.
includes/class-atmosphere.php Updates before-delete scheduling to delete all thread TIDs (root + replies) in one cron event.
includes/class-reaction-sync.php Resolves WP posts by reply AT-URI using the new URI index meta.
tests/phpunit/tests/transformer/class-test-post.php Adds unit coverage for truncation logic and long-form composition strategy branches.
tests/phpunit/tests/class-test-status-change.php Adds tests ensuring before-delete schedules deletion for all thread TIDs (and legacy single-TID behavior).
tests/phpunit/tests/class-test-publisher.php Adds extensive tests for thread publish/update/delete, rollback, orphan manifests, and pre-apply short-circuiting.
.github/changelog/long-form-teaser-thread Changelog entry describing the new long-form thread feature and strategy options.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread includes/class-api.php
Comment thread includes/transformer/class-post.php Outdated
@pfefferle pfefferle requested a review from Copilot April 28, 2026 16:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/phpunit/tests/class-test-publisher.php Outdated
Comment thread includes/class-publisher.php Outdated
Comment thread includes/class-publisher.php
Copy link
Copy Markdown
Member

@pfefferle pfefferle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed correctness of the rollback paths, update() shape detection, on_before_delete thread cleanup, CID handling for replies, the empty-body / long-permalink downgrades, and back-compat for legacy single-record posts.

Critical

Publisher::publish_thread — mid-thread update_document_bsky_ref failure leaves threads half-published with no rollbackincludes/class-publisher.php:222-225

After the root+doc atomic write succeeds and meta is persisted, a WP_Error from update_document_bsky_ref returns early. The root bsky record and doc record are live on the PDS, but reply writes are silently abandoned and no rollback or orphan-manifest entry is written. Inconsistent with how every other partial-failure path is handled.

Either treat this call as best-effort (continue to replies, log the failure) or call rollback_thread on failure for symmetry.

Major

atmosphere_teaser_thread_posts — no minimum-entry guardincludes/transformer/class-post.php:628-649
Filter docblock promises "at least 2 entries", but a filter returning a single string yields a 1-element array. Publisher::publish() then routes to publish_single (count===1), so the CTA reply is silently dropped without notice. Add a minimum-2 guard with _doing_it_wrong() before the filter becomes public API.

Orphan manifest is write-onlyincludes/class-publisher.php:380-419, 748-785
_atmosphere_bsky_orphan_records is never read, surfaced in admin, or processed by cron. Operators would have to query post meta directly to discover orphans. Either ship a Site Health check / admin notice, or track as a follow-up issue before merge — currently neither exists.

update() shape-change detection uses count comparison onlyincludes/class-publisher.php:485
A truncate-link (count=1) → teaser-thread that empty-body-downgrades to link-card (count=1) takes the in-place path. Result is structurally correct, but the silent semantic strategy change is worth a comment or a guard.

Minor

  • build_long_form_records long-permalink downgrade is silentincludes/transformer/class-post.php:434-438. Empty-body guard fires atmosphere_long_form_strategy_downgraded; the long-permalink guard does not. Inconsistent observability.
  • rollback_thread uses single batch applyWritesincludes/class-publisher.php:320-335. A transient PDS error means the whole 6-write batch fails and triggers the orphan manifest, even when 4 of 6 deletes would individually have succeeded. Trade-off, not a bug.
  • test_update_sends_update_writes has a conditional assertion that no-ops when auth blockstests/phpunit/tests/class-test-publisher.php:313-323. Now that atmosphere_pre_apply_writes exists, the test should use register_capture() and assert unconditionally.
  • Inconsistent transformer reuseincludes/class-publisher.php:986 instantiates a new Document($post) instead of reusing the local $doc_transformer. Harmless but noisy.

Positive

  • atmosphere_pre_apply_writes short-circuit + validate_apply_writes_response is a clean test-mocking surface. Correctly distinguishes create/update (require uri+cid) from delete.
  • stored_thread_records correctly handles bare-TID-without-URI fallback (failed prior publish), with a clear inline comment.
  • 5-entry cap on build_teaser_thread is a sensible blast-radius limiter.
  • delete_by_tids accepts string|array first arg with fallbacks — forward/backward compatible with cron events queued before the signature change.
  • Reply refs use pre-update CIDs in update_thread_in_place — architecturally sound, since post-update CID drift is editorial state clients handle gracefully.
  • Test coverage is comprehensive for documented scenarios: rollback happy path, rollback-of-rollback with orphan persistence, in-place updates (single + multi), shape-change rewrite, full-thread delete, legacy single-record path, delete_by_tids back-compat.

The half-publish bug is the only true blocker. Teaser-thread minimum-entry guard should also land before the filter is public. Orphan manifest gap is acceptable as a tracked follow-up if you want to ship soon.

kraftbj added 4 commits April 28, 2026 12:35
Pulls in auto-port wp-env startup (#40) so tests can run alongside other wp-env instances on the default ports.
- publish_thread: doc-ref putRecord failure between root+doc and
  replies is now best-effort (log + continue). Returning early left
  META_THREAD_RECORDS at length=1, and a follow-up edit treated that
  as a shape change and rewrote the already-published root URI/TID,
  invalidating likes/reposts/external replies pointing at it.
- build_long_form_records: fire atmosphere_long_form_strategy_downgraded
  when the long-permalink path falls back to link-card, matching the
  empty-body downgrade so observability stays consistent.
- publish_single docblock: createdAt is set by transform() and the
  long-form record builders; Publisher only backfills when missing.
- Test coverage: publish_thread continues to write replies (and meta
  reflects the full thread) when the doc-ref putRecord fails.
The filter docblock promises a multi-entry array, but a filter
returning a single string yielded a 1-element array that quietly
routed to publish_single() and dropped the CTA. Enforce the contract
with _doing_it_wrong() and fall back to the default hook + CTA pair
when fewer than 2 valid string entries come back, so the contract is
visible to filter authors before this becomes public API.
…ests

test_publish_surfaces_document_ref_update_failure added a
pre_http_request filter and never removed it; set_up()'s own filter
already exists in the suite, so the leak could affect later tests in
the class. Wrap in try/finally with a referenced callback so the
filter is removed even if assertions throw.

test_update_sends_update_writes was using pre_http_request to capture
the applyWrites payload, with conditional assertions that no-op'd when
the auth/DPoP layer blocked the request before reaching the filter.
Switch to register_capture() (atmosphere_pre_apply_writes) so the test
runs deterministically and assertions are unconditional.
- update(): add a comment that count-only shape detection means a
  truncate-link → teaser-thread (which empty-body-downgrades to
  link-card) takes the in-place path; output is structurally correct
  because both end up as a single record.
- update_document_bsky_ref(): drop the redundant `new Document( $post )`
  and reuse the existing transformer; transform() reads URI/CID fresh
  from post meta on every call.
- persist_orphan_records(): TODO comment pointing at issue 44 (surface
  the manifest in admin / Site Health / WP-CLI).
@kraftbj
Copy link
Copy Markdown
Contributor Author

kraftbj commented Apr 28, 2026

@pfefferle thanks for the thorough re-read. Walking through everything in the review body:

Critical

Doc-ref half-publish (publish_thread). Fixed in b92fd1c. The doc-ref putRecord is now best-effort — failure logs and continues to the reply writes, so META_THREAD_RECORDS ends up at the full thread length and a subsequent edit takes the in-place update path instead of rewriting the already-published root. Test coverage in test_publish_thread_continues_when_doc_ref_update_fails.

Major

  1. atmosphere_teaser_thread_posts minimum-entry guard. Added in c464fda. A filter return with fewer than 2 valid string entries triggers _doing_it_wrong( 'atmosphere_teaser_thread_posts', ..., 'unreleased' ) and falls back to the default hook + CTA pair. Test in test_build_long_form_records_teaser_thread_filter_under_two_falls_back.

  2. Orphan manifest write-only. Tracked as issue 44 — covers Site Health surfacing, admin notice, WP-CLI, and a retry-with-backoff cron. Added a TODO referencing it from persist_orphan_records() in 14f21d2.

  3. update() shape-change count-only detection. Added a clarifying comment in 14f21d2 — the output is structurally correct because a truncate-link (count=1) → teaser-thread that empty-body-downgrades to link-card (count=1) both produce a single-record post; URIs/TIDs reuse on the bsky side. No behavior change.

Minor

  • Long-permalink downgrade observability. Fixed in b92fd1catmosphere_long_form_strategy_downgraded now fires from both branches in build_long_form_records().
  • rollback_thread single batch. Acknowledged as a trade-off; no change.
  • test_update_sends_update_writes conditional assertion. Fixed in c6a86e8 — now uses register_capture() (atmosphere_pre_apply_writes) and asserts unconditionally.
  • Inconsistent transformer reuse. Fixed in 14f21d2update_document_bsky_ref() reuses the existing $doc_transformer (its transform() reads URI/CID fresh from post meta on every call).

kraftbj added 4 commits April 28, 2026 13:22
Adds coverage for the new atmosphere_long_form_strategy_downgraded
action fire from the long-permalink branch in build_long_form_records.
Mirrors the existing empty-body downgrade test.
The previous guard checked only the bare permalink (`mb_strlen >= 300`),
but the actual CTA text is `Continue reading: <permalink>`. A permalink
of e.g. 290 chars passed the guard yet produced a composed CTA of
~309 chars; truncate_to_budget then word-cut the over-budget URL fragment
off and shipped a thread whose final post said `Continue reading:`
with no link.

Add `requires_link_card_for_teaser_thread()` that builds the localized
CTA via a new `teaser_thread_cta_text()` helper and tests its mb_strlen.
build_teaser_thread reuses the same helper so the guard and the actual
composition operate on identical strings (locale changes can't desync
them). truncate-link keeps `requires_link_card_for_long_permalink()` —
its overflow equation is different.
The doc-ref `putRecord` is best-effort within `publish_thread`, but
swallowing the WP_Error and only writing to `error_log` left the gap
invisible to backfill, the admin UI, and any future Site Health
surface. Backfill callers would mark the post synced and never retry
the missing cross-reference.

Add Post::META_DOC_REF_PENDING (`{ stamp, code, message }`) and a
small `record_doc_ref_pending()` helper that publish_thread invokes
when the follow-up `putRecord` fails. The next successful
`update_document_bsky_ref` clears the marker — typical recovery path
is the user re-saving the post, which routes through update_thread_in_place
or update_single, both of which already end with `update_document_bsky_ref`.

`clear_all_record_meta` clears the marker too so rollback / delete
paths leave no stale state.

TODO comment points at issue 44 to broaden its surfacing remit beyond
META_ORPHAN_RECORDS.
PDS create errors are ambiguous: the server may have committed the
write even when WP got back a transport-level WP_Error (response
timeout, connection drop after server-side commit). Previously the
rollback only iterated already-stored thread_records, so the
just-attempted reply could be a live record on the PDS that
META_THREAD_RECORDS never saw and rollback never targeted.

The reply rkey is generated locally before the create call, so it is
known either way. Include a synthetic triple (uri/cid/tid with empty
cid) in the rollback list when the create returns WP_Error. Two
outcomes:

- If the PDS never committed: the compensating delete is a no-op, or
  surfaces in the orphan manifest if the PDS treats it as an error.
- If the PDS committed: rollback cleans the live record up.

Either way we no longer leave a Bluesky reply that the WP side can't
discover or delete.

Existing rollback tests updated for the new 3-write rollback shape
(reply + root + doc) and the partial_records count of 2.
@kraftbj
Copy link
Copy Markdown
Contributor Author

kraftbj commented Apr 28, 2026

Codex adversarial pass surfaced three more findings; all addressed:

Findings + fixes

  1. [HIGH] Ambiguous reply-write failures could leave untracked live replies. publish_thread's reply-create error path only rolled back already-stored thread records, so a transport-level WP_Error after a server-side commit (response timeout, connection drop) left the just-attempted reply alive on Bluesky with no WP-side handle on it. Fixed in 880af50: the locally-generated $reply_rkey is now packed into a synthetic triple and appended to the rollback batch. If the PDS never committed, the compensating delete is a no-op (or surfaces in the orphan manifest); if it did, rollback cleans it up. Existing rollback tests updated for the new shape.

  2. [MEDIUM] Best-effort doc-ref failures were silent permanent partial publishes. The b92fd1c best-effort change made error_log the only signal — backfill callers would mark the post synced and never retry the missing bskyPostRef. Fixed in 694f3c5: persist Post::META_DOC_REF_PENDING ({ stamp, code, message }) when the doc-ref putRecord fails, clear it on the next successful update_document_bsky_ref. Re-saving the post is the natural recovery path; clear_all_record_meta clears the marker on rollback / delete. TODO comment broadens issue 44's surfacing scope to include this.

  3. [MEDIUM] Long-permalink guard missed the CTA prefix overhead. requires_link_card_for_long_permalink() only checked the bare permalink (≥ 300). The actual CTA is Continue reading: <permalink>, so a permalink ~290 chars passed the guard and then word-cut the over-budget URL fragment off, shipping a thread whose final post said Continue reading: with no link. Fixed in 8374793: new requires_link_card_for_teaser_thread() builds the localized CTA via a teaser_thread_cta_text() helper and compares its mb_strlen against 300; build_teaser_thread() reuses the same helper so the guard and the actual composition can't desync across locales. truncate-link still uses the bare-permalink check (its overflow equation differs).

CI green across PHP 8.2/8.3/8.5 × WP 6.2/latest/trunk.

@kraftbj kraftbj requested a review from pfefferle April 28, 2026 19:02
@github-actions github-actions Bot added the [Feature] WP Admin Admin UI and settings label Apr 29, 2026
@pfefferle
Copy link
Copy Markdown
Member

I hope I will find some time to review this biggie tomorrow!

Copy link
Copy Markdown
Member

@pfefferle pfefferle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works like a charm! Do you plan to have a "real" thread with the full content some day?

@pfefferle pfefferle merged commit ee1df08 into trunk Apr 30, 2026
8 checks passed
@pfefferle pfefferle deleted the add/long-form-teaser-thread branch April 30, 2026 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Feature] API PDS API client [Feature] Publisher Publishing to AT Protocol [Feature] Transformer AT Protocol record transformers [Feature] WP Admin Admin UI and settings [Tests] Includes Tests PR includes test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants