Remove redacted_because from internal unsigned.#19581
Conversation
c2d73b6 to
7d90b4c
Compare
7d90b4c to
b99fb4e
Compare
There was a problem hiding this comment.
Pull request overview
This PR simplifies redaction handling by removing redacted_because from events’ internal unsigned data (to keep unsigned “simple” for future Rust porting), while adding a DB-backed mechanism (redactions.recheck) to track whether a redaction needs sender-domain revalidation at read time.
Changes:
- Add
recheckcolumn toredactions, plus a background update to backfill it fromevent_json.internal_metadata. - Split redactions into confirmed vs unconfirmed at read time to avoid fetching confirmed redaction events from the DB.
- Move
redacted_becauseconstruction intoEventClientSerializer(with batch redaction-event fetching inserialize_events) and update callers/tests accordingly.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/storage/test_redaction.py | Updates storage-layer assertions to use internal_metadata.redacted_by instead of unsigned["redacted_because"]. |
| tests/storage/test_events_bg_updates.py | Adds coverage for the new redactions_recheck background update backfill behavior. |
| tests/rest/client/test_rooms.py | Adjusts tests to check internal_metadata.redacted_by for stored events. |
| tests/replication/storage/test_events.py | Updates replication tests to set internal_metadata.redacted_by rather than unsigned redaction fields. |
| tests/events/test_utils.py | Switches tests to use EventClientSerializer (async) rather than the standalone serializer. |
| synapse/types/storage/init.py | Adds background update name constant for redactions_recheck. |
| synapse/synapse_rust/events.pyi | Exposes EventInternalMetadata.redacted_by in the Python typing stub. |
| synapse/storage/schema/main/delta/94/01_redactions_recheck.sql | Adds recheck column to redactions table. |
| synapse/storage/schema/main/delta/94/02_redactions_recheck_bg_update.sql | Schedules the redactions_recheck background update. |
| synapse/storage/schema/init.py | Bumps schema version to 94 and documents the change. |
| synapse/storage/databases/main/events_worker.py | Loads redactions.recheck to classify redactions into confirmed/unconfirmed and updates redaction application logic. |
| synapse/storage/databases/main/events_bg_updates.py | Implements the redactions_recheck backfill background update. |
| synapse/storage/databases/main/events.py | Persists redactions.recheck from need_to_check_redaction() when storing redactions. |
| synapse/rest/client/room.py | Ensures client-facing serialization uses EventClientSerializer. |
| synapse/rest/admin/events.py | Ensures admin event serialization uses EventClientSerializer. |
| synapse/events/utils.py | Makes the old serializer private, and populates redacted_because via EventClientSerializer (with batch-fetch support). |
| synapse/appservice/api.py | Switches appservice serialization to serialize_events. |
| synapse/_scripts/synapse_port_db.py | Adds recheck to the list of boolean columns for DB porting. |
| rust/src/events/internal_metadata.rs | Adds redacted_by as a non-persisted internal metadata field in the Rust implementation. |
| changelog.d/19581.misc | Changelog entry for removing redacted_because from internal unsigned. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
anoadragon453
left a comment
There was a problem hiding this comment.
A few small things; nothing too major. Nice that we could get a logic speedup from this work!
| Changes in SCHEMA_VERSION = 94 | ||
| - Add `recheck` column (boolean, default true) to the `redactions` table. |
There was a problem hiding this comment.
Note that this will conflict with #19556, which adds the same new schema version with 01_*.sql files.
Though this PR will likely merge first.
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| -- <https://www.gnu.org/licenses/agpl-3.0.html>. | ||
|
|
||
|
|
||
| ALTER TABLE redactions ADD COLUMN recheck boolean NOT NULL DEFAULT true; |
There was a problem hiding this comment.
Add
recheckcolumn toredactionstableA new boolean
recheckcolumn (default true) is added to theredactionstable. This captures whether a redaction needs its sender domain checked at read time — required for room v3+ where redactions are accepted speculatively and later validated. When persisting a new redaction,recheckis set directly fromevent.internal_metadata.need_to_check_redaction().It's fine if initially we recheck all redactions, as it only results in a little more CPU overhead (as we always pull out the redaction event regardless).
-- PR description
Feels like this context isn't explained anywhere in the code
This is a simplification so that
unsignedonly includes "simple" values, to make it easier to port to Rust.Reviewable commit-by-commit
Summary:
Add
recheckcolumn toredactionstableA new boolean
recheckcolumn (default true) is added to theredactionstable. This captures whether a redaction needs its sender domain checked at read time — required for room v3+ where redactions are accepted speculatively and later validated. When persisting a new redaction,recheckis set directly fromevent.internal_metadata.need_to_check_redaction().It's fine if initially we recheck all redactions, as it only results in a little more CPU overhead (as we always pull out the redaction event regardless).
Backfill
recheckvia background updateA background update (
redactions_recheck) backfills the new column for existing rows by readingrecheck_redactionfrom each event'sinternal_metadataJSON. This avoids loading full event objects by readingevent_jsondirectly via a SQL JOIN.Don't fetch confirmed redaction events from the DB
Previously, when loading events, Synapse recursively fetched all redaction events regardless of whether they needed domain rechecking. Now
_fetch_event_rowsreads therecheckcolumn and splits redactions into two lists:unconfirmed_redactions— need fetching and domain validationconfirmed_redactions— already validated, applied directly without fetching the eventThis avoids unnecessary DB reads for the common case of already-confirmed redactions.
Move
redacted_becausepopulation toEventClientSerializerPreviously,
redacted_because(the full redaction event object) was stored inevent.unsignedat DB fetch time, coupling storage-layer code to client serialization concerns. This is removed from_maybe_redact_event_rowand moved intoEventClientSerializer.serialize_event, which fetches the redaction event on demand. The storage layer now only setsunsigned["redacted_by"](the redaction event ID).Always use
EventClientSerializerThe standalone
serialize_eventfunction was made private (_serialize_event). All external callers —rest/client/room.py,rest/admin/events.py, appservice/api.py, andtests— were updated to useEventClientSerializer.serialize_event/serialize_events, ensuringredacted_becauseis always populated correctly via the serializer.Batch-fetch redaction events in
serialize_eventsserialize_eventsnow collects allredacted_byIDs from the event batch upfront and fetches them in a singleget_eventscall, passing the result as aredaction_mapto eachserialize_eventcall. This reduces N individual DB round-trips to one when serializing a batch of events that includes redacted events.