Skip to content

Remove redacted_because from internal unsigned.#19581

Merged
erikjohnston merged 21 commits intodevelopfrom
erikj/simplify_redaction
Mar 26, 2026
Merged

Remove redacted_because from internal unsigned.#19581
erikjohnston merged 21 commits intodevelopfrom
erikj/simplify_redaction

Conversation

@erikjohnston
Copy link
Copy Markdown
Member

@erikjohnston erikjohnston commented Mar 17, 2026

This is a simplification so that unsigned only includes "simple" values, to make it easier to port to Rust.

Reviewable commit-by-commit

Summary:

  1. Add recheck column to redactions table

    A new boolean recheck column (default true) is added to the redactions table. This captures whether a redaction needs its sender domain checked at read time — required for room v3+ where redactions are accepted speculatively and later validated. When persisting a new redaction, recheck is set directly from event.internal_metadata.need_to_check_redaction().

    It's fine if initially we recheck all redactions, as it only results in a little more CPU overhead (as we always pull out the redaction event regardless).

  2. Backfill recheck via background update

    A background update (redactions_recheck) backfills the new column for existing rows by reading recheck_redaction from each event's internal_metadata JSON. This avoids loading full event objects by reading event_json directly via a SQL JOIN.

  3. Don't fetch confirmed redaction events from the DB

    Previously, when loading events, Synapse recursively fetched all redaction events regardless of whether they needed domain rechecking. Now _fetch_event_rows reads the recheck column and splits redactions into two lists:

    • unconfirmed_redactions — need fetching and domain validation
    • confirmed_redactions — already validated, applied directly without fetching the event

    This avoids unnecessary DB reads for the common case of already-confirmed redactions.

  4. Move redacted_because population to EventClientSerializer

    Previously, redacted_because (the full redaction event object) was stored in event.unsigned at DB fetch time, coupling storage-layer code to client serialization concerns. This is removed from _maybe_redact_event_row and moved into EventClientSerializer.serialize_event, which fetches the redaction event on demand. The storage layer now only sets unsigned["redacted_by"] (the redaction event ID).

  5. Always use EventClientSerializer

    The standalone serialize_event function was made private (_serialize_event). All external callers — rest/client/room.py, rest/admin/events.py, appservice/api.py, and tests — were updated to use EventClientSerializer.serialize_event / serialize_events, ensuring
    redacted_because is always populated correctly via the serializer.

  6. Batch-fetch redaction events in serialize_events

    serialize_events now collects all redacted_by IDs from the event batch upfront and fetches them in a single get_events call, passing the result as a redaction_map to each serialize_event call. This reduces N individual DB round-trips to one when serializing a batch of events that includes redacted events.

@erikjohnston erikjohnston force-pushed the erikj/simplify_redaction branch from c2d73b6 to 7d90b4c Compare March 17, 2026 12:29
Comment thread synapse/appservice/api.py
@erikjohnston erikjohnston force-pushed the erikj/simplify_redaction branch from 7d90b4c to b99fb4e Compare March 17, 2026 12:38
@erikjohnston erikjohnston marked this pull request as ready for review March 17, 2026 14:56
@erikjohnston erikjohnston requested a review from a team as a code owner March 17, 2026 14:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR simplifies redaction handling by removing redacted_because from events’ internal unsigned data (to keep unsigned “simple” for future Rust porting), while adding a DB-backed mechanism (redactions.recheck) to track whether a redaction needs sender-domain revalidation at read time.

Changes:

  • Add recheck column to redactions, plus a background update to backfill it from event_json.internal_metadata.
  • Split redactions into confirmed vs unconfirmed at read time to avoid fetching confirmed redaction events from the DB.
  • Move redacted_because construction into EventClientSerializer (with batch redaction-event fetching in serialize_events) and update callers/tests accordingly.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/storage/test_redaction.py Updates storage-layer assertions to use internal_metadata.redacted_by instead of unsigned["redacted_because"].
tests/storage/test_events_bg_updates.py Adds coverage for the new redactions_recheck background update backfill behavior.
tests/rest/client/test_rooms.py Adjusts tests to check internal_metadata.redacted_by for stored events.
tests/replication/storage/test_events.py Updates replication tests to set internal_metadata.redacted_by rather than unsigned redaction fields.
tests/events/test_utils.py Switches tests to use EventClientSerializer (async) rather than the standalone serializer.
synapse/types/storage/init.py Adds background update name constant for redactions_recheck.
synapse/synapse_rust/events.pyi Exposes EventInternalMetadata.redacted_by in the Python typing stub.
synapse/storage/schema/main/delta/94/01_redactions_recheck.sql Adds recheck column to redactions table.
synapse/storage/schema/main/delta/94/02_redactions_recheck_bg_update.sql Schedules the redactions_recheck background update.
synapse/storage/schema/init.py Bumps schema version to 94 and documents the change.
synapse/storage/databases/main/events_worker.py Loads redactions.recheck to classify redactions into confirmed/unconfirmed and updates redaction application logic.
synapse/storage/databases/main/events_bg_updates.py Implements the redactions_recheck backfill background update.
synapse/storage/databases/main/events.py Persists redactions.recheck from need_to_check_redaction() when storing redactions.
synapse/rest/client/room.py Ensures client-facing serialization uses EventClientSerializer.
synapse/rest/admin/events.py Ensures admin event serialization uses EventClientSerializer.
synapse/events/utils.py Makes the old serializer private, and populates redacted_because via EventClientSerializer (with batch-fetch support).
synapse/appservice/api.py Switches appservice serialization to serialize_events.
synapse/_scripts/synapse_port_db.py Adds recheck to the list of boolean columns for DB porting.
rust/src/events/internal_metadata.rs Adds redacted_by as a non-persisted internal metadata field in the Rust implementation.
changelog.d/19581.misc Changelog entry for removing redacted_because from internal unsigned.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread synapse/appservice/api.py
Comment thread synapse/events/utils.py
Comment thread synapse/storage/databases/main/events_worker.py Outdated
Copy link
Copy Markdown
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things; nothing too major. Nice that we could get a logic speedup from this work!

Comment thread synapse/storage/schema/main/delta/94/01_redactions_recheck.sql
Comment on lines +175 to +176
Changes in SCHEMA_VERSION = 94
- Add `recheck` column (boolean, default true) to the `redactions` table.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will conflict with #19556, which adds the same new schema version with 01_*.sql files.

Though this PR will likely merge first.

Comment thread synapse/storage/databases/main/events_bg_updates.py Outdated
Comment thread synapse/storage/schema/main/delta/94/02_redactions_recheck_bg_update.sql Outdated
Comment thread tests/storage/test_events_bg_updates.py
Comment thread synapse/appservice/api.py
Comment thread synapse/appservice/api.py
Comment thread synapse/storage/databases/main/events_worker.py
Comment thread synapse/events/utils.py
Comment thread synapse/events/utils.py
Copy link
Copy Markdown
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread tests/events/test_utils.py Outdated
Comment thread synapse/events/utils.py Outdated
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
@erikjohnston erikjohnston merged commit 539f708 into develop Mar 26, 2026
44 of 46 checks passed
@erikjohnston erikjohnston deleted the erikj/simplify_redaction branch March 26, 2026 09:18
-- <https://www.gnu.org/licenses/agpl-3.0.html>.


ALTER TABLE redactions ADD COLUMN recheck boolean NOT NULL DEFAULT true;
Copy link
Copy Markdown
Contributor

@MadLittleMods MadLittleMods Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add recheck column to redactions table

    A new boolean recheck column (default true) is added to the redactions table. This captures whether a redaction needs its sender domain checked at read time — required for room v3+ where redactions are accepted speculatively and later validated. When persisting a new redaction, recheck is set directly from event.internal_metadata.need_to_check_redaction().

    It's fine if initially we recheck all redactions, as it only results in a little more CPU overhead (as we always pull out the redaction event regardless).

-- PR description

Feels like this context isn't explained anywhere in the code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants