Introduce spam_checker_spammy internal event metadata. #19453

Merged
reivilibre merged 12 commits into develop from rei/sticky_events_spam_checker_spammy on Apr 15, 2026

Conversation

@reivilibre
Contributor

Follows: #19365

Part of: MSC4354 Sticky Events (experimental feature #19409)

This PR introduces a spam_checker_spammy flag, analogous to policy_server_spammy, as an explicit flag
that an event was decided to be spammy by a spam-checker module.

The original Sticky Events PR (#18968) just reused policy_server_spammy, but it didn't sit right with me
because we (at least appear to be experimenting with features that) allow users to opt-in to seeing
policy_server_spammy events (presumably for moderation purposes).

It therefore felt best to keep these flags separate.

As for why we need this flag: soon soft-failed status won't be permanent, at least for sticky events.
The spam checker modules currently work by making events soft-failed.
We want to prevent spammy events from getting reconsidered/un-soft-failed, so it seems like we need
a flag to track spam-checker spamminess separately from soft-failed.

Should be commit-by-commit friendly, but is also small.
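For illustration, the relationship between the three pieces of metadata might be sketched like this. This is a minimal stand-in only; `EventInternalMetadata` here is a hypothetical dataclass, not Synapse's actual class, and `is_spammy` is an invented helper:

```python
from dataclasses import dataclass


@dataclass
class EventInternalMetadata:
    """Illustrative stand-in for per-event internal metadata."""

    # Soft-failure hides the event from clients; for sticky events this
    # status may be re-evaluated later rather than being permanent.
    soft_failed: bool = False
    # Flagged as spam by a policy server; users may be allowed to opt in
    # to seeing these events for moderation purposes.
    policy_server_spammy: bool = False
    # Flagged as spam by a spam-checker module; kept separate so the
    # verdict survives any re-evaluation of soft-failed status.
    spam_checker_spammy: bool = False

    def is_spammy(self) -> bool:
        # Either flag on its own marks the event as spam.
        return self.policy_server_spammy or self.spam_checker_spammy
```

The point of the separate flags is visible in the last method: opting in to `policy_server_spammy` events need not also expose events a spam-checker module rejected.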

@reivilibre reivilibre marked this pull request as ready for review February 12, 2026 15:49
@reivilibre reivilibre requested a review from a team as a code owner February 12, 2026 15:49
@reivilibre
Contributor Author

reivilibre commented Feb 12, 2026

Opening this for feedback; still aware I need to add a test, but I doubt I will have time today or tomorrow. (Edit: test added.)

Comment thread changelog.d/19453.misc
Comment on lines 57 to +58
PolicyServerSpammy(bool),
SpamCheckerSpammy(bool),
Contributor


Should this be an enum, which would allow future adaptations like this? It makes sense that these could both be true, so that may not apply in any case. I'm guessing we have some early returns if either thing marks the event as spammy, though?

Contributor Author


An enum isn't a bad idea actually, but I can't think of a pleasant way of doing that migration, which leads me to think we might be better off keeping the 2 bools but also addressing the 'early-returns' point you make.

Contributor Author


Early return removed in be8c05b, so we now evaluate both attributes.
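The "evaluate both attributes" resolution could look roughly like this. This is a hypothetical helper sketching the idea, not the actual change in be8c05b; the function and parameter names are invented:

```python
def evaluate_spam_flags(
    meta,
    policy_server_says_spam: bool,
    spam_checker_says_spam: bool,
) -> None:
    """Record both spam verdicts on the event's internal metadata.

    Deliberately no early return: even when the policy server has
    already flagged the event, the spam-checker verdict is still
    recorded, so both flags end up accurate when both sources
    consider the event spam.
    """
    if policy_server_says_spam:
        meta.policy_server_spammy = True
    if spam_checker_says_spam:
        meta.spam_checker_spammy = True
```

With an early `return` after the first `if`, an event flagged by both sources would only carry `policy_server_spammy`, which is exactly the loss of information the reviewer was worried about.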

Comment on lines 57 to +58
PolicyServerSpammy(bool),
SpamCheckerSpammy(bool),
Contributor


Is "spam checker" only for the module API?

Contributor Author


Yes; strictly speaking there is also the pre-module-API 'spam checker API', but it was superseded by the module API. I think we have a compatibility shim.

Comment on lines +197 to +198
# Mark this as spam so we don't re-evaluate soft-failure status.
redacted_event.internal_metadata.spam_checker_spammy = True
Contributor

@MadLittleMods, Feb 23, 2026


Based on how this was used in #18968, perhaps insert_sticky_events_txn should be updated to instead skip any soft-failed events 🤔

Especially as the reasoning there is "Skipping the insertion of these types of 'invalid' events is useful for performance reasons because they would fill up the table yet we wouldn't show them to clients anyway." and soft_failed covers the behavior of showing events to clients.

Contributor


Ahh, this comment below clarifies why we distinguish this:

Note: Soft-failed sticky events ARE inserted, as their soft-failed status could be re-evaluated later.
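That distinction could be sketched as an insertion filter like the following. This is a hypothetical helper illustrating the rule discussed above, not the actual insert_sticky_events_txn code:

```python
def should_insert_sticky_event(meta) -> bool:
    """Decide whether a sticky event belongs in the sticky-events table.

    Spammy events are skipped: their verdict is final, they would fill
    up the table, and we would never show them to clients anyway.
    Soft-failed events ARE inserted, because for sticky events the
    soft-failed status may be re-evaluated later.
    """
    if getattr(meta, "policy_server_spammy", False):
        return False
    if getattr(meta, "spam_checker_spammy", False):
        return False
    # Soft-failed (but not spammy) events still pass the filter.
    return True
```

This is why `spam_checker_spammy` cannot simply be folded into `soft_failed`: the two statuses lead to different insertion behavior.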

Comment thread tests/module_api/test_spamchecker.py
{self.remote_user_join_event.event_id},
)

def test_federated_events_with_spam_checker_metadata(self) -> None:
Contributor


It would be nice to add an in-repo Complement test for this sort of thing. The only hiccup would be setting up/configuring the spam checker module, but we could always configure one and have some specific constant (e.g. SPAM_CHECKER_SPAM) that triggers this behavior.

Contributor Author


I'm not sure how much it fits; I feel the trial test is adequate and is nicer to develop on.
The spam_checker_spammy field is also internal so wouldn't be visible externally.
Am I missing something really beneficial that would be covered?

Contributor


To make it more clear, I think the trial test is sufficient 👍

It was more of an ideal. One benefit would be avoiding the room version details (like this problem). We could also avoid the internal details of manually puppeting federation here.

In terms of checking, we would probably verify via /sync (send a sentinel event after the spammy one) and/or the admin APIs, if the spam_checker_spammy value were actually important.

Comment thread tests/module_api/test_spamchecker.py Outdated
Comment thread synapse/storage/databases/main/sticky_events.py
@reivilibre reivilibre force-pushed the rei/sticky_events_spam_checker_spammy branch from 07b866c to 71137f6 on April 10, 2026 12:33
Comment thread tests/storage/test_sticky_events.py Outdated
token = self.login(user_id, "pass")
room_id = self.helper.create_room_as(user_id, tok=token)

start_id = self.store.get_max_sticky_events_stream_id()
Contributor


Not exactly a problem of this PR specifically but the name get_max_sticky_events_stream_id() feels a bit off.

get_current_sticky_events_stream_id seems more appropriate. I see how it got the name "max" because that's what the underlying get_current_token() says but then it gets confusing when we also have get_max_allocated_token()

Recently dealt with this in #19558 / #19677 and ended up with

    async def get_current_quarantined_media_stream_id(self) -> int:
        """Gets the position of the quarantined media changes stream.

        Returns:
            int - the current stream ID
        """
        return self._quarantined_media_changes_id_gen.get_current_token()

    async def get_max_allocated_quarantined_media_stream_id(self) -> int:
        """Gets the maximum allocated position of the quarantined media changes stream.

        Returns:
            int - the maximum stream ID
        """
        return await self._quarantined_media_changes_id_gen.get_max_allocated_token()

Contributor Author

@reivilibre, Apr 13, 2026


The reason this name got introduced is that it's what account_data does, which I was using as a reference for thread subscriptions (IIRC).

And of course, since the 'how to add a stream' cheatsheet was documented, everyone has been cribbing off it, copying the pattern further.

You are very correct though: in a multi-writer stream, max_stream_id (or thereabouts) sounds like it would be the maximum position any writer had persisted.

Whereas the current one actually sounds like it would make more sense being called 'min', based on this docstring of the underlying function:

    def get_persisted_upto_position(self) -> int:
        """Get the max position where all previous positions have been
        persisted.

        Note: In the worst case scenario this will be equal to the minimum            # <--------------------
        position across writers. This means that the returned position here can
        lag if one writer doesn't write very often.
        """

I do prefer the 'current' idea and I think I will update all streams to match that (in a subsequent PR of course)

Contributor Author


We have even more variants; I have asked internally to give people a chance to weigh in on whether any one of them has a better reason to be the standard than the others.

  • stream_token e.g. get_to_device_stream_token
  • current_key e.g. self.sources.account_data.get_current_key() (TIL about this variant)
  • max_stream_id e.g. get_max_push_rules_stream_id
  • current_stream_id e.g. get_current_quarantined_media_stream_id

It might still make sense to exclude the events stream from the discussion/any potential change, since we have both a min and max there and therefore it's hard to drop the 'max' name.... (although we call those the room_min and room_max even though it's about the events stream :S)

After people have a chance to have some input I will go through and try to unify these

@reivilibre reivilibre merged commit 52c05c5 into develop Apr 15, 2026
109 of 115 checks passed
@reivilibre reivilibre deleted the rei/sticky_events_spam_checker_spammy branch April 15, 2026 15:53