Support search redirection with policyserv #19658
Draft
Warning
This PR includes discussion of topics including child exploitation.
Background context
Hat: T&S team member
When a user searches for illegal material on matrix.org, we need to ensure they don't get results. If their search is specifically related to CSEA[^1] or CSAM[^2], we'd like to redirect the user to services that can help them.
We'd like to use a number of different keyword lists and evaluate their performance over time, so we need an ability to change out which keyword APIs we're calling with relative ease. Implementing each one into Synapse would be a challenge (especially considering at least one of them explicitly cannot have its API details publicly disclosed), so we've abstracted the keyword matching away to our policy server implementation, called policyserv.
This abstraction does create a network dependency for servers (like matrix.org) which enable search redirection. The impact of policyserv being down for this request would take the following shape:
- `POST /publicRooms` requests stacking up on Synapse workers due to long-by-default client timeouts

Separate to this PR, Synapse should probably look at reducing the number of search queries which can stack up (if that hasn't already been mitigated - I haven't looked).
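To illustrate the failure mode above, here is a minimal sketch of bounding the policyserv call with a server-side deadline so that a slow or unreachable policy server cannot hold a request open for the full client timeout. It is not the code in this PR: `check_search_term`, `Verdict`, `call_policyserv`, and the 2-second budget are all hypothetical names and values, and the sketch uses `asyncio` only to stay self-contained (Synapse itself is built on Twisted). What to do with an "unavailable" verdict is left to the caller, since this description does not specify it.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical time budget; not a value taken from this PR.
POLICYSERV_TIMEOUT_SECONDS = 2.0


@dataclass
class Verdict:
    # One of "allow", "block", or "unavailable" (names are illustrative).
    status: str
    # Error body to relay to the client when status == "block".
    error_body: Optional[dict] = None


async def check_search_term(
    term: str,
    call_policyserv: Callable[[str], Awaitable[Optional[dict]]],
    timeout: float = POLICYSERV_TIMEOUT_SECONDS,
) -> Verdict:
    """Ask the policy server about a search term, giving up after `timeout`.

    `call_policyserv` is a stand-in for the real HTTP call. It is assumed to
    return None when the term is fine, or an error body when it is not.
    """
    try:
        result = await asyncio.wait_for(call_policyserv(term), timeout=timeout)
    except asyncio.TimeoutError:
        # The policy server did not answer in time. Report that explicitly
        # instead of waiting, so requests do not pile up on the worker.
        return Verdict(status="unavailable")
    if result is None:
        return Verdict(status="allow")
    return Verdict(status="block", error_body=result)
```

A caller would await `check_search_term(term, real_client_call)` before running the room search and branch on `status`.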
For more information, refer to the link tree below (some may only be available to employees):
- `M_SAFETY` error code (this very PR causes Synapse to expose `M_SAFETY` semantics on `POST /publicRooms` because the policyserv endpoint we call implements MSC4387)

PR description
As noted in the background context, this defers all keyword matching to policyserv (or, more realistically, something that implements its API). This is done primarily to reduce code duplication and maintain the ability for certain keyword lists to be used. We felt that making this a Synapse module instead of native-to-Synapse code could lead to increased code duplication between what policyserv is already able to do and whatever modules get written (possibly in private). We also felt that a module would increase maintenance burden over time compared to the maintenance we've already committed to on policyserv. We further expect to expand the new "safety" config section we've introduced to cover more search cases (messages, users, etc.) as well as native-to-Synapse invite filtering, media scanning, and more.
This is not behind an experimental feature flag because it's not implementing an MSC directly. However, by passing through whatever error policyserv returns, Synapse may unexpectedly end up implementing a given MSC (we promise not to violate the unstable features policy of the MSC process to mitigate this). We also expect to use this on matrix.org pretty much right away.
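As a rough illustration of the pass-through behaviour described above, the sketch below relays a Matrix-shaped error body from the policy server unchanged, which is how an error code such as `M_SAFETY` could reach clients without Synapse knowing about it. The function name, the treatment of any 2xx status as "no objection", and the `502` / `M_UNKNOWN` fallback are assumptions made for this example, not details of this PR or of policyserv's API.

```python
import json
from typing import Optional, Tuple


def passthrough_policyserv_error(
    status_code: int, body: bytes
) -> Optional[Tuple[int, dict]]:
    """Decide what (if anything) to send the client in place of search results.

    Returns None when the policy server raised no objection, otherwise the
    (HTTP status, JSON body) pair to respond with.
    """
    # Assumption: any 2xx response means the search may proceed as normal.
    if 200 <= status_code < 300:
        return None

    try:
        parsed = json.loads(body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        parsed = None

    # Standard Matrix errors are JSON objects carrying a string `errcode`.
    # Relay those untouched, whatever the code is.
    if isinstance(parsed, dict) and isinstance(parsed.get("errcode"), str):
        return status_code, parsed

    # Assumption: anything else is treated as a policy server fault and
    # replaced with a generic error rather than relayed verbatim.
    return 502, {
        "errcode": "M_UNKNOWN",
        "error": "Policy server returned an unexpected response",
    }
```

Because the body is relayed without inspecting the specific `errcode`, new error semantics on the policy server side become visible to clients without a Synapse change, which is the trade-off the paragraph above calls out.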
TODO:
Pull Request Checklist
Footnotes (content warning: acronyms spelled out)
[^1]: CSEA - Child Sexual Exploitation & Abuse
[^2]: CSAM - Child Sexual Abuse Material