Support search redirection with policyserv#19658

Draft
turt2live wants to merge 2 commits into develop from travis/ps-search-redirect

@turt2live
Member

Warning

This PR includes discussion of topics including child exploitation.

  • Uses acronyms related to the topic where possible
  • Uses abstract concepts rather than specific examples
  • Does not include imagery

Background context

Hat: T&S team member

When a user searches for illegal material on matrix.org, we need to ensure they don't get results. If their search is specifically related to CSEA [1] or CSAM [2], we'd like to redirect the user to services that can help them.

We'd like to use a number of different keyword lists and evaluate their performance over time, so we need an ability to change out which keyword APIs we're calling with relative ease. Implementing each one into Synapse would be a challenge (especially considering at least one of them explicitly cannot have its API details publicly disclosed), so we've abstracted the keyword matching away to our policy server implementation, called policyserv.
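To illustrate the "swap keyword lists with relative ease" idea, here is a minimal sketch of a pluggable matcher interface. Every name in it (`KeywordMatcher`, `StaticListMatcher`, `first_match`, the category labels) is hypothetical and chosen for illustration only; it does not describe policyserv's actual API or implementation, and the keywords are neutral placeholders.

```python
from typing import Iterable, List, Optional, Protocol


class KeywordMatcher(Protocol):
    """Hypothetical interface: anything that can classify a search term."""

    def match(self, search_term: str) -> Optional[str]:
        """Return a category label if the term matches, else None."""


class StaticListMatcher:
    """Hypothetical backend that checks whole words against a fixed list."""

    def __init__(self, category: str, keywords: Iterable[str]) -> None:
        self._category = category
        # casefold() so matching is case-insensitive
        self._keywords = {k.casefold() for k in keywords}

    def match(self, search_term: str) -> Optional[str]:
        words = search_term.casefold().split()
        if any(word in self._keywords for word in words):
            return self._category
        return None


def first_match(matchers: List[KeywordMatcher], search_term: str) -> Optional[str]:
    """Ask each configured backend in order; return the first category hit."""
    for matcher in matchers:
        category = matcher.match(search_term)
        if category is not None:
            return category
    return None
```

With this shape, evaluating a different keyword source would mean adding or removing an entry in the `matchers` list, leaving the caller unchanged.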

This abstraction does create a network dependency for servers (like matrix.org) which enable search redirection. The impact of policyserv being down for this request would take the following shape:

  • POST /publicRooms requests stacking up on Synapse workers due to long-by-default client timeouts
    • Amplified by clients like Element Web sending a new request for each character typed, nearly immediately.
  • Users seeing "Failed to query public rooms" or similar errors

Separately from this PR, Synapse should probably look at reducing the number of search queries that can stack up (if that hasn't already been mitigated; I haven't looked).
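One possible shape for limiting stacked requests is to fail fast once a fixed number of calls are already pending, rather than letting new ones queue. The sketch below uses plain `asyncio` for self-containment (Synapse itself is Twisted-based), and `InFlightLimiter` and `TooManyInFlight` are hypothetical names, not existing Synapse classes.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TooManyInFlight(Exception):
    """Raised when the limiter refuses to start another call."""


class InFlightLimiter:
    """Hypothetical helper: rejects work once max_in_flight calls are pending."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._current = 0

    @property
    def in_flight(self) -> int:
        return self._current

    async def run(self, make_call: Callable[[], Awaitable[T]]) -> T:
        # Fail fast instead of queueing, so slow upstream calls cannot pile up.
        if self._current >= self._max:
            raise TooManyInFlight()
        self._current += 1
        try:
            return await make_call()
        finally:
            # Always release the slot, even if the call raised or was cancelled.
            self._current -= 1
```

A caller would wrap each outbound request in `limiter.run(...)` and translate `TooManyInFlight` into whatever error response is appropriate.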

For more information, refer to the link tree below (some may only be available to employees):

PR description

As noted in the background context, this defers all keyword matching to policyserv (or, more realistically, to something that implements its API). This is done primarily to reduce code duplication and to preserve the ability to use certain keyword lists. We felt that making this a Synapse module instead of native-to-Synapse code could lead to increased code duplication between what policyserv can already do and whatever modules get written (possibly in private). We also felt that a module would increase the maintenance burden over time compared to the maintenance we've already committed to on policyserv. We further expect to expand the new "safety" config section we've introduced to cover more search cases (messages, users, etc.) as well as native-to-Synapse invite filtering, media scanning, and more.

This is not behind an experimental feature flag because it doesn't implement an MSC directly. However, by passing through whatever error policyserv returns, Synapse may unexpectedly end up implementing a given MSC (to mitigate this, we promise not to violate the unstable features policy of the MSC process). We also expect to use this on matrix.org pretty much right away.

TODO:

  • Tests to ensure this actually works (manual testing says it's fine)
  • Introduce a very short timeout on the policyserv request to limit the number of stacked requests
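For the timeout TODO, a minimal sketch of the idea might look like the following. It uses plain `asyncio` rather than Synapse's Twisted-based HTTP client, and `check_with_timeout`, the `check` callable, and its return shape are all hypothetical. Whether a timeout should fail open (serve results) or fail closed (return an error) is not decided in this PR, so the sketch makes the caller choose explicitly.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def check_with_timeout(
    check: Callable[[str], Awaitable[Optional[dict]]],
    search_term: str,
    *,
    timeout_seconds: float,
    fail_open: bool,
) -> Optional[dict]:
    """Run a (hypothetical) policy check, giving up after timeout_seconds.

    Returns whatever `check` returns. On timeout, returns None when
    fail_open is True, otherwise re-raises asyncio.TimeoutError.
    """
    try:
        # wait_for cancels the underlying call once the deadline passes,
        # so a slow upstream cannot hold the request open indefinitely.
        return await asyncio.wait_for(check(search_term), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if fail_open:
            return None
        raise
```

The timeout value itself would need tuning against real latency to the policy server; nothing here suggests a specific number.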

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Footnotes (content warning: acronyms spelled out)

[1] CSEA - Child Sexual Exploitation & Abuse
[2] CSAM - Child Sexual Abuse Material
