
Add an API to list changes to quarantine state of media#19558

Merged
turt2live merged 76 commits into develop from
travis/list-quarantined-media-mk2
Apr 9, 2026

@turt2live turt2live commented Mar 14, 2026

Fixes #19352

(See issue for history of this feature and previous PRs)

A naive implementation of the endpoint was introduced first, but it quickly ran into slow queries and long startup times, which led to its removal. It also didn't actually work: it would fail to expose media once it was "unquarantined". A partial fix was then attempted, where the suggested direction was to use a stream instead of a timestamp column.

This PR re-introduces the API building on the previous feedback:

  • Adds a stream which tracks when media becomes (un)quarantined.
  • Runs a background update to capture already-quarantined media.
  • Adds a new admin API to return rows from the stream table.

We track both quarantine and unquarantine actions in the stream so that downstream consumers can process the records appropriately. In particular, this allows our Synapse exchange in HMA to remove hashes for unquarantined media (the use case is explained further in the issue).
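As a rough illustration of why both actions are recorded, the sketch below shows how a downstream consumer might replay stream rows in order to keep a set of quarantined media hashes up to date. The row shape (`stream_id`, `sha256`, `quarantined`) and the function name are assumptions made for this example, not the actual schema or response format of the API.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class QuarantineChange:
    # Hypothetical row shape; the real stream table may differ.
    stream_id: int
    sha256: str
    quarantined: bool


def apply_changes(known_hashes: Set[str], changes: Iterable[QuarantineChange]) -> int:
    """Replay changes in stream order and return the last stream ID seen (0 if none)."""
    last_id = 0
    for change in sorted(changes, key=lambda c: c.stream_id):
        if change.quarantined:
            known_hashes.add(change.sha256)
        else:
            # Unquarantine rows let the consumer drop hashes it stored earlier.
            known_hashes.discard(change.sha256)
        last_id = change.stream_id
    return last_id
```

A consumer that only saw quarantine rows would have no way to learn that a hash should be dropped, which is the gap the unquarantine rows are meant to close.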

Note: This knowingly does not capture all cases of media being quarantined. Other call sites are lower priority for T&S and can be addressed in a future PR. An issue to track those sites will be created after this PR is merged: #19672

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@turt2live turt2live force-pushed the travis/list-quarantined-media-mk2 branch from e76dc83 to 6328e78 Compare March 16, 2026 19:27
@turt2live turt2live force-pushed the travis/list-quarantined-media-mk2 branch from 258768e to f1a35fa Compare March 16, 2026 19:32
@turt2live turt2live marked this pull request as ready for review March 16, 2026 19:34
@turt2live turt2live requested a review from a team as a code owner March 16, 2026 19:34
turt2live added a commit that referenced this pull request Mar 18, 2026
Just something I noticed while working on
#19558

We start the function by setting `total_media_quarantined` to zero, then
we do work on the `media_ids`, add the number affected, zero it out
(**bug**), do work on `hashes`, add the number of affected rows, then
return `total_media_quarantined`.
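
The pattern described above can be sketched as follows. This is a simplified, hypothetical stand-in (the function signature and the `quarantine` callback are assumptions for illustration, not the Synapse code). It shows the corrected flow, with the offending reset noted in a comment.

```python
from typing import Callable, Iterable


def quarantine_all(
    media_ids: Iterable[str],
    hashes: Iterable[str],
    quarantine: Callable[[str], int],
) -> int:
    """Quarantine by media ID and by hash, returning the total rows affected.

    `quarantine` is a hypothetical callback returning the number of rows it changed.
    """
    total_media_quarantined = 0

    for media_id in media_ids:
        total_media_quarantined += quarantine(media_id)

    # The bug described above was equivalent to resetting the counter here
    # (`total_media_quarantined = 0`), which discarded the media ID count.

    for sha256 in hashes:
        total_media_quarantined += quarantine(sha256)

    return total_media_quarantined
```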

turt2live and others added 2 commits April 8, 2026 17:16
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
@turt2live turt2live requested a review from MadLittleMods April 8, 2026 23:35
Comment thread synapse/rest/admin/media.py Outdated
HTTPStatus.INTERNAL_SERVER_ERROR,
"Timed out while waiting for stream position",
errcode=Codes.UNKNOWN,
)
@MadLittleMods (Contributor) commented Apr 9, 2026

Ideally, we'd also validate `from_id` before waiting and throw `M_INVALID_PARAM`. This way people see a saner error instead of `500 {'errcode': 'M_UNKNOWN', 'error': 'Internal server error'}` from the assertion in `wait_for_quarantined_media_stream_id(...)`.

This kind of thing is mentioned in #19644 "tokens should be validated before it reaches this point."

There isn't a helper for this, so it would be a similar `if from_id < max_persisted_position:` error check.

Since this is an admin endpoint we could forgo this but it would be nice to have a good example in the codebase.

@turt2live (Member, Author) commented

Should be covered in 51f9f0e - let me know if changes are needed

@turt2live (Member, Author) commented

(will follow up in a subsequent PR if required)

@MadLittleMods (Contributor) commented Apr 9, 2026

The `if to_id < from_id:` error check that was added in 51f9f0e is a bit flawed, since `to_id` is the current token of the current worker (which could be behind the other workers). It's also okay for `to_id == from_id`, as that would mean no activity has happened.

I think we want to compare to `get_max_allocated_token()`, which actually looks at the database source of truth across all of the workers.
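
One possible shape for that check, as a self-contained sketch. The names `validate_from_id` and `InvalidParamError` are hypothetical, and `max_allocated_token` stands in for whatever `get_max_allocated_token()` returns. It also assumes that a `from_id` beyond the highest allocated token is what makes a request invalid, which is an interpretation rather than something confirmed here.

```python
class InvalidParamError(ValueError):
    """Hypothetical stand-in for an M_INVALID_PARAM (HTTP 400) response."""


def validate_from_id(from_id: int, max_allocated_token: int) -> None:
    """Reject tokens that no worker could have issued yet.

    Assumption: a token is invalid only when it is greater than the highest
    token allocated across all workers. Equality is allowed, since it just
    means no activity has happened since the caller last polled.
    """
    if from_id > max_allocated_token:
        raise InvalidParamError(
            f"from_id {from_id} is ahead of the latest stream position"
        )
```

Comparing against the value shared by all workers, rather than the local worker's current position, avoids rejecting a valid token just because the worker handling the request is lagging.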

@turt2live (Member, Author) commented

We'd then need to wait for that token too, I believe?

@turt2live (Member, Author) commented

I guess we don't. If we're caught up on `from` then we might return fewer results, but that's fine.

turt2live and others added 2 commits April 9, 2026 08:46
Co-authored-by: Eric Eastwood <erice@element.io>
Co-authored-by: Eric Eastwood <erice@element.io>
MadLittleMods added a commit that referenced this pull request Apr 9, 2026
…portdb` (#19675)

Part of #19671

Spawning from [discussion in
`#synapse-dev:matrix.org`](https://matrix.to/#/!i5D5LLct_DYG-4hQprLzrxdbZ580U9UB6AEgFnk6rZQ/$Z3nqbH0Qy21FWC3qJOim6LSRCRpJ3pxV5DLXm98IA6I?via=element.io&via=matrix.org&via=beeper.com)
with roots in
#19558 (comment).
As trialed/discovered by @turt2live alongside @reivilibre and @clokep
❤️


### Why is this necessary?

If you forget to add `_setup_sequence(...)`, you can run into the
following error if there is 1 row in SQLite and then you use the
`portdb` script to try to migrate to Postgres (as
[explained](https://matrix.to/#/!i5D5LLct_DYG-4hQprLzrxdbZ580U9UB6AEgFnk6rZQ/$mHU6dcTNL7NMfKBCJUekCh7vDj1lr1GDjriZQl7oeeU?via=element.io&via=matrix.org&via=beeper.com)
by @reivilibre)

```
Postgres sequence 'quarantined_media_id_seq' is inconsistent with associated stream position
of 'quarantined_media' in the 'stream_positions' table.
```
@turt2live (Member, Author) commented

Thanks for working with me on this :)

@turt2live turt2live merged commit fe74265 into develop Apr 9, 2026
48 checks passed
@turt2live turt2live deleted the travis/list-quarantined-media-mk2 branch April 9, 2026 17:41
@MadLittleMods (Contributor) commented

Thanks for pushing through all of the hard stuff @turt2live 🦏

turt2live added a commit that referenced this pull request Apr 9, 2026
…#19677)

Following up on
#19558 (comment)

Changelog for this PR is intended to overlap with the above PR.

`get_current_quarantined_media_stream_id` wasn't being used anywhere
else, so we can replace it like we do in this PR.


---------

Co-authored-by: Eric Eastwood <erice@element.io>
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
reivilibre pushed a commit that referenced this pull request Apr 16, 2026
Fixes #19692

Introduced by #19558

---------

Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Development

Successfully merging this pull request may close these issues.

Admin API to list quarantined media
