Skip to content

Continue Add scan file body endpoint #83

Open
TrevisGordan wants to merge 7 commits intoelement-hq:mainfrom
TrevisGordan:continue-hs-scan-file-body
Open

Continue Add scan file body endpoint #83
TrevisGordan wants to merge 7 commits intoelement-hq:mainfrom
TrevisGordan:continue-hs-scan-file-body

Conversation

@TrevisGordan
Copy link
Copy Markdown
Contributor

Continues #79 by @Half-Shot, which implements #78 — the ability to scan a file before uploading it to Matrix.

PR #79 was closed as Half-Shot shifted away from the content scanner (comment). This PR picks up the work, addresses the review feedback from @reivilibre, and restructures the implementation.

What changed from #79

Instead of adding the new scan_file functionality on top of the existing code, I refactored the scanner so that all scan paths follow the same pipeline:

decrypt (if metadata) → write to disk → _check_mimetype() → _run_scan() → cleanup

Key change: no more streaming to disk for multipart uploads. In #79, write_multipart_to_disk streamed the upload to disk, but since ~99% of files will be encrypted, the content would immediately be read back into memory for decryption and then written to disk again. The streaming benefit was entirely negated. The multipart body is now read into memory once in the servlet layer and passed as raw bytes to the scanner.

Scanner changes

  • scan_file_on_disk renamed to scan_content — now accepts raw bytes instead of a file path.
  • _do_scan — extracted shared scan logic (mimetype check, run scan, cleanup) that was previously duplicated between _scan_media and scan_file_on_disk.
  • Removed write_multipart_to_disk — no longer needed.
  • Added async file writing via aiofiles in _write_file_to_disk.

Bug fix from #79

write_multipart_to_disk was called from the servlet with a BodyPartReader, but scan_file_on_disk received the resulting file path as its file_path parameter — then passed it to _decrypt_file which expected bytes, not a path. This is fixed by the restructuring above.

Documentation

Improved docs/api.md for the scan_file endpoint:

  • Fixed request/response sections being conflated.
  • Added curl and Python usage examples.
  • Clarified required vs. optional fields.

Suggestions for follow-up or now

  • Rename scan_file endpoint to scan_upload or scan_content to avoid confusion with the internal scan_file / scan_content scanner methods.
  • Use UUID for temp files in scan_media — the media path is used as filename but the file is deleted after scanning anyway.
  • file parameter naming — confusing, but inherited from the Matrix client-server spec EncryptedFile structure.

@TrevisGordan TrevisGordan requested a review from a team as a code owner March 9, 2026 15:05
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 9, 2026

CLA assistant check
All committers have signed the CLA.

@TrevisGordan
Copy link
Copy Markdown
Contributor Author

@reivilibre Please check out the changes and share your feedback.
Regarding the suggestion to rename the scan_file endpoint to scan_upload or scan_content to avoid confusion with the internal scan_file / scan_content scanner methods, I’d like to make that change before we merge - but I’ll wait for your input first.

@dklimpel
Copy link
Copy Markdown
Contributor

@TrevisGordan This branch has conflicts :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants