Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive file validation system for document uploads, including checks for file size, extension, and magic MIME type detection using the python-magic library. The validation logic is encapsulated in a new FileValidator class and integrated into the API views and forms. Review feedback suggests improving the robustness of extension extraction from URLs using urlparse, correcting a typo in a validation error message, and refining the FileValidator to avoid database writes during validation and overly broad exception handling.
| max_size = self._get_max_size() | ||
| if file_size > max_size: | ||
| raise FileUploadLimitException( | ||
| _(f"File size size exceeds {filesizeformat(max_size)}. Please try again with a smaller file.") |
| expected_mimes = DOCUMENT_MAGIC_MIMETYPE_MAP.get(self.extension, set()) | ||
|
|
||
| if not expected_mimes: | ||
| raise ValidationError(_(f"File type .{self.extension} cannot be verified by MIME detection.")) |
There was a problem hiding this comment.
This check is very strict and will block any file extension that is allowed in settings.ALLOWED_DOCUMENT_TYPES but not explicitly mapped in DOCUMENT_MAGIC_MIMETYPE_MAP. Consider allowing the upload to proceed (perhaps with a warning) if the extension is recognized by the system but lacks a MIME mapping, to avoid breaking support for less common document types.
| def _detect_mime(self, sample): | ||
| try: | ||
| return magic.from_buffer(sample, mime=True) | ||
| except Exception: |
| def _get_file_size(self): | ||
| if isinstance(self.file, str): | ||
| try: | ||
| return Path(self.file).stat().st_size |
| def _read_sample(self): | ||
| if isinstance(self.file, str): | ||
| try: | ||
| with open(self.file, "rb") as file_pointer: |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #14154 +/- ##
==========================================
- Coverage 74.62% 74.57% -0.06%
==========================================
Files 958 959 +1
Lines 57891 58168 +277
Branches 7889 7936 +47
==========================================
+ Hits 43202 43379 +177
- Misses 12927 13016 +89
- Partials 1762 1773 +11 🚀 New features to boost your workflow:
|
|
As agreed, validation will be moved to the level of Djang FileUploadHandlers. This way we can detect malicious files early and avoid storing them even temporarily on the disk even |
Fixes #14153
Checklist
For all pull requests:
The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):
Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.