[Fixes #14153] Validation of Uploads#14154

Closed
sijandh35 wants to merge 2 commits into master from ISSUE_14153

Conversation

@sijandh35 (Contributor) commented Apr 21, 2026

Fixes #14153

Checklist

Reviewing is done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and would appreciate your help in doing so by completing the following items. Feel free to ask in a comment if you have trouble with any of them.

For all pull requests:

  • Confirm you have read the contribution guidelines
  • You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
  • Make sure the first PR targets the master branch, eventual backports will be managed later. This can be ignored if the PR is fixing an issue that only happens in a specific branch, but not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

  • There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exception is changes not visible to end users)
  • The issue connected to the PR must have Labels and Milestone assigned
  • PR for bug fixes and small new features are presented as a single commit
  • PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
  • New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

@cla-bot (bot) added the cla-signed label (CLA Bot: community license agreement signed) on Apr 21, 2026
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a comprehensive file validation system for document uploads, including checks for file size, extension, and magic MIME type detection using the python-magic library. The validation logic is encapsulated in a new FileValidator class and integrated into the API views and forms. Review feedback suggests improving the robustness of extension extraction from URLs using urlparse, correcting a typo in a validation error message, and refining the FileValidator to avoid database writes during validation and overly broad exception handling.
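The urlparse suggestion in the summary above can be illustrated with a minimal sketch. The function name `extension_from_url` is hypothetical and is not taken from the PR; the idea is only that the extension is read from the URL path, so query strings and fragments cannot leak into it.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def extension_from_url(url: str) -> str:
    """Return the lowercase extension of the URL path, without the dot.

    Hypothetical helper for illustration only; not part of the PR.
    """
    # urlparse separates the path from the query string and fragment.
    path = unquote(urlparse(url).path)
    # PurePosixPath.suffix yields the last extension, e.g. ".gz" for "a.tar.gz".
    return PurePosixPath(path).suffix.lstrip(".").lower()
```

A naive split on the last dot of the whole URL would pick up text from the query string or fragment, which is the case this approach avoids.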

Comment thread geonode/documents/api/views.py Outdated
Comment thread geonode/documents/api/views.py
max_size = self._get_max_size()
if file_size > max_size:
    raise FileUploadLimitException(
        _(f"File size size exceeds {filesizeformat(max_size)}. Please try again with a smaller file.")

medium

There is a minor typo in the error message: "size size".

Suggested change
- _(f"File size size exceeds {filesizeformat(max_size)}. Please try again with a smaller file.")
+ _(f"File size exceeds {filesizeformat(max_size)}. Please try again with a smaller file.")

expected_mimes = DOCUMENT_MAGIC_MIMETYPE_MAP.get(self.extension, set())

if not expected_mimes:
    raise ValidationError(_(f"File type .{self.extension} cannot be verified by MIME detection."))

medium

This check is very strict and will block any file extension that is allowed in settings.ALLOWED_DOCUMENT_TYPES but not explicitly mapped in DOCUMENT_MAGIC_MIMETYPE_MAP. Consider allowing the upload to proceed (perhaps with a warning) if the extension is recognized by the system but lacks a MIME mapping, to avoid breaking support for less common document types.
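One way to read this suggestion is sketched below, as an assumption rather than the PR's actual code. `MAGIC_MIMETYPE_MAP`, `MimeMismatch`, and `check_mime` are hypothetical names, and the mapping is a shortened stand-in for DOCUMENT_MAGIC_MIMETYPE_MAP. An extension with no mapping is logged and skipped, while a mapped extension with the wrong content is still rejected.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical, abbreviated stand-in for DOCUMENT_MAGIC_MIMETYPE_MAP.
MAGIC_MIMETYPE_MAP = {
    "pdf": {"application/pdf"},
    "png": {"image/png"},
}


class MimeMismatch(ValueError):
    """Hypothetical error: detected content does not match the extension."""


def check_mime(extension, detected_mime, mime_map=MAGIC_MIMETYPE_MAP):
    """Return True if the content was verified, False if verification was skipped."""
    expected = mime_map.get(extension, set())
    if not expected:
        # Allowed extension without a MIME mapping: warn instead of blocking.
        logger.warning("No MIME mapping for .%s; skipping content check", extension)
        return False
    if detected_mime not in expected:
        raise MimeMismatch(f".{extension} file detected as {detected_mime}")
    return True
```

The trade-off is that unmapped types are accepted on their extension alone, so whether to warn or to reject is a policy decision for the maintainers.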

Comment thread geonode/upload/validators.py Outdated
def _detect_mime(self, sample):
    try:
        return magic.from_buffer(sample, mime=True)
    except Exception:

medium

Avoid catching the broad Exception class. It is better to catch the specific exceptions that magic.from_buffer might raise (such as magic.MagicException or TypeError) so that unrelated bugs and system-level issues are not masked.
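The difference this makes can be shown with a small library-independent sketch. `DetectionError` and `detect_mime` are hypothetical names, and the detector is passed in as a callable so the example runs without python-magic installed. In the real validator the callable would wrap magic.from_buffer and the except clause would name that library's own error type.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class DetectionError(Exception):
    """Hypothetical stand-in for the error type raised by the MIME detector."""


def detect_mime(sample: bytes, detector: Callable[[bytes], str]) -> Optional[str]:
    """Return the detected MIME type, or None when detection itself fails."""
    try:
        return detector(sample)
    except (DetectionError, TypeError) as exc:
        # Only failures expected from the detector are swallowed here.
        logger.warning("MIME detection failed: %s", exc)
        return None
```

Any other exception, for example a KeyError caused by a programming mistake, propagates to the caller instead of being silently turned into "unknown MIME type".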

def _get_file_size(self):
    if isinstance(self.file, str):
        try:
            return Path(self.file).stat().st_size

def _read_sample(self):
    if isinstance(self.file, str):
        try:
            with open(self.file, "rb") as file_pointer:

@codecov (bot) commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 81.88406% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.57%. Comparing base (94ca636) to head (32c417a).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #14154      +/-   ##
==========================================
- Coverage   74.62%   74.57%   -0.06%     
==========================================
  Files         958      959       +1     
  Lines       57891    58168     +277     
  Branches     7889     7936      +47     
==========================================
+ Hits        43202    43379     +177     
- Misses      12927    13016      +89     
- Partials     1762     1773      +11     

@sijandh35 marked this pull request as draft April 21, 2026 07:29
@giohappy (Contributor) commented

As agreed, validation will be moved to the level of Django's FileUploadHandlers. This way we can detect malicious files early and avoid storing them on disk, even temporarily.
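To illustrate what chunk-level validation could look like, here is a framework-free sketch. The class only mirrors the shape of Django's `receive_data_chunk(raw_data, start)` hook; it does not subclass FileUploadHandler, and `EarlyUploadCheck`, `UploadRejected`, and the signature list are hypothetical. A real handler would abort through Django's own mechanism rather than this stand-in exception.

```python
from typing import Tuple


class UploadRejected(Exception):
    """Hypothetical stand-in for the error a real handler raises to abort."""


class EarlyUploadCheck:
    """Validates an upload chunk by chunk, before anything is written to disk."""

    def __init__(self, max_size: int, allowed_signatures: Tuple[bytes, ...]):
        self.max_size = max_size
        self.allowed_signatures = allowed_signatures
        self.received = 0

    def receive_data_chunk(self, raw_data: bytes, start: int) -> bytes:
        # The first chunk carries the leading magic bytes of the file.
        if start == 0 and not raw_data.startswith(self.allowed_signatures):
            raise UploadRejected("unrecognised file signature")
        # Track the running total so oversized uploads stop mid-stream.
        self.received += len(raw_data)
        if self.received > self.max_size:
            raise UploadRejected("size limit exceeded")
        # Returning the data passes it on to the next stage unchanged.
        return raw_data
```

Because the check runs on the first chunk and on a running byte count, a rejected upload is stopped before the full body has been received or stored.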

@giohappy closed this Apr 22, 2026
Labels

cla-signed CLA Bot: community license agreement signed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validation of file uploads

4 participants