4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -89,3 +89,7 @@
## 2026-05-18 - Jaccard Similarity Optimization via Set Arithmetic
**Learning:** In retrieval loops calculating Jaccard similarity (e.g. RAG), explicitly building a union set `A.union(B)` is expensive due to memory allocation and population.
**Action:** Use the inclusion-exclusion principle $|A \cup B| = |A| + |B| - |A \cap B|$ to calculate union size in O(1) arithmetic time after calculating the intersection. Pre-calculate $|B|$ (token count) to further reduce overhead. Use `isdisjoint()` for fast early-exit.
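
A minimal sketch of the technique described above, not the project's actual retrieval code. The function name `jaccard_similarity` and the `b_size` parameter are hypothetical; the sample token sets are made up for illustration.

```python
from typing import Optional, Set


def jaccard_similarity(a: Set[str], b: Set[str], b_size: Optional[int] = None) -> float:
    """Jaccard similarity without materializing the union set."""
    # Early exit: disjoint sets (including empty ones) share nothing.
    if a.isdisjoint(b):
        return 0.0
    intersection = len(a & b)
    # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
    size_b = len(b) if b_size is None else b_size
    union = len(a) + size_b - intersection
    return intersection / union


query = {"road", "pothole", "repair"}
doc = {"pothole", "repair", "request", "ward"}
# The caller can pass a pre-computed token count for the document side.
score = jaccard_similarity(query, doc, b_size=len(doc))
```

Here the intersection has 2 elements and the union size is 3 + 4 - 2 = 5, so `score` is 0.4. The union size is obtained with plain integer arithmetic; only the intersection set is ever built.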

## 2026-05-18 - Async Event Loop Blocking in Image Uploads
**Learning:** Performing synchronous image processing (PIL resize) and file I/O (write) directly in FastAPI async handlers blocks the event loop, causing severe latency spikes under load. Unified processing pipelines (resizing/EXIF stripping) should be offloaded to thread pools to maintain responsiveness.
**Action:** Use `run_in_threadpool` for all image processing and file write operations in async endpoints. Ensure specific domain limits (like 10MB vs 20MB) are checked before calling generic utilities.
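
The offloading pattern can be sketched with the standard library alone. This is an illustration under stated assumptions: `asyncio.to_thread` stands in for `run_in_threadpool`, and `resize_and_strip` is a hypothetical placeholder, not the project's image utility.

```python
import asyncio
import threading


def resize_and_strip(data: bytes) -> bytes:
    """Hypothetical stand-in for blocking image work (not the project's utility)."""
    # Record which thread did the work so the offload is observable.
    resize_and_strip.thread_id = threading.get_ident()
    return data[::-1]  # placeholder transformation


async def handle_upload(data: bytes) -> bytes:
    # asyncio.to_thread plays the role of run_in_threadpool here:
    # the event loop stays free while a worker thread runs the function.
    return await asyncio.to_thread(resize_and_strip, data)


loop_thread_id = threading.get_ident()
processed = asyncio.run(handle_upload(b"abc"))
```

After the call, `resize_and_strip.thread_id` differs from `loop_thread_id`, which shows the blocking function ran outside the thread that drives the event loop.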
48 changes: 25 additions & 23 deletions backend/routers/field_officer.py
@@ -5,6 +5,7 @@
"""

from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form, Response
from fastapi.concurrency import run_in_threadpool
from sqlalchemy.orm import Session
from sqlalchemy import func, case
from typing import List, Optional
@@ -34,6 +35,7 @@
)
from backend.cache import visit_last_hash_cache, visit_stats_cache
from backend.schemas import BlockchainVerificationResponse
from backend.utils import process_uploaded_image, save_processed_image

logger = logging.getLogger(__name__)

@@ -278,7 +280,8 @@ async def upload_visit_images(
- **visit_id**: ID of the visit
- **images**: List of image files

Maximum 10 images per visit
Maximum 10 images per visit.
Optimized: Uses single-pass image processing (resize/strip EXIF) and non-blocking I/O.
"""
try:
visit = db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
Expand All @@ -303,42 +306,41 @@ async def upload_visit_images(
image_paths = []

for idx, image in enumerate(images):
# Validate content_type is present
if not image.content_type:
raise HTTPException(status_code=400, detail="File must have a content type")
# Performance optimization: Use unified image processing pipeline
# This handles validation, resizing (1024px), and EXIF stripping in one pass.

# Validate file type
if not image.content_type.startswith('image/'):
raise HTTPException(status_code=400, detail=f"File must be an image, got {image.content_type}")

# Validate filename is present
# 1. Fast-fail: Validate filename and extension
if not image.filename:
raise HTTPException(status_code=400, detail="File must have a filename")

# Validate extension
extension = image.filename.split('.')[-1].lower() if '.' in image.filename else ''

extension = image.filename.split('.')[-1].lower() if '.' in image.filename else 'jpg'

P2: Defaulting files without an extension to .jpg can store non-JPEG bytes under a JPEG filename, causing incorrect content-type handling for uploaded visit images.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 316:

<comment>Defaulting files without an extension to `.jpg` can store non-JPEG bytes under a JPEG filename, causing incorrect content-type handling for uploaded visit images.</comment>

<file context>
@@ -303,42 +306,41 @@ async def upload_visit_images(
-            # Validate extension
-            extension = image.filename.split('.')[-1].lower() if '.' in image.filename else ''
+
+            extension = image.filename.split('.')[-1].lower() if '.' in image.filename else 'jpg'
             if extension not in ALLOWED_IMAGE_EXTENSIONS:
                 raise HTTPException(
</file context>
Suggested change
- extension = image.filename.split('.')[-1].lower() if '.' in image.filename else 'jpg'
+ extension = image.filename.split('.')[-1].lower() if '.' in image.filename else ''

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Extensionless files silently pass the extension check and may produce a format/extension mismatch.

When `image.filename` contains no `.`, `extension` defaults to `'jpg'`, which is in `ALLOWED_IMAGE_EXTENSIONS`, so the check is bypassed entirely. Downstream, `safe_filename` gets a `.jpg` suffix, but `process_uploaded_image_sync` preserves the PIL-detected `original_format` (e.g., PNG for an RGBA image), so the saved bytes can be a PNG inside a `.jpg` filename: a content/extension mismatch.

🐛 Proposed fix
-            extension = image.filename.split('.')[-1].lower() if '.' in image.filename else 'jpg'
-            if extension not in ALLOWED_IMAGE_EXTENSIONS:
-                raise HTTPException(
-                    status_code=400,
-                    detail=f"File extension '{extension}' not allowed."
-                )
+            if '.' not in image.filename:
+                raise HTTPException(
+                    status_code=400,
+                    detail="File must have a valid image extension."
+                )
+            extension = image.filename.rsplit('.', 1)[-1].lower()
+            if extension not in ALLOWED_IMAGE_EXTENSIONS:
+                raise HTTPException(
+                    status_code=400,
+                    detail=f"File extension '{extension}' not allowed."
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/field_officer.py` at line 316, The current logic treats files
with no dot in image.filename as having extension 'jpg', allowing extensionless
uploads to bypass ALLOWED_IMAGE_EXTENSIONS and causing content/extension
mismatches; modify the validation in the handler that computes extension (the
line using image.filename.split) to reject filenames without an explicit
extension (or treat them as invalid) instead of defaulting to 'jpg', and/or
consult process_uploaded_image_sync's detected original_format to derive a
canonical extension before constructing safe_filename so saved filename
extension matches the actual image format; update the check that references
ALLOWED_IMAGE_EXTENSIONS and the code path that builds safe_filename to use the
detected PIL format (via process_uploaded_image_sync or its return) or to fail
early for extensionless filenames.
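
The fail-early option can be sketched as a small standalone helper. This is an illustration only: `validated_extension` is a hypothetical name, the contents of `ALLOWED_IMAGE_EXTENSIONS` are assumed, and `ValueError` stands in for the `HTTPException(status_code=400, ...)` the handler would raise.

```python
# Assumed values; the project's real ALLOWED_IMAGE_EXTENSIONS may differ.
ALLOWED_IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "webp"}


def validated_extension(filename: str) -> str:
    """Return the lowercased extension, or raise ValueError if it is missing or not allowed."""
    # Reject extensionless names instead of silently defaulting to 'jpg'.
    if "." not in filename:
        raise ValueError("File must have a valid image extension.")
    # rsplit with maxsplit=1 takes only the text after the last dot.
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in ALLOWED_IMAGE_EXTENSIONS:
        raise ValueError(f"File extension '{extension}' not allowed.")
    return extension
```

With this shape, `validated_extension("photo.PNG")` returns `"png"`, while `"photo"`, `"photo."`, and `"run.exe"` all raise. It does not address the second option in the comment, deriving the extension from the format PIL detects.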

if extension not in ALLOWED_IMAGE_EXTENSIONS:
raise HTTPException(
status_code=400,
detail=f"File extension '{extension}' not allowed. Allowed: {', '.join(ALLOWED_IMAGE_EXTENSIONS)}"
detail=f"File extension '{extension}' not allowed."
Comment on lines +316 to +320
)
Comment on lines 317 to 321

# Read and validate file size
content = await image.read()
if len(content) > MAX_UPLOAD_SIZE:
raise HTTPException(

# 2. Fast-fail: Validate file size (10MB limit for field officer visits)
# Must check explicitly because process_uploaded_image uses a 20MB default.
image.file.seek(0, 2)
size = image.file.tell()
image.file.seek(0)
Comment on lines +325 to +327

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and read the relevant file section
fd "field_officer.py" backend --exec wc -l {} \;

Repository: RohanExploit/VishwaGuru

Length of output: 104


🏁 Script executed:

# Read the actual code around lines 325-327
sed -n '310,340p' backend/routers/field_officer.py

Repository: RohanExploit/VishwaGuru

Length of output: 1575


🏁 Script executed:

# Check for the bolt.md documentation
fd "bolt.md" -o find . -name "bolt.md" -type f

Repository: RohanExploit/VishwaGuru

Length of output: 195


🏁 Script executed:

# Search for process_uploaded_image_sync
rg "process_uploaded_image_sync" -A 10 -B 2

Repository: RohanExploit/VishwaGuru

Length of output: 1404


🏁 Script executed:

# Verify UploadFile API and understand the implementation
rg "class.*UploadFile" --type python -A 15

Repository: RohanExploit/VishwaGuru

Length of output: 98


🏁 Script executed:

# Check the entire context of the endpoint to understand the flow
sed -n '280,350p' backend/routers/field_officer.py

Repository: RohanExploit/VishwaGuru

Length of output: 3193


🏁 Script executed:

# Search for bolt.md file
find . -type f -name "*bolt*"

Repository: RohanExploit/VishwaGuru

Length of output: 84


🏁 Script executed:

# Check for constants like MAX_UPLOAD_SIZE and related imports
rg "MAX_UPLOAD_SIZE|MAX_FILE_SIZE" -B 2 -A 2

Repository: RohanExploit/VishwaGuru

Length of output: 3362


🏁 Script executed:

# Look at imports in field_officer.py to understand the context
head -50 backend/routers/field_officer.py

Repository: RohanExploit/VishwaGuru

Length of output: 1714


🏁 Script executed:

# Verify the HTTP status code for payload too large (413)
rg "413|payload.*large" -i

Repository: RohanExploit/VishwaGuru

Length of output: 33936


🏁 Script executed:

# Read bolt.md to verify the documented pattern
cat ./.jules/bolt.md

Repository: RohanExploit/VishwaGuru

Length of output: 12072


🏁 Script executed:

# Check the full context of process_uploaded_image
sed -n '1,100p' backend/utils.py | head -50

Repository: RohanExploit/VishwaGuru

Length of output: 1291


🏁 Script executed:

# Look at how UploadFile is used and check if it has a .size attribute
rg "UploadFile" -B 2 -A 10 | head -80

Repository: RohanExploit/VishwaGuru

Length of output: 4427


🏁 Script executed:

# Verify if UploadFile has a .size attribute by checking FastAPI versions and documentation
rg "\.size" backend --type python -B 2 -A 2 | grep -i upload

Repository: RohanExploit/VishwaGuru

Length of output: 98


🏁 Script executed:

# Check if there are any uses of UploadFile.size in the codebase
rg "\.size\s*=|image\.size|file\.size" backend -A 2 -B 2 | head -50

Repository: RohanExploit/VishwaGuru

Length of output: 1077


🏁 Script executed:

# Verify the endpoint is async and confirm the seeking happens directly without threadpool
sed -n '280,340p' backend/routers/field_officer.py | grep -E "async|def|await|run_in_threadpool|seek|tell"

Repository: RohanExploit/VishwaGuru

Length of output: 322


🏁 Script executed:

# Check FastAPI documentation references and how UploadFile.size is used
rg "UploadFile.size|file.size" . --type-list -A 1 -B 1 | head -40

Repository: RohanExploit/VishwaGuru

Length of output: 457


🏁 Script executed:

# Verify if there are any imports of UploadFile in the field_officer.py file
head -30 backend/routers/field_officer.py | grep -i upload

Repository: RohanExploit/VishwaGuru

Length of output: 184


🏁 Script executed:

# Check the exact function signature of process_uploaded_image
rg "async def process_uploaded_image" -A 5

Repository: RohanExploit/VishwaGuru

Length of output: 426


Synchronous `image.file.seek()` calls block the event loop, directly contradicting this project's documented pattern.

`image.file.seek(0, 2)`, `image.file.tell()`, and `image.file.seek(0)` are called directly on the underlying `SpooledTemporaryFile`, bypassing the async wrapper. FastAPI's `UploadFile.seek(offset)` is the awaitable version that runs in a threadpool; calling `.file.seek()` directly is synchronous. The `SpooledTemporaryFile` spills to disk once it exceeds its `max_size`, meaning that for virtually all real image uploads the seek is a blocking disk syscall. This directly contradicts the learning documented in `.jules/bolt.md` (2025-02-27): "UploadFile validation using python-magic and file seeking is synchronous and CPU/IO bound… Wrap file validation logic in run_in_threadpool."

Note also that `process_uploaded_image_sync` (via `run_in_threadpool`) already performs the same seek/tell for its own size check, making this a redundant blocking call.

The cleanest fixes (in order of preference):

  1. Use `UploadFile.size` if it is populated (recent Starlette versions track it while parsing the multipart body); fall back to the seek in a threadpool if it is `None`.
  2. Pass a `max_size` parameter to `process_uploaded_image_sync` so the stricter 10 MB limit is enforced inside the threadpool where the seek already happens.
  3. Wrap the seek in `run_in_threadpool` with a small sync helper.

Also use HTTP status 413 (Payload Too Large) instead of 400, consistent with RFC 7231 and this project's own `utils.py`.

♻️ Option 1 — use UploadFile.size with threadpool fallback
-            # 2. Fast-fail: Validate file size (10MB limit for field officer visits)
-            # Must check explicitly because process_uploaded_image uses a 20MB default.
-            image.file.seek(0, 2)
-            size = image.file.tell()
-            image.file.seek(0)
-            if size > MAX_UPLOAD_SIZE:
-                 raise HTTPException(
-                    status_code=400,
-                    detail=f"File exceeds maximum size of {MAX_UPLOAD_SIZE / 1024 / 1024:.1f} MB"
-                )
+            # 2. Fast-fail: Validate file size (10MB limit for field officer visits)
+            # Must check explicitly because process_uploaded_image uses a 20MB default.
+            # Use UploadFile.size when available; otherwise measure in threadpool to avoid
+            # blocking the event loop with a synchronous seek on a disk-spooled temp file.
+            size = image.size
+            if size is None:
+                def _measure_size(f):
+                    f.seek(0, 2)
+                    s = f.tell()
+                    f.seek(0)
+                    return s
+                size = await run_in_threadpool(_measure_size, image.file)
+            if size > MAX_UPLOAD_SIZE:
+                raise HTTPException(
+                    status_code=413,
+                    detail=f"File exceeds maximum size of {MAX_UPLOAD_SIZE / 1024 / 1024:.1f} MB"
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/field_officer.py` around lines 325 - 327, The code
synchronously calls image.file.seek/tell which blocks the event loop and is
redundant because process_uploaded_image_sync already does this inside
run_in_threadpool; change the pre-check to use UploadFile.size when available
and only fall back to seeking inside run_in_threadpool (or pass max_size into
process_uploaded_image_sync so the size check happens in that threadpool task),
remove direct uses of image.file.seek/tell in this async handler, and return
HTTP 413 (Payload Too Large) instead of 400 when the size exceeds the limit;
reference image.file.seek/tell, UploadFile.size, process_uploaded_image_sync,
and run_in_threadpool in your changes.
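
A standalone sketch of the "known size first, threadpool fallback" idea follows. It is an illustration under stated assumptions, not the handler's code: `asyncio.to_thread` stands in for `run_in_threadpool`, the helper names are hypothetical, `reported_size` plays the role of `UploadFile.size`, and the 10 MB value for `MAX_UPLOAD_SIZE` is taken from the comment in the diff.

```python
import asyncio
import tempfile
from typing import BinaryIO, Optional

MAX_UPLOAD_SIZE = 10 * 1024 * 1024  # assumed: the 10 MB limit named in the diff


def measure_size(f: BinaryIO) -> int:
    """Blocking helper: seek to the end, read the offset, rewind."""
    f.seek(0, 2)
    size = f.tell()
    f.seek(0)
    return size


async def upload_size(f: BinaryIO, reported_size: Optional[int]) -> int:
    # Prefer a size that is already known; otherwise measure off the event loop.
    if reported_size is not None:
        return reported_size
    return await asyncio.to_thread(measure_size, f)


def status_for(size: int) -> int:
    # 413 signals an oversized payload; 200 stands in for "accepted".
    return 413 if size > MAX_UPLOAD_SIZE else 200


spooled = tempfile.SpooledTemporaryFile(max_size=1024)
spooled.write(b"x" * 4096)  # exceeds max_size, so the file rolls over to disk
measured = asyncio.run(upload_size(spooled, None))
```

The measured size here is 4096 bytes and the file position is back at 0 afterwards, so later processing can read the file from the start.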

if size > MAX_UPLOAD_SIZE:
raise HTTPException(
status_code=400,
detail=f"File {image.filename} exceeds maximum size of {MAX_UPLOAD_SIZE / 1024 / 1024:.1f} MB"
detail=f"File exceeds maximum size of {MAX_UPLOAD_SIZE / 1024 / 1024:.1f} MB"
)
Comment on lines +329 to 332

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use 413 (Content Too Large) not 400 for the size-limit rejection.

`400 Bad Request` is semantically incorrect for an oversized payload. The correct status is 413, which is what `process_uploaded_image_sync` already uses for its equivalent check. Using 400 here creates an inconsistency that can confuse client-side retry logic and reverse-proxy middleware.

🐛 Proposed fix
-                 raise HTTPException(
-                    status_code=400,
-                    detail=f"File exceeds maximum size of {MAX_UPLOAD_SIZE / 1024 / 1024:.1f} MB"
+                raise HTTPException(
+                    status_code=413,
+                    detail=f"File exceeds maximum size of {MAX_UPLOAD_SIZE / 1024 / 1024:.1f} MB"
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/field_officer.py` around lines 329 - 332, The HTTP error
raised for oversized uploads in the field_officer upload path currently uses
status_code=400; change it to status_code=413 (Content Too Large) to match the
semantics and the existing check in process_uploaded_image_sync. Locate the
raise HTTPException call that references MAX_UPLOAD_SIZE and replace the status
code with 413 while keeping the existing detail message intact so client and
proxy logic remains consistent.



# 3. Process image (decode, resize, strip, encode)
_, image_bytes = await process_uploaded_image(image)

# Generate secure filename
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
safe_filename = f"visit_{visit_id}_{timestamp}_{idx}.{extension}"
file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)

# Save file
with open(file_path, 'wb') as f:
f.write(content)
# Save file using threadpool to avoid blocking the main event loop
await run_in_threadpool(save_processed_image, image_bytes, file_path)

# Store relative path
relative_path = os.path.join("data", "visit_images", safe_filename)