4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -81,3 +81,7 @@
 ## 2026-05-15 - Serialization Caching Bypass
 **Learning:** Caching raw Python objects (like SQLAlchemy models or Pydantic instances) in a high-traffic API still incurs significant overhead because FastAPI/Pydantic must re-serialize the data on every request.
 **Action:** Serialize data to a JSON string using `json.dumps()` BEFORE caching. On cache hits, return a raw `fastapi.Response(content=..., media_type="application/json")`. This bypasses the validation and serialization layer, resulting in significant performance gains (up to 50x in benchmarks).
+
+## 2026-06-12 - Threadpool Offloading for Tail Latency
+**Learning:** Mixed I/O operations (Database and File System) in FastAPI `async def` endpoints block the event loop, causing severe tail latency spikes under concurrency. Explicitly offloading these to `run_in_threadpool` is essential for maintaining responsiveness.
+**Action:** Wrap all synchronous DB and File I/O operations in `run_in_threadpool`. For purely blocking background tasks, use standard `def` instead of `async def` to leverage FastAPI's automatic threadpool execution.
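
As an illustration of the serialization-caching entry above, here is a minimal sketch of the pattern; the in-memory dict cache, the route, and the placeholder query are hypothetical stand-ins, not code from this PR:

import json
from fastapi import FastAPI, Response

app = FastAPI()
_cache: dict = {}  # hypothetical stand-in for the real cache backend

@app.get("/items")
async def list_items():
    cached = _cache.get("items")
    if cached is not None:
        # Cache hit: return the pre-serialized JSON string as-is,
        # bypassing Pydantic validation and re-serialization entirely.
        return Response(content=cached, media_type="application/json")
    items = [{"id": 1, "name": "example"}]  # placeholder for the real DB query
    payload = json.dumps(items)  # serialize BEFORE caching, per the entry above
    _cache["items"] = payload
    return Response(content=payload, media_type="application/json")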
Comment on lines +85 to +87
⚠️ Potential issue | 🟡 Minor

Use the actual PR date for this learning entry.

Line 85 is dated 2026-06-12, which is in the future relative to this PR's date of April 22, 2026.

πŸ“ Proposed fix
-## 2026-06-12 - Threadpool Offloading for Tail Latency
+## 2026-04-22 - Threadpool Offloading for Tail Latency
πŸ“ Committable suggestion


Suggested change
-## 2026-06-12 - Threadpool Offloading for Tail Latency
+## 2026-04-22 - Threadpool Offloading for Tail Latency
 **Learning:** Mixed I/O operations (Database and File System) in FastAPI `async def` endpoints block the event loop, causing severe tail latency spikes under concurrency. Explicitly offloading these to `run_in_threadpool` is essential for maintaining responsiveness.
 **Action:** Wrap all synchronous DB and File I/O operations in `run_in_threadpool`. For purely blocking background tasks, use standard `def` instead of `async def` to leverage FastAPI's automatic threadpool execution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.jules/bolt.md around lines 85-87: the learning entry titled "Threadpool
Offloading for Tail Latency" currently uses the future date "2026-06-12"; update
that header date to the actual PR date ("2026-04-22") so the entry reflects the
correct timeline, leaving the rest of the content (references to FastAPI,
run_in_threadpool, and the guidance about async def vs def) unchanged.

17 changes: 13 additions & 4 deletions backend/routers/field_officer.py
@@ -5,6 +5,7 @@
"""

from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form, Response
from fastapi.concurrency import run_in_threadpool
from sqlalchemy.orm import Session
from sqlalchemy import func, case
from typing import List, Optional
@@ -281,7 +282,10 @@ async def upload_visit_images(
     Maximum 10 images per visit
     """
     try:
-        visit = db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
+        # Performance Optimization: Wrap blocking DB query in threadpool
+        visit = await run_in_threadpool(
+            lambda: db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
+        )

         if not visit:
             raise HTTPException(status_code=404, detail=f"Visit {visit_id} not found")
@cubic-dev-ai (Bot) Apr 22, 2026


P1: SQLAlchemy Session/ORM object visit is used across thread boundaries: queried in a threadpool worker here, mutated on the event loop (lines 357–358), then committed in yet another threadpool dispatch. SQLAlchemy Sessions are not thread-safe and ORM instances are bound to the session that loaded them. This can cause DetachedInstanceError, state corruption, or silent data loss.

Extract a single synchronous helper that performs the query, mutation, and commit atomically within one thread, and call it via run_in_threadpool.

Prompt for AI agents
Check if this issue is valid; if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 286:

<comment>SQLAlchemy Session/ORM object `visit` is used across thread boundaries: queried in a threadpool worker here, mutated on the event loop (lines 357–358), then committed in yet another threadpool dispatch. SQLAlchemy Sessions are not thread-safe and ORM instances are bound to the session that loaded them. This can cause `DetachedInstanceError`, state corruption, or silent data loss.

Extract a single synchronous helper that performs the query, mutation, and commit atomically within one thread, and call it via `run_in_threadpool`.</comment>

<file context>
@@ -281,7 +282,10 @@ async def upload_visit_images(
     try:
-        visit = db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
+        # Performance Optimization: Wrap blocking DB query in threadpool
+        visit = await run_in_threadpool(
+            lambda: db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
+        )
</file context>

Comment on lines +285 to +288

Copilot AI Apr 22, 2026


This loads a full FieldOfficerVisit ORM object in a worker thread and then uses/mutates it in the async context. SQLAlchemy ORM instances (and their associated Session) are not thread-safe across thread boundaries. Prefer moving all DB interaction for this request into a single run_in_threadpool call (or a sync helper) that queries/updates/commits using its own Session, and have the async handler only do file reads/writes and pass primitives (visit_id, image_paths) into that helper.
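
One possible shape for that helper, as a sketch only; the function name is hypothetical, and the `SessionLocal`/model imports are assumptions about this repository's layout:

from datetime import datetime, timezone
from backend.database import SessionLocal  # assumed session factory
from backend.models import FieldOfficerVisit  # assumed model location

def _append_visit_images_sync(visit_id: int, image_paths: list):
    """Query, mutate, and commit inside one worker thread with its own Session."""
    db = SessionLocal()
    try:
        visit = db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
        if visit is None:
            return None
        # Build a new list so the JSON column change is detected on assignment
        visit.visit_images = (visit.visit_images or []) + image_paths
        visit.updated_at = datetime.now(timezone.utc)
        db.commit()
        return visit.id  # return primitives only; the ORM object stays in this thread
    finally:
        db.close()

# In the async endpoint, after the file writes:
# updated_id = await run_in_threadpool(_append_visit_images_sync, visit_id, image_paths)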

Comment on lines +285 to +288

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect database setup and field officer DB threadpool boundaries.

echo "## backend/database.py"
fd -a '^database\.py$' backend | xargs -r sed -n '1,220p'

echo
echo "## field_officer.py DB offloading around upload_visit_images"
rg -n -C5 'upload_visit_images|run_in_threadpool\(|db\.query|db\.commit|visit\.visit_images' backend/routers/field_officer.py

Repository: RohanExploit/VishwaGuru



Refactor visit image update into a single synchronous database operation.

The upload_visit_images endpoint splits the database session across three different execution contexts:

  1. Lines 286–288: the visit is queried in a threadpool worker
  2. Lines 357–358: the ORM object is mutated on the event loop (the visit.visit_images assignment)
  3. Line 361: changes are committed in another threadpool worker

SQLAlchemy sessions are not thread-safe and are not designed to have object state modified across thread boundaries. Using the visit object from threads other than the one that loaded it risks detached-instance errors and state inconsistency.

Create a synchronous helper function that bundles the query, object mutation, and commit into a single database operation within one thread:

def _update_visit_with_images(visit_id: int, image_paths: list, db: Session):
    visit = db.query(FieldOfficerVisit).filter(FieldOfficerVisit.id == visit_id).first()
    if visit:
        existing = visit.visit_images or []
        existing.extend(image_paths)
        visit.visit_images = existing
        visit.updated_at = datetime.now(timezone.utc)
        db.commit()
    return visit

# In upload_visit_images, after file I/O completes:
await run_in_threadpool(_update_visit_with_images, visit_id, image_paths, db)

This keeps all session-bound operations atomic within a single thread.

Also applies to lines 355–361.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/field_officer.py` around lines 285-288: the current
upload_visit_images flow splits DB work across threads causing session
detachment; create a synchronous helper (e.g.
_update_visit_with_images(visit_id: int, image_paths: list, db: Session)) that
does the query, mutates visit.visit_images and visit.updated_at, and calls
db.commit() all inside the same thread, then invoke it via await
run_in_threadpool(_update_visit_with_images, visit_id, image_paths, db) after
file I/O completes; update references in upload_visit_images (replace the
separate threadpool query, on-event-loop mutation of visit.visit_images, and
separate commit) so the session-bound operations for FieldOfficerVisit are
performed atomically in that helper.


@@ -337,8 +341,12 @@ async def upload_visit_images(
         file_path = os.path.join(VISIT_IMAGES_DIR, safe_filename)

         # Save file
-        with open(file_path, 'wb') as f:
-            f.write(content)
+        # Performance Optimization: Wrap blocking File I/O in threadpool
+        def _save_image(p, c):
+            with open(p, 'wb') as f:
+                f.write(c)
+
+        await run_in_threadpool(_save_image, file_path, content)

         # Store relative path
         relative_path = os.path.join("data", "visit_images", safe_filename)
@@ -349,7 +357,8 @@
         visit.visit_images = existing_images
         visit.updated_at = datetime.now(timezone.utc)

-        db.commit()
+        # Performance Optimization: Wrap blocking DB commit in threadpool
+        await run_in_threadpool(db.commit)

         logger.info(f"Uploaded {len(images)} images for visit {visit_id}")
Comment on lines 357 to +361

Copilot AI Apr 22, 2026


db.commit is offloaded to the threadpool, but the visit ORM instance being committed was created/modified outside the threadpool (and may have been loaded in a different worker thread). This mixes Session/ORM usage across threads, which SQLAlchemy does not support. Do the load+mutation+commit in the same thread (single threadpool helper) or use an UPDATE ... query in the threadpool with a fresh Session, rather than committing an ORM instance created in another thread.
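
For the UPDATE-query alternative mentioned above, a rough sketch under the assumption that `SessionLocal` is the app's session factory and `visit_images` is a JSON list column; the helper name and structure are illustrative, not code from this PR:

from sqlalchemy import select, update
from backend.database import SessionLocal  # assumed session factory
from backend.models import FieldOfficerVisit  # assumed model location

def _append_images_via_update(visit_id: int, image_paths: list) -> bool:
    """Read-modify-write in one fresh Session so no ORM state crosses threads."""
    db = SessionLocal()
    try:
        current = db.execute(
            select(FieldOfficerVisit.visit_images)
            .where(FieldOfficerVisit.id == visit_id)
        ).scalar_one_or_none() or []
        result = db.execute(
            update(FieldOfficerVisit)
            .where(FieldOfficerVisit.id == visit_id)
            .values(visit_images=current + image_paths)
        )
        db.commit()
        return result.rowcount > 0  # False when the visit does not exist
    finally:
        db.close()

# In the async handler: ok = await run_in_threadpool(_append_images_via_update, visit_id, image_paths)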



12 changes: 6 additions & 6 deletions backend/routers/voice.py
@@ -265,8 +265,10 @@ def _save_audio_file():
         prev_hash = blockchain_last_hash_cache.get("last_hash")
         if prev_hash is None:
             # Cache miss: Fetch only the last hash from DB
-            # Use await run_in_threadpool for DB query if needed, or just do it in-thread
-            prev_issue = db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
+            # Performance Optimization: Wrap blocking DB query in threadpool
+            prev_issue = await run_in_threadpool(
+                lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
+            )
             prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
             blockchain_last_hash_cache.set(data=prev_hash, key="last_hash")

@@ -300,10 +302,8 @@ def _save_audio_file():
             audio_file_path=relative_audio_path # Store relative path
         )

-        # Standard synchronous DB operations for simplicity and thread-safety
-        db.add(new_issue)
-        db.commit()
-        db.refresh(new_issue)
+        # Performance Optimization: Wrap blocking DB operations in threadpool to keep event loop responsive
+        await run_in_threadpool(save_issue_db, db, new_issue)

         # Update cache for next report AFTER successful DB commit
         blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash")
@cubic-dev-ai (Bot) Apr 22, 2026


P1: Do not pass the same SQLAlchemy Session instance into run_in_threadpool; use a session created inside the worker thread (or an async session) to avoid cross-thread session usage.

Prompt for AI agents
Check if this issue is valid; if so, understand the root cause and fix it. At backend/routers/voice.py, line 306:

<comment>Do not pass the same SQLAlchemy Session instance into `run_in_threadpool`; use a session created inside the worker thread (or an async session) to avoid cross-thread session usage.</comment>

<file context>
@@ -300,10 +302,8 @@ def _save_audio_file():
-        db.commit()
-        db.refresh(new_issue)
+        # Performance Optimization: Wrap blocking DB operations in threadpool to keep event loop responsive
+        await run_in_threadpool(save_issue_db, db, new_issue)
 
         # Update cache for next report AFTER successful DB commit
</file context>


Comment on lines +305 to 309

Copilot AI Apr 22, 2026


save_issue_db is run in a worker thread using the request-scoped db Session and the new_issue ORM instance, but the async context reads new_issue.id afterwards. Passing SQLAlchemy Sessions/ORM instances across threads is not supported and can break under different DB drivers or when lazy attributes are accessed. Prefer a threadpool helper that creates its own Session and returns the new issue ID (and any other primitives) to the async handler.
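
Concretely, the helper both reviewers describe might be sketched as follows; the function name and the `issue_fields` dict are hypothetical, and the `SessionLocal`/`Issue` imports are assumptions about the surrounding code:

from backend.database import SessionLocal  # assumed session factory
from backend.models import Issue

def _persist_issue_sync(issue_fields: dict) -> int:
    """Create, commit, and refresh the Issue inside one worker thread."""
    db = SessionLocal()
    try:
        issue = Issue(**issue_fields)
        db.add(issue)
        db.commit()
        db.refresh(issue)
        return issue.id  # hand only a primitive back to the async context
    finally:
        db.close()

# In the async handler: new_issue_id = await run_in_threadpool(_persist_issue_sync, fields)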

17 changes: 13 additions & 4 deletions backend/tasks.py
@@ -1,6 +1,7 @@
 import logging
 import json
 import os
+from fastapi.concurrency import run_in_threadpool
 from pywebpush import webpush, WebPushException
 from backend.database import SessionLocal
 from backend.models import Issue, PushSubscription
@@ -18,11 +19,14 @@ async def process_action_plan_background(issue_id: int, description: str, catego
         action_plan = await generate_action_plan(description, category, language, image_path)

         # Update issue in DB
-        issue = db.query(Issue).filter(Issue.id == issue_id).first()
+        # Performance Optimization: Wrap blocking DB operations in threadpool
+        issue = await run_in_threadpool(
+            lambda: db.query(Issue).filter(Issue.id == issue_id).first()
+        )
         if issue:
             current_plan = issue.action_plan or {}
             issue.action_plan = {**current_plan, **action_plan}
-            db.commit()
+            await run_in_threadpool(db.commit)

             # Invalidate cache to ensure users get the updated action plan
             recent_issues_cache.clear()

@cubic-dev-ai (Bot) Apr 22, 2026


P1: The SessionLocal() is created on the event loop, the query is dispatched to a threadpool, the ORM issue object is mutated back on the event loop, and then db.commit is dispatched to yet another threadpool call. This splits a single SQLAlchemy Session across multiple threads, violating its thread-affinity contract.

Extract all DB work (query + mutation + commit + close) into a single synchronous helper and invoke it once via run_in_threadpool, returning only primitive data to the async context.

Prompt for AI agents
Check if this issue is valid; if so, understand the root cause and fix it. At backend/tasks.py, line 23:

<comment>The `SessionLocal()` is created on the event loop, the query is dispatched to a threadpool, the ORM `issue` object is mutated back on the event loop, and then `db.commit` is dispatched to yet another threadpool call. This splits a single SQLAlchemy Session across multiple threads, violating its thread-affinity contract.

Extract all DB work (query + mutation + commit + close) into a single synchronous helper and invoke it once via `run_in_threadpool`, returning only primitive data to the async context.</comment>

<file context>
@@ -18,11 +19,14 @@ async def process_action_plan_background(issue_id: int, description: str, catego
         # Update issue in DB
-        issue = db.query(Issue).filter(Issue.id == issue_id).first()
+        # Performance Optimization: Wrap blocking DB operations in threadpool
+        issue = await run_in_threadpool(
+            lambda: db.query(Issue).filter(Issue.id == issue_id).first()
+        )
</file context>

Comment on lines +22 to +29

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find async functions that create/use SessionLocal and offload only individual DB operations.

rg -n -C4 'async def|SessionLocal\(|run_in_threadpool\(|db\.commit|db\.close' backend/tasks.py

Repository: RohanExploit/VishwaGuru



Move the entire SQLAlchemy update into one threadpool helper.

The current code creates SessionLocal() on the event loop, queries in a threadpool, mutates the ORM object back on the event loop, commits in a threadpool, and closes synchronously. This violates SQLAlchemy's thread-affinity requirements. Keep the full session lifecycle on a single thread by extracting the database operation into a synchronous helper.

♻️ Proposed refactor
+def _merge_action_plan_into_issue(issue_id: int, action_plan: dict) -> bool:
+    db = SessionLocal()
+    try:
+        issue = db.query(Issue).filter(Issue.id == issue_id).first()
+        if not issue:
+            return False
+
+        current_plan = issue.action_plan or {}
+        issue.action_plan = {**current_plan, **action_plan}
+        db.commit()
+        return True
+    finally:
+        db.close()
+
 async def process_action_plan_background(issue_id: int, description: str, category: str, language: str, image_path: str):
-    db = SessionLocal()
     try:
         # Generate Action Plan (AI)
         action_plan = await generate_action_plan(description, category, language, image_path)
 
         # Update issue in DB
-        # Performance Optimization: Wrap blocking DB operations in threadpool
-        issue = await run_in_threadpool(
-            lambda: db.query(Issue).filter(Issue.id == issue_id).first()
-        )
-        if issue:
-            current_plan = issue.action_plan or {}
-            issue.action_plan = {**current_plan, **action_plan}
-            await run_in_threadpool(db.commit)
+        updated = await run_in_threadpool(_merge_action_plan_into_issue, issue_id, action_plan)
+        if updated:
 
             # Invalidate cache to ensure users get the updated action plan
             recent_issues_cache.clear()
     except Exception as e:
         logger.error(f"Background action plan generation failed for issue {issue_id}: {e}", exc_info=True)
-    finally:
-        db.close()
πŸ“ Committable suggestion


Suggested change
-        # Performance Optimization: Wrap blocking DB operations in threadpool
-        issue = await run_in_threadpool(
-            lambda: db.query(Issue).filter(Issue.id == issue_id).first()
-        )
-        if issue:
-            current_plan = issue.action_plan or {}
-            issue.action_plan = {**current_plan, **action_plan}
-            db.commit()
-            await run_in_threadpool(db.commit)
+def _merge_action_plan_into_issue(issue_id: int, action_plan: dict) -> bool:
+    db = SessionLocal()
+    try:
+        issue = db.query(Issue).filter(Issue.id == issue_id).first()
+        if not issue:
+            return False
+        current_plan = issue.action_plan or {}
+        issue.action_plan = {**current_plan, **action_plan}
+        db.commit()
+        return True
+    finally:
+        db.close()
+
+async def process_action_plan_background(issue_id: int, description: str, category: str, language: str, image_path: str):
+    try:
+        # Generate Action Plan (AI)
+        action_plan = await generate_action_plan(description, category, language, image_path)
+
+        # Update issue in DB
+        updated = await run_in_threadpool(_merge_action_plan_into_issue, issue_id, action_plan)
+        if updated:
+            # Invalidate cache to ensure users get the updated action plan
+            recent_issues_cache.clear()
+    except Exception as e:
+        logger.error(f"Background action plan generation failed for issue {issue_id}: {e}", exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks.py` around lines 22-29: the current code mixes async and sync
SQLAlchemy operations; wrap the entire DB lifecycle and update in one
synchronous helper so the Session never crosses threads: create a function
(e.g., _update_issue_action_plan(issue_id, action_plan)) that instantiates the
session, queries Issue by id, merges/sets issue.action_plan =
{**(issue.action_plan or {}), **action_plan}, commits, and always closes the
session in a finally block, then call it via await
run_in_threadpool(_update_issue_action_plan, issue_id, action_plan) instead of
calling query/commit/close separately on the event loop.


Comment on lines +22 to 30

Copilot AI Apr 22, 2026


db (a SQLAlchemy Session) is created on the event-loop thread but then used inside run_in_threadpool, and the returned Issue ORM instance is subsequently mutated outside the threadpool. SQLAlchemy Sessions/ORM instances are not thread-safe, and crossing thread boundaries like this can lead to subtle corruption or lazy-load failures. Keep the entire DB unit-of-work (load + update + commit) inside a single threadpool function that creates/uses/closes its own SessionLocal, and only return primitive data back to the async context.

@@ -31,8 +35,13 @@ async def process_action_plan_background(issue_id: int, description: str, catego
     finally:
         db.close()

-async def create_grievance_from_issue_background(issue_id: int):
-    """Background task to create a grievance from an issue for escalation management"""
+def create_grievance_from_issue_background(issue_id: int):
+    """
+    Background task to create a grievance from an issue for escalation management.
+    Performance Optimization: Changed to synchronous function since it only performs
+    blocking DB operations, allowing FastAPI to run it in a threadpool automatically
+    when added via BackgroundTasks.
+    """
     db = SessionLocal()
     try:
         # Get the issue
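
The docstring's claim matches documented Starlette behavior: BackgroundTasks awaits `async def` tasks on the event loop but dispatches plain `def` tasks to a worker thread via `run_in_threadpool`. A minimal illustration; the route and task body here are made up, not from this PR:

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def sync_task(issue_id: int) -> None:
    # A plain `def` task: Starlette runs it in a worker thread after the
    # response is sent, so blocking DB work here cannot stall the event loop.
    ...

@app.post("/issues/{issue_id}/escalate")
async def escalate_issue(issue_id: int, background_tasks: BackgroundTasks):
    background_tasks.add_task(sync_task, issue_id)  # queued, runs post-response
    return {"status": "queued"}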