Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .jules/bolt.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,3 +85,7 @@
## 2026-05-16 - Pre-processing for RAG Retrieval
**Learning:** In RAG (Retrieval-Augmented Generation) systems with static or semi-static policy datasets, performing tokenization, regex substitution, and string formatting inside the retrieval loop is a significant bottleneck that scales with the number of policies.
**Action:** Move all deterministic operations (tokenization, formatting, regex matching prep) to a one-time initialization step to ensure the retrieval hot-path only performs necessary set intersections and similarity calculations.

## 2026-05-17 - Redundant Configuration I/O in Service Initialization
**Learning:** Instantiating services that read static configuration files from disk on every `__init__` call is a significant performance bottleneck, especially in high-traffic or frequent background task scenarios. This adds unnecessary latency and disk I/O on every request or task execution.
**Action:** Implement class-level or module-level caching for static configuration files to ensure they are only read and parsed once per process lifetime, significantly reducing initialization overhead.
11 changes: 9 additions & 2 deletions backend/grievance_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,15 +22,22 @@ class GrievanceService:
Main service for managing grievances, routing, and escalations.
"""

# Class-level cache to avoid redundant disk I/O when instantiating the service
_rules_cache = {}

def __init__(self, rules_config_path: str = "backend/grievance_rules.json"):
"""
Initialize the grievance service.

Args:
rules_config_path: Path to the rules configuration file
"""
with open(rules_config_path, 'r') as f:
self.rules_config = json.load(f)
# Optimized: Use class-level cache to avoid reading and parsing the JSON file repeatedly
if rules_config_path not in GrievanceService._rules_cache:
with open(rules_config_path, 'r') as f:
GrievanceService._rules_cache[rules_config_path] = json.load(f)
Comment on lines +25 to +38
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_rules_cache is a process-wide mutable dict that is populated via a check-then-set sequence without any synchronization. If this service can be instantiated concurrently (threads/background jobs), consider guarding cache population with a lock or using the existing ThreadSafeCache utility to avoid concurrent loads and future race-prone mutations.

Copilot uses AI. Check for mistakes.

self.rules_config = GrievanceService._rules_cache[rules_config_path]
Comment on lines +25 to +40
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Cached rules_config is shared by reference across all instances — risk of cross-instance mutation, plus address Ruff RUF012.

self.rules_config on every GrievanceService instance now points at the same dict object stored in _rules_cache. The references handed off to RoutingService (backend/routing_service.py:18-26) and EscalationEngine (backend/escalation_engine.py:24-36) are stored without defensive copies, so any in-place mutation (now or in the future — e.g., a downstream setdefault, .update(), or appending to a nested list) silently leaks into the cache and into every subsequent instance, including across tests. This is a subtle footgun introduced by the optimization that didn't exist when each instance owned its own parsed copy.

Additionally, Ruff RUF012 flags _rules_cache = {} as a mutable class default; a type annotation both silences the lint and documents intent.

Two reasonable fixes (pick one):

♻️ Option A — deep-copy on read (preserves isolation, parsed once)
-import json
+import copy
+import json
@@
-    # Class-level cache to avoid redundant disk I/O when instantiating the service
-    _rules_cache = {}
+    # Class-level cache of parsed rules JSON, keyed by config path, to avoid
+    # redundant disk I/O across GrievanceService instantiations.
+    _rules_cache: Dict[str, Dict[str, Any]] = {}
@@
-        # Optimized: Use class-level cache to avoid reading and parsing the JSON file repeatedly
-        if rules_config_path not in GrievanceService._rules_cache:
-            with open(rules_config_path, 'r') as f:
-                GrievanceService._rules_cache[rules_config_path] = json.load(f)
-
-        self.rules_config = GrievanceService._rules_cache[rules_config_path]
+        # Cache the parsed JSON once per path; deep-copy on read so per-instance
+        # mutations cannot leak into the shared cache or other instances.
+        if rules_config_path not in GrievanceService._rules_cache:
+            with open(rules_config_path, 'r', encoding='utf-8') as f:
+                GrievanceService._rules_cache[rules_config_path] = json.load(f)
+
+        self.rules_config = copy.deepcopy(GrievanceService._rules_cache[rules_config_path])
♻️ Option B — cache the raw JSON string and re-parse (simpler, still ~order-of-magnitude faster than disk I/O)
-    # Class-level cache to avoid redundant disk I/O when instantiating the service
-    _rules_cache = {}
+    # Class-level cache of raw JSON text, keyed by path, to avoid redundant disk I/O.
+    _rules_cache: Dict[str, str] = {}
@@
-        if rules_config_path not in GrievanceService._rules_cache:
-            with open(rules_config_path, 'r') as f:
-                GrievanceService._rules_cache[rules_config_path] = json.load(f)
-
-        self.rules_config = GrievanceService._rules_cache[rules_config_path]
+        if rules_config_path not in GrievanceService._rules_cache:
+            with open(rules_config_path, 'r', encoding='utf-8') as f:
+                GrievanceService._rules_cache[rules_config_path] = f.read()
+
+        self.rules_config = json.loads(GrievanceService._rules_cache[rules_config_path])

If you can guarantee that no downstream consumer ever mutates rules_config, the current code is safe — but that's an unwritten invariant that will be easy to break later. At minimum, please add the type annotation to address RUF012.

🧰 Tools
🪛 Ruff (0.15.11)

[warning] 26-26: Mutable default value for class attribute

(RUF012)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/grievance_service.py` around lines 25 - 40, GrievanceService
currently stores a mutable dict in the class-level _rules_cache and assigns it
directly to self.rules_config in __init__, which risks cross-instance mutations
(and triggers Ruff RUF012 for a mutable class default); fix by either (A)
storing a deep copy into self.rules_config when handing out the cached value
(use copy.deepcopy on GrievanceService._rules_cache[rules_config_path] before
assignment) so the cached parsed object is never mutated, or (B) cache the raw
JSON string in _rules_cache and json.loads it per-instance (i.e., cache the file
content string, then parse into a fresh dict for self.rules_config), and add a
type annotation for _rules_cache to silence RUF012; ensure callers like
RoutingService and EscalationEngine receive the per-instance dict, not the
shared reference.

Comment on lines +35 to +40
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class-level cache changes rules_config from per-instance to a shared mutable dict across all GrievanceService instances. To avoid cross-instance coupling (and accidental mutation impacting other requests), consider storing an immutable view in the cache (e.g., types.MappingProxyType) or returning a defensive copy per instance (e.g., copy.deepcopy(...)).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: The cached rules_config dict is now shared by reference across all GrievanceService instances. Any in-place mutation by a downstream consumer (e.g., RoutingService or EscalationEngine calling .update(), .setdefault(), or modifying a nested list) will silently leak into the cache and every subsequent instance. The original code gave each instance its own parsed copy. Use copy.deepcopy() on read or cache the raw JSON string and re-parse per instance to preserve isolation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/grievance_service.py, line 40:

<comment>The cached `rules_config` dict is now shared by reference across all `GrievanceService` instances. Any in-place mutation by a downstream consumer (e.g., `RoutingService` or `EscalationEngine` calling `.update()`, `.setdefault()`, or modifying a nested list) will silently leak into the cache and every subsequent instance. The original code gave each instance its own parsed copy. Use `copy.deepcopy()` on read or cache the raw JSON string and re-parse per instance to preserve isolation.</comment>

<file context>
@@ -22,15 +22,22 @@ class GrievanceService:
+            with open(rules_config_path, 'r') as f:
+                GrievanceService._rules_cache[rules_config_path] = json.load(f)
+
+        self.rules_config = GrievanceService._rules_cache[rules_config_path]
 
         self.routing_service = RoutingService(self.rules_config)
</file context>
Fix with Cubic


self.routing_service = RoutingService(self.rules_config)
self.sla_service = SLAConfigService(
Expand Down
Loading