Add Semantic Caching and CrewAI Config #4548

rudransh-shrivastava wants to merge 1 commit into OWASP:feature/nestbot-ai-assistant from
Conversation
Summary by CodeRabbit

Walkthrough

This pull request introduces semantic caching functionality for the Nestbot AI assistant.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🧹 Nitpick comments (5)
backend/apps/ai/common/crewai_config.py (1)
6-12: Consider supporting environment-based configuration.

The configuration values are currently hardcoded. For production flexibility, consider reading from Django settings or environment variables to allow tuning without code changes.

Example approach:

```python
from django.conf import settings


@dataclass
class CrewAIConfig:
    semantic_cache_enabled: bool = getattr(settings, "SEMANTIC_CACHE_ENABLED", True)
    semantic_cache_similarity_threshold: float = getattr(
        settings, "SEMANTIC_CACHE_SIMILARITY_THRESHOLD", 0.95
    )
    semantic_cache_ttl_seconds: int = getattr(
        settings, "SEMANTIC_CACHE_TTL_SECONDS", 86400
    )
```

This would allow operators to disable caching or adjust thresholds per environment.
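One subtlety worth noting: `getattr(settings, ...)` used as a dataclass default is evaluated once at class-definition time. If per-instance re-evaluation is desired (e.g. tests that tweak the environment), a `default_factory` variant defers each lookup. A minimal sketch using plain environment variables — the variable names mirror the suggestion above, and the `_env_bool` helper is hypothetical, not part of the PR:

```python
import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean env var; unset means the default."""
    raw = os.environ.get(name)
    return default if raw is None else raw.strip().lower() in {"1", "true", "yes"}


@dataclass
class CrewAIConfig:
    # default_factory defers each lookup to instantiation time, so environment
    # changes are picked up by new instances without reimporting the module
    semantic_cache_enabled: bool = field(
        default_factory=lambda: _env_bool("SEMANTIC_CACHE_ENABLED", default=True)
    )
    semantic_cache_similarity_threshold: float = field(
        default_factory=lambda: float(
            os.environ.get("SEMANTIC_CACHE_SIMILARITY_THRESHOLD", "0.95")
        )
    )
    semantic_cache_ttl_seconds: int = field(
        default_factory=lambda: int(os.environ.get("SEMANTIC_CACHE_TTL_SECONDS", "86400"))
    )
```

Either approach keeps the attribute names and types unchanged, so existing callers are unaffected.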
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/apps/ai/common/crewai_config.py` around lines 6 - 12: Update CrewAIConfig to read defaults from configuration instead of hardcoding: replace the fixed defaults on CrewAIConfig (semantic_cache_enabled, semantic_cache_similarity_threshold, semantic_cache_ttl_seconds) with values obtained from Django settings or environment variables (e.g., using getattr(settings, "...", <default>) or os.environ with parsing), so instances of CrewAIConfig use settings-driven values; ensure you import django.conf.settings (or os) and keep the same attribute names and types to avoid breaking callers.

backend/apps/ai/models/semantic_cache.py (2)
20-28: Consider adding a vector index for similarity search performance.

As the cache grows, vector similarity searches will slow down without an index. pgvector supports HNSW or IVFFlat indexes for approximate nearest neighbor search.

This can be added in a future migration when the cache reaches meaningful size:

```python
class Meta:
    indexes = [
        HnswIndex(
            name="semantic_cache_embedding_idx",
            fields=["query_embedding"],
            m=16,
            ef_construction=64,
            opclasses=["vector_cosine_ops"],
        ),
    ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/apps/ai/models/semantic_cache.py` around lines 20 - 28, Add a vector index to improve similarity search performance for the Semantic Cache model: update the model Meta to include an indexes list that adds a pgvector-compatible approximate nearest neighbor index (e.g., HnswIndex) on the query_embedding field, referencing the model Meta class, the query_embedding VectorField, and naming the index like "semantic_cache_embedding_idx"; implement this as a new migration (not inline in the existing model file) so you can tune parameters (m, ef_construction) and opclasses (e.g., "vector_cosine_ops") when the cache size warrants it.
59-67: Embedding computed on every lookup, regardless of cache state.

The `get_embedder().embed_query(query)` call happens during queryset construction, meaning an API call to OpenAI occurs even if the cache table is empty or all entries are expired. For high-traffic scenarios, consider short-circuiting with an existence check first. However, this is a minor optimization that can be deferred.
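The suggested short-circuit can be illustrated outside Django. A minimal sketch, with a plain list of tuples standing in for the cache table and cosine distance computed by hand — the real code would use pgvector's `CosineDistance` annotation and the ORM, so `lookup` and its signature here are purely illustrative:

```python
import math
from datetime import datetime, timedelta, timezone


def cosine_distance(a, b):
    """1 - cosine similarity, matching pgvector's vector_cosine_ops semantics."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - num / den


def lookup(query, rows, embed, *, ttl_seconds=86400, max_distance=0.05):
    """rows: (created_at, embedding, response) tuples standing in for the table."""
    cutoff = datetime.now(timezone.utc) - timedelta(seconds=ttl_seconds)
    live = [r for r in rows if r[0] >= cutoff]
    if not live:
        return None  # short-circuit: no embedding API call for an empty/expired cache
    qvec = embed(query)  # the paid call happens only when there is something to compare
    best = min(live, key=lambda r: cosine_distance(qvec, r[1]))
    return best[2] if cosine_distance(qvec, best[1]) <= max_distance else None
```

With this shape, an empty or fully expired cache never touches the embedder; the Django version would replace the list comprehension with `.filter(nest_created_at__gte=cutoff).exists()` before embedding.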
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/apps/ai/models/semantic_cache.py` around lines 59 - 67: The code currently calls get_embedder().embed_query(query) while building the queryset which triggers an embedding API call on every lookup; update the logic in SemanticCache lookup to first check for any candidate rows (e.g., use SemanticCache.objects.filter(nest_created_at__gte=ttl_cutoff).exists() or count) and only call get_embedder().embed_query(query) and annotate with CosineDistance when candidates exist, then apply .filter(distance__lte=max_distance).order_by("distance").first(); reference SemanticCache, ttl_cutoff, max_distance, CosineDistance, and get_embedder().embed_query(query) when making the change.

backend/apps/ai/flows/assistant.py (1)
282-282: Extract duplicate literal to a constant.

The string "Failed to store semantic cache entry" is duplicated 3 times (lines 282, 311, 425). Extract it to a module-level constant for maintainability.

♻️ Proposed fix

Add near the top of the file:

```python
CACHE_STORE_ERROR_MSG = "Failed to store semantic cache entry"
```

Then replace each occurrence:

```diff
- logger.exception("Failed to store semantic cache entry")
+ logger.exception(CACHE_STORE_ERROR_MSG)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/apps/ai/flows/assistant.py` at line 282: Introduce a module-level constant CACHE_STORE_ERROR_MSG = "Failed to store semantic cache entry" near the top of the file and replace the three duplicated logger.exception calls that use the literal (the occurrences invoking logger.exception("Failed to store semantic cache entry")) with logger.exception(CACHE_STORE_ERROR_MSG) so all references use the new constant; ensure the constant name is imported/visible in the same module scope where logger.exception is called.

backend/apps/ai/migrations/0011_semanticcache.py (1)
22-41: Add an index for TTL-based cache filtering.

Given semantic cache lookups typically filter by recency, add an index on `nest_created_at` (and optionally `intent` if frequently filtered) to avoid full scans as cache volume grows.

Suggested index in model options:

```diff
 options={
     "verbose_name": "Semantic Cache",
     "db_table": "ai_semantic_cache",
+    "indexes": [
+        models.Index(fields=["nest_created_at"], name="ai_semcache_created_idx"),
+    ],
 },
```
Verify each finding against the current code and only fix it if needed. In `@backend/apps/ai/migrations/0011_semanticcache.py` around lines 22 - 41, Add a DB index for TTL-based cache queries by updating the CreateModel options in this migration to include an index on nest_created_at (and optionally a composite index on ["intent", "nest_created_at"] if intent is often filtered). Specifically, inside the CreateModel for the Semantic Cache (db_table "ai_semantic_cache"), add an indexes list with models.Index(fields=["nest_created_at"], name="ai_semantic_cache_nest_created_at_idx") and optionally models.Index(fields=["intent", "nest_created_at"], name="ai_semantic_cache_intent_nest_created_at_idx") so queries filtering by recency (and intent) use the index instead of full scans.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 42db570c-b5aa-430e-82cd-9e660f4ad7ad
📒 Files selected for processing (9)
- backend/apps/ai/admin.py
- backend/apps/ai/common/crewai_config.py
- backend/apps/ai/flows/assistant.py
- backend/apps/ai/migrations/0011_semanticcache.py
- backend/apps/ai/models/__init__.py
- backend/apps/ai/models/chunk.py
- backend/apps/ai/models/semantic_cache.py
- backend/apps/ai/semantic_cache.py
- docker-compose/local/compose.yaml
```python
response = handle_collaborative_query(query, query_analysis["sub_queries"])
if response:
    try:
        store_cached_response(
            query=query,
            response=response,
        )
    except Exception:
        logger.exception("Failed to store semantic cache entry")
    else:
        return response
```
Bug: Collaborative flow response not returned on cache storage failure.

The else clause at lines 312-313 is attached to the inner try/except block, not the if response: block. This means:

- If store_cached_response succeeds → returns response ✓
- If store_cached_response fails → logs the exception, falls through to Step 4 (routing), potentially returning a different response ✗

The response should be returned regardless of cache storage outcome.
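The underlying pitfall is standard Python `try`/`except`/`else` semantics: the `else` suite runs only when the `try` suite raised nothing, so a `return` placed there is skipped whenever the except path fires. A minimal standalone demonstration of the failure mode the review describes — the function names here are illustrative, not the PR's actual code:

```python
def handle_query():
    return "collaborative response"


def store_cache(fail):
    if fail:
        raise RuntimeError("cache backend down")


def process(fail_store):
    response = handle_query()
    if response:
        try:
            store_cache(fail_store)
        except RuntimeError:
            pass  # the real code logs the exception here
        else:
            return response  # reached only when store_cache raised nothing
    return "re-routed fallback"  # stands in for the Step 4 routing path


print(process(fail_store=False))  # collaborative response
print(process(fail_store=True))   # re-routed fallback
```

Moving `return response` out of the `else` (so it follows the try/except unconditionally) makes the second call return the collaborative response as intended.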
🐛 Proposed fix
```diff
 if not query_analysis["is_simple"] and len(query_analysis["sub_queries"]) > 1:
     try:
         response = handle_collaborative_query(query, query_analysis["sub_queries"])
         if response:
             try:
                 store_cached_response(
                     query=query,
                     response=response,
                 )
             except Exception:
                 logger.exception("Failed to store semantic cache entry")
-            else:
-                return response
+            return response
     except Exception:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
response = handle_collaborative_query(query, query_analysis["sub_queries"])
if response:
    try:
        store_cached_response(
            query=query,
            response=response,
        )
    except Exception:
        logger.exception("Failed to store semantic cache entry")
    return response
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/apps/ai/flows/assistant.py` around lines 303 - 313, The current flow
calls handle_collaborative_query(...) and only returns response inside the
try/except else, so a failure in store_cached_response prevents returning the
collaborative response; update the block around response and
store_cached_response (the variables/funcs: response,
handle_collaborative_query, store_cached_response) so that you attempt to store
the cache inside a try/except but always return the response afterwards
regardless of cache storage success (i.e., move the return response out of the
try/except or add an unconditional return response after the exception
handling).
```python
def store_cached_response(
    query: str,
    response: str,
    intent: str = "",
    confidence: float = 0.0,
) -> SemanticCache:
    """Store query-response pair in semantic cache.

    Args:
        query: Original query text.
        response: Generated response text.
        intent: Classified intent for the query.
        confidence: Router confidence score.

    Returns:
        Created SemanticCache instance.

    """
    return SemanticCache.store_response(
        query=query,
        response=response,
        intent=intent,
        confidence=confidence,
    )
```
store_cached_response ignores the semantic_cache_enabled flag.
Unlike get_cached_response which checks _config.semantic_cache_enabled, the store function always writes to the database. This means the cache table will grow unboundedly even when caching is disabled.
🐛 Proposed fix
```diff
 def store_cached_response(
     query: str,
     response: str,
     intent: str = "",
     confidence: float = 0.0,
-) -> SemanticCache:
+) -> SemanticCache | None:
     """Store query-response pair in semantic cache.

     Args:
         query: Original query text.
         response: Generated response text.
         intent: Classified intent for the query.
         confidence: Router confidence score.

     Returns:
-        Created SemanticCache instance.
+        Created SemanticCache instance, or None if caching is disabled.

     """
+    if not _config.semantic_cache_enabled:
+        return None
+
     return SemanticCache.store_response(
         query=query,
         response=response,
         intent=intent,
         confidence=confidence,
     )
```

Note: This changes the return type, so callers should handle None. Currently, all callers in assistant.py are inside try/except blocks and don't use the return value, so this is safe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/apps/ai/semantic_cache.py` around lines 30 - 53,
store_cached_response currently always writes to the DB even when caching is
disabled; update the function (store_cached_response) to check
_config.semantic_cache_enabled at the start and if false skip storing and return
None (change return type to Optional[SemanticCache]), otherwise call
SemanticCache.store_response as before; ensure callers can accept None (note
assistant.py callers are in try/except and ignore the return value) and update
any type hints/imports affected.
2 issues found across 9 files
Confidence score: 3/5
- There is concrete user-impact risk in backend/apps/ai/flows/assistant.py: if cache storage throws, a valid collaborative response can be dropped and the query gets re-routed instead of returning the successful result.
- backend/apps/ai/semantic_cache.py appears to ignore _config.semantic_cache_enabled in store_cached_response, so writes may still occur when semantic caching is disabled, creating behavior inconsistency and unexpected DB churn.
- Given two medium-severity, high-confidence issues (both 6/10 with strong confidence), this carries some merge risk until error-handling and config-gating behavior are aligned.
- Pay close attention to backend/apps/ai/flows/assistant.py and backend/apps/ai/semantic_cache.py: ensure cache failures are non-blocking for responses and disabled cache mode does not perform writes.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/apps/ai/flows/assistant.py">
<violation number="1" location="backend/apps/ai/flows/assistant.py:275">
P2: Return the collaborative response even when cache storage fails; otherwise a cache exception causes a valid response to be dropped and the query is re-routed.</violation>
</file>
<file name="backend/apps/ai/semantic_cache.py">
<violation number="1" location="backend/apps/ai/semantic_cache.py:48">
P2: `store_cached_response` does not check `_config.semantic_cache_enabled`, unlike `get_cached_response` which guards on this flag. When semantic caching is disabled, cache entries will still be written to the database, causing unbounded table growth. Add an early return when caching is disabled.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```python
if response:
    try:
        store_cached_response(
            query=query,
            response=response,
        )
    except Exception:
        logger.exception("Failed to store semantic cache entry")
    return response
```
P2: Return the collaborative response even when cache storage fails; otherwise a cache exception causes a valid response to be dropped and the query is re-routed.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/apps/ai/flows/assistant.py, line 275:
<comment>Return the collaborative response even when cache storage fails; otherwise a cache exception causes a valid response to be dropped and the query is re-routed.</comment>
<file context>
@@ -258,12 +266,21 @@ def process_query( # noqa: PLR0911
channel_id=channel_id,
is_channel_suggestion=True,
)
+ if response:
+ try:
+ store_cached_response(
</file context>
```python
if response:
    try:
        store_cached_response(
            query=query,
            response=response,
        )
    except Exception:
        logger.exception("Failed to store semantic cache entry")
    return response
```
```python
return SemanticCache.store_response(
    query=query,
    response=response,
    intent=intent,
    confidence=confidence,
)
```
P2: store_cached_response does not check _config.semantic_cache_enabled, unlike get_cached_response which guards on this flag. When semantic caching is disabled, cache entries will still be written to the database, causing unbounded table growth. Add an early return when caching is disabled.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/apps/ai/semantic_cache.py, line 48:
<comment>`store_cached_response` does not check `_config.semantic_cache_enabled`, unlike `get_cached_response` which guards on this flag. When semantic caching is disabled, cache entries will still be written to the database, causing unbounded table growth. Add an early return when caching is disabled.</comment>
<file context>
@@ -0,0 +1,53 @@
+ Created SemanticCache instance.
+
+ """
+ return SemanticCache.store_response(
+ query=query,
+ response=response,
</file context>
```python
if not _config.semantic_cache_enabled:
    return None
return SemanticCache.store_response(
    query=query,
    response=response,
    intent=intent,
    confidence=confidence,
)
```



Proposed change
Resolves #4532
Add:
Checklist
make check-test locally: all warnings addressed, tests passed