
Add Lookup System for Reference Data Enrichment in Prompts#1780

Open
gaya3-zipstack wants to merge 28 commits into main from feature/look-ups

Conversation

@gaya3-zipstack
Contributor

@gaya3-zipstack gaya3-zipstack commented Feb 5, 2026

What

  • Adds a comprehensive Lookup System for document-based reference data lookups and variable replacement in prompts
  • New lookup Django app with complete backend models, services, and APIs
  • Full frontend UI for managing lookup projects, profiles, templates, and reference data
  • Integration with Prompt Studio for prompt-level lookup variable replacement

Why

  • Enables users to upload reference documents (CSVs, JSON, PDFs) and automatically look up values during prompt execution
  • Provides RAG-based retrieval to match extracted document content against reference data
  • Allows standardization/normalization of extracted values (e.g., mapping company name variations to canonical names)
  • Supports data enrichment workflows where extracted values need to be augmented with reference data

How

Backend (backend/lookup/)

Models:

  • LookupProject - Container for lookup configurations
  • LookupDataSource - Reference data file storage and metadata
  • LookupProfileManager - Adapter configurations (X2Text, Embedding, VectorDB, LLM)
  • LookupIndexManager - Vector DB index tracking
  • LookupPromptTemplate - Template configurations for lookup execution
  • PromptStudioLookupLink - Links Prompt Studio projects to Lookup projects
  • LookupExecutionAudit - Execution history and debugging
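
A minimal sketch of the two central relationships above, assuming standard Django conventions; field names beyond those implied by this PR (the versioning signal and UUID primary keys are described in the review below) are illustrative, not the actual schema:

import uuid

from django.db import models


class LookupProject(models.Model):
    """Container for lookup configurations (sketch)."""

    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255)


class LookupDataSource(models.Model):
    """Reference data file metadata, versioned per project (sketch)."""

    project = models.ForeignKey(LookupProject, on_delete=models.CASCADE)
    version_number = models.IntegerField(default=0)  # auto-incremented on create
    is_latest = models.BooleanField(default=True)    # only the newest version is True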

Services:

  • IndexingService - Document indexing with chunking and vector embeddings
  • LookupExecutor - Executes lookups using RAG retrieval
  • LookupOrchestrator - Coordinates lookup workflow
  • VectorDBCleanupService - Manages vector DB lifecycle
  • AuditLogger - Execution logging and tracking
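
A hedged sketch of how these services compose. The keyword arguments mirror the executor call described in the review comments further down; everything else here is assumed, not the actual implementation:

from lookup.services.lookup_executor import LookupExecutor  # module path per PR layout


def run_lookup(project, input_data, execution_id):
    """Illustrative orchestration flow only."""
    executor = LookupExecutor()
    # RAG path: retrieve matching reference chunks from the vector DB,
    # then ask the LLM to enrich input_data against them.
    result = executor.execute(
        lookup_project=project,
        input_data=input_data,
        execution_id=execution_id,
    )
    # The AuditLogger then records the run for the Execution History tab.
    return result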

Integrations:

  • LLMWhisperer client for document extraction
  • Redis caching for performance
  • MinIO/file storage for reference data
  • Vector DB integration for semantic search

Frontend (frontend/src/components/lookups/)

  • LookUpProjectList - Project listing and management
  • LookUpProjectDetail - Project detail view with tabs
  • ProfileManagementTab - Adapter profile configuration
  • ReferenceDataTab - Reference data file management
  • TemplateTab - Lookup template configuration
  • LinkedProjectsTab - Prompt Studio project linking
  • ExecutionHistoryTab - Execution audit viewing
  • DebugTab - Testing and debugging interface

Prompt Studio Integration

  • Added lookup_project field to ToolStudioPrompt model
  • New UI indicator for prompts with lookup replacements configured
  • Variable replacement during prompt execution using lookup results
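
The intended effect of replacement can be sketched as below; the placeholder syntax and function name are assumptions for illustration only (the real logic lives in the variable resolver service):

def resolve_lookup_variables(prompt_text: str, lookup_results: dict) -> str:
    """Hypothetical resolver: substitute lookup results into a prompt."""
    for name, value in lookup_results.items():
        # Assumed {{name}} placeholder syntax, purely illustrative.
        prompt_text = prompt_text.replace("{{" + name + "}}", str(value))
    return prompt_text

# resolve_lookup_variables("Vendor: {{vendor_name}}", {"vendor_name": "Acme Corp"})
# -> "Vendor: Acme Corp"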

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.

Low risk of breaking existing features:

  • The Lookup system is a new, isolated Django app with its own models and endpoints
  • Prompt Studio integration is additive (new optional field lookup_project)
  • Frontend components are new routes/pages, not modifications to existing UI
  • Database changes are new tables via migrations, no modifications to existing schema
  • Backend URL routes are added to urls_v2.py without modifying existing endpoints

Database Migrations

  • 0001_initial.py - Creates all lookup tables (LookupProject, LookupDataSource, LookupProfileManager, LookupIndexManager, LookupPromptTemplate, PromptStudioLookupLink, LookupExecutionAudit)
  • 0002_remove_reference_data_type.py - Schema refinement
  • 0003_add_file_execution_id.py - Adds file execution tracking
  • 0004_fix_file_paths_for_minio.py - MinIO storage path fixes
  • 0005_add_reindex_required_field.py - Adds reindex tracking
  • prompt_studio_v2/0014_toolstudioprompt_lookup_project.py - Adds lookup_project FK to prompts

Env Config

No new required environment variables. Uses existing adapter configurations (X2Text, Embedding, VectorDB, LLM).
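
Two optional settings flags do ship with the change (see the walkthrough below): LOOKUP_AUTO_ENRICH_ENABLED and LOOKUP_ENRICHMENT_TIMEOUT. A sketch of how such flags are typically wired in Django settings; the defaults and env parsing here are assumptions:

import os

# Flag names per backend/backend/settings/base.py; defaults are assumed.
LOOKUP_AUTO_ENRICH_ENABLED = (
    os.environ.get("LOOKUP_AUTO_ENRICH_ENABLED", "true").lower() == "true"
)
LOOKUP_ENRICHMENT_TIMEOUT = int(os.environ.get("LOOKUP_ENRICHMENT_TIMEOUT", "30"))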

Relevant Docs

  • docs/lookup_architecture_final.md - Complete architecture design

Related Issues or PRs

  • Feature request for reference data lookups and variable enrichment

Dependencies Versions

No new external dependencies added.

Notes on Testing

  • Unit tests added in backend/lookup/tests/
    • API tests for all endpoints
    • Service tests for indexing, execution, caching
    • Integration tests for adapters
  • Manual testing recommended:
    1. Create a Lookup Project
    2. Configure a Profile with adapters
    3. Upload reference data (CSV/JSON)
    4. Create and link to Prompt Studio project
    5. Execute prompt with lookup variable
    6. Verify lookup replacement in output
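
The first steps of this flow can also be driven over the REST API. A hypothetical sketch: only the /lookup/ router mount is documented, so the endpoint paths and payloads below are assumptions:

import requests

BASE = "http://localhost/api/v2/unstract/<org_id>/lookup"  # assumed base URL
session = requests.Session()  # assumes an already-authenticated session

# Step 1: create a Lookup Project (endpoint shape is hypothetical)
project = session.post(f"{BASE}/projects/", json={"name": "Vendors"}).json()

# Step 3: upload reference data (endpoint shape is hypothetical)
with open("vendors.csv", "rb") as f:
    session.post(f"{BASE}/projects/{project['id']}/upload/", files={"file": f})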

Screenshots

Lookups-Screenshots.zip

Checklist

I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

@coderabbitai
Contributor

coderabbitai bot commented Feb 5, 2026

Summary by CodeRabbit

  • New Features

    • Look-Up enrichment: automatic data augmentation using reference data and LLMs (toggleable, timeout-configurable).
    • Reference data management: upload, extraction, versioning, and semantic indexing with audit trails.
    • Profile-based adapters: per-project LLM/embedding/vector-store profiles for flexible enrichment.
    • Prompt Studio integration: per-prompt lookup selection, available-lookups list, and seamless enrichment in workflows.
    • Frontend UX: Look-Up project creation, Debug tab, replacement indicator, logs/stage styling, and settings UI entry.
  • Documentation

    • Added architecture overview for the Look-Up system.

Walkthrough

Adds a full Look-Up system: Django models, migrations, REST APIs, services (indexing, retrieval, orchestration), LLM and storage integrations, Prompt Studio linkage, frontend UI for managing/debugging lookups, extensive tests, and configuration flags for automatic enrichment and timeouts.

Changes

  • Backend settings & routing
    Files: backend/backend/settings/base.py, backend/backend/urls.py, backend/backend/urls_v2.py
    Added LOOKUP_AUTO_ENRICH_ENABLED and LOOKUP_ENRICHMENT_TIMEOUT; registered lookup app in shared apps; mounted lookup.urls in main URL configs.
  • App bootstrap & constants
    Files: backend/lookup/apps.py, backend/lookup/constants.py, backend/lookup/__init__.py
    New app config and centralized constants for profile keys/errors; package initializer added.
  • Exceptions
    Files: backend/lookup/exceptions.py
    Introduced LookupError hierarchy (ExtractionNotCompleteError, ContextWindowExceededError, etc.) with informative messages and payloads.
  • Integrations (LLM, extraction, storage, cache)
    Files: backend/lookup/integrations/llm_provider.py, .../unstract_llm_client.py, .../llmwhisperer_client.py, .../file_storage_client.py, .../storage_client.py, .../redis_cache.py
    Multiple integration clients: Unstract/OpenAI/Anthropic wrappers, Unstract SDK client, LLMWhisperer extraction client (polling/status/result), File/Remote storage clients, and Redis-backed LLM cache with memory fallback. Review error handling, timeouts, and external-call isolation.
  • Models & migrations (core)
    Files: backend/lookup/models/..., backend/lookup/migrations/*
    Added LookupProject, LookupPromptTemplate, LookupDataSource (versioning signals), LookupProfileManager, LookupIndexManager, LookupExecutionAudit, PromptStudioLookupLink, plus initial and follow-up migrations (field removals, data fixes, reindex flag, file path migration, index field). Verify constraints, signals, and index creation.
  • Models API surface
    Files: backend/lookup/models/__init__.py
    Aggregated model exports via __all__.
  • Services — core orchestration
    Files: backend/lookup/services/lookup_executor.py, .../lookup_orchestrator.py, .../variable_resolver.py, .../execution_context.py
    New LookUpExecutor, LookUpOrchestrator, variable resolver, and execution context dataclass enabling RAG/full-context execution, parallel orchestration, and variable substitution logic.
  • Services — indexing & retrieval
    Files: backend/lookup/services/indexing_service.py, .../lookup_retrieval_service.py, .../reference_data_loader.py, .../document_indexing_service.py
    Extraction + indexing flow (x2text -> embeddings -> vector DB), retrieval service for RAG, reference data loader, and document indexing status/caching. Pay attention to external SDK calls and idempotency.
  • Services — support & management
    Files: backend/lookup/services/audit_logger.py, .../enrichment_merger.py, .../log_emitter.py, .../llm_cache.py, .../lookup_index_helper.py, .../vector_db_cleanup_service.py, .../document_indexing_service.py, .../mock_clients.py
    Audit logging, enrichment merging (conflict resolution), dual-target logging (WebSocket + Redis), in-memory LLM cache, index manager helpers, vector DB cleanup workflows, and test mock clients. Check transactional updates and cleanup safety.
  • Integration entrypoint
    Files: backend/lookup/services/lookup_integration_service.py, .../workflow_integration.py
    LookupIntegrationService.enrich_if_linked orchestration (timeout handling, parallel enrichment) and workflow-level enrichment integration; includes LLMClientWrapper. Review timeout/threading and error propagation boundaries.
  • REST API: serializers & views
    Files: backend/lookup/serializers.py, backend/lookup/views.py, backend/lookup/urls.py
    Comprehensive serializers and viewsets for projects, templates, profiles, data sources, audits, and debug endpoints; upload/indexing/execution endpoints; router registration under /lookup/. Pay attention to validation, default-profile logic, and side-effecting endpoints (indexing, cleanup).
  • Prompt Studio integration
    Files: backend/prompt_studio/prompt_studio_v2/models.py, .../migrations/0014_*, .../views.py, .../serializers.py, .../output_manager_v2/*, prompt_studio_core_v2/views.py
    Added optional lookup_project FK on prompts, available_lookups endpoint, serializers/validation, automatic enrichment wiring in output manager, and LookupEnrichmentError handling in prompt response flow. Confirm graceful degradation when the lookup app is absent.
  • Workflow manager integration
    Files: backend/workflow_manager/workflow_v2/file_execution_tasks.py
    Attempts to enrich final workflow outputs via LookupWorkflowIntegration; integration guarded for missing lookup app and errors.
  • Frontend — Look-Up UI & integrations
    Files: frontend/src/components/lookups/*, frontend/src/components/custom-tools/prompt-card/*, frontend/src/components/custom-tools/display-logs/*, frontend/src/components/logging/*, frontend/package.json
    New Look-Up UI: create project modal, debug tab, execution history, prompt-card lookup controls and replacement indicator, log stage styling (LOOKUP), settings-modal hook for LookupManager, and CodeMirror JSON deps. Review API usage, lazy-loading behavior, and new UI CSS.
  • Frontend — combined output / misc
    Files: frontend/src/components/custom-tools/combined-output/*, JsonView.jsx
    Minor DOM/CSS tweaks to combined output and enrichment info styling.
  • Tests
    Files: backend/lookup/tests/... (api, integrations, services, migrations)
    Extensive test coverage added: API tests, integration tests for LLM/whisperer/redis/storage, service unit tests (executor, orchestrator, merger, cache, reference loader), and migration tests. Expect many mocked external interactions; review for completeness and flakiness.
  • Documentation
    Files: docs/LOOKUP_ARCHITECTURE.md
    New architecture document describing models, services, workflows, and integration points.

Sequence Diagram(s)

sequenceDiagram
  participant PS as Prompt Studio
  participant Integration as LookupIntegrationService
  participant Orchestrator as LookUpOrchestrator
  participant Executor as LookUpExecutor
  participant Loader as ReferenceDataLoader
  participant Cache as LLM/Redis Cache
  participant LLM as LLM Provider
  participant Log as LookupLogEmitter

  PS->>Integration: request enrich_if_linked(extracted_data, prompt_lookup_map)
  Integration->>Orchestrator: build projects, execution_id
  Orchestrator->>Executor: execute(project, input_data, execution_id)
  Executor->>Loader: load_latest_for_project(project_id)
  Loader-->>Executor: concatenated reference_data
  Executor->>Cache: check cache(prompt + reference)
  alt cache hit
    Cache-->>Executor: cached_response
  else cache miss
    Executor->>LLM: generate(prompt, config)
    LLM-->>Executor: llm_response
    Executor->>Cache: store response
  end
  Executor->>Log: emit_enrichment_* (start/success/failure)
  Executor-->>Orchestrator: enrichment_result
  Orchestrator-->>Integration: aggregated_results
  Integration-->>PS: lookup_enrichment + _lookup_metadata
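
In code, the entry point at the top of this diagram would be exercised roughly as follows; the argument shapes are inferred from the diagram, not from the actual signature:

from lookup.services.lookup_integration_service import LookupIntegrationService

extracted_data = {"company_name": "Acme Corp Intl."}
prompt_lookup_map = {"company_name": "<lookup-project-id>"}  # assumed shape

result = LookupIntegrationService().enrich_if_linked(extracted_data, prompt_lookup_map)
# Per the diagram, the result carries the lookup_enrichment payload
# plus a _lookup_metadata entry describing the run.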

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 3 passed
  • Title check: ✅ Passed. The title clearly summarizes the main change: adding a Lookup System for reference data enrichment in prompts. It is specific, concise, and directly reflects the core objective.
  • Description check: ✅ Passed. The description is comprehensive and follows the template structure well. It includes What, Why, How sections with detailed backend/frontend breakdown, risk assessment, database migrations, testing notes, and all required sections.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 98.31%, which is sufficient; the required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
frontend/src/components/logging/log-modal/LogModal.jsx (1)

223-229: ⚠️ Potential issue | 🟡 Minor

Incorrect PropTypes definitions.

logDescModalOpen is used as a boolean (lines 175, 195) and setLogDescModalOpen is used as a function (line 97), but both are defined as PropTypes.array.

🔧 Proposed fix
 LogModal.propTypes = {
   executionId: PropTypes.string,
   fileId: PropTypes.string,
-  logDescModalOpen: PropTypes.array,
-  setLogDescModalOpen: PropTypes.array,
+  logDescModalOpen: PropTypes.bool,
+  setLogDescModalOpen: PropTypes.func,
   filterParams: PropTypes.object,
 };
backend/prompt_studio/prompt_studio_core_v2/views.py (1)

435-462: ⚠️ Potential issue | 🟠 Major

Handle LookupEnrichmentError in single_pass_extraction as well.

The same lookup failure that now returns 400 in fetch_response will still bubble as 500 here. Mirror the new error handling for consistency.

🛠️ Proposed fix
     def single_pass_extraction(self, request: HttpRequest, pk: uuid) -> Response:
         """API Entry point method to fetch response to prompt.
@@
         """
+        from prompt_studio.prompt_studio_output_manager_v2.output_manager_helper import (
+            LookupEnrichmentError,
+        )
         # TODO: Handle fetch_response and single_pass_
         # extraction using common function
         custom_tool = self.get_object()
@@
-        response: dict[str, Any] = PromptStudioHelper.prompt_responder(
-            tool_id=tool_id,
-            org_id=UserSessionUtils.get_organization_id(request),
-            user_id=custom_tool.created_by.user_id,
-            document_id=document_id,
-            run_id=run_id,
-        )
-        return Response(response, status=status.HTTP_200_OK)
+        try:
+            response: dict[str, Any] = PromptStudioHelper.prompt_responder(
+                tool_id=tool_id,
+                org_id=UserSessionUtils.get_organization_id(request),
+                user_id=custom_tool.created_by.user_id,
+                document_id=document_id,
+                run_id=run_id,
+            )
+            return Response(response, status=status.HTTP_200_OK)
+        except LookupEnrichmentError as e:
+            error_response = {
+                "error": str(e),
+                "error_type": e.error_type,
+                **e.details,
+            }
+            return Response(error_response, status=status.HTTP_400_BAD_REQUEST)
backend/workflow_manager/workflow_v2/file_execution_tasks.py (1)

1025-1043: ⚠️ Potential issue | 🟠 Major

Lookup enrichment is skipped entirely for filesystem and database destinations—design mismatch with intended support.

Lookup integration is designed to support ETL, Workflow, and API deployments (per workflow_integration.py documentation), but the current implementation only applies enrichment to API destinations. The condition on line 1037 (if output_result and not processing_error:) prevents enrichment from running for filesystem and database destinations because handle_output() returns None for those types—it only returns the tool execution result for API deployments.

This creates a silent data loss: enrichment is skipped for FILESYSTEM/DATABASE workflows even though the lookup system is designed to support them. Additionally, for FILESYSTEM/DATABASE destinations, persistence occurs inside handle_output() before any enrichment opportunity, so even if enrichment ran, it wouldn't reach persisted outputs.

Filesystem/database workflows should either receive enriched output via a redesigned flow, or enrichment scope should be explicitly limited to API deployments only (with documentation clarifying why).

🤖 Fix all issues with AI agents
In `@backend/lookup/integrations/llm_provider.py`:
- Around line 207-222: The current JSON extraction uses the regex json_pattern =
r"\{[^{}]*\}" which fails on nested objects; replace it with an iterative
balanced-brace extractor that scans the response string (the variable response)
and uses a stack/counter to find matching '{' ... '}' spans, attempt json.loads
on each balanced substring and return the first valid parse; update the logic
where matches are built/iterated (the block referencing json_pattern, matches,
and the json.loads try/except) to use this brace-matching approach so nested
JSON like {"data":{"value":1}} is correctly extracted and parsed.

In `@backend/lookup/integrations/redis_cache.py`:
- Around line 374-381: The warmup loop uses hashlib.md5 to build keys, which
diverges from the main generate_cache_key() scheme (SHA256) and triggers
security linting; update the key generation inside the preload_data loop to use
hashlib.sha256(prompt.encode()).hexdigest() (matching the generate_cache_key()
approach), keeping the same self.key_prefix and project:{project_id}: namespace
so keys are consistent; ensure the change is made where the code builds key (the
preload loop that calls self.set(key, response)) so warmup entries match the
primary cache lookup.

In `@backend/lookup/models/lookup_data_source.py`:
- Around line 168-196: The pre_save signal handler
auto_increment_version_and_update_latest on LookupDataSource is vulnerable to
race conditions because it reads Max("version_number") and then writes, so
concurrent creations can get duplicate version_number; update this by acquiring
a row-level lock and doing the read/update inside a transaction (e.g., wrap in
transaction.atomic() and use select_for_update() on LookupDataSource objects
filtered by project to lock the relevant rows before computing max and updating
is_latest) or switch to a DB-backed sequence/serial for version_number and set
is_latest in the same transaction to ensure uniqueness; ensure all logic
referencing version_number and is_latest in
auto_increment_version_and_update_latest runs inside that transaction context.

In `@backend/lookup/models/lookup_execution_audit.py`:
- Around line 134-140: The validation currently requires enriched_output only
when self.status == "success" but was_successful() treats "partial" as
successful too; update the validation in the LookupExecutionAudit model (the
block that checks self.status and self.enriched_output) to require
enriched_output when self.status is either "success" or "partial" (e.g., change
the condition to check status in ["success", "partial"]) and keep the
ValidationError message consistent.

In `@backend/lookup/models/lookup_project.py`:
- Around line 142-163: The clean() method wrongly enforces LLM_PROVIDER_CHOICES
even when llm_provider is nullable; update the validation in clean() to only
check membership if self.llm_provider is not None (and optionally not an empty
string) before comparing against valid_providers (referencing clean(),
LLM_PROVIDER_CHOICES and llm_provider), so nullable values pass and only
non-null providers are validated.

In `@backend/lookup/serializers.py`:
- Around line 331-361: The AdapterProcessor.get_adapter_instance_by_id function
currently returns adapter.adapter_name (string) but should return the full
AdapterInstance object; update get_adapter_instance_by_id in
backend/adapter_processor_v2/adapter_processor.py to return the adapter object
(not adapter.adapter_name), correct the return type annotation to the actual
AdapterInstance type (or Adapter) to match callers, and ensure error handling
returns None or raises appropriately before returning so callers like
LookupProfileSerializer.to_representation (which calls
AdapterProcessor.get_adapter_instance_by_id) receive the full instance or a
clear failure signal.

In `@backend/lookup/services/enrichment_merger.py`:
- Around line 103-112: The conflict counter misses cases where the existing
entry wins (winner == existing) because the current elif is unreachable; update
the resolution block in enrichment_merger.py so that whenever there is a
conflict between existing and field_entry (i.e., they are different lookup_ids)
you always increment conflicts_resolved after selecting a winner: keep the
existing branch that updates field_sources/merged_data/fields_added when winner
differs from existing (winner["lookup_id"] != existing["lookup_id"]), and
replace the elif with an else (or an explicit check for existing winning) that
simply increments conflicts_resolved when winner == existing and
existing["lookup_id"] != field_entry["lookup_id"]; this ensures
conflicts_resolved increments for both cases (new winner or existing winner).

In `@backend/lookup/services/llm_cache.py`:
- Around line 112-134: The generate_cache_key method currently concatenates
prompt and reference_data directly, causing possible collisions; update
generate_cache_key in LLMResponseCache to combine inputs with an unambiguous
scheme (e.g., length-prefixed fields like
"<len>:<prompt>|<len>:<reference_data>" or a reserved delimiter with escaping)
before hashing, and make the same change in the corresponding key-building
function in redis_cache (the method that assembles prompt+reference_data at
backend/lookup/integrations/redis_cache.py line ~107) so both implementations
use the identical separator/length-prefix format to ensure deterministic,
collision-free keys.

In `@backend/lookup/services/vector_db_cleanup_service.py`:
- Around line 286-301: The loop that calls cleanup_index_ids currently always
clears index_manager.index_ids_history, raw_index_id and marks
status/reindex_required even when result indicates failures, which can orphan
vector DB rows; change the logic in the loop that iterates index_managers so
that you only clear index_manager.index_ids_history, set
index_manager.raw_index_id = None, mark status {"indexed": False, "cleaned":
True} and set reindex_required = True (and save) when result["success"] is True
(or equivalent full-success indicator used by cleanup_for_data_source); if
result shows failures, preserve index_ids_history/raw_index_id and set status to
reflect cleanup failure (or leave indexed state) and still save errors via
index_manager.save(), matching the pattern in cleanup_for_data_source and
ensuring retryability.

In `@backend/lookup/services/workflow_integration.py`:
- Around line 179-183: The query in workflow_integration.py that matches
PromptStudioRegistry.prompt_registry_id against ToolInstance.tool_id is
type-unsafe because tool_instance.tool_id is a CharField while
prompt_registry_id is a UUIDField; update the lookup (around where
ToolInstance.objects.filter(...).first() is used) to explicitly convert
tool_instance.tool_id to a UUID before querying PromptStudioRegistry (e.g.,
parse with uuid.UUID) and handle invalid UUIDs with a try/except (log the
invalid value and return original_output, False) so the ORM receives a proper
UUID object rather than a raw string; alternatively, if you prefer a schema
change, migrate ToolInstance.tool_id to a UUIDField and adjust related code
accordingly.

In `@backend/lookup/tests/test_api/test_template_api.py`:
- Around line 21-37: The test setup creates a LookupPromptTemplate without the
required project relation; update setUp() to create a LookupProject and
associate it when creating the template (create a LookupProject instance and
pass it via the project argument to LookupPromptTemplate.objects.create),
ensuring the template's required OneToOneField to LookupProject is satisfied;
adjust any variable names as needed and keep the created project tied to self
(e.g., self.project) for reuse in tests referencing LookupPromptTemplate.

In `@backend/lookup/tests/test_migrations.py`:
- Around line 24-33: The setUp currently asserts self.migrate_from truthy
causing failures when migrate_from is intentionally None to represent the zero
state; remove that requirement and treat None as the initial migration by
allowing self.migrate_from to be [(self.app, None)] (instead of asserting) and
keep self.migrate_to handling as before; update the setUp logic around setUp,
self.migrate_from, self.migrate_to and the MigrationExecutor(connection) call so
tests can pass when migrate_from is None.

In `@backend/lookup/tests/test_services/test_lookup_orchestrator.py`:
- Around line 93-103: The test mocks define several helpers named
execute_side_effect that only accept positional args and use the name project,
but LookUpOrchestrator invokes executor.execute with keyword args
(lookup_project, input_data, execution_id, prompt_studio_project_id), causing
TypeError; update all six execute_side_effect helpers to accept the keyword
parameters (e.g., def execute_side_effect(*, lookup_project, input_data,
execution_id=None, prompt_studio_project_id=None) or accept **kwargs and read
lookup_project) and use lookup_project instead of project when constructing the
returned dict so the mocked signature matches the actual executor.execute call.

In `@backend/prompt_studio/prompt_studio_output_manager_v2/output_manager_helper.py`:
- Around line 373-392: The current INFO logs in the post-processing loop print
full extracted values (see extracted_data_for_lookup, prompts loop, outputs and
logger.info calls), which may expose sensitive data; change those logger.info
calls to either logger.debug or remove the value content and instead log only
the prompt key and value type (e.g., prompt.prompt_key and
type(output_value).__name__) or a redacted placeholder, and similarly avoid
logging the full extracted_data_for_lookup dict at INFO—log its keys or count
only; update the two logger.info occurrences in this block and keep the skip
debug for PSOMKeys.NOTES as-is.
- Around line 432-459: Enriched values in the serialized_data loop are being
inserted raw (possibly dict/list) which breaks consistency with earlier
enforce_type serialization; fetch the prompt metadata from prompt_by_key (reuse
prompt_by_key) for the current prompt_key, run the same
enforce_type/serialization logic used elsewhere (e.g., json.dumps for structured
types like json/table/record/line-item) on enriched_value before assigning
item["lookup_replacement"]["enriched_value"], item["output"], and before calling
PromptStudioOutputManager.objects.filter(...).update(...), so stored/returned
outputs keep the same serialized format as non-enriched values.

In `@backend/prompt_studio/prompt_studio_v2/serializers.py`:
- Around line 35-58: The validation uses self.instance.tool_id_id (old value)
instead of respecting an incoming tool_id in updates; change the logic in the
serializer validation block so that if "tool_id" exists in self.initial_data it
overrides self.instance.tool_id_id (i.e., determine tool_id =
self.initial_data.get("tool_id") if present else self.instance.tool_id_id), then
perform the PromptStudioLookupLink.objects.filter check
(prompt_studio_project_id=tool_id, lookup_project=value) and raise
serializers.ValidationError if no link exists; keep the ImportError handling
as-is.

In `@frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx`:
- Around line 48-61: The Popover uses trigger="hover" which blocks keyboard
access and the Button lacks an aria-label; change Popover's trigger to
trigger={["hover","focus"]} and add a descriptive aria-label to the Button (the
element rendering SwapOutlined) so screen readers and keyboard users can
discover the replacement details. Also replace any uses of the logical OR
fallback (e.g. value || fallback) in this component with the nullish coalescing
operator (value ?? fallback) so valid falsy values (0, false, "") are preserved;
look for usages in LookupReplacementIndicator.jsx around the Popover/content
rendering and update them accordingly.

In `@frontend/src/components/custom-tools/settings-modal/SettingsModal.jsx`:
- Around line 119-124: LookupManager is using the same numeric key (9) as
CustomDataSettings which overwrites listOfComponents[9] and creates duplicate
menu keys; update the LookupManager menu entry and its component mapping to use
an unused key (for example 10) by changing the getMenuItem call and the
listOfComponents index from 9 to 10 (referencing LookupManager, getMenuItem, and
listOfComponents) so CustomDataSettings remains reachable and keys are unique.

In `@frontend/src/components/logging/log-modal/LogModal.jsx`:
- Around line 43-57: The component is missing imports for Tag and SearchOutlined
which causes runtime errors: add Tag from "antd" and SearchOutlined from
"@ant-design/icons" to the existing import lists so getStageStyle (which returns
<SearchOutlined />) and the column render that uses <Tag> work correctly; update
the import statement(s) that currently import Button, Modal, Table, Tooltip and
CopyOutlined to also include Tag and SearchOutlined respectively.

In `@frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx`:
- Around line 9-16: handleSubmit currently treats any error as a validation
failure; change it to first await form.validateFields() in a try/catch to handle
and log/return validation errors, then call await onCreate(values) in a separate
try/catch so API/create failures are handled distinctly (log the error, show
user-visible feedback via message.error or set a submission error) and still
call form.resetFields() only on successful creation; reference handleSubmit,
form.validateFields, onCreate, and form.resetFields when updating the control
flow.
🟡 Minor comments (25)
frontend/src/components/custom-tools/display-logs/DisplayLogs.css-27-33 (1)

27-33: ⚠️ Potential issue | 🟡 Minor

CSS class mismatch with DisplayLogs.jsx.

The CSS defines .display-logs-stage-extract and .display-logs-stage-index, but DisplayLogs.jsx uses TOOL stage which maps to .display-logs-stage-tool (not defined here). This mismatch means:

  • TOOL stage tags won't get their custom border styling
  • EXTRACT and INDEX class definitions appear unused
🔧 Proposed fix to align with DisplayLogs.jsx
-.display-logs-stage-extract {
-  border-color: #52c41a;
-}
-
-.display-logs-stage-index {
+.display-logs-stage-tool {
+  border-color: #13c2c2;
+}
+
+.display-logs-stage-default {
   border-color: #faad14;
 }
frontend/src/components/lookups/debug-tab/DebugTab.jsx-353-444 (1)

353-444: ⚠️ Potential issue | 🟡 Minor

Guard against missing/zero token metrics to avoid NaN UI math.

If tokenCount or contextLimit is missing/zero, percentage math yields NaN/Infinity and can break the progress display. Add a defensive guard with a fallback alert.

🛡️ Suggested fix
-    const { tokenCount, contextLimit, model } = errorDetails;
-    const usagePercent = Math.min(
-      Math.round((tokenCount / contextLimit) * 100),
-      100
-    );
-    const overflowAmount = tokenCount - contextLimit;
-    const overflowPercent = Math.round((overflowAmount / contextLimit) * 100);
+    const { tokenCount, contextLimit, model } = errorDetails;
+    if (
+      !Number.isFinite(tokenCount) ||
+      !Number.isFinite(contextLimit) ||
+      contextLimit <= 0
+    ) {
+      return (
+        <Alert
+          message="Context Window Exceeded"
+          description="Token usage details are unavailable for this run."
+          type="error"
+          showIcon={false}
+          style={{ marginBottom: 16 }}
+        />
+      );
+    }
+    const usagePercent = Math.min(
+      Math.round((tokenCount / contextLimit) * 100),
+      100
+    );
+    const overflowAmount = Math.max(tokenCount - contextLimit, 0);
+    const overflowPercent = Math.round((overflowAmount / contextLimit) * 100);
frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx-20-24 (1)

20-24: ⚠️ Potential issue | 🟡 Minor

Reset the form on cancel to avoid stale values on reopen.

Currently, canceling leaves prior input in the form since the modal stays mounted. When reopened, users see previously entered values for name and description fields, creating a poor UX. Implement the suggested handleCancel handler to reset the form to its initial state before calling onCancel.

♻️ Proposed refinement
 export function CreateProjectModal({ open, onCancel, onCreate }) {
   const [form] = Form.useForm();
+  const handleCancel = () => {
+    form.resetFields();
+    onCancel();
+  };
 ...
   return (
     <Modal
       title="Create Look-Up Project"
       open={open}
-      onCancel={onCancel}
+      onCancel={handleCancel}
frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx-24-41 (1)

24-41: ⚠️ Potential issue | 🟡 Minor

Use nullish coalescing (??) instead of logical OR (||) to preserve valid falsy values.

The current code uses || "(empty)" which will incorrectly display (empty) for legitimate falsy values like 0, false, or empty strings. Since original_value and enriched_value can be any type (per PropTypes), use ?? to only treat null and undefined as empty.

Suggested fix
-            {originalValue || "(empty)"}
+            {originalValue ?? "(empty)"}
-            {enrichedValue || "(empty)"}
+            {enrichedValue ?? "(empty)"}
backend/lookup/services/vector_db_cleanup_service.py-244-250 (1)

244-250: ⚠️ Potential issue | 🟡 Minor

State update may incorrectly remove failed IDs from history.

When result["deleted"] > 0, the code removes all ids_to_delete from index_ids_history, but some of those IDs may have failed deletion. Since cleanup_index_ids doesn't return which specific IDs succeeded, failed IDs will be lost from history and can't be retried.

Consider either:

  1. Only update history when result["success"] is True (all deletions succeeded)
  2. Have cleanup_index_ids return the list of successfully deleted IDs
🔧 Conservative fix option
-        # Update history to remove deleted IDs
-        if result["deleted"] > 0:
-            remaining_ids = [
-                id for id in index_manager.index_ids_history if id not in ids_to_delete
-            ]
-            index_manager.index_ids_history = remaining_ids
-            index_manager.save()
+        # Only update history when all deletions succeeded
+        if result["success"]:
+            remaining_ids = [
+                id for id in index_manager.index_ids_history if id not in ids_to_delete
+            ]
+            index_manager.index_ids_history = remaining_ids
+            index_manager.save()
backend/lookup/tests/test_services/test_audit_logger.py-226-245 (1)

226-245: ⚠️ Potential issue | 🟡 Minor

Drop unused result assignment.
Static analysis flagged result as unused here; the call can stand alone.

🧹 Cleanup
-        result = audit_logger.log_execution(**minimal_params)
+        audit_logger.log_execution(**minimal_params)
backend/lookup/tests/test_services/test_audit_logger.py-93-105 (1)

93-105: ⚠️ Potential issue | 🟡 Minor

Fix the mock exception setup in test_project_not_found.

When using mock_project_model.DoesNotExist() as a side_effect, it creates a MagicMock instance rather than raising an exception. The except block catching LookupProject.DoesNotExist is never exercised. Create a real exception class on the mock instead:

Suggested fix
-        mock_project_model.objects.get.side_effect = mock_project_model.DoesNotExist()
+        mock_project_model.DoesNotExist = type("DoesNotExist", (Exception,), {})
+        mock_project_model.objects.get.side_effect = mock_project_model.DoesNotExist
backend/lookup/tests/test_services/test_lookup_executor.py-365-368 (1)

365-368: ⚠️ Potential issue | 🟡 Minor

Silence the unused-argument warning by prefixing parameters with underscores.
Ruff's ARG rule will flag the unused *args and **kwargs in this test mock function.

🔧 Suggested fix
-        def delayed_generate(*args, **kwargs):
+        def delayed_generate(*_args, **_kwargs):
             import time
             time.sleep(0.01)  # 10ms delay
             return '{"result": "data"}'
backend/lookup/tests/test_api/test_execution_api.py-31-35 (1)

31-35: ⚠️ Potential issue | 🟡 Minor

Suppress Ruff S106 for test-only passwords (both setups).

🛠️ Suggested fix
-            password='testpass123'
+            password='testpass123',  # noqa: S106 - test-only password
...
-            password='testpass123'
+            password='testpass123',  # noqa: S106 - test-only password

Also applies to: 166-170

backend/lookup/tests/test_api/test_execution_api.py-71-74 (1)

71-74: ⚠️ Potential issue | 🟡 Minor

Silence the unused patched arg to satisfy Ruff ARG002.

🛠️ Suggested fix
-    def test_debug_with_ps_project(self, mock_executor_class, mock_orchestrator_class):
+    def test_debug_with_ps_project(self, _mock_executor_class, mock_orchestrator_class):
backend/lookup/tests/test_api/test_linking_api.py-101-108 (1)

101-108: ⚠️ Potential issue | 🟡 Minor

Remove unused link variables to satisfy Ruff F841.

🧹 Suggested fix
-        link1 = PromptStudioLookupLink.objects.create(
+        PromptStudioLookupLink.objects.create(
             prompt_studio_project_id=self.ps_project_id,
             lookup_project=self.lookup1
         )
-        link2 = PromptStudioLookupLink.objects.create(
+        PromptStudioLookupLink.objects.create(
             prompt_studio_project_id=self.ps_project_id,
             lookup_project=self.lookup2
         )
backend/lookup/integrations/redis_cache.py-120-131 (1)

120-131: ⚠️ Potential issue | 🟡 Minor

Treat empty cached values as hits.
Redis can return an empty string; the current truthiness check marks that as a miss.

🛠️ Suggested fix
-                if value:
+                if value is not None:
                     logger.debug(f"Redis cache hit for key: {key[:20]}...")
                     self._update_stats("hits")
                     return value
                 else:
                     logger.debug(f"Redis cache miss for key: {key[:20]}...")
                     self._update_stats("misses")
backend/lookup/tests/test_api/test_linking_api.py-27-31 (1)

27-31: ⚠️ Potential issue | 🟡 Minor

Suppress Ruff S106 for the test-only password.
This avoids a lint error while keeping the test simple.

🛠️ Suggested fix
-            password='testpass123'
+            password='testpass123',  # noqa: S106 - test-only password
backend/lookup/tests/test_services/test_reference_data_loader.py-71-71 (1)

71-71: ⚠️ Potential issue | 🟡 Minor

Change file_type="text/csv" to file_type="csv".

The file_type field in LookupDataSource validates against FILE_TYPE_CHOICES, which only accepts "csv", not the MIME type "text/csv". This will fail model validation. Use the plain choice value instead.

backend/lookup/integrations/unstract_llm_client.py-122-163 (1)

122-163: ⚠️ Potential issue | 🟡 Minor

Unused config and timeout parameters should be utilized or removed.

The config parameter (for temperature, etc.) and timeout parameter are declared but never used. The LLM call at line 142 doesn't apply the configuration, and no timeout is enforced.

Proposed fix to utilize or document the parameters

If the parameters are intended for future use, add a TODO comment. Otherwise, apply config to the LLM call:

     def generate(self, prompt: str, config: dict[str, Any], timeout: int = 30) -> str:
         ...
         try:
             # Create LLM instance using SDK
-            llm = LLM(adapter_id=self.adapter_id, adapter_metadata=self.adapter_metadata)
+            # Apply config settings if provided
+            metadata = {**self.adapter_metadata}
+            if config.get("temperature") is not None:
+                metadata["temperature"] = config["temperature"]
+            llm = LLM(adapter_id=self.adapter_id, adapter_metadata=metadata)
+            # TODO: Implement timeout handling for LLM calls
backend/lookup/services/lookup_retrieval_service.py-151-155 (1)

151-155: ⚠️ Potential issue | 🟡 Minor

Potential IndexError when accessing unique_chunks by index.

If all chunks are duplicates (edge case), unique_chunks could be empty after deduplication, causing an IndexError on line 154 when accessing unique_chunks[-1] and unique_chunks[0].

Proposed fix
+        if not unique_chunks:
+            logger.warning(
+                f"All {len(all_chunks)} chunks were duplicates for project {project_id}"
+            )
+            return ""
+
         logger.info(
             f"Retrieved {len(unique_chunks)} unique chunks "
             f"(from {len(all_chunks)} total) for project {project_id}. "
             f"Score range: {unique_chunks[-1].score:.3f} - {unique_chunks[0].score:.3f}"
         )
backend/lookup/tests/test_integrations/test_storage_integration.py-164-196 (1)

164-196: ⚠️ Potential issue | 🟡 Minor

Remove unused expected_meta variable.

The variable expected_meta is computed but never used in assertions. Either add an assertion to verify metadata content or remove it.

Proposed fix - add assertion or remove variable

Option 1 - Add assertion:

         # Mock JSON encoding for metadata
         import json
         expected_meta = json.dumps(metadata, indent=2)
         ...
         # Check metadata upload
         meta_path = f"{expected_path}.meta.json"
         meta_calls = [call for call in self.mock_fs.write.call_args_list
                       if call[1]['path'] == meta_path]
         self.assertEqual(len(meta_calls), 1)
+        # Verify metadata content
+        actual_meta_data = meta_calls[0][1]['data'].decode('utf-8')
+        self.assertEqual(actual_meta_data, expected_meta)

Option 2 - Remove unused variable:

-        # Mock JSON encoding for metadata
-        import json
-        expected_meta = json.dumps(metadata, indent=2)
-
backend/lookup/integrations/llmwhisperer_client.py-188-222 (1)

188-222: ⚠️ Potential issue | 🟡 Minor

Blocking poll loop could impact request threads.

The wait_for_extraction method uses time.sleep() which blocks the thread. If called from a request handler, this could exhaust worker threads. Consider:

  1. Using this only in background tasks/Celery workers
  2. Adding documentation about intended usage context
  3. For request threads, return immediately and use webhooks/callbacks
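
A minimal sketch of option 1, assuming a Celery worker is available; the client class name, constructor, and task wrapper below are all assumptions:

from celery import shared_task

from lookup.integrations.llmwhisperer_client import LLMWhispererClient  # assumed name


@shared_task
def wait_for_extraction_task(whisper_hash: str, timeout: int = 300) -> dict:
    """Run the blocking poll inside a worker so request threads stay free."""
    client = LLMWhispererClient()  # assumed constructor
    return client.wait_for_extraction(whisper_hash, timeout=timeout)
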
backend/lookup/models/prompt_studio_lookup_link.py-91-106 (1)

91-106: ⚠️ Potential issue | 🟡 Minor

Auto-assignment condition may incorrectly skip assignment for explicit zero.

The condition self.execution_order == 0 and not self.pk will not auto-assign if someone explicitly sets execution_order=0. Also, for UUID primary keys with defaults, not self.pk may not work as expected since UUID fields have defaults.

🐛 Proposed fix
     def save(self, *args, **kwargs):
         """Override save to auto-assign execution_order if not set."""
-        if self.execution_order == 0 and not self.pk:
+        # Check if this is a new instance by querying the database
+        # (self.pk is always set for UUIDs due to default=uuid.uuid4)
+        is_new = self.pk is None or not PromptStudioLookupLink.objects.filter(pk=self.pk).exists()
+        
+        if is_new and self._state.adding and self.execution_order == 0:
             # Get the maximum execution order for this PS project
             max_order = PromptStudioLookupLink.objects.filter(
                 prompt_studio_project_id=self.prompt_studio_project_id
             ).aggregate(max_order=models.Max("execution_order"))["max_order"]
 
             self.execution_order = (max_order or 0) + 1
 
         super().save(*args, **kwargs)

Using self._state.adding is the Django-idiomatic way to check for new instances.

backend/lookup/tests/test_integrations/test_redis_cache_integration.py-144-164 (1)

144-164: ⚠️ Potential issue | 🟡 Minor

Move the RedisError import to the top of the file for consistency.

The from redis.exceptions import RedisError import at line 147 is placed inline within the test method. While redis is a declared project dependency (in backend/pyproject.toml and other modules), the import statement itself should be at the top of the file with other imports for consistency and maintainability, rather than deferred until the test method executes.

backend/lookup/services/indexing_service.py-398-401 (1)

398-401: ⚠️ Potential issue | 🟡 Minor

Add exception chaining for better debugging.

When re-raising in an except clause, use raise ... from err or from None to preserve the exception chain.

🔧 Proposed fix
         try:
             project = LookupProject.objects.get(id=project_id)
         except LookupProject.DoesNotExist:
-            raise ValueError(f"Project {project_id} not found")
+            raise ValueError(f"Project {project_id} not found") from None
backend/lookup/services/lookup_executor.py-664-669 (1)

664-669: ⚠️ Potential issue | 🟡 Minor

Add exception chaining in _parse_llm_response.

When converting exceptions to ParseError, chain the original exception to preserve debugging context.

🔧 Proposed fix
         except json.JSONDecodeError as e:
             logger.warning(f"Failed to parse LLM response as JSON: {e}")
-            raise ParseError(f"Invalid JSON response: {str(e)}")
+            raise ParseError(f"Invalid JSON response: {e!s}") from e
         except Exception as e:
             logger.warning(f"Unexpected error parsing LLM response: {e}")
-            raise ParseError(f"Parse error: {str(e)}")
+            raise ParseError(f"Parse error: {e!s}") from e
backend/lookup/views.py-1357-1361 (1)

1357-1361: ⚠️ Potential issue | 🟡 Minor

Add exception chaining for better debugging.

When converting IntegrityError to DuplicateData, chain the original exception to preserve the database error details for debugging.

🔧 Proposed fix
         try:
             self.perform_create(serializer)
-        except IntegrityError:
-            raise DuplicateData(LookupProfileManagerErrors.PROFILE_NAME_EXISTS)
+        except IntegrityError as e:
+            raise DuplicateData(LookupProfileManagerErrors.PROFILE_NAME_EXISTS) from e
backend/lookup/views.py-260-265 (1)

260-265: ⚠️ Potential issue | 🟡 Minor

Hardcoded user_id may cause issues with multi-user scenarios.

Using a fixed user_id = "lookup" means all lookup uploads share the same path segment, which could cause file collisions if two users upload files with the same name to the same project. Consider using the actual user ID or a unique identifier.

🔧 Proposed fix
             org_id = UserContext.get_organization_identifier()
-            # Use a fixed user_id for lookup uploads to match PS path structure
-            # Path: {base_path}/{org_id}/{user_id}/{tool_id}/{filename}
-            user_id = "lookup"
+            # Use actual user ID to prevent file collisions between users
+            user_id = str(request.user.user_id) if hasattr(request.user, 'user_id') else str(request.user.id)
             tool_id = str(project.id)
backend/lookup/integrations/llm_provider.py-100-142 (1)

100-142: ⚠️ Potential issue | 🟡 Minor

Unused timeout parameter and missing exception chaining.

The timeout parameter is declared but never used. Additionally, catching a broad Exception and re-raising without chaining loses the original traceback context.

🔧 Proposed fix
-    def generate(self, prompt: str, config: dict[str, Any], timeout: int = 30) -> str:
+    def generate(self, prompt: str, config: dict[str, Any], timeout: int = 30) -> str:  # noqa: ARG002
         """Generate LLM response for Look-Up enrichment.
 
         Args:
             prompt: The prompt text with resolved variables
             config: LLM configuration (provider, model, temperature, etc.)
-            timeout: Request timeout in seconds
+            timeout: Request timeout in seconds (reserved for future use)
 
         Returns:
             JSON-formatted string with enrichment data
 
         Raises:
             RuntimeError: If LLM call fails
         """
         start_time = time.time()
 
         try:
             if self.llm_available:
                 # Use actual Unstract LLM integration
                 llm_config = self._get_llm_instance(config)
 
                 # In production, this would make actual LLM call
                 # For now, simulate the call
                 response = self._simulate_llm_call(prompt, llm_config)
             else:
                 # Fallback implementation
                 response = self._fallback_generate(prompt, config)
 
             # Validate response is JSON
             try:
                 json.loads(response)
             except json.JSONDecodeError:
                 # Try to extract JSON from response
                 response = self._extract_json(response)
 
             elapsed_time = time.time() - start_time
             logger.info(f"LLM generation completed in {elapsed_time:.2f}s")
 
             return response
 
         except Exception as e:
-            logger.error(f"LLM generation failed: {e}")
-            raise RuntimeError(f"LLM generation failed: {str(e)}")
+            logger.exception("LLM generation failed")
+            raise RuntimeError(f"LLM generation failed: {e!s}") from e

Comment on lines +207 to +222
        # Try to find JSON in the response
        import re

        # Look for JSON object pattern
        json_pattern = r"\{[^{}]*\}"
        matches = re.findall(json_pattern, response)

        if matches:
            # Try to parse each match
            for match in matches:
                try:
                    json.loads(match)
                    return match
                except json.JSONDecodeError:
                    continue


⚠️ Potential issue | 🟠 Major

JSON extraction regex won't match nested objects.

The pattern r"\{[^{}]*\}" excludes any braces inside the match, so nested JSON like {"data": {"value": 1}} won't be extracted correctly. Consider using a more robust approach.

🔧 Proposed fix using iterative brace matching
     def _extract_json(self, response: str) -> str:
         """Extract JSON from LLM response if wrapped in text."""
-        # Try to find JSON in the response
-        import re
-
-        # Look for JSON object pattern
-        json_pattern = r"\{[^{}]*\}"
-        matches = re.findall(json_pattern, response)
-
-        if matches:
-            # Try to parse each match
-            for match in matches:
-                try:
-                    json.loads(match)
-                    return match
-                except json.JSONDecodeError:
-                    continue
+        # Try to find JSON by locating balanced braces
+        start_idx = response.find("{")
+        if start_idx == -1:
+            # No JSON object found
+            return json.dumps({
+                "raw_response": response[:500],
+                "confidence": 0.3,
+                "warning": "Could not extract structured data",
+            })
+
+        # Try progressively longer substrings starting from first brace
+        for end_idx in range(len(response), start_idx, -1):
+            if response[end_idx - 1] == "}":
+                candidate = response[start_idx:end_idx]
+                try:
+                    json.loads(candidate)
+                    return candidate
+                except json.JSONDecodeError:
+                    continue
 
         # If no valid JSON found, create a basic response
         return json.dumps({
             "raw_response": response[:500],
             "confidence": 0.3,
             "warning": "Could not extract structured data",
         })
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 219-219: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In `@backend/lookup/integrations/llm_provider.py` around lines 207 - 222, The
current JSON extraction uses the regex json_pattern = r"\{[^{}]*\}" which fails
on nested objects; replace it with an iterative balanced-brace extractor that
scans the response string (the variable response) and uses a stack/counter to
find matching '{' ... '}' spans, attempt json.loads on each balanced substring
and return the first valid parse; update the logic where matches are
built/iterated (the block referencing json_pattern, matches, and the json.loads
try/except) to use this brace-matching approach so nested JSON like
{"data":{"value":1}} is correctly extracted and parsed.

Comment on lines +374 to +381
        count = 0

        for prompt, response in preload_data.items():
            # Generate key with project namespace
            key = f"{self.key_prefix}project:{project_id}:{hashlib.md5(prompt.encode()).hexdigest()}"

            if self.set(key, response):
                count += 1

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Explore the file structure
cd backend/lookup/integrations && wc -l redis_cache.py

Repository: Zipstack/unstract

Length of output: 79


🏁 Script executed:

# Get context around the flagged lines and search for other hashing
cd backend/lookup/integrations && sed -n '350,400p' redis_cache.py

Repository: Zipstack/unstract

Length of output: 1208


🏁 Script executed:

# Search for hash-related code throughout the file
cd backend/lookup/integrations && grep -n "hashlib\|sha256\|md5" redis_cache.py

Repository: Zipstack/unstract

Length of output: 247


🏁 Script executed:

# Get context around line 108 to understand the main cache key scheme
cd backend/lookup/integrations && sed -n '100,120p' redis_cache.py

Repository: Zipstack/unstract

Length of output: 632


🏁 Script executed:

# Get more context to see the method name containing line 108
cd backend/lookup/integrations && sed -n '95,115p' redis_cache.py

Repository: Zipstack/unstract

Length of output: 684


Align warmup cache key hashing with the main cache scheme — use SHA256 instead of MD5.
The main cache key generation (generate_cache_key()) uses SHA256, but the warmup method uses MD5, creating an inconsistency. Additionally, MD5 is cryptographically weak and flagged by security linters.

🛠️ Suggested fix
-            key = f"{self.key_prefix}project:{project_id}:{hashlib.md5(prompt.encode()).hexdigest()}"
+            prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
+            key = f"{self.key_prefix}project:{project_id}:{prompt_hash}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         count = 0

         for prompt, response in preload_data.items():
             # Generate key with project namespace
-            key = f"{self.key_prefix}project:{project_id}:{hashlib.md5(prompt.encode()).hexdigest()}"
+            prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
+            key = f"{self.key_prefix}project:{project_id}:{prompt_hash}"

             if self.set(key, response):
                 count += 1
🧰 Tools
🪛 Ruff (0.14.14)

[error] 378-378: Probable use of insecure hash functions in hashlib: md5

(S324)

🤖 Prompt for AI Agents
In `@backend/lookup/integrations/redis_cache.py` around lines 374 - 381, the
warmup loop uses hashlib.md5 to build keys, which diverges from the main
generate_cache_key() scheme (SHA256) and triggers security linting; update the
key generation inside the preload_data loop to use
hashlib.sha256(prompt.encode()).hexdigest() (matching the generate_cache_key()
approach), keeping the same self.key_prefix and project:{project_id}: namespace
so keys are consistent; ensure the change is made where the code builds key (the
preload loop that calls self.set(key, response)) so warmup entries match the
primary cache lookup.

Comment on lines +168 to +196
@receiver(pre_save, sender=LookupDataSource)
def auto_increment_version_and_update_latest(sender, instance, **kwargs):
"""Signal to auto-increment version number and manage is_latest flag.

This signal:
1. Auto-increments version_number if not set
2. Marks all previous versions as not latest
"""
# Check if this is a new instance by querying the database
# (instance.pk is always truthy for UUIDField with default=uuid.uuid4)
is_new = not LookupDataSource.objects.filter(pk=instance.pk).exists()

if is_new:
# Get the highest version number for this project
max_version = LookupDataSource.objects.filter(project=instance.project).aggregate(
max_version=models.Max("version_number")
)["max_version"]

# Always auto-increment version for new instances
instance.version_number = (max_version or 0) + 1

# Mark all previous versions as not latest
LookupDataSource.objects.filter(project=instance.project, is_latest=True).update(
is_latest=False
)

# Ensure new version is marked as latest
instance.is_latest = True


⚠️ Potential issue | 🟠 Major

Race condition risk in version auto-increment signal.

The pre_save signal queries Max("version_number") and then sets the new version, but concurrent inserts could result in duplicate version numbers since the unique constraint check happens at commit time. Consider using select_for_update() or database-level sequences.

🔒 Proposed mitigation using select_for_update
 @receiver(pre_save, sender=LookupDataSource)
 def auto_increment_version_and_update_latest(sender, instance, **kwargs):
     # Check if this is a new instance
     is_new = not LookupDataSource.objects.filter(pk=instance.pk).exists()
 
     if is_new:
+        from django.db import transaction
+        with transaction.atomic():
+            # Lock existing records to prevent race conditions
+            existing = LookupDataSource.objects.filter(
+                project=instance.project
+            ).select_for_update().order_by('-version_number').first()
+            
+            instance.version_number = (existing.version_number if existing else 0) + 1
-        max_version = LookupDataSource.objects.filter(project=instance.project).aggregate(
-            max_version=models.Max("version_number")
-        )["max_version"]
-
-        instance.version_number = (max_version or 0) + 1
 
         # Mark all previous versions as not latest
         LookupDataSource.objects.filter(project=instance.project, is_latest=True).update(
             is_latest=False
         )
 
         instance.is_latest = True

Note: This approach requires the signal to be called within a transaction context. Alternatively, consider handling version assignment in the model's save() method where you have more control over the transaction.

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 169-169: Unused function argument: sender

(ARG001)


[warning] 169-169: Unused function argument: kwargs

(ARG001)

🤖 Prompt for AI Agents
In `@backend/lookup/models/lookup_data_source.py` around lines 168 - 196, the
pre_save signal handler auto_increment_version_and_update_latest on
LookupDataSource is vulnerable to race conditions because it reads
Max("version_number") and then writes, so concurrent creations can get duplicate
version_number; update this by acquiring a row-level lock and doing the
read/update inside a transaction (e.g., wrap in transaction.atomic() and use
select_for_update() on LookupDataSource objects filtered by project to lock the
relevant rows before computing max and updating is_latest) or switch to a
DB-backed sequence/serial for version_number and set is_latest in the same
transaction to ensure uniqueness; ensure all logic referencing version_number
and is_latest in auto_increment_version_and_update_latest runs inside that
transaction context.

Comment on lines +134 to +140
# Ensure error_message is provided for failed status
if self.status == "failed" and not self.error_message:
raise ValidationError("Error message is required for failed executions")

# Ensure enriched_output is provided for success status
if self.status == "success" and not self.enriched_output:
raise ValidationError("Enriched output is required for successful executions")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how partial status is used in the codebase
rg '"partial"' --type py -C3 | head -100

Repository: Zipstack/unstract

Length of output: 3646


🏁 Script executed:

#!/bin/bash
# Search for status patterns and partial usage
rg "status.*partial|partial.*status" --type py -C2 | head -80

Repository: Zipstack/unstract

Length of output: 3771


🏁 Script executed:

#!/bin/bash
# Look at the full context of the validation file
cat -n backend/lookup/models/lookup_execution_audit.py | sed -n '120,160p'

Repository: Zipstack/unstract

Length of output: 1136


🏁 Script executed:

#!/bin/bash
# Find all status values used in the codebase
rg "status\s*=\s*[\"']" --type py | grep -E "(success|failed|partial)" | head -30

Repository: Zipstack/unstract

Length of output: 2733


Add validation for enriched_output on "partial" status to match "success" requirements.

The code validates enriched_output for status "success" but not for "partial". Since the was_successful() property treats both "success" and "partial" as successful outcomes, the validation should be consistent. If "success" requires enriched output, "partial" should too, since partial success still implies some processed output exists.

Consider adding validation for "partial" status:

if self.status in ["success", "partial"] and not self.enriched_output:
    raise ValidationError("Enriched output is required for successful executions")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 136-136: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 140-140: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@backend/lookup/models/lookup_execution_audit.py` around lines 134 - 140, the
validation currently requires enriched_output only when self.status == "success"
but was_successful() treats "partial" as successful too; update the validation
in the LookupExecutionAudit model (the block that checks self.status and
self.enriched_output) to require enriched_output when self.status is either
"success" or "partial" (e.g., change the condition to check status in
["success", "partial"]) and keep the ValidationError message consistent.

Comment on lines +142 to +163
def clean(self):
"""Validate model fields."""
super().clean()
from django.core.exceptions import ValidationError

# Validate LLM provider
valid_providers = [choice[0] for choice in self.LLM_PROVIDER_CHOICES]
if self.llm_provider not in valid_providers:
raise ValidationError(
f"Invalid LLM provider: {self.llm_provider}. "
f"Must be one of: {', '.join(valid_providers)}"
)

# Validate LLM config structure
if self.llm_config and not isinstance(self.llm_config, dict):
raise ValidationError("LLM config must be a dictionary")

# Validate lookup_type (only static_data for POC)
if self.lookup_type != "static_data":
raise ValidationError(
"Only 'static_data' lookup type is supported in this POC"
)

⚠️ Potential issue | 🔴 Critical

Validation fails for nullable llm_provider when it's None.

The llm_provider field allows null=True, but clean() validates that it must be in LLM_PROVIDER_CHOICES. When llm_provider is None, this validation will raise a ValidationError since None is not in the valid choices list.

🐛 Proposed fix
     def clean(self):
         """Validate model fields."""
         super().clean()
         from django.core.exceptions import ValidationError

         # Validate LLM provider
-        valid_providers = [choice[0] for choice in self.LLM_PROVIDER_CHOICES]
-        if self.llm_provider not in valid_providers:
-            raise ValidationError(
-                f"Invalid LLM provider: {self.llm_provider}. "
-                f"Must be one of: {', '.join(valid_providers)}"
-            )
+        if self.llm_provider is not None:
+            valid_providers = [choice[0] for choice in self.LLM_PROVIDER_CHOICES]
+            if self.llm_provider not in valid_providers:
+                raise ValidationError(
+                    f"Invalid LLM provider: {self.llm_provider}. "
+                    f"Must be one of: {', '.join(valid_providers)}"
+                )

         # Validate LLM config structure
         if self.llm_config and not isinstance(self.llm_config, dict):
             raise ValidationError("LLM config must be a dictionary")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 150-153: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 157-157: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 161-163: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@backend/lookup/models/lookup_project.py` around lines 142 - 163, the clean()
method wrongly enforces LLM_PROVIDER_CHOICES even when llm_provider is nullable;
update the validation in clean() to only check membership if self.llm_provider
is not None (and optionally not an empty string) before comparing against
valid_providers (referencing clean(), LLM_PROVIDER_CHOICES and llm_provider), so
nullable values pass and only non-null providers are validated.

Comment on lines +35 to +58
# Get tool_id from instance (update) or initial data (create)
tool_id = None
if self.instance:
tool_id = self.instance.tool_id_id
elif "tool_id" in self.initial_data:
tool_id = self.initial_data["tool_id"]

if tool_id:
try:
from lookup.models import PromptStudioLookupLink

link_exists = PromptStudioLookupLink.objects.filter(
prompt_studio_project_id=tool_id,
lookup_project=value,
).exists()

if not link_exists:
raise serializers.ValidationError(
"Selected lookup project must be linked to this "
"Prompt Studio project at the project level first."
)
except ImportError:
# Lookup app not installed, skip validation
pass

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

head -100 backend/prompt_studio/prompt_studio_v2/serializers.py | cat -n

Repository: Zipstack/unstract

Length of output: 3410


🏁 Script executed:

find backend -name "models.py" -type f | xargs grep -l "ToolStudioPrompt" | head -5

Repository: Zipstack/unstract

Length of output: 173


🏁 Script executed:

grep -n "class ToolStudioPrompt" backend/prompt_studio/prompt_studio_v2/models.py

Repository: Zipstack/unstract

Length of output: 98


🏁 Script executed:

sed -n '11,80p' backend/prompt_studio/prompt_studio_v2/models.py | cat -n

Repository: Zipstack/unstract

Length of output: 2927


🏁 Script executed:

grep -n "lookup_project" backend/prompt_studio/prompt_studio_v2/models.py

Repository: Zipstack/unstract

Length of output: 104


🏁 Script executed:

sed -n '143,155p' backend/prompt_studio/prompt_studio_v2/models.py | cat -n

Repository: Zipstack/unstract

Length of output: 656


🏁 Script executed:

find backend -type f -name "*.py" | xargs grep -l "PromptStudioLookupLink" | head -5

Repository: Zipstack/unstract

Length of output: 404


🏁 Script executed:

cat -n backend/lookup/models/prompt_studio_lookup_link.py

Repository: Zipstack/unstract

Length of output: 5455


🏁 Script executed:

cat > /tmp/test_validation_logic.py << 'EOF'
# Test to understand the issue with validation logic

# Scenario 1: PATCH with both tool_id and lookup_project changing
# Current code flow:
print("=== Current Logic (Instance First) ===")
print("Request: PATCH with tool_id='new-tool', lookup_project='lookup-A'")
print("Instance (DB): tool_id='old-tool'")
print()
print("Execution:")
print("1. self.instance exists (PATCH request) → True")
print("2. tool_id = self.instance.tool_id_id  # Uses 'old-tool' from DB")
print("3. Validates lookup_project against 'old-tool' (WRONG!)")
print("4. Should validate against 'new-tool' (incoming data)")
print()

print("=== Suggested Logic (Initial Data First) ===")
print("Request: PATCH with tool_id='new-tool', lookup_project='lookup-A'")
print("Instance (DB): tool_id='old-tool'")
print()
print("Execution:")
print("1. 'tool_id' in self.initial_data → True")
print("2. tool_id = self.initial_data['tool_id']  # Uses 'new-tool' from request")
print("3. Validates lookup_project against 'new-tool' (CORRECT!)")
EOF
python /tmp/test_validation_logic.py

Repository: Zipstack/unstract

Length of output: 764


🏁 Script executed:

find backend -type f -name "*.py" | xargs grep -l "ToolStudioPromptSerializer" | grep -E "(views|viewset)" | head -5

Repository: Zipstack/unstract

Length of output: 294


🏁 Script executed:

grep -n "ToolStudioPromptSerializer" backend/prompt_studio/prompt_studio_v2/views.py | head -10

Repository: Zipstack/unstract

Length of output: 198


🏁 Script executed:

sed -n '20,80p' backend/prompt_studio/prompt_studio_v2/views.py | cat -n

Repository: Zipstack/unstract

Length of output: 2583


🏁 Script executed:

grep -A 5 "class Meta:" backend/prompt_studio/prompt_studio_v2/serializers.py | head -10

Repository: Zipstack/unstract

Length of output: 273


🏁 Script executed:

cat > /tmp/drf_validation_test.py << 'EOF'
# Understanding Django REST Framework validation behavior

# When a PATCH request is made with multiple fields:
# 1. self.initial_data contains the incoming request data (what the client sent)
# 2. self.instance contains the DB object being updated
# 3. Field validators run during validation phase

# Example PATCH request:
# {
#   "tool_id": "new-tool-uuid",
#   "lookup_project": "lookup-uuid-A"
# }

# DB state before PATCH:
# ToolStudioPrompt(
#   tool_id="old-tool-uuid",
#   lookup_project="lookup-uuid-B"
# )

# During validation of lookup_project field:
# - self.initial_data["tool_id"] = "new-tool-uuid" (incoming)
# - self.instance.tool_id_id = "old-tool-uuid" (from DB)

# Current code prioritizes instance → validation against old tool_id (INCORRECT)
# Suggested code prioritizes initial_data → validation against new tool_id (CORRECT)

print("✓ The review comment is correct:")
print("  - Current logic validates against old tool_id on PATCH with updated tool_id")
print("  - Should validate against incoming tool_id when present")
print("  - Suggested fix correctly reverses the precedence")
EOF
python /tmp/drf_validation_test.py

Repository: Zipstack/unstract

Length of output: 281


Validate against the incoming tool_id on updates.

When a PATCH/PUT request includes both a new tool_id and lookup_project, the current logic validates against self.instance.tool_id_id (the old value), allowing the lookup project to be linked to the wrong project. The incoming tool_id from self.initial_data should take precedence during updates.

Suggested fix
-        tool_id = None
-        if self.instance:
-            tool_id = self.instance.tool_id_id
-        elif "tool_id" in self.initial_data:
-            tool_id = self.initial_data["tool_id"]
+        tool_id = None
+        if "tool_id" in self.initial_data:
+            tool_id = self.initial_data["tool_id"]
+        elif self.instance:
+            tool_id = self.instance.tool_id_id
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 52-55: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@backend/prompt_studio/prompt_studio_v2/serializers.py` around lines 35 - 58,
the validation uses self.instance.tool_id_id (old value) instead of respecting
an incoming tool_id in updates; change the logic in the serializer validation
block so that if "tool_id" exists in self.initial_data it overrides
self.instance.tool_id_id (i.e., determine tool_id =
self.initial_data.get("tool_id") if present else self.instance.tool_id_id), then
perform the PromptStudioLookupLink.objects.filter check
(prompt_studio_project_id=tool_id, lookup_project=value) and raise
serializers.ValidationError if no link exists; keep the ImportError handling
as-is.

Comment on lines +48 to +61
<Popover
content={content}
title={null}
trigger="hover"
placement="top"
overlayClassName="lookup-replacement-popover"
>
<Button
size="small"
type="text"
className="prompt-card-action-button lookup-indicator-btn"
>
<SwapOutlined className="prompt-card-actions-head lookup-indicator-icon" />
</Button>

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx

Repository: Zipstack/unstract

Length of output: 2759


🏁 Script executed:

rg "trigger=\[" --type jsx --type tsx -A 2 -B 2 frontend/src/components/ | head -50

Repository: Zipstack/unstract

Length of output: 88


🏁 Script executed:

rg "LookupReplacementIndicator" --type jsx --type tsx -B 3 -A 3

Repository: Zipstack/unstract

Length of output: 88


🏁 Script executed:

rg 'trigger=\[' -A 2 -B 2 frontend/src/components/ | head -50

Repository: Zipstack/unstract

Length of output: 43


🏁 Script executed:

rg "LookupReplacementIndicator" -B 3 -A 3

Repository: Zipstack/unstract

Length of output: 5353


🏁 Script executed:

rg "trigger=" frontend/src/components/ | grep -i popover

Repository: Zipstack/unstract

Length of output: 43


🌐 Web query:

Ant Design Popover trigger prop array syntax React

💡 Result:

In Ant Design (React), Popover inherits the trigger prop behavior from Tooltip, where trigger can be a single mode or multiple modes by passing an array. (4x.ant.design)

import { Popover, Button } from "antd";

<Popover
  title="Title"
  content="Content"
  trigger={["hover", "click"]}   // array syntax
>
  <Button>Hover or click me</Button>
</Popover>

Valid trigger values: "hover" | "focus" | "click" | "contextMenu" (and an array of them). (4x-ant-design.antgroup.com)

Improve keyboard accessibility and fix falsy-value rendering.

The hover-only trigger prevents keyboard users from discovering replacement details. Change trigger="hover" to trigger={["hover", "focus"]} and add an aria-label to the button for screen reader users.

Additionally, the || operator incorrectly treats valid falsy values (0, false, empty strings) as missing data. Use the nullish coalescing operator ?? instead to only replace null or undefined.

Suggested fix
   return (
     <Popover
       content={content}
       title={null}
-      trigger="hover"
+      trigger={["hover", "focus"]}
       placement="top"
       overlayClassName="lookup-replacement-popover"
     >
       <Button
         size="small"
         type="text"
         className="prompt-card-action-button lookup-indicator-btn"
+        aria-label="Show look-up replacement details"
       >
         <SwapOutlined className="prompt-card-actions-head lookup-indicator-icon" />
       </Button>
     </Popover>
   );

Update lines 29 and 40 to use nullish coalescing:

-          <Text className="lookup-replacement-value original">
-            {originalValue || "(empty)"}
+          <Text className="lookup-replacement-value original">
+            {originalValue ?? "(empty)"}
           </Text>
           ...
-          <Text className="lookup-replacement-value enriched">
-            {enrichedValue || "(empty)"}
+          <Text className="lookup-replacement-value enriched">
+            {enrichedValue ?? "(empty)"}
           </Text>
🤖 Prompt for AI Agents
In
`@frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx`
around lines 48 - 61, the Popover uses trigger="hover" which blocks keyboard
access and the Button lacks an aria-label; change Popover's trigger to
trigger={["hover","focus"]} and add a descriptive aria-label to the Button (the
element rendering SwapOutlined) so screen readers and keyboard users can
discover the replacement details. Also replace any uses of the logical OR
fallback (e.g. value || fallback) in this component with the nullish coalescing
operator (value ?? fallback) so valid falsy values (0, false, "") are preserved;
look for usages in LookupReplacementIndicator.jsx around the Popover/content
rendering and update them accordingly.

Comment on lines +119 to +124
if (LookupManager) {
items.push(getMenuItem("Lookups", 9, <SearchOutlined />));
listOfComponents[9] = (
<LookupManager handleUpdateTool={handleUpdateTool} />
);
}

⚠️ Potential issue | 🔴 Critical

Key collision: LookupManager uses the same key (9) as CustomDataSettings.

The key 9 is already assigned to "Custom Data" on line 47 and listOfComponents[9] is set to <CustomDataSettings /> on line 68. When LookupManager is available, this code overwrites listOfComponents[9] with <LookupManager /> and adds a duplicate menu item with the same key, causing:

  1. CustomDataSettings component becomes unreachable
  2. React will warn about duplicate keys in the menu

Use an unused key (e.g., 10) for the Lookups menu item.

🐛 Proposed fix
     if (LookupManager) {
-      items.push(getMenuItem("Lookups", 9, <SearchOutlined />));
-      listOfComponents[9] = (
+      items.push(getMenuItem("Lookups", 10, <SearchOutlined />));
+      listOfComponents[10] = (
         <LookupManager handleUpdateTool={handleUpdateTool} />
       );
     }
🤖 Prompt for AI Agents
In `@frontend/src/components/custom-tools/settings-modal/SettingsModal.jsx` around
lines 119 - 124, LookupManager is using the same numeric key (9) as
CustomDataSettings which overwrites listOfComponents[9] and creates duplicate
menu keys; update the LookupManager menu entry and its component mapping to use
an unused key (for example 10) by changing the getMenuItem call and the
listOfComponents index from 9 to 10 (referencing LookupManager, getMenuItem, and
listOfComponents) so CustomDataSettings remains reachable and keys are unique.

Comment on lines +9 to +16
const handleSubmit = async () => {
try {
const values = await form.validateFields();
await onCreate(values);
form.resetFields();
} catch (error) {
console.error("Validation failed:", error);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx

Repository: Zipstack/unstract

Length of output: 2220


Don't swallow onCreate errors or mislabel them as validation failures.

The current catch block logs both validation errors and create failures (e.g., API errors) as "Validation failed:", providing no visibility into what actually went wrong. Users receive no feedback when project creation fails. Distinguish between validation and create errors so create failures get proper error messaging.

🔧 Proposed fix (separate validation vs. create failures)
-import { Form, Input, Modal } from "antd";
+import { Form, Input, Modal, message } from "antd";
 ...
   const handleSubmit = async () => {
     try {
       const values = await form.validateFields();
       await onCreate(values);
       form.resetFields();
     } catch (error) {
-      console.error("Validation failed:", error);
+      if (error?.errorFields) {
+        console.error("Validation failed:", error);
+        return;
+      }
+      message.error("Failed to create project. Please try again.");
+      console.error("Create failed:", error);
     }
   };
🤖 Prompt for AI Agents
In `@frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx`
around lines 9 - 16, handleSubmit currently treats any error as a validation
failure; change it to first await form.validateFields() in a try/catch to handle
and log/return validation errors, then call await onCreate(values) in a separate
try/catch so API/create failures are handled distinctly (log the error, show
user-visible feedback via message.error or set a submission error) and still
call form.resetFields() only on successful creation; reference handleSubmit,
form.validateFields, onCreate, and form.resetFields when updating the control
flow.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@docs/lookup_architecture_final.md`:
- Around lines 186-196: The static method
LookupProfileManager.get_default_profile references an undefined
DefaultProfileError and assumes exactly one default exists; define or import
DefaultProfileError (e.g., from backend.lookup.exceptions) and update docs to
describe the profile initialization strategy (auto-create a default on
LookupProject creation or require profile creation) and system-wide defaults;
also handle the potential MultipleObjectsReturned from
LookupProfileManager.objects.get by documenting/implementing the uniqueness
constraint (or catching MultipleObjectsReturned and choosing a deterministic
resolution) so callers won’t fail on multiple/default-missing cases.
- Around lines 273-278: The docs mention a UniqueConstraint named
unique_data_source_profile_index that prevents duplicate (data_source, profile)
entries but the indexing workflow's Step 7 creates a new IndexManager row
without addressing existing rows; update the documentation to specify the
intended reindexing behavior for the IndexManager model (whether to update the
existing IndexManager record, delete-and-recreate it, or use a DB upsert), and
change the guidance in Step 7 to explicitly recommend and describe the chosen
approach (for example, use IndexManager.objects.update_or_create(...) instead of
IndexManager.objects.create(...) if you want an upsert, or describe the
delete-then-create flow and its transactional implications if you prefer
recreate).
- Around lines 434-443: The docs mark LLM selection as optional but don't state
the decision logic; update the Step 6 section to explicitly describe when to
call LLM.select_best_match (e.g., based on vector-search confidence thresholds,
profile configuration, or result count), reference the profile LLM flag
(profile.llm.id or a boolean like profile.enable_llm_selection) and the inputs
(matches array and query), and document rules such as "invoke when top-N scores
are below X, or when multiple candidates exceed threshold Y, otherwise skip if a
single match score > X" or point to the exact configuration parameter that
controls this behavior so readers know when LLM(...) and .select_best_match(...)
will be executed.
- Around lines 321-385: The workflow lacks error handling and should document and
implement retries, rollbacks and status transitions: for failures in
LookupProfileManager.get_default_profile/X2Text.process (Step 2) set
data_source.extraction_status='failed' and schedule retry/backoff; for
Embedding.embed (Step 5) and VectorDB.index (Step 6) catch adapter-specific
exceptions, mark indexing_status='failed' on the LookupIndexManager entry (or on
data_source), and trigger a cleanup routine to delete any partial uploads in
storage and any partial vector DB entries; replace the direct
LookupIndexManager.objects.create call with
LookupIndexManager.objects.update_or_create (or an explicit reindex pathway) to
avoid IntegrityError on reindex, and document the exact retry policy,
idempotency guarantees, and atomic rollback steps (storage.delete,
vector_db.delete_index) so each failure path transitions the data_source and
LookupIndexManager to deterministic states.
- Around lines 46-57: The model currently allows multiple profiles with
is_default=True because the Meta.constraints only enforces unique
(prompt_studio_tool, profile_name); add a partial unique constraint to the same
Meta that enforces uniqueness of prompt_studio_tool when is_default=True (i.e.,
UniqueConstraint(fields=["prompt_studio_tool"], condition=Q(is_default=True),
name="unique_default_profile_per_tool")), import Q from django.db.models and
ensure this pattern is also added to LookupProfileManager.Meta to enforce a
single default per tool/project (note: requires PostgreSQL and Django 4.0+).
🧹 Nitpick comments (2)
docs/lookup_architecture_final.md (2)

287-317: Add language specifiers to fenced code blocks.

The workflow diagrams at lines 287-317, 321-385, and 389-445 should specify a language for better rendering and syntax highlighting. Consider using text, mermaid, or plaintext.

Example fix
-```
+```text
 User Action: Create New Profile
     ↓

As per coding guidelines, markdownlint-cli2 flags these as MD040 violations.

Also applies to: 321-385, 389-445


453-500: Add language specifiers and document API error responses.

The API endpoint blocks should specify a language (e.g., http or text) for proper rendering.

Additionally, consider documenting:

  • Authentication requirements (JWT, session, API key?)
  • Error response formats (HTTP status codes, error payload structure)
  • Pagination for list endpoints
  • Rate limiting policies

As per coding guidelines, markdownlint-cli2 flags these as MD040 violations.

Comment on lines 46 to 57
# Flags
is_default = BooleanField(default=False)
reindex = BooleanField(default=False)

# Unique constraint
class Meta:
constraints = [
UniqueConstraint(
fields=["prompt_studio_tool", "profile_name"],
name="unique_prompt_studio_tool_profile_name_index"
)
]

⚠️ Potential issue | 🟠 Major

Missing constraint: Multiple default profiles allowed per project.

The model allows is_default=True for multiple profiles within the same project because there's no database constraint enforcing uniqueness. The UniqueConstraint on lines 52-57 only prevents duplicate profile names, not multiple defaults.

This same issue applies to the Lookup System pattern (lines 146-184).

🔒 Suggested constraint to ensure single default profile

Add a partial unique constraint (supported in PostgreSQL, Django 4.0+):

class Meta:
    constraints = [
        UniqueConstraint(
            fields=["prompt_studio_tool", "profile_name"],
            name="unique_prompt_studio_tool_profile_name_index"
        ),
        UniqueConstraint(
            fields=["prompt_studio_tool"],
            condition=Q(is_default=True),
            name="unique_default_profile_per_tool"
        )
    ]

Apply the same pattern to LookupProfileManager.Meta (lines 175-184).

🤖 Prompt for AI Agents
In `@docs/lookup_architecture_final.md` around lines 46 - 57, the model currently
allows multiple profiles with is_default=True because the Meta.constraints only
enforces unique (prompt_studio_tool, profile_name); add a partial unique
constraint to the same Meta that enforces uniqueness of prompt_studio_tool when
is_default=True (i.e., UniqueConstraint(fields=["prompt_studio_tool"],
condition=Q(is_default=True), name="unique_default_profile_per_tool")), import Q
from django.db.models and ensure this pattern is also added to
LookupProfileManager.Meta to enforce a single default per tool/project (note:
requires PostgreSQL and Django 4.0+).

Comment on lines 186 to 196
@staticmethod
def get_default_profile(project: LookupProject) -> "LookupProfileManager":
"""Get default profile for a lookup project."""
try:
return LookupProfileManager.objects.get(
lookup_project=project,
is_default=True
)
except LookupProfileManager.DoesNotExist:
raise DefaultProfileError("No default profile found for lookup project")
```

⚠️ Potential issue | 🟠 Major

Undefined exception and missing safeguards.

The DefaultProfileError exception is referenced but never defined in this document. Additionally, the static method assumes exactly one default profile exists, but there's no documented strategy for ensuring a default profile is created when a new LookupProject is initialized.

Recommendations (a sketch follows the list):

  1. Define DefaultProfileError in the exceptions section or reference where it's defined (e.g., backend/lookup/exceptions.py)
  2. Document the profile initialization strategy: Should a default profile be auto-created with the project? Should project creation require a profile? Should there be system-wide default adapter selections?
  3. Consider handling the edge case where .get() might return multiple results if the constraint from the previous comment isn't implemented
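
A sketch of recommendations 1 and 3 combined, assuming an exceptions module at backend/lookup/exceptions.py and a created_at field for a deterministic tie-break (both are assumptions, not confirmed code):

# backend/lookup/exceptions.py (assumed location)
class DefaultProfileError(Exception):
    """Raised when no usable default profile exists for a lookup project."""


# Defensive variant of get_default_profile: tolerates zero defaults and,
# if the partial unique constraint is not in place, multiple defaults.
@staticmethod
def get_default_profile(project: "LookupProject") -> "LookupProfileManager":
    profile = (
        LookupProfileManager.objects.filter(lookup_project=project, is_default=True)
        .order_by("created_at")  # assumed field; picks the oldest duplicate
        .first()
    )
    if profile is None:
        raise DefaultProfileError("No default profile found for lookup project")
    return profile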
🤖 Prompt for AI Agents
In `@docs/lookup_architecture_final.md` around lines 186 - 196, the static method
LookupProfileManager.get_default_profile references an undefined
DefaultProfileError and assumes exactly one default exists; define or import
DefaultProfileError (e.g., from backend.lookup.exceptions) and update docs to
describe the profile initialization strategy (auto-create a default on
LookupProject creation or require profile creation) and system-wide defaults;
also handle the potential MultipleObjectsReturned from
LookupProfileManager.objects.get by documenting/implementing the uniqueness
constraint (or catching MultipleObjectsReturned and choosing a deterministic
resolution) so callers won’t fail on multiple/default-missing cases.

Comment on lines 273 to 278
constraints = [
models.UniqueConstraint(
fields=["data_source", "profile"],
name="unique_data_source_profile_index"
)
]

🛠️ Refactor suggestion | 🟠 Major

Document reindexing behavior with unique constraint.

The unique_data_source_profile_index constraint prevents duplicate index entries, but the indexing workflow (Step 7, line 374) shows creating a new IndexManager entry without handling the case where one already exists.

Document whether reindexing should (an upsert sketch follows the list):

  • Update the existing IndexManager entry
  • Delete and recreate it
  • Use update_or_create() instead of create()
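
If the upsert route is chosen, a minimal sketch using Django's built-in update_or_create with the field names shown elsewhere in this document:

# Step 7 as an upsert: (data_source, profile) is the unique key, so a
# reindex updates the existing row instead of raising IntegrityError.
index_entry, created = LookupIndexManager.objects.update_or_create(
    data_source=data_source,
    profile=profile,
    defaults={
        "vector_db_index_id": index_id,
        "indexing_status": "completed",
        "chunk_count": len(chunks),
    },
)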
🤖 Prompt for AI Agents
In `@docs/lookup_architecture_final.md` around lines 273 - 278, the docs mention a
UniqueConstraint named unique_data_source_profile_index that prevents duplicate
(data_source, profile) entries but the indexing workflow's Step 7 creates a new
IndexManager row without addressing existing rows; update the documentation to
specify the intended reindexing behavior for the IndexManager model (whether to
update the existing IndexManager record, delete-and-recreate it, or use a DB
upsert), and change the guidance in Step 7 to explicitly recommend and describe
the chosen approach (for example, use IndexManager.objects.update_or_create(...)
instead of IndexManager.objects.create(...) if you want an upsert, or describe
the delete-then-create flow and its transactional implications if you prefer
recreate).

Comment on lines 321 to 385
```
User Action: Upload Reference Data File
Frontend: Reference Data Tab
- Upload vendors.csv
API: POST /api/v1/unstract/{org_id}/lookup/data-source/
- Creates LookupDataSource with extraction_status='pending'
Backend: Celery Task - Extract and Index
Step 1: Get Default Profile
profile = LookupProfileManager.get_default_profile(project)
Step 2: Extract Text using Profile's X2Text Adapter
x2text = X2Text(
tool=util,
adapter_instance_id=profile.x2text.id
)
extracted_text = x2text.process(file_content)
Step 3: Store Extracted Text in Object Storage
storage.upload(
path=f"lookup/{project_id}/{filename}.txt",
content=extracted_text
)
Step 4: Chunk Text
chunks = create_chunks(
text=extracted_text,
chunk_size=profile.chunk_size,
overlap=profile.chunk_overlap
)
Step 5: Generate Embeddings using Profile's Embedding Adapter
embedding = Embedding(
tool=util,
adapter_instance_id=profile.embedding_model.id
)
vectors = [embedding.embed(chunk) for chunk in chunks]
Step 6: Store in Vector DB using Profile's Vector DB Adapter
vector_db = VectorDB(
tool=util,
adapter_instance_id=profile.vector_store.id
)
index_id = vector_db.index(
documents=chunks,
embeddings=vectors,
metadata={...}
)
Step 7: Create Index Manager Entry
LookupIndexManager.objects.create(
data_source=data_source,
profile=profile,
vector_db_index_id=index_id,
indexing_status='completed',
chunk_count=len(chunks)
)
Step 8: Update Data Source Status
data_source.extraction_status = 'completed'
data_source.save()
```

⚠️ Potential issue | 🟠 Major

Document error handling and rollback strategy.

The indexing workflow shows an 8-step happy path but doesn't address:

  • What happens if extraction fails at Step 2?
  • How to handle embedding failures at Step 5?
  • What if vector DB indexing fails at Step 6?
  • Should partial indexing be rolled back?
  • How are retries handled?

At Step 7 (line 374), create() will raise IntegrityError if an index already exists due to the unique constraint. The workflow should use update_or_create() or document the reindexing pattern.

Recommendation: Add an error handling subsection documenting retry logic, rollback strategy, and status transitions for failed operations.
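
One possible shape for those failure paths, sketched on the assumption that the task runs under Celery and that storage.delete and text_path exist as cleanup hooks (illustrative names, not confirmed APIs):

# Step 2: extraction failure -> mark the data source, let Celery retry
try:
    extracted_text = x2text.process(file_content)
except Exception:
    data_source.extraction_status = "failed"
    data_source.save(update_fields=["extraction_status"])
    raise  # surfaces to Celery, which applies the retry/backoff policy

# Steps 5-6: embedding/indexing failure -> roll back partial artifacts
try:
    vectors = [embedding.embed(chunk) for chunk in chunks]
    index_id = vector_db.index(documents=chunks, embeddings=vectors)
except Exception:
    data_source.extraction_status = "failed"
    data_source.save(update_fields=["extraction_status"])
    storage.delete(path=text_path)  # remove the extracted-text upload
    raise
else:
    # Upsert avoids IntegrityError when reindexing (see constraint comment)
    LookupIndexManager.objects.update_or_create(
        data_source=data_source,
        profile=profile,
        defaults={
            "vector_db_index_id": index_id,
            "indexing_status": "completed",
            "chunk_count": len(chunks),
        },
    )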

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 321-321: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/lookup_architecture_final.md` around lines 321 - 385, the workflow lacks
error handling and should document and implement retries, rollbacks and status
transitions: for failures in
LookupProfileManager.get_default_profile/X2Text.process (Step 2) set
data_source.extraction_status='failed' and schedule retry/backoff; for
Embedding.embed (Step 5) and VectorDB.index (Step 6) catch adapter-specific
exceptions, mark indexing_status='failed' on the LookupIndexManager entry (or on
data_source), and trigger a cleanup routine to delete any partial uploads in
storage and any partial vector DB entries; replace the direct
LookupIndexManager.objects.create call with
LookupIndexManager.objects.update_or_create (or an explicit reindex pathway) to
avoid IntegrityError on reindex, and document the exact retry policy,
idempotency guarantees, and atomic rollback steps (storage.delete,
vector_db.delete_index) so each failure path transitions the data_source and
LookupIndexManager to deterministic states.

Comment on lines 434 to 443
Step 6: LLM Selection (Optional) using Profile's LLM Adapter
llm = LLM(
tool=util,
adapter_instance_id=profile.llm.id
)
best_match = llm.select_best_match(
query="Micro soft",
candidates=matches
)

🛠️ Refactor suggestion | 🟠 Major

Clarify when LLM selection is used.

Step 6 labels LLM selection as "(Optional)" but doesn't explain the criteria for when it should be invoked. Questions:

  • Is it based on confidence scores from vector search?
  • Is it a configuration setting in the profile?
  • Does it depend on the number of matches returned?
  • Is it always skipped if only one result has score > threshold?

Document the decision logic or reference the configuration parameter that controls this behavior.
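
One concrete rule the doc could adopt, with a hypothetical enable_llm_selection flag and threshold values chosen only for illustration:

# Invoke LLM.select_best_match only when vector search is ambiguous.
CONFIDENT_SCORE = 0.85  # a single match above this wins outright
CANDIDATE_SCORE = 0.60  # matches above this are worth LLM arbitration


def should_use_llm_selection(profile, matches: list[dict]) -> bool:
    if not getattr(profile, "enable_llm_selection", False):
        return False  # profile opted out of LLM selection
    strong = [m for m in matches if m["score"] >= CONFIDENT_SCORE]
    if len(strong) == 1:
        return False  # one confident winner; skip the LLM call
    candidates = [m for m in matches if m["score"] >= CANDIDATE_SCORE]
    return len(candidates) > 1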

🤖 Prompt for AI Agents
In `@docs/lookup_architecture_final.md` around lines 434 - 443, the docs mark LLM
selection as optional but don't state the decision logic; update the Step 6
section to explicitly describe when to call LLM.select_best_match (e.g., based
on vector-search confidence thresholds, profile configuration, or result count),
reference the profile LLM flag (profile.llm.id or a boolean like
profile.enable_llm_selection) and the inputs (matches array and query), and
document rules such as "invoke when top-N scores are below X, or when multiple
candidates exceed threshold Y, otherwise skip if a single match score > X" or
point to the exact configuration parameter that controls this behavior so
readers know when LLM(...) and .select_best_match(...) will be executed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/LOOKUP_ARCHITECTURE.md`:
- Around lines 344-395: Several fenced code blocks in the LOOKUP_ARCHITECTURE.md
doc are missing language identifiers (MD040); update each triple-backtick block
that lists routes (e.g., the blocks starting with "GET    /lookup-projects/...",
"GET    /lookup-profiles/...", "GET    /data-sources/...", "GET   
/lookup-templates/...", the "Linking & Execution" block with "GET   
/lookup-links/..." and others) to include a language tag such as ```text (or
```bash) so markdownlint stops flagging MD040; apply the same change to the
other reported ranges (402-412, 435-455, 459-476, 480-494, 499-518, 615-637)
ensuring every fenced block with route listings has a language identifier.
- Around lines 5-6: Update the "Last Updated" field in the document by replacing
the existing date value "2025-02-05" with the correct current date "2026-02-05"
so the metadata accurately reflects the new document addition; locate the **Last
Updated** line in LOOKUP_ARCHITECTURE.md and change only the date string,
leaving the **Status** line and other content unchanged.

Comment on lines +5 to +6
**Last Updated**: 2025-02-05
**Status**: Implemented

⚠️ Potential issue | 🟡 Minor

Update the “Last Updated” date to reflect the current doc change.

This file is newly added in Feb 2026 but still shows 2025-02-05.

🔧 Proposed update
-**Last Updated**: 2025-02-05
+**Last Updated**: 2026-02-05
🤖 Prompt for AI Agents
In `@docs/LOOKUP_ARCHITECTURE.md` around lines 5 - 6, update the "Last Updated"
field in the document by replacing the existing date value "2025-02-05" with the
correct current date "2026-02-05" so the metadata accurately reflects the new
document addition; locate the **Last Updated** line in LOOKUP_ARCHITECTURE.md
and change only the date string, leaving the **Status** line and other content
unchanged.

Comment on lines +344 to +395
```
GET /lookup-projects/ # List all projects
POST /lookup-projects/ # Create new project
GET /lookup-projects/{id}/ # Get project details
PUT /lookup-projects/{id}/ # Update project
DELETE /lookup-projects/{id}/ # Delete project
```

### Profile Management

```
GET /lookup-profiles/ # List profiles
POST /lookup-profiles/ # Create profile
GET /lookup-profiles/{id}/ # Get profile details
PUT /lookup-profiles/{id}/ # Update profile
DELETE /lookup-profiles/{id}/ # Delete profile
POST /lookup-profiles/{id}/set-default/ # Set as default profile
```

### Data Source Management

```
GET /data-sources/ # List data sources
POST /data-sources/ # Upload new reference data
GET /data-sources/{id}/ # Get data source details
DELETE /data-sources/{id}/ # Delete data source
POST /data-sources/{id}/reindex/ # Trigger reindexing
```

### Template Management

```
GET /lookup-templates/ # List templates
POST /lookup-templates/ # Create template
GET /lookup-templates/{id}/ # Get template details
PUT /lookup-templates/{id}/ # Update template
DELETE /lookup-templates/{id}/ # Delete template
```

### Linking & Execution

```
GET /lookup-links/ # List PS project links
POST /lookup-links/ # Create link
DELETE /lookup-links/{id}/ # Remove link

GET /execution-audits/ # List execution history
GET /execution-audits/{id}/ # Get execution details

POST /lookup-debug/test/ # Test lookup execution
```


⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks (MD040).

Multiple fenced blocks are missing a language tag, which trips markdownlint and reduces readability.

✅ Example fix pattern
-```
+```text
 GET    /lookup-projects/                 # List all projects
 ...

Apply similarly to the other fenced blocks in this file.


Also applies to: 402-412, 435-455, 459-476, 480-494, 499-518, 615-637

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 344-344: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 354-354: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 365-365: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 375-375: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 385-385: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In `@docs/LOOKUP_ARCHITECTURE.md` around lines 344 - 395, several fenced code
blocks are missing language identifiers (MD040); update each triple-backtick
block that lists routes (e.g., the blocks starting with "GET /lookup-projects/...",
"GET /lookup-profiles/...", "GET /data-sources/...", "GET /lookup-templates/...",
and the "Linking & Execution" block with "GET /lookup-links/...") to include a
language tag such as ```text (or ```bash) so markdownlint stops flagging MD040;
apply the same change to the other reported ranges (402-412, 435-455, 459-476,
480-494, 499-518, 615-637), ensuring every fenced block with route listings has
a language identifier.




@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2026

Test Results

Summary
  • Runner Tests: 11 passed, 0 failed (11 total)
  • SDK1 Tests: 66 passed, 0 failed (66 total)

Runner Tests - Full Report
| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_for_sidecar | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_sidecar_container | 1 | 1 |
| TOTAL | | 11 | 11 |
SDK1 Tests - Full Report
| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| tests/test_platform.py | TestPlatformHelperRetry.test_success_on_first_attempt | 2 | 2 |
| tests/test_platform.py | TestPlatformHelperRetry.test_retry_on_connection_error | 2 | 2 |
| tests/test_platform.py | TestPlatformHelperRetry.test_non_retryable_http_error | 1 | 1 |
| tests/test_platform.py | TestPlatformHelperRetry.test_retryable_http_errors | 3 | 3 |
| tests/test_platform.py | TestPlatformHelperRetry.test_post_method_retry | 1 | 1 |
| tests/test_platform.py | TestPlatformHelperRetry.test_retry_logging | 1 | 1 |
| tests/test_prompt.py | TestPromptToolRetry.test_success_on_first_attempt | 1 | 1 |
| tests/test_prompt.py | TestPromptToolRetry.test_retry_on_errors | 2 | 2 |
| tests/test_prompt.py | TestPromptToolRetry.test_wrapper_methods_retry | 4 | 4 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_connection_error_is_retryable | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_timeout_is_retryable | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_http_error_retryable_status_codes | 3 | 3 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_http_error_non_retryable_status_codes | 5 | 5 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_http_error_without_response | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_os_error_retryable_errno | 5 | 5 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_os_error_non_retryable_errno | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_other_exception_not_retryable | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_exponential_backoff_without_jitter | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_exponential_backoff_with_jitter | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_max_delay_cap | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_max_delay_cap_with_jitter | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_successful_call_first_attempt | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_retry_after_transient_failure | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_max_retries_exceeded | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_max_time_exceeded | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_retry_with_custom_predicate | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_no_retry_with_predicate_false | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_exception_not_in_tuple_not_retried | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_delay_would_exceed_max_time | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_default_configuration | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_environment_variable_configuration | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_max_retries | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_max_time | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_base_delay | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_multiplier | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_jitter_values | 2 | 2 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_custom_exceptions_only | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_custom_predicate_only | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_both_exceptions_and_predicate | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_exceptions_match_but_predicate_false | 1 | 1 |
| tests/utils/test_retry_utils.py | TestPreconfiguredDecorators.test_retry_platform_service_call_exists | 1 | 1 |
| tests/utils/test_retry_utils.py | TestPreconfiguredDecorators.test_retry_prompt_service_call_exists | 1 | 1 |
$$\textcolor{#23d18b}{\tt{tests/utils/test\_retry\_utils.py}}$$ $$\textcolor{#23d18b}{\tt{TestPreconfiguredDecorators.test\_retry\_prompt\_service\_call\_exists}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/utils/test\_retry\_utils.py}}$$ $$\textcolor{#23d18b}{\tt{TestPreconfiguredDecorators.test\_platform\_service\_decorator\_retries\_on\_connection\_error}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/utils/test\_retry\_utils.py}}$$ $$\textcolor{#23d18b}{\tt{TestPreconfiguredDecorators.test\_prompt\_service\_decorator\_retries\_on\_timeout}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/utils/test\_retry\_utils.py}}$$ $$\textcolor{#23d18b}{\tt{TestRetryLogging.test\_warning\_logged\_on\_retry}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/utils/test\_retry\_utils.py}}$$ $$\textcolor{#23d18b}{\tt{TestRetryLogging.test\_info\_logged\_on\_success\_after\_retry}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/utils/test\_retry\_utils.py}}$$ $$\textcolor{#23d18b}{\tt{TestRetryLogging.test\_exception\_logged\_on\_giving\_up}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{66}}$$ $$\textcolor{#23d18b}{\tt{66}}$$
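For reference, the behaviour these tests exercise — classifying transient errors and computing capped, jittered exponential backoff — can be sketched roughly as below. This is a minimal illustration, not the PR's actual implementation: the status-code set, errno set, jitter range, and signatures are assumptions inferred from the test names, and the sketch omits the `max_time` budget and retry logging the tests also cover.

```python
# Hypothetical sketch of the retry utilities under test; names mirror the
# test classes above, but all constants and signatures are assumptions.
import errno
import random
import time

import requests

RETRYABLE_STATUS_CODES = {429, 502, 503}  # assumed set of transient HTTP statuses
RETRYABLE_ERRNOS = {errno.ECONNRESET, errno.ETIMEDOUT, errno.EPIPE}  # assumed


def is_retryable_error(exc: Exception) -> bool:
    """Classify an exception as transient (retry) or permanent (fail fast)."""
    if isinstance(exc, (requests.ConnectionError, requests.Timeout)):
        return True
    if isinstance(exc, requests.HTTPError):
        # An HTTPError raised manually may carry no response object.
        return exc.response is not None and exc.response.status_code in RETRYABLE_STATUS_CODES
    if isinstance(exc, OSError):
        return exc.errno in RETRYABLE_ERRNOS
    return False


def calculate_delay(attempt: int, base_delay: float = 1.0, multiplier: float = 2.0,
                    max_delay: float = 30.0, jitter: bool = True) -> float:
    """Exponential backoff: base * multiplier**attempt, optionally jittered, capped."""
    delay = min(base_delay * (multiplier ** attempt), max_delay)
    if jitter:
        delay *= random.uniform(0.5, 1.5)  # assumed jitter range
    return min(delay, max_delay)  # cap again so jitter cannot exceed max_delay


def retry_with_exponential_backoff(func, max_retries: int = 3, **delay_kwargs):
    """Call func, retrying transient failures up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries or not is_retryable_error(exc):
                raise
            time.sleep(calculate_delay(attempt, **delay_kwargs))
```

Applying the cap again after jitter keeps every delay bounded by `max_delay` even when jitter scales it up, which is the property the `test_max_delay_cap_with_jitter` case appears to assert.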

@sonarqubecloud

sonarqubecloud bot commented Feb 6, 2026

Quality Gate failed

Failed conditions:

- 12 Security Hotspots
- Reliability Rating on New Code: D (required ≥ A)

See analysis details on SonarQube Cloud

