Add Lookup System for Reference Data Enrichment in Prompts by gaya3-zipstack · Pull Request #1780 · Zipstack/unstract

gaya3-zipstack · 2026-02-05T05:14:09Z

What

Adds a comprehensive Lookup System for document-based reference data lookups and variable replacement in prompts
New lookup Django app with complete backend models, services, and APIs
Full frontend UI for managing lookup projects, profiles, templates, and reference data
Integration with Prompt Studio for prompt-level lookup variable replacement

Why

Enable users to upload reference documents (CSVs, JSON, PDFs) and automatically look up values during prompt execution
Provides RAG-based retrieval to match extracted document content against reference data
Allows standardization/normalization of extracted values (e.g., mapping company name variations to canonical names)
Supports data enrichment workflows where extracted values need to be augmented with reference data

How

Backend (`backend/lookup/`)

Models:

LookupProject - Container for lookup configurations
LookupDataSource - Reference data file storage and metadata
LookupProfileManager - Adapter configurations (X2Text, Embedding, VectorDB, LLM)
LookupIndexManager - Vector DB index tracking
LookupPromptTemplate - Template configurations for lookup execution
PromptStudioLookupLink - Links Prompt Studio projects to Lookup projects
LookupExecutionAudit - Execution history and debugging

Services:

IndexingService - Document indexing with chunking and vector embeddings
LookupExecutor - Executes lookups using RAG retrieval
LookupOrchestrator - Coordinates lookup workflow
VectorDBCleanupService - Manages vector DB lifecycle
AuditLogger - Execution logging and tracking

Integrations:

LLMWhisperer client for document extraction
Redis caching for performance
MinIO/file storage for reference data
Vector DB integration for semantic search

Frontend (`frontend/src/components/lookups/`)

LookUpProjectList - Project listing and management
LookUpProjectDetail - Project detail view with tabs
ProfileManagementTab - Adapter profile configuration
ReferenceDataTab - Reference data file management
TemplateTab - Lookup template configuration
LinkedProjectsTab - Prompt Studio project linking
ExecutionHistoryTab - Execution audit viewing
DebugTab - Testing and debugging interface

Prompt Studio Integration

Added lookup_project field to ToolStudioPrompt model
New UI indicator for prompts with lookup replacements configured
Variable replacement during prompt execution using lookup results

Can this PR break any existing features. If yes, please list possible items. If no, please explain why.

Low risk of breaking existing features:

The Lookup system is a new, isolated Django app with its own models and endpoints
Prompt Studio integration is additive (new optional field lookup_project)
Frontend components are new routes/pages, not modifications to existing UI
Database changes are new tables via migrations, no modifications to existing schema
Backend URL routes are added to urls_v2.py without modifying existing endpoints

Database Migrations

0001_initial.py - Creates all lookup tables (LookupProject, LookupDataSource, LookupProfileManager, LookupIndexManager, LookupPromptTemplate, PromptStudioLookupLink, LookupExecutionAudit)
0002_remove_reference_data_type.py - Schema refinement
0003_add_file_execution_id.py - Adds file execution tracking
0004_fix_file_paths_for_minio.py - MinIO storage path fixes
0005_add_reindex_required_field.py - Adds reindex tracking
prompt_studio_v2/0014_toolstudioprompt_lookup_project.py - Adds lookup_project FK to prompts

Env Config

No new required environment variables. Uses existing adapter configurations (X2Text, Embedding, VectorDB, LLM).

Relevant Docs

docs/lookup_architecture_final.md - Complete architecture design

Related Issues or PRs

Feature request for reference data lookups and variable enrichment

Dependencies Versions

No new external dependencies added.

Notes on Testing

Unit tests added in backend/lookup/tests/
- API tests for all endpoints
- Service tests for indexing, execution, caching
- Integration tests for adapters
Manual testing recommended:
1. Create a Lookup Project
2. Configure a Profile with adapters
3. Upload reference data (CSV/JSON)
4. Create and link to Prompt Studio project
5. Execute prompt with lookup variable
6. Verify lookup replacement in output

Screenshots

Lookups-Screenshots.zip

Checklist

I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

… prompt studio projects

Signed-off-by: Gayathri <142381512+gaya3-zipstack@users.noreply.github.com>

coderabbitai · 2026-02-05T05:14:33Z

Summary by CodeRabbit

New Features
- Look-Up enrichment: automatic data augmentation using reference data and LLMs (toggleable, timeout-configurable).
- Reference data management: upload, extraction, versioning, and semantic indexing with audit trails.
- Profile-based adapters: per-project LLM/embedding/vector-store profiles for flexible enrichment.
- Prompt Studio integration: per-prompt lookup selection, available-lookups list, and seamless enrichment in workflows.
- Frontend UX: Look-Up project creation, Debug tab, replacement indicator, logs/stage styling, and settings UI entry.
Documentation
- Added architecture overview for the Look-Up system.

Walkthrough

Adds a full Look-Up system: Django models, migrations, REST APIs, services (indexing, retrieval, orchestration), LLM and storage integrations, Prompt Studio linkage, frontend UI for managing/debugging lookups, extensive tests, and configuration flags for automatic enrichment and timeouts.

Changes

Cohort / File(s)	Summary
Backend settings & routing `backend/backend/settings/base.py`, `backend/backend/urls.py`, `backend/backend/urls_v2.py`	Added LOOKUP_AUTO_ENRICH_ENABLED and LOOKUP_ENRICHMENT_TIMEOUT; registered `lookup` app in shared apps; mounted `lookup.urls` in main URL configs.
App bootstrap & constants `backend/lookup/apps.py`, `backend/lookup/constants.py`, `backend/lookup/__init__.py`	New app config and centralized constants for profile keys/errors; package initializer added.
Exceptions `backend/lookup/exceptions.py`	Introduced LookupError hierarchy (ExtractionNotCompleteError, ContextWindowExceededError, etc.) with informative messages and payloads.
Integrations (LLM, extraction, storage, cache) `backend/lookup/integrations/...` `.../llm_provider.py`, `.../unstract_llm_client.py`, `.../llmwhisperer_client.py`, `.../file_storage_client.py`, `.../storage_client.py`, `.../redis_cache.py`	Multiple integration clients: Unstract/OpenAI/Anthropic wrappers, Unstract SDK client, LLMWhisperer extraction client (polling/status/result), File/Remote storage clients, and Redis-backed LLM cache with memory fallback. Review error handling, timeouts, and external-call isolation.
Models & migrations (core) `backend/lookup/models/...`, `backend/lookup/migrations/*`	Added LookupProject, LookupPromptTemplate, LookupDataSource (versioning signals), LookupProfileManager, LookupIndexManager, LookupExecutionAudit, PromptStudioLookupLink, plus initial and follow-up migrations (field removals, data fixes, reindex flag, file path migration, index field). Verify constraints, signals, and index creation.
Models API surface `backend/lookup/models/__init__.py`	Aggregated model exports via all.
Services — core orchestration `backend/lookup/services/lookup_executor.py`, `.../lookup_orchestrator.py`, `.../variable_resolver.py`, `.../execution_context.py`	New LookUpExecutor, LookUpOrchestrator, variable resolver, and execution context dataclass enabling RAG/full-context execution, parallel orchestration, and variable substitution logic.
Services — indexing & retrieval `backend/lookup/services/indexing_service.py`, `.../lookup_retrieval_service.py`, `.../reference_data_loader.py`, `.../document_indexing_service.py`	Extraction + indexing flow (x2text -> embeddings -> vector DB), retrieval service for RAG, reference data loader and document indexing status/caching. Pay attention to external SDK calls and idempotency.
Services — support & management `backend/lookup/services/audit_logger.py`, `.../enrichment_merger.py`, `.../log_emitter.py`, `.../llm_cache.py`, `.../lookup_index_helper.py`, `.../vector_db_cleanup_service.py`, `.../document_indexing_service.py`, `.../mock_clients.py`	Audit logging, enrichment merging (conflict resolution), dual-target logging (WebSocket + Redis), in-memory LLM cache, index manager helpers, vector DB cleanup workflows, and test mock clients. Check transactional updates and cleanup safety.
Integration entrypoint `backend/lookup/services/lookup_integration_service.py`, `.../workflow_integration.py`	LookupIntegrationService.enrich_if_linked orchestration (timeout handling, parallel enrichment) and workflow-level enrichment integration; includes LLMClientWrapper. Review timeout/threading and error propagation boundaries.
REST API: serializers & views `backend/lookup/serializers.py`, `backend/lookup/views.py`, `backend/lookup/urls.py`	Comprehensive serializers and viewsets for projects, templates, profiles, data sources, audits, debug endpoints; upload/indexing/execution endpoints; router registration under `/lookup/`. Pay attention to validation, default-profile logic, and side-effecting endpoints (indexing, cleanup).
Prompt Studio integration `backend/prompt_studio/...` `prompt_studio_v2/models.py`, `.../migrations/0014_`, `.../views.py`, `.../serializers.py`, `.../output_manager_v2/`, `prompt_studio_core_v2/views.py`	Added optional `lookup_project` FK on prompts, available_lookups endpoint, serializers/validation, automatic enrichment wiring in output manager, and LookupEnrichmentError handling in prompt response flow. Confirm graceful degradation when lookup app absent.
Workflow manager integration `backend/workflow_manager/workflow_v2/file_execution_tasks.py`	Attempt to enrich final workflow outputs via LookupWorkflowIntegration; integration guarded for missing lookup app and errors.
Frontend — Look-Up UI & integrations `frontend/src/components/lookups/`, `frontend/src/components/custom-tools/prompt-card/`, `frontend/src/components/custom-tools/display-logs/`, `frontend/src/components/logging/`, `frontend/package.json`	New Look-Up UI: create project modal, debug tab, execution history, prompt-card lookup controls and replacement indicator, log stage styling (LOOKUP), settings-modal hook for LookupManager, and CodeMirror JSON deps. Review API usage, lazy-loading behavior, and new UI CSS.
Frontend — combined output / misc `frontend/src/components/custom-tools/combined-output/*`, `JsonView.jsx`	Minor DOM/CSS tweaks to combined output and enrichment info styling.
Tests `backend/lookup/tests/...` (api, integrations, services, migrations)	Extensive test coverage added: API tests, integration tests for LLM/whisperer/redis/storage, service unit tests (executor, orchestrator, merger, cache, reference loader), migration tests. Expect many mocked external interactions; review for completeness and flakiness.
Documentation `docs/LOOKUP_ARCHITECTURE.md`	New architecture document describing models, services, workflows, and integration points.

Sequence Diagram(s)

sequenceDiagram
  participant PS as Prompt Studio
  participant Integration as LookupIntegrationService
  participant Orchestrator as LookUpOrchestrator
  participant Executor as LookUpExecutor
  participant Loader as ReferenceDataLoader
  participant Cache as LLM/Redis Cache
  participant LLM as LLM Provider
  participant Log as LookupLogEmitter

  PS->>Integration: request enrich_if_linked(extracted_data, prompt_lookup_map)
  Integration->>Orchestrator: build projects, execution_id
  Orchestrator->>Executor: execute(project, input_data, execution_id)
  Executor->>Loader: load_latest_for_project(project_id)
  Loader-->>Executor: concatenated reference_data
  Executor->>Cache: check cache(prompt + reference)
  alt cache hit
    Cache-->>Executor: cached_response
  else cache miss
    Executor->>LLM: generate(prompt, config)
    LLM-->>Executor: llm_response
    Executor->>Cache: store response
  end
  Executor->>Log: emit_enrichment_* (start/success/failure)
  Executor-->>Orchestrator: enrichment_result
  Orchestrator-->>Integration: aggregated_results
  Integration-->>PS: lookup_enrichment + _lookup_metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding a Lookup System for reference data enrichment in prompts. It is specific, concise, and directly reflects the core objective.
Description check	✅ Passed	The description is comprehensive and follows the template structure well. It includes What, Why, How sections with detailed backend/frontend breakdown, risk assessment, database migrations, testing notes, and all required sections.
Docstring Coverage	✅ Passed	Docstring coverage is 98.31% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/look-ups

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 20

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

frontend/src/components/logging/log-modal/LogModal.jsx (1)
223-229: ⚠️ Potential issue | 🟡 Minor

Incorrect PropTypes definitions.

logDescModalOpen is used as a boolean (lines 175, 195) and setLogDescModalOpen is used as a function (line 97), but both are defined as PropTypes.array.
🔧 Proposed fix
 LogModal.propTypes = {
   executionId: PropTypes.string,
   fileId: PropTypes.string,
-  logDescModalOpen: PropTypes.array,
-  setLogDescModalOpen: PropTypes.array,
+  logDescModalOpen: PropTypes.bool,
+  setLogDescModalOpen: PropTypes.func,
   filterParams: PropTypes.object,
 };
backend/prompt_studio/prompt_studio_core_v2/views.py (1)
435-462: ⚠️ Potential issue | 🟠 Major

Handle LookupEnrichmentError in single_pass_extraction as well.

The same lookup failure that now returns 400 in fetch_response will still bubble as 500 here. Mirror the new error handling for consistency.
🛠️ Proposed fix
     def single_pass_extraction(self, request: HttpRequest, pk: uuid) -> Response:
         """API Entry point method to fetch response to prompt.
@@
         """
+        from prompt_studio.prompt_studio_output_manager_v2.output_manager_helper import (
+            LookupEnrichmentError,
+        )
         # TODO: Handle fetch_response and single_pass_
         # extraction using common function
         custom_tool = self.get_object()
@@
-        response: dict[str, Any] = PromptStudioHelper.prompt_responder(
-            tool_id=tool_id,
-            org_id=UserSessionUtils.get_organization_id(request),
-            user_id=custom_tool.created_by.user_id,
-            document_id=document_id,
-            run_id=run_id,
-        )
-        return Response(response, status=status.HTTP_200_OK)
+        try:
+            response: dict[str, Any] = PromptStudioHelper.prompt_responder(
+                tool_id=tool_id,
+                org_id=UserSessionUtils.get_organization_id(request),
+                user_id=custom_tool.created_by.user_id,
+                document_id=document_id,
+                run_id=run_id,
+            )
+            return Response(response, status=status.HTTP_200_OK)
+        except LookupEnrichmentError as e:
+            error_response = {
+                "error": str(e),
+                "error_type": e.error_type,
+                **e.details,
+            }
+            return Response(error_response, status=status.HTTP_400_BAD_REQUEST)
backend/workflow_manager/workflow_v2/file_execution_tasks.py (1)

1025-1043: ⚠️ Potential issue | 🟠 Major

Lookup enrichment is skipped entirely for filesystem and database destinations—design mismatch with intended support.

Lookup integration is designed to support ETL, Workflow, and API deployments (per workflow_integration.py documentation), but the current implementation only applies enrichment to API destinations. The condition on line 1037 (if output_result and not processing_error:) prevents enrichment from running for filesystem and database destinations because handle_output() returns None for those types—it only returns the tool execution result for API deployments.

This creates a silent data loss: enrichment is skipped for FILESYSTEM/DATABASE workflows even though the lookup system is designed to support them. Additionally, for FILESYSTEM/DATABASE destinations, persistence occurs inside handle_output() before any enrichment opportunity, so even if enrichment ran, it wouldn't reach persisted outputs.

Filesystem/database workflows should either receive enriched output via a redesigned flow, or enrichment scope should be explicitly limited to API deployments only (with documentation clarifying why).

🤖 Fix all issues with AI agents

In `@backend/lookup/integrations/llm_provider.py`:
- Around line 207-222: The current JSON extraction uses the regex json_pattern =
r"\{[^{}]*\}" which fails on nested objects; replace it with an iterative
balanced-brace extractor that scans the response string (the variable response)
and uses a stack/counter to find matching '{' ... '}' spans, attempt json.loads
on each balanced substring and return the first valid parse; update the logic
where matches are built/iterated (the block referencing json_pattern, matches,
and the json.loads try/except) to use this brace-matching approach so nested
JSON like {"data":{"value":1}} is correctly extracted and parsed.

In `@backend/lookup/integrations/redis_cache.py`:
- Around line 374-381: The warmup loop uses hashlib.md5 to build keys, which
diverges from the main generate_cache_key() scheme (SHA256) and triggers
security linting; update the key generation inside the preload_data loop to use
hashlib.sha256(prompt.encode()).hexdigest() (matching the generate_cache_key()
approach), keeping the same self.key_prefix and project:{project_id}: namespace
so keys are consistent; ensure the change is made where the code builds key (the
preload loop that calls self.set(key, response)) so warmup entries match the
primary cache lookup.

In `@backend/lookup/models/lookup_data_source.py`:
- Around line 168-196: The pre_save signal handler
auto_increment_version_and_update_latest on LookupDataSource is vulnerable to
race conditions because it reads Max("version_number") and then writes, so
concurrent creations can get duplicate version_number; update this by acquiring
a row-level lock and doing the read/update inside a transaction (e.g., wrap in
transaction.atomic() and use select_for_update() on LookupDataSource objects
filtered by project to lock the relevant rows before computing max and updating
is_latest) or switch to a DB-backed sequence/serial for version_number and set
is_latest in the same transaction to ensure uniqueness; ensure all logic
referencing version_number and is_latest in
auto_increment_version_and_update_latest runs inside that transaction context.

In `@backend/lookup/models/lookup_execution_audit.py`:
- Around line 134-140: The validation currently requires enriched_output only
when self.status == "success" but was_successful() treats "partial" as
successful too; update the validation in the LookupExecutionAudit model (the
block that checks self.status and self.enriched_output) to require
enriched_output when self.status is either "success" or "partial" (e.g., change
the condition to check status in ["success", "partial"]) and keep the
ValidationError message consistent.

In `@backend/lookup/models/lookup_project.py`:
- Around line 142-163: The clean() method wrongly enforces LLM_PROVIDER_CHOICES
even when llm_provider is nullable; update the validation in clean() to only
check membership if self.llm_provider is not None (and optionally not an empty
string) before comparing against valid_providers (referencing clean(),
LLM_PROVIDER_CHOICES and llm_provider), so nullable values pass and only
non-null providers are validated.

In `@backend/lookup/serializers.py`:
- Around line 331-361: The AdapterProcessor.get_adapter_instance_by_id function
currently returns adapter.adapter_name (string) but should return the full
AdapterInstance object; update get_adapter_instance_by_id in
backend/adapter_processor_v2/adapter_processor.py to return the adapter object
(not adapter.adapter_name), correct the return type annotation to the actual
AdapterInstance type (or Adapter) to match callers, and ensure error handling
returns None or raises appropriately before returning so callers like
LookupProfileSerializer.to_representation (which calls
AdapterProcessor.get_adapter_instance_by_id) receive the full instance or a
clear failure signal.

In `@backend/lookup/services/enrichment_merger.py`:
- Around line 103-112: The conflict counter misses cases where the existing
entry wins (winner == existing) because the current elif is unreachable; update
the resolution block in enrichment_merger.py so that whenever there is a
conflict between existing and field_entry (i.e., they are different lookup_ids)
you always increment conflicts_resolved after selecting a winner: keep the
existing branch that updates field_sources/merged_data/fields_added when winner
differs from existing (winner["lookup_id"] != existing["lookup_id"]), and
replace the elif with an else (or an explicit check for existing winning) that
simply increments conflicts_resolved when winner == existing and
existing["lookup_id"] != field_entry["lookup_id"]; this ensures
conflicts_resolved increments for both cases (new winner or existing winner).

In `@backend/lookup/services/llm_cache.py`:
- Around line 112-134: The generate_cache_key method currently concatenates
prompt and reference_data directly, causing possible collisions; update
generate_cache_key in LLMResponseCache to combine inputs with an unambiguous
scheme (e.g., length-prefixed fields like
"<len>:<prompt>|<len>:<reference_data>" or a reserved delimiter with escaping)
before hashing, and make the same change in the corresponding key-building
function in redis_cache (the method that assembles prompt+reference_data at
backend/lookup/integrations/redis_cache.py line ~107) so both implementations
use the identical separator/length-prefix format to ensure deterministic,
collision-free keys.

In `@backend/lookup/services/vector_db_cleanup_service.py`:
- Around line 286-301: The loop that calls cleanup_index_ids currently always
clears index_manager.index_ids_history, raw_index_id and marks
status/reindex_required even when result indicates failures, which can orphan
vector DB rows; change the logic in the loop that iterates index_managers so
that you only clear index_manager.index_ids_history, set
index_manager.raw_index_id = None, mark status {"indexed": False, "cleaned":
True} and set reindex_required = True (and save) when result["success"] is True
(or equivalent full-success indicator used by cleanup_for_data_source); if
result shows failures, preserve index_ids_history/raw_index_id and set status to
reflect cleanup failure (or leave indexed state) and still save errors via
index_manager.save(), matching the pattern in cleanup_for_data_source and
ensuring retryability.

In `@backend/lookup/services/workflow_integration.py`:
- Around line 179-183: The query in workflow_integration.py that matches
PromptStudioRegistry.prompt_registry_id against ToolInstance.tool_id is
type-unsafe because tool_instance.tool_id is a CharField while
prompt_registry_id is a UUIDField; update the lookup (around where
ToolInstance.objects.filter(...).first() is used) to explicitly convert
tool_instance.tool_id to a UUID before querying PromptStudioRegistry (e.g.,
parse with uuid.UUID) and handle invalid UUIDs with a try/except (log the
invalid value and return original_output, False) so the ORM receives a proper
UUID object rather than a raw string; alternatively, if you prefer a schema
change, migrate ToolInstance.tool_id to a UUIDField and adjust related code
accordingly.

In `@backend/lookup/tests/test_api/test_template_api.py`:
- Around line 21-37: The test setup creates a LookupPromptTemplate without the
required project relation; update setUp() to create a LookupProject and
associate it when creating the template (create a LookupProject instance and
pass it via the project argument to LookupPromptTemplate.objects.create),
ensuring the template's required OneToOneField to LookupProject is satisfied;
adjust any variable names as needed and keep the created project tied to self
(e.g., self.project) for reuse in tests referencing LookupPromptTemplate.

In `@backend/lookup/tests/test_migrations.py`:
- Around line 24-33: The setUp currently asserts self.migrate_from truthy
causing failures when migrate_from is intentionally None to represent the zero
state; remove that requirement and treat None as the initial migration by
allowing self.migrate_from to be [(self.app, None)] (instead of asserting) and
keep self.migrate_to handling as before; update the setUp logic around setUp,
self.migrate_from, self.migrate_to and the MigrationExecutor(connection) call so
tests can pass when migrate_from is None.

In `@backend/lookup/tests/test_services/test_lookup_orchestrator.py`:
- Around line 93-103: The test mocks define several helpers named
execute_side_effect that only accept positional args and use the name project,
but LookUpOrchestrator invokes executor.execute with keyword args
(lookup_project, input_data, execution_id, prompt_studio_project_id), causing
TypeError; update all six execute_side_effect helpers to accept the keyword
parameters (e.g., def execute_side_effect(*, lookup_project, input_data,
execution_id=None, prompt_studio_project_id=None) or accept **kwargs and read
lookup_project) and use lookup_project instead of project when constructing the
returned dict so the mocked signature matches the actual executor.execute call.

In
`@backend/prompt_studio/prompt_studio_output_manager_v2/output_manager_helper.py`:
- Around line 373-392: The current INFO logs in the post-processing loop print
full extracted values (see extracted_data_for_lookup, prompts loop, outputs and
logger.info calls), which may expose sensitive data; change those logger.info
calls to either logger.debug or remove the value content and instead log only
the prompt key and value type (e.g., prompt.prompt_key and
type(output_value).__name__) or a redacted placeholder, and similarly avoid
logging the full extracted_data_for_lookup dict at INFO—log its keys or count
only; update the two logger.info occurrences in this block and keep the skip
debug for PSOMKeys.NOTES as-is.
- Around line 432-459: Enriched values in the serialized_data loop are being
inserted raw (possibly dict/list) which breaks consistency with earlier
enforce_type serialization; fetch the prompt metadata from prompt_by_key (reuse
prompt_by_key) for the current prompt_key, run the same
enforce_type/serialization logic used elsewhere (e.g., json.dumps for structured
types like json/table/record/line-item) on enriched_value before assigning
item["lookup_replacement"]["enriched_value"], item["output"], and before calling
PromptStudioOutputManager.objects.filter(...).update(...), so stored/returned
outputs keep the same serialized format as non-enriched values.

In `@backend/prompt_studio/prompt_studio_v2/serializers.py`:
- Around line 35-58: The validation uses self.instance.tool_id_id (old value)
instead of respecting an incoming tool_id in updates; change the logic in the
serializer validation block so that if "tool_id" exists in self.initial_data it
overrides self.instance.tool_id_id (i.e., determine tool_id =
self.initial_data.get("tool_id") if present else self.instance.tool_id_id), then
perform the PromptStudioLookupLink.objects.filter check
(prompt_studio_project_id=tool_id, lookup_project=value) and raise
serializers.ValidationError if no link exists; keep the ImportError handling
as-is.

In
`@frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx`:
- Around line 48-61: The Popover uses trigger="hover" which blocks keyboard
access and the Button lacks an aria-label; change Popover's trigger to
trigger={["hover","focus"]} and add a descriptive aria-label to the Button (the
element rendering SwapOutlined) so screen readers and keyboard users can
discover the replacement details. Also replace any uses of the logical OR
fallback (e.g. value || fallback) in this component with the nullish coalescing
operator (value ?? fallback) so valid falsy values (0, false, "") are preserved;
look for usages in LookupReplacementIndicator.jsx around the Popover/content
rendering and update them accordingly.

In `@frontend/src/components/custom-tools/settings-modal/SettingsModal.jsx`:
- Around line 119-124: LookupManager is using the same numeric key (9) as
CustomDataSettings which overwrites listOfComponents[9] and creates duplicate
menu keys; update the LookupManager menu entry and its component mapping to use
an unused key (for example 10) by changing the getMenuItem call and the
listOfComponents index from 9 to 10 (referencing LookupManager, getMenuItem, and
listOfComponents) so CustomDataSettings remains reachable and keys are unique.

In `@frontend/src/components/logging/log-modal/LogModal.jsx`:
- Around line 43-57: The component is missing imports for Tag and SearchOutlined
which causes runtime errors: add Tag from "antd" and SearchOutlined from
"@ant-design/icons" to the existing import lists so getStageStyle (which returns
<SearchOutlined />) and the column render that uses <Tag> work correctly; update
the import statement(s) that currently import Button, Modal, Table, Tooltip and
CopyOutlined to also include Tag and SearchOutlined respectively.

In `@frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx`:
- Around line 9-16: handleSubmit currently treats any error as a validation
failure; change it to first await form.validateFields() in a try/catch to handle
and log/return validation errors, then call await onCreate(values) in a separate
try/catch so API/create failures are handled distinctly (log the error, show
user-visible feedback via message.error or set a submission error) and still
call form.resetFields() only on successful creation; reference handleSubmit,
form.validateFields, onCreate, and form.resetFields when updating the control
flow.

🟡 Minor comments (25)

frontend/src/components/custom-tools/display-logs/DisplayLogs.css-27-33 (1)
27-33: ⚠️ Potential issue | 🟡 Minor

CSS class mismatch with DisplayLogs.jsx.

The CSS defines .display-logs-stage-extract and .display-logs-stage-index, but DisplayLogs.jsx uses TOOL stage which maps to .display-logs-stage-tool (not defined here). This mismatch means:

TOOL stage tags won't get their custom border styling

EXTRACT and INDEX class definitions appear unused
🔧 Proposed fix to align with DisplayLogs.jsx
-.display-logs-stage-extract {
-  border-color: `#52c41a`;
-}
-
-.display-logs-stage-index {
+.display-logs-stage-tool {
+  border-color: `#13c2c2`;
+}
+
+.display-logs-stage-default {
   border-color: `#faad14`;
 }
frontend/src/components/lookups/debug-tab/DebugTab.jsx-353-444 (1)
353-444: ⚠️ Potential issue | 🟡 Minor

Guard against missing/zero token metrics to avoid NaN UI math.

If tokenCount or contextLimit is missing/zero, percentage math yields NaN/Infinity and can break the progress display. Add a defensive guard with a fallback alert.
🛡️ Suggested fix
-    const { tokenCount, contextLimit, model } = errorDetails;
-    const usagePercent = Math.min(
-      Math.round((tokenCount / contextLimit) * 100),
-      100
-    );
-    const overflowAmount = tokenCount - contextLimit;
-    const overflowPercent = Math.round((overflowAmount / contextLimit) * 100);
+    const { tokenCount, contextLimit, model } = errorDetails;
+    if (
+      !Number.isFinite(tokenCount) ||
+      !Number.isFinite(contextLimit) ||
+      contextLimit <= 0
+    ) {
+      return (
+        <Alert
+          message="Context Window Exceeded"
+          description="Token usage details are unavailable for this run."
+          type="error"
+          showIcon={false}
+          style={{ marginBottom: 16 }}
+        />
+      );
+    }
+    const usagePercent = Math.min(
+      Math.round((tokenCount / contextLimit) * 100),
+      100
+    );
+    const overflowAmount = Math.max(tokenCount - contextLimit, 0);
+    const overflowPercent = Math.round((overflowAmount / contextLimit) * 100);
frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx-20-24 (1)
20-24: ⚠️ Potential issue | 🟡 Minor

Reset the form on cancel to avoid stale values on reopen.

Currently, canceling leaves prior input in the form since the modal stays mounted. When reopened, users see previously entered values for name and description fields, creating a poor UX. Implement the suggested handleCancel handler to reset the form to its initial state before calling onCancel.
♻️ Proposed refinement
 export function CreateProjectModal({ open, onCancel, onCreate }) {
   const [form] = Form.useForm();
+  const handleCancel = () => {
+    form.resetFields();
+    onCancel();
+  };
 ...
   return (
     <Modal
       title="Create Look-Up Project"
       open={open}
-      onCancel={onCancel}
+      onCancel={handleCancel}
frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx-24-41 (1)
24-41: ⚠️ Potential issue | 🟡 Minor

Use nullish coalescing (??) instead of logical OR (||) to preserve valid falsy values.

The current code uses || "(empty)" which will incorrectly display (empty) for legitimate falsy values like 0, false, or empty strings. Since original_value and enriched_value can be any type (per PropTypes), use ?? to only treat null and undefined as empty.
Suggested fix
-            {originalValue || "(empty)"}
+            {originalValue ?? "(empty)"}
-            {enrichedValue || "(empty)"}
+            {enrichedValue ?? "(empty)"}
backend/lookup/services/vector_db_cleanup_service.py-244-250 (1)
244-250: ⚠️ Potential issue | 🟡 Minor

State update may incorrectly remove failed IDs from history.

When result["deleted"] > 0, the code removes all ids_to_delete from index_ids_history, but some of those IDs may have failed deletion. Since cleanup_index_ids doesn't return which specific IDs succeeded, failed IDs will be lost from history and can't be retried.

Consider either:

Only update history when result["success"] is True (all deletions succeeded)

Have cleanup_index_ids return the list of successfully deleted IDs
🔧 Conservative fix option
-        # Update history to remove deleted IDs
-        if result["deleted"] > 0:
-            remaining_ids = [
-                id for id in index_manager.index_ids_history if id not in ids_to_delete
-            ]
-            index_manager.index_ids_history = remaining_ids
-            index_manager.save()
+        # Only update history when all deletions succeeded
+        if result["success"]:
+            remaining_ids = [
+                id for id in index_manager.index_ids_history if id not in ids_to_delete
+            ]
+            index_manager.index_ids_history = remaining_ids
+            index_manager.save()
backend/lookup/tests/test_services/test_audit_logger.py-226-245 (1)
226-245: ⚠️ Potential issue | 🟡 Minor

Drop unused result assignment.
Static analysis flagged result as unused here; the call can stand alone.
🧹 Cleanup
-        result = audit_logger.log_execution(**minimal_params)
+        audit_logger.log_execution(**minimal_params)
backend/lookup/tests/test_services/test_audit_logger.py-93-105 (1)
93-105: ⚠️ Potential issue | 🟡 Minor

Fix the mock exception setup in test_project_not_found.

When using mock_project_model.DoesNotExist() as a side_effect, it creates a MagicMock instance rather than raising an exception. The except block catching LookupProject.DoesNotExist is never exercised. Create a real exception class on the mock instead:
Suggested fix
-        mock_project_model.objects.get.side_effect = mock_project_model.DoesNotExist()
+        mock_project_model.DoesNotExist = type("DoesNotExist", (Exception,), {})
+        mock_project_model.objects.get.side_effect = mock_project_model.DoesNotExist
backend/lookup/tests/test_services/test_lookup_executor.py-365-368 (1)
365-368: ⚠️ Potential issue | 🟡 Minor

Silence the unused-argument warning by prefixing parameters with underscores.
Ruff's ARG rule will flag the unused *args and **kwargs in this test mock function.
🔧 Suggested fix
-        def delayed_generate(*args, **kwargs):
+        def delayed_generate(*_args, **_kwargs):
             import time
             time.sleep(0.01)  # 10ms delay
             return '{"result": "data"}'
backend/lookup/tests/test_api/test_execution_api.py-31-35 (1)
31-35: ⚠️ Potential issue | 🟡 Minor

Suppress Ruff S106 for test-only passwords (both setups).
🛠️ Suggested fix
-            password='testpass123'
+            password='testpass123',  # noqa: S106 - test-only password
...
-            password='testpass123'
+            password='testpass123',  # noqa: S106 - test-only password
Also applies to: 166-170
backend/lookup/tests/test_api/test_execution_api.py-71-74 (1)
71-74: ⚠️ Potential issue | 🟡 Minor

Silence the unused patched arg to satisfy Ruff ARG002.
🛠️ Suggested fix
-    def test_debug_with_ps_project(self, mock_executor_class, mock_orchestrator_class):
+    def test_debug_with_ps_project(self, _mock_executor_class, mock_orchestrator_class):
backend/lookup/tests/test_api/test_linking_api.py-101-108 (1)
101-108: ⚠️ Potential issue | 🟡 Minor

Remove unused link variables to satisfy Ruff F841.
🧹 Suggested fix
-        link1 = PromptStudioLookupLink.objects.create(
+        PromptStudioLookupLink.objects.create(
             prompt_studio_project_id=self.ps_project_id,
             lookup_project=self.lookup1
         )
-        link2 = PromptStudioLookupLink.objects.create(
+        PromptStudioLookupLink.objects.create(
             prompt_studio_project_id=self.ps_project_id,
             lookup_project=self.lookup2
         )
backend/lookup/integrations/redis_cache.py-120-131 (1)
120-131: ⚠️ Potential issue | 🟡 Minor

Treat empty cached values as hits.
Redis can return an empty string; the current truthiness check marks that as a miss.
🛠️ Suggested fix
-                if value:
+                if value is not None:
                     logger.debug(f"Redis cache hit for key: {key[:20]}...")
                     self._update_stats("hits")
                     return value
                 else:
                     logger.debug(f"Redis cache miss for key: {key[:20]}...")
                     self._update_stats("misses")
backend/lookup/tests/test_api/test_linking_api.py-27-31 (1)
27-31: ⚠️ Potential issue | 🟡 Minor

Suppress Ruff S106 for the test-only password.
This avoids a lint error while keeping the test simple.
🛠️ Suggested fix
-            password='testpass123'
+            password='testpass123',  # noqa: S106 - test-only password
backend/lookup/tests/test_services/test_reference_data_loader.py-71-71 (1)

71-71: ⚠️ Potential issue | 🟡 Minor

Change file_type="text/csv" to file_type="csv".

The file_type field in LookupDataSource validates against FILE_TYPE_CHOICES, which only accepts "csv", not the MIME type "text/csv". This will fail model validation. Use the plain choice value instead.
backend/lookup/integrations/unstract_llm_client.py-122-163 (1)
122-163: ⚠️ Potential issue | 🟡 Minor

Unused config and timeout parameters should be utilized or removed.

The config parameter (for temperature, etc.) and timeout parameter are declared but never used. The LLM call at line 142 doesn't apply the configuration, and no timeout is enforced.
Proposed fix to utilize or document the parameters

If the parameters are intended for future use, add a TODO comment. Otherwise, apply config to the LLM call:
     def generate(self, prompt: str, config: dict[str, Any], timeout: int = 30) -> str:
         ...
         try:
             # Create LLM instance using SDK
-            llm = LLM(adapter_id=self.adapter_id, adapter_metadata=self.adapter_metadata)
+            # Apply config settings if provided
+            metadata = {**self.adapter_metadata}
+            if config.get("temperature") is not None:
+                metadata["temperature"] = config["temperature"]
+            llm = LLM(adapter_id=self.adapter_id, adapter_metadata=metadata)
+            # TODO: Implement timeout handling for LLM calls
backend/lookup/services/lookup_retrieval_service.py-151-155 (1)
151-155: ⚠️ Potential issue | 🟡 Minor

Potential IndexError when accessing unique_chunks by index.

If all chunks are duplicates (edge case), unique_chunks could be empty after deduplication, causing an IndexError on line 154 when accessing unique_chunks[-1] and unique_chunks[0].
Proposed fix
+        if not unique_chunks:
+            logger.warning(
+                f"All {len(all_chunks)} chunks were duplicates for project {project_id}"
+            )
+            return ""
+
         logger.info(
             f"Retrieved {len(unique_chunks)} unique chunks "
             f"(from {len(all_chunks)} total) for project {project_id}. "
             f"Score range: {unique_chunks[-1].score:.3f} - {unique_chunks[0].score:.3f}"
         )
backend/lookup/tests/test_integrations/test_storage_integration.py-164-196 (1)
164-196: ⚠️ Potential issue | 🟡 Minor

Remove unused expected_meta variable.

The variable expected_meta is computed but never used in assertions. Either add an assertion to verify metadata content or remove it.
Proposed fix - add assertion or remove variable

Option 1 - Add assertion:
         # Mock JSON encoding for metadata
         import json
         expected_meta = json.dumps(metadata, indent=2)
         ...
         # Check metadata upload
         meta_path = f"{expected_path}.meta.json"
         meta_calls = [call for call in self.mock_fs.write.call_args_list
                       if call[1]['path'] == meta_path]
         self.assertEqual(len(meta_calls), 1)
+        # Verify metadata content
+        actual_meta_data = meta_calls[0][1]['data'].decode('utf-8')
+        self.assertEqual(actual_meta_data, expected_meta)
Option 2 - Remove unused variable:
-        # Mock JSON encoding for metadata
-        import json
-        expected_meta = json.dumps(metadata, indent=2)
-
backend/lookup/integrations/llmwhisperer_client.py-188-222 (1)

188-222: ⚠️ Potential issue | 🟡 Minor

Blocking poll loop could impact request threads.

The wait_for_extraction method uses time.sleep() which blocks the thread. If called from a request handler, this could exhaust worker threads. Consider:

Using this only in background tasks/Celery workers

Adding documentation about intended usage context

For request threads, return immediately and use webhooks/callbacks
backend/lookup/models/prompt_studio_lookup_link.py-91-106 (1)
91-106: ⚠️ Potential issue | 🟡 Minor

Auto-assignment condition may incorrectly skip assignment for explicit zero.

The condition self.execution_order == 0 and not self.pk will not auto-assign if someone explicitly sets execution_order=0. Also, for UUID primary keys with defaults, not self.pk may not work as expected since UUID fields have defaults.
🐛 Proposed fix
     def save(self, *args, **kwargs):
         """Override save to auto-assign execution_order if not set."""
-        if self.execution_order == 0 and not self.pk:
+        # Check if this is a new instance by querying the database
+        # (self.pk is always set for UUIDs due to default=uuid.uuid4)
+        is_new = self.pk is None or not PromptStudioLookupLink.objects.filter(pk=self.pk).exists()
+        
+        if is_new and self._state.adding and self.execution_order == 0:
             # Get the maximum execution order for this PS project
             max_order = PromptStudioLookupLink.objects.filter(
                 prompt_studio_project_id=self.prompt_studio_project_id
             ).aggregate(max_order=models.Max("execution_order"))["max_order"]
 
             self.execution_order = (max_order or 0) + 1
 
         super().save(*args, **kwargs)
Using self._state.adding is the Django-idiomatic way to check for new instances.
backend/lookup/tests/test_integrations/test_redis_cache_integration.py-144-164 (1)

144-164: ⚠️ Potential issue | 🟡 Minor

Move the RedisError import to the top of the file for consistency.

The from redis.exceptions import RedisError import at line 147 is placed inline within the test method. While redis is a declared project dependency (in backend/pyproject.toml and other modules), the import statement itself should be at the top of the file with other imports for consistency and maintainability, rather than deferred until the test method executes.
backend/lookup/services/indexing_service.py-398-401 (1)
398-401: ⚠️ Potential issue | 🟡 Minor

Add exception chaining for better debugging.

When re-raising in an except clause, use raise ... from err or from None to preserve the exception chain.
🔧 Proposed fix
         try:
             project = LookupProject.objects.get(id=project_id)
         except LookupProject.DoesNotExist:
-            raise ValueError(f"Project {project_id} not found")
+            raise ValueError(f"Project {project_id} not found") from None
backend/lookup/services/lookup_executor.py-664-669 (1)
664-669: ⚠️ Potential issue | 🟡 Minor

Add exception chaining in _parse_llm_response.

When converting exceptions to ParseError, chain the original exception to preserve debugging context.
🔧 Proposed fix
         except json.JSONDecodeError as e:
             logger.warning(f"Failed to parse LLM response as JSON: {e}")
-            raise ParseError(f"Invalid JSON response: {str(e)}")
+            raise ParseError(f"Invalid JSON response: {e!s}") from e
         except Exception as e:
             logger.warning(f"Unexpected error parsing LLM response: {e}")
-            raise ParseError(f"Parse error: {str(e)}")
+            raise ParseError(f"Parse error: {e!s}") from e
backend/lookup/views.py-1357-1361 (1)
1357-1361: ⚠️ Potential issue | 🟡 Minor

Add exception chaining for better debugging.

When converting IntegrityError to DuplicateData, chain the original exception to preserve the database error details for debugging.
🔧 Proposed fix
         try:
             self.perform_create(serializer)
-        except IntegrityError:
-            raise DuplicateData(LookupProfileManagerErrors.PROFILE_NAME_EXISTS)
+        except IntegrityError as e:
+            raise DuplicateData(LookupProfileManagerErrors.PROFILE_NAME_EXISTS) from e
backend/lookup/views.py-260-265 (1)
260-265: ⚠️ Potential issue | 🟡 Minor

Hardcoded user_id may cause issues with multi-user scenarios.

Using a fixed user_id = "lookup" means all lookup uploads share the same path segment, which could cause file collisions if two users upload files with the same name to the same project. Consider using the actual user ID or a unique identifier.
🔧 Proposed fix
             org_id = UserContext.get_organization_identifier()
-            # Use a fixed user_id for lookup uploads to match PS path structure
-            # Path: {base_path}/{org_id}/{user_id}/{tool_id}/{filename}
-            user_id = "lookup"
+            # Use actual user ID to prevent file collisions between users
+            user_id = str(request.user.user_id) if hasattr(request.user, 'user_id') else str(request.user.id)
             tool_id = str(project.id)
backend/lookup/integrations/llm_provider.py-100-142 (1)
100-142: ⚠️ Potential issue | 🟡 Minor

Unused timeout parameter and missing exception chaining.

The timeout parameter is declared but never used. Additionally, catching a broad Exception and re-raising without chaining loses the original traceback context.
🔧 Proposed fix
-    def generate(self, prompt: str, config: dict[str, Any], timeout: int = 30) -> str:
+    def generate(self, prompt: str, config: dict[str, Any], timeout: int = 30) -> str:  # noqa: ARG002
         """Generate LLM response for Look-Up enrichment.
 
         Args:
             prompt: The prompt text with resolved variables
             config: LLM configuration (provider, model, temperature, etc.)
-            timeout: Request timeout in seconds
+            timeout: Request timeout in seconds (reserved for future use)
 
         Returns:
             JSON-formatted string with enrichment data
 
         Raises:
             RuntimeError: If LLM call fails
         """
         start_time = time.time()
 
         try:
             if self.llm_available:
                 # Use actual Unstract LLM integration
                 llm_config = self._get_llm_instance(config)
 
                 # In production, this would make actual LLM call
                 # For now, simulate the call
                 response = self._simulate_llm_call(prompt, llm_config)
             else:
                 # Fallback implementation
                 response = self._fallback_generate(prompt, config)
 
             # Validate response is JSON
             try:
                 json.loads(response)
             except json.JSONDecodeError:
                 # Try to extract JSON from response
                 response = self._extract_json(response)
 
             elapsed_time = time.time() - start_time
             logger.info(f"LLM generation completed in {elapsed_time:.2f}s")
 
             return response
 
         except Exception as e:
-            logger.error(f"LLM generation failed: {e}")
-            raise RuntimeError(f"LLM generation failed: {str(e)}")
+            logger.exception("LLM generation failed")
+            raise RuntimeError(f"LLM generation failed: {e!s}") from e

coderabbitai · 2026-02-05T05:27:24Z

backend/lookup/integrations/llm_provider.py

+        # Try to find JSON in the response
+        import re
+
+        # Look for JSON object pattern
+        json_pattern = r"\{[^{}]*\}"
+        matches = re.findall(json_pattern, response)
+
+        if matches:
+            # Try to parse each match
+            for match in matches:
+                try:
+                    json.loads(match)
+                    return match
+                except json.JSONDecodeError:
+                    continue
+


⚠️ Potential issue | 🟠 Major

JSON extraction regex won't match nested objects.

The pattern r"\{[^{}]*\}" excludes any braces inside the match, so nested JSON like {"data": {"value": 1}} won't be extracted correctly. Consider using a more robust approach.

🔧 Proposed fix using iterative brace matching

def _extract_json(self, response: str) -> str: """Extract JSON from LLM response if wrapped in text.""" - # Try to find JSON in the response - import re - - # Look for JSON object pattern - json_pattern = r"\{[^{}]*\}" - matches = re.findall(json_pattern, response) - - if matches: - # Try to parse each match - for match in matches: - try: - json.loads(match) - return match - except json.JSONDecodeError: - continue + # Try to find JSON by locating balanced braces + start_idx = response.find("{") + if start_idx == -1: + # No JSON object found + return json.dumps({ + "raw_response": response[:500], + "confidence": 0.3, + "warning": "Could not extract structured data", + }) + + # Try progressively longer substrings starting from first brace + for end_idx in range(len(response), start_idx, -1): + if response[end_idx - 1] == "}": + candidate = response[start_idx:end_idx] + try: + json.loads(candidate) + return candidate + except json.JSONDecodeError: + continue # If no valid JSON found, create a basic response return json.dumps({ "raw_response": response[:500], "confidence": 0.3, "warning": "Could not extract structured data", })

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 219-219: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents

In `@backend/lookup/integrations/llm_provider.py` around lines 207 - 222, The current JSON extraction uses the regex json_pattern = r"\{[^{}]*\}" which fails on nested objects; replace it with an iterative balanced-brace extractor that scans the response string (the variable response) and uses a stack/counter to find matching '{' ... '}' spans, attempt json.loads on each balanced substring and return the first valid parse; update the logic where matches are built/iterated (the block referencing json_pattern, matches, and the json.loads try/except) to use this brace-matching approach so nested JSON like {"data":{"value":1}} is correctly extracted and parsed.

coderabbitai · 2026-02-05T05:27:24Z

backend/lookup/integrations/redis_cache.py

+        count = 0
+
+        for prompt, response in preload_data.items():
+            # Generate key with project namespace
+            key = f"{self.key_prefix}project:{project_id}:{hashlib.md5(prompt.encode()).hexdigest()}"
+
+            if self.set(key, response):
+                count += 1


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Explore the file structure cd backend/lookup/integrations && wc -l redis_cache.py

Repository: Zipstack/unstract

Length of output: 79

🏁 Script executed:

# Get context around the flagged lines and search for other hashing cd backend/lookup/integrations && sed -n '350,400p' redis_cache.py

Repository: Zipstack/unstract

Length of output: 1208

🏁 Script executed:

# Search for hash-related code throughout the file cd backend/lookup/integrations && grep -n "hashlib\|sha256\|md5" redis_cache.py

Repository: Zipstack/unstract

Length of output: 247

🏁 Script executed:

# Get context around line 108 to understand the main cache key scheme cd backend/lookup/integrations && sed -n '100,120p' redis_cache.py

Repository: Zipstack/unstract

Length of output: 632

🏁 Script executed:

# Get more context to see the method name containing line 108 cd backend/lookup/integrations && sed -n '95,115p' redis_cache.py

Repository: Zipstack/unstract

Length of output: 684

Align warmup cache key hashing with the main cache scheme — use SHA256 instead of MD5.
The main cache key generation (generate_cache_key()) uses SHA256, but the warmup method uses MD5, creating an inconsistency. Additionally, MD5 is cryptographically weak and flagged by security linters.

🛠️ Suggested fix

- key = f"{self.key_prefix}project:{project_id}:{hashlib.md5(prompt.encode()).hexdigest()}" + prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest() + key = f"{self.key_prefix}project:{project_id}:{prompt_hash}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

count = 0

for prompt, response in preload_data.items():

# Generate key with project namespace

key = f"{self.key_prefix}project:{project_id}:{hashlib.md5(prompt.encode()).hexdigest()}"

if self.set(key, response):

count += 1

count = 0

for prompt, response in preload_data.items():

# Generate key with project namespace

prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()

key = f"{self.key_prefix}project:{project_id}:{prompt_hash}"

if self.set(key, response):

count += 1

🧰 Tools

🪛 Ruff (0.14.14)

[error] 378-378: Probable use of insecure hash functions in hashlib: md5

(S324)

🤖 Prompt for AI Agents

In `@backend/lookup/integrations/redis_cache.py` around lines 374 - 381, The warmup loop uses hashlib.md5 to build keys, which diverges from the main generate_cache_key() scheme (SHA256) and triggers security linting; update the key generation inside the preload_data loop to use hashlib.sha256(prompt.encode()).hexdigest() (matching the generate_cache_key() approach), keeping the same self.key_prefix and project:{project_id}: namespace so keys are consistent; ensure the change is made where the code builds key (the preload loop that calls self.set(key, response)) so warmup entries match the primary cache lookup.

coderabbitai · 2026-02-05T05:27:24Z

backend/lookup/models/lookup_data_source.py

+@receiver(pre_save, sender=LookupDataSource)
+def auto_increment_version_and_update_latest(sender, instance, **kwargs):
+    """Signal to auto-increment version number and manage is_latest flag.
+
+    This signal:
+    1. Auto-increments version_number if not set
+    2. Marks all previous versions as not latest
+    """
+    # Check if this is a new instance by querying the database
+    # (instance.pk is always truthy for UUIDField with default=uuid.uuid4)
+    is_new = not LookupDataSource.objects.filter(pk=instance.pk).exists()
+
+    if is_new:
+        # Get the highest version number for this project
+        max_version = LookupDataSource.objects.filter(project=instance.project).aggregate(
+            max_version=models.Max("version_number")
+        )["max_version"]
+
+        # Always auto-increment version for new instances
+        instance.version_number = (max_version or 0) + 1
+
+        # Mark all previous versions as not latest
+        LookupDataSource.objects.filter(project=instance.project, is_latest=True).update(
+            is_latest=False
+        )
+
+        # Ensure new version is marked as latest
+        instance.is_latest = True
+


⚠️ Potential issue | 🟠 Major

Race condition risk in version auto-increment signal.

The pre_save signal queries Max("version_number") and then sets the new version, but concurrent inserts could result in duplicate version numbers since the unique constraint check happens at commit time. Consider using select_for_update() or database-level sequences.

🔒 Proposed mitigation using select_for_update

`@receiver`(pre_save, sender=LookupDataSource) def auto_increment_version_and_update_latest(sender, instance, **kwargs): # Check if this is a new instance is_new = not LookupDataSource.objects.filter(pk=instance.pk).exists() if is_new: + from django.db import transaction + with transaction.atomic(): + # Lock existing records to prevent race conditions + existing = LookupDataSource.objects.filter( + project=instance.project + ).select_for_update().order_by('-version_number').first() + + instance.version_number = (existing.version_number if existing else 0) + 1 - max_version = LookupDataSource.objects.filter(project=instance.project).aggregate( - max_version=models.Max("version_number") - )["max_version"] - - instance.version_number = (max_version or 0) + 1 # Mark all previous versions as not latest LookupDataSource.objects.filter(project=instance.project, is_latest=True).update( is_latest=False ) instance.is_latest = True

Note: This approach requires the signal to be called within a transaction context. Alternatively, consider handling version assignment in the model's save() method where you have more control over the transaction.

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 169-169: Unused function argument: sender

(ARG001)

[warning] 169-169: Unused function argument: kwargs

(ARG001)

🤖 Prompt for AI Agents

In `@backend/lookup/models/lookup_data_source.py` around lines 168 - 196, The pre_save signal handler auto_increment_version_and_update_latest on LookupDataSource is vulnerable to race conditions because it reads Max("version_number") and then writes, so concurrent creations can get duplicate version_number; update this by acquiring a row-level lock and doing the read/update inside a transaction (e.g., wrap in transaction.atomic() and use select_for_update() on LookupDataSource objects filtered by project to lock the relevant rows before computing max and updating is_latest) or switch to a DB-backed sequence/serial for version_number and set is_latest in the same transaction to ensure uniqueness; ensure all logic referencing version_number and is_latest in auto_increment_version_and_update_latest runs inside that transaction context.

coderabbitai · 2026-02-05T05:27:24Z

backend/lookup/models/lookup_execution_audit.py

+        # Ensure error_message is provided for failed status
+        if self.status == "failed" and not self.error_message:
+            raise ValidationError("Error message is required for failed executions")
+
+        # Ensure enriched_output is provided for success status
+        if self.status == "success" and not self.enriched_output:
+            raise ValidationError("Enriched output is required for successful executions")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check how partial status is used in the codebase rg '"partial"' --type py -C3 | head -100

Repository: Zipstack/unstract

Length of output: 3646

🏁 Script executed:

#!/bin/bash # Search for status patterns and partial usage rg "status.*partial|partial.*status" --type py -C2 | head -80

Repository: Zipstack/unstract

Length of output: 3771

🏁 Script executed:

#!/bin/bash # Look at the full context of the validation file cat -n backend/lookup/models/lookup_execution_audit.py | sed -n '120,160p'

Repository: Zipstack/unstract

Length of output: 1136

🏁 Script executed:

#!/bin/bash # Find all status values used in the codebase rg "status\s*=\s*[\"']" --type py | grep -E "(success|failed|partial)" | head -30

Repository: Zipstack/unstract

Length of output: 2733

Add validation for enriched_output on "partial" status to match "success" requirements.

The code validates enriched_output for status "success" but not for "partial". Since the was_successful() property treats both "success" and "partial" as successful outcomes, the validation should be consistent. If "success" requires enriched output, "partial" should too—partial success still implies some processed output exists.

Consider adding validation for "partial" status:

if self.status in ["success", "partial"] and not self.enriched_output: raise ValidationError("Enriched output is required for successful executions")

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 136-136: Avoid specifying long messages outside the exception class

(TRY003)

[warning] 140-140: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In `@backend/lookup/models/lookup_execution_audit.py` around lines 134 - 140, The validation currently requires enriched_output only when self.status == "success" but was_successful() treats "partial" as successful too; update the validation in the LookupExecutionAudit model (the block that checks self.status and self.enriched_output) to require enriched_output when self.status is either "success" or "partial" (e.g., change the condition to check status in ["success", "partial"]) and keep the ValidationError message consistent.

coderabbitai · 2026-02-05T05:27:25Z

backend/lookup/models/lookup_project.py

+    def clean(self):
+        """Validate model fields."""
+        super().clean()
+        from django.core.exceptions import ValidationError
+
+        # Validate LLM provider
+        valid_providers = [choice[0] for choice in self.LLM_PROVIDER_CHOICES]
+        if self.llm_provider not in valid_providers:
+            raise ValidationError(
+                f"Invalid LLM provider: {self.llm_provider}. "
+                f"Must be one of: {', '.join(valid_providers)}"
+            )
+
+        # Validate LLM config structure
+        if self.llm_config and not isinstance(self.llm_config, dict):
+            raise ValidationError("LLM config must be a dictionary")
+
+        # Validate lookup_type (only static_data for POC)
+        if self.lookup_type != "static_data":
+            raise ValidationError(
+                "Only 'static_data' lookup type is supported in this POC"
+            )


⚠️ Potential issue | 🔴 Critical

Validation fails for nullable llm_provider when it's None.

The llm_provider field allows null=True, but clean() validates that it must be in LLM_PROVIDER_CHOICES. When llm_provider is None, this validation will raise a ValidationError since None is not in the valid choices list.

🐛 Proposed fix

def clean(self): """Validate model fields.""" super().clean() from django.core.exceptions import ValidationError # Validate LLM provider - valid_providers = [choice[0] for choice in self.LLM_PROVIDER_CHOICES] - if self.llm_provider not in valid_providers: - raise ValidationError( - f"Invalid LLM provider: {self.llm_provider}. " - f"Must be one of: {', '.join(valid_providers)}" - ) + if self.llm_provider is not None: + valid_providers = [choice[0] for choice in self.LLM_PROVIDER_CHOICES] + if self.llm_provider not in valid_providers: + raise ValidationError( + f"Invalid LLM provider: {self.llm_provider}. " + f"Must be one of: {', '.join(valid_providers)}" + ) # Validate LLM config structure if self.llm_config and not isinstance(self.llm_config, dict): raise ValidationError("LLM config must be a dictionary")

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 150-153: Avoid specifying long messages outside the exception class

(TRY003)

[warning] 157-157: Avoid specifying long messages outside the exception class

(TRY003)

[warning] 161-163: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In `@backend/lookup/models/lookup_project.py` around lines 142 - 163, The clean() method wrongly enforces LLM_PROVIDER_CHOICES even when llm_provider is nullable; update the validation in clean() to only check membership if self.llm_provider is not None (and optionally not an empty string) before comparing against valid_providers (referencing clean(), LLM_PROVIDER_CHOICES and llm_provider), so nullable values pass and only non-null providers are validated.

coderabbitai · 2026-02-05T05:27:26Z

backend/prompt_studio/prompt_studio_v2/serializers.py

+        # Get tool_id from instance (update) or initial data (create)
+        tool_id = None
+        if self.instance:
+            tool_id = self.instance.tool_id_id
+        elif "tool_id" in self.initial_data:
+            tool_id = self.initial_data["tool_id"]
+
+        if tool_id:
+            try:
+                from lookup.models import PromptStudioLookupLink
+
+                link_exists = PromptStudioLookupLink.objects.filter(
+                    prompt_studio_project_id=tool_id,
+                    lookup_project=value,
+                ).exists()
+
+                if not link_exists:
+                    raise serializers.ValidationError(
+                        "Selected lookup project must be linked to this "
+                        "Prompt Studio project at the project level first."
+                    )
+            except ImportError:
+                # Lookup app not installed, skip validation
+                pass


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

head -100 backend/prompt_studio/prompt_studio_v2/serializers.py | cat -n

Repository: Zipstack/unstract

Length of output: 3410

🏁 Script executed:

find backend -name "models.py" -type f | xargs grep -l "ToolStudioPrompt" | head -5

Repository: Zipstack/unstract

Length of output: 173

🏁 Script executed:

grep -n "class ToolStudioPrompt" backend/prompt_studio/prompt_studio_v2/models.py

Repository: Zipstack/unstract

Length of output: 98

🏁 Script executed:

sed -n '11,80p' backend/prompt_studio/prompt_studio_v2/models.py | cat -n

Repository: Zipstack/unstract

Length of output: 2927

🏁 Script executed:

grep -n "lookup_project" backend/prompt_studio/prompt_studio_v2/models.py

Repository: Zipstack/unstract

Length of output: 104

🏁 Script executed:

sed -n '143,155p' backend/prompt_studio/prompt_studio_v2/models.py | cat -n

Repository: Zipstack/unstract

Length of output: 656

🏁 Script executed:

find backend -type f -name "*.py" | xargs grep -l "PromptStudioLookupLink" | head -5

Repository: Zipstack/unstract

Length of output: 404

🏁 Script executed:

cat -n backend/lookup/models/prompt_studio_lookup_link.py

Repository: Zipstack/unstract

Length of output: 5455

🏁 Script executed:

cat > /tmp/test_validation_logic.py << 'EOF' # Test to understand the issue with validation logic # Scenario 1: PATCH with both tool_id and lookup_project changing # Current code flow: print("=== Current Logic (Instance First) ===") print("Request: PATCH with tool_id='new-tool', lookup_project='lookup-A'") print("Instance (DB): tool_id='old-tool'") print() print("Execution:") print("1. self.instance exists (PATCH request) → True") print("2. tool_id = self.instance.tool_id_id # Uses 'old-tool' from DB") print("3. Validates lookup_project against 'old-tool' (WRONG!)") print("4. Should validate against 'new-tool' (incoming data)") print() print("=== Suggested Logic (Initial Data First) ===") print("Request: PATCH with tool_id='new-tool', lookup_project='lookup-A'") print("Instance (DB): tool_id='old-tool'") print() print("Execution:") print("1. 'tool_id' in self.initial_data → True") print("2. tool_id = self.initial_data['tool_id'] # Uses 'new-tool' from request") print("3. Validates lookup_project against 'new-tool' (CORRECT!)") EOF python /tmp/test_validation_logic.py

Repository: Zipstack/unstract

Length of output: 764

🏁 Script executed:

find backend -type f -name "*.py" | xargs grep -l "ToolStudioPromptSerializer" | grep -E "(views|viewset)" | head -5

Repository: Zipstack/unstract

Length of output: 294

🏁 Script executed:

grep -n "ToolStudioPromptSerializer" backend/prompt_studio/prompt_studio_v2/views.py | head -10

Repository: Zipstack/unstract

Length of output: 198

🏁 Script executed:

sed -n '20,80p' backend/prompt_studio/prompt_studio_v2/views.py | cat -n

Repository: Zipstack/unstract

Length of output: 2583

🏁 Script executed:

grep -A 5 "class Meta:" backend/prompt_studio/prompt_studio_v2/serializers.py | head -10

Repository: Zipstack/unstract

Length of output: 273

🏁 Script executed:

cat > /tmp/drf_validation_test.py << 'EOF' # Understanding Django REST Framework validation behavior # When a PATCH request is made with multiple fields: # 1. self.initial_data contains the incoming request data (what the client sent) # 2. self.instance contains the DB object being updated # 3. Field validators run during validation phase # Example PATCH request: # { # "tool_id": "new-tool-uuid", # "lookup_project": "lookup-uuid-A" # } # DB state before PATCH: # ToolStudioPrompt( # tool_id="old-tool-uuid", # lookup_project="lookup-uuid-B" # ) # During validation of lookup_project field: # - self.initial_data["tool_id"] = "new-tool-uuid" (incoming) # - self.instance.tool_id_id = "old-tool-uuid" (from DB) # Current code prioritizes instance → validation against old tool_id (INCORRECT) # Suggested code prioritizes initial_data → validation against new tool_id (CORRECT) print("✓ The review comment is correct:") print(" - Current logic validates against old tool_id on PATCH with updated tool_id") print(" - Should validate against incoming tool_id when present") print(" - Suggested fix correctly reverses the precedence") EOF python /tmp/drf_validation_test.py

Repository: Zipstack/unstract

Length of output: 281

Validate against the incoming tool_id on updates.

When a PATCH/PUT request includes both a new tool_id and lookup_project, the current logic validates against self.instance.tool_id_id (the old value), allowing the lookup project to be linked to the wrong project. The incoming tool_id from self.initial_data should take precedence during updates.

Suggested fix

- tool_id = None - if self.instance: - tool_id = self.instance.tool_id_id - elif "tool_id" in self.initial_data: - tool_id = self.initial_data["tool_id"] + tool_id = None + if "tool_id" in self.initial_data: + tool_id = self.initial_data["tool_id"] + elif self.instance: + tool_id = self.instance.tool_id_id

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 52-55: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In `@backend/prompt_studio/prompt_studio_v2/serializers.py` around lines 35 - 58, The validation uses self.instance.tool_id_id (old value) instead of respecting an incoming tool_id in updates; change the logic in the serializer validation block so that if "tool_id" exists in self.initial_data it overrides self.instance.tool_id_id (i.e., determine tool_id = self.initial_data.get("tool_id") if present else self.instance.tool_id_id), then perform the PromptStudioLookupLink.objects.filter check (prompt_studio_project_id=tool_id, lookup_project=value) and raise serializers.ValidationError if no link exists; keep the ImportError handling as-is.

coderabbitai · 2026-02-05T05:27:26Z

frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx

+    <Popover
+      content={content}
+      title={null}
+      trigger="hover"
+      placement="top"
+      overlayClassName="lookup-replacement-popover"
+    >
+      <Button
+        size="small"
+        type="text"
+        className="prompt-card-action-button lookup-indicator-btn"
+      >
+        <SwapOutlined className="prompt-card-actions-head lookup-indicator-icon" />
+      </Button>


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx

Repository: Zipstack/unstract

Length of output: 2759

🏁 Script executed:

rg "trigger=\[" --type jsx --type tsx -A 2 -B 2 frontend/src/components/ | head -50

Repository: Zipstack/unstract

Length of output: 88

🏁 Script executed:

rg "LookupReplacementIndicator" --type jsx --type tsx -B 3 -A 3

Repository: Zipstack/unstract

Length of output: 88

🏁 Script executed:

rg 'trigger=\[' -A 2 -B 2 frontend/src/components/ | head -50

Repository: Zipstack/unstract

Length of output: 43

🏁 Script executed:

rg "LookupReplacementIndicator" -B 3 -A 3

Repository: Zipstack/unstract

Length of output: 5353

🏁 Script executed:

rg "trigger=" frontend/src/components/ | grep -i popover

Repository: Zipstack/unstract

Length of output: 43

🌐 Web query:

Ant Design Popover trigger prop array syntax React

💡 Result:

In Ant Design (React), Popover inherits the trigger prop behavior from Tooltip, where trigger can be a single mode or multiple modes by passing an array. (4x.ant.design)

import { Popover, Button } from "antd"; <Popover title="Title" content="Content" trigger={["hover", "click"]} // array syntax > <Button>Hover or click me</Button> </Popover>

Valid trigger values: "hover" | "focus" | "click" | "contextMenu" (and an array of them). (4x-ant-design.antgroup.com)

Citations:

1: https://4x.ant.design/components/popover/?utm_source=openai

2: https://4x-ant-design.antgroup.com/components/tooltip/?utm_source=openai

Improve keyboard accessibility and fix falsy-value rendering.

The hover-only trigger prevents keyboard users from discovering replacement details. Change trigger="hover" to trigger={["hover", "focus"]} and add an aria-label to the button for screen reader users.

Additionally, the || operator incorrectly treats valid falsy values (0, false, empty strings) as missing data. Use the nullish coalescing operator ?? instead to only replace null or undefined.

Suggested fix

return ( <Popover content={content} title={null} - trigger="hover" + trigger={["hover", "focus"]} placement="top" overlayClassName="lookup-replacement-popover" > <Button size="small" type="text" className="prompt-card-action-button lookup-indicator-btn" + aria-label="Show look-up replacement details" > <SwapOutlined className="prompt-card-actions-head lookup-indicator-icon" /> </Button> </Popover> );

Update lines 29 and 40 to use nullish coalescing:

- <Text className="lookup-replacement-value original"> - {originalValue || "(empty)"} + <Text className="lookup-replacement-value original"> + {originalValue ?? "(empty)"} </Text> ... - <Text className="lookup-replacement-value enriched"> - {enrichedValue || "(empty)"} + <Text className="lookup-replacement-value enriched"> + {enrichedValue ?? "(empty)"} </Text>

🤖 Prompt for AI Agents

In `@frontend/src/components/custom-tools/prompt-card/LookupReplacementIndicator.jsx` around lines 48 - 61, The Popover uses trigger="hover" which blocks keyboard access and the Button lacks an aria-label; change Popover's trigger to trigger={["hover","focus"]} and add a descriptive aria-label to the Button (the element rendering SwapOutlined) so screen readers and keyboard users can discover the replacement details. Also replace any uses of the logical OR fallback (e.g. value || fallback) in this component with the nullish coalescing operator (value ?? fallback) so valid falsy values (0, false, "") are preserved; look for usages in LookupReplacementIndicator.jsx around the Popover/content rendering and update them accordingly.

coderabbitai · 2026-02-05T05:27:26Z

frontend/src/components/custom-tools/settings-modal/SettingsModal.jsx

+    if (LookupManager) {
+      items.push(getMenuItem("Lookups", 9, <SearchOutlined />));
+      listOfComponents[9] = (
+        <LookupManager handleUpdateTool={handleUpdateTool} />
+      );
+    }


⚠️ Potential issue | 🔴 Critical

Key collision: LookupManager uses the same key (9) as CustomDataSettings.

The key 9 is already assigned to "Custom Data" on line 47 and listOfComponents[9] is set to <CustomDataSettings /> on line 68. When LookupManager is available, this code overwrites listOfComponents[9] with <LookupManager /> and adds a duplicate menu item with the same key, causing:

CustomDataSettings component becomes unreachable

React will warn about duplicate keys in the menu

Use an unused key (e.g., 10) for the Lookups menu item.

🐛 Proposed fix

if (LookupManager) { - items.push(getMenuItem("Lookups", 9, <SearchOutlined />)); - listOfComponents[9] = ( + items.push(getMenuItem("Lookups", 10, <SearchOutlined />)); + listOfComponents[10] = ( <LookupManager handleUpdateTool={handleUpdateTool} /> ); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (LookupManager) {

items.push(getMenuItem("Lookups", 9, <SearchOutlined />));

listOfComponents[9] = (

<LookupManager handleUpdateTool={handleUpdateTool} />

);

}

if (LookupManager) {

items.push(getMenuItem("Lookups", 10, <SearchOutlined />));

listOfComponents[10] = (

<LookupManager handleUpdateTool={handleUpdateTool} />

);

}

🤖 Prompt for AI Agents

In `@frontend/src/components/custom-tools/settings-modal/SettingsModal.jsx` around lines 119 - 124, LookupManager is using the same numeric key (9) as CustomDataSettings which overwrites listOfComponents[9] and creates duplicate menu keys; update the LookupManager menu entry and its component mapping to use an unused key (for example 10) by changing the getMenuItem call and the listOfComponents index from 9 to 10 (referencing LookupManager, getMenuItem, and listOfComponents) so CustomDataSettings remains reachable and keys are unique.

frontend/src/components/logging/log-modal/LogModal.jsx

coderabbitai · 2026-02-05T05:27:26Z

frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx

+  const handleSubmit = async () => {
+    try {
+      const values = await form.validateFields();
+      await onCreate(values);
+      form.resetFields();
+    } catch (error) {
+      console.error("Validation failed:", error);
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx

Repository: Zipstack/unstract

Length of output: 2220

Don't swallow onCreate errors or mislabel them as validation failures.

The current catch block logs both validation errors and create failures (e.g., API errors) as "Validation failed:", providing no visibility into what actually went wrong. Users receive no feedback when project creation fails. Distinguish between validation and create errors so create failures get proper error messaging.

🔧 Proposed fix (separate validation vs. create failures)

-import { Form, Input, Modal } from "antd"; +import { Form, Input, Modal, message } from "antd"; ... const handleSubmit = async () => { try { const values = await form.validateFields(); await onCreate(values); form.resetFields(); } catch (error) { - console.error("Validation failed:", error); + if (error?.errorFields) { + console.error("Validation failed:", error); + return; + } + message.error("Failed to create project. Please try again."); + console.error("Create failed:", error); } };

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const handleSubmit = async () => {

try {

const values = await form.validateFields();

await onCreate(values);

form.resetFields();

} catch (error) {

console.error("Validation failed:", error);

}

const handleSubmit = async () => {

try {

const values = await form.validateFields();

await onCreate(values);

form.resetFields();

} catch (error) {

if (error?.errorFields) {

console.error("Validation failed:", error);

return;

}

message.error("Failed to create project. Please try again.");

console.error("Create failed:", error);

}

};

🤖 Prompt for AI Agents

In `@frontend/src/components/lookups/create-project-modal/CreateProjectModal.jsx` around lines 9 - 16, handleSubmit currently treats any error as a validation failure; change it to first await form.validateFields() in a try/catch to handle and log/return validation errors, then call await onCreate(values) in a separate try/catch so API/create failures are handled distinctly (log the error, show user-visible feedback via message.error or set a submission error) and still call form.resetFields() only on successful creation; reference handleSubmit, form.validateFields, onCreate, and form.resetFields when updating the control flow.

coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In `@docs/lookup_architecture_final.md`:
- Around line 186-196: The static method
LookupProfileManager.get_default_profile references an undefined
DefaultProfileError and assumes exactly one default exists; define or import
DefaultProfileError (e.g., from backend.lookup.exceptions) and update docs to
describe the profile initialization strategy (auto-create a default on
LookupProject creation or require profile creation) and system-wide defaults;
also handle the potential MultipleObjectsReturned from
LookupProfileManager.objects.get by documenting/implementing the uniqueness
constraint (or catching MultipleObjectsReturned and choosing a deterministic
resolution) so callers won’t fail on multiple/default-missing cases.
- Around line 273-278: The docs mention a UniqueConstraint named
unique_data_source_profile_index that prevents duplicate (data_source, profile)
entries but the indexing workflow's Step 7 creates a new IndexManager row
without addressing existing rows; update the documentation to specify the
intended reindexing behavior for the IndexManager model (whether to update the
existing IndexManager record, delete-and-recreate it, or use a DB upsert), and
change the guidance in Step 7 to explicitly recommend and describe the chosen
approach (for example, use IndexManager.objects.update_or_create(...) instead of
IndexManager.objects.create(...) if you want an upsert, or describe the
delete-then-create flow and its transactional implications if you prefer
recreate).
- Around line 434-443: The docs mark LLM selection as optional but don't state
the decision logic; update the Step 6 section to explicitly describe when to
call LLM.select_best_match (e.g., based on vector-search confidence thresholds,
profile configuration, or result count), reference the profile LLM flag
(profile.llm.id or a boolean like profile.enable_llm_selection) and the inputs
(matches array and query), and document rules such as "invoke when top-N scores
are below X, or when multiple candidates exceed threshold Y, otherwise skip if a
single match score > X" or point to the exact configuration parameter that
controls this behavior so readers know when LLM(...) and .select_best_match(...)
will be executed.
- Around line 321-385: The workflow lacks error handling and should document and
implement retries, rollbacks and status transitions: for failures in
LookupProfileManager.get_default_profile/X2Text.process (Step 2) set
data_source.extraction_status='failed' and schedule retry/backoff; for
Embedding.embed (Step 5) and VectorDB.index (Step 6) catch adapter-specific
exceptions, mark indexing_status='failed' on the LookupIndexManager entry (or on
data_source), and trigger a cleanup routine to delete any partial uploads in
storage and any partial vector DB entries; replace the direct
LookupIndexManager.objects.create call with
LookupIndexManager.objects.update_or_create (or an explicit reindex pathway) to
avoid IntegrityError on reindex, and document the exact retry policy,
idempotency guarantees, and atomic rollback steps (storage.delete,
vector_db.delete_index) so each failure path transitions the data_source and
LookupIndexManager to deterministic states.
- Around line 46-57: The model currently allows multiple profiles with
is_default=True because the Meta.constraints only enforces unique
(prompt_studio_tool, profile_name); add a partial unique constraint to the same
Meta that enforces uniqueness of prompt_studio_tool when is_default=True (i.e.,
UniqueConstraint(fields=["prompt_studio_tool"], condition=Q(is_default=True),
name="unique_default_profile_per_tool")), import Q from django.db.models and
ensure this pattern is also added to LookupProfileManager.Meta to enforce a
single default per tool/project (note: requires PostgreSQL and Django 4.0+).

🧹 Nitpick comments (2)

docs/lookup_architecture_final.md (2)
287-317: Add language specifiers to fenced code blocks.

The workflow diagrams at lines 287-317, 321-385, and 389-445 should specify a language for better rendering and syntax highlighting. Consider using text, mermaid, or plaintext.
Example fix
-```
+```text
 User Action: Create New Profile
     ↓
As per coding guidelines, markdownlint-cli2 flags these as MD040 violations.

Also applies to: 321-385, 389-445

453-500: Add language specifiers and document API error responses.

The API endpoint blocks should specify a language (e.g., http or text) for proper rendering.

Additionally, consider documenting:

Authentication requirements (JWT, session, API key?)

Error response formats (HTTP status codes, error payload structure)

Pagination for list endpoints

Rate limiting policies

As per coding guidelines, markdownlint-cli2 flags these as MD040 violations.

coderabbitai · 2026-02-05T05:33:23Z