fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#38
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#38
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
WalkthroughThis PR introduces SQL LIKE pattern escaping across the API to prevent unintended wildcard behavior and potential injection vulnerabilities. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@api/core/rag/datasource/vdb/iris/iris_vector.py`:
- Around line 290-300: The escaped character used by escape_like_pattern (which
currently emits backslashes) does not match the SQL ESCAPE clause ('|') — update
the SQL in the iris_vector.py block that builds sql and executes cursor.execute
(the variables escaped_query and query_pattern) to use ESCAPE '\' so the
backslash escapes produced by escape_like_pattern are honored, or alternatively
change escape_like_pattern to emit '|' escapes and keep ESCAPE '|' consistent;
ensure the final query_pattern and the SQL ESCAPE clause use the same escape
character.
In `@api/core/rag/retrieval/dataset_retrieval.py`:
- Around line 1205-1207: The "not contains" branch builds escaped_value via
escape_like_pattern but calls json_field.notlike without the escape argument;
update the notlike invocation in the "not contains" case to pass escape="\\",
mirroring the other conditions (use the same escape parameter used elsewhere),
so json_field.notlike(f"%{escaped_value}%", escape="\\") is used to ensure
backslash-escaped patterns are interpreted correctly; locate the case "not
contains" and the escape_like_pattern/json_field.notlike usage to apply the
change.
In `@api/tests/unit_tests/libs/test_helper.py`:
- Around line 96-100: The test passes None to escape_like_pattern but the
function is annotated as accepting str and returning str; update the function
signature escape_like_pattern(pattern: str) -> str to accept and return Optional
types (e.g., pattern: str | None -> str | None) so the type hints match runtime
behavior, and update any related type annotations/imports (or alternatively
change the test to only pass strings if None is not intended); ensure the
function implementation and callers handle and return None accordingly.
🧹 Nitpick comments (5)
api/libs/helper.py (1)
64-65: Variable naming violates Python convention.
escapedPatternuses camelCase. Python/PEP8 convention preferssnake_casefor local variables.🔧 Suggested fix
- escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") - return escapedPattern + escaped_pattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + return escaped_patternapi/services/tag_service.py (1)
22-25: Correct escaping implementation, butsa.and_()wrapper is redundant.The escaping logic using
escape_like_patternwithescape="\\"is correctly implemented. However, wrapping a single condition insa.and_()is unnecessary and can be simplified.🔧 Suggested simplification
escaped_keyword = escape_like_pattern(keyword) - query = query.where(sa.and_(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\"))) + query = query.where(Tag.name.ilike(f"%{escaped_keyword}%", escape="\\"))api/tests/test_containers_integration_tests/services/test_annotation_service.py (1)
511-517: Test 4 is a duplicate of Test 1.Lines 511-517 (Test 4) repeat the same "50%" search already performed in lines 487-493 (Test 1). Both tests verify that searching for "50%" returns exactly one result and doesn't match "100%". Consider either removing this redundant test or changing it to verify a different scenario (e.g., searching for "100%" should not match "50%").
♻️ Suggested fix: Replace duplicate with a distinct test case
- # Test 4: Search with % should NOT match 100% (verifies escaping works) - annotation_list, total = AppAnnotationService.get_annotation_list_by_app_id( - app.id, page=1, limit=10, keyword="50%" - ) - # Should only find the 50% annotation, not the 100% one - assert total == 1 - assert all("50%" in (item.question or "") or "50%" in (item.content or "") for item in annotation_list) + # Test 4: Search for 100% should NOT match 50% (verifies escaping works both ways) + annotation_list, total = AppAnnotationService.get_annotation_list_by_app_id( + app.id, page=1, limit=10, keyword="100%" + ) + # Should only find the 100% annotation, not the 50% one + assert total == 1 + assert all("100%" in (item.question or "") or "100%" in (item.content or "") for item in annotation_list)api/tests/test_containers_integration_tests/services/test_workflow_app_service.py (2)
322-329: Missing backslash character test case.The docstring states special characters
(%, _)are tested, but the corresponding test intest_annotation_service.pyalso tests backslash (\) escaping. For consistency with the PR's goal of uniform LIKE pattern escaping across all services, consider adding a test case for backslash characters (e.g.,"path\\to\\file"in workflow run inputs/outputs).♻️ Suggested addition: Add backslash test case
Add the following test case after Test 2 (underscore test):
# Test 3: Search with \ character workflow_run_3 = WorkflowRun( id=str(uuid.uuid4()), tenant_id=app.tenant_id, app_id=app.id, workflow_id=workflow.id, type="workflow", triggered_from="app-run", version="1.0.0", graph=json.dumps({"nodes": [], "edges": []}), status="succeeded", inputs=json.dumps({"search_term": "path\\to\\file", "input2": "other_value"}), outputs=json.dumps({"result": "path\\to\\file found", "status": "success"}), created_by_role=CreatorUserRole.ACCOUNT, created_by=account.id, created_at=datetime.now(UTC), ) db.session.add(workflow_run_3) db.session.flush() workflow_app_log_3 = WorkflowAppLog( tenant_id=app.tenant_id, app_id=app.id, workflow_id=workflow.id, workflow_run_id=workflow_run_3.id, created_from="service-api", created_by_role=CreatorUserRole.ACCOUNT, created_by=account.id, ) workflow_app_log_3.id = str(uuid.uuid4()) workflow_app_log_3.created_at = datetime.now(UTC) db.session.add(workflow_app_log_3) db.session.commit() result = service.get_paginate_workflow_app_logs( session=db_session_with_containers, app_model=app, keyword="path\\to\\file", page=1, limit=20 ) assert result["total"] >= 1 assert any(log.workflow_run_id == workflow_run_3.id for log in result["data"])Also update the docstring to include backslash:
- - Special characters (%, _) in keyword are properly escaped + - Special characters (%, _, \) in keyword are properly escaped
424-425: Minor: Variable naming skipsworkflow_run_3.The variable naming jumps from
workflow_run_2(line 382) toworkflow_run_4(line 424), skippingworkflow_run_3. This appears to be a numbering oversight. While it doesn't affect functionality, consider renaming toworkflow_run_3for consistency.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
api/controllers/console/app/conversation.pyapi/controllers/console/datasets/datasets_segments.pyapi/core/rag/datasource/vdb/clickzetta/clickzetta_vector.pyapi/core/rag/datasource/vdb/iris/iris_vector.pyapi/core/rag/retrieval/dataset_retrieval.pyapi/libs/helper.pyapi/services/annotation_service.pyapi/services/app_service.pyapi/services/conversation_service.pyapi/services/dataset_service.pyapi/services/external_knowledge_service.pyapi/services/tag_service.pyapi/services/workflow_app_service.pyapi/tests/test_containers_integration_tests/services/test_annotation_service.pyapi/tests/test_containers_integration_tests/services/test_app_service.pyapi/tests/test_containers_integration_tests/services/test_tag_service.pyapi/tests/test_containers_integration_tests/services/test_workflow_app_service.pyapi/tests/unit_tests/libs/test_helper.py
🧰 Additional context used
🧬 Code graph analysis (16)
api/services/conversation_service.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/core/rag/retrieval/dataset_retrieval.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/tests/test_containers_integration_tests/services/test_annotation_service.py (1)
api/services/annotation_service.py (2)
AppAnnotationService(26-679)get_annotation_list_by_app_id(128-161)
api/core/rag/datasource/vdb/iris/iris_vector.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/services/app_service.py (2)
api/libs/helper.py (1)
escape_like_pattern(35-65)api/models/model.py (1)
App(75-312)
api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/tests/test_containers_integration_tests/services/test_app_service.py (1)
api/services/app_service.py (2)
create_app(80-173)get_paginate_apps(34-78)
api/services/dataset_service.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/controllers/console/app/conversation.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/services/external_knowledge_service.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/controllers/console/datasets/datasets_segments.py (1)
api/libs/helper.py (1)
escape_like_pattern(35-65)
api/services/workflow_app_service.py (3)
api/libs/helper.py (1)
escape_like_pattern(35-65)api/models/workflow.py (1)
WorkflowRun(556-714)api/models/model.py (1)
EndUser(1553-1583)
api/tests/unit_tests/libs/test_helper.py (1)
api/libs/helper.py (2)
escape_like_pattern(35-65)extract_tenant_id(68-89)
api/services/tag_service.py (2)
api/libs/helper.py (1)
escape_like_pattern(35-65)api/models/model.py (1)
Tag(1983-2002)
api/services/annotation_service.py (2)
api/libs/helper.py (1)
escape_like_pattern(35-65)api/models/model.py (1)
MessageAnnotation(1409-1439)
api/tests/test_containers_integration_tests/services/test_tag_service.py (1)
api/services/tag_service.py (1)
get_tags(15-28)
🔇 Additional comments (22)
api/tests/unit_tests/libs/test_helper.py (1)
71-113: Good test coverage for the escape utility.The tests comprehensively cover individual special characters, combined cases, edge cases (empty string, None), normal strings, and importantly verify the correct escape order to prevent double-escaping.
api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py (1)
987-991: LGTM - Correctly integrates centralized LIKE pattern escaping.The implementation properly chains escaping: first escaping LIKE wildcards via
escape_like_pattern, then escaping SQL string literals with.replace("'", "''"). TheESCAPE '\\\\'clause correctly specifies backslash as the escape character for the LIKE pattern.api/services/app_service.py (1)
57-62: LGTM - Correct usage of centralized escaping for app name search.The implementation properly escapes the user-provided name using
escape_like_patternand specifiesescape="\\"in theilikecall, ensuring consistent handling of special characters like%,_, and\in search queries.api/services/conversation_service.py (1)
218-236: LGTM - Consistent escaping across database types.The implementation correctly applies
escape_like_patterntovariable_nameand handles both MySQL-family (func.json_extract) and PostgreSQL (func.json_extract_path_text) database types with consistent escaping behavior usingescape="\\".api/services/external_knowledge_service.py (1)
38-41: LGTM! Correct application of LIKE pattern escaping.The implementation correctly escapes the search term and uses the appropriate
escape="\\"clause, ensuring consistent and secure handling of special characters in the LIKE query.api/services/annotation_service.py (1)
140-149: LGTM! Proper escaping applied to keyword search filters.The implementation correctly escapes the keyword and applies it to both
questionandcontentfields with the appropriateescape="\\"clause.api/services/workflow_app_service.py (1)
89-103: LGTM! Proper escaping with keyword trimming.The implementation correctly escapes special characters and applies a sensible 30-character trim limit. All LIKE conditions use the consistent
escape="\\"clause.api/core/rag/retrieval/dataset_retrieval.py (1)
1198-1203: LGTM! Correct escaping for contains, start with, and end with filters.These filter conditions correctly apply the escaping function and include the
escape="\\"clause for proper LIKE pattern handling.Also applies to: 1209-1215
api/controllers/console/datasets/datasets_segments.py (2)
33-33: LGTM! Top-level import is appropriate for this controller.The import placement is fine since the keyword filtering is a common code path in this controller.
148-173: LGTM! Comprehensive escaping for segment keyword search.The implementation correctly escapes the keyword and applies it consistently across:
- PostgreSQL JSONB array search
- MySQL JSON cast fallback
- Content field search
All conditions properly use
escape="\\"for consistent behavior.api/services/dataset_service.py (3)
146-148: Escaped dataset name filter looks correct.
3426-3429: Escaped child-chunk keyword filter looks correct.
3460-3463: Escaped segment keyword filter looks correct.api/tests/test_containers_integration_tests/services/test_app_service.py (2)
10-13: Deferred AppService imports are consistently applied.Also applies to: 76-78, 117-119, 170-172, 208-210, 262-264, 335-337, 415-417, 484-486, 537-539, 594-596, 652-654, 710-712, 761-763, 812-814, 873-875, 912-914, 952-954, 1003-1005
1012-1131: Special-character name search test is solid coverage.api/controllers/console/app/conversation.py (2)
345-353: Escaped completion-conversation keyword filter looks correct.
460-478: Escaped chat-conversation keyword filter looks correct.api/tests/test_containers_integration_tests/services/test_tag_service.py (1)
1-1: Special-character tag search test is good coverage, and the new uuid import is justified.Also applies to: 316-394
api/tests/test_containers_integration_tests/services/test_annotation_service.py (1)
447-518: Good test coverage for SQL LIKE pattern escaping.This test thoroughly validates the escaping of special characters (%, _, ) in keyword searches, which is crucial for the security fix in this PR. The test structure mirrors existing tests in the class and properly verifies that special characters are treated as literals rather than wildcards.
api/tests/test_containers_integration_tests/services/test_workflow_app_service.py (3)
13-16: LGTM - Deferred import pattern is appropriate.The comment clearly documents why
AppServiceimport is deferred to avoid circular dependencies, and the commented-out import line serves as documentation for what would normally be at module scope.
91-93: LGTM - Local imports handle circular dependency correctly.The local imports of
AppServicewithin helper methods are a standard pattern to avoid circular dependencies in test files.Also applies to: 155-157
319-468: Good test coverage for SQL LIKE pattern escaping in workflow logs.The test properly validates that special characters are escaped when searching workflow app logs, preventing wildcard behavior and potential SQL injection via LIKE patterns. The test structure is comprehensive with positive and negative match scenarios.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| # Escape special characters for LIKE clause to prevent SQL injection | ||
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_query = escape_like_pattern(query) | ||
| query_pattern = f"%{escaped_query}%" | ||
| sql = f""" | ||
| SELECT TOP {top_k} id, text, meta | ||
| FROM {self.schema}.{self.table_name} | ||
| WHERE text LIKE ? | ||
| WHERE text LIKE ? ESCAPE '|' | ||
| """ | ||
| cursor.execute(sql, (query_pattern,)) |
There was a problem hiding this comment.
Escape character mismatch will cause incorrect search behavior.
The escape_like_pattern function escapes special characters using backslash (\), but line 298 specifies ESCAPE '|' in the SQL query. This mismatch means the escaping will not work as intended.
For example, searching for "50% discount" will produce 50\% discount in the pattern, but the database expects | as the escape character, so it will interpret \% as a literal backslash followed by a percent wildcard.
🐛 Proposed fix: Use consistent escape character
escaped_query = escape_like_pattern(query)
query_pattern = f"%{escaped_query}%"
sql = f"""
SELECT TOP {top_k} id, text, meta
FROM {self.schema}.{self.table_name}
- WHERE text LIKE ? ESCAPE '|'
+ WHERE text LIKE ? ESCAPE '\\'
"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Escape special characters for LIKE clause to prevent SQL injection | |
| from libs.helper import escape_like_pattern | |
| escaped_query = escape_like_pattern(query) | |
| query_pattern = f"%{escaped_query}%" | |
| sql = f""" | |
| SELECT TOP {top_k} id, text, meta | |
| FROM {self.schema}.{self.table_name} | |
| WHERE text LIKE ? | |
| WHERE text LIKE ? ESCAPE '|' | |
| """ | |
| cursor.execute(sql, (query_pattern,)) | |
| # Escape special characters for LIKE clause to prevent SQL injection | |
| from libs.helper import escape_like_pattern | |
| escaped_query = escape_like_pattern(query) | |
| query_pattern = f"%{escaped_query}%" | |
| sql = f""" | |
| SELECT TOP {top_k} id, text, meta | |
| FROM {self.schema}.{self.table_name} | |
| WHERE text LIKE ? ESCAPE '\\' | |
| """ | |
| cursor.execute(sql, (query_pattern,)) |
🤖 Prompt for AI Agents
In `@api/core/rag/datasource/vdb/iris/iris_vector.py` around lines 290 - 300, The
escaped character used by escape_like_pattern (which currently emits
backslashes) does not match the SQL ESCAPE clause ('|') — update the SQL in the
iris_vector.py block that builds sql and executes cursor.execute (the variables
escaped_query and query_pattern) to use ESCAPE '\' so the backslash escapes
produced by escape_like_pattern are honored, or alternatively change
escape_like_pattern to emit '|' escapes and keep ESCAPE '|' consistent; ensure
the final query_pattern and the SQL ESCAPE clause use the same escape character.
| case "not contains": | ||
| filters.append(json_field.notlike(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.notlike(f"%{escaped_value}%")) |
There was a problem hiding this comment.
Missing escape clause in notlike condition.
The "not contains" case escapes the value but doesn't specify the escape="\\" parameter in the notlike call. Without this, the escaped backslashes won't be interpreted correctly by the database, causing inconsistent behavior compared to the other filter conditions.
🐛 Proposed fix: Add escape clause
case "not contains":
escaped_value = escape_like_pattern(str(value))
- filters.append(json_field.notlike(f"%{escaped_value}%"))
+ filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\"))🤖 Prompt for AI Agents
In `@api/core/rag/retrieval/dataset_retrieval.py` around lines 1205 - 1207, The
"not contains" branch builds escaped_value via escape_like_pattern but calls
json_field.notlike without the escape argument; update the notlike invocation in
the "not contains" case to pass escape="\\", mirroring the other conditions (use
the same escape parameter used elsewhere), so
json_field.notlike(f"%{escaped_value}%", escape="\\") is used to ensure
backslash-escaped patterns are interpreted correctly; locate the case "not
contains" and the escape_like_pattern/json_field.notlike usage to apply the
change.
| def test_escape_none_handling(self): | ||
| """Test escaping None returns None (falsy check handles it).""" | ||
| # The function checks `if not pattern`, so None is falsy and returns as-is | ||
| result = escape_like_pattern(None) | ||
| assert result is None |
There was a problem hiding this comment.
Type hint mismatch: function signature is str but test passes None.
The function escape_like_pattern(pattern: str) -> str declares pattern as str, but this test passes None. While the function handles this at runtime via the falsy check, this creates a type inconsistency that static type checkers (mypy, pyright) would flag.
Consider either updating the function signature to pattern: str | None and return type to str | None, or adjusting this test if None inputs are not expected in practice.
🤖 Prompt for AI Agents
In `@api/tests/unit_tests/libs/test_helper.py` around lines 96 - 100, The test
passes None to escape_like_pattern but the function is annotated as accepting
str and returning str; update the function signature
escape_like_pattern(pattern: str) -> str to accept and return Optional types
(e.g., pattern: str | None -> str | None) so the type hints match runtime
behavior, and update any related type annotations/imports (or alternatively
change the test to only pass strings if None is not intended); ensure the
function implementation and callers handle and return None accordingly.
Benchmark PR from agentic-review-benchmarks#12
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.