
Conversation

@rostilos
Owner

No description provided.

- Updated JobService to use REQUIRES_NEW transaction propagation for deleting ignored jobs, ensuring fresh entity retrieval and preventing the calling transaction from interfering with the delete (see the sketch after this list).
- Removed token limitation from AI connection model and related DTOs, transitioning to project-level configuration for token limits.
- Adjusted AIConnectionDTO tests to reflect the removal of token limitation.
- Enhanced Bitbucket, GitHub, and GitLab AI client services to check token limits before analysis, throwing DiffTooLargeException when limits are exceeded.
- Updated command processors to utilize project-level token limits instead of AI connection-specific limits.
- Modified webhook processing to handle diff size issues gracefully, posting informative messages to VCS when analysis is skipped due to large diffs.
- Cleaned up integration tests to remove references to token limitation in AI connection creation and updates.
…sis processing. Project PR analysis max analysis token limit implementation
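
For context on the first bullet, here is a minimal sketch of the REQUIRES_NEW pattern described there; the class and method names are illustrative, not the exact CodeCrow API:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class JobCleanupSketch {

    private final JobRepository jobRepository;

    public JobCleanupSketch(JobRepository jobRepository) {
        this.jobRepository = jobRepository;
    }

    // REQUIRES_NEW suspends the caller's transaction and opens a fresh one,
    // so the job is re-fetched and deleted even if the caller later rolls back.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void deleteIgnoredJob(Long jobId) {
        jobRepository.findById(jobId).ifPresent(jobRepository::delete);
    }
}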
@rostilos
Owner Author

/codecrow analyze

@coderabbitai

coderabbitai bot commented Jan 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.





codecrow-local bot deleted a comment from codecrow-ai bot Jan 28, 2026
- Added AST-based code splitter using Tree-sitter for accurate code parsing.
- Introduced TreeSitterParser for dynamic language loading and caching (see the sketch after this list).
- Created scoring configuration for RAG query result reranking with configurable boost factors and priority patterns.
- Refactored RAGQueryService to utilize the new scoring configuration for enhanced result ranking.
- Improved metadata extraction and handling for better context in scoring.
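
A hedged sketch of the dynamic-loading-plus-caching idea behind TreeSitterParser; the grammar registry is an assumption, using the py-tree-sitter >= 0.22 API:

import functools

from tree_sitter import Language, Parser
import tree_sitter_python  # one installable grammar package per language

_GRAMMARS = {"python": tree_sitter_python.language}

@functools.lru_cache(maxsize=None)
def get_parser(language_name: str) -> Parser:
    # Each language is loaded once and cached; later calls reuse the Parser.
    language = Language(_GRAMMARS[language_name]())
    return Parser(language)

root = get_parser("python").parse(b"def f():\n    return 1\n").root_node
print(root.type)  # "module"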
@rostilos
Owner Author

/codecrow analyze

codecrow-local bot deleted a comment from codecrow-ai bot Jan 28, 2026
@rostilos
Owner Author

/codecrow analyze

@codecrow-ai

codecrow-ai bot commented Jan 28, 2026

⚠️ CodeCrow Command Failed

Comment commands are not enabled for this project


Check the job logs in CodeCrow for detailed error information.

@codecrow-local

⚠️ Code Analysis Results

Summary

Pull Request Review: Feature/pr analysis rate limiting

Status: FAIL
Risk Level: CRITICAL
Review Coverage: 5 files analyzed in depth
Confidence: HIGH

Executive Summary

This PR introduces rate limiting and performance optimizations across the Java and Python ecosystems, specifically targeting the PR analysis engine and RAG pipeline. While the PR successfully implements lock-management refinements and memory optimizations, it introduces critical breaking changes in the token management architecture and inconsistent locking protocols across VCS providers.

Recommendation

Decision: FAIL

This PR is rejected due to critical architectural regressions that will cause immediate compilation and runtime failures. Specifically, the removal of core fields in the AIConnection entity without updating downstream consumers and the inconsistent implementation of the locking protocol in the GitLab handler must be resolved before this can be merged.


Architectural & Cross-File Concerns

1. Breaking Change in Token Limitation Management (CRITICAL)

The migration of tokenLimitation from the AIConnection entity to project-level settings is incomplete. While the fields were removed from the core model and DTOs, several service classes and test helpers still attempt to access these non-existent methods. This will lead to build failures in CI/CD and runtime crashes in the integration test suite.

2. Inconsistent Webhook & Locking Protocol (HIGH)

The implementation of the new locking protocol is inconsistent across VCS providers. The GitLab handler fails to propagate the preAcquiredLockKey to the analysis processor, unlike the GitHub and Bitbucket counterparts. This inconsistency will result in AnalysisLockedException errors and stalled processing for GitLab-based projects.

3. Data Integrity & Resource Management (MEDIUM)

The RAG pipeline indexing logic lacks a robust cleanup mechanism for temporary collections. In the event of a failure during the swap operation, orphaned collections will remain in the vector store, leading to potential storage exhaustion and data drift over time.

Issues Overview

🔴 High: 3 (critical issues requiring immediate attention)
🟡 Medium: 13 (issues that should be addressed)
🔵 Low: 6 (minor issues and improvements)
ℹ️ Info: 4 (informational notes and suggestions)
✅ Resolved: 4

Analysis completed on 2026-01-28 00:56:14 | View Full Report | Pull Request


📋 Detailed Issues (26)

🔴 High Severity Issues

Id on Platform: 1434

Category: 🐛 Bug Risk

File: .../index_manager/point_operations.py

Issue: The 'chunk_index' used for generating deterministic point IDs is still reset for every file path. While using a UUID for missing paths prevents collision between different 'unknown' files, the logic still relies on the list order of chunks within those generated paths. If the same data is re-indexed, the UUID will be different, leading to duplicate entries in the vector store instead of updates.

💡 Suggested Fix

Use a hash of the content combined with the path (if available) to generate a truly deterministic ID that persists across indexing runs, similar to the implementation in 'ast_splitter.py'.

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/point_operations.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/point_operations.py
@@ -52,3 +52,4 @@
         for chunk in chunks:
-            path = chunk.metadata.get("path", "unknown")
+            # A content hash (import hashlib) keeps the ID stable across runs
+            path = chunk.metadata.get("path") or hashlib.sha256(chunk.text.encode()).hexdigest()
             if path not in chunks_by_file:

View Issue Details


Id on Platform: 1447

Category: 🐛 Bug Risk

File: .../ai/AIConnection.java:98

Issue: Removing these methods while AIConnectionService still calls setTokenLimitation and existing tests (AIConnectionTest and AIConnectionDTOTest) rely on them will cause compilation and runtime errors. The token limitation logic seems to have been moved to ProjectConfig, but the removal in this entity is a breaking change for existing consumers not updated in this batch.

💡 Suggested Fix

Restore the tokenLimitation field and its accessors to maintain backward compatibility or ensure all callers are updated before removal.

--- a/java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/ai/AIConnection.java
+++ b/java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/ai/AIConnection.java
@@ -39,0 +39,2 @@
+    @Column(name = "token_limitation", nullable = false)
+    private int tokenLimitation = 100000;
@@ -98,0 +101,7 @@
+    public void setTokenLimitation(int tokenLimitation) {
+        this.tokenLimitation = tokenLimitation;
+    }
+
+    public int getTokenLimitation() {
+        return tokenLimitation;
+    }

View Issue Details


Id on Platform: 1448

Category: 🐛 Bug Risk

File: .../ai/AIConnection.java:98

Issue: Removing these methods while AIConnectionService.java still calls setTokenLimitation (as seen in context #15) will cause a compilation error. Similarly, existing tests (#2, #10) will fail. While you cannot change the service in this file, the removal of these public methods is a breaking change for existing consumers that have not been updated in this batch.

💡 Suggested Fix

Ensure that AIConnectionService.java and related tests are updated in the same PR to prevent build failures. If not possible in one go, consider deprecating the methods instead of immediate removal.

No suggested fix provided
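
Since no diff was generated, here is a minimal sketch of the deprecation route; the since value is a placeholder:

// Kept temporarily so downstream callers still compile; remove once
// AIConnectionService and the tests migrate to the project-level limit.
@Deprecated(since = "TBD", forRemoval = true)
public void setTokenLimitation(int tokenLimitation) {
    this.tokenLimitation = tokenLimitation;
}

@Deprecated(since = "TBD", forRemoval = true)
public int getTokenLimitation() {
    return tokenLimitation;
}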

View Issue Details


🟡 Medium Severity Issues

Id on Platform: 1432

Category: 🛡️ Error Handling

File: .../index_manager/indexer.py:225

Issue: This issue regarding orphaned temporary collections on failure remains unaddressed in the provided diff.

💡 Suggested Fix

Add a try-except-finally block around the indexing and swap logic to ensure the temporary collection is deleted if the swap does not complete successfully.

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/indexer.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/indexer.py
@@ -224,2 +224,3 @@
-            temp_info = self.point_ops.client.get_collection(temp_collection_name)
-            if temp_info.points_count == 0:
+            try:
+                temp_info = self.point_ops.client.get_collection(temp_collection_name)
+                if temp_info.points_count == 0:
@@ -231,2 +233,7 @@
-            self._perform_atomic_swap(
-                alias_name, temp_collection_name, old_collection_exists
+                self._perform_atomic_swap(
+                    alias_name, temp_collection_name, old_collection_exists
+                )
+            except Exception:
+                # Drop the temporary collection so a failed swap does not orphan it
+                self.point_ops.client.delete_collection(temp_collection_name)
+                raise

View Issue Details


Id on Platform: 1435

Category: ⚡ Performance

File: .../index_manager/point_operations.py:85

Issue: Large lists of chunks are still processed in a single batch call. This remains a risk for payload size limits and timeouts.

💡 Suggested Fix

Implement a batching mechanism to split the embedding requests into smaller chunks (e.g., 20-50 nodes per request).

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/point_operations.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/point_operations.py
@@ -85,3 +85,8 @@
-        texts_to_embed = [chunk.text for _, chunk in chunk_data]
-        embeddings = self.embed_model.get_text_embedding_batch(texts_to_embed)
-        
+        embeddings = []
+        texts_to_embed = [chunk.text for _, chunk in chunk_data]
+        for i in range(0, len(texts_to_embed), self.batch_size):
+            batch = texts_to_embed[i:i + self.batch_size]
+            embeddings.extend(self.embed_model.get_text_embedding_batch(batch))
+
         # Build points with embeddings

View Issue Details


Id on Platform: 1436

Category: 🐛 Bug Risk

File: .../splitter/metadata.py:105

Issue: The signature extraction logic in extract_signature iterates over lines[:15]. If a function signature starts after line 15 of a chunk, it will fail to be extracted. Additionally, if the first 15 lines contain multiple definitions, the logic only returns the first one found even if the chunk represents a later definition.

💡 Suggested Fix

Adjust the signature extraction to be more context-aware or increase the scan limit, and ensure it aligns with the primary node being processed in the chunk.

No suggested fix provided
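
No diff was provided; a sketch of one possible adjustment follows. The function shape is an assumption, since extract_signature's actual signature is not shown:

import re

# Matches Python def/class headers; other languages need their own patterns.
SIGNATURE_RE = re.compile(r'^\s*(?:async\s+)?(?:def|class)\s+\w+[^\n]*')

def extract_signature(lines: list[str], primary_start: int = 0) -> str | None:
    # Start scanning at the chunk's primary definition instead of always
    # lines[:15], so late-starting signatures are still found and earlier
    # unrelated definitions are skipped.
    for line in lines[primary_start:]:
        match = SIGNATURE_RE.match(line)
        if match:
            return match.group(0).strip()
    return None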

View Issue Details


Id on Platform: 1437

Category: ⚡ Performance

File: .../splitter/metadata.py:76

Issue: The docstring regex r'"""([\s\S]*?)"""|\'\'\'([\s\S]*?)\'\'\'' and subsequent JSDoc regexes use [\s\S]*? which can lead to Catastrophic Backtracking on very large, malformed, or nested string inputs in some regex engines. While the impact is limited by chunking, it is better practice to use more specific matching or limit search scope.

💡 Suggested Fix

Use more restrictive patterns or ensure the input string length is capped before applying complex multi-line regexes. Alternatively, pre-compile these common patterns.

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/splitter/metadata.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/splitter/metadata.py
@@ -73,2 +73,4 @@
-        if language == 'python':
-            match = re.search(r'"""([\s\S]*?)"""|\'\'\'([\s\S]*?)\'\'\'', content)
+        # Pre-compile or use more restrictive patterns for large chunks
+        PY_DOC = re.compile(r'"""(.*?)"""|\'\'\'(.*?)\'\'\'', re.DOTALL)
+        if language == 'python':
+            match = PY_DOC.search(content)

View Issue Details


Id on Platform: 1438

Category: 🧪 Testing

File: .../util/AuthTestHelper.java:257

Issue: Removing the token limitation assignment in a test helper may lead to unexpected failures in downstream tests that rely on this value being present, especially since the AI system logic uses this to calculate context windows.

💡 Suggested Fix

Restore a default token limit to ensure tests that calculate context windows or limits do not encounter NullPointerExceptions or logic errors.

--- a/java-ecosystem/tests/integration-tests/src/test/java/org/rostilos/codecrow/integration/util/AuthTestHelper.java
+++ b/java-ecosystem/tests/integration-tests/src/test/java/org/rostilos/codecrow/integration/util/AuthTestHelper.java
@@ -256,2 +256,3 @@
         aiConnection.setApiKeyEncrypted("test-encrypted-api-key-" + UUID.randomUUID().toString().substring(0, 8));
+        aiConnection.setTokenLimitation(100000);

View Issue Details


Id on Platform: 1439

Category: ⚡ Performance

File: .../index_manager/branch_manager.py:1

Issue: The method preserve_other_branch_points reads all points from other branches into a single Python list in memory. If the collection is large, this will cause Out Of Memory (OOM) errors.

💡 Suggested Fix

Convert the method into a generator or process in batches to avoid loading all points into memory at once.

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/branch_manager.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/branch_manager.py
@@ -113,7 +113,7 @@
-    def preserve_other_branch_points(
+    def get_other_branch_points_iterator(
         self,
         collection_name: str,
         exclude_branch: str
-    ) -> List[PointStruct]:
+    ):
         """Preserve points from branches other than the one being reindexed."
         logger.info(f"Preserving points from branches other than '{exclude_branch}'...")
-        preserved_points = []
         offset = None
         try:
             while True:
                 results = self.client.scroll(
@@ -133,5 +133,4 @@
                 points, next_offset = results
-                preserved_points.extend(points)
+                for p in points: yield p
                 if next_offset is None or len(points) < 100:
                     break
                 offset = next_offset
-            return preserved_points

View Issue Details


Id on Platform: 1445

Category: 🧹 Code Quality

File: .../service/ProjectService.java:576

Issue: The method 'updateAnalysisSettings' continues to grow in parameter count (now 6 parameters), making it harder to maintain and increasing the risk of argument-order bugs.

💡 Suggested Fix

Refactor the method to accept a DTO or a Command object instead of individual parameters to improve readability and extensibility.

No suggested fix provided
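
No diff was provided; a minimal sketch of the command-object shape, with field names borrowed from the request record shown under issue 1450:

// One value object instead of six positional parameters
public record AnalysisSettingsUpdate(
        Boolean prAnalysisEnabled,
        Boolean branchAnalysisEnabled,
        String installationMethod,
        Integer maxAnalysisTokenLimit
) {}

// ProjectService then shrinks to:
// public Project updateAnalysisSettings(Long projectId, AnalysisSettingsUpdate update)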

View Issue Details


Id on Platform: 1446

Category: ✨ Best Practices

File: .../service/ProjectService.java:561

Issue: The method 'updateAnalysisSettings' is becoming a 'telescoping' method with many parameters. As the number of project configuration options grows, it will become difficult to maintain and will lead to error-prone calls.

💡 Suggested Fix

Consider refactoring the method to accept a DTO or a 'SettingsUpdate' object instead of individual parameters.

No suggested fix provided

View Issue Details


Id on Platform: 1449

Category: 🏗️ Architecture

File: .../ai/AIConnectionDTO.java:14

Issue: Removing 'tokenLimitation' from this DTO is a breaking change for any API consumers or internal services relying on this field. Given that 'maxAnalysisTokenLimit' was added to ProjectDTO in this same batch, the logic seems to be shifting token management from the connection level to the project level, but existing tests (Related Code #1, #3, #10) still expect this field.

💡 Suggested Fix

Ensure that all client-side code and internal tests (like AIConnectionDTOTest) are updated to reflect the removal of this field to prevent compilation or runtime errors.

No suggested fix provided

View Issue Details


Id on Platform: 1450

Category: ✨ Best Practices

File: .../controller/ProjectController.java:597

Issue: The 'maxAnalysisTokenLimit' field is an Integer but lacks validation constraints. Accepting arbitrary or negative integers for token limits can lead to logic errors in the analysis engine or resource exhaustion if very high values are provided.

💡 Suggested Fix

Add validation annotations like @Positive or @Min(0) to ensure the token limit is a valid non-negative integer.

--- a/java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/controller/ProjectController.java
+++ b/java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/controller/ProjectController.java
@@ -593,8 +593,9 @@
     public record UpdateAnalysisSettingsRequest(
             Boolean prAnalysisEnabled,
             Boolean branchAnalysisEnabled,
             String installationMethod,
-            Integer maxAnalysisTokenLimit
+            @jakarta.validation.constraints.Min(0) Integer maxAnalysisTokenLimit
     ) {}

View Issue Details


Id on Platform: 1452

Category: ⚡ Performance

File: .../service/WebhookDeduplicationService.java:1

Issue: The cleanup logic still runs on the critical path of every isDuplicateCommitAnalysis call. While using removeIf on ConcurrentHashMap is thread-safe, iterating over the map during high-volume webhook bursts can cause unnecessary overhead for every request.

💡 Suggested Fix

Move the cleanup logic to a scheduled background task to keep the request path fast.

--- a/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java
+++ b/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java
@@ -1,6 +1,7 @@
 package org.rostilos.codecrow.pipelineagent.generic.service;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import org.springframework.scheduling.annotation.Scheduled;
 import org.springframework.stereotype.Service;
 
@@ -68,14 +69,12 @@
         // Record this analysis
         recentCommitAnalyses.put(key, now);
 
-        // Cleanup old entries
-        cleanupOldEntries(now);
-
         return false;
     }
 
-    private void cleanupOldEntries(Instant now) {
+    @Scheduled(fixedDelay = 60000)
+    public void cleanupOldEntries() {
+        Instant now = Instant.now();
         recentCommitAnalyses.entrySet().removeIf(entry -> {
             long age = now.getEpochSecond() - entry.getValue().getEpochSecond();

View Issue Details


Id on Platform: 1453

Category: ⚡ Performance

File: .../service/WebhookDeduplicationService.java:72

Issue: The cleanupOldEntries method is called inside the critical path of isDuplicateCommitAnalysis. Under high webhook volume, iterating over the entire ConcurrentHashMap for every request can cause performance degradation and contention.

💡 Suggested Fix

Decouple cleanup from the main request flow using a background scheduled task or use a cache implementation with native TTL (like Caffeine). If keeping current logic, run cleanup only periodically.

--- a/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java
+++ b/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java
@@ -70,7 +70,9 @@
         recentCommitAnalyses.put(key, now);
         
         // Cleanup old entries
-        cleanupOldEntries(now);
+        if (java.util.concurrent.ThreadLocalRandom.current().nextInt(100) == 0) {
+            cleanupOldEntries(now);
+        }
         
         return false;
     }

View Issue Details


Id on Platform: 1456

Category: 🐛 Bug Risk

File: .../webhookhandler/GitLabMergeRequestWebhookHandler.java:1

Issue: The GitLab handler acquires a lock but fails to pass 'preAcquiredLockKey' to the PrProcessRequest. This may lead to redundant lock attempts or 'AnalysisLockedException' inside the PullRequestAnalysisProcessor. Additionally, it lacks the specific catch blocks for 'DiffTooLargeException' and 'AnalysisLockedException' seen in the GitHub/Bitbucket counterparts.

💡 Suggested Fix

Pass the earlyLock key to the request object and update the catch blocks to re-throw recoverable exceptions.

--- a/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/webhookhandler/GitLabMergeRequestWebhookHandler.java
+++ b/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/webhookhandler/GitLabMergeRequestWebhookHandler.java
@@ -146,6 +146,7 @@
             request.placeholderCommentId = placeholderCommentId;
             request.prAuthorId = payload.prAuthorId();
             request.prAuthorUsername = payload.prAuthorUsername();
+            request.preAcquiredLockKey = earlyLock.get();
             
             log.info("Processing MR analysis: project={}, MR={}, source={}, target={}", 
                     project.getId(), request.pullRequestId, request.sourceBranchName, request.targetBranchName);
@@ -155,6 +156,10 @@
             
             return WebhookResult.success("MR analysis completed", result);
             
+        } catch (DiffTooLargeException | AnalysisLockedException e) {
+            log.warn("MR analysis failed with recoverable exception for project {}: {}", project.getId(), e.getMessage());
+            throw e;
         } catch (Exception e) {
             log.error("MR analysis failed for project {}", project.getId(), e);
             if (placeholderCommentId != null) {

View Issue Details


🔵 Low Severity Issues

Id on Platform: 1440

Category: ✨ Best Practices

File: .../splitter/tree_parser.py:108

Issue: The 'Parser' class is imported inside the 'parse' method on every call. While Python caches imports, it is more idiomatic and slightly more performant to import at the top of the file or handle it in '__init__' if availability is confirmed.

💡 Suggested Fix

Move the 'from tree_sitter import Parser' import to the top of the module or into the class constructor.

No suggested fix provided

View Issue Details


Id on Platform: 1441

Category: ✨ Best Practices

File: .../models/scoring_config.py:220

Issue: Using a global variable and manual singleton pattern is generally discouraged in modern Python when using Pydantic or Dependency Injection frameworks. It can make unit testing harder as state persists between tests.

💡 Suggested Fix

Consider using a dependency injection container or passing the config instance explicitly to services that need it, instead of relying on a global singleton.

No suggested fix provided
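
No diff was provided; a sketch of explicit injection, where ScoringConfig and RAGQueryService stand in for the real classes (whose constructors are not shown):

from dataclasses import dataclass, field

@dataclass
class ScoringConfig:
    boost_factors: dict[str, float] = field(default_factory=dict)

class RAGQueryService:
    def __init__(self, scoring_config: ScoringConfig):
        # Injected rather than read from a module-level singleton, so each
        # test can build the service with an isolated configuration.
        self.scoring_config = scoring_config

service = RAGQueryService(ScoringConfig(boost_factors={"path_priority": 1.5}))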

View Issue Details


Id on Platform: 1442

Category: ✨ Best Practices

File: .../analysis-engine/pom.xml:74

Issue: The 'jtokkit' dependency lacks a version. Unless the version is managed in a <dependencyManagement> section of a parent POM (not visible here), this can lead to non-deterministic builds.

💡 Suggested Fix

Specify a version for the jtokkit dependency (e.g., 1.1.0) to ensure build reproducibility.

--- a/java-ecosystem/libs/analysis-engine/pom.xml
+++ b/java-ecosystem/libs/analysis-engine/pom.xml
@@ -73,4 +73,5 @@
         <dependency>
             <groupId>com.knuddels</groupId>
             <artifactId>jtokkit</artifactId>
+            <version>1.1.0</version>
         </dependency>

View Issue Details


Id on Platform: 1444

Category: ⚡ Performance

File: .../service/JobService.java:228

Issue: Re-fetching the job using findById(job.getId()).orElse(job) inside every lifecycle method is defensive but may lead to redundant database hits if the entity is already managed. However, the developer has intentionally added these in the current diff to handle multi-context transitions.

💡 Suggested Fix

Consider using a managed entity if possible, or verify if the external context transition happens frequently enough to justify the extra DB hit.

--- a/java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/service/JobService.java
+++ b/java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/service/JobService.java
@@ -226,7 +226,9 @@
     @Transactional
     public Job startJob(Job job) {
-        // Re-fetch the job in case it was passed from a different transaction context
-        job = jobRepository.findById(job.getId()).orElse(job);
+        if (job.getId() != null) {
+            // Re-fetch the job in case it was passed from a different transaction context
+            job = jobRepository.findById(job.getId()).orElse(job);
+        }
         job.start();

View Issue Details


Id on Platform: 1454

Category: 🧹 Code Quality

File: .../processor/WebhookAsyncProcessor.java:478

Issue: The newly added postInfoToVcs method and the existing postErrorToVcs method share significant structural logic regarding VcsReportingService retrieval and placeholder comment conditional logic.

💡 Suggested Fix

Extract a private helper method for posting/updating comments to reduce duplication.

--- a/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/WebhookAsyncProcessor.java
+++ b/java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/WebhookAsyncProcessor.java
@@ -512,11 +512,11 @@
-    private void postInfoToVcs(EVcsProvider provider, Project project, WebhookPayload payload, 
-                               String infoMessage, String placeholderCommentId, Job job) {
+    private void postMessageToVcs(EVcsProvider provider, Project project, WebhookPayload payload, 
+                               String message, String placeholderCommentId, String logLabel) {
         try {
             if (payload.pullRequestId() == null) {
                 return;
             }
             
             VcsReportingService reportingService = vcsServiceFactory.getReportingService(provider);
             
             if (placeholderCommentId != null) {
                 reportingService.updateComment(
                     project,
                     Long.parseLong(payload.pullRequestId()),
                     placeholderCommentId,
-                    infoMessage,
+                    message,
                     CODECROW_COMMAND_MARKER
                 );
-                log.info("Updated placeholder comment {} with info message for PR {}", placeholderCommentId, payload.pullRequestId());
+                log.info("Updated placeholder comment {} with {} for PR {}", placeholderCommentId, logLabel, payload.pullRequestId());
             } else {
                 reportingService.postComment(
                     project, 
                     Long.parseLong(payload.pullRequestId()), 
-                    infoMessage,
+                    message,
                     CODECROW_COMMAND_MARKER
                 );
-                log.info("Posted info message to PR {}", payload.pullRequestId());
+                log.info("Posted {} to PR {}", logLabel, payload.pullRequestId());
             }
         } catch (Exception e) {
-            log.error("Failed to post info to VCS: {}", e.getMessage());
+            log.error("Failed to post {} to VCS: {}", logLabel, e.getMessage());
         }
     }

View Issue Details


Id on Platform: 1455

Category: ✨ Best Practices

File: .../processor/WebhookAsyncProcessor.java:516

Issue: The method postInfoToVcs contains logic that is almost identical to postErrorToVcs. This duplication increases maintenance overhead and potential for inconsistent reporting logic.

💡 Suggested Fix

Consider refactoring these into a generic 'postMessageToVcs' method that takes the message content and log context as parameters.

(Suggested diff identical to the one under issue 1454 above.)

View Issue Details


ℹ️ Informational Notes

Id on Platform: 1443

Category: 🏗️ Architecture

File: .../core/index_manager.py:59

Issue: The hardcoded removal of 'SemanticCodeSplitter' and of the 'RAG_USE_AST_SPLITTER' environment variable check eliminates the option for users to fall back to the simpler regex-based splitter if tree-sitter installation fails in specific environments. While the code mentions an internal fallback, the explicit configuration control is lost.

💡 Suggested Fix

Consider keeping the configuration check but defaulting it to true, or ensuring the internal fallback in ASTCodeSplitter is equivalent in performance/behavior to the removed SemanticCodeSplitter.

No suggested fix provided
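
No diff was provided; a sketch of keeping the flag but defaulting it on. The splitter class names are the ones this issue mentions, and their imports are omitted:

import os

def make_splitter():
    # Default to the AST splitter but keep an explicit escape hatch for
    # environments where tree-sitter cannot be installed.
    if os.getenv("RAG_USE_AST_SPLITTER", "true").lower() != "false":
        try:
            return ASTCodeSplitter()
        except ImportError:
            pass  # tree-sitter missing: fall through to the regex splitter
    return SemanticCodeSplitter()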

View Issue Details


Id on Platform: 1451

Category: ✨ Best Practices

File: .../job/JobRepository.java:1

Issue: The custom query 'DELETE FROM Job j WHERE j.id = :jobId' is redundant because JpaRepository already provides 'deleteById(ID id)'. Using standard repository methods is generally preferred over a custom @Query for simple primary-key operations.

💡 Suggested Fix

Remove the custom deleteJobById method and use the built-in deleteById method provided by JpaRepository.

--- a/java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/job/JobRepository.java
+++ b/java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/job/JobRepository.java
@@ -101,8 +101,4 @@
     @Modifying
     @Query("DELETE FROM Job j WHERE j.project.id = :projectId")
     void deleteByProjectId(@Param("projectId") Long projectId);
-
-    @Modifying
-    @Query("DELETE FROM Job j WHERE j.id = :jobId")
-    void deleteJobById(@Param("jobId") Long jobId);
 }

View Issue Details


Id on Platform: 1457

Category: ✨ Best Practices

File: .../command/ReviewCommandProcessor.java:1

Issue: Logging internal database IDs (aiConnectionId) and specific provider keys in production logs can sometimes lead to information disclosure if logs are not properly secured. While not a direct security breach, it is better to log non-sensitive correlation IDs.

💡 Suggested Fix

Ensure the aiConnectionId is considered safe to log within your organization's security policy, or use a masked version/external identifier.

No suggested fix provided
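
No fix was generated; a tiny sketch of the masking idea, with a hypothetical helper name:

// Log a short suffix instead of the full internal ID
private static String maskId(Long id) {
    String s = String.valueOf(id);
    return "***" + s.substring(Math.max(0, s.length() - 2));
}

// log.info("Resolved AI connection {}", maskId(aiConnectionId));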

View Issue Details


Id on Platform: 1458

Category: 🏗️ Architecture

File: .../config/AsyncConfig.java:1

Issue: The webhook executor now uses a caller-runs fallback. While this prevents task loss and now includes logging, it can cause the main thread to block under heavy load, potentially impacting webhook response times if the queue fills up frequently.

💡 Suggested Fix

Consider using a DiscardOldestPolicy or increasing queue capacity if blocking the caller (likely the HTTP server thread) is undesirable.

No suggested fix provided
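
No diff was provided; a sketch of the suggested tuning, placed inside a Spring @Configuration class. The pool sizes and bean name are illustrative:

import java.util.concurrent.ThreadPoolExecutor;
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean(name = "webhookExecutor")
public ThreadPoolTaskExecutor webhookExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(16);
    executor.setQueueCapacity(500); // larger buffer before any rejection policy triggers
    // Sheds the oldest queued webhook instead of blocking the HTTP thread
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.DiscardOldestPolicy());
    executor.initialize();
    return executor;
}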

View Issue Details


Files Affected

  • .../splitter/metadata.py: 2 issues
  • .../analysis/PullRequestAnalysisProcessor.java: 2 issues
  • .../service/WebhookDeduplicationService.java: 2 issues
  • .../service/ProjectService.java: 2 issues
  • .../index_manager/point_operations.py: 2 issues
  • .../processor/WebhookAsyncProcessor.java: 2 issues
  • .../index_manager/indexer.py: 2 issues
  • .../ai/AIConnection.java: 2 issues
  • .../core/index_manager.py: 1 issue
  • .../webhookhandler/GitLabMergeRequestWebhookHandler.java: 1 issue

rostilos closed this Jan 28, 2026