Feature/pr analysis rate limiting #101

rostilos · 2026-01-28T00:36:26Z

No description provided.

- Updated JobService to use REQUIRES_NEW transaction propagation for deleting ignored jobs, ensuring fresh entity retrieval and preventing issues with the calling transaction. - Removed token limitation from AI connection model and related DTOs, transitioning to project-level configuration for token limits. - Adjusted AIConnectionDTO tests to reflect the removal of token limitation. - Enhanced Bitbucket, GitHub, and GitLab AI client services to check token limits before analysis, throwing DiffTooLargeException when limits are exceeded. - Updated command processors to utilize project-level token limits instead of AI connection-specific limits. - Modified webhook processing to handle diff size issues gracefully, posting informative messages to VCS when analysis is skipped due to large diffs. - Cleaned up integration tests to remove references to token limitation in AI connection creation and updates.

…sis processing. Project PR analysis max analysis token limit implementation

…Exception in webhook processors

… entities from async contexts

…ies without re-fetching in async contexts

… lazy loading of associations

…cy across transaction contexts

…oading of associations

…ansaction management in async context

…mproved job management

…mmit analysis

…xt management

…ndling

…ervice for direct deletion

…d of deleting

rostilos · 2026-01-28T00:36:36Z

/codecrow analyze

coderabbitai · 2026-01-28T00:36:37Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

🔍 Trigger a full review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…nt in webhook processing

…n for RAG context

- Added AST-based code splitter using Tree-sitter for accurate code parsing. - Introduced TreeSitterParser for dynamic language loading and caching. - Created scoring configuration for RAG query result reranking with configurable boost factors and priority patterns. - Refactored RAGQueryService to utilize the new scoring configuration for enhanced result ranking. - Improved metadata extraction and handling for better context in scoring.

rostilos · 2026-01-28T00:44:33Z

/codecrow analyze

…rove code referencing in prompts

rostilos · 2026-01-28T00:55:35Z

/codecrow analyze

codecrow-ai · 2026-01-28T00:55:38Z

⚠️ CodeCrow Command Failed

Comment commands are not enabled for this project

Check the job logs in CodeCrow for detailed error information.

codecrow-local · 2026-01-28T00:56:16Z

⚠️ Code Analysis Results

Summary

Pull Request Review: Feature/pr analysis rate limiting


Status	FAIL
Risk Level	CRITICAL
Review Coverage	5 files analyzed in depth
Confidence	HIGH

Executive Summary

This PR introduces rate limiting and performance optimizations across the Java and Python ecosystems, specifically targeting the PR analysis engine and RAG pipeline. While the PR successfully implements lock-management refinements and memory optimizations, it introduces critical breaking changes in the token management architecture and inconsistent locking protocols across VCS providers.

Recommendation

Decision: FAIL

This PR is rejected due to critical architectural regressions that will cause immediate compilation and runtime failures. Specifically, the removal of core fields in the AIConnection entity without updating downstream consumers and the inconsistent implementation of the locking protocol in the GitLab handler must be resolved before this can be merged.

Architectural & Cross-File Concerns

1. Breaking Change in Token Limitation Management (CRITICAL)

The migration of tokenLimitation from the AIConnection entity to project-level settings is incomplete. While the fields were removed from the core model and DTOs, several service classes and test helpers still attempt to access these non-existent methods. This will lead to build failures in CI/CD and runtime crashes in the integration test suite.

2. Inconsistent Webhook & Locking Protocol (HIGH)

The implementation of the new locking protocol is inconsistent across VCS providers. The GitLab handler fails to propagate the preAcquiredLockKey to the analysis processor, unlike the GitHub and Bitbucket counterparts. This inconsistency will result in AnalysisLockedException errors and stalled processing for GitLab-based projects.

3. Data Integrity & Resource Management (MEDIUM)

The RAG pipeline indexing logic lacks a robust cleanup mechanism for temporary collections. In the event of a failure during the swap operation, orphaned collections will remain in the vector store, leading to potential storage exhaustion and data drift over time.

Issues Overview

Severity	Count
🔴 High	3	Critical issues requiring immediate attention
🟡 Medium	13	Issues that should be addressed
🔵 Low	6	Minor issues and improvements
ℹ️ Info	4	Informational notes and suggestions
✅ Resolved	4	Resolved issues

Analysis completed on 2026-01-28 00:56:14 | View Full Report | Pull Request

📋 Detailed Issues (26)

🔴 High Severity Issues

Id on Platform: 1434

Category: 🐛 Bug Risk

File: .../index_manager/point_operations.py

Issue: The 'chunk_index' used for generating deterministic point IDs is still reset for every file path. While using a UUID for missing paths prevents collision between different 'unknown' files, the logic still relies on the list order of chunks within those generated paths. If the same data is re-indexed, the UUID will be different, leading to duplicate entries in the vector store instead of updates.

💡 Suggested Fix

Use a hash of the content combined with the path (if available) to generate a truly deterministic ID that persists across indexing runs, similar to the implementation in 'ast_splitter.py'.

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/point_operations.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager/point_operations.py
@@ -52,3 +52,3 @@
         for chunk in chunks:
-            path = chunk.metadata.get("path", "unknown")
+            path = chunk.metadata.get("path", str(uuid.uuid4()))
             if path not in chunks_by_file:

Feature/pr analysis rate limiting #101

Feature/pr analysis rate limiting #101

Uh oh!

Conversation

rostilos commented Jan 28, 2026

Uh oh!

rostilos commented Jan 28, 2026

Uh oh!

coderabbitai bot commented Jan 28, 2026

Review skipped

Uh oh!

rostilos commented Jan 28, 2026

Uh oh!

rostilos commented Jan 28, 2026

Uh oh!

codecrow-ai bot commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecrow-local bot commented Jan 28, 2026

⚠️ Code Analysis Results

Summary

Pull Request Review: Feature/pr analysis rate limiting

Executive Summary

Recommendation

Architectural & Cross-File Concerns

1. Breaking Change in Token Limitation Management (CRITICAL)

2. Inconsistent Webhook & Locking Protocol (HIGH)

3. Data Integrity & Resource Management (MEDIUM)

Issues Overview

🔴 High Severity Issues

🟡 Medium Severity Issues

🔵 Low Severity Issues

ℹ️ Informational Notes

Files Affected

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecrow-ai bot commented Jan 28, 2026 •

edited

Loading