⚡ Bolt: [performance improvement] Optimize CivicRAG retrieval with pre-tokenization #699

RohanExploit wants to merge 1 commit into `main` from `…e-tokenization`.
Conversation
💡 **What:** Implemented pre-tokenization and regex pre-compilation in the CivicRAG service.

- Pre-compiled the tokenization regular expression.
- Pre-calculated token sets for all civic policies during service initialization.
- Refactored the `retrieve` method to use these cached token sets for Jaccard similarity and title boost calculations.

🎯 **Why:** The previous implementation performed $O(N)$ tokenization operations (regex matching and set creation) on every retrieval call, where $N$ is the number of policies. This resulted in redundant CPU overhead and increased latency for every issue submission that used RAG.

📊 **Impact:** Reduces retrieval latency by approximately 4.8x.

- Baseline: ~0.0957 ms per retrieval.
- Optimized: ~0.0198 ms per retrieval.

🔬 **Measurement:** Verified using `benchmark_rag.py` (5000 iterations over the standard policy corpus). Ensured logic correctness by running `backend/tests/test_rag_service.py` and the full backend test suite (107 tests passed).
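For readers skimming the thread, here is a minimal, self-contained sketch of the pattern this PR describes (pre-compiled regex, token sets cached at init, Jaccard scoring in `retrieve`). The class shape, regex pattern, field names, and the title-boost weight are illustrative assumptions, not the actual FixMyBharat code:

```python
import re

# Pre-compiled once instead of compiling/matching on every call (pattern is an assumption).
_TOKEN_RE = re.compile(r"\b\w+\b")

def _tokenize(text: str) -> set[str]:
    return set(_TOKEN_RE.findall(text.lower()))

class CivicRAGSketch:
    def __init__(self, policies: list[dict]):
        self.policies = policies
        # Pre-tokenize every policy once so retrieve() does no per-policy regex work.
        self.pretokenized_policies = [
            {
                "content_tokens": _tokenize(f"{p.get('title', '')} {p.get('text', '')}"),
                "title_tokens": _tokenize(p.get('title', '')),
            }
            for p in policies
        ]

    def retrieve(self, query: str, threshold: float = 0.05):
        query_tokens = _tokenize(query)
        best, best_score = None, 0.0
        for policy, pre in zip(self.policies, self.pretokenized_policies):
            tokens = pre["content_tokens"]
            union = query_tokens | tokens
            # Jaccard similarity over the cached token sets.
            score = len(query_tokens & tokens) / len(union) if union else 0.0
            if query_tokens & pre["title_tokens"]:
                score += 0.1  # title boost; the real weight is not shown in this thread
            if score >= threshold and score > best_score:
                best, best_score = policy, score
        return best.get("text") if best else None

# Tiny usage example with invented data:
rag = CivicRAGSketch([{"title": "Potholes", "text": "Report road damage to the municipal corporation."}])
print(rag.retrieve("pothole on my road"))  # returns the policy text when enough tokens overlap
```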
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
✅ Deploy Preview for fixmybharat canceled.
🙏 Thank you for your contribution, @RohanExploit!

**Note:** The maintainers will monitor code quality and ensure the overall project flow isn't broken.
**📝 Walkthrough**

Documentation and implementation of a RAG performance optimization that pre-tokenizes the policy corpus and pre-compiles regex patterns during initialization, eliminating redundant tokenization operations in the retrieve loop to reduce latency.
**Estimated code review effort:** 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Pull request overview
Optimizes CivicRAG retrieval latency by reducing per-query tokenization work through regex pre-compilation and policy corpus pre-tokenization at initialization time.
Changes:
- Pre-compiles the tokenization regex and reuses it for all tokenization calls.
- Pre-tokenizes policy title/content once during service initialization and reuses token sets during retrieval.
- Adds a Bolt learning note documenting the RAG pre-tokenization optimization.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `backend/rag_service.py` | Pre-compiles the token regex and pre-tokenizes policies to avoid repeated work in `retrieve()`. |
| `.jules/bolt.md` | Documents the optimization as an engineering learning/action item. |
```python
# Performance Boost: Pre-tokenize all policies during initialization
# to avoid redundant O(N) processing on every retrieve call.
for policy in self.policies:
    content = f"{policy.get('title', '')} {policy.get('text', '')}"
    self.pretokenized_policies.append({
        "content_tokens": self._tokenize(content),
        "title_tokens": self._tokenize(policy.get('title', ''))
    })
```
The pre-tokenization loop assumes every policy is a dict (uses .get). If the JSON contains a non-dict entry, this will raise and be caught by the broad except, leaving self.policies populated but self.pretokenized_policies only partially built. That can lead to silent retrieval gaps later. Consider validating each item (e.g., skip/normalize non-dicts) and ensuring pretokenized_policies stays aligned with policies even if one entry is malformed (or fall back to on-the-fly tokenization for that entry).
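One hedged way to act on that suggestion is to normalize malformed entries so the two lists stay aligned. This is a sketch only; it reuses `self._tokenize` and the field names from the snippet above, not code from the actual PR:

```python
for policy in self.policies:
    if not isinstance(policy, dict):
        # Keep list lengths aligned even for malformed entries; empty sets simply never match.
        self.pretokenized_policies.append({"content_tokens": set(), "title_tokens": set()})
        continue
    content = f"{policy.get('title', '')} {policy.get('text', '')}"
    self.pretokenized_policies.append({
        "content_tokens": self._tokenize(content),
        "title_tokens": self._tokenize(policy.get('title', '')),
    })
```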
```python
for policy, pretokenized in zip(self.policies, self.pretokenized_policies):
    # Performance Boost: Use pre-calculated token sets
    policy_tokens = pretokenized["content_tokens"]
```
zip(self.policies, self.pretokenized_policies) will silently drop any trailing policies if pretokenized_policies is shorter (e.g., due to a partial initialization failure). Using an index-based loop with a length check (or iterating self.policies and tokenizing on-demand when pretokenized data is missing) avoids silently skipping documents.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/rag_service.py (1)
**30-47: ⚠️ Potential issue | 🟡 Minor**

Keep `self.policies` and `self.pretokenized_policies` consistent on failure.

The current try/except catches any exception during load or pre-tokenization and logs it, but leaves whatever partial state was built on `self`. If pre-tokenization fails after `self.policies` has been assigned from `json.load` (Line 33) but mid-way through the loop at Lines 38–43, `self.policies` will have N entries while `self.pretokenized_policies` has fewer. Downstream `retrieve` then silently operates on a truncated corpus (amplified by the non-strict `zip` on Line 71).

🛡️ Proposed defensive fix
```diff
 try:
     if os.path.exists(policies_path):
         with open(policies_path, 'r') as f:
-            self.policies = json.load(f)
-        logger.info(f"Loaded {len(self.policies)} civic policies for RAG.")
-
-        # Performance Boost: Pre-tokenize all policies during initialization
-        # to avoid redundant O(N) processing on every retrieve call.
-        for policy in self.policies:
-            content = f"{policy.get('title', '')} {policy.get('text', '')}"
-            self.pretokenized_policies.append({
-                "content_tokens": self._tokenize(content),
-                "title_tokens": self._tokenize(policy.get('title', ''))
-            })
+            policies = json.load(f)
+        logger.info(f"Loaded {len(policies)} civic policies for RAG.")
+
+        # Performance Boost: Pre-tokenize all policies during initialization
+        # to avoid redundant O(N) processing on every retrieve call.
+        pretokenized = [
+            {
+                "content_tokens": self._tokenize(f"{p.get('title', '')} {p.get('text', '')}"),
+                "title_tokens": self._tokenize(p.get('title', '')),
+            }
+            for p in policies
+        ]
+        self.policies = policies
+        self.pretokenized_policies = pretokenized
     else:
         logger.warning(f"Civic policies file not found at {policies_path}")
 except Exception as e:
     logger.error(f"Error loading policies: {e}")
+    self.policies = []
+    self.pretokenized_policies = []
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/rag_service.py` around lines 30 - 47, Pre-tokenization can fail mid-loop leaving self.policies and self.pretokenized_policies inconsistent, so load and preprocess into local variables first and only assign to self.policies and self.pretokenized_policies after all policies are successfully pre-tokenized; alternatively, on exception ensure you rollback/clear both attributes before re-raising/logging. Specifically, use a local list (e.g., temp_policies and temp_pretokenized) while calling json.load and self._tokenize, and only set self.policies = temp_policies and self.pretokenized_policies = temp_pretokenized after the loop completes; also ensure the except block clears both self.policies and self.pretokenized_policies to avoid partial state affecting retrieve (which uses zip).
🧹 Nitpick comments (1)
backend/rag_service.py (1)
**71-71:** Use `zip(..., strict=True)` to guard against policy/token list desync.

If `self.pretokenized_policies` ever diverges in length from `self.policies` (e.g., pre-tokenization partially fails inside the `try/except` at Lines 30–47 and subsequent policies are skipped), `zip()` will silently truncate and `retrieve` will quietly ignore the tail of the corpus. Since Python 3.10+ supports `strict=True`, enforcing it converts this silent data loss into a loud error and also satisfies Ruff B905.

♻️ Proposed diff
```diff
-for policy, pretokenized in zip(self.policies, self.pretokenized_policies):
+for policy, pretokenized in zip(self.policies, self.pretokenized_policies, strict=True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/rag_service.py` at line 71, Replace the plain zip used when iterating over self.policies and self.pretokenized_policies with zip(..., strict=True) so mismatched lengths raise immediately; specifically, change the loop that currently reads "for policy, pretokenized in zip(self.policies, self.pretokenized_policies):" to "for policy, pretokenized in zip(self.policies, self.pretokenized_policies, strict=True):" (this enforces that self.policies and self.pretokenized_policies stay in sync, surfaces any pretokenization failures as an error, and satisfies Ruff B905).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.jules/bolt.md:
- Line 85: The playbook entry header "## 2026-05-16 - RAG Pre-tokenization
Bottleneck" is dated in the future; update that header to the actual
authoring/PR date (e.g., "## 2026-04-24 - RAG Pre-tokenization Bottleneck") so
the chronology matches other entries and retains the exact title and body of the
entry.
---
Outside diff comments:
In `@backend/rag_service.py`:
- Around line 30-47: Pre-tokenization can fail mid-loop leaving self.policies
and self.pretokenized_policies inconsistent, so load and preprocess into local
variables first and only assign to self.policies and self.pretokenized_policies
after all policies are successfully pre-tokenized; alternatively, on exception
ensure you rollback/clear both attributes before re-raising/logging.
Specifically, use a local list (e.g., temp_policies and temp_pretokenized) while
calling json.load and self._tokenize, and only set self.policies = temp_policies
and self.pretokenized_policies = temp_pretokenized after the loop completes;
also ensure the except block clears both self.policies and
self.pretokenized_policies to avoid partial state affecting retrieve (which uses
zip).
---
Nitpick comments:
In `@backend/rag_service.py`:
- Line 71: Replace the plain zip used when iterating over self.policies and
self.pretokenized_policies with zip(..., strict=True) so mismatched lengths
raise immediately; specifically, change the loop that currently reads "for
policy, pretokenized in zip(self.policies, self.pretokenized_policies):" to "for
policy, pretokenized in zip(self.policies, self.pretokenized_policies,
strict=True):" (this enforces that self.policies and self.pretokenized_policies
stay in sync, surfaces any pretokenization failures as an error, and satisfies
Ruff B905).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a24e1db-0145-4047-9736-fb018b3e6bd5
📒 Files selected for processing (2)
- `.jules/bolt.md`
- `backend/rag_service.py`
```markdown
**Learning:** Caching raw Python objects (like SQLAlchemy models or Pydantic instances) in a high-traffic API still incurs significant overhead because FastAPI/Pydantic must re-serialize the data on every request.

**Action:** Serialize data to a JSON string using `json.dumps()` BEFORE caching. On cache hits, return a raw `fastapi.Response(content=..., media_type="application/json")`. This bypasses the validation and serialization layer, resulting in significant performance gains (up to 50x in benchmarks).

## 2026-05-16 - RAG Pre-tokenization Bottleneck
```
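As an aside, the earlier playbook entry quoted above (serialize before caching, return a raw response on hits) can be sketched roughly as follows. This is a minimal illustration with an in-memory dict standing in for the real cache; the route and data are invented for the example:

```python
import json
from fastapi import FastAPI, Response

app = FastAPI()
_cache: dict[str, str] = {}  # stand-in for Redis or another shared cache

@app.get("/policies")
async def list_policies() -> Response:
    cached = _cache.get("policies")
    if cached is not None:
        # Cache hit: return the pre-serialized JSON string directly,
        # skipping Pydantic validation/serialization on this request.
        return Response(content=cached, media_type="application/json")
    data = [{"title": "Sample policy", "text": "..."}]  # normally loaded from the database
    body = json.dumps(data)  # serialize once, BEFORE caching
    _cache["policies"] = body
    return Response(content=body, media_type="application/json")
```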
Playbook entry dated in the future.
The new entry is dated 2026-05-16, but this PR was opened on 2026-04-24. Other entries use the date the learning was added, so consider correcting this to the actual authoring date to keep the playbook's chronology accurate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.jules/bolt.md at line 85, The playbook entry header "## 2026-05-16 - RAG
Pre-tokenization Bottleneck" is dated in the future; update that header to the
actual authoring/PR date (e.g., "## 2026-04-24 - RAG Pre-tokenization
Bottleneck") so the chronology matches other entries and retains the exact title
and body of the entry.
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/rag_service.py">
<violation number="1" location="backend/rag_service.py:71">
P2: Using `zip(self.policies, self.pretokenized_policies)` can silently skip valid policies when pretokenization list length is shorter than policies.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```python
for policy, pretokenized in zip(self.policies, self.pretokenized_policies):
    # Performance Boost: Use pre-calculated token sets
    policy_tokens = pretokenized["content_tokens"]
```
P2: Using zip(self.policies, self.pretokenized_policies) can silently skip valid policies when pretokenization list length is shorter than policies.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/rag_service.py, line 71:
<comment>Using `zip(self.policies, self.pretokenized_policies)` can silently skip valid policies when pretokenization list length is shorter than policies.</comment>
```
<file context>
@@ -54,10 +68,9 @@ def retrieve(self, query: str, threshold: float = 0.05) -> Optional[str]:
- # combine title and text for matching
- policy_content = f"{policy.get('title', '')} {policy.get('text', '')}"
- policy_tokens = self._tokenize(policy_content)
+ for policy, pretokenized in zip(self.policies, self.pretokenized_policies):
+     # Performance Boost: Use pre-calculated token sets
+     policy_tokens = pretokenized["content_tokens"]
</file context>
```
Suggested change:

```diff
-for policy, pretokenized in zip(self.policies, self.pretokenized_policies):
-    # Performance Boost: Use pre-calculated token sets
-    policy_tokens = pretokenized["content_tokens"]
+for idx, policy in enumerate(self.policies):
+    pretokenized = self.pretokenized_policies[idx] if idx < len(self.pretokenized_policies) else {
+        "content_tokens": self._tokenize(f"{policy.get('title', '')} {policy.get('text', '')}"),
+        "title_tokens": self._tokenize(policy.get('title', ''))
+    }
+    # Performance Boost: Use pre-calculated token sets
+    policy_tokens = pretokenized["content_tokens"]
```
This PR implements a performance optimization for the `CivicRAG` service by pre-tokenizing the policy corpus and pre-compiling the tokenizer's regular expression. These changes significantly reduce the computational overhead of the retrieval process, resulting in a ~4.8x speedup in retrieval latency as measured by benchmarks. All existing RAG tests and the full backend test suite pass successfully.

PR created automatically by Jules for task 6287993921712223033 started by @RohanExploit
**Summary by cubic**

Optimized `CivicRAG` retrieval by pre-tokenizing all policies and pre-compiling the tokenizer regex, removing per-request tokenization. Retrieval latency drops ~4.8x; `retrieve` now uses cached token sets computed at initialization.

Written for commit dc32172. Summary will update on new commits.
**Summary by CodeRabbit**

- Performance Improvements
- Documentation