Adopt Turso-inspired performance patterns: batch inserts, lazy schema, and LRU cache by iberi22 · Pull Request #489 · iberi22/xavier

iberi22 · 2026-06-04T07:39:42Z

This change adopts several performance patterns inspired by Turso to improve Xavier's scalability and responsiveness.

Key improvements:

Batching: Added insert_symbols_batch, insert_chunks_batch, and insert_embeddings_batch to CodebaseDb. These methods use libSQL transactions to drastically reduce I/O overhead during initial repository indexing.
Lazy Initialization: Database schemas for both codebase and conversations are now created on-demand using OnceCell, preventing unnecessary table creation for inactive projects.
LRU Connection Pooling: ConnectionManager now uses moka::future::Cache to manage active database connections with a capacity limit and idle timeout, replacing the previous manual eviction logic.
Asynchronous I/O: The database layer is now unified on libsql for asynchronous operations, and heavy CPU-bound tasks are offloaded to spawn_blocking to avoid blocking the Tokio executor.
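
The lazy-initialization pattern described above can be illustrated with a minimal, standard-library-only sketch. It uses `std::sync::OnceLock` as a synchronous stand-in for the `tokio::sync::OnceCell` the PR mentions, and `LazyDb`, `ensure_schema`, and the run counter are hypothetical names invented for this illustration, not Xavier's actual code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a database handle; not Xavier's `CodebaseDb`.
struct LazyDb {
    schema_ready: OnceLock<()>,
    // Counts how many times the (simulated) schema creation step ran.
    schema_runs: AtomicUsize,
}

impl LazyDb {
    fn new() -> Self {
        Self {
            schema_ready: OnceLock::new(),
            schema_runs: AtomicUsize::new(0),
        }
    }

    /// Runs the schema step on first use only; later calls do nothing.
    fn ensure_schema(&self) {
        self.schema_ready.get_or_init(|| {
            // A real implementation would execute CREATE TABLE statements here.
            self.schema_runs.fetch_add(1, Ordering::SeqCst);
        });
    }

    /// Every operation calls `ensure_schema` before touching tables.
    fn insert(&self, _row: &str) {
        self.ensure_schema();
    }

    fn schema_runs(&self) -> usize {
        self.schema_runs.load(Ordering::SeqCst)
    }
}
```

Constructing a `LazyDb` performs no schema work; the first `insert` triggers it once, and subsequent calls skip it, which is the behavior the description attributes to inactive projects never creating tables.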

Verified with unit tests in src/codebase/db.rs, src/codebase/conversations_db.rs, and src/codebase/connection_manager.rs.

Fixes #427

PR created automatically by Jules for task 18222215058586446253 started by @iberi22

- Implement batch insertion for symbols, chunks, and embeddings in CodebaseDb using transactions.
- Add lazy schema initialization in CodebaseDb and ConversationsDb using tokio::sync::OnceCell.
- Refactor ConnectionManager to use moka::future::Cache for LRU connection pooling and libsql for async database connections.
- Offload heavy CPU-bound tasks (embedding serialization and FTS population) to tokio::task::spawn_blocking.
- Add unit tests for batch inserts and connection management.
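
To make the capacity-limited LRU idea concrete, here is a toy cache written against the standard library only. It is not moka and does not model moka's idle timeout or concurrency; `LruCache`, `get_or_insert_with`, and the string keys standing in for project ids are assumptions made for this sketch.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy LRU cache keyed by project id; `C` stands in for a connection type.
struct LruCache<C> {
    capacity: usize,
    entries: HashMap<String, C>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl<C> LruCache<C> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Moves `key` to the most-recently-used position.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }

    /// Returns the cached value, creating it with `open` on a miss and
    /// evicting the least recently used entry when the cache is full.
    fn get_or_insert_with(&mut self, key: &str, open: impl FnOnce() -> C) -> &C {
        if !self.entries.contains_key(key) {
            if self.entries.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            self.entries.insert(key.to_string(), open());
        }
        self.touch(key);
        &self.entries[key]
    }

    fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }
}
```

The linear scan in `touch` keeps the sketch short; a production cache would track recency in constant time, which is one reason to delegate to a library rather than hand-roll eviction.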

google-labs-jules · 2026-06-04T07:39:44Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-06-04T07:39:51Z

Warning

Review limit reached

@iberi22, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 19 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8286d4cb-6955-4c79-98b7-54926101b3c3

📥 Commits

Reviewing files that changed from the base of the PR and between cb207d2 and c51620a.

📒 Files selected for processing (3)

src/codebase/connection_manager.rs
src/codebase/conversations_db.rs
src/codebase/db.rs


gemini-code-assist

Code Review

This pull request migrates the ConnectionManager from r2d2 and DashMap to moka::future::Cache and libsql::Connection, making connection management fully asynchronous. It also introduces lazy schema initialization using OnceCell in both ConversationsDb and CodebaseDb, ensuring schemas are created on-demand before any database operations. Additionally, CodebaseDb is enhanced with batch insertion capabilities for symbols, chunks, and embeddings, along with FTS index population. Feedback on the changes highlights several performance and correctness improvements in CodebaseDb: using INSERT OR REPLACE instead of INSERT for batch embedding inserts to ensure idempotency, avoiding unnecessary string cloning in symbol and chunk batch inserts by using .as_str() and .as_deref(), and avoiding the anti-pattern of using spawn_blocking combined with block_on for the asynchronous execute_batch call in populate_fts.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-04T07:41:54Z

+        for (id, blob) in embeddings_with_blobs {
+            tx.execute(
+                "INSERT INTO code_embeddings (id, embedding) VALUES (?1, ?2)",
+                params![id, blob],
+            ).await?;
+        }


Using INSERT instead of INSERT OR REPLACE for batch inserting embeddings will cause unique constraint violations and abort the transaction if any chunk's embedding is updated or re-inserted. Using INSERT OR REPLACE ensures idempotency.

Suggested change

-        for (id, blob) in embeddings_with_blobs {
-            tx.execute(
-                "INSERT INTO code_embeddings (id, embedding) VALUES (?1, ?2)",
-                params![id, blob],
-            ).await?;
-        }
+        for (id, blob) in embeddings_with_blobs {
+            tx.execute(
+                "INSERT OR REPLACE INTO code_embeddings (id, embedding) VALUES (?1, ?2)",
+                params![id, blob],
+            ).await?;
+        }
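
The difference between the two statements can be sketched with a toy in-memory model of a table that has a unique `id` column. This is only an illustration of the semantics the review describes, not libSQL code: `EmbeddingTable`, `insert_batch`, and `upsert_batch` are hypothetical names, and the staging copy merely imitates a transaction that rolls back on error.

```rust
use std::collections::HashMap;

/// Toy table with a unique `id` column mapped to an embedding blob.
#[derive(Default)]
struct EmbeddingTable {
    rows: HashMap<String, Vec<u8>>,
}

impl EmbeddingTable {
    /// Models a transaction of plain INSERTs: any duplicate id fails the
    /// whole batch and leaves the table unchanged.
    fn insert_batch(&mut self, batch: &[(&str, Vec<u8>)]) -> Result<(), String> {
        let mut staged = self.rows.clone();
        for (id, blob) in batch {
            if staged.contains_key(*id) {
                return Err(format!("UNIQUE constraint failed: {id}"));
            }
            staged.insert(id.to_string(), blob.clone());
        }
        self.rows = staged; // commit only when every row succeeded
        Ok(())
    }

    /// Models INSERT OR REPLACE: a row with the same id is overwritten.
    fn upsert_batch(&mut self, batch: &[(&str, Vec<u8>)]) {
        for (id, blob) in batch {
            self.rows.insert(id.to_string(), blob.clone());
        }
    }
}
```

Re-running an indexing pass that includes an already-stored id makes `insert_batch` fail and discard the entire batch, while `upsert_batch` can be repeated safely, which is the idempotency the reviewer is asking for.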

gemini-code-assist · 2026-06-04T07:41:54Z

+        for s in symbols {
+            tx.execute(
+                "INSERT OR REPLACE INTO symbols
+                 (id, name, kind, file_path, line_start, line_end, signature, visibility, doc_comment, language, module_path, complexity)
+                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
+                params![
+                    s.id.clone(), s.name.clone(), s.kind.clone(), s.file_path.clone(),
+                    s.line_start, s.line_end, s.signature.clone(), s.visibility.clone(),
+                    s.doc_comment.clone(), s.language.clone(), s.module_path.clone(), s.complexity
+                ],
+            ).await?;
+        }


Cloning all strings (s.id.clone(), s.name.clone(), etc.) inside the loop for every symbol in a batch is inefficient and allocates memory unnecessarily. Since libsql's params! macro accepts references, you can use .as_str() and .as_deref() to avoid these allocations.

Suggested change

-        for s in symbols {
-            tx.execute(
-                "INSERT OR REPLACE INTO symbols
-                 (id, name, kind, file_path, line_start, line_end, signature, visibility, doc_comment, language, module_path, complexity)
-                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
-                params![
-                    s.id.clone(), s.name.clone(), s.kind.clone(), s.file_path.clone(),
-                    s.line_start, s.line_end, s.signature.clone(), s.visibility.clone(),
-                    s.doc_comment.clone(), s.language.clone(), s.module_path.clone(), s.complexity
-                ],
-            ).await?;
-        }
+        for s in symbols {
+            tx.execute(
+                "INSERT OR REPLACE INTO symbols
+                 (id, name, kind, file_path, line_start, line_end, signature, visibility, doc_comment, language, module_path, complexity)
+                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
+                params![
+                    s.id.as_str(), s.name.as_str(), s.kind.as_str(), s.file_path.as_str(),
+                    s.line_start, s.line_end, s.signature.as_deref(), s.visibility.as_deref(),
+                    s.doc_comment.as_deref(), s.language.as_str(), s.module_path.as_deref(), s.complexity
+                ],
+            ).await?;
+        }
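
For readers unfamiliar with the two borrowing helpers named in the suggestion, the following standard-library-only sketch shows what they return. The `Symbol` struct here is a hypothetical two-field subset with field names taken from the diff above; whether a given parameter macro accepts these borrowed types is the reviewer's claim and is not demonstrated here.

```rust
/// Hypothetical subset of a symbol record; not the full struct from the PR.
struct Symbol {
    name: String,
    signature: Option<String>,
}

/// Borrows the fields as string slices instead of cloning them.
/// `String::as_str` yields `&str`; `Option<String>::as_deref` yields `Option<&str>`.
fn borrowed_params(s: &Symbol) -> (&str, Option<&str>) {
    (s.name.as_str(), s.signature.as_deref())
}
```

Both results point into the original `String` buffers, so no new heap allocation happens per row, whereas `clone()` copies every string on each loop iteration.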

gemini-code-assist · 2026-06-04T07:41:54Z

+        for c in chunks {
+            tx.execute(
+                "INSERT OR REPLACE INTO code_chunks (id, path, content, language, symbol_id, tokens)
+                 VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
+                params![c.id.clone(), c.path.clone(), c.content.clone(), c.language.clone(), c.symbol_id.clone(), c.tokens],
+            ).await?;
+        }


Cloning c.content and other strings inside the loop for every chunk in a batch is highly inefficient, especially since code chunks can contain very large file contents. Passing references using .as_str() and .as_deref() avoids these expensive allocations.

Suggested change

-        for c in chunks {
-            tx.execute(
-                "INSERT OR REPLACE INTO code_chunks (id, path, content, language, symbol_id, tokens)
-                 VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
-                params![c.id.clone(), c.path.clone(), c.content.clone(), c.language.clone(), c.symbol_id.clone(), c.tokens],
-            ).await?;
-        }
+        for c in chunks {
+            tx.execute(
+                "INSERT OR REPLACE INTO code_chunks (id, path, content, language, symbol_id, tokens)
+                 VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
+                params![c.id.as_str(), c.path.as_str(), c.content.as_str(), c.language.as_deref(), c.symbol_id.as_deref(), c.tokens],
+            ).await?;
+        }

gemini-code-assist · 2026-06-04T07:41:54Z

+        let conn = self.conn.clone();
+        tokio::task::spawn_blocking(move || {
+            let rt = tokio::runtime::Handle::current();
+            rt.block_on(async {
+                conn.execute_batch(&populate_fts_from_chunks_sql())
+                    .await
+                    .context("failed to populate FTS index")?;
+                Ok::<(), anyhow::Error>(())
+            })
+        }).await.context("spawn_blocking for FTS population failed")??;
+        Ok(())


Spawning a blocking task with spawn_blocking only to immediately block on an asynchronous operation (conn.execute_batch) using block_on is an anti-pattern. It wastes a thread pool thread and adds unnecessary overhead. Since execute_batch is already asynchronous, you can directly .await it in the main async context.

        self.conn.execute_batch(&populate_fts_from_chunks_sql())
            .await
            .context("failed to populate FTS index")?;
        Ok(())

- Resolved conflicts in connection_manager.rs, conversations_db.rs, db.rs
- Preserved PR #489's batch inserts (insert_symbols_batch, insert_chunks_batch, insert_embeddings_batch)
- Preserved lazy schema initialization via OnceCell
- Adapted all methods to use ConnectionManager::with_conn (r2d2) instead of libsql async
- Updated tests to use ConnectionManager API

google-labs-jules Bot mentioned this pull request Jun 4, 2026

[perf] Adoptar patrones de Turso: batch inserts, lazy schema, LRU cache, spawn_blocking #427

Closed

gemini-code-assist Bot reviewed Jun 4, 2026

iberi22 added and removed the jules label ("Assigned to Google Jules") Jun 4, 2026

iberi22 marked this pull request as ready for review June 4, 2026 13:41

iberi22 merged commit dc70923 into main Jun 4, 2026
1 of 2 checks passed

This was referenced Jun 4, 2026

[security] Sanitize project_id in conversations_db.rs to prevent path traversal #494

Closed

Adopt Turso-inspired performance patterns for Xavier database #484

Closed

iberi22 merged 2 commits into main from perf-turso-patterns-18222215058586446253
