
Adopt Turso-inspired performance patterns: batch inserts, lazy schema, and LRU cache #489

Merged: iberi22 merged 2 commits into main from perf-turso-patterns-18222215058586446253 on Jun 4, 2026

iberi22 (Owner) commented Jun 4, 2026

This change adopts several performance patterns inspired by Turso to improve Xavier's scalability and responsiveness.

Key improvements:

  • Batching: Added insert_symbols_batch, insert_chunks_batch, and insert_embeddings_batch to CodebaseDb. Each method wraps its batch in a single libSQL transaction. Rows are committed together rather than once per row, which cuts I/O overhead during initial repository indexing.
  • Lazy Initialization: Database schemas for both codebase and conversations are now created on-demand using OnceCell, preventing unnecessary table creation for inactive projects.
  • LRU Connection Pooling: ConnectionManager now uses moka::future::Cache to manage active database connections with a capacity limit and idle timeout, replacing the previous manual eviction logic.
  • Asynchronous I/O: The database layer is now unified on libsql for asynchronous operations, and heavy CPU-bound tasks are offloaded to spawn_blocking to avoid blocking the Tokio executor.
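The lazy-schema idea can be sketched without extra crates. The PR uses tokio::sync::OnceCell with an async schema step. This std-only stand-in uses std::sync::OnceLock instead, so it runs on its own. LazyDb, ensure_schema, and the run counter are hypothetical names used for illustration. The sketch shows two things: opening a handle runs no DDL, and the first write runs the schema step exactly once, even when several threads race to write.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a per-project database handle.
/// Std-only analogue of a tokio::sync::OnceCell-guarded schema.
pub struct LazyDb {
    schema_ready: OnceLock<()>,
    // Counts how many times the (hypothetical) schema DDL actually ran.
    schema_runs: AtomicUsize,
}

impl LazyDb {
    pub fn new() -> Self {
        // Opening a handle does no schema work at all.
        LazyDb { schema_ready: OnceLock::new(), schema_runs: AtomicUsize::new(0) }
    }

    fn ensure_schema(&self) {
        // get_or_init runs the closure at most once, even with concurrent callers.
        self.schema_ready.get_or_init(|| {
            // The real code would run its CREATE TABLE IF NOT EXISTS batch here.
            self.schema_runs.fetch_add(1, Ordering::SeqCst);
        });
    }

    pub fn insert(&self, _row: &str) {
        self.ensure_schema();
        // ... the actual write would happen here ...
    }

    pub fn schema_runs(&self) -> usize {
        self.schema_runs.load(Ordering::SeqCst)
    }
}
```

In async code, tokio::sync::OnceCell::get_or_try_init fills the role of get_or_init. It takes a closure that returns a future, and if initialization fails, the cell stays empty so the next call can retry.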

Verified with unit tests in src/codebase/db.rs, src/codebase/conversations_db.rs, and src/codebase/connection_manager.rs.

Fixes #427


PR created automatically by Jules for task 18222215058586446253 started by @iberi22

- Implement batch insertion for symbols, chunks, and embeddings in CodebaseDb using transactions.
- Add lazy schema initialization in CodebaseDb and ConversationsDb using tokio::sync::OnceCell.
- Refactor ConnectionManager to use moka::future::Cache for LRU connection pooling and libsql for async database connections.
- Offload heavy CPU-bound tasks (embedding serialization and FTS population) to tokio::task::spawn_blocking.
- Add unit tests for batch inserts and connection management.
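The embedding-serialization step listed above is pure CPU work. That is why it belongs in spawn_blocking rather than on a Tokio worker thread. Below is a minimal sketch that assumes a little-endian f32 blob layout; the PR's actual blob format is not shown here. embedding_to_blob, blob_to_embedding, and serialize_batch are hypothetical names.

```rust
/// Serializes an embedding into a byte blob, 4 bytes per f32, little-endian.
/// Assumption: the real blob layout may differ; this is for illustration only.
pub fn embedding_to_blob(embedding: &[f32]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(embedding.len() * 4);
    for value in embedding {
        blob.extend_from_slice(&value.to_le_bytes());
    }
    blob
}

/// Inverse of embedding_to_blob; returns None if the length is not a multiple of 4.
pub fn blob_to_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect(),
    )
}

/// Serializes a whole batch of (id, embedding) pairs into (id, blob) pairs.
/// This CPU-bound loop is the part worth moving off the async executor.
pub fn serialize_batch(batch: Vec<(String, Vec<f32>)>) -> Vec<(String, Vec<u8>)> {
    batch
        .into_iter()
        .map(|(id, emb)| {
            let blob = embedding_to_blob(&emb);
            (id, blob)
        })
        .collect()
}
```

In async code, the batch could be handed off with something like let rows = tokio::task::spawn_blocking(move || serialize_batch(batch)).await?; before the transaction inserts the blobs. The closure takes ownership of the batch, so nothing is borrowed across the thread boundary.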
google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

coderabbitai Bot commented Jun 4, 2026

Warning

Review limit reached

@iberi22, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 19 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8286d4cb-6955-4c79-98b7-54926101b3c3

📥 Commits

Reviewing files that changed from the base of the PR and between cb207d2 and c51620a.

📒 Files selected for processing (3)
  • src/codebase/connection_manager.rs
  • src/codebase/conversations_db.rs
  • src/codebase/db.rs

gemini-code-assist Bot left a comment

Code Review

This pull request migrates the ConnectionManager from r2d2 and DashMap to moka::future::Cache and libsql::Connection, making connection management fully asynchronous. It also introduces lazy schema initialization using OnceCell in both ConversationsDb and CodebaseDb, ensuring schemas are created on-demand before any database operations. Additionally, CodebaseDb is enhanced with batch insertion capabilities for symbols, chunks, and embeddings, along with FTS index population. Feedback on the changes highlights several performance and correctness improvements in CodebaseDb: using INSERT OR REPLACE instead of INSERT for batch embedding inserts to ensure idempotency, avoiding unnecessary string cloning in symbol and chunk batch inserts by using .as_str() and .as_deref(), and avoiding the anti-pattern of using spawn_blocking combined with block_on for the asynchronous execute_batch call in populate_fts.
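For readers unfamiliar with a cache-based connection pool: a capacity-bounded cache keeps at most N open connections. When a new project needs a slot, it evicts the least recently used one. The std-only sketch below models that with a plain LRU map. It is hypothetical and simplified in two ways. First, moka's default eviction policy is TinyLFU-based rather than strict LRU. Second, the idle timeout described in the PR is omitted here. In moka itself, the equivalent configuration is built with Cache::builder().max_capacity(..).time_to_idle(..).build().

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal capacity-bounded LRU map, standing in for a pool of per-project
/// connections keyed by project id. Hypothetical; not moka's implementation.
pub struct LruPool<V> {
    capacity: usize,
    map: HashMap<String, V>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl<V> LruPool<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        LruPool { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    // Moves `key` to the most-recently-used end (linear scan; fine for small pools).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    /// Returns the cached value, or opens a new one with `open` and caches it,
    /// evicting the least recently used entry when the pool is full.
    pub fn get_or_open(&mut self, key: &str, open: impl FnOnce() -> V) -> &V {
        if self.map.contains_key(key) {
            self.touch(key);
        } else {
            if self.map.len() >= self.capacity {
                if let Some(lru) = self.order.pop_front() {
                    self.map.remove(&lru);
                }
            }
            self.map.insert(key.to_string(), open());
            self.order.push_back(key.to_string());
        }
        &self.map[key]
    }

    pub fn contains(&self, key: &str) -> bool {
        self.map.contains_key(key)
    }
}
```

A cache hit never calls open, so a project that is used repeatedly keeps a single connection. A pool at capacity drops the coldest project instead of growing without bound.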

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/codebase/db.rs Outdated
Comment on lines +311 to +316
        for (id, blob) in embeddings_with_blobs {
            tx.execute(
                "INSERT INTO code_embeddings (id, embedding) VALUES (?1, ?2)",
                params![id, blob],
            ).await?;
        }

high

Using INSERT instead of INSERT OR REPLACE for batch inserting embeddings will cause unique constraint violations and abort the transaction if any chunk's embedding is updated or re-inserted. Using INSERT OR REPLACE ensures idempotency.

Suggested change

Current:
        for (id, blob) in embeddings_with_blobs {
            tx.execute(
                "INSERT INTO code_embeddings (id, embedding) VALUES (?1, ?2)",
                params![id, blob],
            ).await?;
        }
Suggested:
        for (id, blob) in embeddings_with_blobs {
            tx.execute(
                "INSERT OR REPLACE INTO code_embeddings (id, embedding) VALUES (?1, ?2)",
                params![id, blob],
            ).await?;
        }
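The idempotency point can be made concrete with a toy in-memory model of a table keyed on id. This is not a real SQL engine, and all names are hypothetical. With plain INSERT, a batch that re-inserts an existing id fails and is rolled back as a whole. The same batch with INSERT OR REPLACE succeeds and overwrites the old row.

```rust
use std::collections::HashMap;

/// Toy model of a table with a unique `id`, used only to contrast the two
/// statements above. Not a SQL engine.
#[derive(Default)]
pub struct EmbeddingTable {
    rows: HashMap<String, Vec<u8>>,
}

impl EmbeddingTable {
    /// Plain INSERT: a duplicate key is an error.
    pub fn insert(&mut self, id: &str, blob: Vec<u8>) -> Result<(), String> {
        if self.rows.contains_key(id) {
            return Err(format!("duplicate id {id}"));
        }
        self.rows.insert(id.to_string(), blob);
        Ok(())
    }

    /// INSERT OR REPLACE: a duplicate key overwrites the existing row.
    pub fn insert_or_replace(&mut self, id: &str, blob: Vec<u8>) {
        self.rows.insert(id.to_string(), blob);
    }

    /// All-or-nothing batch, mimicking a transaction rolled back on the first error.
    pub fn insert_batch(&mut self, batch: &[(&str, Vec<u8>)], replace: bool) -> Result<(), String> {
        let snapshot = self.rows.clone();
        for (id, blob) in batch {
            if replace {
                self.insert_or_replace(id, blob.clone());
            } else if let Err(e) = self.insert(id, blob.clone()) {
                self.rows = snapshot; // roll back every row written by this batch
                return Err(e);
            }
        }
        Ok(())
    }

    pub fn get(&self, id: &str) -> Option<&Vec<u8>> {
        self.rows.get(id)
    }
}
```

One caveat: SQLite implements REPLACE by deleting the conflicting row before inserting the new one. If other tables reference code_embeddings rows, an upsert (INSERT ... ON CONFLICT(id) DO UPDATE SET embedding = excluded.embedding) may be preferable.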

Comment thread src/codebase/db.rs Outdated
Comment on lines +259 to +270
        for s in symbols {
            tx.execute(
                "INSERT OR REPLACE INTO symbols
                 (id, name, kind, file_path, line_start, line_end, signature, visibility, doc_comment, language, module_path, complexity)
                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
                params![
                    s.id.clone(), s.name.clone(), s.kind.clone(), s.file_path.clone(),
                    s.line_start, s.line_end, s.signature.clone(), s.visibility.clone(),
                    s.doc_comment.clone(), s.language.clone(), s.module_path.clone(), s.complexity
                ],
            ).await?;
        }
medium

Cloning every string field (s.id.clone(), s.name.clone(), etc.) for every symbol in a batch creates a new String per field per row. libsql's params! macro also accepts borrowed values, so you can use .as_str() and .as_deref() to avoid these explicit clones. Note that libsql converts each bound parameter into an owned Value, so the text is still copied once when it is bound.

Suggested change

Current:
        for s in symbols {
            tx.execute(
                "INSERT OR REPLACE INTO symbols
                 (id, name, kind, file_path, line_start, line_end, signature, visibility, doc_comment, language, module_path, complexity)
                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
                params![
                    s.id.clone(), s.name.clone(), s.kind.clone(), s.file_path.clone(),
                    s.line_start, s.line_end, s.signature.clone(), s.visibility.clone(),
                    s.doc_comment.clone(), s.language.clone(), s.module_path.clone(), s.complexity
                ],
            ).await?;
        }
Suggested:
        for s in symbols {
            tx.execute(
                "INSERT OR REPLACE INTO symbols
                 (id, name, kind, file_path, line_start, line_end, signature, visibility, doc_comment, language, module_path, complexity)
                 VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
                params![
                    s.id.as_str(), s.name.as_str(), s.kind.as_str(), s.file_path.as_str(),
                    s.line_start, s.line_end, s.signature.as_deref(), s.visibility.as_deref(),
                    s.doc_comment.as_deref(), s.language.as_str(), s.module_path.as_deref(), s.complexity
                ],
            ).await?;
        }

Comment thread src/codebase/db.rs Outdated
Comment on lines +279 to +285
        for c in chunks {
            tx.execute(
                "INSERT OR REPLACE INTO code_chunks (id, path, content, language, symbol_id, tokens)
                 VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
                params![c.id.clone(), c.path.clone(), c.content.clone(), c.language.clone(), c.symbol_id.clone(), c.tokens],
            ).await?;
        }
medium

Cloning c.content and the other strings for every chunk in a batch is wasteful, especially since code chunks can hold very large file contents. Passing references with .as_str() and .as_deref() avoids these explicit clones.

Suggested change

Current:
        for c in chunks {
            tx.execute(
                "INSERT OR REPLACE INTO code_chunks (id, path, content, language, symbol_id, tokens)
                 VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
                params![c.id.clone(), c.path.clone(), c.content.clone(), c.language.clone(), c.symbol_id.clone(), c.tokens],
            ).await?;
        }
Suggested:
        for c in chunks {
            tx.execute(
                "INSERT OR REPLACE INTO code_chunks (id, path, content, language, symbol_id, tokens)
                 VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
                params![c.id.as_str(), c.path.as_str(), c.content.as_str(), c.language.as_deref(), c.symbol_id.as_deref(), c.tokens],
            ).await?;
        }
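The two suggestions above rely on two borrowing conversions. as_str() borrows a &str from a String, and as_deref() turns &Option<String> into Option<&str>, so building the parameter list copies no text. The CodeChunk fields mirror the review comment, but the struct itself and the i64 type for tokens are assumptions. Whether the driver copies the text again when binding depends on its parameter type.

```rust
/// Hypothetical subset of a code-chunk row; field names mirror the review comment,
/// the i64 type for `tokens` is an assumption.
pub struct CodeChunk {
    pub id: String,
    pub path: String,
    pub content: String,
    pub language: Option<String>,
    pub symbol_id: Option<String>,
    pub tokens: i64,
}

/// Borrowed view of one row: no String is cloned to build it.
pub fn chunk_params(c: &CodeChunk) -> (&str, &str, &str, Option<&str>, Option<&str>, i64) {
    (
        c.id.as_str(),
        c.path.as_str(),
        c.content.as_str(),      // borrows the (possibly large) content in place
        c.language.as_deref(),   // Option<String> -> Option<&str>
        c.symbol_id.as_deref(),
        c.tokens,
    )
}
```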

Comment thread src/codebase/db.rs Outdated
Comment on lines +512 to +522
        let conn = self.conn.clone();
        tokio::task::spawn_blocking(move || {
            let rt = tokio::runtime::Handle::current();
            rt.block_on(async {
                conn.execute_batch(&populate_fts_from_chunks_sql())
                    .await
                    .context("failed to populate FTS index")?;
                Ok::<(), anyhow::Error>(())
            })
        }).await.context("spawn_blocking for FTS population failed")??;
        Ok(())
medium

Spawning a blocking task with spawn_blocking only to immediately block on an asynchronous operation (conn.execute_batch) using block_on is an anti-pattern. It wastes a thread pool thread and adds unnecessary overhead. Since execute_batch is already asynchronous, you can directly .await it in the main async context.

        self.conn.execute_batch(&populate_fts_from_chunks_sql())
            .await
            .context("failed to populate FTS index")?;
        Ok(())

- Resolved conflicts in connection_manager.rs, conversations_db.rs, db.rs
- Preserved PR #489's batch inserts (insert_symbols_batch, insert_chunks_batch, insert_embeddings_batch)
- Preserved lazy schema initialization via OnceCell
- Adapted all methods to use ConnectionManager::with_conn (r2d2) instead of libsql async
- Updated tests to use ConnectionManager API
iberi22 merged commit dc70923 into main on Jun 4, 2026
1 of 2 checks passed

Labels

jules Assigned to Google Jules

Development

Successfully merging this pull request may close these issues.

[perf] Adopt Turso patterns: batch inserts, lazy schema, LRU cache, spawn_blocking
