
fix: share ML model instances to reduce startup time #229

Open
harsh-kumar-patwa wants to merge 1 commit into The-OpenROAD-Project:master from harsh-kumar-patwa:topic/fix-long-startup-shared-models

Conversation

@harsh-kumar-patwa

Summary

Fixes #88

The backend takes a long time to start because RetrieverTools.initialize() creates 6 retriever chains, and each one independently loads its own copy of the embedding model (thenlper/gte-large) and reranker model (BAAI/bge-reranker-base). That is 12 heavy model loads when only 2 are needed, since all 6 chains use the exact same model configuration.

This PR creates both models once at the top of initialize() and passes the shared instances down through HybridRetrieverChain → SimilarityRetrieverChain → FAISSVectorDatabase. Each chain still builds its own independent FAISS index with its own documents.

Changes

backend/src/agents/retriever_tools.py
Added a _create_embedding_model() factory method that creates the embedding model once based on config. In initialize(), both the embedding model and reranker model are created once and passed to all 6 chain constructors.

backend/src/vectorstores/faiss.py
Added an optional embedding_model parameter to FAISSVectorDatabase.__init__(). When provided, it skips creating a new model and uses the shared one directly.
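A minimal sketch of that constructor change, assuming a constructor shape along these lines (the real signature and loader may differ):

```python
class FAISSVectorDatabase:
    def __init__(self, embeddings_model_name=None, embedding_model=None):
        if embedding_model is not None:
            # Reuse the shared instance injected by the caller; no heavy load.
            self.embedding_model = embedding_model
        else:
            # Original path: each database loads its own copy of the model.
            self.embedding_model = self._load_model(embeddings_model_name)

    @staticmethod
    def _load_model(name):
        # Placeholder for the real HuggingFace embedding-model load.
        return {"model": name}


shared = {"model": "thenlper/gte-large"}
db_a = FAISSVectorDatabase(embedding_model=shared)
db_b = FAISSVectorDatabase(embedding_model=shared)
assert db_a.embedding_model is db_b.embedding_model  # one instance, two databases
```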

backend/src/chains/similarity_retriever_chain.py
Added an embedding_model parameter, which is passed through to FAISSVectorDatabase in create_vector_db().

backend/src/chains/hybrid_retriever_chain.py
Added embedding_model and reranker_model parameters. The embedding model is passed to SimilarityRetrieverChain, and create_hybrid_retriever() uses the shared reranker, falling back to creating a new one when none is provided.
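A sketch of that fallback logic, with _load_reranker standing in for the real cross-encoder load (names assumed for illustration, not taken from the diff):

```python
class HybridRetrieverChain:
    def __init__(self, embedding_model=None, reranker_model=None):
        # embedding_model is forwarded on to SimilarityRetrieverChain;
        # reranker_model is kept for create_hybrid_retriever() below.
        self.embedding_model = embedding_model
        self.reranker_model = reranker_model

    def create_hybrid_retriever(self):
        # Prefer the shared reranker; fall back to a fresh load so callers
        # that pass nothing keep working exactly as before.
        if self.reranker_model is not None:
            return self.reranker_model
        return self._load_reranker("BAAI/bge-reranker-base")

    @staticmethod
    def _load_reranker(name):
        # Placeholder for the real cross-encoder reranker load.
        return {"reranker": name}


shared = {"reranker": "BAAI/bge-reranker-base"}
assert HybridRetrieverChain(reranker_model=shared).create_hybrid_retriever() is shared
```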

Test files
Updated mocks in test_retriever_tools.py and test_similarity_retriever_chain.py to account for the new model creation path.

Why sharing is safe

Both models are stateless. The embedding model only runs encode() to produce vectors, and the reranker only runs score() to produce similarity scores. Neither modifies internal state during inference. Each FAISS database still maintains its own independent index. All new parameters default to None, so existing code that doesn't pass shared models continues to work exactly as before.
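The statelessness claim can be illustrated with a toy encoder whose output is a pure function of its input; it stands in for the real models' encode()/score(), which likewise read no mutable state during inference:

```python
class SharedEncoder:
    def encode(self, text):
        # Pure function of the input: no attributes are read or written,
        # so any number of chains can safely call one shared instance.
        return [float(ord(c)) for c in text]


encoder = SharedEncoder()
# Two "chains" sharing the instance see identical results, in any call order.
assert encoder.encode("timing") == encoder.encode("timing")
```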

Benchmark

Tested on local machine (Apple M3, thenlper/gte-large + BAAI/bge-reranker-base):

Metric                  Before   After
Embedding model loads   6        1
Reranker model loads    6        1
Model loading time      ~34s     ~7s
Speedup                          4.9x

Test plan

  • All 343 existing tests pass (uv run pytest tests/ -v)
  • make format passes
  • Full stack verified locally (PostgreSQL + Ollama + backend + frontend)
  • Chat endpoint returns correct RAG responses
  • Streaming endpoint works
  • Conversations persist in database
  • Startup logs show model loaded only once

Copilot AI review requested due to automatic review settings March 1, 2026 13:56
Contributor

Copilot AI left a comment


Pull request overview

This PR reduces backend startup time by creating the embeddings and reranker models once and sharing those instances across the 6 retriever chains, instead of loading identical models repeatedly per chain.

Changes:

  • Create a shared embedding model and shared cross-encoder reranker once in RetrieverTools.initialize() and pass them down into retriever chains.
  • Add optional embedding_model plumbing through HybridRetrieverChain → SimilarityRetrieverChain → FAISSVectorDatabase.
  • Update unit tests/mocks to reflect the new initialization path and constructor signature.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
backend/src/agents/retriever_tools.py Adds an embedding-model factory and shares embedding/reranker instances across all retriever chain instances.
backend/src/vectorstores/faiss.py Allows injecting a pre-created embeddings object to avoid redundant model instantiation.
backend/src/chains/similarity_retriever_chain.py Accepts embedding_model and forwards it into FAISSVectorDatabase.
backend/src/chains/hybrid_retriever_chain.py Accepts shared embedding/reranker instances and uses the shared reranker when contextual rerank is enabled.
backend/tests/test_retriever_tools.py Updates initialization tests to patch the new shared-model creation path.
backend/tests/test_similarity_retriever_chain.py Updates constructor assertion to include the new embedding_model argument.
Comments suppressed due to low confidence (1)

backend/src/chains/hybrid_retriever_chain.py:140

  • create_hybrid_retriever() can raise UnboundLocalError because ensemble_retriever is only assigned inside the (similarity_retriever and mmr_retriever and bm25_retriever) block, but it’s used unconditionally when building ContextualCompressionRetriever (and in the else). If processed_docs is empty (or BM25 isn’t created for any reason), ensemble_retriever (and even bm25_retriever) will be undefined. Initialize these locals to None and either build the ensemble with available retrievers or raise a clear error before referencing them.
            self.retriever = ContextualCompressionRetriever(
                base_compressor=compressor, base_retriever=ensemble_retriever
            )
        else:
            self.retriever = ensemble_retriever
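One way to address this review comment (a sketch, not code from the PR): initialize the locals to None up front and raise a clear error before either branch references ensemble_retriever. The tuples below stand in for the real EnsembleRetriever and ContextualCompressionRetriever objects:

```python
def create_hybrid_retriever(similarity_retriever, mmr_retriever, bm25_retriever,
                            use_reranking=False, compressor=None):
    ensemble_retriever = None
    if similarity_retriever and mmr_retriever and bm25_retriever:
        # Stand-in for EnsembleRetriever(...) over the three sub-retrievers.
        ensemble_retriever = ("ensemble", similarity_retriever,
                              mmr_retriever, bm25_retriever)
    if ensemble_retriever is None:
        # Fail with a clear message instead of an UnboundLocalError later.
        raise ValueError("hybrid retriever needs all three sub-retrievers")
    if use_reranking:
        # Stand-in for ContextualCompressionRetriever(compressor, ensemble).
        return ("compressed", compressor, ensemble_retriever)
    return ensemble_retriever
```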


The backend startup was slow because RetrieverTools.initialize() creates
6 retriever chains, and each one independently loaded its own copy of
the embedding model (thenlper/gte-large) and reranker model
(BAAI/bge-reranker-base). That meant 12 heavy model loads when only
2 are actually needed, since all chains use the same model config.

This fix creates both models once at the top of initialize() and passes
the shared instances down through HybridRetrieverChain,
SimilarityRetrieverChain, and FAISSVectorDatabase. Both models are
stateless (they only run encode/score inference) so sharing a single
instance across all chains is safe. Each chain still builds its own
independent FAISS index with its own documents.

Startup model loading goes from ~34s to ~7s on a local machine (4.9x).

Resolves The-OpenROAD-Project#88

Signed-off-by: Harsh Kumar <harshkumar3446@gmail.com>
@harsh-kumar-patwa harsh-kumar-patwa force-pushed the topic/fix-long-startup-shared-models branch from f41df8e to c85b110 Compare March 1, 2026 14:22


Development

Successfully merging this pull request may close these issues.

Long DB startup times
