
fix: share ML model instances to reduce startup time #229

Open
harsh-kumar-patwa wants to merge 1 commit into The-OpenROAD-Project:master from harsh-kumar-patwa:topic/fix-long-startup-shared-models

Conversation

@harsh-kumar-patwa

Summary

Fixes #88

The backend takes a long time to start because RetrieverTools.initialize() creates 6 retriever chains, and each one independently loads its own copy of the embedding model (thenlper/gte-large) and reranker model (BAAI/bge-reranker-base). That is 12 heavy model loads when only 2 are needed, since all 6 chains use the exact same model configuration.

This PR creates both models once at the top of initialize() and passes the shared instances down through HybridRetrieverChain → SimilarityRetrieverChain → FAISSVectorDatabase. Each chain still builds its own independent FAISS index with its own documents.

Changes

backend/src/agents/retriever_tools.py
Added a _create_embedding_model() factory method that creates the embedding model once based on config. In initialize(), both the embedding model and reranker model are created once and passed to all 6 chain constructors.

backend/src/vectorstores/faiss.py
Added an optional embedding_model parameter to FAISSVectorDatabase.__init__(). When provided, it skips creating a new model and uses the shared one directly.
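A minimal sketch of that constructor change, assuming a constructor shape along these lines (the real signature and loader may differ):

```python
class FAISSVectorDatabase:
    def __init__(self, embeddings_model_name=None, embedding_model=None):
        if embedding_model is not None:
            # Reuse the shared instance injected by the caller; no heavy load.
            self.embedding_model = embedding_model
        else:
            # Original path: each database loads its own copy of the model.
            self.embedding_model = self._load_model(embeddings_model_name)

    @staticmethod
    def _load_model(name):
        # Placeholder for the real HuggingFace embedding-model load.
        return {"model": name}


shared = {"model": "thenlper/gte-large"}
db_a = FAISSVectorDatabase(embedding_model=shared)
db_b = FAISSVectorDatabase(embedding_model=shared)
assert db_a.embedding_model is db_b.embedding_model  # one instance, two databases
```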

backend/src/chains/similarity_retriever_chain.py
Added an embedding_model parameter, which is passed through to FAISSVectorDatabase in create_vector_db().

backend/src/chains/hybrid_retriever_chain.py
Added embedding_model and reranker_model parameters. The embedding model is passed to SimilarityRetrieverChain, and create_hybrid_retriever() uses the shared reranker, falling back to creating a new one when none is provided.
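A sketch of that fallback logic, with _load_reranker standing in for the real cross-encoder load (names assumed for illustration, not taken from the diff):

```python
class HybridRetrieverChain:
    def __init__(self, embedding_model=None, reranker_model=None):
        # embedding_model is forwarded on to SimilarityRetrieverChain;
        # reranker_model is kept for create_hybrid_retriever() below.
        self.embedding_model = embedding_model
        self.reranker_model = reranker_model

    def create_hybrid_retriever(self):
        # Prefer the shared reranker; fall back to a fresh load so callers
        # that pass nothing keep working exactly as before.
        if self.reranker_model is not None:
            return self.reranker_model
        return self._load_reranker("BAAI/bge-reranker-base")

    @staticmethod
    def _load_reranker(name):
        # Placeholder for the real cross-encoder reranker load.
        return {"reranker": name}


shared = {"reranker": "BAAI/bge-reranker-base"}
assert HybridRetrieverChain(reranker_model=shared).create_hybrid_retriever() is shared
```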

Test files
Updated mocks in test_retriever_tools.py and test_similarity_retriever_chain.py to account for the new model creation path.

Why sharing is safe

Both models are stateless. The embedding model only runs encode() to produce vectors, and the reranker only runs score() to produce similarity scores. Neither modifies internal state during inference. Each FAISS database still maintains its own independent index. All new parameters default to None, so existing code that doesn't pass shared models continues to work exactly as before.
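The statelessness claim can be illustrated with a toy encoder whose output is a pure function of its input; it stands in for the real models' encode()/score(), which likewise read no mutable state during inference:

```python
class SharedEncoder:
    def encode(self, text):
        # Pure function of the input: no attributes are read or written,
        # so any number of chains can safely call one shared instance.
        return [float(ord(c)) for c in text]


encoder = SharedEncoder()
# Two "chains" sharing the instance see identical results, in any call order.
assert encoder.encode("timing") == encoder.encode("timing")
```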

Benchmark

Tested on local machine (Apple M3, thenlper/gte-large + BAAI/bge-reranker-base):

Metric                  Before   After
Embedding model loads   6        1
Reranker model loads    6        1
Model loading time      ~34s     ~7s
Speedup                          4.9x

Test plan

  • All 343 existing tests pass (uv run pytest tests/ -v)
  • make format passes
  • Full stack verified locally (PostgreSQL + Ollama + backend + frontend)
  • Chat endpoint returns correct RAG responses
  • Streaming endpoint works
  • Conversations persist in database
  • Startup logs show model loaded only once

Copilot AI review requested due to automatic review settings March 1, 2026 13:56
Contributor

Copilot AI left a comment


Pull request overview

This PR reduces backend startup time by creating the embeddings and reranker models once and sharing those instances across the 6 retriever chains, instead of loading identical models repeatedly per chain.

Changes:

  • Create a shared embedding model and shared cross-encoder reranker once in RetrieverTools.initialize() and pass them down into retriever chains.
  • Add optional embedding_model plumbing through HybridRetrieverChain → SimilarityRetrieverChain → FAISSVectorDatabase.
  • Update unit tests/mocks to reflect the new initialization path and constructor signature.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
backend/src/agents/retriever_tools.py Adds an embedding-model factory and shares embedding/reranker instances across all retriever chain instances.
backend/src/vectorstores/faiss.py Allows injecting a pre-created embeddings object to avoid redundant model instantiation.
backend/src/chains/similarity_retriever_chain.py Accepts embedding_model and forwards it into FAISSVectorDatabase.
backend/src/chains/hybrid_retriever_chain.py Accepts shared embedding/reranker instances and uses the shared reranker when contextual rerank is enabled.
backend/tests/test_retriever_tools.py Updates initialization tests to patch the new shared-model creation path.
backend/tests/test_similarity_retriever_chain.py Updates constructor assertion to include the new embedding_model argument.
Comments suppressed due to low confidence (1)

backend/src/chains/hybrid_retriever_chain.py:140

  • create_hybrid_retriever() can raise UnboundLocalError because ensemble_retriever is only assigned inside the (similarity_retriever and mmr_retriever and bm25_retriever) block, but it’s used unconditionally when building ContextualCompressionRetriever (and in the else). If processed_docs is empty (or BM25 isn’t created for any reason), ensemble_retriever (and even bm25_retriever) will be undefined. Initialize these locals to None and either build the ensemble with available retrievers or raise a clear error before referencing them.
            self.retriever = ContextualCompressionRetriever(
                base_compressor=compressor, base_retriever=ensemble_retriever
            )
        else:
            self.retriever = ensemble_retriever
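One way to address this review comment (a sketch, not code from the PR): initialize the locals to None up front and raise a clear error before either branch references ensemble_retriever. The tuples below stand in for the real EnsembleRetriever and ContextualCompressionRetriever objects:

```python
def create_hybrid_retriever(similarity_retriever, mmr_retriever, bm25_retriever,
                            use_reranking=False, compressor=None):
    ensemble_retriever = None
    if similarity_retriever and mmr_retriever and bm25_retriever:
        # Stand-in for EnsembleRetriever(...) over the three sub-retrievers.
        ensemble_retriever = ("ensemble", similarity_retriever,
                              mmr_retriever, bm25_retriever)
    if ensemble_retriever is None:
        # Fail with a clear message instead of an UnboundLocalError later.
        raise ValueError("hybrid retriever needs all three sub-retrievers")
    if use_reranking:
        # Stand-in for ContextualCompressionRetriever(compressor, ensemble).
        return ("compressed", compressor, ensemble_retriever)
    return ensemble_retriever
```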


The backend startup was slow because RetrieverTools.initialize() creates
6 retriever chains, and each one independently loaded its own copy of
the embedding model (thenlper/gte-large) and reranker model
(BAAI/bge-reranker-base). That meant 12 heavy model loads when only
2 are actually needed, since all chains use the same model config.

This fix creates both models once at the top of initialize() and passes
the shared instances down through HybridRetrieverChain,
SimilarityRetrieverChain, and FAISSVectorDatabase. Both models are
stateless (they only run encode/score inference) so sharing a single
instance across all chains is safe. Each chain still builds its own
independent FAISS index with its own documents.

Startup model loading goes from ~34s to ~7s on a local machine (4.9x).

Resolves The-OpenROAD-Project#88

Signed-off-by: Harsh Kumar <harshkumar3446@gmail.com>
@harsh-kumar-patwa harsh-kumar-patwa force-pushed the topic/fix-long-startup-shared-models branch from f41df8e to c85b110 Compare March 1, 2026 14:22


Development

Successfully merging this pull request may close these issues.

Long DB startup times
