Auto-build FAISS index on startup when absent #1
Draft
Copilot wants to merge 2 commits into
Copilot (AI) changed the title from "[WIP] Update app to build FAISS index automatically on startup" to "Auto-build FAISS index on startup when absent" on Jun 2, 2026
Fresh deployments (e.g. Streamlit Cloud) only have committed repo files; `data/faiss_index/` is gitignored, so the app immediately hits `st.error("Index not found…")` / `st.stop()` on every cold start.
Changes
ingest.py
- `build_index(data_dir, index_dir)`: importable by the app; CLI usage (`python ingest.py`) is unchanged.
- Paths are resolved relative to `__file__` rather than the CWD.
app/main.py
- Removed the hard `st.error` / `st.stop()` guard.
- Paths are anchored to `Path(__file__).parent.parent`, which is correct regardless of Streamlit's working directory.
- The repo root is added to `sys.path` so `ingest` is importable from the `app/` subdirectory.
- `load_vectorstore()` now auto-calls `build_index` inside a spinner when the index is missing, and surfaces an actionable error message if the build fails.
README.md (new)
Original prompt
Automatically fix the Streamlit deployment failure in `riiseup08/RAG_Streamlit` where the app shows `Index not found. Run python ingest.py first.` on startup.
Context:
- `app/main.py` currently expects `data/faiss_index` to already exist and stops with an error if it does not.
- The repo contains PDFs in `data/` and an `ingest.py` script that can build the FAISS index from those PDFs.
- On a fresh deployment, generated files such as `data/faiss_index` are not present unless committed, so startup fails.
Goal:
- Make the app start without a separate manual `python ingest.py` step.
Requirements:
- If `data/faiss_index` is missing, the app builds the index automatically from the existing PDFs instead of stopping with an error.
Deliverable:
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: "Index not found. Run python ingest.py first." msg that i get from Streamlit
Assistant: The direct cause is that your Streamlit app looks for a local folder at `data/faiss_index`, and that folder does not exist when the app starts.
In this repo, `app/main.py` checks for that directory, and `ingest.py` is what creates it.
What this means
Your repo already contains the source PDFs in `data/`:
- `data/policy_on_results_EN.pdf`
- `data/policy_on_results_FR.pdf`
But it does not contain the generated FAISS index folder `data/faiss_index`. So when Streamlit runs, it fails the existence check and shows that message.
Why this happens on Streamlit
On Streamlit Cloud or a fresh environment, generated files such as `data/faiss_index` are usually not there unless they are committed to the repo or built when the app starts.
Right now your app expects the index to already exist, but your deployment does not create it automatically.
Fastest fixes
Option 1: Run ingest locally, then commit the generated index
Run `python ingest.py`.
That should create:
- `data/faiss_index/index.faiss`
- `data/faiss_index/index.pkl`
Then commit and push that folder.
This is the quickest fix, though committing generated vector indexes is not always ideal.
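Before committing under Option 1, it may help to confirm that both files listed above were actually written. The following is a minimal sketch, not part of the repo: `missing_index_files` and `EXPECTED_FILES` are hypothetical names, and the two file names are taken from the list above.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(index_dir: Path) -> list[str]:
    """Return the expected index files that are absent from index_dir."""
    return [name for name in EXPECTED_FILES if not (index_dir / name).is_file()]
```

Calling `missing_index_files(Path("data/faiss_index"))` after running the ingest step returns an empty list when both files are present, and the names of the missing files otherwise.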
Option 2: Generate the index automatically if missing
A better app behavior is:
- if `data/faiss_index` exists, load it
- otherwise, build it from the PDFs in `data/` and then load it
That avoids needing a separate manual `python ingest.py` step in deployment.
Root cause
The violated assumption is: `app/main.py` assumes the FAISS index already exists in the deployed filesystem.
That assumption is not enforced anywhere in deployment, and the repo currently only includes the PDFs, not the built index.
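The Option 2 behavior, which removes this assumption, could be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in the PR: `ensure_index` is a hypothetical helper, and the `build` argument stands in for a builder with the `(data_dir, index_dir)` signature, such as `build_index` in `ingest.py`.

```python
from pathlib import Path
from typing import Callable


def ensure_index(
    data_dir: Path,
    index_dir: Path,
    build: Callable[[Path, Path], None],
) -> bool:
    """Build the index only when index_dir is absent.

    Returns True if a build ran, False if an existing index was found.
    """
    if index_dir.exists():
        return False
    try:
        # `build` is assumed to read the PDFs in data_dir and write to index_dir.
        build(data_dir, index_dir)
    except Exception as exc:
        # Re-raise with enough context for the app to show an actionable message.
        raise RuntimeError(
            f"Could not build index in {index_dir} from {data_dir}: {exc}"
        ) from exc
    return True
```

In a Streamlit app, the call would typically sit inside `with st.spinner(...)`, with the `RuntimeError` caught and passed to `st.error`, so a failed build produces a readable message instead of a stack trace.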
Confidence
High. The code explicitly checks for `data/faiss_index`, and the repository contents you shared show the PDFs in `data/` but no `faiss_index` directory.
If you want, I can do one of these next
- `data/faiss_index`
- `riiseup08/RAG_Streamlit` that fixes this automatically
User: create a...
This pull request was created from Copilot chat.
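The path handling described in the PR (anchoring on `Path(__file__).parent.parent` and extending `sys.path`) can be illustrated with a small sketch. The helper names `project_paths` and `make_importable` are hypothetical, and the layout (`app/main.py` one level below the repo root, with `data/` at the root) is assumed from the description above.

```python
import sys
from pathlib import Path


def project_paths(main_file: str) -> tuple[Path, Path, Path]:
    """Derive (repo root, data dir, index dir) from the path of app/main.py."""
    # app/main.py -> app/ -> repo root, independent of the working directory.
    root = Path(main_file).resolve().parent.parent
    data_dir = root / "data"
    return root, data_dir, data_dir / "faiss_index"


def make_importable(root: Path) -> None:
    """Put the repo root on sys.path so top-level modules can be imported."""
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
```

Inside `app/main.py`, this would be called as `project_paths(__file__)` followed by `make_importable(root)` before importing `ingest`.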