The main application class that orchestrates the entire RAG pipeline.
RAGPDFChatbot()
Returns: RAGPDFChatbot instance
Description: Initializes the chatbot with all required components.
Initialize the application and prepare for question answering.
Parameters:
rebuild_vector_store (bool, optional): Whether to rebuild the vector store from scratch. Defaults to False.
Raises:
RuntimeError: If document processing or vector store initialization fails
Ask a question using the RAG pipeline.
Parameters:
question(str): Question to answer
Returns: str - Answer to the question
Raises:
RuntimeError: If application is not initialized
Run the application in interactive mode, allowing multiple questions in a session.
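The lifecycle above (construct, initialize, then ask) can be sketched as follows. This is an illustrative stub, not the real implementation: the actual initialize() wires up document processing, the vector store, and the RAG chain, which are elided here.

```python
class RAGPDFChatbot:
    """Illustrative stub of the documented lifecycle (not the real class body)."""

    def __init__(self):
        self._chain = None  # populated by initialize()

    def initialize(self, rebuild_vector_store: bool = False) -> None:
        # Real code: process PDFs, build or load the vector store, create the chain.
        # A trivial stand-in "chain" lets ask() run in this sketch.
        self._chain = lambda q: f"(answer to: {q})"

    def ask(self, question: str) -> str:
        # Documented contract: asking before initialize() raises RuntimeError.
        if self._chain is None:
            raise RuntimeError("Application is not initialized")
        return self._chain(question)
```

Calling ask() before initialize() raises RuntimeError, matching the contract documented above.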
Handles loading and processing of PDF documents.
DocumentProcessor()
Returns: DocumentProcessor instance
Discover all PDF files in the dataset directory.
Returns: List[str] - List of file paths to PDF documents
Raises:
FileNotFoundError: If no PDF files are found
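A minimal sketch of the discovery step, assuming the default "rag-dataset" directory and a recursive *.pdf glob (the exact traversal strategy is an assumption):

```python
from pathlib import Path
from typing import List

def discover_pdf_files(dataset_path: str = "rag-dataset") -> List[str]:
    """Return paths of all PDFs under dataset_path; raise if none are found."""
    pdfs = sorted(str(p) for p in Path(dataset_path).rglob("*.pdf"))
    if not pdfs:
        raise FileNotFoundError(f"No PDF files found in {dataset_path!r}")
    return pdfs
```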
Load all PDF documents from the dataset.
Returns: List[Dict[str, Any]] - List of document objects
Raises:
RuntimeError: If no documents are successfully loaded
Split documents into chunks for embedding.
Parameters:
documents(List[Dict[str, Any]]): List of document objects
Returns: List[Dict[str, Any]] - List of document chunks
Complete document processing pipeline.
Returns: List[Dict[str, Any]] - List of processed document chunks
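The chunking step can be illustrated with a simple character-window splitter honoring the documented chunk_size and chunk_overlap defaults (1000 and 100). The real pipeline likely uses a smarter, separator-aware splitter, so this is only a sketch:

```python
from typing import List

def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size chunks where consecutive chunks share
    chunk_overlap characters (a simplified stand-in for a recursive splitter)."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With the defaults, each chunk starts 900 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next, preserving context across chunk boundaries.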
Manages document embedding and vector storage.
VectorStoreManager()
Returns: VectorStoreManager instance
Create and populate vector store with document embeddings.
Parameters:
documents(List[Dict[str, Any]]): List of document chunks to embed
Returns: FAISS - Populated vector store
Get a retriever configured with current settings.
Returns: Any - Configured retriever object
Raises:
RuntimeError: If vector store is not initialized
Save vector store to local storage.
Parameters:
path(str, optional): Path to save vector store
Raises:
RuntimeError: If vector store is not initialized
Load vector store from local storage.
Parameters:
path(str, optional): Path to load vector store from
Returns: FAISS - Loaded vector store
Check if vector store exists at specified path.
Parameters:
path(str, optional): Path to check
Returns: bool - True if vector store exists
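The save/load/exists trio amounts to a round trip against local_path. A filesystem-level sketch, using pickle on a plain dict in place of a real FAISS index (an assumption for illustration only):

```python
import os
import pickle

def vector_store_exists(path: str = "health_supplemets") -> bool:
    """True if a previously saved store is present at path."""
    return os.path.isdir(path)

def save_vector_store(store, path: str = "health_supplemets") -> None:
    if store is None:
        raise RuntimeError("Vector store is not initialized")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "index.pkl"), "wb") as f:
        pickle.dump(store, f)

def load_vector_store(path: str = "health_supplemets"):
    with open(os.path.join(path, "index.pkl"), "rb") as f:
        return pickle.load(f)
```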
Implements the Retrieval-Augmented Generation pipeline.
RAGChain(retriever: Any)
Parameters:
retriever(Any): Document retriever object
Returns: RAGChain instance
Ask a question using the RAG pipeline.
Parameters:
question(str): Question to answer
Returns: str - Answer to the question
Raises:
RuntimeError: If RAG chain is not initialized
Get the RAG chain object.
Returns: Any - RAG chain object
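The pipeline behind ask() is retrieve, then stuff the retrieved context into a prompt, then generate. A sketch with toy stand-ins for the retriever and LLM (the prompt wording is illustrative, not the actual template):

```python
from typing import Callable, List

class RAGChain:
    """Sketch of the retrieve-then-generate flow (names and prompt are illustrative)."""

    def __init__(self, retriever: Callable[[str], List[str]], llm: Callable[[str], str]):
        self.retriever = retriever
        self.llm = llm

    def ask(self, question: str) -> str:
        docs = self.retriever(question)              # 1. retrieve relevant chunks
        context = "\n\n".join(docs)                  # 2. stuff them into the prompt
        prompt = (f"Answer using only the context below.\n\n"
                  f"Context:\n{context}\n\nQuestion: {question}")
        return self.llm(prompt)                      # 3. generate the answer

# Toy components standing in for the real retriever and Ollama LLM:
retriever = lambda q: ["BCAAs may reduce muscle soreness."]
llm = lambda prompt: prompt.splitlines()[-1]  # echoes the question line
chain = RAGChain(retriever, llm)
```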
Main application configuration dataclass.
embedding: EmbeddingConfig - Embedding model configuration
llm: LLMConfig - LLM model configuration
vector_store: VectorStoreConfig - Vector store configuration
retrieval: RetrievalConfig - Retrieval configuration
document_processing: DocumentProcessingConfig - Document processing configuration
Configuration for embedding models.
model_name: str - Model name (default: "nomic-embed-text")
base_url: str - Ollama base URL (default: "http://localhost:11434")
dimension: Optional[int] - Embedding dimension (default: None)
Configuration for LLM models.
model_name: str - Model name (default: "llama3.2:3b")
base_url: str - Ollama base URL (default: "http://localhost:11434")
temperature: float - Generation temperature (default: 0.7)
max_tokens: int - Maximum tokens (default: 512)
Configuration for vector store.
index_type: str - Index type (default: "flat")
metric: str - Distance metric (default: "L2")
save_local: bool - Save locally (default: True)
local_path: str - Local path (default: "health_supplemets")
Configuration for document retrieval.
search_type: str - Search type (default: "mmr")
k: int - Number of documents (default: 3)
fetch_k: int - Number to fetch (default: 100)
lambda_mult: float - MMR lambda (default: 1.0)
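These settings drive Maximal Marginal Relevance (MMR) retrieval: fetch fetch_k candidates, then greedily pick k of them, trading query relevance against redundancy via lambda_mult. A pure-Python sketch over precomputed similarity scores (note that the default lambda_mult of 1.0 reduces MMR to plain similarity ranking):

```python
from typing import Callable, List, Sequence

def mmr_select(query_sim: Sequence[float],
               pairwise_sim: Callable[[int, int], float],
               k: int = 3, lambda_mult: float = 1.0) -> List[int]:
    """Greedy MMR: score = lambda * sim(query, d)
                          - (1 - lambda) * max sim(d, already selected)."""
    selected: List[int] = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((pairwise_sim(i, j) for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lambda_mult=1.0 the redundancy term vanishes, so the documented default behaves like a straight top-k similarity search over the fetch_k candidates.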
Configuration for document processing.
chunk_size: int - Chunk size (default: 1000)
chunk_overlap: int - Chunk overlap (default: 100)
dataset_path: str - Dataset path (default: "rag-dataset")
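The configuration classes above can be sketched as dataclasses with environment-variable overrides, matching the LLM_MODEL and CHUNK_SIZE variables used in the configuration example later in this document. Only two of the five config classes are shown, and the override mechanics are an assumption:

```python
import os
from dataclasses import dataclass, field

@dataclass
class LLMConfig:
    model_name: str = field(
        default_factory=lambda: os.environ.get("LLM_MODEL", "llama3.2:3b"))
    base_url: str = "http://localhost:11434"
    temperature: float = 0.7
    max_tokens: int = 512

@dataclass
class DocumentProcessingConfig:
    chunk_size: int = field(
        default_factory=lambda: int(os.environ.get("CHUNK_SIZE", "1000")))
    chunk_overlap: int = 100
    dataset_path: str = "rag-dataset"
```

Using default_factory (rather than a plain default) means the environment is consulted each time a config object is created, which is what makes "set the variable before importing" patterns work.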
The application can be run via command line with the following options:
python -m src.main [OPTIONS]
Options:
--rebuild Rebuild vector store from scratch
--interactive Run in interactive mode
--question TEXT Ask a specific question
--help Show help message

from src import RAGPDFChatbot
# Initialize chatbot
chatbot = RAGPDFChatbot()
chatbot.initialize()
# Ask questions
answer = chatbot.ask("What are the benefits of BCAA supplements?")
print(answer)
# Interactive mode
chatbot.interactive_mode()

- Raised when application is not properly initialized
- Raised when vector store operations fail
- Raised when document processing fails
- Raised when no PDF files are found in dataset directory
- Raised for invalid configuration values
- Raised for invalid input parameters
from src import RAGPDFChatbot
# Create and initialize chatbot
chatbot = RAGPDFChatbot()
chatbot.initialize()
# Ask a question
question = "What are the benefits of BCAA supplements?"
answer = chatbot.ask(question)
print(f"Answer: {answer}")

import os
from src.config import config
# Set environment variables before importing
os.environ["LLM_MODEL"] = "llama3.2:1b"
os.environ["CHUNK_SIZE"] = "500"
# Config will use the new values
print(f"Model: {config.llm.model_name}")
print(f"Chunk size: {config.document_processing.chunk_size}")

from src.config import EmbeddingConfig, LLMConfig, AppConfig
# Create custom config
custom_config = AppConfig(
embedding=EmbeddingConfig(model_name="custom-embedding"),
llm=LLMConfig(model_name="custom-llm", temperature=0.5),
# ... other configs
)
# Use with components
processor = DocumentProcessor()
processor.dataset_path = custom_config.document_processing.dataset_path

from src import RAGPDFChatbot
chatbot = RAGPDFChatbot()
try:
# This will fail if not initialized
answer = chatbot.ask("Test question")
except RuntimeError as e:
print(f"Error: {e}")
# Initialize first
chatbot.initialize()
answer = chatbot.ask("Test question")

- README.md - Project overview and quick start
- ARCHITECTURE.md - System architecture details
- CONTRIBUTING.md - Contribution guidelines
- SECURITY.md - Security policy