llamaRAG is a Retrieval-Augmented Generation (RAG) system that provides two main functionalities:
- Confluence Documentation Q&A: Query and get answers from Confluence wiki pages using natural language
- Java Repository Analysis: Analyze Java codebases to extract database entities and their relationships
The system uses Ollama for local LLM inference, ChromaDB for vector storage, and LangChain for orchestration.
- Confluence RAG: Fetches, embeds, and enables Q&A on Confluence documentation
- Repository RAG: Analyzes Java repositories to identify database entities, tables, and columns
- Vector-based search: Uses embeddings for semantic search across documentation
- Local LLM: Runs entirely locally using Ollama (no external API calls needed)
- Persistent storage: ChromaDB stores embeddings for fast retrieval
Prerequisites:

- Python 3.8+
- Ollama installed and running
- Git
Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/dkovacevic/llamaRAG.git
  cd llamaRAG
  ```

- (Optional but recommended) Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Pull required Ollama models:

  ```bash
  ollama pull mxbai-embed-large
  ollama pull gemma3
  ```
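To confirm the models are available before the first run, you can query Ollama's local REST API; a minimal sketch, assuming the default endpoint on localhost:11434:

```python
# Optional sanity check that the required models are installed (assumes Ollama's default endpoint)
import requests

REQUIRED = {"mxbai-embed-large", "gemma3"}

# Ollama lists locally available models at /api/tags
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

# Model names may carry a tag suffix such as ":latest", so compare the base name only
installed = {m["name"].split(":")[0] for m in resp.json().get("models", [])}

missing = REQUIRED - installed
print("All required models are installed" if not missing else f"Missing models: {missing}")
```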
Set the following environment variables for Confluence access:

```bash
export CONFLUENCE_BASE_URL="https://your-confluence-instance.com"
export CONFLUENCE_USERNAME="your-username"
export CONFLUENCE_API_TOKEN="your-api-token"
```
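These variables are read at runtime; a minimal sketch of how they can be consumed in Python (the exact handling inside confluence.py may differ):

```python
import os

# Fail fast if a required variable is missing rather than calling Confluence with empty credentials
BASE_URL = os.environ["CONFLUENCE_BASE_URL"]
USERNAME = os.environ["CONFLUENCE_USERNAME"]
API_TOKEN = os.environ["CONFLUENCE_API_TOKEN"]

# Confluence's REST API accepts the username/token pair as HTTP basic auth
AUTH = (USERNAME, API_TOKEN)
```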
SPACE_KEY = "ERPCRM" # Your Confluence space key
PARENT_ID = "169651392" # Parent page ID to start fetching fromEdit repoRAG.py to configure the repository to analyze:
```python
REPO_URL = "https://your-repo-url.git"
LOCAL_PATH = "repo"
```

Run the Confluence RAG system to ask questions about your documentation:

```bash
python3 main.py
```

On first run, it will:
- Fetch all pages from the configured Confluence space
- Create embeddings using `mxbai-embed-large`
- Store them in ChromaDB (`./chrome_langchain_db/`); a sketch of this step follows below
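A condensed sketch of that embed-and-store step, built from the libraries listed in requirements.txt. The assumption that pages are saved as `.txt` files under `pages/` is illustrative; the actual code in vector.py may differ in detail:

```python
# Condensed sketch of the first-run embed-and-store step (vector.py may differ in detail)
from pathlib import Path

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the locally saved Confluence pages (assumed here to be plain-text files under ./pages/)
docs = [
    Document(page_content=p.read_text(), metadata={"title": p.stem})
    for p in Path("pages").glob("*.txt")
]

# Chunk parameters follow the pipeline description later in this README: 2000 chars, 200 overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed with mxbai-embed-large and persist to the ChromaDB directory
db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="mxbai-embed-large"),
    persist_directory="./chrome_langchain_db",
)
print(f"Stored {len(chunks)} chunks")
```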
Then you can ask questions in the interactive prompt:
```
question: How do I configure the authentication system?
```

Type `q` to quit.
Run the repository analyzer to extract database entities:
```bash
python3 repoRAG.py
```

This will:
- Clone or update the configured Git repository
- Find all Java files
- Create embeddings and analyze the code
- Generate a report (`inter_service_report.json`) with the identified database tables and columns (the ingestion steps are sketched below)
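The clone/update and file-discovery steps can be expressed with the standard library alone; a minimal sketch (repoRAG.py may use a Git library rather than shelling out, so treat the details as illustrative):

```python
# Illustrative clone-or-update and Java file discovery (repoRAG.py's actual approach may differ)
import subprocess
from pathlib import Path

from langchain_core.documents import Document

REPO_URL = "https://your-repo-url.git"  # same placeholder as in the configuration section
LOCAL_PATH = Path("repo")

# Clone on first run, otherwise pull the latest changes
if LOCAL_PATH.exists():
    subprocess.run(["git", "-C", str(LOCAL_PATH), "pull"], check=True)
else:
    subprocess.run(["git", "clone", REPO_URL, str(LOCAL_PATH)], check=True)

# Recursively collect all Java files and load their contents as LangChain documents
docs = [
    Document(page_content=p.read_text(errors="ignore"), metadata={"path": str(p)})
    for p in LOCAL_PATH.rglob("*.java")
]
print(f"Loaded {len(docs)} Java files")
```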
Project structure:

```
llamaRAG/
├── main.py                 # Confluence Q&A interactive interface
├── confluence.py           # Confluence API integration and page fetching
├── vector.py               # Vector store setup and document embedding
├── repoRAG.py              # Java repository analysis tool
├── requirements.txt        # Python dependencies
├── Readme.md               # This file
├── .gitignore              # Git ignore rules
├── chrome_langchain_db/    # ChromaDB vector store (created on first run)
├── pages/                  # Downloaded Confluence pages (created on first run)
├── repo/                   # Cloned repository (created by repoRAG.py)
└── chroma_db/              # Vector store for repo analysis (created by repoRAG.py)
```
How the Confluence RAG pipeline works:

- Data Ingestion (`confluence.py`):
  - Fetches pages from Confluence using the REST API
  - Extracts text content from the HTML storage format
  - Saves pages locally for reference
- Embedding & Storage (`vector.py`):
  - Splits documents into chunks (2000 characters with 200-character overlap)
  - Creates embeddings using Ollama's `mxbai-embed-large` model
  - Stores them in ChromaDB for efficient similarity search
- Query & Retrieval (`main.py`, sketched below):
  - Takes user questions via the interactive prompt
  - Retrieves the top 5 most relevant document chunks
  - Generates answers using the `gemma3` LLM, with citations showing the source document titles
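A condensed sketch of that query step (prompt wording, variable names, and the `title` metadata key are illustrative, not the exact code in main.py):

```python
# Condensed sketch of the query/retrieval step (not main.py verbatim)
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaEmbeddings, OllamaLLM

# Reopen the vector store persisted on the first run
db = Chroma(
    persist_directory="./chrome_langchain_db",
    embedding_function=OllamaEmbeddings(model="mxbai-embed-large"),
)
retriever = db.as_retriever(search_kwargs={"k": 5})  # top 5 chunks, as described above

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
chain = prompt | OllamaLLM(model="gemma3")

question = "How do I configure the authentication system?"
chunks = retriever.invoke(question)
answer = chain.invoke({
    "context": "\n\n".join(c.page_content for c in chunks),
    "question": question,
})
print(answer)
print("Sources:", sorted({c.metadata.get("title", "unknown") for c in chunks}))
```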
How the repository analysis pipeline works:

- Code Ingestion (`repoRAG.py`):
  - Clones or updates the Git repository
  - Finds all Java files recursively
  - Loads file contents as documents
- Analysis (sketched below):
  - Splits the code into chunks for processing
  - Uses RAG to identify Spring Boot entities
  - Extracts `@Entity`, `@Table`, `@Column`, and `@Id` annotations
  - Generates a structured JSON report with tables and columns
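The annotation-extraction step amounts to prompting the model for structured output over each code chunk; a rough, illustrative sketch (the actual prompt and report format in repoRAG.py may differ):

```python
# Rough sketch of the entity-extraction prompt (repoRAG.py's prompt and output format may differ)
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaLLM

prompt = ChatPromptTemplate.from_template(
    "You are analyzing Spring Boot JPA code. From the Java source below, list every @Entity "
    "with its @Table name and @Column names. Respond with JSON only, shaped as "
    '{{"tables": [{{"table": "...", "columns": ["..."]}}]}}.\n\nSource:\n{code}'
)
chain = prompt | OllamaLLM(model="gemma3")

java_chunk = """
@Entity
@Table(name = "customer")
public class Customer {
    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "email")
    private String email;
}
"""

raw = chain.invoke({"code": java_chunk})
print(raw)  # repoRAG.py aggregates answers like this into inter_service_report.json
```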
Core dependencies (see requirements.txt):
- `langchain` - LLM orchestration framework
- `langchain-ollama` - Ollama integration for LangChain
- `langchain-chroma` - ChromaDB vector store integration
- `langchain-community` - Community integrations
- `beautifulsoup4` / `bs4` - HTML parsing for Confluence pages
- `requests` - HTTP client for API calls
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
MIT