# llamaRAG

## Overview

llamaRAG is a Retrieval-Augmented Generation (RAG) system that provides two main functionalities:

1. **Confluence Documentation Q&A**: query Confluence wiki pages in natural language and get answers
2. **Java Repository Analysis**: analyze Java codebases to extract database entities and their relationships

The system uses Ollama for local LLM inference, ChromaDB for vector storage, and LangChain for orchestration.

## Features

- **Confluence RAG**: fetches, embeds, and enables Q&A on Confluence documentation
- **Repository RAG**: analyzes Java repositories to identify database entities, tables, and columns
- **Vector-based search**: uses embeddings for semantic search across documentation
- **Local LLM**: runs entirely locally using Ollama (no external API calls needed)
- **Persistent storage**: ChromaDB stores embeddings on disk for fast retrieval

## Installation

### Prerequisites

- Python 3.8+
- Ollama installed and running
- Git

### Setup Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/dkovacevic/llamaRAG.git
   cd llamaRAG
   ```

2. (Optional but recommended) Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Pull the required Ollama models:

   ```bash
   ollama pull mxbai-embed-large
   ollama pull gemma3
   ```

## Configuration

### Confluence RAG Setup

Set the following environment variables for Confluence access:

```bash
export CONFLUENCE_BASE_URL="https://your-confluence-instance.com"
export CONFLUENCE_USERNAME="your-username"
export CONFLUENCE_API_TOKEN="your-api-token"
```

Edit `confluence.py` to set your space key and parent page ID:

```python
SPACE_KEY = "ERPCRM"      # Your Confluence space key
PARENT_ID = "169651392"   # Parent page ID to start fetching from
```
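
For reference, fetching pages under a parent boils down to a call like the one below. This is an illustrative sketch, not the code in `confluence.py`: the endpoint path and `expand` parameter follow the standard Confluence REST API, while the function name and the injectable `session` argument are assumptions made for this example.

```python
def fetch_child_pages(base_url, parent_id, auth, session=None):
    """Return the child pages of `parent_id`, including their HTML storage bodies.

    Illustrative sketch; the real fetch logic lives in confluence.py.
    """
    if session is None:
        import requests  # only needed when talking to a real server
        session = requests.Session()
    resp = session.get(
        f"{base_url}/rest/api/content/{parent_id}/child/page",
        params={"expand": "body.storage", "limit": 100},
        auth=auth,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]
```

Each returned page carries its content in `body.storage.value` (Confluence's HTML storage format), which is what gets parsed with BeautifulSoup.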

### Repository RAG Setup

Edit `repoRAG.py` to configure the repository to analyze:

```python
REPO_URL = "https://your-repo-url.git"
LOCAL_PATH = "repo"
```

## Usage

### Confluence Documentation Q&A

Run the Confluence RAG system to ask questions about your documentation:

```bash
python3 main.py
```

On the first run, it will:

1. Fetch all pages from the configured Confluence space
2. Create embeddings using `mxbai-embed-large`
3. Store them in ChromaDB (`./chrome_langchain_db/`)

Then you can ask questions at the interactive prompt:

```text
question: How do I configure the authentication system?
```

Type `q` to quit.

### Java Repository Analysis

Run the repository analyzer to extract database entities:

```bash
python3 repoRAG.py
```

This will:

1. Clone or update the configured Git repository
2. Find all Java files
3. Create embeddings and analyze the code
4. Generate a report (`inter_service_report.json`) with the identified database tables and columns

## Project Structure

```text
llamaRAG/
├── main.py               # Confluence Q&A interactive interface
├── confluence.py         # Confluence API integration and page fetching
├── vector.py             # Vector store setup and document embedding
├── repoRAG.py            # Java repository analysis tool
├── requirements.txt      # Python dependencies
├── Readme.md             # This file
├── .gitignore            # Git ignore rules
├── chrome_langchain_db/  # ChromaDB vector store (created on first run)
├── pages/                # Downloaded Confluence pages (created on first run)
├── repo/                 # Cloned repository (created by repoRAG.py)
└── chroma_db/            # Vector store for repo analysis (created by repoRAG.py)
```

## How It Works

### Confluence RAG Pipeline

1. **Data Ingestion** (`confluence.py`):
   - Fetches pages from Confluence via the REST API
   - Extracts text content from the HTML storage format
   - Saves pages locally for reference
2. **Embedding & Storage** (`vector.py`):
   - Splits documents into chunks (2000 characters with 200-character overlap)
   - Creates embeddings using Ollama's `mxbai-embed-large` model
   - Stores them in ChromaDB for efficient similarity search
3. **Query & Retrieval** (`main.py`):
   - Takes user questions via an interactive prompt
   - Retrieves the top 5 most relevant document chunks
   - Generates answers using the `gemma3` LLM
   - Cites the source document titles alongside each answer
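
The splitting step above can be sketched in plain Python. This is a simplified stand-in for the text splitter `vector.py` uses (the real one comes from LangChain); only the 2000/200 parameters are taken from the description above.

```python
def chunk_text(text, chunk_size=2000, overlap=200):
    """Split text into fixed-size chunks where consecutive chunks share `overlap` characters.

    Simplified sliding-window illustration of the embedding pipeline's chunking step.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap  # each chunk starts `step` characters after the previous one
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of some duplicated embedding work.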

### Repository RAG Pipeline

1. **Code Ingestion** (`repoRAG.py`):
   - Clones or updates the Git repository
   - Finds all Java files recursively
   - Loads file contents as documents
2. **Analysis**:
   - Splits the code into chunks for processing
   - Uses RAG to identify Spring Boot entities
   - Extracts `@Entity`, `@Table`, `@Column`, and `@Id` annotations
   - Generates a structured JSON report with tables and columns
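
As a rough illustration of what the annotation extraction looks for, here is a minimal regex-based sketch. It is not the repository's actual implementation (which is LLM-driven), and it only handles the simple `name = "..."` form of the JPA annotations; the function and pattern names are invented for this example.

```python
import re

# Patterns for the explicit name attribute of JPA @Table / @Column annotations.
TABLE_RE = re.compile(r'@Table\s*\(\s*name\s*=\s*"([^"]+)"')
COLUMN_RE = re.compile(r'@Column\s*\(\s*name\s*=\s*"([^"]+)"')

def extract_schema(java_source):
    """Return the table name and column names declared via JPA annotations.

    Illustrative only; repoRAG.py performs this analysis with an LLM instead.
    """
    table = TABLE_RE.search(java_source)
    return {
        "table": table.group(1) if table else None,
        "columns": COLUMN_RE.findall(java_source),
    }
```

An LLM-driven pass can additionally handle implicit names (fields without `@Column`, default table names from the class name), which is where the plain regex approach falls short.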

## Dependencies

Core dependencies (see `requirements.txt`):

- `langchain` - LLM orchestration framework
- `langchain-ollama` - Ollama integration for LangChain
- `langchain-chroma` - ChromaDB vector store integration
- `langchain-community` - community integrations
- `beautifulsoup4` (`bs4`) - HTML parsing for Confluence pages
- `requests` - HTTP client for API calls

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License

MIT
