Wellbore Data Agent

Version: 1.0.0
Status: Development (SPE Hackathon Project)

An AI-powered wellbore analysis system leveraging Retrieval-Augmented Generation (RAG) to enable intelligent document analysis, real-time chat interactions, and data extraction from technical wellbore documents.

🎯 Overview

The Wellbore Data Agent is a full-stack application designed to help petroleum engineers and analysts extract valuable insights from technical wellbore documents. By combining modern AI, vector databases, and a responsive web interface, the system enables users to:

Upload and process PDF documents containing wellbore data and technical information
Query documents using natural language through an intelligent AI agent
Extract insights including summaries, tables, and calculated analyses
Perform nodal analysis calculations on wellbore data
Interact in real time via WebSocket for responsive, streaming conversations

📋 Project Structure

wellbore-data-agent/
├── backend/                      # FastAPI backend service
│   ├── app/
│   │   ├── main.py              # FastAPI application entry point
│   │   ├── agents/              # AI agent logic (LangGraph-based)
│   │   │   ├── agent_graph.py
│   │   │   ├── extraction_graph.py
│   │   │   ├── summarization_graph.py
│   │   │   └── langgraph_agent.py
│   │   ├── api/                 # API endpoints and middleware
│   │   │   ├── routes/          # API route handlers
│   │   │   ├── middleware/      # CORS, error handling
│   │   │   └── deps.py          # Dependency injection
│   │   ├── core/                # Core configuration
│   │   ├── db/                  # Database management
│   │   ├── models/              # Pydantic data models
│   │   ├── rag/                 # RAG pipeline components
│   │   │   ├── chunking.py      # Document chunking strategies
│   │   │   ├── embeddings.py    # Embedding generation
│   │   │   ├── retriever.py     # Document retrieval
│   │   │   └── vector_store_manager.py
│   │   ├── services/            # Business logic services
│   │   │   ├── llm_service.py
│   │   │   ├── document_service.py
│   │   │   └── conversation_service.py
│   │   ├── utils/               # Utility functions
│   │   └── validation/          # Request/response validation
│   ├── data/                    # Data directory
│   │   ├── raw/                 # Raw uploaded documents
│   │   ├── processed/           # Processed documents
│   │   ├── uploads/             # Temporary upload storage
│   │   └── vector_db/           # Chroma vector database
│   ├── scripts/                 # Setup and utility scripts
│   ├── requirements.txt         # Python dependencies
│   └── README.md               # Backend documentation
│
├── frontend/                    # React + Vite frontend application
│   ├── src/
│   │   ├── App.tsx             # Root component
│   │   ├── main.tsx            # Entry point
│   │   ├── routes.tsx          # Router configuration
│   │   ├── components/         # Reusable React components
│   │   ├── pages/              # Page components
│   │   ├── services/           # API client services
│   │   ├── store/              # Redux state management
│   │   ├── types/              # TypeScript type definitions
│   │   ├── layout/             # Layout components
│   │   └── context/            # React context hooks
│   ├── public/                 # Static assets
│   └── package.json            # Node.js dependencies
│
├── docs/                       # Documentation
│   ├── architecture.md
│   ├── api.md
│   ├── agent-workflow.md
│   └── deployment.md
│
├── docker-compose.yml          # Multi-container orchestration
└── README.md                   # This file

🏗️ Architecture

Backend Stack

Framework: FastAPI (Python)
AI/ML:
- LangGraph for agent orchestration
- LangChain for LLM interactions
- Ollama for local LLM inference
Vector Database: Chroma (persistent vector storage)
Embeddings: Sentence Transformers
Real-time Communication: WebSocket support via FastAPI
Document Processing: PDF extraction using PDFMiner, pdfplumber, PyMuPDF
Async Runtime: Uvicorn with async/await support

Frontend Stack

Framework: React 19 with TypeScript
Build Tool: Vite
UI Components: Material-UI (MUI), custom Radix UI components
State Management: Redux Toolkit
Styling: Tailwind CSS
HTTP Client: Axios
Real-time: WebSocket integration for live chat
Markdown: React-Markdown for rendered content

Infrastructure

Containerization: Docker & Docker Compose
Communication: Backend (port 8000) ↔ Frontend (port 5173)
External: Ollama LLM service (port 11434)

🚀 Key Features

1. Document Management

Upload PDFs: Drag-and-drop or file selection interface
Automatic Processing: Documents are chunked and embedded into vector store
Metadata Tracking: Tracks page count, word count, chunk count, and upload timestamps
Document Retrieval: List all documents with detailed metadata
Document Deletion: Remove documents and associated data
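The metadata fields listed above can be pictured as a small record derived from the extracted pages and the chunks built from them. This is a minimal sketch, not the backend's actual model: `DocumentMetadata`, `build_metadata`, and `InMemoryDocumentRegistry` are hypothetical names, and the registry only mimics listing and deletion in memory without touching the Chroma store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DocumentMetadata:
    # Hypothetical record holding the fields the README says are tracked.
    document_id: str
    filename: str
    page_count: int
    word_count: int
    chunk_count: int
    uploaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def build_metadata(document_id: str, filename: str,
                   pages: list[str], chunks: list[str]) -> DocumentMetadata:
    """Derive counts from per-page extracted text and the chunks built from it."""
    return DocumentMetadata(
        document_id=document_id,
        filename=filename,
        page_count=len(pages),
        word_count=sum(len(page.split()) for page in pages),
        chunk_count=len(chunks),
    )


class InMemoryDocumentRegistry:
    """Toy stand-in for the list and delete operations (no vector-store cleanup)."""

    def __init__(self) -> None:
        self._docs: dict[str, DocumentMetadata] = {}

    def add(self, meta: DocumentMetadata) -> None:
        self._docs[meta.document_id] = meta

    def list_documents(self) -> list[DocumentMetadata]:
        # Oldest upload first.
        return sorted(self._docs.values(), key=lambda m: m.uploaded_at)

    def delete(self, document_id: str) -> bool:
        # True if something was removed, False if the id was unknown.
        return self._docs.pop(document_id, None) is not None
```

In the real service, deleting a document would also need to remove its chunks from the vector store so that retrieval no longer returns them.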

2. AI-Powered Chat

Three Chat Endpoints:
- /chat/ - Simple query endpoint
- /chat/ask - Question-answering with confidence scores and source citations
- /chat/stream - Streaming responses for real-time interaction
WebSocket Interface (/ws/):
- question: Get answers to queries about documents
- summarize: Generate document summaries
- extract_tables: Extract tables based on natural language queries
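The three WebSocket message types suggest a dispatch-by-type design. The sketch below assumes each frame is a JSON object with a `type` field plus hypothetical `query` / `document_id` fields; the handler names and reply shapes are illustrative placeholders, not the project's actual protocol.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


# Placeholder handlers; in the real backend these would call the agent graphs.
def handle_question(msg: Message) -> Message:
    return {"type": "answer", "content": f"(answer to: {msg['query']})"}


def handle_summarize(msg: Message) -> Message:
    return {"type": "summary", "content": f"(summary of {msg['document_id']})"}


def handle_extract_tables(msg: Message) -> Message:
    return {"type": "tables", "query": msg["query"], "content": []}


HANDLERS: dict[str, Callable[[Message], Message]] = {
    "question": handle_question,
    "summarize": handle_summarize,
    "extract_tables": handle_extract_tables,
}


def dispatch(raw: str) -> str:
    """Route one incoming JSON text frame to its handler and return a JSON reply."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps({"type": "error", "content": "invalid JSON"})
    if not isinstance(msg, dict):
        return json.dumps({"type": "error", "content": "expected a JSON object"})
    msg_type = msg.get("type")
    handler = HANDLERS.get(msg_type) if isinstance(msg_type, str) else None
    if handler is None:
        return json.dumps({"type": "error", "content": f"unknown type: {msg_type!r}"})
    try:
        return json.dumps(handler(msg))
    except KeyError as missing:
        # A handler needed a field the client did not send.
        return json.dumps({"type": "error", "content": f"missing field {missing}"})
```

Inside a FastAPI WebSocket route, a loop of `raw = await websocket.receive_text()` followed by `await websocket.send_text(dispatch(raw))` would connect this to the `/ws/` endpoint. Replying with an error frame instead of raising keeps a single malformed message from closing the connection.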

3. Intelligent Agent System

Built on LangGraph for agentic workflows
Tool-based architecture with specialized agents:
- Extraction Agent: Extract structured data from documents
- Summarization Agent: Generate concise summaries
- Analysis Agent: Perform calculations and analysis
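The tool-based design can be illustrated with a simple router that sends each request to one specialized tool. This is a deliberately simplified, plain-Python stand-in: the project uses LangGraph, where routing would more likely be an LLM decision wired through conditional edges (for example `StateGraph.add_conditional_edges`). The keyword lists, the `qa_tool` fallback, and all function names here are hypothetical.

```python
from typing import Callable

Tool = Callable[[str], str]


# Placeholder tools standing in for the specialized agents.
def extraction_tool(query: str) -> str:
    return f"[extraction] {query}"


def summarization_tool(query: str) -> str:
    return f"[summarization] {query}"


def analysis_tool(query: str) -> str:
    return f"[analysis] {query}"


def qa_tool(query: str) -> str:
    # Fallback: plain retrieval-backed question answering.
    return f"[qa] {query}"


# Checked in order; the first route with a matching keyword wins.
ROUTES: list[tuple[tuple[str, ...], Tool]] = [
    (("table", "extract"), extraction_tool),
    (("summarize", "summary", "overview"), summarization_tool),
    (("calculate", "nodal", "flow rate"), analysis_tool),
]


def route(query: str) -> Tool:
    text = query.lower()
    for keywords, tool in ROUTES:
        if any(keyword in text for keyword in keywords):
            return tool
    return qa_tool


def run_agent(query: str) -> str:
    return route(query)(query)
```

Keyword routing keeps the sketch deterministic and testable; an LLM-driven router trades that predictability for handling phrasings no keyword list anticipates.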

4. RAG Pipeline

Document Chunking: Intelligent splitting with overlap for context preservation
Embedding Generation: Dense embeddings using sentence-transformers
Vector Search: Semantic similarity search via Chroma
Context Retrieval: Top-K document chunk retrieval for LLM context
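The chunk-with-overlap and top-K steps can be sketched in a few lines. To stay dependency-free, this swaps the sentence-transformers embeddings and Chroma search for a toy bag-of-words vector and brute-force cosine similarity; the chunk sizes (in words) and function names are assumptions, not the values used in `chunking.py` or `retriever.py`.

```python
import math
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    # Toy sparse "embedding": lowercase token counts (stand-in for a dense model).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when the overlap is at least as long as the sentence.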

5. Nodal Analysis

Calculation framework for wellbore nodal analysis
Currently uses mocked calculations within an extensible architecture
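For context on what a non-mocked calculation might look like: nodal analysis finds the operating point where the reservoir inflow curve (IPR) meets the tubing outflow curve (VLP) at the bottomhole node. This sketch uses Vogel's IPR, q/q_max = 1 - 0.2(p_wf/p_r) - 0.8(p_wf/p_r)^2, and a hypothetical quadratic outflow curve p_wf = p_wh + a·q + b·q² in place of a real multiphase correlation. Function names, coefficients, and units (psi, STB/d) are illustrative assumptions, not the project's implementation.

```python
import math
from typing import Optional


def ipr_pwf(q: float, p_res: float, q_max: float) -> float:
    """Bottomhole flowing pressure the reservoir supports at rate q (Vogel IPR solved for p_wf)."""
    ratio = 1.0 - q / q_max
    # Positive root of 0.8x^2 + 0.2x - ratio = 0, where x = p_wf / p_res.
    x = (-0.2 + math.sqrt(0.04 + 3.2 * ratio)) / 1.6
    return x * p_res


def vlp_pwf(q: float, p_wh: float, a: float, b: float) -> float:
    """Hypothetical outflow curve: bottomhole pressure needed to lift rate q to the wellhead."""
    return p_wh + a * q + b * q * q


def operating_point(p_res: float, q_max: float, p_wh: float, a: float, b: float,
                    tol: float = 1e-6) -> Optional[tuple[float, float]]:
    """Bisect on rate for the intersection of IPR and VLP; None if the well cannot flow."""
    def gap(q: float) -> float:
        # Positive while the reservoir can deliver more pressure than the tubing needs.
        return ipr_pwf(q, p_res, q_max) - vlp_pwf(q, p_wh, a, b)

    lo, hi = 0.0, q_max
    if gap(lo) <= 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    return q, ipr_pwf(q, p_res, q_max)
```

Bisection is safe here because, with non-negative `a` and `b`, the IPR pressure falls and the VLP pressure rises with rate, so the gap changes sign at most once on [0, q_max].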

6. Health Monitoring

/health endpoint to check system status
Validates LLM service connectivity
Monitors vector store health
Reports detailed service status
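A health endpoint like this typically runs one probe per dependency and aggregates the results. The sketch below is an assumption about the shape, not the actual `/health` response: `build_health_report` and the `healthy`/`degraded` labels are hypothetical. The Ollama probe calls Ollama's `GET /api/tags` (which lists locally available models) using only the standard library.

```python
import urllib.request
from typing import Callable


def check_ollama(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers GET /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors; ValueError a malformed URL.
        return False


def build_health_report(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run each named probe and summarise overall status."""
    services = {}
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing probe must not take down /health itself
            services[name] = {"status": "error", "detail": str(exc)}
            continue
        services[name] = {"status": "ok" if ok else "unavailable"}
    all_ok = all(s["status"] == "ok" for s in services.values())
    return {"status": "healthy" if all_ok else "degraded", "services": services}
```

Usage might look like `build_health_report({"llm": check_ollama, "vector_store": some_chroma_probe})`, where `some_chroma_probe` is a hypothetical callable for the vector store.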

📦 Technology Stack

Python Packages (Backend)

LLM & AI: langchain, langgraph, langchain-ollama, ollama
Vector DB: chromadb, langchain-chroma
Web: fastapi, uvicorn, python-socketio, websockets
PDF Processing: pdfplumber, pdfminer.six, PyPDF2, PyMuPDF, camelot
ML: torch, transformers, sentence-transformers, scikit-learn
Data: pandas, pydantic, sqlalchemy
Utilities: python-dotenv, tenacity, httpx

Node Packages (Frontend)

React Ecosystem: react, react-dom, react-router-dom
State: redux, @reduxjs/toolkit, react-redux
UI: @mui/material, tailwindcss, lucide-react, react-icons
Utilities: axios, marked, react-markdown, dompurify
Forms: react-dropzone (for file uploads)

🔧 Getting Started

Prerequisites

Docker & Docker Compose
OR
- Python 3.10+
- Node.js 18+
- Ollama (for local LLM inference)

Quick Start (Docker)

Clone the repository:

```
git clone <repository-url>
cd wellbore-data-agent
```

Start services:
```
docker-compose up --build
```
Access the application:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs

Manual Setup

Backend

Install Python dependencies:

```
cd backend
pip install -r requirements.txt
```

Configure environment:

```
cp .env.example .env
# Edit .env with your settings:
# - OLLAMA_BASE_URL (default: http://localhost:11434)
# - OLLAMA_MODEL (default: llama2)
```

Start Ollama service (if using local LLM):
```
ollama serve
```

Run the application:

```
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
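The two `.env` settings from the configuration step, with their documented defaults, could be read like this. It is a sketch assuming plain environment variables; the real `core/` configuration module may be structured differently, and `OllamaSettings` / `load_settings` are hypothetical names. When python-dotenv is used, calling its `load_dotenv()` first copies the `.env` values into `os.environ`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OllamaSettings:
    base_url: str
    model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> OllamaSettings:
    """Read Ollama settings, falling back to the defaults listed in the README."""
    env = os.environ if env is None else env
    return OllamaSettings(
        # Strip a trailing slash so paths like f"{base_url}/api/tags" join cleanly.
        base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        model=env.get("OLLAMA_MODEL", "llama2"),
    )
```

Passing a plain dict keeps the loader testable. Note that inside a container, `localhost` refers to that container, so a Docker setup would normally override `OLLAMA_BASE_URL`.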

Frontend

Install Node dependencies:
```
cd frontend
npm install
```
Start development server:
```
npm run dev
```
Access:
- Application: http://localhost:5173

📊 Data Flow

User Input
    ↓
Frontend (React)
    ↓
WebSocket/HTTP to Backend
    ↓
FastAPI Router
    ↓
LangGraph Agent
    ├→ Document Retrieval (Vector Store)
    ├→ LLM Service (Ollama)
    └→ Tool Execution (Extraction, Summarization, Analysis)
    ↓
Response 
    ↓
Frontend Display
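The retrieval and LLM steps in the flow above meet at prompt assembly: retrieved chunks are numbered and placed ahead of the question so the answer can cite them. This is a minimal sketch with hypothetical names and a character budget chosen for illustration; `retrieve` and `llm` are injected callables standing in for the vector store and the Ollama-backed service.

```python
from typing import Callable


def build_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Number retrieved chunks as citable sources, stopping before the context budget is exceeded."""
    parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        part = f"[{i}] {chunk}"
        if used + len(part) > max_chars:
            break  # chunks arrive ranked, so dropping the tail loses the least relevant text
        parts.append(part)
        used += len(part)
    context = "\n".join(parts)
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str,
           retrieve: Callable[[str], list[str]],
           llm: Callable[[str], str]) -> str:
    """Retrieve context, assemble the prompt, and hand it to the model."""
    return llm(build_prompt(question, retrieve(question)))
```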

📄 License

This project was originally developed for the SPE (Society of Petroleum Engineers) Hackathon and is still under review. You are welcome to view the code, explore the architecture, and reference the approach for educational or evaluative purposes.

However, reuse, redistribution, or commercial use of the project is not permitted at this time without prior permission from the author.

👥 Team

Developed for the SPE (Society of Petroleum Engineers) Hackathon
