Version: 1.0.0
Status: Development (SPE Hackathon Project)
An AI-powered wellbore analysis system leveraging Retrieval-Augmented Generation (RAG) to enable intelligent document analysis, real-time chat interactions, and data extraction from technical wellbore documents.
The Wellbore Data Agent is a full-stack application designed to help petroleum engineers and analysts extract valuable insights from technical wellbore documents. By combining modern AI, vector databases, and a responsive web interface, the system enables users to:
- Upload and process PDF documents containing wellbore data and technical information
- Query documents using natural language through an intelligent AI agent
- Extract insights including summaries, tables, and calculated analyses
- Perform nodal analysis calculations on wellbore data
- Interact in real time via WebSocket for responsive, streaming conversations
wellbore-data-agent/
├── backend/ # FastAPI backend service
│ ├── app/
│ │ ├── main.py # FastAPI application entry point
│ │ ├── agents/ # AI agent logic (LangGraph-based)
│ │ │ ├── agent_graph.py
│ │ │ ├── extraction_graph.py
│ │ │ ├── summarization_graph.py
│ │ │ └── langgraph_agent.py
│ │ ├── api/ # API endpoints and middleware
│ │ │ ├── routes/ # API route handlers
│ │ │ ├── middleware/ # CORS, error handling
│ │ │ └── deps.py # Dependency injection
│ │ ├── core/ # Core configuration
│ │ ├── db/ # Database management
│ │ ├── models/ # Pydantic data models
│ │ ├── rag/ # RAG pipeline components
│ │ │ ├── chunking.py # Document chunking strategies
│ │ │ ├── embeddings.py # Embedding generation
│ │ │ ├── retriever.py # Document retrieval
│ │ │ └── vector_store_manager.py
│ │ ├── services/ # Business logic services
│ │ │ ├── llm_service.py
│ │ │ ├── document_service.py
│ │ │ └── conversation_service.py
│ │ ├── utils/ # Utility functions
│ │ └── validation/ # Request/response validation
│ ├── data/ # Data directory
│ │ ├── raw/ # Raw uploaded documents
│ │ ├── processed/ # Processed documents
│ │ ├── uploads/ # Temporary upload storage
│ │ └── vector_db/ # Chroma vector database
│ ├── scripts/ # Setup and utility scripts
│ ├── requirements.txt # Python dependencies
│ └── README.md # Backend documentation
│
├── frontend/ # React + Vite frontend application
│ ├── src/
│ │ ├── App.tsx # Root component
│ │ ├── main.tsx # Entry point
│ │ ├── routes.tsx # Router configuration
│ │ ├── components/ # Reusable React components
│ │ ├── pages/ # Page components
│ │ ├── services/ # API client services
│ │ ├── store/ # Redux state management
│ │ ├── types/ # TypeScript type definitions
│ │ ├── layout/ # Layout components
│ │ └── context/ # React context hooks
│ ├── public/ # Static assets
│ └── package.json # Node.js dependencies
│
├── docs/ # Documentation
│ ├── architecture.md
│ ├── api.md
│ ├── agent-workflow.md
│ └── deployment.md
│
├── docker-compose.yml # Multi-container orchestration
└── README.md # This file
Backend:
- Framework: FastAPI (Python)
- AI/ML:
  - LangGraph for agent orchestration
  - LangChain for LLM interactions
  - Ollama for local LLM inference
- Vector Database: Chroma (persistent vector storage)
- Embeddings: Sentence Transformers
- Real-time Communication: WebSocket support via FastAPI
- Document Processing: PDF extraction using PDFMiner, pdfplumber, PyMuPDF
- Async Runtime: Uvicorn with async/await support
Frontend:
- Framework: React 19 with TypeScript
- Build Tool: Vite
- UI Components: Material-UI (MUI), custom Radix UI components
- State Management: Redux Toolkit
- Styling: Tailwind CSS
- HTTP Client: Axios
- Real-time: WebSocket integration for live chat
- Markdown: React-Markdown for rendered content
Infrastructure:
- Containerization: Docker & Docker Compose
- Communication: Backend (port 8000) ↔ Frontend (port 5173)
- External: Ollama LLM service (port 11434)
- Upload PDFs: Drag-and-drop or file selection interface
- Automatic Processing: Documents are chunked and embedded into the vector store
- Metadata Tracking: Tracks page count, word count, chunk count, and upload timestamps
- Document Retrieval: List all documents with detailed metadata
- Document Deletion: Remove documents and associated data
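As a rough illustration of the metadata tracking described above, the sketch below models one record per uploaded PDF plus an in-memory registry for listing and deleting. It is hypothetical: the names `DocumentMetadata`, `build_metadata`, and `DocumentRegistry` do not come from the backend, which uses Pydantic models and Chroma rather than a plain dataclass and a dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DocumentMetadata:
    """Hypothetical per-document record mirroring the fields listed above."""
    document_id: str
    filename: str
    page_count: int
    word_count: int
    chunk_count: int
    # Upload timestamp in UTC, set when the record is created.
    uploaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def build_metadata(document_id: str, filename: str, pages: list[str], chunk_count: int) -> DocumentMetadata:
    """Derive page and word counts from the extracted text of each page."""
    return DocumentMetadata(
        document_id=document_id,
        filename=filename,
        page_count=len(pages),
        word_count=sum(len(page.split()) for page in pages),
        chunk_count=chunk_count,
    )


class DocumentRegistry:
    """In-memory stand-in for the document store: add, list, delete."""

    def __init__(self) -> None:
        self._docs: dict[str, DocumentMetadata] = {}

    def add(self, meta: DocumentMetadata) -> None:
        self._docs[meta.document_id] = meta

    def list_documents(self) -> list[DocumentMetadata]:
        # Newest uploads first.
        return sorted(self._docs.values(), key=lambda m: m.uploaded_at, reverse=True)

    def delete_document(self, document_id: str) -> bool:
        """Remove a record; returns False if it did not exist."""
        return self._docs.pop(document_id, None) is not None
```

In the real service, deleting a document would also have to remove its chunks from the vector store, which this sketch leaves out.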
- Three Chat Endpoints:
  - /chat/: Simple query endpoint
  - /chat/ask: Question answering with confidence scores and source citations
  - /chat/stream: Streaming responses for real-time interaction
- WebSocket Interface (/ws/):
  - question: Get answers to queries about documents
  - summarize: Generate document summaries
  - extract_tables: Extract tables based on natural-language queries
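One way to route the three WebSocket message types is a dispatch table keyed on a `type` field. The JSON message shape (`type`, `query`, `document_id`) and the handler stubs below are assumptions for illustration, not the project's actual protocol.

```python
import json
from typing import Callable, Dict


# Hypothetical handler stubs; in the backend these would invoke the agent graphs.
def handle_question(msg: dict) -> dict:
    return {"type": "answer", "text": f"answer to: {msg.get('query', '')}"}


def handle_summarize(msg: dict) -> dict:
    return {"type": "summary", "document_id": msg.get("document_id")}


def handle_extract_tables(msg: dict) -> dict:
    return {"type": "tables", "query": msg.get("query", ""), "tables": []}


HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "question": handle_question,
    "summarize": handle_summarize,
    "extract_tables": handle_extract_tables,
}


def dispatch(raw: str) -> str:
    """Parse one incoming text frame and route it by its 'type' field."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps({"type": "error", "detail": "invalid JSON"})
    if not isinstance(msg, dict):
        return json.dumps({"type": "error", "detail": "expected a JSON object"})
    msg_type = msg.get("type")
    handler = HANDLERS.get(msg_type) if isinstance(msg_type, str) else None
    if handler is None:
        return json.dumps({"type": "error", "detail": f"unknown message type: {msg_type!r}"})
    return json.dumps(handler(msg))
```

In a FastAPI `@app.websocket("/ws/")` endpoint, each frame read with `await websocket.receive_text()` could be passed to `dispatch`, and the result sent back with `await websocket.send_text(...)`.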
- Built on LangGraph for agentic workflows
- Tool-based architecture with specialized agents:
  - Extraction Agent: Extract structured data from documents
  - Summarization Agent: Generate concise summaries
  - Analysis Agent: Perform calculations and analysis
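The tool-based split can be pictured as a router that sends each query to one specialized agent. This sketch uses plain keyword matching as a stand-in; it is not LangGraph code, and the agent functions, the `retrieval_qa` fallback, and the keyword lists are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

Agent = Callable[[str], str]


# Hypothetical stand-ins for the specialized agents listed above.
def extraction_agent(query: str) -> str:
    return f"[extraction] {query}"


def summarization_agent(query: str) -> str:
    return f"[summarization] {query}"


def analysis_agent(query: str) -> str:
    return f"[analysis] {query}"


def retrieval_qa(query: str) -> str:
    # Fallback: answer plain questions directly from retrieved context.
    return f"[rag-answer] {query}"


# Keyword rules checked in order; the first match wins.
ROUTES: list[tuple[tuple[str, ...], Agent]] = [
    (("summarize", "summary", "overview"), summarization_agent),
    (("table", "extract", "list all"), extraction_agent),
    (("calculate", "nodal", "compute"), analysis_agent),
]


def route(query: str) -> Agent:
    """Pick the agent whose keywords appear in the query."""
    q = query.lower()
    for keywords, agent in ROUTES:
        if any(k in q for k in keywords):
            return agent
    return retrieval_qa
```

In LangGraph, the same decision would typically live in a routing function passed to `StateGraph.add_conditional_edges`, with each agent as a node; an LLM-based classifier would usually replace the keyword rules.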
- Document Chunking: Intelligent splitting with overlap for context preservation
- Embedding Generation: Dense embeddings using sentence-transformers
- Vector Search: Semantic similarity search via Chroma
- Context Retrieval: Top-K document chunk retrieval for LLM context
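The four RAG steps above (chunk with overlap, embed, search, take the top K) fit in a short sketch. To stay dependency-free it swaps the sentence-transformers model for a toy bag-of-words vector and Chroma for a linear scan with cosine similarity, so only the shape of the pipeline carries over. Counting `chunk_size` and `overlap` in words is also an assumption.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows; each window repeats the previous one's last `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a dense sentence-transformers embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, so retrieval is less likely to return half of it.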
- Calculation framework for wellbore nodal analysis
- Currently uses mocked calculations within an extensible architecture
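Nodal analysis typically locates a well's operating point where the inflow performance relationship (IPR) meets the outflow, or vertical lift performance (VLP), curve. The sketch below is a hypothetical example, not the project's mocked calculation: it uses Vogel's IPR for inflow, a deliberately simplified linear VLP (`p_wh + grad * q`), and bisection on rate. Pressures in psi and rates in STB/d are assumed units.

```python
import math


def vogel_pwf(q: float, p_res: float, q_max: float) -> float:
    """Inverse of Vogel's IPR: flowing bottomhole pressure that delivers rate q."""
    # Vogel: q/q_max = 1 - 0.2*x - 0.8*x**2 with x = pwf/p_res; take the positive root for x.
    x = (-0.2 + math.sqrt(0.04 + 3.2 * (1.0 - q / q_max))) / 1.6
    return x * p_res


def vlp_pwf(q: float, p_wh: float, grad: float) -> float:
    """Hypothetical linear outflow curve: wellhead pressure plus a rate-proportional loss."""
    return p_wh + grad * q


def operating_point(p_res: float, q_max: float, p_wh: float, grad: float, tol: float = 1e-9):
    """Find the rate where inflow and outflow pressures match, by bisection on [0, q_max]."""
    if p_wh >= p_res:
        raise ValueError("wellhead pressure must be below reservoir pressure for flow")
    lo, hi = 0.0, q_max
    # IPR pwf minus VLP pwf is positive at q=0, non-positive at q_max, and decreasing in q.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if vogel_pwf(mid, p_res, q_max) > vlp_pwf(mid, p_wh, grad):
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    return q, vogel_pwf(q, p_res, q_max)
```

For example, `operating_point(3000, 2000, 500, 0.5)` gives a rate between 1500 and 1600 STB/d. A realistic VLP would come from a multiphase flow correlation, which is where an extensible calculation framework would plug in.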
- /health endpoint to check system status:
  - Validates LLM service connectivity
  - Monitors vector store health
  - Reports detailed service status
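A health endpoint like this usually runs each dependency check independently and reports an overall status. The `{"status": ..., "services": {...}}` response shape and the function names are hypothetical; the Ollama probe relies on Ollama's `GET /api/tags` endpoint, which lists the locally available models.

```python
import urllib.request
from typing import Callable, Dict


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False


def check_health(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each named check; one failing service degrades the report instead of crashing it."""
    services = {}
    for name, check in checks.items():
        try:
            services[name] = {"status": "ok" if check() else "unavailable"}
        except Exception as exc:
            services[name] = {"status": "error", "detail": str(exc)}
    overall = "healthy" if all(s["status"] == "ok" for s in services.values()) else "degraded"
    return {"status": overall, "services": services}
```

A vector store check could be any callable that, for instance, counts the documents in the Chroma collection and returns True on success.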
Key backend dependencies:
- LLM & AI: langchain, langgraph, langchain-ollama, ollama
- Vector DB: chromadb, langchain-chroma
- Web: fastapi, uvicorn, python-socketio, websockets
- PDF Processing: pdfplumber, pdfminer.six, PyPDF2, PyMuPDF, camelot
- ML: torch, transformers, sentence-transformers, scikit-learn
- Data: pandas, pydantic, sqlalchemy
- Utilities: python-dotenv, tenacity, httpx
Key frontend dependencies:
- React Ecosystem: react, react-dom, react-router-dom
- State: redux, @reduxjs/toolkit, react-redux
- UI: @mui/material, tailwindcss, lucide-react, react-icons
- Utilities: axios, marked, react-markdown, dompurify
- Forms: react-dropzone (for file uploads)
Prerequisites:
- Docker & Docker Compose
- Or, for a manual setup:
  - Python 3.10+
  - Node.js 18+
  - Ollama (for local LLM inference)
Quick start (Docker):
- Clone the repository:
  git clone <repository-url>
  cd wellbore-data-agent
- Start the services:
  docker-compose up --build
- Access the application:
  - Frontend: http://localhost:5173
  - Backend API: http://localhost:8000
  - API Documentation: http://localhost:8000/docs
Manual setup, backend:
- Install Python dependencies:
  cd backend
  pip install -r requirements.txt
- Configure the environment:
  cp .env.example .env
  # Edit .env with your settings:
  # - OLLAMA_BASE_URL (default: http://localhost:11434)
  # - OLLAMA_MODEL (default: llama2)
- Start the Ollama service (if using a local LLM):
  ollama serve
- Run the application:
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
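The two `.env` settings above could be read with fallbacks to their documented defaults along these lines. `OllamaSettings` and `load_ollama_settings` are hypothetical names; the backend lists python-dotenv, whose `load_dotenv()` would normally copy `.env` entries into `os.environ` before a function like this runs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OllamaSettings:
    base_url: str
    model: str


def load_ollama_settings(env: Optional[Mapping[str, str]] = None) -> OllamaSettings:
    """Read OLLAMA_BASE_URL and OLLAMA_MODEL, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return OllamaSettings(
        # Strip a trailing slash so paths like f"{base_url}/api/tags" join cleanly.
        base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        model=env.get("OLLAMA_MODEL", "llama2"),
    )
```

Passing an explicit mapping keeps the function easy to test without touching the real environment.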
Manual setup, frontend:
- Install Node dependencies:
  cd frontend
  npm install
- Start the development server:
  npm run dev
- Access:
  - Application: http://localhost:5173
Request flow:
User Input
↓
Frontend (React)
↓
WebSocket/HTTP to Backend
↓
FastAPI Router
↓
LangGraph Agent
├→ Document Retrieval (Vector Store)
├→ LLM Service (Ollama)
└→ Tool Execution (Extraction, Summarization, Analysis)
↓
Response
↓
Frontend Display
All Rights Reserved.
This project was originally developed for the SPE (Society of Petroleum Engineers) Hackathon and is still under review. You are welcome to view the code, explore the architecture, and reference the approach for educational or evaluative purposes.
However, reuse, redistribution, or commercial use of the project is not permitted at this time without prior permission from the author.
Developed for the SPE (Society of Petroleum Engineers) Hackathon