A database that understands time and causality. Ingest documents, extract events and causal relationships automatically, then query your knowledge base with natural language questions like "Why did revenue drop in Q3?" and get structured, cited answers by traversing causal event graphs.
```
Document ──► NLP Pipeline ──► Event Store (PostgreSQL + pgvector)
                                   │
                                   ▼
                          Graph Store (Neo4j)
                                   │
                                   ▼
Natural Language Query ──► Query Engine ──► Cited Answer
```
- Ingest documents (PDF, DOCX, TXT, Markdown) or text via the REST API.
- NLP Pipeline extracts entities, events, timestamps, and causal relationships automatically.
- Dual Storage persists events in PostgreSQL (with vector embeddings for semantic search) and causal graphs in Neo4j.
- Query with natural language. The engine classifies your intent, traverses the appropriate stores, and synthesizes a cited answer using a local LLM.
- Causal Reasoning — Ask "why" questions and get answers backed by causal chains extracted from your documents.
- Temporal Awareness — Query by time ranges, fiscal quarters, relative dates ("last month"), and more.
- Semantic Search — Find similar events using vector embeddings (pgvector + all-MiniLM-L6-v2).
- Entity Resolution — Automatic deduplication and linking of entities across documents using fuzzy matching and embedding similarity.
- Structured Citations — Every answer references source documents, timestamps, and confidence scores.
- Fully Local — Runs entirely on your machine. No data leaves your infrastructure. LLM inference via Ollama.
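Temporal awareness depends on turning phrases such as "Q3" or "last month" into concrete date ranges before any store is queried. Here is a minimal sketch of that step. It assumes fiscal quarters line up with calendar quarters (a real deployment may need a configurable fiscal-year start). `quarter_range` and `last_month_range` are hypothetical helpers, not the API of the project's `temporal_parser.py`:

```python
from datetime import date, timedelta


def quarter_range(year: int, quarter: int) -> tuple[date, date]:
    """Return (first_day, last_day) of a quarter.

    Assumption: the fiscal year equals the calendar year.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    start_month = 3 * (quarter - 1) + 1
    start = date(year, start_month, 1)
    # The day before the next quarter starts is the last day of this one.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, start_month + 3, 1)
    return start, next_start - timedelta(days=1)


def last_month_range(today: date) -> tuple[date, date]:
    """Return the full previous calendar month relative to `today`."""
    last_of_prev = today.replace(day=1) - timedelta(days=1)
    return last_of_prev.replace(day=1), last_of_prev


print(quarter_range(2024, 3))  # (datetime.date(2024, 7, 1), datetime.date(2024, 9, 30))
print(last_month_range(date(2024, 3, 15)))  # Feb 1 to Feb 29, 2024 (leap year)
```

The Q3 2024 range matches the `time_range` used in the filtered query example further down.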
| Component | Technology |
|---|---|
| API Framework | FastAPI |
| Event Store | PostgreSQL 16 + pgvector |
| Graph Store | Neo4j 5 Community Edition |
| Cache / Dedup | Redis 7 |
| Message Bus | Redpanda (Kafka-compatible) |
| NLP | spaCy (NER, dependency parsing) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Local LLM | Ollama (llama3.1:8b for synthesis, codellama:7b for query generation) |
| ORM | SQLAlchemy (async) + asyncpg |
| Migrations | Alembic |
| Task Queue | Celery + Redis |
| Validation | Pydantic v2 |
| Logging | structlog (structured JSON) |
| Containerization | Docker + Docker Compose |
```
temporaldb-backend/
├── docker-compose.yml                  # All infrastructure services
├── .env.example                        # Environment variable template
├── requirements.txt                    # Python dependencies
├── alembic/                            # Database migrations
│   └── versions/
├── app/
│   ├── main.py                         # FastAPI entry point
│   ├── config.py                       # Settings from environment variables
│   ├── database/
│   │   ├── postgres.py                 # Async SQLAlchemy engine + session
│   │   ├── neo4j.py                    # Neo4j driver connection
│   │   └── redis.py                    # Redis connection
│   ├── models/
│   │   ├── sql/                        # SQLAlchemy ORM models
│   │   │   ├── event.py                # Event table
│   │   │   ├── entity.py               # Entity table
│   │   │   └── document.py             # Source document table
│   │   └── schemas/                    # Pydantic request/response schemas
│   │       ├── event.py
│   │       ├── entity.py
│   │       ├── query.py
│   │       └── ingest.py
│   ├── ingestion/
│   │   ├── connectors/
│   │   │   ├── base.py                 # Abstract base connector
│   │   │   ├── file.py                 # PDF, DOCX, TXT, Markdown
│   │   │   ├── notion.py               # Notion API (planned)
│   │   │   └── confluence.py           # Confluence REST (planned)
│   │   ├── normalizer.py               # Text cleaning and normalization
│   │   ├── deduplicator.py             # SHA-256 fingerprint + Redis check
│   │   └── producer.py                 # Kafka producer
│   ├── nlp/
│   │   ├── pipeline.py                 # NLP orchestrator
│   │   ├── ner.py                      # Named entity recognition
│   │   ├── coref.py                    # Coreference resolution
│   │   ├── event_extractor.py          # Event extraction (SRL + dep parsing)
│   │   ├── temporal_parser.py          # Date/time parsing
│   │   ├── entity_linker.py            # Fuzzy + embedding entity linking
│   │   ├── causal_extractor.py         # Causal relationship extraction
│   │   └── embedder.py                 # Embedding generation
│   ├── storage/
│   │   ├── event_store.py              # PostgreSQL CRUD for events
│   │   ├── graph_store.py              # Neo4j CRUD for causal graph
│   │   ├── entity_store.py             # Entity CRUD (both stores)
│   │   └── sync.py                     # PostgreSQL → Neo4j sync
│   ├── query/
│   │   ├── orchestrator.py             # Main query handler
│   │   ├── intent.py                   # Intent classification
│   │   ├── temporal_extractor.py       # Time constraint extraction
│   │   ├── entity_resolver.py          # Entity mention → UUID resolution
│   │   ├── planners/
│   │   │   ├── causal_planner.py       # Causal chain traversal
│   │   │   ├── temporal_planner.py     # Time range queries
│   │   │   ├── similarity_planner.py   # Semantic similarity search
│   │   │   └── entity_planner.py       # Entity timeline queries
│   │   ├── generators/
│   │   │   ├── cypher_gen.py           # LLM-generated Cypher queries
│   │   │   └── sql_gen.py              # LLM-generated SQL queries
│   │   └── synthesizer.py              # Answer synthesis with citations
│   ├── llm/
│   │   ├── client.py                   # Ollama client wrapper
│   │   └── prompts.py                  # Prompt templates
│   ├── tasks/
│   │   └── nlp_tasks.py                # Celery async tasks
│   └── api/
│       ├── routes/
│       │   ├── query.py                # POST /query
│       │   ├── ingest.py               # POST /ingest, POST /ingest/file
│       │   ├── events.py               # GET /events, GET /events/{id}
│       │   ├── entities.py             # GET /entities, GET /entities/{id}
│       │   └── graph.py                # GET /graph/entity/{id}
│       └── middleware.py               # CORS, rate limiting, logging, auth
├── workers/
│   └── nlp_worker.py                   # Kafka consumer entry point
└── tests/
    ├── unit/
    │   ├── test_causal_extractor.py
    │   ├── test_temporal_parser.py
    │   ├── test_intent_classifier.py
    │   └── test_entity_linker.py
    └── integration/
        ├── test_ingest_pipeline.py
        └── test_query_orchestrator.py
```
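The `deduplicator.py` entry above pairs a SHA-256 fingerprint with a Redis lookup. Here is a minimal sketch of that idea. It assumes fingerprints are computed over whitespace-collapsed, lowercased text, and it uses an in-memory `set` as a stand-in for Redis. The class name and the `doc:fp:` key prefix are hypothetical:

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of text after collapsing whitespace and lowercasing.

    The normalization rule is an assumption for this sketch.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class InMemoryDeduplicator:
    """Stand-in for the Redis-backed check; `seen` plays the role of the keyspace."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def is_new(self, text: str) -> bool:
        """Return True the first time a fingerprint is seen, False on repeats."""
        key = f"doc:fp:{fingerprint(text)}"  # hypothetical key naming scheme
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


dedup = InMemoryDeduplicator()
print(dedup.is_new("Revenue declined 15% in Q3."))       # True
print(dedup.is_new("  revenue declined 15%   in Q3. "))  # False: same fingerprint
```

With a real Redis instance, redis-py's `client.set(key, 1, nx=True)` performs the check-and-record step atomically: it returns `True` only when the key did not already exist. The in-memory version cannot guarantee that across several workers.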
- Python 3.11+
- Docker and Docker Compose
- Ollama installed locally with the following models pulled:
```bash
ollama pull llama3.1:8b
ollama pull codellama:7b
```
```bash
git clone https://github.com/your-org/temporaldb-backend.git
cd temporaldb-backend
```

```bash
cp .env.example .env
# Edit .env with your preferred settings (defaults work for local development)
```

```bash
docker compose up -d
```

This starts PostgreSQL (with pgvector), Neo4j, Redis, Redpanda, and pgAdmin.
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_trf
```

```bash
alembic upgrade head
```

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

In a separate terminal, start the NLP worker:

```bash
python -m workers.nlp_worker
```

```bash
# API health check
curl http://localhost:8000/health

# pgAdmin UI
open http://localhost:5050

# Neo4j Browser
open http://localhost:7474
```

```bash
# Upload a file
curl -X POST http://localhost:8000/ingest/file \
  -H "X-API-Key: your-api-key" \
  -F "file=@report.pdf"

# Ingest raw text
curl -X POST http://localhost:8000/ingest \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Revenue declined 15% in Q3 due to supply chain disruptions.", "source": "quarterly-report"}'
```

```bash
# Ask a causal question
curl -X POST http://localhost:8000/query \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"question": "Why did revenue drop in Q3?"}'

# With filters
curl -X POST http://localhost:8000/query \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What happened to Acme Corp last quarter?",
    "entity_filter": "Acme Corp",
    "time_range": {"start": "2024-07-01", "end": "2024-09-30"},
    "max_causal_hops": 3
  }'
```

Example response:

```json
{
  "answer": "Revenue dropped 15% in Q3 primarily due to supply chain disruptions that began in July 2024...",
  "confidence": 0.87,
  "causal_chain": [
    {
      "id": "evt-001",
      "description": "Supply chain disruptions reported",
      "ts_start": "2024-07-15T00:00:00Z",
      "confidence": 0.95
    },
    {
      "id": "evt-002",
      "description": "Production delays across manufacturing",
      "ts_start": "2024-08-01T00:00:00Z",
      "confidence": 0.88
    },
    {
      "id": "evt-003",
      "description": "Revenue declined 15% in Q3",
      "ts_start": "2024-10-01T00:00:00Z",
      "confidence": 0.92
    }
  ],
  "sources": [
    {
      "id": "doc-001",
      "source": "quarterly-report",
      "metadata": {"filename": "Q3-2024-Report.pdf"}
    }
  ]
}
```

```bash
# List events (quote the URL so the shell does not treat & as a background operator)
curl "http://localhost:8000/events?entity_id={entity_id}&from_date=2024-01-01&limit=20"

# Get single event
curl http://localhost:8000/events/{event_id}

# Search entities
curl "http://localhost:8000/entities?name=Acme"

# Get entity causal graph
curl http://localhost:8000/graph/entity/{entity_id}
```

| Intent | Example Question | Engine |
|---|---|---|
| CAUSAL_WHY | "Why did revenue drop in Q3?" | Neo4j causal graph traversal |
| TEMPORAL_RANGE | "What happened between July and September?" | PostgreSQL range scan |
| SIMILARITY | "Find events similar to the supply chain disruption" | pgvector cosine similarity |
| ENTITY_TIMELINE | "Show me everything about Acme Corp" | Combined PostgreSQL + Neo4j |
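The table maps each intent to a storage engine. Below is a deliberately simple, rule-based sketch of the classification step. The keyword lists and rule order are assumptions, and the project's `intent.py` may well use a model or the local LLM instead:

```python
import re
from enum import Enum


class Intent(str, Enum):
    CAUSAL_WHY = "CAUSAL_WHY"
    TEMPORAL_RANGE = "TEMPORAL_RANGE"
    SIMILARITY = "SIMILARITY"
    ENTITY_TIMELINE = "ENTITY_TIMELINE"


# Ordered rules: the first pattern that matches wins (keywords and order are assumptions).
_RULES: list[tuple[Intent, re.Pattern[str]]] = [
    (Intent.CAUSAL_WHY, re.compile(r"\b(why|what caused|because|led to)\b", re.I)),
    (Intent.SIMILARITY, re.compile(r"\b(similar|like|resembl\w*)\b", re.I)),
    (Intent.TEMPORAL_RANGE, re.compile(r"\b(between|during|since|before|after|when)\b", re.I)),
]


def classify_intent(question: str) -> Intent:
    """Map a question to an intent, falling back to ENTITY_TIMELINE when no rule matches."""
    for intent, pattern in _RULES:
        if pattern.search(question):
            return intent
    return Intent.ENTITY_TIMELINE


for q in (
    "Why did revenue drop in Q3?",
    "What happened between July and September?",
    "Find events similar to the supply chain disruption",
    "Show me everything about Acme Corp",
):
    print(f"{classify_intent(q).value:<16} {q}")
```

Keyword rules are brittle. For instance, "like" also shows up in questions that have nothing to do with similarity. That is one reason a production classifier would typically combine rules with a learned model.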
```
                   ┌─────────────┐
                   │  REST API   │
                   │  (FastAPI)  │
                   └──────┬──────┘
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │  Ingest  │ │  Query   │ │  Browse  │
        │ Endpoint │ │ Endpoint │ │ Endpoints│
        └────┬─────┘ └────┬─────┘ └────┬─────┘
             │            │            │
             ▼            │            │
        ┌──────────┐      │            │
        │ Redpanda │      │            │
        │ (Kafka)  │      │            │
        └────┬─────┘      │            │
             │            │            │
             ▼            │            │
        ┌──────────┐      │            │
        │   NLP    │      │            │
        │ Pipeline │      │            │
        └────┬─────┘      │            │
             │            │            │
             ▼            ▼            ▼
       ┌─────────────────────────────────────┐
       │            Storage Layer            │
       │  ┌────────────┐  ┌──────────────┐   │
       │  │ PostgreSQL │  │    Neo4j     │   │
       │  │ + pgvector │◄─┤ Causal Graph │   │
       │  └────────────┘  └──────────────┘   │
       └─────────────────────────────────────┘
                          │
                          ▼
                   ┌─────────────┐
                   │   Ollama    │
                   │ (Local LLM) │
                   └─────────────┘
```
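On the query path shown above, CAUSAL_WHY questions go to the Neo4j causal graph, and the request field `max_causal_hops` limits how far back the engine walks from the effect. Here is a minimal in-memory sketch of such a bounded walk. It assumes an edge map from each event to its direct causes and reuses the event ids from the example response. `causal_ancestors` is a hypothetical helper, not the project's `causal_planner.py`; in Cypher, a bounded variable-length pattern such as `[:CAUSED*1..3]` (the relationship type name is hypothetical) would play the same role:

```python
from collections import deque


def causal_ancestors(
    causes: dict[str, list[str]], effect_id: str, max_hops: int = 3
) -> list[tuple[str, int]]:
    """Breadth-first walk backwards from an effect event.

    `causes` maps an event id to the ids of the events that directly caused it.
    Returns (event_id, hop_distance) pairs, nearest causes first. The walk never
    goes beyond `max_hops` and never revisits an event, so cycles are safe.
    """
    visited = {effect_id}
    queue = deque([(effect_id, 0)])
    result: list[tuple[str, int]] = []
    while queue:
        event_id, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for cause_id in causes.get(event_id, []):
            if cause_id not in visited:
                visited.add(cause_id)
                result.append((cause_id, depth + 1))
                queue.append((cause_id, depth + 1))
    return result


graph = {
    "evt-003": ["evt-002"],  # revenue decline <- production delays
    "evt-002": ["evt-001"],  # production delays <- supply chain disruption
}
print(causal_ancestors(graph, "evt-003", max_hops=3))  # [('evt-002', 1), ('evt-001', 2)]
print(causal_ancestors(graph, "evt-003", max_hops=1))  # [('evt-002', 1)]
```

Breadth-first order returns the most direct causes first. That ordering is a natural fit for presenting a chain from proximate to root cause.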
- Named Entity Recognition — spaCy identifies people, organizations, dates, etc.
- Coreference Resolution — Pronouns and references linked to canonical entities.
- Event Extraction — Subject-verb-object tuples extracted from each sentence.
- Temporal Parsing — Date expressions normalized to UTC timestamps.
- Entity Linking — Mentions matched to canonical entities (exact → fuzzy → embedding).
- Causal Extraction — Causal cue phrases identify cause-effect relationships.
- Embedding Generation — Dense vector representations for semantic search.
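The causal extraction step relies on cue phrases. Here is a minimal regex sketch of the idea. It assumes one sentence per call and a small, hypothetical cue list. The real `causal_extractor.py` presumably also draws on the spaCy dependency parse listed in the tech stack:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CausalPair:
    cause: str
    effect: str
    cue: str


# "effect CUE cause", e.g. "Revenue fell due to supply issues." (cue list is an assumption)
_EFFECT_FIRST = re.compile(
    r"^(?P<effect>.+?)\s+(?P<cue>due to|because of|as a result of|owing to)\s+(?P<cause>.+?)[.!?]?$",
    re.I,
)
# "cause CUE effect", e.g. "Supply issues led to a revenue drop."
_CAUSE_FIRST = re.compile(
    r"^(?P<cause>.+?)\s+(?P<cue>led to|resulted in)\s+(?P<effect>.+?)[.!?]?$",
    re.I,
)


def extract_causal_pair(sentence: str) -> Optional[CausalPair]:
    """Return the first cue-phrase match in a single sentence, or None."""
    text = sentence.strip()
    for pattern in (_EFFECT_FIRST, _CAUSE_FIRST):
        match = pattern.match(text)
        if match:
            return CausalPair(
                cause=match.group("cause").strip(),
                effect=match.group("effect").strip(),
                cue=match.group("cue").lower(),
            )
    return None


print(extract_causal_pair("Revenue declined 15% in Q3 due to supply chain disruptions."))
print(extract_causal_pair("Supply chain disruptions led to production delays."))
```

This sketch misses sentence-initial cues ("Because of X, Y") and passive constructions ("X was caused by Y"). A dependency-based extractor can handle both.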
```bash
# Unit tests (no infrastructure required)
pytest tests/unit/ -v

# Integration tests (requires Docker services running)
pytest tests/integration/ -v

# All tests
pytest -v
```

| Service | Port | UI |
|---|---|---|
| FastAPI | 8000 | http://localhost:8000/docs |
| PostgreSQL | 5432 | n/a |
| Neo4j | 7474 / 7687 | http://localhost:7474 |
| Redis | 6379 | n/a |
| Redpanda | 9092 | n/a |
| pgAdmin | 5050 | http://localhost:5050 |
- Fork the repository.
- Create a feature branch: `git checkout -b feature/your-feature`.
- Follow the coding standards:
  - API route handlers are `async` functions.
  - Type hints on all function signatures, including return types.
  - Docstrings on every function.
  - Configuration comes from `app/config.py` only; no hardcoded values.
  - Parameterized database queries; no string interpolation in SQL or Cypher.
  - Structured logging via `structlog`.
  - Consistent error responses: `{"error": str, "detail": str, "code": str}`.
- Write tests for new functionality.
- Submit a pull request.
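The parameterized-query rule is easiest to see in a small example. This sketch uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. The placeholder syntax differs by driver: asyncpg uses `$1`, `$2`; SQLAlchemy `text()` uses `:name`; Neo4j Cypher uses `$name`. The table layout and the `find_events_by_description` helper are hypothetical:

```python
import sqlite3


def find_events_by_description(conn: sqlite3.Connection, term: str) -> list[tuple[str, str]]:
    """Find events whose description contains `term`, passing it as a bound parameter."""
    # The SQL text stays constant; only the bound value changes. Note that % and _
    # inside `term` still act as LIKE wildcards.
    cursor = conn.execute(
        "SELECT id, description FROM events WHERE description LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO events (id, description) VALUES (?, ?)",
    [("evt-001", "Supply chain disruptions reported"), ("evt-003", "Revenue declined 15% in Q3")],
)
print(find_events_by_description(conn, "Revenue"))        # [('evt-003', 'Revenue declined 15% in Q3')]
print(find_events_by_description(conn, "x' OR '1'='1"))   # [] : the quote is data, not SQL
```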
MIT