Noblenog/Atlas


Atlas

A full-stack research tool that uses a multi-agent LLM pipeline to reconstruct historical daily life for any region and year, then generates immersive second-person narratives grounded in primary sources.

Built as a Master's dissertation project evaluating multi-agent LLM pipelines for historically grounded narrative generation.


Overview

Given a region (e.g. Florence), a year (e.g. 1348), and an optional role (e.g. merchant), Atlas:

  1. Researches — a Historian agent gathers facts from YouTube, Wikidata, and Wikipedia.
  2. Structures — a Sociologist agent maps facts to a Seshat Ontology (social classes, professions, beliefs, diet, housing) and stores it in Neo4j.
  3. Narrates — a Narrator agent generates a streaming second-person narrative, optionally with branching interactive choices.
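The three stages above can be pictured as functions handing a shared state to one another. The sketch below is plain Python with stubbed agents, not the project's LangGraph code: every name in it (PipelineState, historian, sociologist, narrator, run_pipeline) and all stub data are hypothetical and only illustrate the order of the hand-offs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PipelineState:
    """Shared state passed from one agent to the next (hypothetical shape)."""
    region: str
    year: int
    role: Optional[str] = None
    facts: list[str] = field(default_factory=list)
    ontology: dict[str, list[str]] = field(default_factory=dict)
    narrative: str = ""


def historian(state: PipelineState) -> PipelineState:
    # Stub: a real agent would query YouTube, Wikidata, and Wikipedia here.
    state.facts = [f"placeholder fact about {state.region} in {state.year}"]
    return state


def sociologist(state: PipelineState) -> PipelineState:
    # Stub: file every fact under a single ontology category.
    state.ontology = {"professions": list(state.facts)}
    return state


def narrator(state: PipelineState) -> PipelineState:
    # Stub: a real agent would stream LLM output; this builds one sentence.
    who = state.role or "resident"
    state.narrative = f"You are a {who} in {state.region}, {state.year}."
    return state


def run_pipeline(region: str, year: int, role: Optional[str] = None) -> PipelineState:
    """Run the three stub agents in order: research, structure, narrate."""
    state = PipelineState(region, year, role)
    for agent in (historian, sociologist, narrator):
        state = agent(state)
    return state
```

Calling run_pipeline("Florence", 1348, "merchant") returns a state whose facts, ontology, and narrative fields were each filled by exactly one stage, which mirrors the Researches, Structures, Narrates sequence.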

The frontend provides an interactive historical map, video browser, narrative view, and comparative role analysis.


Architecture

┌─────────────────────────────────────────────────────────┐
│  React Frontend (port 3000)                              │
│  Map · Video Browser · Narrative · Comparison           │
└───────────────────────┬─────────────────────────────────┘
                        │ HTTP / SSE
┌───────────────────────▼─────────────────────────────────┐
│  Flask API (port 5001)                                   │
│  20 endpoints across 8 blueprints                        │
├──────────────────┬──────────────────┬────────────────────┤
│ LangGraph Agents │  Data Sources    │  Geocoder          │
│  Historian       │  YouTube API     │  TimeMap           │
│  Sociologist     │  Wikidata SPARQL │  Historical polity │
│  Narrator        │  Wikipedia REST  │  boundaries        │
│                  │  Europeana       │  Nominatim         │
│                  │  Semantic Scholar│                    │
│                  │  Internet Archive│                    │
├──────────────────┴──────────────────┴────────────────────┤
│  Neo4j (bolt://localhost:7687)                           │
│  Polity · Video · InfoChunk · Keyword · Person nodes     │
└─────────────────────────────────────────────────────────┘

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 19, Mapbox GL, vis-network, Embla Carousel |
| Backend | Python 3, Flask 3, Flask-CORS |
| Agent orchestration | LangGraph, LangChain, Anthropic Claude |
| Database | Neo4j 5 (local) |
| Geocoding | Nominatim, Shapely, historical basemaps |
| Data sources | YouTube Data API v3, Wikidata, Wikipedia, Europeana, Semantic Scholar, Internet Archive |
| Embeddings | sentence-transformers |
| Testing | pytest (backend), Jest + React Testing Library (frontend) |

Prerequisites

  • Python 3.10+
  • Node.js 18+ and npm
  • Neo4j 5 running locally at bolt://localhost:7687
  • API keys (see Configuration)

Installation

1. Clone the repository

git clone <repo-url>
cd LLM-Scraper-Project

2. Backend

cd backend
pip install -r requirements.txt

3. Frontend

cd ../frontend          # step 2 left you in backend/
npm install

Windows Setup

Step 1 — Install Prerequisites

Install these in order:

| Software | Source | Notes |
| --- | --- | --- |
| Python 3.10+ | python.org/downloads | ✅ Check "Add Python to PATH" during install |
| Node.js 18+ | nodejs.org | LTS version recommended |
| Git for Windows | git-scm.com | Installs Git Bash (required for npm scripts) |
| Neo4j Desktop | neo4j.com/download | Bundles its own Java; create a local DB after install |

VSCode extensions to install:

  • Python (Microsoft)
  • Pylance

Step 2 — Configure VSCode Terminal to Use Git Bash

The npm scripts use Unix-style environment-variable syntax (NODE_OPTIONS=... react-scripts start), which does not work in PowerShell or CMD, so run them from Git Bash.

In VSCode: Ctrl+Shift+P → "Terminal: Select Default Profile" → Git Bash

All terminal commands below assume Git Bash.


Step 3 — Copy / Clone the Project

git clone <repo-url>
cd LLM-Scraper-Project

Or transfer the zip archive to the Windows machine and extract it.


Step 4 — Backend Setup

cd backend

# Create and activate a virtual environment
python -m venv venv                  # Note: "python" not "python3" on Windows
source venv/Scripts/activate         # Git Bash path (not venv/bin/activate)

# Install dependencies (sentence-transformers is large — takes a few minutes)
pip install -r requirements.txt

If pip install fails on sentence-transformers or shapely, install Microsoft C++ Build Tools (free), restart, and retry.


Step 5 — Create backend/.env

Create a new file at backend/.env:

ANTHROPIC_API_KEY=sk-ant-...
YOUTUBE_API_KEY=AIza...
EUROPEANA_API_KEY=...
NEO4J_PASSWORD=your-neo4j-password
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
SEMANTIC_SCHOLAR_API_KEY=         # optional
YOUTUBE_COOKIES_FILE=             # optional — if used, use forward slashes:
                                  # C:/Users/YourName/Downloads/youtube_cookies.txt

Step 6 — Frontend Setup

cd ../frontend          # Step 4 left this terminal in backend/
npm install

Create frontend/.env.local:

REACT_APP_MAPBOX_TOKEN=pk.eyJ1...
REACT_APP_NEO4J_PASSWORD=your-neo4j-password

Step 7 — Start Neo4j

  1. Open Neo4j Desktop
  2. Create a new local database (or open an existing one)
  3. Set the password to match NEO4J_PASSWORD in your .env
  4. Click Start — wait until the status shows "Running"

Step 8 — Run the Project

Open two Git Bash terminals in VSCode (Ctrl+Shift+5 to split):

Terminal 1 — Backend:

cd backend
source venv/Scripts/activate
python main.py api
# Should print: Listening on http://localhost:5001

Terminal 2 — Frontend:

cd frontend
npm start
# Opens http://localhost:3000 in the browser automatically

Mac → Windows Differences at a Glance

| | Mac | Windows (Git Bash) |
| --- | --- | --- |
| Python command | python3 | python |
| Venv activate | source venv/bin/activate | source venv/Scripts/activate |
| npm scripts | work as-is | work as-is in Git Bash only |
| .env file paths | /Users/name/... | C:/Users/name/... |
| Neo4j install | Homebrew or Desktop | Neo4j Desktop only |

The project is fully cross-platform when Git Bash is used as the VSCode terminal. The only configuration difference is YOUTUBE_COOKIES_FILE in .env: if you use that feature, give it a Windows path written with forward slashes (e.g. C:/Users/YourName/Downloads/youtube_cookies.txt).
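If you copy a path from Windows Explorer it will contain backslashes. One way to turn it into the forward-slash form that Step 5 asks for is the standard-library pathlib module, sketched below. The helper name to_env_path is made up for this example.

```python
from pathlib import PureWindowsPath


def to_env_path(windows_path: str) -> str:
    """Convert a backslash-separated Windows path to forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


# Paste the result after YOUTUBE_COOKIES_FILE= in backend/.env
print(to_env_path(r"C:\Users\YourName\Downloads\youtube_cookies.txt"))
```

PureWindowsPath only manipulates the string, so this runs on any platform and does not require the file to exist.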


Configuration

Backend — backend/.env

ANTHROPIC_API_KEY=sk-ant-...
YOUTUBE_API_KEY=AIza...
EUROPEANA_API_KEY=...
NEO4J_PASSWORD=your-neo4j-password
NEO4J_URI=bolt://localhost:7687   # optional, defaults to this
NEO4J_USER=neo4j                  # optional, defaults to neo4j
SEMANTIC_SCHOLAR_API_KEY=         # optional — higher rate limits
YOUTUBE_COOKIES_FILE=             # optional — for age-restricted videos
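
A quick way to catch a missing key before starting the server is to check the environment up front. The sketch below is an illustration, not the project's own loader: load_settings, REQUIRED_KEYS, and DEFAULTS are hypothetical names, and it assumes the four keys not marked optional above are required.

```python
import os

# Assumption: keys not marked "optional" in backend/.env are required.
REQUIRED_KEYS = (
    "ANTHROPIC_API_KEY",
    "YOUTUBE_API_KEY",
    "EUROPEANA_API_KEY",
    "NEO4J_PASSWORD",
)
# Defaults as documented in the comments of the .env listing.
DEFAULTS = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USER": "neo4j"}


def load_settings(env=None):
    """Return settings with defaults applied; raise if a required key is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    for key, default in DEFAULTS.items():
        # An empty or absent value falls back to the documented default.
        settings[key] = env.get(key) or default
    return settings
```

Passing a plain dict as env makes the check easy to exercise in tests without touching the real environment.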

Frontend — frontend/.env.local

REACT_APP_MAPBOX_TOKEN=pk.eyJ1...
REACT_APP_NEO4J_PASSWORD=your-neo4j-password

Running the project

API server (required for the frontend)

cd backend
python3 main.py api
# Listening on http://localhost:5001

Frontend dev server

cd frontend
npm start
# Opens http://localhost:3000

CLI — run the research pipeline directly

cd backend
python3 main.py research -r Florence -y 1348
python3 main.py research -r "Medieval England" -y 1200 --role "peasant farmer"

CLI — interactive narrative REPL

cd backend
python3 main.py interactive

API Reference

All endpoints are served at http://localhost:5001.

Research

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/research/stream | Run the 3-agent pipeline; returns SSE stream of narrative chunks |
| POST | /api/research/interactive | Advance a branching narrative session |
| GET | /api/research/suggest-roles | LLM-suggested roles for a region/year |
| POST | /api/research/compare | Generate parallel narratives for multiple roles |

/api/research/stream body: { "region": "Florence", "year": 1348, "role": "merchant" }

SSE event types: status, location, ontology, narrative_chunk, complete, error
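
A client has to split the stream into individual events before it can react to them. The sketch below parses standard Server-Sent Events framing (event: and data: lines, with a blank line ending each event). It assumes the server puts the event type in the event: field and sends JSON in data:, which this README does not confirm. The function name parse_sse and the payload fields in the sample ("message", "text") are hypothetical.

```python
import json


def parse_sse(lines):
    """Yield (event_type, payload) pairs from an iterable of SSE text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


# Hypothetical sample of what the stream might contain.
sample = [
    "event: status\n",
    'data: {"message": "researching"}\n',
    "\n",
    "event: narrative_chunk\n",
    'data: {"text": "You wake before dawn."}\n',
    "\n",
]
for kind, payload in parse_sse(sample):
    print(kind, payload)
```

In practice you would feed parse_sse the decoded lines of the HTTP response to POST /api/research/stream and branch on the event type, for example appending narrative_chunk payloads to the displayed text and stopping on complete or error.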

Videos

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/videos/by-role | YouTube videos grouped by historical role |
| GET | /api/video/transcript | Timestamped transcript for a video |
| POST | /api/video/save | Persist a video + transcript to Neo4j |
| GET | /api/video/annotated-transcript | LLM-extracted location mentions |

Geography

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/polygons | GeoJSON historical borders for a year |
| GET | /api/polities | All known historical polities |
| GET | /api/polities/at-point | Polity at a lat/lon/year |
| GET | /api/polities/search | Autocomplete polity search |
| GET | /api/roles/suggest | Suggested roles for a polity/year |
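
As an illustration of calling one of these GET endpoints, the sketch below builds the URL for /api/polities/at-point. The query parameter names lat, lon, and year are guesses based on the endpoint description and are not confirmed by this README; check the blueprint source before relying on them. The helper name polity_at_point_url is also made up.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5001"


def polity_at_point_url(lat: float, lon: float, year: int) -> str:
    """Build the GET URL for /api/polities/at-point (parameter names assumed)."""
    query = urlencode({"lat": lat, "lon": lon, "year": year})
    return f"{BASE_URL}/api/polities/at-point?{query}"


# Approximate coordinates for Florence, year 1348.
print(polity_at_point_url(43.77, 11.25, 1348))
```

The resulting string can be passed to any HTTP client, such as urllib.request.urlopen, once the API server is running.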

Knowledge graph & misc

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/polity/context | Neo4j knowledge for a polity |
| GET | /api/graph/keyword | Keyword relationships from Neo4j |
| GET | /api/keyword/discover | Historical periods + enriched videos for a keyword |
| GET | /api/figures/at-year | Wikidata figures alive at a year |
| GET | /api/wikipedia/excerpt | Wikipedia summary proxy |
| GET | /api/health | Flask + Neo4j status |

Experiments

Five evaluation scripts support Chapter 4 of the dissertation. All are run from the backend/ directory and write results to experiments/results/.

cd backend

# 1. Geocoder accuracy — 30 ground-truth region/year pairs (0–1800 CE)
python3 ../experiments/01_geocoder_accuracy.py

# 2. Pipeline benchmark — end-to-end latency for 8 region/year combos
python3 ../experiments/02_pipeline_benchmark.py        # live API
python3 ../experiments/02_pipeline_benchmark.py --mock # cached responses

# 3. Relevance scoring — precision/recall vs. 40 synthetic videos
python3 ../experiments/03_relevance_scoring.py

# 4. Source coverage — quality and latency metrics per data source
python3 ../experiments/04_source_coverage.py

# 5. Narrative grounding — RAG vs. no-RAG narrative grounding rate
python3 ../experiments/05_narrative_grounding.py

Results are saved as CSV files and consolidated into experiments/results/chapter4_tables.xlsx.
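
For readers unfamiliar with the metrics named in experiment 3, the sketch below shows the textbook definitions of precision and recall over sets of items. It is not taken from 03_relevance_scoring.py, whose exact scoring may differ. The function name and the video IDs are hypothetical.

```python
def precision_recall(predicted: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for a predicted set against a ground-truth set."""
    true_positives = len(predicted & relevant)
    # Precision: share of flagged items that are truly relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly relevant items that were flagged.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: 3 videos flagged, 4 truly relevant, 2 in common.
flagged = {"v1", "v2", "v3"}
truth = {"v2", "v3", "v4", "v5"}
print(precision_recall(flagged, truth))
```

In this example two of the three flagged videos are relevant (precision 2/3) and two of the four relevant videos were found (recall 1/2).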


Testing

Backend

cd backend
pytest                   # all tests
pytest --cov=.           # with coverage

Frontend

cd frontend
npm test                 # watch mode
npm test -- --watchAll=false  # single run

External API quotas

| Service | Key required | Default quota |
| --- | --- | --- |
| YouTube Data API v3 | Yes | 10,000 units/day |
| Anthropic Claude | Yes | Per account plan |
| Europeana | Yes | 1,000 req/day |
| Wikidata SPARQL | No | Public |
| Semantic Scholar | Optional | 100 req/5 min (unauthenticated) |
| Internet Archive | No | ~1 req/2 s |
| Nominatim | No | ~1 req/s |
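
Per-second limits such as those for Nominatim and the Internet Archive are usually respected by spacing out calls on the client side. The sketch below shows one simple way to do that with a minimum-interval throttle. It is an illustration, not the project's implementation, and the class name MinIntervalThrottle is hypothetical.

```python
import time


class MinIntervalThrottle:
    """Block so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time the previous call was released, if any

    def wait(self) -> float:
        """Sleep if needed; return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


nominatim_throttle = MinIntervalThrottle(1.0)   # ~1 req/s
archive_throttle = MinIntervalThrottle(2.0)     # ~1 req/2 s
```

Call nominatim_throttle.wait() immediately before each Nominatim request; the first call returns at once and later calls pause only as long as needed to keep the spacing.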
