Atlas

A full-stack research tool that uses a multi-agent LLM pipeline to reconstruct historical daily life for any region and year, then generates immersive second-person narratives grounded in primary sources.

Built as a Master's dissertation project evaluating multi-agent LLM pipelines for historically grounded narrative generation.

Overview

Given a region (e.g. Florence), a year (e.g. 1348), and an optional role (e.g. merchant), Atlas:

Researches — a Historian agent gathers facts from YouTube, Wikidata, and Wikipedia.
Structures — a Sociologist agent maps facts to a Seshat Ontology (social classes, professions, beliefs, diet, housing) and stores it in Neo4j.
Narrates — a Narrator agent generates a streaming second-person narrative, optionally with branching interactive choices.
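The three stages can be pictured as functions that each read and extend a shared state. The sketch below is plain Python, not LangGraph, and every name in it (`PipelineState`, `historian`, `sociologist`, `narrator`, the placeholder fact) is hypothetical. The real agents call an LLM and external data sources, so treat this only as an illustration of the hand-off order.

```python
from typing import Optional, TypedDict


class PipelineState(TypedDict, total=False):
    # Hypothetical state shape; the project's real state may differ.
    region: str
    year: int
    role: Optional[str]
    facts: list[str]
    ontology: dict[str, list[str]]
    narrative: str


def historian(state: PipelineState) -> PipelineState:
    # Stand-in for source gathering: the real agent queries external APIs.
    facts = [f"placeholder fact about {state['region']} in {state['year']}"]
    return {**state, "facts": facts}


def sociologist(state: PipelineState) -> PipelineState:
    # Stand-in for ontology mapping: file every fact under one category.
    return {**state, "ontology": {"professions": list(state["facts"])}}


def narrator(state: PipelineState) -> PipelineState:
    # Stand-in for generation: the real agent streams LLM output.
    role = state.get("role") or "resident"
    text = f"You are a {role} in {state['region']}, {state['year']}."
    return {**state, "narrative": text}


def run_pipeline(region: str, year: int, role: Optional[str] = None) -> PipelineState:
    state: PipelineState = {"region": region, "year": year, "role": role}
    # Each stage receives the previous stage's output.
    for stage in (historian, sociologist, narrator):
        state = stage(state)
    return state


result = run_pipeline("Florence", 1348, "merchant")
print(result["narrative"])  # You are a merchant in Florence, 1348.
```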

The frontend provides an interactive historical map, video browser, narrative view, and comparative role analysis.

Architecture

┌─────────────────────────────────────────────────────────┐
│  React Frontend (port 3000)                              │
│  Map · Video Browser · Narrative · Comparison           │
└───────────────────────┬─────────────────────────────────┘
                        │ HTTP / SSE
┌───────────────────────▼─────────────────────────────────┐
│  Flask API (port 5001)                                   │
│  20 endpoints across 8 blueprints                        │
├──────────────────┬──────────────────┬────────────────────┤
│ LangGraph Agents │  Data Sources    │  Geocoder          │
│  Historian       │  YouTube API     │  TimeMap           │
│  Sociologist     │  Wikidata SPARQL │  Historical polity │
│  Narrator        │  Wikipedia REST  │  boundaries        │
│                  │  Europeana       │  Nominatim         │
│                  │  Semantic Scholar│                    │
│                  │  Internet Archive│                    │
├──────────────────┴──────────────────┴────────────────────┤
│  Neo4j (bolt://localhost:7687)                           │
│  Polity · Video · InfoChunk · Keyword · Person nodes     │
└─────────────────────────────────────────────────────────┘

Tech Stack

Layer	Technology
Frontend	React 19, Mapbox GL, vis-network, Embla Carousel
Backend	Python 3, Flask 3, Flask-CORS
Agent orchestration	LangGraph, LangChain, Anthropic Claude
Database	Neo4j 5 (local)
Geocoding	Nominatim, Shapely, historical basemaps
Data sources	YouTube Data API v3, Wikidata, Wikipedia, Europeana, Semantic Scholar, Internet Archive
Embeddings	sentence-transformers
Testing	pytest (backend), Jest + React Testing Library (frontend)

Prerequisites

Python 3.10+
Node.js 18+ and npm
Neo4j 5 running locally at bolt://localhost:7687
API keys (see Configuration)

Installation

1. Clone the repository

git clone <repo-url>
cd LLM-Scraper-Project

2. Backend

cd backend
pip install -r requirements.txt

3. Frontend

cd frontend                          # from the repository root, not from backend/
npm install

Windows Setup

Step 1 — Install Prerequisites

Install these in order:

Software	Source	Notes
Python 3.10+	python.org/downloads	✅ Check "Add Python to PATH" during install
Node.js 18+	nodejs.org	LTS version recommended
Git for Windows	git-scm.com	Installs Git Bash — required for npm scripts
Neo4j Desktop	neo4j.com/download	Bundles its own Java; create a local DB after install

VSCode extensions to install:

Python (Microsoft)
Pylance

Step 2 — Configure VSCode Terminal to Use Git Bash

The npm scripts set environment variables with Unix-style syntax (NODE_OPTIONS=... react-scripts start), which PowerShell and CMD do not understand, so they must be run in Git Bash.

In VSCode: Ctrl+Shift+P → "Terminal: Select Default Profile" → Git Bash

All terminal commands below assume Git Bash.

Step 3 — Copy / Clone the Project

git clone <repo-url>
cd LLM-Scraper-Project

Or transfer the zip archive to the Windows machine and extract it.

Step 4 — Backend Setup

cd backend

# Create and activate a virtual environment
python -m venv venv                  # Note: "python" not "python3" on Windows
source venv/Scripts/activate         # Git Bash path (not venv/bin/activate)

# Install dependencies (sentence-transformers is large — takes a few minutes)
pip install -r requirements.txt

If pip install fails on sentence-transformers or shapely, install Microsoft C++ Build Tools (free), restart, then retry.

Step 5 — Create `backend/.env`

Create a new file at backend/.env:

ANTHROPIC_API_KEY=sk-ant-...
YOUTUBE_API_KEY=AIza...
EUROPEANA_API_KEY=...
NEO4J_PASSWORD=your-neo4j-password
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
SEMANTIC_SCHOLAR_API_KEY=         # optional
YOUTUBE_COOKIES_FILE=             # optional — if used, use forward slashes:
                                  # C:/Users/YourName/Downloads/youtube_cookies.txt

Step 6 — Frontend Setup

cd frontend                          # from the repository root, not from backend/
npm install

Create frontend/.env.local:

REACT_APP_MAPBOX_TOKEN=pk.eyJ1...
REACT_APP_NEO4J_PASSWORD=your-neo4j-password

Step 7 — Start Neo4j

Open Neo4j Desktop
Create a new local database (or open an existing one)
Set the password to match NEO4J_PASSWORD in backend/.env
Click Start — wait until the status shows "Running"
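Before starting the backend, it can help to confirm that something is listening on the Bolt port. The sketch below is a minimal reachability check using only the standard library. The function name `bolt_port_open` is hypothetical, and the check only proves that the TCP port accepts connections, not that the password is correct.

```python
import socket
from urllib.parse import urlparse


def bolt_port_open(uri: str = "bolt://localhost:7687", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Bolt host and port succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 7687
    try:
        # Open and immediately close a connection; no Bolt handshake is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `bolt_port_open()` with no arguments checks the default URI used throughout this README.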

Step 8 — Run the Project

Open two Git Bash terminals in VSCode (Ctrl+Shift+5 to split):

Terminal 1 — Backend:

cd backend
source venv/Scripts/activate
python main.py api
# Should print: Listening on http://localhost:5001

Terminal 2 — Frontend:

cd frontend
npm start
# Opens http://localhost:3000 in the browser automatically

Mac → Windows Differences at a Glance

	Mac	Windows (Git Bash)
Python command	`python3`	`python`
Venv activate	`source venv/bin/activate`	`source venv/Scripts/activate`
npm scripts	work as-is	work as-is in Git Bash only
`.env` file paths	`/Users/name/...`	`C:/Users/name/...`
Neo4j install	Homebrew or Desktop	Neo4j Desktop only

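The project is fully cross-platform when Git Bash is used as the VSCode terminal. The only configuration difference is YOUTUBE_COOKIES_FILE in backend/.env: if that feature is needed, give it a Windows drive-letter path written with forward slashes, for example C:/Users/YourName/Downloads/youtube_cookies.txt.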
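If a cookies path has been copied from Windows Explorer with backslashes, it can be converted to the forward-slash form with the standard library. This is a small sketch; the helper name `to_env_path` is hypothetical.

```python
from pathlib import PureWindowsPath


def to_env_path(windows_path: str) -> str:
    """Convert a backslash Windows path to the forward-slash form for .env."""
    return PureWindowsPath(windows_path).as_posix()


print(to_env_path(r"C:\Users\YourName\Downloads\youtube_cookies.txt"))
# C:/Users/YourName/Downloads/youtube_cookies.txt
```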

Configuration

Backend — `backend/.env`

ANTHROPIC_API_KEY=sk-ant-...
YOUTUBE_API_KEY=AIza...
EUROPEANA_API_KEY=...
NEO4J_PASSWORD=your-neo4j-password
NEO4J_URI=bolt://localhost:7687   # optional, defaults to this
NEO4J_USER=neo4j                  # optional, defaults to neo4j
SEMANTIC_SCHOLAR_API_KEY=         # optional — higher rate limits
YOUTUBE_COOKIES_FILE=             # optional — for age-restricted videos
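A quick way to catch a missing key before launching the server is to check the environment against the list above. The sketch below assumes the variables are already loaded into the process environment; how the backend actually loads backend/.env is not stated here. The helper names `missing_keys` and `neo4j_settings` are hypothetical.

```python
import os
from typing import Mapping

# Keys the Configuration section lists without an "optional" note.
REQUIRED_KEYS = (
    "ANTHROPIC_API_KEY",
    "YOUTUBE_API_KEY",
    "EUROPEANA_API_KEY",
    "NEO4J_PASSWORD",
)

# Defaults stated in the Configuration section.
DEFAULTS = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USER": "neo4j"}


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required keys that are unset or blank in the given mapping."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def neo4j_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve Neo4j settings, falling back to the documented defaults."""
    return {key: env.get(key) or default for key, default in DEFAULTS.items()}
```

For example, `missing_keys({"ANTHROPIC_API_KEY": "sk-ant-..."})` would report the other three required keys.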

Frontend — `frontend/.env.local`

REACT_APP_MAPBOX_TOKEN=pk.eyJ1...
REACT_APP_NEO4J_PASSWORD=your-neo4j-password

Running the project

API server (required for the frontend)

cd backend
python3 main.py api
# Listening on http://localhost:5001

Frontend dev server

cd frontend
npm start
# Opens http://localhost:3000

CLI — run the research pipeline directly

cd backend
python3 main.py research -r Florence -y 1348
python3 main.py research -r "Medieval England" -y 1200 --role "peasant farmer"

CLI — interactive narrative REPL

cd backend
python3 main.py interactive
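The three subcommands shown above (api, research, interactive) map naturally onto argparse subparsers. The sketch below is one way such an interface could be declared; it is not taken from main.py, and the long option names `--region` and `--year` are assumptions, since only the short forms -r and -y appear in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("api", help="start the Flask API server")
    sub.add_parser("interactive", help="start the narrative REPL")

    research = sub.add_parser("research", help="run the research pipeline")
    # Long names are assumed; the README only shows -r and -y.
    research.add_argument("-r", "--region", required=True)
    research.add_argument("-y", "--year", type=int, required=True)
    research.add_argument("--role", default=None)
    return parser


args = build_parser().parse_args(
    ["research", "-r", "Medieval England", "-y", "1200", "--role", "peasant farmer"]
)
print(args.command, args.region, args.year, args.role)
# research Medieval England 1200 peasant farmer
```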

API Reference

All endpoints are served at http://localhost:5001.

Research

Method	Path	Description
`POST`	`/api/research/stream`	Run the 3-agent pipeline; returns SSE stream of narrative chunks
`POST`	`/api/research/interactive`	Advance a branching narrative session
`GET`	`/api/research/suggest-roles`	LLM-suggested roles for a region/year
`POST`	`/api/research/compare`	Generate parallel narratives for multiple roles

/api/research/stream body: { "region": "Florence", "year": 1348, "role": "merchant" }

SSE event types: status, location, ontology, narrative_chunk, complete, error
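A client has to split the stream into events before it can react to each type. The sketch below parses the standard SSE wire format with the standard library only. It assumes the server sends the event type in an `event:` field and JSON in `data:`, which this README does not confirm; the payload keys in the sample (`message`, `text`) and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from text lines in SSE wire format."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def stream_research(region: str, year: int, role: Optional[str] = None,
                    base: str = "http://localhost:5001") -> Iterator[tuple[str, str]]:
    """POST to /api/research/stream and yield parsed events (needs a running API)."""
    payload = {"region": region, "year": year}
    if role:
        payload["role"] = role
    request = urllib.request.Request(
        base + "/api/research/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        yield from parse_sse(line.decode("utf-8") for line in response)


# Offline demonstration with hand-written sample lines.
sample = [
    "event: status\n",
    'data: {"message": "researching"}\n',
    "\n",
    "event: narrative_chunk\n",
    'data: {"text": "You wake before dawn."}\n',
    "\n",
]
for event, data in parse_sse(sample):
    print(event, json.loads(data))
```

With the API running, iterating over `stream_research("Florence", 1348, "merchant")` would yield the same kind of pairs from the live endpoint.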

Videos

Method	Path	Description
`GET`	`/api/videos/by-role`	YouTube videos grouped by historical role
`GET`	`/api/video/transcript`	Timestamped transcript for a video
`POST`	`/api/video/save`	Persist a video + transcript to Neo4j
`GET`	`/api/video/annotated-transcript`	LLM-extracted location mentions

Geography

Method	Path	Description
`GET`	`/api/polygons`	GeoJSON historical borders for a year
`GET`	`/api/polities`	All known historical polities
`GET`	`/api/polities/at-point`	Polity at a lat/lon/year
`GET`	`/api/polities/search`	Autocomplete polity search
`GET`	`/api/roles/suggest`	Suggested roles for a polity/year
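The geography endpoints are plain GET requests, so a query is just a URL with parameters. The sketch below builds one for /api/polities/at-point. The parameter names `lat`, `lon` and `year` are assumed from the table description and may not match the actual API; `at_point_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5001"


def at_point_url(lat: float, lon: float, year: int) -> str:
    # Parameter names are assumptions based on the endpoint description.
    query = urlencode({"lat": lat, "lon": lon, "year": year})
    return f"{BASE_URL}/api/polities/at-point?{query}"


print(at_point_url(43.77, 11.25, 1348))
# http://localhost:5001/api/polities/at-point?lat=43.77&lon=11.25&year=1348
```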

Knowledge graph & misc

Method	Path	Description
`GET`	`/api/polity/context`	Neo4j knowledge for a polity
`GET`	`/api/graph/keyword`	Keyword relationships from Neo4j
`GET`	`/api/keyword/discover`	Historical periods + enriched videos for a keyword
`GET`	`/api/figures/at-year`	Wikidata figures alive at a year
`GET`	`/api/wikipedia/excerpt`	Wikipedia summary proxy
`GET`	`/api/health`	Flask + Neo4j status

Experiments

Five evaluation scripts support Chapter 4 of the dissertation. All are run from the backend/ directory and write results to experiments/results/.

cd backend

# 1. Geocoder accuracy — 30 ground-truth region/year pairs (0–1800 CE)
python3 ../experiments/01_geocoder_accuracy.py

# 2. Pipeline benchmark — end-to-end latency for 8 region/year combos
python3 ../experiments/02_pipeline_benchmark.py        # live API
python3 ../experiments/02_pipeline_benchmark.py --mock # cached responses

# 3. Relevance scoring — precision/recall vs. 40 synthetic videos
python3 ../experiments/03_relevance_scoring.py

# 4. Source coverage — quality and latency metrics per data source
python3 ../experiments/04_source_coverage.py

# 5. Narrative grounding — RAG vs. no-RAG narrative grounding rate
python3 ../experiments/05_narrative_grounding.py

Results are saved as CSV files and consolidated into experiments/results/chapter4_tables.xlsx.
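For readers unfamiliar with the metrics named in experiment 3, precision and recall can be computed from two sets of identifiers. This is a generic sketch, not the code in 03_relevance_scoring.py, and the video IDs are made up for illustration.

```python
def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |relevant|."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical video IDs, not data from the experiment.
predicted = {"v1", "v2", "v3", "v4"}
relevant = {"v2", "v3", "v5"}
print(precision_recall(predicted, relevant))
# (0.5, 0.6666666666666666)
```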

Testing

Backend

cd backend
pytest                   # all tests
pytest --cov=.           # with coverage

Frontend

cd frontend
npm test                 # watch mode
npm test -- --watchAll=false  # single run

External API quotas

Service	Key required	Default quota
YouTube Data API v3	Yes	10,000 units/day
Anthropic Claude	Yes	Per account plan
Europeana	Yes	1,000 req/day
Wikidata SPARQL	No	Public
Semantic Scholar	Optional	100 req/5 min (unauthenticated)
Internet Archive	No	~1 req/2 s
Nominatim	No	~1 req/s
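Staying under the per-second limits for Nominatim and Internet Archive usually means spacing requests out on the client side. The sketch below shows one simple approach, a minimum-interval throttle. The `Throttle` class is hypothetical and is not necessarily how the backend handles rate limits.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block so that consecutive calls are at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Intervals taken from the quota table.
nominatim = Throttle(1.0)         # ~1 req/s
internet_archive = Throttle(2.0)  # ~1 req/2 s
```

Calling `nominatim.wait()` immediately before each geocoding request keeps calls at least one second apart. The injectable `clock` and `sleep` arguments exist so the class can be tested without real delays.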
