WebMapping

Introduction

A comprehensive web mapping application that discovers and maps connections between websites. Starting from a given URL, it crawls outward to find linked websites. It can use plugins that can captures screenshots, detects bounding boxes, and it stores all such information in a Neo4j database.

The application includes a web-based dashboard for monitoring crawling progress and viewing captured screenshots.

Prerequisites

The project requires Python 3.12+ and all dependencies are managed through pyproject.toml and can be installed using the Makefile commands.

Setup

Running with Docker Compose

The application runs as two separate containers: a crawler and a dashboard. Start all services using:

make compose

This starts:

Dashboard: Web interface at http://localhost:8000
Crawler: Background service for crawling websites
Neo4j Database: Web interface at http://localhost:7474, database at bolt://localhost:7687
Selenium: Chrome browser for screenshot capture

To stop all services:

make stop

Local Development Setup

For local development without Docker, you need to run two separate processes:

Prerequisites

Start a Neo4j database (can use Docker or devcontainer):

docker run -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/password neo4j:latest

Configure environment:

cp .env.example .env

Edit .env with your Neo4j credentials:

NEO4j_URI="neo4j://localhost:7687"
NEO4j_USERNAME="neo4j"
NEO4j_PASSWORD="password"

Install dependencies:

make install

Running Locally

In separate terminals, run:

Terminal 1 - Dashboard:

uvicorn dashboard.api:app --host 0.0.0.0 --port 8000

Terminal 2 - Crawler:

python src/webmap/main.py

The dashboard will be accessible at http://localhost:8000.

Usage

Basic Usage

from webmap import Crawler

# Initialize crawler with starting URL
crawler = Crawler("https://example.com")

# Start crawling
crawler.run()

Plugin Usage

The crawler supports plugins for additional functionality:

from webmap import Crawler
from webmap.boundingbox import BoundingBoxCapture

# Initialize crawler with starting URL
crawler = Crawler("https://example.com")

# Add bounding box capture plugin
def capture_bounding_boxes(url: str) -> None:
    capture = BoundingBoxCapture()
    capture.capture_and_save(url)

crawler.add(capture_bounding_boxes)

# Start crawling with plugins
crawler.run()

Note, if you want to be able to see these tings on the webpage as well, you will have to write a "plugin" web. You can see how that is done in screenshot and boungingbox, for examples.

Web Dashboard

The dashboard container provides a web interface accessible at http://localhost:8000 when running with Docker Compose. The dashboard provides real-time crawling statistics, crawler control, and screenshot viewing capabilities.

Command Line Tools

The tools/ directory contains utility scripts:

boundingbox.py: Capture bounding box screenshots for a given URL
save_screenshot.py: Retrieve and save screenshots from the database
clean_database.py: Database maintenance utilities

See main.py for a complete example.

DevContainers

This project includes a devcontainer configuration for development in VS Code with Docker. This provides a consistent development environment with all dependencies pre-installed.

Project Structure

src/
├── webmap/                 # Crawler application
│   ├── main.py            
│   ├── screenshot/         # Screenshot capture
│   ├── boundingbox/        # Bounding box detection
│   └── database/           # Neo4j database integration
├── dashboard/              # Dashboard application
│   ├── api.py              # FastAPI web interface
│   ├── screenshot/         # Screenshot web plugin
│   └── boundingbox/        # Bounding box web plugin
containers/
├── crawler/                # Crawler Docker container
└── dashboard/              # Dashboard Docker container
tools/                      # Utility scripts

Contributions

Allowed to add anything you want, please follow resonable standards and the ones found in docs.

License

Author: Uplink036

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 177 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
containers		containers
docs		docs
src		src
tests		tests
tools		tools
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
compose.yaml		compose.yaml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WebMapping

Introduction

Prerequisites

Setup

Running with Docker Compose

Local Development Setup

Prerequisites

Running Locally

Usage

Basic Usage

Plugin Usage

Web Dashboard

Command Line Tools

DevContainers

Project Structure

Contributions

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

WebMapping

Introduction

Prerequisites

Setup

Running with Docker Compose

Local Development Setup

Prerequisites

Running Locally

Usage

Basic Usage

Plugin Usage

Web Dashboard

Command Line Tools

DevContainers

Project Structure

Contributions

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages