Graphclone

Graphclone is a structural website mapping tool that crawls complex site hierarchies and converts them into a normalized graph abstraction. The resulting structure feeds a TypeScript/React wireframe builder that generates visual documentation and exportable artifacts for engineering scoping and architectural analysis. Graphclone is intended for authorized environments such as internal systems, owned properties, or sites where you have explicit permission to perform automated analysis.

✨ Features

🌐 Structural Crawling

Traverses nested directory structures and dynamic routing patterns while preventing infinite loops and redundant traversal.
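The loop prevention described above can be sketched as a breadth-first traversal with a visited set. This is a minimal illustration, not the code in main.py: the `crawl` function, the `get_links` callback, and the `max_pages` cap are all hypothetical names, and the toy `SITE` graph stands in for real pages.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 1000) -> list[str]:
    """Breadth-first traversal that visits each URL at most once."""
    visited: set[str] = {start}
    queue: deque[str] = deque([start])
    order: list[str] = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            # Skipping URLs that were already queued is what breaks cycles.
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Toy link graph containing a cycle (/a -> /b -> /a).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/a", "/c"],
    "/c": [],
}

print(crawl("/", lambda url: SITE.get(url, [])))  # ['/', '/a', '/b', '/c']
```

Marking a URL as visited when it is queued, rather than when it is fetched, keeps the same page from being queued twice by different parents.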

🔒 Authenticated Session Support

WAIT_FOR_LOGIN lets you log in with your own credentials in a real browser window, exactly as you normally would
Keeps passwords private: you never enter your password into Graphclone itself
Enables crawling of gated content
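A rough sketch of how such a login pause could look with Playwright's async API is shown here. It is an assumption about the flow, not the project's actual implementation: `run_label` and `open_session` are hypothetical names, and applying the `_logged_in` suffix to a site name is only one guess at the labeling scheme.

```python
import asyncio

WAIT_FOR_LOGIN = True  # mirrors the flag in config.py


def run_label(site: str, logged_in: bool) -> str:
    """Label for a crawl run; this naming scheme is an assumption."""
    return f"{site}_logged_in" if logged_in else site


async def open_session(start_url: str, wait_for_login: bool = WAIT_FOR_LOGIN) -> None:
    # Imported lazily so run_label() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # A visible window is only needed when a human has to log in.
        browser = await p.chromium.launch(channel="chrome", headless=not wait_for_login)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(start_url)
        if wait_for_login:
            # Wait for Enter in the terminal without blocking the event loop.
            await asyncio.to_thread(input, "Log in in the browser, then press Enter here: ")
        # ... crawl using `context`; its cookies carry the authenticated session ...
        await browser.close()


print(run_label("example.com", WAIT_FOR_LOGIN))  # example.com_logged_in
```

Because the credentials are typed into the browser page itself, the script only ever sees the resulting session cookies, never the password.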

🔗 URL Canonicalization

Applies structural heuristics to normalize dynamic URLs
Handles parameterized routes to prevent duplicate mapping
Configurable thresholds for predictable outputs
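One way these heuristics might work is sketched below. The rules are illustrative assumptions rather than Graphclone's actual logic: `canonicalize`, `collapse_siblings`, the `:id` and `:slug` placeholders, and the default threshold of 3 are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Segments that look like record identifiers: all digits, or a UUID.
ID_LIKE = re.compile(r"^(\d+|[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12})$", re.IGNORECASE)


def canonicalize(url: str) -> str:
    """Reduce a URL to a route pattern: ID-like segments become ':id', query values are dropped."""
    parts = urlsplit(url)
    segments = [":id" if ID_LIKE.match(s) else s for s in parts.path.split("/") if s]
    route = "/" + "/".join(segments)
    # Keep only sorted parameter names so ?page=1 and ?page=2 map to one route.
    keys = sorted({pair.split("=", 1)[0] for pair in parts.query.split("&") if pair})
    return route + ("?" + "&".join(keys) if keys else "")


def collapse_siblings(routes: list[str], threshold: int = 3) -> set[str]:
    """Replace the last segment with ':slug' once a parent has at least `threshold` distinct children."""
    children: dict[str, set[str]] = {}
    for route in routes:
        parent, _, leaf = route.rpartition("/")
        children.setdefault(parent, set()).add(leaf)
    collapsed: set[str] = set()
    for parent, leaves in children.items():
        if len(leaves) >= threshold:
            collapsed.add(f"{parent}/:slug")
        else:
            collapsed.update(f"{parent}/{leaf}" for leaf in leaves)
    return collapsed


print(canonicalize("https://example.com/users/42/posts?page=2&sort=asc"))
# /users/:id/posts?page&sort
print(sorted(collapse_siblings(["/blog/a", "/blog/b", "/blog/c", "/about"])))
# ['/about', '/blog/:slug']
```

A fixed threshold keeps the output predictable: the same set of discovered URLs always collapses to the same set of routes, regardless of crawl order.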

📸 Snapshot & Screenshot Capture

Raw HTML snapshot
Page screenshot
Canonicalized route metadata
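As a sketch of how these three artifacts could be written to disk per page, consider the helper below. The layout is assumed for illustration: `save_page_artifacts`, the one-folder-per-route structure, and the `snapshot.html` and `metadata.json` file names are hypothetical; only `screenshot.png` appears in this README.

```python
import json
from pathlib import Path


def save_page_artifacts(out_root: Path, route: str, url: str, html: str, screenshot: bytes) -> Path:
    """Write one directory per canonical route holding the three artifacts."""
    # Turn '/users/:id' into a filesystem-safe folder name such as 'users__id'.
    slug = route.strip("/").replace("/", "__").replace(":", "") or "index"
    page_dir = out_root / slug
    page_dir.mkdir(parents=True, exist_ok=True)
    (page_dir / "snapshot.html").write_text(html, encoding="utf-8")
    (page_dir / "screenshot.png").write_bytes(screenshot)
    metadata = {"url": url, "canonical_route": route}
    (page_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return page_dir
```

With Playwright, the `html` and `screenshot` arguments would typically come from `await page.content()` and `await page.screenshot()` respectively.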

🧩 Wireframe & Export Pipeline

Structured outputs feed directly into a Next.js TypeScript application (in /wireframe) that reconstructs page layouts and component hierarchies for review, documentation, and scoping workflows.

🏗️ Architecture

Scraper (main.py)
Asynchronous Python
Playwright-driven browser automation

📁 Output stored in /scrape-results

HTML snapshots
screenshot.png
Route metadata

Wireframe Builder (/wireframe)
Next.js
TypeScript
Consumes structural mapping output
Renders interactive wireframes
Supports PDF export

⚙️ Setup & Installation

Install Python dependencies:

    pip install -r requirements.txt
    playwright install chrome

Set up the Next.js wireframe builder:

    cd wireframe
    npm install

Usage

Run the mapper:

    python main.py

Authenticated mode (optional): in config.py, set

    WAIT_FOR_LOGIN = True

When enabled:

A visible browser window opens.
Log into your website as you normally would.
Return to the terminal and press Enter.
Crawling continues using the authenticated session.

Authenticated runs are labeled with a _logged_in suffix to distinguish them from public crawls.

Feed the scrape results into an LLM of your choice (the agent definition in .github/agents/graphclone-builder.md is recommended) to generate output in /wireframe.

Generate your product requirements document:

    python generate_prd.py

🧪 Automated tests

python -m pytest tests -q
