
A simple and efficient HTML scraper built in Rust, designed for easy command-line usage and interactive scraping.


rafainsights/Seascraper


The simplest HTML scraper





🧐 About

Seascraper is a lightweight and efficient HTML scraper written in Rust. It extracts the text content of page elements that match a CSS selector. You can pass the URL, selector, and number of items as command-line arguments for quick scraping, or run it without arguments and be guided through an interactive mode. It is designed to be simple and fast, which makes it a good fit for developers, data analysts, and anyone else who needs to scrape web data programmatically.

The project leverages Rust's performance and safety features to provide a reliable scraping solution. It uses libraries like reqwest for HTTP requests, scraper for HTML parsing, and clap for command-line interface handling.
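The extraction step described above can be sketched without a network connection. With reqwest and scraper, the matches would typically come from parsing the response body with `Html::parse_document`, building a selector with `Selector::parse`, and iterating over `html.select(&selector)`; each matched element's `text()` then yields its text nodes as separate fragments. The sketch below is not Seascraper's code: it stands in plain string slices for those matches and shows one way to join each element's fragments, trim surrounding whitespace, and keep at most `amount` results. The helpers `element_text` and `collect_texts` and the sample data are hypothetical.

```rust
/// Joins the text fragments of one matched element and trims surrounding whitespace.
/// (With the scraper crate, the fragments would come from `ElementRef::text()`.)
fn element_text(fragments: &[&str]) -> String {
    fragments.concat().trim().to_string()
}

/// Converts each matched element to text, keeping at most `amount` results.
fn collect_texts(elements: &[Vec<&str>], amount: usize) -> Vec<String> {
    elements
        .iter()
        .take(amount)
        .map(|fragments| element_text(fragments))
        .collect()
}

fn main() {
    // Made-up matches for a selector such as "span.price": each inner Vec holds
    // the text nodes of one element (here, a currency symbol and a number).
    let matches = vec![
        vec!["£", "109.99"],
        vec!["  £", "12.00 "],
        vec!["£", "3.50"],
    ];
    for text in collect_texts(&matches, 2) {
        println!("The text is: {}", text);
    }
}
```

Joining the fragments matters when, for example, a currency symbol sits in its own nested element: `text()` returns it separately from the number that follows it.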

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

  • Rust (version 1.70 or later)
  • Cargo (comes with Rust)

You can install Rust using rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Installing

  1. Clone the repository:
     git clone <repository-url>
     cd seascraper
  2. Build the project:
     cargo build --release
  3. Run the scraper:
     cargo run

This starts interactive mode, where you are prompted for the URL, the CSS selector, and the number of items to scrape.

🎈 Usage

Interactive Mode

Run the scraper without arguments to enter interactive mode:

cargo run

You'll be prompted to enter:

  • URL to scrape
  • CSS selector
  • Number of items to scrape
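The three interactive prompts listed above can be sketched with the standard library alone. This is a hypothetical illustration, not Seascraper's implementation: the prompt wording, the `ScrapeRequest` struct, and the choice to reject a non-numeric amount with an error are all assumptions. Taking any `BufRead`/`Write` pair instead of reading stdin directly keeps the function testable; a real program would pass `io::stdin().lock()` and `io::stdout()`.

```rust
use std::io::{self, BufRead, Write};

/// Inputs gathered in interactive mode (hypothetical struct, for illustration only).
#[derive(Debug)]
struct ScrapeRequest {
    url: String,
    selector: String,
    amount: usize,
}

/// Prints `label`, then reads one line from `input` with surrounding whitespace trimmed.
fn prompt<R: BufRead, W: Write>(input: &mut R, output: &mut W, label: &str) -> io::Result<String> {
    write!(output, "{}: ", label)?;
    output.flush()?;
    let mut line = String::new();
    input.read_line(&mut line)?;
    Ok(line.trim().to_string())
}

/// Asks for the URL, the CSS selector, and the number of items, in that order.
fn read_request<R: BufRead, W: Write>(input: &mut R, output: &mut W) -> io::Result<ScrapeRequest> {
    let url = prompt(input, output, "URL to scrape")?;
    let selector = prompt(input, output, "CSS selector")?;
    let amount_text = prompt(input, output, "Number of items to scrape")?;
    // Assumption: a non-numeric amount is reported as an error rather than re-prompted.
    let amount = amount_text
        .parse::<usize>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    Ok(ScrapeRequest { url, selector, amount })
}

fn main() -> io::Result<()> {
    // Simulated keyboard input; in real use, pass `io::stdin().lock()` instead.
    let mut input = io::Cursor::new("https://scrapeme.live/shop/\nspan.price\n5\n");
    let mut output = io::stdout();
    let request = read_request(&mut input, &mut output)?;
    println!();
    println!("url={} selector={} amount={}", request.url, request.selector, request.amount);
    Ok(())
}
```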

Command Line Mode

Use command-line arguments for direct scraping:

cargo run -- <url> <selector> [amount]

Example:

cargo run -- https://scrapeme.live/shop/ "span.price" 5

Options

  • --bannerless: Run without displaying the ASCII banner
  • --help: Display help information
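Seascraper handles its arguments with clap, so the following is only a dependency-free sketch of the rules this section describes: no positional arguments means interactive mode, otherwise `<url> <selector>` with an optional numeric `[amount]`, plus the `--bannerless` and `--help` flags. The `Mode` enum and `parse_args` function are hypothetical. Treating `--bannerless` on its own as interactive mode is an assumption, and because the default used when `amount` is omitted is not documented here, the sketch leaves it as `None`.

```rust
/// What the program should do, decided from the command-line arguments.
/// (Hypothetical; Seascraper itself parses arguments with clap.)
#[derive(Debug, PartialEq)]
enum Mode {
    Help,
    Interactive { bannerless: bool },
    Direct { url: String, selector: String, amount: Option<usize>, bannerless: bool },
}

/// Parses `<url> <selector> [amount]` plus the `--bannerless` and `--help` flags.
fn parse_args(args: &[&str]) -> Result<Mode, String> {
    let mut bannerless = false;
    let mut positional = Vec::new();
    for &arg in args {
        match arg {
            "--help" => return Ok(Mode::Help),
            "--bannerless" => bannerless = true,
            flag if flag.starts_with("--") => return Err(format!("unknown option: {}", flag)),
            value => positional.push(value),
        }
    }
    match positional.as_slice() {
        // Assumption: no positional arguments (with or without --bannerless) means interactive mode.
        [] => Ok(Mode::Interactive { bannerless }),
        [url, selector] => Ok(Mode::Direct {
            url: url.to_string(),
            selector: selector.to_string(),
            amount: None, // omitted; the real default is not documented here
            bannerless,
        }),
        [url, selector, amount] => {
            let amount = amount
                .parse::<usize>()
                .map_err(|_| format!("amount must be a non-negative integer, got {}", amount))?;
            Ok(Mode::Direct {
                url: url.to_string(),
                selector: selector.to_string(),
                amount: Some(amount),
                bannerless,
            })
        }
        _ => Err("expected: <url> <selector> [amount]".to_string()),
    }
}

fn main() {
    // Arguments as they would arrive after `cargo run --`.
    let args = ["https://scrapeme.live/shop/", "span.price", "5"];
    println!("{:?}", parse_args(&args));
}
```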

Example Output

Using the following arguments:
url: https://scrapeme.live/shop/
selector: span.price
amount: 5

The text is: £109.99
The text is: £109.99
The text is: £109.99
The text is: £109.99
The text is: £109.99
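The output format shown above can be reproduced with a small formatting helper. `format_report` is hypothetical and only mirrors the sample text; Seascraper's own printing code may be structured differently.

```rust
/// Builds a report in the format of the example output: the arguments in use,
/// a blank line, then one "The text is:" line per scraped item.
/// (Hypothetical helper; Seascraper's actual formatting code may differ.)
fn format_report(url: &str, selector: &str, amount: usize, texts: &[String]) -> String {
    let mut report = format!(
        "Using the following arguments:\nurl: {}\nselector: {}\namount: {}\n\n",
        url, selector, amount
    );
    for text in texts {
        report.push_str(&format!("The text is: {}\n", text));
    }
    report
}

fn main() {
    let texts = vec!["£109.99".to_string(); 5];
    print!("{}", format_report("https://scrapeme.live/shop/", "span.price", 5, &texts));
}
```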

⛏️ Built Using

  • Rust
  • reqwest: HTTP requests
  • scraper: HTML parsing
  • clap: command-line interface handling

✍️ Authors

░█▀▀░█▀▀░█▀█░█▀▀░█▀▀░█▀▄░█▀█░█▀█░█▀▀░█▀▄
░▀▀█░█▀▀░█▀█░▀▀█░█░░░█▀▄░█▀█░█▀▀░█▀▀░█▀▄
░▀▀▀░▀▀▀░▀░▀░▀▀▀░▀▀▀░▀░▀░▀░▀░▀░░░▀▀▀░▀░▀
  
