Python Automation Web Scraper

Overview

This project is a Python web scraper that extracts book data from Books to Scrape and saves it to a CSV file. It follows pagination, sends request headers, handles errors, logs its progress, and pauses between requests, so it behaves like a real-world automation tool.
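The pagination, delay, and error-handling behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `iter_pages`, `fetch_page`, `delay_seconds`, and `max_retries` are hypothetical names, and the fetch step is passed in as a function so the loop can be tried without network access.

```python
import logging
import time
from typing import Callable, Iterator, Optional, Tuple

logger = logging.getLogger("scraper")

# Hypothetical contract: takes a URL, returns (html, next_url or None).
FetchPage = Callable[[str], Tuple[str, Optional[str]]]


def iter_pages(start_url: str, fetch_page: FetchPage,
               delay_seconds: float = 1.0, max_retries: int = 3) -> Iterator[str]:
    """Yield the HTML of each page, following next-page links until none remain."""
    url: Optional[str] = start_url
    while url is not None:
        for attempt in range(1, max_retries + 1):
            try:
                html, next_url = fetch_page(url)
                break  # success: leave the retry loop
            except Exception as exc:  # broad on purpose in this sketch
                logger.warning("Attempt %d/%d failed for %s: %s",
                               attempt, max_retries, url, exc)
                time.sleep(delay_seconds)
        else:
            # Every attempt failed, so stop instead of looping forever.
            logger.error("Giving up on %s", url)
            return
        yield html
        url = next_url
        if url is not None:
            time.sleep(delay_seconds)  # pause before requesting the next page
```

In a real run, `fetch_page` would presumably wrap `requests.get(url, headers=..., timeout=...)`, call `raise_for_status()` on the response, and look up the next-page link in the returned HTML.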

Features

Scrapes all pages of the website automatically
Extracts the title, price, and rating of each book
Saves data to output/books_data.csv
Logs progress and errors in logs/scraper.log
Includes error handling and request delays to reduce the risk of being blocked
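As an illustration of the extraction step, the sketch below turns raw scraped values into clean fields. It assumes that ratings arrive as English words inside a CSS class list (for example `star-rating Three`) and that prices arrive as text with a currency symbol. The helper names `parse_rating` and `parse_price` are hypothetical and may not match the project's parser.

```python
import re

# Assumption: ratings are spelled out as words in the element's class list.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_rating(class_names):
    """Return the 1-5 rating found in a list of CSS class names, or None."""
    for name in class_names:
        if name in RATING_WORDS:
            return RATING_WORDS[name]
    return None


def parse_price(text):
    """Extract the numeric part of a price string, ignoring currency symbols."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None
```

Searching for the number, rather than stripping a fixed prefix, also copes with a currency symbol that was decoded with the wrong character encoding.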

Project Structure

src/
    main.py       # Entry point
    scraper.py    # Handles scraping pages
    parser.py     # Extracts data from HTML
    storage.py    # Saves data to CSV
logs/             # Log file directory
output/           # CSV output directory
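To make the role of the storage module concrete, here is a minimal sketch of a CSV writer. It uses the standard-library `csv` module rather than Pandas to stay dependency-free, so it is a stand-in and not the project's implementation. The function name `save_books` and the column order are assumptions.

```python
import csv
from pathlib import Path

# Assumed column names, taken from the fields listed under Features.
FIELDNAMES = ["Title", "Price", "Rating"]


def save_books(books, path="output/books_data.csv"):
    """Write a list of book dicts to a CSV file, creating the folder if needed."""
    out_path = Path(path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" stops the csv module from writing blank rows on Windows.
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(books)
    return out_path
```

With Pandas, the equivalent step would be along the lines of `pandas.DataFrame(books).to_csv(path, index=False)`.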

Technologies

Python
Requests
BeautifulSoup
Pandas
Logging
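Of the tools listed, the standard-library `logging` module is the one that produces logs/scraper.log. A setup along the following lines would be enough for that. This is a sketch under assumptions: the function name `build_logger`, the logger name, and the message format are illustrative and not taken from the project.

```python
import logging
from pathlib import Path


def build_logger(log_path="logs/scraper.log", name="scraper"):
    """Return a logger that appends timestamped messages to log_path."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling `build_logger()` once at startup and then using `logger.info(...)` for progress and `logger.error(...)` for failed requests would keep both kinds of message in the same file.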

Installation

Clone the repository:

git clone https://github.com/HothoLina/python-automation-web-scraper.git
