HothoLina/python-automation-web-scraper

Python Automation Web Scraper

Overview

This project is a Python web scraper that extracts book data from Books to Scrape and saves it to a CSV file. It follows pagination, sends custom request headers, handles request errors, logs its progress, and pauses between requests, as a real-world automation tool would.

Features

  • Scrapes all pages of the website automatically
  • Extracts the title, price, and rating of each book
  • Saves data to output/books_data.csv
  • Logs progress and errors in logs/scraper.log
  • Includes error handling and request delays to reduce the risk of being blocked
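
As a rough illustration of how pagination, delays, error handling, and logging can fit together, here is a minimal sketch. It is not the code in this repository: the names `crawl` and `fetch_with_requests`, the `NEXT_LINK` pattern, the User-Agent string, and the default delay are all assumptions made for the example. The page-fetching function is passed in as a parameter so the loop can be exercised without network access.

```python
import logging
import re
import time
from urllib.parse import urljoin

logger = logging.getLogger("scraper")

# Assumed markup of the "next page" link; check it against the live site.
NEXT_LINK = re.compile(r'<li class="next">\s*<a href="([^"]+)"')


def fetch_with_requests(url):
    """Fetch one page with Requests (hypothetical helper, not the repo's code)."""
    import requests  # third-party; imported lazily so the loop below runs without it

    headers = {"User-Agent": "python-automation-web-scraper (example)"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return response.text


def crawl(start_url, fetch, delay=1.0, max_pages=100):
    """Yield (url, html) for each page, following "next" links until none remain.

    `fetch` is any callable that takes a URL and returns HTML text.
    """
    url = start_url
    for _ in range(max_pages):
        try:
            html = fetch(url)
        except Exception as exc:
            # Log the failure and stop instead of crashing the whole run.
            logger.error("Failed to fetch %s: %s", url, exc)
            return
        logger.info("Fetched %s", url)
        yield url, html

        match = NEXT_LINK.search(html)
        if match is None:
            return  # last page reached
        url = urljoin(url, match.group(1))
        time.sleep(delay)  # pause between requests
```

With Requests installed, `for url, html in crawl(start_url, fetch_with_requests): ...` would walk the site one page at a time, where `start_url` is the first catalogue page.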

Project Structure

src/
    main.py        # Entry point
    scraper.py     # Handles scraping pages
    parser.py      # Extracts data from HTML
    storage.py     # Saves data to CSV
logs/              # Log file directory
output/            # CSV output directory
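
To give a feel for the parsing step, the sketch below pulls the title, price, and rating out of a page of book listings. It is an illustration under stated assumptions, not the contents of parser.py: the project lists BeautifulSoup, while this example swaps in the standard library's `html.parser` so it runs with no extra installs. The class name `BookParser`, the function `parse_books`, and the assumed markup (`article.product_pod`, `p.star-rating`, `p.price_color`, and a `title` attribute on the book link) are hypothetical and should be checked against the real pages.

```python
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collect one dict per book from listing HTML (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None    # book being built, or None outside a listing
        self._in_price = False  # True while inside the price element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._current = {"title": None, "price": None, "rating": None}
        elif self._current is None:
            return
        elif tag == "p" and "star-rating" in classes:
            # The rating is assumed to be the second class, e.g. "star-rating Three".
            others = [c for c in classes if c != "star-rating"]
            self._current["rating"] = others[0] if others else None
        elif tag == "a" and attrs.get("title"):
            self._current["title"] = attrs["title"]
        elif tag == "p" and "price_color" in classes:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price and self._current is not None:
            self._current["price"] = data.strip()
            self._in_price = False

    def handle_endtag(self, tag):
        if tag == "article" and self._current is not None:
            self.books.append(self._current)
            self._current = None


def parse_books(html):
    """Return a list of {"title", "price", "rating"} dicts found in `html`."""
    parser = BookParser()
    parser.feed(html)
    parser.close()
    return parser.books
```

Keeping parsing in a pure function of the HTML string makes it easy to test against saved pages without making any requests.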

Technologies

  • Python
  • Requests
  • BeautifulSoup
  • Pandas
  • Logging

Installation

  1. Clone the repository:
git clone https://github.com/HothoLina/python-automation-web-scraper.git
