
Web Scraper for Quotes

This project is a Python-based web scraper that collects quotes from Quotes to Scrape and stores them in a SQLite database. The scraper is asynchronous and uses SQLAlchemy for database interaction. An optional FastAPI interface is provided to view the quotes via a web API.

Features

  • Asynchronous web scraping using httpx
  • HTML parsing with BeautifulSoup
  • Storage in SQLite via SQLAlchemy
  • Configurable settings via .env and Pydantic
  • Optional FastAPI endpoint for browsing quotes
  • Ready for deployment via Docker
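For a sense of how these pieces fit together, here is a minimal sketch of the fetch-and-parse step using httpx and BeautifulSoup (function and selector choices are illustrative, not the project's actual code; the CSS classes match the markup on quotes.toscrape.com):

import asyncio
import httpx
from bs4 import BeautifulSoup

async def fetch_quotes(base_url: str, timeout: float = 10.0) -> list[dict]:
    # Fetch one page and extract quote text, author, and tags.
    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.get(base_url)
        response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [
        {
            "text": block.select_one("span.text").get_text(strip=True),
            "author": block.select_one("small.author").get_text(strip=True),
            "tags": [t.get_text(strip=True) for t in block.select("a.tag")],
        }
        for block in soup.select("div.quote")
    ]

if __name__ == "__main__":
    print(asyncio.run(fetch_quotes("https://quotes.toscrape.com")))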

Requirements

  • Python 3.11+
  • Pip packages as listed in requirements.txt
  • Optional: Docker Desktop for containerized deployment

Setup

  1. Clone the repository

git clone https://github.com/your-username/web_scraper.git
cd web_scraper

  2. Create and activate a virtual environment

python -m venv venv

Windows PowerShell:

venv\Scripts\Activate.ps1

Linux/macOS:

source venv/bin/activate

  3. Install dependencies

pip install -r requirements.txt

  4. Create a .env file

Create a file named .env in the project root with the following contents:

BASE_URL=https://quotes.toscrape.com
DATABASE_URL=sqlite:///data.db
REQUEST_TIMEOUT=10

Explanation:

  • BASE_URL – URL of the website to scrape
  • DATABASE_URL – SQLAlchemy database URL
  • REQUEST_TIMEOUT – HTTP request timeout in seconds
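These variables map onto a Pydantic settings class. As a rough sketch of how the project might load them (class and field names are assumptions; the exact code lives in the repository, and this assumes Pydantic v2's pydantic-settings package):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names map case-insensitively onto the variables in .env.
    model_config = SettingsConfigDict(env_file=".env")

    base_url: str = "https://quotes.toscrape.com"
    database_url: str = "sqlite:///data.db"
    request_timeout: int = 10

settings = Settings()  # reads .env from the working directory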

Running the Scraper

  1. Initialize the database

python -m scripts.init_db

  2. Run the scraper

python -m scripts.run_scraper

  • The scraper will fetch quotes and store them in data.db.
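The schema itself is defined in the project code; as a rough sketch of what scripts.init_db might create (table and column names are assumptions), using SQLAlchemy's declarative mapping:

from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Quote(Base):
    __tablename__ = "quotes"
    id = Column(Integer, primary_key=True)
    text = Column(Text, nullable=False)
    author = Column(String(100), nullable=False)
    tags = Column(String(255))  # e.g. comma-separated tag names

# Create the tables in data.db if they do not already exist.
engine = create_engine("sqlite:///data.db")
Base.metadata.create_all(engine)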

Viewing Data

Option 1: Console Output

Use the demo script to print quotes in a formatted table:

python -m scripts.demo_output
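Under the hood, printing the table needs nothing more than a query over data.db. A minimal equivalent (assuming the quotes table sketched above):

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///data.db")
with engine.connect() as conn:
    for row in conn.execute(text("SELECT author, text FROM quotes")):
        print(f"{row.author:<25} {row.text}")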

Option 2: CSV File

Generate a CSV file for easy viewing:

python -m scripts.demo_csv
  • The file quotes_demo.csv will contain all scraped quotes with authors and tags.
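If you want to adapt the export, the whole step amounts to a query plus csv.writer. A sketch, again assuming the quotes table above:

import csv
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///data.db")
with engine.connect() as conn, open("quotes_demo.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "author", "tags"])
    for row in conn.execute(text("SELECT text, author, tags FROM quotes")):
        writer.writerow(row)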

Option 3: FastAPI Endpoint (Optional)

If you want a web interface:

  1. Modify the Dockerfile or run Uvicorn directly:

uvicorn app.api.main:app --reload

  2. Open a browser at http://127.0.0.1:8000 (Uvicorn's default address).
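The actual application lives at app.api.main:app. As an illustration of what a minimal quotes endpoint could look like (a sketch, not the project's code; it assumes the quotes table above):

from fastapi import FastAPI
from sqlalchemy import create_engine, text

app = FastAPI()
engine = create_engine("sqlite:///data.db")

@app.get("/quotes")
def list_quotes() -> list[dict]:
    # Return every stored quote as a JSON array.
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT text, author, tags FROM quotes"))
        return [dict(row._mapping) for row in rows]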

Docker Deployment

You can run the scraper in a container without installing Python or dependencies.

  1. Build the Docker image

docker build -t web_scraper_demo .

  2. Run the container

docker run --rm --env-file .env -v ${PWD}/data.db:/app/data.db web_scraper_demo

  • --env-file .env passes environment variables to the container
  • -v ${PWD}/data.db:/app/data.db ensures the SQLite database persists on the host

Notes

  • Do not include secret keys or sensitive information in the Dockerfile.
  • .env should be created locally and is ignored in Git via .gitignore.
  • The project is intended for educational and demonstration purposes using public data.
