
Web Scraper for Quotes

This project is a Python-based web scraper that collects quotes from Quotes to Scrape and stores them in a SQLite database. The scraper is asynchronous and uses SQLAlchemy for database interaction. An optional FastAPI interface is provided to view the quotes via a web API.

Features

  • Asynchronous web scraping using httpx
  • HTML parsing with BeautifulSoup
  • Storage in SQLite via SQLAlchemy
  • Configurable settings via .env and Pydantic
  • Optional FastAPI endpoint for browsing quotes
  • Ready for deployment via Docker
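For a sense of how these pieces fit together, here is a minimal sketch of the fetch-and-parse step using httpx and BeautifulSoup (function and selector choices are illustrative, not the project's actual code; the CSS classes match the markup on quotes.toscrape.com):

import asyncio
import httpx
from bs4 import BeautifulSoup

async def fetch_quotes(base_url: str, timeout: float = 10.0) -> list[dict]:
    # Fetch one page and extract quote text, author, and tags.
    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.get(base_url)
        response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [
        {
            "text": block.select_one("span.text").get_text(strip=True),
            "author": block.select_one("small.author").get_text(strip=True),
            "tags": [t.get_text(strip=True) for t in block.select("a.tag")],
        }
        for block in soup.select("div.quote")
    ]

if __name__ == "__main__":
    print(asyncio.run(fetch_quotes("https://quotes.toscrape.com")))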

Requirements

  • Python 3.11+
  • Pip packages as listed in requirements.txt
  • Optional: Docker Desktop for containerized deployment

Setup

  1. Clone the repository

git clone https://github.com/your-username/web_scraper.git
cd web_scraper

  2. Create and activate a virtual environment

python -m venv venv

Windows PowerShell:

venv\Scripts\Activate.ps1

Linux/macOS:

source venv/bin/activate

  3. Install dependencies

pip install -r requirements.txt

  4. Create a .env file

Create a file named .env in the project root with the following contents:

BASE_URL=https://quotes.toscrape.com
DATABASE_URL=sqlite:///data.db
REQUEST_TIMEOUT=10

Explanation:

  • BASE_URL – URL of the website to scrape
  • DATABASE_URL – SQLAlchemy database URL
  • REQUEST_TIMEOUT – HTTP request timeout in seconds
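These variables map onto a Pydantic settings class. As a rough sketch of how the project might load them (class and field names are assumptions; the exact code lives in the repository, and this assumes Pydantic v2's pydantic-settings package):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names map case-insensitively onto the variables in .env.
    model_config = SettingsConfigDict(env_file=".env")

    base_url: str = "https://quotes.toscrape.com"
    database_url: str = "sqlite:///data.db"
    request_timeout: int = 10

settings = Settings()  # reads .env from the working directory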

Running the Scraper

  1. Initialize the database

python -m scripts.init_db

  2. Run the scraper

python -m scripts.run_scraper

  • The scraper will fetch quotes and store them in data.db.
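The schema itself is defined in the project code; as a rough sketch of what scripts.init_db might create (table and column names are assumptions), using SQLAlchemy's declarative mapping:

from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Quote(Base):
    __tablename__ = "quotes"
    id = Column(Integer, primary_key=True)
    text = Column(Text, nullable=False)
    author = Column(String(100), nullable=False)
    tags = Column(String(255))  # e.g. comma-separated tag names

# Create the tables in data.db if they do not already exist.
engine = create_engine("sqlite:///data.db")
Base.metadata.create_all(engine)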

Viewing Data

Option 1: Console Output

Use the demo script to print quotes in a formatted table:

python -m scripts.demo_output
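Under the hood, printing the table needs nothing more than a query over data.db. A minimal equivalent (assuming the quotes table sketched above):

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///data.db")
with engine.connect() as conn:
    for row in conn.execute(text("SELECT author, text FROM quotes")):
        print(f"{row.author:<25} {row.text}")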

Option 2: CSV File

Generate a CSV file for easy viewing:

python -m scripts.demo_csv
  • The file quotes_demo.csv will contain all scraped quotes with authors and tags.
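If you want to adapt the export, the whole step amounts to a query plus csv.writer. A sketch, again assuming the quotes table above:

import csv
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///data.db")
with engine.connect() as conn, open("quotes_demo.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "author", "tags"])
    for row in conn.execute(text("SELECT text, author, tags FROM quotes")):
        writer.writerow(row)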

Option 3: FastAPI Endpoint (Optional)

If you want a web interface:

  1. Modify the Dockerfile or run Uvicorn directly:

uvicorn app.api.main:app --reload

  2. Open a browser at http://127.0.0.1:8000 (Uvicorn's default address).
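The actual application lives at app.api.main:app. As an illustration of what a minimal quotes endpoint could look like (a sketch, not the project's code; it assumes the quotes table above):

from fastapi import FastAPI
from sqlalchemy import create_engine, text

app = FastAPI()
engine = create_engine("sqlite:///data.db")

@app.get("/quotes")
def list_quotes() -> list[dict]:
    # Return every stored quote as a JSON array.
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT text, author, tags FROM quotes"))
        return [dict(row._mapping) for row in rows]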

Docker Deployment

You can run the scraper in a container without installing Python or dependencies.

  1. Build the Docker image

docker build -t web_scraper_demo .

  2. Run the container

docker run --rm --env-file .env -v ${PWD}/data.db:/app/data.db web_scraper_demo

  • --env-file .env passes environment variables to the container
  • -v ${PWD}/data.db:/app/data.db ensures the SQLite database persists on the host

Notes

  • Do not include secret keys or sensitive information in the Dockerfile.
  • .env should be created locally and is ignored in Git via .gitignore.
  • The project is intended for educational and demonstration purposes using public data.
