A flexible, production-ready web scraper for extracting product information from e-commerce websites. Built with Python, this tool allows you to scrape product names and prices using custom CSS selectors.
- ✨ Configurable CSS Selectors: Adapt to any website structure
- 🔄 Automatic Retry Logic: Handles network failures gracefully
- 📝 Comprehensive Logging: Track scraping progress and errors
- 🛡️ Anti-Blocking Measures: User-agent rotation and request delays
- 💾 CSV Export: Save data with timestamps and source URLs
- ⚙️ Command-Line Interface: Easy to use and automate
- 🧹 Data Cleaning: Automatically cleans prices (removes currency symbols)
- Clone or download this project
- Install dependencies:

```bash
pip install -r requirements.txt
```
Basic usage:

```bash
python scraper.py \
  --url "https://example.com/products" \
  --product-selector ".product" \
  --name-selector ".name" \
  --price-selector ".price"
```

With all options:

```bash
python scraper.py \
  --url "https://example.com/products" \
  --product-selector ".item" \
  --name-selector "h2.title" \
  --price-selector "span.cost" \
  --output "my_products.csv" \
  --max-retries 5 \
  --delay 2.0
```

Command-line arguments:

| Argument | Required | Default | Description |
|---|---|---|---|
| `--url` | Yes | - | Target website URL |
| `--product-selector` | Yes | - | CSS selector for product containers |
| `--name-selector` | Yes | - | CSS selector for product names |
| `--price-selector` | Yes | - | CSS selector for product prices |
| `--output` | No | `data/products.csv` | Output CSV file path |
| `--max-retries` | No | `3` | Maximum retry attempts |
| `--delay` | No | `1.0` | Delay between requests (seconds) |
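If you want to extend the CLI, the arguments in the table could be declared with Python's standard `argparse` module roughly as follows. This is a hypothetical reconstruction for illustration: `build_parser` is an assumed name, and `scraper.py` may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical declaration of the CLI options listed in the table."""
    parser = argparse.ArgumentParser(
        description="Scrape product names and prices using CSS selectors."
    )
    # Required: the target page and the three CSS selectors.
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--product-selector", required=True,
                        help="CSS selector for product containers")
    parser.add_argument("--name-selector", required=True,
                        help="CSS selector for product names")
    parser.add_argument("--price-selector", required=True,
                        help="CSS selector for product prices")
    # Optional: defaults mirror the table above.
    parser.add_argument("--output", default="data/products.csv",
                        help="Output CSV file path")
    parser.add_argument("--max-retries", type=int, default=3,
                        help="Maximum retry attempts")
    parser.add_argument("--delay", type=float, default=1.0,
                        help="Delay between requests (seconds)")
    return parser
```

Calling `build_parser().parse_args()` reads `sys.argv`; argparse turns dashes into underscores, so `--max-retries` becomes `args.max_retries`.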
To use this scraper, you need to identify the correct CSS selectors for your target website:
- Open the target website in Chrome or Firefox
- Right-click on a product and select "Inspect" or "Inspect Element"
- The browser's Developer Tools will open
Look for patterns in the HTML structure:
```html
<div class="product-card">
  <h3 class="product-title">Example Product</h3>
  <span class="product-price">$29.99</span>
</div>
```

For this structure, your selectors would be:

```bash
--product-selector ".product-card"
--name-selector ".product-title"
--price-selector ".product-price"
```
- Classes: Use `.classname` (most common)
- IDs: Use `#idname` (for unique elements)
- Tags: Use tag names like `h3`, `span`, `div`
- Nested: Combine selectors like `div.product h3.title`
- Test: Use the browser console: `document.querySelectorAll('.product-card')`
The scraper generates a CSV file with the following columns:

| Column | Description |
|---|---|
| `product_name` | Name of the product |
| `price` | Price (cleaned, no currency symbols) |
| `timestamp` | When the data was scraped |
| `source_url` | Source website URL |
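The price cleaning and CSV layout described above can be sketched with the standard library alone. This is an illustration, not the scraper's actual code: `clean_price` and `write_products` are hypothetical names, and the regex assumes `.` is the decimal separator, so a European-format price such as `1.299,99` would need different handling.

```python
import csv
import io
import re
from datetime import datetime


def clean_price(raw: str) -> str:
    """Hypothetical cleaner: '$1,299.99' -> '1299.99'."""
    # Keep only digits and the decimal point (assumes '.' is the decimal separator).
    return re.sub(r"[^\d.]", "", raw)


def write_products(rows, source_url, out):
    """Write (name, raw_price) pairs to `out` using the documented columns."""
    fieldnames = ["product_name", "price", "timestamp", "source_url"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    # One timestamp per run, in the same format as the example output.
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    for name, raw_price in rows:
        writer.writerow({
            "product_name": name.strip(),
            "price": clean_price(raw_price),
            "timestamp": timestamp,
            "source_url": source_url,
        })


# Usage: write to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_products([("Wireless Mouse", "$24.99")], "https://example.com/products", buffer)
print(buffer.getvalue())
```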
Example output:

```csv
product_name,price,timestamp,source_url
Wireless Mouse,24.99,2025-11-11 14:30:22,https://example.com/products
Mechanical Keyboard,89.99,2025-11-11 14:30:22,https://example.com/products
USB-C Cable,12.99,2025-11-11 14:30:22,https://example.com/products
```

More examples for different page structures:

```bash
python scraper.py \
  --url "https://example-store.com/electronics" \
  --product-selector "div[data-component-type='s-search-result']" \
  --name-selector "h2 span" \
  --price-selector ".a-price-whole"
```

```bash
python scraper.py \
  --url "https://shop.example.com/category/phones" \
  --product-selector ".product-grid-item" \
  --name-selector ".product-name" \
  --price-selector ".price"
```

```bash
python scraper.py \
  --url "https://marketplace.example.com/deals" \
  --product-selector "article.listing" \
  --name-selector "div.info h2 a" \
  --price-selector "div.pricing span.amount"
```

Problem: Scraper runs but finds 0 products.
Solutions:
- Verify your CSS selectors using browser Developer Tools
- Check if the website loads content dynamically with JavaScript (this scraper only works with static HTML)
- Test your selector in the browser console: `document.querySelectorAll('your-selector')`
Problem: Scraper fails with connection timeout or 403/429 errors.
Solutions:
- Increase the `--delay` parameter (e.g., `--delay 3.0`)
- Increase `--max-retries` (e.g., `--max-retries 5`)
- Check whether the website blocks automated access (if it does, more advanced techniques are required)
- Verify the URL is correct and accessible in your browser
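The retry, delay, and user-agent rotation features can be pictured as a small wrapper around whatever performs the HTTP request. This is a minimal sketch under assumptions: `fetch_with_retries` and the `USER_AGENTS` list are hypothetical, and `fetch(url, headers)` stands in for your request function. With `requests`, such a `fetch` would call `requests.get(url, headers=headers, timeout=10)` followed by `response.raise_for_status()`, so that 403 and 429 responses surface as exceptions and trigger a retry.

```python
import random
import time

# Hypothetical User-Agent pool; the strings scraper.py actually rotates are not documented here.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def fetch_with_retries(fetch, url, max_retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url, headers) until it returns, trying at most max_retries times.

    Sleeps delay, 2*delay, 4*delay, ... seconds between failed attempts
    (exponential backoff) and picks a random User-Agent for each attempt.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            return fetch(url, headers)
        except Exception as exc:  # broad on purpose in this sketch: timeouts, 403/429, ...
            last_error = exc
            if attempt < max_retries:
                sleep(delay * 2 ** (attempt - 1))
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts") from last_error
```

Passing `sleep` as a parameter lets you test the retry logic without real waits; in normal use the default `time.sleep` applies.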
Problem: Wrong data appears in the CSV.
Solutions:
- Double-check that your selectors are specific enough
- Inspect the HTML structure carefully: the name and price selectors are applied inside each product container, not to the whole page
- Use more specific selectors (e.g., `span.price` instead of just `.price`)
Problem: Website uses JavaScript to load products.
Solutions: This scraper only works with static HTML. For JavaScript-heavy sites, consider:
- Using Selenium or Playwright instead
- Finding an API endpoint the website uses (check Network tab in Developer Tools)
- Checking if the site has an official API
Legal and ethical considerations:

- Check the website's `robots.txt` file (e.g., `https://example.com/robots.txt`)
- Review the website's Terms of Service for scraping policies
- Use appropriate delays between requests (`--delay` parameter)
- Don't overload servers with too many requests
- Respect copyright and data usage rights
- Some websites explicitly prohibit scraping
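Checking `robots.txt` can be automated with Python's standard `urllib.robotparser`. The sketch below parses robots.txt text you already have; against a live site you would call `set_url("https://example.com/robots.txt")` and `read()` on the parser instead. `robots_policy` and `SAMPLE_ROBOTS` are hypothetical names used for illustration.

```python
from urllib.robotparser import RobotFileParser


def robots_policy(robots_txt: str, url: str, user_agent: str = "*"):
    """Return (allowed, crawl_delay) for url according to robots_txt."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # which can_fetch() and crawl_delay() rely on.
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    # Crawl-delay in seconds, or None if robots.txt does not set one.
    delay = parser.crawl_delay(user_agent)
    return allowed, delay


SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 2
"""

allowed, delay = robots_policy(SAMPLE_ROBOTS, "https://example.com/products")
print(allowed, delay)
```

If a `Crawl-delay` is present, using it as a lower bound for `--delay` keeps the scraper within the site's stated limits.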
Best practices:

- Start Small: Test with a single page before scraping large amounts of data
- Be Respectful: Use delays of 1-3 seconds between requests
- Monitor Logs: Check the console output for errors and warnings
- Regular Testing: Website structures change; test your selectors periodically
- Error Handling: The scraper logs all errors; review them to improve your selectors
You can also import and use the scraper in your own Python scripts:
```python
from scraper import ProductScraper

# Create scraper instance
scraper = ProductScraper(
    url="https://example.com/products",
    product_selector=".product",
    name_selector=".name",
    price_selector=".price",
    max_retries=3,
    delay=1.0,
)

# Run scraper
success = scraper.scrape(output_path="custom_output.csv")
if success:
    print("Scraping completed successfully!")
else:
    print("Scraping failed!")
```

Project structure:

```
product_scraper/
├── scraper.py           # Main scraper script
├── requirements.txt     # Python dependencies
├── README.md            # This file
└── data/                # Output directory (auto-created)
    └── products.csv     # Scraped data
```
- requests: HTTP library for fetching web pages
- beautifulsoup4: HTML parsing and CSS selector support
- lxml: Fast XML/HTML parser (used by BeautifulSoup)
Feel free to modify and extend this scraper for your needs. Some ideas:
- Add support for pagination
- Export to JSON or database
- Add proxy support
- Implement concurrent scraping for multiple pages
- Add price comparison features
- Create a web UI
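As a starting point for the JSON export idea, here is a sketch that converts the scraper's CSV output to a JSON array and stores `price` as a number. `csv_text_to_json` and `export_json` are hypothetical helpers, not part of `scraper.py`; rows whose price is missing or not numeric get `null`.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert the scraper's CSV output to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # Prices are already cleaned strings such as "24.99"; store them as numbers.
        try:
            row["price"] = float(row["price"])
        except (KeyError, TypeError, ValueError):
            row["price"] = None
    return json.dumps(rows, indent=2, ensure_ascii=False)


def export_json(csv_path: str, json_path: str) -> None:
    """File-based wrapper, e.g. export_json("data/products.csv", "data/products.json")."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        text = src.read()
    with open(json_path, "w", encoding="utf-8") as dst:
        dst.write(csv_text_to_json(text))
```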
This is a starter project for educational purposes. Use responsibly and in accordance with applicable laws and website terms of service.
If you encounter issues:
- Check the troubleshooting section above
- Review the logs for specific error messages
- Verify your CSS selectors in the browser
- Ensure the website is accessible and doesn't require authentication
Happy Scraping! 🚀