US Embassies & Consulates Crawler

MIT License

A robust Python crawler to extract, normalize, and export the list of U.S. embassies and consulates worldwide from the U.S. Department of State website. The project provides detailed embassy/consulate information in CSV, JSON, and YAML formats, with features for caching, progress tracking, and continent detection.

Features

  • Crawls the official U.S. State Department embassy/consulate list
  • Extracts detailed information: country, city, code, continent, full name, address, telephone, fax, email, website, cancel/reschedule info, Google Maps link
  • Robust HTML parsing with caching for efficiency
  • Auto-detects continent (supports English country/city names)
  • Exports data to CSV, JSON, and YAML
  • Progress bar and logging for user feedback
  • Deduplication and navigation link filtering
  • Modular, maintainable codebase using Python best practices
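The continent auto-detection mentioned above can be pictured as a simple lookup from English country names to continents with a fallback value. This is only a hypothetical sketch of the idea — the table name, function name, and fallback string are illustrative; the actual mapping lives in app.py:

```python
# Illustrative sketch of continent detection: a lookup table keyed by
# English country names, with "Unknown" as the fallback. The real table
# in app.py is far larger and may also consult city names.
CONTINENT_BY_COUNTRY = {
    "France": "Europe",
    "Japan": "Asia",
    "Brazil": "South America",
    "Kenya": "Africa",
}

def detect_continent(country: str) -> str:
    """Return the continent for an English country name, or 'Unknown'."""
    # Normalize casing/whitespace so "france" and " FRANCE " both match.
    return CONTINENT_BY_COUNTRY.get(country.strip().title(), "Unknown")
```

A quick check: `detect_continent("france")` returns `"Europe"`, while an unrecognized name falls back to `"Unknown"`.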

Usage

  1. Clone the repository:

    git clone https://github.com/BaseMax/us-embassies-consulates.git
    cd us-embassies-consulates
  2. Install dependencies:

    pip install .

    (Or, for development: pip install -e .)

    This project uses PEP 621 metadata in pyproject.toml for dependency management, so no requirements.txt is needed.

  3. Run the crawler:

    python app.py
  4. Output files:

    • us_embassies_consulates.csv
    • us_embassies_consulates.json
    • us_embassies_consulates.yml
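The JSON export can be consumed with the standard library. The field names in this snippet are illustrative — inspect us_embassies_consulates.json for the exact schema the crawler emits:

```python
import json

# A record in the JSON export might look roughly like this
# (hypothetical field names; check the real output file).
sample = json.loads("""
[{"country": "France", "city": "Paris", "continent": "Europe",
  "full_name": "U.S. Embassy Paris", "website": "https://fr.usembassy.gov"}]
""")

# Iterate over records and print a one-line summary of each.
for rec in sample:
    print(f'{rec["full_name"]} ({rec["city"]}, {rec["country"]}) - {rec["continent"]}')
```

To read the actual file, replace the inline string with `json.load(open("us_embassies_consulates.json", encoding="utf-8"))`.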

Project Structure

  • app.py — Main crawler and exporter script
  • .cache/ — Cached HTML pages for efficiency
  • us_embassies_consulates.csv — Exported embassy/consulate data (CSV)
  • us_embassies_consulates.json — Exported data (JSON)
  • us_embassies_consulates.yml — Exported data (YAML)

Customization

  • Continent Mapping:
    • The script auto-detects continent from country/city (supports English names)
  • Caching:
    • HTML pages are cached in .cache/ to minimize repeated requests
  • Logging & Progress:
    • Uses Python logging and tqdm for progress bars
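The .cache/ mechanism described above amounts to keying each fetched URL to a file on disk and only hitting the network on a cache miss. The sketch below is an assumption about the general shape, not the script's actual implementation — the stand-in `fetch` callable replaces whatever HTTP client app.py uses:

```python
import hashlib
import pathlib

CACHE_DIR = pathlib.Path(".cache")

def cached_fetch(url: str, fetch=lambda u: f"<html>{u}</html>") -> str:
    """Return the HTML for url, serving from .cache/ when possible.

    `fetch` is a stand-in for a real HTTP request (e.g. requests.get);
    it is injected here so the sketch stays self-contained.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    # Hash the URL so the cache filename is filesystem-safe and unique.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

After the first call for a given URL, later calls return the cached copy without invoking `fetch` again, which is what makes re-runs of the crawler cheap.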

License

MIT License

© 2025 Seyyed Ali Mohammadiyeh (MAX BASE)

See LICENSE for details.
