PixCollect is a fast, multi-source image scraper designed to efficiently collect and save images for datasets and creative projects.
- Multi-threaded image scraping for fast and efficient downloads.
- Supports multiple image sources, currently Google and Pixabay.
- Flexible configuration via an appsettings.json file.
- Saves images in a configurable format (e.g., jpg, png) to an output directory of your choice.
- Clone the repository:
git clone https://github.com/Isaac987/PixCollect.git
- Navigate to the project directory:
cd PixCollect
- Build the project:
dotnet build
Run a scraping session with a specific query and limit the number of images:
dotnet run scrape run <query> <limit>
- query: The keyword or search term for the images.
- limit: Maximum number of images to scrape.
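When building a dataset you may want one session per search term. The sketch below loops over a list of queries and calls the `scrape run` command shown above for each one. The names `DRY_RUN`, `LIMIT`, and `scrape_all` are made up for this example and are not part of PixCollect; by default the script only prints the commands it would run, so you can check them first.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, LIMIT and scrape_all are hypothetical helper names,
# not PixCollect features. The only PixCollect command used is
# `dotnet run scrape run <query> <limit>`.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them
LIMIT="${LIMIT:-25}"      # maximum number of images per query

scrape_all() {
    for query in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            # Show the command instead of executing it.
            printf 'dotnet run scrape run "%s" %s\n' "$query" "$LIMIT"
        else
            # Quoting keeps each query a single argument.
            dotnet run scrape run "$query" "$LIMIT"
        fi
    done
}

scrape_all sunset mountains
```

Run it from the project directory with `DRY_RUN=0` once the printed commands look right. Whether a multi-word query is accepted as a single quoted argument is an assumption that has not been checked against the tool.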
Enable or disable specific image sources for the session (currently supports: google, pixabay):
# Enable a source
dotnet run scrape enable-source <source>
# Disable a source
dotnet run scrape disable-source <source>
- source: The name of the image source to enable or disable.
View and modify default scrape settings:
# List current settings
dotnet run scrape list-settings
# Update a default setting
dotnet run scrape set-output-directory <directory-path> # Change the output directory
dotnet run scrape set-format <image-format> # Set the default image format
dotnet run scrape set-headless <true|false> # Enable or disable headless mode
- directory-path: The directory where scraped images will be saved, for example /path/to/output.
- image-format: The desired format for images (e.g., jpg, png). Must be a valid format.
- true|false: Use true to enable headless mode or false to disable it.
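If you set up PixCollect on several machines, the settings commands above can be grouped into one small script. This is a minimal sketch, not something shipped with the project: `run`, `configure_defaults`, and `DRY_RUN` are hypothetical names, and the example values (`./images`, `png`, `true`) are placeholders. It checks the headless value before calling the tool and finishes with `list-settings` so you can confirm the result.

```shell
#!/bin/sh
# Sketch only: run, configure_defaults and DRY_RUN are hypothetical helpers.
# The PixCollect commands are the ones documented above.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_defaults() {
    # $1 = output directory, $2 = image format, $3 = headless (true|false)
    case "$3" in
        true|false) ;;
        *) echo "headless must be true or false, got: $3" >&2; return 1 ;;
    esac
    run dotnet run scrape set-output-directory "$1"
    run dotnet run scrape set-format "$2"
    run dotnet run scrape set-headless "$3"
    run dotnet run scrape list-settings
}

configure_defaults ./images png true
```

The image format is passed through unchecked, because the set of formats PixCollect accepts is not listed here.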
