A high-performance C# console application for scraping Vietnamese lottery (Vietlott) draw results and exporting them to CSV format. This tool efficiently collects historical lottery data from the official Vietlott website with parallel processing capabilities.
- Multi-lottery Support: Scrapes data for three lottery types (535, 645, 655)
- Parallel Processing: Uses up to 32 concurrent threads for fast data collection
- Incremental Updates: Only scrapes new draw codes, avoiding duplicate data
- CSV Export: Exports data to organized CSV files for each lottery type
- Comprehensive Logging: Detailed logging with Serilog to console and file
- Error Handling: Robust error handling with retry mechanisms
- Command Line Options: Configurable scraping limits via command line arguments
- .NET 8.0 or later
- Internet connection to access Vietlott website
The project uses the following NuGet packages:
- CsvHelper (v33.1.0) - CSV file reading and writing
- HtmlAgilityPack (v1.12.4) - HTML parsing and web scraping
- RestSharp (v112.1.0) - HTTP client for API requests
- Serilog (v4.3.0) - Structured logging framework
- Serilog.Sinks.Console (v6.0.0) - Console logging output
- Serilog.Sinks.File (v7.0.0) - File logging output
- Clone the repository:

```bash
git clone https://github.com/nsknet/VietlottScraper
cd VietlottScraper
```

- Restore dependencies:

```bash
dotnet restore
```

- Build the project:

```bash
dotnet build
```

Run the application without any parameters to scrape all available lottery data:

```bash
dotnet run
```

Use the `--total` parameter to limit the number of records to scrape:

```bash
dotnet run --total 100
```

This scrapes only the first 100 missing draw codes for each lottery type.
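The README doesn't show how `--total` is read from the command line. Assuming it arrives as a plain `--total <n>` pair, a minimal sketch could look like this. The helper name `ParseTotal` and the choice to treat a missing or invalid value as "no limit" are assumptions, not the project's confirmed behavior.

```csharp
using System;

// Hypothetical helper: returns the positive integer following "--total",
// or null when the flag is absent or its value is invalid (null = no limit).
static int? ParseTotal(string[] arguments)
{
    for (int i = 0; i < arguments.Length - 1; i++)
    {
        if (arguments[i] == "--total" &&
            int.TryParse(arguments[i + 1], out int total) && total > 0)
        {
            return total;
        }
    }
    return null;
}

Console.WriteLine(ParseTotal(new[] { "--total", "100" })); // 100
Console.WriteLine(ParseTotal(new[] { "--total" }) is null); // True
```

In a top-level `Program.cs`, this would be called as `ParseTotal(args)`.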
The application generates the following organized file structure:
```
VietlottScraper/
├── csv/
│   ├── 535.csv      # Draw results for lottery type 535
│   ├── 645.csv      # Draw results for lottery type 645
│   └── 655.csv      # Draw results for lottery type 655
└── logs/
    └── log.txt      # Application logs (rotated daily)
```
- csv/ directory contains all lottery data files
- logs/ directory contains application log files
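A minimal sketch of how this layout could be set up on startup, using only `System.IO`. The helper `PrepareOutput`, the demo path, and the header's column order (taken from the column table in this README) are assumptions; the actual application lists CsvHelper for CSV output.

```csharp
using System;
using System.IO;

// Header built from the README's column table; the column order is assumed.
string header = string.Join(",",
    "DrawCode", "LotteryType", "DrawDate", "WinningNumbers",
    "FirstPrizeVnd", "FirstPrizeWinners", "SecondPrizeVnd", "SecondPrizeWinners",
    "ThirdPrizeVnd", "ThirdPrizeWinners", "FourthPrizeVnd", "FourthPrizeWinners",
    "FifthPrizeVnd", "FifthPrizeWinners", "SixthPrizeVnd", "SixthPrizeWinners",
    "SeventhPrizeVnd", "SeventhPrizeWinners");

// Hypothetical helper: create csv/ and logs/ under `root`, then add a
// header-only file for each lottery type that has no CSV yet.
static void PrepareOutput(string root, string[] lotteryTypes, string headerRow)
{
    Directory.CreateDirectory(Path.Combine(root, "logs"));
    string csvDir = Path.Combine(root, "csv");
    Directory.CreateDirectory(csvDir); // no-op if the directory already exists

    foreach (string type in lotteryTypes)
    {
        string path = Path.Combine(csvDir, $"{type}.csv");
        if (!File.Exists(path))
        {
            // Existing files are never overwritten, so stored draws survive reruns.
            File.WriteAllText(path, headerRow + Environment.NewLine);
        }
    }
}

string demoRoot = Path.Combine(Path.GetTempPath(), "vietlott-demo");
PrepareOutput(demoRoot, new[] { "535", "645", "655" }, header);
Console.WriteLine(File.ReadAllLines(Path.Combine(demoRoot, "csv", "655.csv"))[0]);
```

Creating only missing files keeps the step idempotent, which matters because every run goes through it before the incremental scrape.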
Each CSV file contains the following columns:
| Column | Description |
|---|---|
| DrawCode | Sequential draw number |
| LotteryType | Type of lottery (535, 645, or 655) |
| DrawDate | Date of the draw |
| WinningNumbers | The winning lottery numbers |
| FirstPrizeVnd | First prize amount in VND |
| FirstPrizeWinners | Number of first prize winners |
| SecondPrizeVnd | Second prize amount in VND |
| SecondPrizeWinners | Number of second prize winners |
| ThirdPrizeVnd | Third prize amount in VND |
| ThirdPrizeWinners | Number of third prize winners |
| FourthPrizeVnd | Fourth prize amount in VND |
| FourthPrizeWinners | Number of fourth prize winners |
| FifthPrizeVnd | Fifth prize amount in VND |
| FifthPrizeWinners | Number of fifth prize winners |
| SixthPrizeVnd | Sixth prize amount in VND |
| SixthPrizeWinners | Number of sixth prize winners |
| SeventhPrizeVnd | Seventh prize amount in VND |
| SeventhPrizeWinners | Number of seventh prize winners |
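The Existing Data Check step only needs the `DrawCode` column from each file. The sketch below streams a CSV line by line and collects those codes into a set. The helper name is hypothetical, and its naive comma split assumes no quoted field before `DrawCode` contains a comma; the project itself lists CsvHelper for reading CSV, which handles quoting properly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: collect every DrawCode already stored in a CSV file.
static HashSet<int> ReadExistingDrawCodes(string csvPath)
{
    var codes = new HashSet<int>();
    if (!File.Exists(csvPath)) return codes; // no file yet: nothing scraped

    using var reader = new StreamReader(csvPath);
    string? headerLine = reader.ReadLine();
    if (headerLine is null) return codes;

    // Locate DrawCode by name instead of assuming it is the first column.
    int column = Array.IndexOf(headerLine.Split(','), "DrawCode");
    if (column < 0) return codes;

    string? line;
    while ((line = reader.ReadLine()) is not null)
    {
        string[] fields = line.Split(',');
        if (column < fields.Length && int.TryParse(fields[column], out int code))
        {
            codes.Add(code); // rows with an unparsable code are skipped
        }
    }
    return codes;
}

string samplePath = Path.Combine(Path.GetTempPath(), "vietlott-sample.csv");
File.WriteAllLines(samplePath, new[] { "DrawCode,LotteryType", "1,645", "2,645", "5,645" });
Console.WriteLine(ReadExistingDrawCodes(samplePath).Count); // 3
```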
- Initialization: The application sets up logging, creates the necessary directories (`csv/` and `logs/`), and parses command line arguments
- CSV Preparation: Creates CSV files with headers in the `csv/` directory if they don't exist
- Existing Data Check: Reads existing draw codes from CSV files to avoid duplicates
- Latest Draw Discovery: Fetches the latest available draw code from the Vietlott website
- Gap Identification: Determines which draw codes are missing from local data
- Parallel Scraping: Uses up to 32 concurrent threads to scrape missing data
- Data Export: Saves new data to CSV files in the `csv/` directory, ordered by draw code
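Gap identification can be sketched as a set difference over draw codes. This assumes codes are consecutive integers starting at 1 (the column table only says they are sequential), and the helper name `FindMissingDrawCodes` is hypothetical. The optional `total` cap mirrors the `--total` option by keeping the first N missing codes in ascending order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: codes in 1..latestCode not yet stored, ascending.
static List<int> FindMissingDrawCodes(int latestCode, ISet<int> existingCodes, int? total)
{
    // Walk 1..latestCode in order and drop codes already on disk.
    IEnumerable<int> missing = Enumerable.Range(1, Math.Max(latestCode, 0))
                                         .Where(code => !existingCodes.Contains(code));

    // With a limit, keep only the first N gaps (the lowest draw codes).
    if (total is int limit)
    {
        missing = missing.Take(limit);
    }
    return missing.ToList();
}

var stored = new HashSet<int> { 1, 2, 5 };
Console.WriteLine(string.Join(",", FindMissingDrawCodes(7, stored, null))); // 3,4,6,7
Console.WriteLine(string.Join(",", FindMissingDrawCodes(7, stored, 2)));    // 3,4
```

Because the local files are the only state, rerunning after an interruption simply picks up the remaining gaps.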
- Concurrency: Up to 32 parallel requests to speed up data collection
- Memory Efficient: Uses streaming for large datasets
- Incremental: Only processes new data, not existing records
- Fault Tolerant: Continues processing even if individual requests fail
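One way to get bounded concurrency with per-request fault tolerance in .NET 8 is `Parallel.ForEachAsync` with `MaxDegreeOfParallelism` set to 32. This is a sketch, not the project's confirmed implementation: `ScrapeAllAsync` is hypothetical, and the `fetch` delegate stands in for whatever HTTP call `VietlottClient` makes (its signature here is assumed).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: fetch every draw code with at most `maxParallel`
// requests in flight, skip failures, and return successes ordered by code.
static async Task<List<(int Code, string Data)>> ScrapeAllAsync(
    IEnumerable<int> codes, Func<int, Task<string>> fetch, int maxParallel = 32)
{
    var results = new ConcurrentBag<(int Code, string Data)>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

    await Parallel.ForEachAsync(codes, options, async (code, token) =>
    {
        try
        {
            string data = await fetch(code);
            results.Add((code, data));
        }
        catch (Exception ex)
        {
            // A failed draw is reported and skipped; the rest of the batch continues.
            Console.Error.WriteLine($"Draw {code} failed: {ex.Message}");
        }
    });

    // Completion order is nondeterministic, so sort before writing the CSV.
    return results.OrderBy(r => r.Code).ToList();
}

var scraped = await ScrapeAllAsync(Enumerable.Range(1, 10), async code =>
{
    await Task.Delay(10); // stand-in for network latency
    if (code == 4) throw new InvalidOperationException("simulated timeout");
    return $"draw {code}";
});
Console.WriteLine(string.Join(",", scraped.Select(r => r.Code))); // 1,2,3,5,6,7,8,9,10
```

Collecting into a `ConcurrentBag` and sorting once at the end avoids locking around the CSV writer while requests are still running.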
The application provides comprehensive logging:
- Console Output: Real-time progress and status updates
- File Logging: Detailed logs saved to `logs/log.txt` with daily rotation
- Log Levels: Information, warnings, and errors are properly categorized
- Organized Storage: All log files are automatically stored in the `logs/` directory
- Individual request failures don't stop the entire process
- Network timeouts and HTTP errors are logged and skipped
- CSV parsing errors are handled gracefully
- Application continues processing remaining lottery types if one fails
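The feature list mentions retry mechanisms without describing the policy. Below is a hypothetical helper (`RetryAsync`), assuming a fixed attempt count with exponential backoff; once attempts run out, the exception reaches the caller, which can log it and skip that draw as described above.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retry `action` up to `maxAttempts` times, doubling the
// delay after each failure. On the last attempt the exception filter is false,
// so the error propagates to the caller.
static async Task<T> RetryAsync<T>(Func<Task<T>> action, int maxAttempts = 3, int baseDelayMs = 200)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Back off 1x, 2x, 4x ... baseDelayMs before trying again.
            await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
        }
    }
}

int calls = 0;
string result = await RetryAsync(async () =>
{
    calls++;
    await Task.Yield();
    if (calls < 3) throw new TimeoutException("simulated timeout");
    return "ok";
}, maxAttempts: 3, baseDelayMs: 10);
Console.WriteLine($"{result} after {calls} attempts"); // ok after 3 attempts
```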
```
VietlottScraper/
├── Program.cs              # Main application logic and orchestration
├── VietlottClient.cs       # HTTP client for Vietlott website interaction
├── DrawInfo.cs             # Data model for lottery draw information
├── VietlottScraper.csproj  # Project configuration and dependencies
├── csv/                    # Directory containing generated CSV files
│   ├── 535.csv             # Lottery type 535 data
│   ├── 645.csv             # Lottery type 645 data
│   └── 655.csv             # Lottery type 655 data
└── logs/                   # Directory containing application logs
    └── log.txt             # Application log file (rotated daily)
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is for educational and research purposes. Please respect the terms of service of the Vietlott website when using this scraper.
This tool is designed for personal use and data analysis. Users are responsible for complying with the website's terms of service and applicable laws regarding web scraping.