VietlottScraper

A high-performance C# console application for scraping Vietnamese lottery (Vietlott) draw results and exporting them to CSV format. This tool efficiently collects historical lottery data from the official Vietlott website with parallel processing capabilities.

Features

  • Multi-lottery Support: Scrapes data for three lottery types (535, 645, 655)
  • Parallel Processing: Uses up to 32 concurrent threads for fast data collection
  • Incremental Updates: Only scrapes new draw codes, avoiding duplicate data
  • CSV Export: Exports data to organized CSV files for each lottery type
  • Comprehensive Logging: Detailed logging with Serilog to console and file
  • Error Handling: Robust error handling with retry mechanisms
  • Command Line Options: Configurable scraping limits via command line arguments

Prerequisites

  • .NET 8.0 or later
  • Internet connection to access Vietlott website

Dependencies

The project uses the following NuGet packages:

  • CsvHelper (v33.1.0) - CSV file reading and writing
  • HtmlAgilityPack (v1.12.4) - HTML parsing and web scraping
  • RestSharp (v112.1.0) - HTTP client for API requests
  • Serilog (v4.3.0) - Structured logging framework
  • Serilog.Sinks.Console (v6.0.0) - Console logging output
  • Serilog.Sinks.File (v7.0.0) - File logging output

Installation

  1. Clone the repository:
git clone https://github.com/nsknet/VietlottScraper
cd VietlottScraper
  2. Restore dependencies:
dotnet restore
  3. Build the project:
dotnet build

Usage

Basic Usage

Run the application without any parameters to scrape all available lottery data:

dotnet run

Command Line Options

Limit Total Records

Use the --total parameter to limit the number of records to scrape:

dotnet run --total 100

This will scrape only the first 100 missing draw codes for each lottery type.
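As an illustration of how such an option can be read, here is a minimal sketch that uses only the standard library. The class name CliOptions and the method ParseTotal are hypothetical and are not taken from Program.cs; the sketch also assumes that a missing or invalid value simply means "no limit".

```csharp
using System;

// Hypothetical helper, not the project's actual parser.
public static class CliOptions
{
    // Returns the positive integer that follows "--total",
    // or null when the flag is absent or its value is not a positive integer.
    public static int? ParseTotal(string[] args)
    {
        // Stop one element early so that args[i + 1] always exists.
        for (int i = 0; i < args.Length - 1; i++)
        {
            if (args[i] == "--total"
                && int.TryParse(args[i + 1], out int total)
                && total > 0)
            {
                return total;
            }
        }
        return null;
    }
}
```

With this sketch, running with --total 100 would yield 100, and running without arguments would yield null, which the caller can treat as "scrape everything".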

Output Files

The application generates the following organized file structure:

VietlottScraper/
├── csv/
│   ├── 535.csv     # Draw results for lottery type 535
│   ├── 645.csv     # Draw results for lottery type 645
│   └── 655.csv     # Draw results for lottery type 655
└── logs/
    └── log.txt     # Application logs (rotated daily)

  • csv/ directory contains all lottery data files
  • logs/ directory contains application log files

CSV Data Structure

Each CSV file contains the following columns:

| Column | Description |
| --- | --- |
| DrawCode | Sequential draw number |
| LotteryType | Type of lottery (535, 645, or 655) |
| DrawDate | Date of the draw |
| WinningNumbers | The winning lottery numbers |
| FirstPrizeVnd | First prize amount in VND |
| FirstPrizeWinners | Number of first prize winners |
| SecondPrizeVnd | Second prize amount in VND |
| SecondPrizeWinners | Number of second prize winners |
| ThirdPrizeVnd | Third prize amount in VND |
| ThirdPrizeWinners | Number of third prize winners |
| FourthPrizeVnd | Fourth prize amount in VND |
| FourthPrizeWinners | Number of fourth prize winners |
| FifthPrizeVnd | Fifth prize amount in VND |
| FifthPrizeWinners | Number of fifth prize winners |
| SixthPrizeVnd | Sixth prize amount in VND |
| SixthPrizeWinners | Number of sixth prize winners |
| SeventhPrizeVnd | Seventh prize amount in VND |
| SeventhPrizeWinners | Number of seventh prize winners |
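
For readers who want to consume these files, the sketch below shows one way to collect the draw codes already present in a CSV. It assumes that DrawCode is the first field of each row (as in the column list above) and that it parses as an integer; both are assumptions about the file layout. The project itself lists CsvHelper as a dependency, whereas this sketch deliberately uses only the standard library, and the name ExistingDraws is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not the project's actual reader.
public static class ExistingDraws
{
    // Reads the first field of every data row and returns the set of draw codes found.
    // Rows whose first field is not an integer are ignored.
    public static HashSet<int> ReadDrawCodes(TextReader reader)
    {
        var codes = new HashSet<int>();

        reader.ReadLine(); // skip the header row

        var line = reader.ReadLine();
        while (line != null)
        {
            // Only the text before the first comma is needed, so quoted commas
            // in later fields (for example in WinningNumbers) do not matter here.
            int comma = line.IndexOf(',');
            string first = comma >= 0 ? line.Substring(0, comma) : line;

            if (int.TryParse(first.Trim(), out int code))
            {
                codes.Add(code);
            }
            line = reader.ReadLine();
        }
        return codes;
    }
}
```

A caller could pass File.OpenText("csv/645.csv") as the reader; using a TextReader keeps the helper easy to exercise with in-memory text.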

How It Works

  1. Initialization: The application sets up logging, creates necessary directories (csv/ and logs/), and parses command line arguments
  2. CSV Preparation: Creates CSV files with headers in the csv/ directory if they don't exist
  3. Existing Data Check: Reads existing draw codes from CSV files to avoid duplicates
  4. Latest Draw Discovery: Fetches the latest available draw code from the Vietlott website
  5. Gap Identification: Determines which draw codes are missing from local data
  6. Parallel Scraping: Uses up to 32 concurrent threads to scrape missing data
  7. Data Export: Saves new data to CSV files in draw code order within the csv/ directory
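
The gap identification in step 5 can be sketched as a set difference. The sketch assumes that draw codes are consecutive integers starting at 1 and that a --total limit keeps the lowest missing codes; neither detail is confirmed by the source, and the name GapFinder is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the project's actual logic.
public static class GapFinder
{
    // Returns, in ascending order, the draw codes in 1..latest that are not in `existing`.
    // When `total` is set, at most that many codes are returned.
    public static List<int> FindMissing(int latest, ISet<int> existing, int? total = null)
    {
        IEnumerable<int> missing = Enumerable
            .Range(1, Math.Max(latest, 0))       // assumed numbering: 1, 2, ..., latest
            .Where(code => !existing.Contains(code));

        if (total.HasValue)
        {
            missing = missing.Take(total.Value);
        }
        return missing.ToList();
    }
}
```

For example, with latest = 5 and existing codes {1, 3}, the missing codes are 2, 4 and 5; with a limit of 2, only 2 and 4 are returned.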

Performance

  • Concurrency: Up to 32 parallel requests for optimal performance
  • Memory Efficient: Uses streaming for large datasets
  • Incremental: Only processes new data, not existing records
  • Fault Tolerant: Continues processing even if individual requests fail
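
The combination of bounded concurrency and fault tolerance described above could look roughly like the following. This is a sketch built on Parallel.ForEachAsync from .NET 6 and later, not the code in Program.cs; the fetch delegate stands in for whatever VietlottClient actually does, and the name BoundedScraper is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not the project's actual scraper.
public static class BoundedScraper
{
    // Fetches every draw code with at most `maxParallel` requests in flight.
    // A failed fetch is recorded and skipped so the remaining codes still run.
    public static async Task<(List<KeyValuePair<int, string>> Results, List<int> Failed)> ScrapeAsync(
        IEnumerable<int> drawCodes,
        Func<int, CancellationToken, Task<string>> fetch,
        int maxParallel = 32)
    {
        var results = new ConcurrentDictionary<int, string>();
        var failed = new ConcurrentBag<int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        await Parallel.ForEachAsync(drawCodes, options, async (code, ct) =>
        {
            try
            {
                results[code] = await fetch(code, ct);
            }
            catch (Exception)
            {
                // One bad request must not abort the whole run.
                failed.Add(code);
            }
        });

        // Sort by draw code so the output can be written in draw code order.
        return (results.OrderBy(pair => pair.Key).ToList(),
                failed.OrderBy(code => code).ToList());
    }
}
```

Results are gathered in thread-safe collections and sorted once at the end, which keeps the parallel section free of locks while still allowing an ordered export.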

Logging

The application provides comprehensive logging:

  • Console Output: Real-time progress and status updates
  • File Logging: Detailed logs saved to logs/log.txt with daily rotation
  • Log Levels: Information, warnings, and errors are properly categorized
  • Organized Storage: All log files are automatically stored in the logs/ directory
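
A Serilog setup matching this description might look like the sketch below. It relies on the Serilog, Serilog.Sinks.Console and Serilog.Sinks.File packages listed under Dependencies; the minimum level and the wrapper class LoggingSetup are assumptions, not the project's actual configuration.

```csharp
using Serilog;

// Hypothetical wrapper, not the project's actual setup code.
public static class LoggingSetup
{
    public static void Configure()
    {
        Log.Logger = new LoggerConfiguration()
            .MinimumLevel.Information()          // assumed minimum level
            .WriteTo.Console()                   // real-time progress in the terminal
            .WriteTo.File("logs/log.txt",        // path taken from this README
                rollingInterval: RollingInterval.Day)
            .CreateLogger();
    }
}
```

With RollingInterval.Day, the Serilog file sink inserts the date into the file name (for example log20250101.txt), so the files in logs/ may carry a date suffix rather than being named exactly log.txt.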

Error Handling

  • Individual request failures don't stop the entire process
  • Network timeouts and HTTP errors are logged and skipped
  • CSV parsing errors are handled gracefully
  • Application continues processing remaining lottery types if one fails

Project Structure

VietlottScraper/
├── Program.cs              # Main application logic and orchestration
├── VietlottClient.cs       # HTTP client for Vietlott website interaction
├── DrawInfo.cs             # Data model for lottery draw information
├── VietlottScraper.csproj  # Project configuration and dependencies
├── csv/                    # Directory containing generated CSV files
│   ├── 535.csv            # Lottery type 535 data
│   ├── 645.csv            # Lottery type 645 data
│   └── 655.csv            # Lottery type 655 data
└── logs/                   # Directory containing application logs
    └── log.txt            # Application log file (rotated daily)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is for educational and research purposes. Please respect the terms of service of the Vietlott website when using this scraper.

Disclaimer

This tool is designed for personal use and data analysis. Users are responsible for complying with the website's terms of service and applicable laws regarding web scraping.
