ViPERSQL

Vietnamese/English Text-to-SQL System with ViR2 Example Selection Method

A research system that converts natural-language questions to SQL queries. It features ViR2, a novel two-stage example selection method that combines semantic retrieval with syntactic matching and diversity optimization.

🎯 Overview

ViPERSQL addresses the challenge of selecting optimal few-shot examples for Text-to-SQL tasks through:

ViR2 Method: Two-stage selection (PhoBERT retrieval → POS-based re-ranking with diversity)
Multi-Language Support: Vietnamese (PhoBERT + underthesea) and English (BERT + spaCy)
Enhanced Evaluation: Component-wise F1 metrics beyond Exact Match
Modular Architecture: Extensible framework with multiple strategies and selectors

Key Innovation:

$$\text{Score}(E, q) = \text{POS}_{\text{Score}}(E, q) + \lambda \cdot \text{Diversity}(E)$$

where $E$ is the selected example set, $q$ is the input question, and $\lambda$ (default 0.3) weights example diversity against syntactic similarity.
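The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: it assumes POS similarity is a Jaccard overlap of tag sets, that diversity is the mean pairwise dissimilarity within the selected set, and that stage 2 runs a simple beam search over the stage 1 candidates. The helper names (`jaccard`, `pos_score`, `diversity`, `beam_select`) and the toy candidates are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def pos_score(examples, query_tags):
    """Assumed POS_Score: mean tag overlap between the query and each example."""
    return sum(jaccard(ex["pos"], query_tags) for ex in examples) / len(examples)


def diversity(examples):
    """Assumed Diversity: mean pairwise dissimilarity (0 for a single example)."""
    pairs = list(combinations(examples, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - jaccard(a["pos"], b["pos"]) for a, b in pairs) / len(pairs)


def score(examples, query_tags, lam=0.3):
    """Score(E, q) = POS_Score(E, q) + lambda * Diversity(E)."""
    return pos_score(examples, query_tags) + lam * diversity(examples)


def beam_select(candidates, query_tags, k=2, beam_size=5, lam=0.3):
    """Grow example sets one item at a time, keeping the best `beam_size` sets."""
    k = min(k, len(candidates))
    beams = [()]
    for _ in range(k):
        expanded = {}
        for beam in beams:
            for idx in range(len(candidates)):
                if idx in beam:
                    continue
                key = tuple(sorted(beam + (idx,)))
                if key not in expanded:
                    chosen = [candidates[i] for i in key]
                    expanded[key] = score(chosen, query_tags, lam)
        beams = sorted(expanded, key=expanded.get, reverse=True)[:beam_size]
    return [candidates[i] for i in beams[0]]


# Toy stage 1 output: candidates with hand-written (hypothetical) POS tags.
candidates = [
    {"question": "How many singers are there?", "pos": ["ADV", "ADJ", "NOUN", "VERB"]},
    {"question": "Count the number of singers.", "pos": ["VERB", "DET", "NOUN", "ADP"]},
    {"question": "List all song names.", "pos": ["VERB", "DET", "NOUN"]},
]
query_tags = ["ADV", "ADJ", "NOUN", "VERB"]

for lam in (0.3, 1.0):
    picked = beam_select(candidates, query_tags, k=2, lam=lam)
    print(lam, [ex["question"] for ex in picked])
```

On this toy data, raising λ from 0.3 to 1.0 swaps the second pick from the syntactically closer example to the more dissimilar one, which is the trade-off the weight is meant to control.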

📚 Documentation

Core Concepts

Architecture - System design and module organization
ViR2 Method - Two-stage selection algorithm
Strategies - Zero-shot, Few-shot, Chain-of-Thought
Selectors - Random, DICL, ASTRES, Skill-KNN, ViR2
Evaluation - Component-wise metrics and analysis

Usage Guides

Quick Start - Get started in 5 minutes
Configuration - All parameters and settings
Usage Examples - Common scenarios
Multi-Language - Vietnamese and English support

Advanced

Ablation Studies - Testing ViR2 components
Extending System - Add new strategies/selectors
API Reference - Complete API documentation

⚡ Quick Start

Installation

git clone https://github.com/hoadm-net/ViPERSQL.git
cd ViPERSQL
pip install -r requirements.txt

Configuration

cp .env.example .env
# Edit .env with your API keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...

Basic Usage

# Zero-shot (baseline)
python vipersql.py --samples 10

# Few-shot with ViR2 (recommended)
python vipersql.py --strategy few-shot --example-selection-strategy vir2 --samples 10

# Chain-of-thought reasoning
python vipersql.py --strategy cot --samples 10

See Usage Examples for more scenarios.

🏗️ System Architecture

Input Question
      ↓
┌───────────────┐
│   Strategy    │  Zero-shot / Few-shot / CoT
└─────┬─────────┘
      ↓
┌───────────────┐
│   Selector    │  Random / DICL / ASTRES / Skill-KNN / ViR2
└─────┬─────────┘  (if Few-shot)
      ↓
┌───────────────┐
│ LLM Interface │  OpenAI GPT / Anthropic Claude
└─────┬─────────┘
      ↓
┌───────────────┐
│   Evaluator   │  Component F1 + Error Analysis
└─────┬─────────┘
      ↓
  SQL Query + Metrics

See Architecture for details.
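The flow in the diagram can be read as a small function pipeline. The sketch below is only an illustration of that flow under stated assumptions: `run_pipeline`, `build_prompt`, and `first_k` are hypothetical names, the LLM is any callable that maps a prompt string to text, and the evaluator is reduced to a case-insensitive exact-match check. It does not reflect the actual classes in the `mint` package.

```python
from typing import Callable, Dict, List, Optional

Example = Dict[str, str]


def build_prompt(question: str, examples: List[Example]) -> str:
    """Prepend the selected examples (if any) to the question."""
    shots = "".join(f"Q: {ex['question']}\nSQL: {ex['sql']}\n\n" for ex in examples)
    return f"{shots}Q: {question}\nSQL:"


def first_k(question: str, pool: List[Example], k: int) -> List[Example]:
    """Placeholder selector; a real one would rank the pool against the question."""
    return pool[:k]


def run_pipeline(
    question: str,
    strategy: str,
    pool: List[Example],
    selector: Callable[[str, List[Example], int], List[Example]],
    llm: Callable[[str], str],
    gold_sql: Optional[str] = None,
    k: int = 2,
) -> Dict[str, object]:
    # The selector runs only for the few-shot strategy, as in the diagram.
    examples = selector(question, pool, k) if strategy == "few-shot" else []
    sql = llm(build_prompt(question, examples)).strip()
    # Simplified stand-in for the evaluator stage.
    exact = None if gold_sql is None else sql.lower() == gold_sql.strip().lower()
    return {"sql": sql, "exact_match": exact}


pool = [{"question": "List songs", "sql": "SELECT name FROM song"}]
result = run_pipeline(
    "How many singers?",
    "few-shot",
    pool,
    first_k,
    llm=lambda prompt: "SELECT count(*) FROM singer",  # stub instead of an API call
    gold_sql="select count(*) from singer",
)
print(result)
```

Passing the selector and LLM as plain callables keeps the stages swappable, which matches the modular intent of the diagram.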

🎓 Research Contributions

ViR2 Method: Novel two-stage example selection combining semantic retrieval, syntactic (POS) matching, and diversity optimization
Multi-Language Framework: Unified architecture for Vietnamese and English
Enhanced Metrics: Component-wise evaluation beyond Exact Match
Ablation Framework: Systematic testing of individual components
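To illustrate why component-wise metrics say more than Exact Match, here is a minimal sketch that scores each SQL clause separately. It assumes a naive keyword split and set-based token F1; the function names are hypothetical, the regex does not handle subqueries or keywords inside string literals, and the project's own metrics may be defined differently.

```python
import re

CLAUSES = ["select", "from", "where", "group by", "having", "order by", "limit"]
_CLAUSE_RE = re.compile(
    r"\b(" + "|".join(c.replace(" ", r"\s+") for c in CLAUSES) + r")\b",
    re.IGNORECASE,
)


def split_clauses(sql):
    """Map each clause keyword to the set of lowercased tokens that follow it."""
    parts = _CLAUSE_RE.split(sql.strip().rstrip(";"))
    # With a capturing group, parts = [prefix, keyword, body, keyword, body, ...]
    clauses = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        name = " ".join(keyword.lower().split())
        tokens = {t.lower() for t in re.split(r"[\s,]+", body) if t}
        clauses.setdefault(name, set()).update(tokens)
    return clauses


def f1(pred, gold):
    """Set-based F1; two empty sets count as a perfect match."""
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def component_f1(pred_sql, gold_sql):
    """F1 per clause, for every clause present in either query."""
    pred, gold = split_clauses(pred_sql), split_clauses(gold_sql)
    return {
        c: f1(pred.get(c, set()), gold.get(c, set()))
        for c in CLAUSES
        if c in pred or c in gold
    }


gold = "SELECT name, age FROM singer WHERE age > 30 ORDER BY age"
pred = "SELECT name FROM singer WHERE age > 30"
for clause, value in component_f1(pred, gold).items():
    print(f"{clause:9s} {value:.2f}")
```

Exact Match would mark this prediction as simply wrong, while the per-clause scores show that FROM and WHERE are correct, SELECT is partially correct, and ORDER BY is missing.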

📊 Supported Methods

Method	Type	Speed	Complexity	Notes
Zero-shot	Baseline	⚡⚡⚡	Low	No in-context examples
Random	Few-shot	⚡⚡⚡	Low	Random selection baseline
DICL	Few-shot	⚡⚡	Medium	Semantic similarity only
ASTRES	Few-shot	⚡	High	AST-based structural matching
Skill-KNN	Few-shot	⚡⚡	Medium	SQL skill extraction + matching
ViR2	Few-shot	⚡⚡	Medium	Two-stage: Semantic + POS + Diversity
CoT	Reasoning	⚡	High	Step-by-step reasoning

📁 Project Structure

ViPERSQL/
├── vipersql.py              # Main CLI entry point
├── requirements.txt         # Dependencies
├── .env.example            # Environment template
├── docs/                   # Documentation
│   ├── ARCHITECTURE.md
│   ├── VIR2_METHOD.md
│   ├── STRATEGIES.md
│   └── ...
├── mint/                   # Core package
│   ├── core/              # Evaluator, LLM, Templates
│   ├── strategies/        # Zero-shot, Few-shot, CoT
│   ├── selectors/         # Random, DICL, ASTRES, Skill-KNN, ViR2
│   ├── metrics/           # Enhanced metrics
│   └── utils/             # Utilities
├── dataset/               # ViText2SQL dataset
├── templates/             # Prompt templates
├── scripts/               # Preprocessing scripts
└── results/               # Evaluation outputs

🛠️ Configuration

All settings are configurable via .env or command-line flags:

# Model selection
--model gpt-4o              # or claude-3-5-sonnet-20241022

# Strategy selection  
--strategy few-shot         # or zero-shot, cot

# Selector for few-shot
--example-selection-strategy vir2  # or random, dicl, astres, skill_knn

# ViR2 parameters
--vir2-candidate-pool-size 50      # Stage 1 pool size (M)
--vir2-beam-size 5                 # Beam search width (B)
--vir2-diversity-weight 0.3        # Diversity weight (λ)

# Dataset options
--level std                 # or syllable, word
--split dev                 # or test
--samples 100               # Number of samples

See Configuration Guide for all options.
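One common way to layer environment settings under command-line flags is to use the environment value as the argparse default, so an explicit flag always wins. The sketch below covers only a subset of the flags listed above. The environment variable names (`STRATEGY`, `VIR2_BEAM_SIZE`, and so on) and the precedence order are assumptions made for illustration; check `.env.example` for the names the project actually reads.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from `env` (os.environ if omitted)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Illustrative subset of the flags above")
    parser.add_argument(
        "--strategy",
        choices=["zero-shot", "few-shot", "cot"],
        default=env.get("STRATEGY", "zero-shot"),  # hypothetical variable name
    )
    parser.add_argument(
        "--vir2-candidate-pool-size",
        type=int,
        default=int(env.get("VIR2_CANDIDATE_POOL_SIZE", 50)),  # M
    )
    parser.add_argument(
        "--vir2-beam-size",
        type=int,
        default=int(env.get("VIR2_BEAM_SIZE", 5)),  # B
    )
    parser.add_argument(
        "--vir2-diversity-weight",
        type=float,
        default=float(env.get("VIR2_DIVERSITY_WEIGHT", 0.3)),  # lambda
    )
    return parser


# An environment value sets the beam size; explicit flags override the rest.
args = build_parser({"VIR2_BEAM_SIZE": "8"}).parse_args(
    ["--strategy", "few-shot", "--vir2-diversity-weight", "0.5"]
)
print(args)
```

Passing the environment as a plain mapping keeps the function easy to test without touching the real process environment.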

🔬 Example: Running ViR2

# Basic ViR2 with default parameters (M=50, B=5, λ=0.3)
python vipersql.py \
  --strategy few-shot \
  --example-selection-strategy vir2 \
  --samples 100

# Custom ViR2 parameters
python vipersql.py \
  --strategy few-shot \
  --example-selection-strategy vir2 \
  --vir2-candidate-pool-size 100 \
  --vir2-beam-size 10 \
  --vir2-diversity-weight 0.5 \
  --samples 100

# Ablation: ViR2 without POS matching
python vipersql.py \
  --strategy few-shot \
  --example-selection-strategy vir2-no-pos \
  --samples 100

📄 License

MIT License - See LICENSE file for details.
