hkustDB · psychlone77 · Mar 18, 2026 · May 9, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+config/database.ini
+__pycache__/
+.venv/
diff --git a/README.md b/README.md
@@ -1,119 +1,93 @@
-# Differential Privacy over SQL
-
-## Table of Contents
-* [About the Project](#about-the-project)  
-* [Prerequisites](#prerequisites)
-    * [Tools](#tools)
-    * [Python Dependency](#python-dependency)
-    * [Database Permission](#database-permission)
-* [system structure](#system-structure)
-* [Demo System](#demo-system)
-* [Instruction for Collecting Result](#collect-result)
-* [Future Plan](#future-plan)
+# Differential Privacy over SQL (DPSQL)
+
+DPSQL is a system that answers SQL queries under differential privacy guarantees.
 
 ## About The Project
-Differential Privacy over SQL (DPSQL) is a system for answering queries over differential privacy.
 
-The file structure is as below
-```
+The project is organized as follows:
+
+```text
 project
-│   
-└───config
-└───docs
-└───Profile
-└───src
-│   └───algorithm
-└───Test
-│   └───TPCH
-│   └───Graph
-└───Sample
+├── config/        # Configuration files required for the system
+├── docs/          # Reference information and documentation
+├── Profile/       # Profile information/licenses (e.g., mosek.lic)
+├── src/           # Main source code files
+│   └── algorithm/ # Core algorithms integrated into the system (e.g., FastSJA, OptSJA)
+├── Test/          # Queries used in system experiments (TPCH, Graph)
+└── Sample/        # Scripts for database setup and collecting experiment results
 ```
-`./config` stores the configuration files users need for the system.
 
-`./docs` stores the reference information users need to work with DPSQL:
+## Prerequisites
 
-`./Profile` stores the Profile information for using `mosek` in the system.
+### Tools
+* **[PostgreSQL](https://www.postgresql.org/)**: Database engine.
+* **[Python 3](https://www.python.org/download/releases/3.0/)**: Version 3.0 or higher.
+* **[Mosek](https://www.mosek.com/downloads/)**: License file must be placed in `./Profile`.
+* **CPLEX (Full Edition)**: Required for large datasets. Note: `pip install cplex` alone is not sufficient, because the edition it installs has a 1,000-variable limit.
+  * [Detailed CPLEX Installation & Python Linking Guide](docs/cplex_setup.md)
 
-`./src` stores main source files.
-* `./src/algorithm` stores 3 algorithm we integrated into this system.
+### Python Dependencies
 
-`./Test` stores the queries used in the experiments of the system.
+Install the required Python packages using the provided `requirements.txt` file:
 
-`./Sample` stores the script for setting up database and collecting experiment results.
+```bash
+pip install -r requirements.txt
+```
 
+### Database Permissions
+The user running the system must have read permissions for the target database schema.
 
-## Prerequisites
-### Tools
-Before running this project, please install below tools
-* [PostgreSQL](https://www.postgresql.org/)
-* [Python3](https://www.python.org/download/releases/3.0/)
-* [Cplex](https://www.ibm.com/analytics/cplex-optimizer)
-* [Mosek](https://www.mosek.com/downloads/) and the licence is under `./Profile`.
-
-Please do not install `Cplex` dependency, which can only handle a small dataset, but download the `Cplex API` and import that to python with this [instruction](https://www.ibm.com/docs/zh/icos/12.9.0?topic=cplex-setting-up-python-api).
-(We are aware that this link is expired and are working on a substitute solution.)
-
-### Python Dependency
-Here are dependencies used in python programs:
-* `matplotlib`
-* `numpy`
-* `sys`
-* `os`
-* `collections`
-* `configparser`
-* `math`
-* `psycopg2`
-* `pglast`v4.4
-* `argparser`
-
-### Database permission
-The user should have the permission to read the schema of the database to use this system.
-
-## System structure
-TODO
-
-## Demo System
-
-To run the system,  run `main.py`. There are seven parameters
- - `--d`: path to database initialization file;
- - `--q`: path to query file;
- - `--r`: path to private relation file;
- - `--c`: path to the configuration file; 
- - `--o`: path to the output file;
- - `--debug`: debug mode for more information;
- - `--optimal`: choose to use optimal algorithm for SJA queries;
-
-One can use `--h` to get help for parameter instruction.
-
-For more information about input file, users can consult [here](./docs/system-input.md)
-
-For the SQL syntax used in this system, users can consult [here](./docs/query-syntax.md)
-
-Example:
-```
+## Usage / Demo System
+
+The main entry point for the system is `main.py`.
+
+### Command-Line Arguments
+| Parameter | Description |
+| :--- | :--- |
+| `--d` | Path to the database initialization file |
+| `--q` | Path to the query file |
+| `--r` | Path to the private relation file |
+| `--c` | Path to the configuration file |
+| `--o` | Path to the output file |
+| `--debug` | Enable debug mode for more detailed logging |
+| `--optimal` | Use the optimal algorithm for SJA queries |
+
+*Use `python main.py --h` to view complete help instructions.*
+
+**Documentation Links:**
+*   [Input File Configuration](./docs/system-input.md)
+*   [Supported SQL Syntax](./docs/query-syntax.md)
+
+**Example Run:**
+```bash
 python main.py --d ./config/database.ini --q ./test.txt --r ./test_relation.txt --c ./config/parameter.config --o out.txt
 ```
 
-## collect result
+## Collecting Results
 
-1. install the dependency
+Follow these steps to set up the data and collect experiment results:
 
-2. create an empty database in `PosgreSQL`
-3. generate `tbl` data files by using dbgen from [TPCH website](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp)
-and store them in `/Sample/data/TPCH`
-4. run script we provide in `/Sample/setupDBTPCH.py`
-``` 
-python setupDBTPCH.py --db databasename
-```
-5. run script we provide in `/Sample/collectResult.py`
-```commandline
-python collectResult.py
-```
-6. find the result in `/Sample/result/TPCH`
+1. **Install Dependencies**: Install the tools and Python packages listed under [Prerequisites](#prerequisites).
+2. **Database Setup**: Create an empty database in PostgreSQL.
+3. **Data Generation**: Generate `.tbl` data files using `dbgen` from the [TPC-H website](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp), and store them in `./Sample/data/TPCH`.
+4. **Database Initialization**: Run the setup script provided in `./Sample/setupDBTPCH.py`:
+   ```bash
+   python Sample/setupDBTPCH.py --db <databasename>
+   ```
+5. **Run Collection Script**:
+   ```bash
+   cd Sample
+   python collectResult.py
+   ```
+6. **View Results**: The output will be available in `./Sample/result/TPCH`.
+
+## Query Rewriting & Subquery Unnesting
+
+DPSQL automatically rewrites nested subqueries into standard relational joins so that its differential privacy mechanisms can be applied. A custom Abstract Syntax Tree (AST) visitor, `UnnestSubqueries` in `src/parser.py`, is built on `pglast`; it traverses the AST and flattens `IN`, `ANY`, and `EXISTS` subqueries in the `WHERE` clause into multi-table joins while preserving the original filtering conditions.
 
-## Future Plan
+## Future Plans
 
-- Distinct count queries type (projection);
-- User Interface
-- Better user experience;
-- Optimization;
+* Support for distinct count queries (projection).
+* Develop a User Interface (UI).
+* Improve overall user experience.
+* General performance optimization.
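
The "Query Rewriting & Subquery Unnesting" section added in this diff can be illustrated with a small, self-contained sketch. It does not use `pglast` or the project's `UnnestSubqueries` visitor. It only checks, on a hypothetical toy schema in an in-memory SQLite database, that an `IN` subquery and its flattened join form return the same count. The table names, column names, and data are assumptions made for illustration. The equivalence shown relies on `c_custkey` being unique, so the join cannot duplicate `orders` rows.

```python
import sqlite3

# Hypothetical toy schema and data; not taken from the DPSQL repository.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_nation TEXT);
    CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
    INSERT INTO customer VALUES (1, 'FRANCE'), (2, 'GERMANY'), (3, 'FRANCE');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Nested form: the filter on customer sits inside an IN subquery.
nested = """
    SELECT COUNT(*) FROM orders
    WHERE o_custkey IN (SELECT c_custkey FROM customer WHERE c_nation = 'FRANCE')
"""

# Flattened form: the subquery becomes a join, and its filter is kept
# as an ordinary WHERE condition next to the join predicate.
flattened = """
    SELECT COUNT(*) FROM orders, customer
    WHERE o_custkey = c_custkey AND c_nation = 'FRANCE'
"""

nested_count = conn.execute(nested).fetchone()[0]
flattened_count = conn.execute(flattened).fetchone()[0]
print(nested_count, flattened_count)
```

If the subquery column were not unique, a plain join could count some rows more than once, so a real rewriter would need to account for that case. How DPSQL handles it is not stated in this diff.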