DPSQL is a system designed for answering SQL queries while satisfying differential privacy guarantees.
The file and directory structure of the project is organized as follows:
project
├── config/ # Configuration files required for the system
├── docs/ # Reference information and documentation
├── Profile/ # Profile information/licenses (e.g., mosek.lic)
├── src/ # Main source code files
│ └── algorithm/ # Core algorithms integrated into the system (e.g., FastSJA, OptSJA)
├── Test/ # Queries used in system experiments (TPCH, Graph)
└── Sample/ # Scripts for database setup and collecting experiment results
- PostgreSQL: Database engine.
- Python3: Ensure version 3.0 or higher.
- Mosek: License file must be placed in
./Profile. - CPLEX (Full Edition): Required for large datasets. Note: Do not rely on
pip install cplexalone, as it has a 1,000-variable limit.
Install the required Python packages using the provided requirements.txt file:
pip install -r requirements.txtThe user running the system must have read permissions for the target database schema.
The main entry point for the system is main.py.
| Parameter | Description |
|---|---|
--d |
Path to the database initialization file |
--q |
Path to the query file |
--r |
Path to the private relation file |
--c |
Path to the configuration file |
--o |
Path to the output file |
--debug |
Enable debug mode for more detailed logging |
--optimal |
Use the optimal algorithm for SJA queries |
Use python main.py --h to view complete help instructions.
Documentation Links:
Example Run:
python main.py --d ./config/database.ini --q ./test.txt --r ./test_relation.txt --c ./config/parameter.config --o out.txtFollow these steps to set up the data and collect experiment results:
- Install Dependencies: Ensure tools and Python requirements are installed as per the Prerequisites.
- Database Setup: Create an empty database in PostgreSQL.
- Data Generation: Generate
.tbldata files usingdbgenfrom the TPC-H website, and store them in./Sample/data/TPCH. - Database Initialization: Run the setup script provided in
./Sample/setupDBTPCH.py:python Sample/setupDBTPCH.py --db <databasename>
- Run Collection Script:
cd Sample python collectResult.py - View Results: The output will be available in
./Sample/result/TPCH.
DPSQL automatically rewrites and unnests subqueries to standard relational joins to ensure differential privacy mechanisms can be seamlessly applied. Through a custom Abstract Syntax Tree (AST) visitor (UnnestSubqueries in src/parser.py) built using pglast, the system traverses the AST and flattens nested IN, ANY, and EXISTS subqueries found in the WHERE clause into standard multi-table joins, while automatically preserving and linking the original filtering conditions.
- Support for distinct count queries (projection).
- Develop a User Interface (UI).
- Improve overall user experience.
- General performance optimization.