GitHub - hkustDB/DuckDBYanPlus

Yannakakis⁺

This repository contains the implementation of Yannakakis⁺, built on top of DuckDB v1.3.0. It provides a customized version of DuckDB. Compared to the original Yannakakis⁺, this version introduces the following key improvements:

Replaces the semi-joins of the original Yannakakis algorithm with Bloom filters.
Applies aggregation push-down in the query plan.
Uses the GYO algorithm when the query is acyclic, and falls back to the original DuckDB plan only when the query is cyclic.
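
To make the first and third points more concrete, here is a minimal Python sketch. It is not taken from this repository's code. The names `BloomFilter`, `bloom_semi_join`, and `is_acyclic`, the filter size, and the dictionary-based row and schema formats are all assumptions made for illustration. The first part shows how a Bloom filter can stand in for a semi-join: it keeps every matching row but may also let a few non-matching rows through. The second part shows a textbook GYO reduction, which repeatedly removes attributes that occur in only one relation and relations contained in another; the query is acyclic when at most one relation remains.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        data = repr(key).encode()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(data, salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_semi_join(probe_rows, build_rows, key):
    """Approximate `probe SEMI JOIN build ON key` (hypothetical helper).

    The result is a superset of the exact semi-join result.
    """
    bf = BloomFilter()
    for row in build_rows:
        bf.add(row[key])
    return [row for row in probe_rows if bf.might_contain(row[key])]


def is_acyclic(relations):
    """GYO reduction on {relation name: set of join attributes}."""
    edges = {name: set(attrs) for name, attrs in relations.items()}
    changed = True
    while changed and len(edges) > 1:
        changed = False
        # Rule 1: drop attributes that appear in exactly one relation.
        counts = {}
        for attrs in edges.values():
            for a in attrs:
                counts[a] = counts.get(a, 0) + 1
        for attrs in edges.values():
            isolated = {a for a in attrs if counts[a] == 1}
            if isolated:
                attrs -= isolated
                changed = True
        # Rule 2: drop one relation whose attributes are contained in another.
        for name, attrs in list(edges.items()):
            if any(attrs <= other for o, other in edges.items() if o != name):
                del edges[name]
                changed = True
                break
    return len(edges) <= 1


if __name__ == "__main__":
    path = {"R": {"a", "b"}, "S": {"b", "c"}, "T": {"c", "d"}}
    triangle = {"R": {"a", "b"}, "S": {"b", "c"}, "T": {"a", "c"}}
    print(is_acyclic(path))      # True
    print(is_acyclic(triangle))  # False
```

In this sketch a path-shaped join is reported as acyclic and a triangle as cyclic, which mirrors the case split described above: the acyclic query would take the Yannakakis⁺ path, while the cyclic one would fall back to the original DuckDB plan.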

Build

You can build this repository in the same way as the original DuckDB. A Makefile wraps the build process. For available build targets and configuration flags, see the DuckDB Build Configuration Guide.

make                   # Build optimized release version
make release           # Same as 'make'
make debug             # Build with debug symbols
GEN=ninja make         # Use Ninja as backend
BUILD_BENCHMARK=1 make # Build with benchmark support

Baselines

DuckDB v1.3.0: https://github.com/duckdb/duckdb/tree/v1.3-ossivalis
RPT (Robust Predicate Transfer): https://github.com/embryo-labs/Robust-Predicate-Transfer
Parachute: https://github.com/utndatasystems/parachute
SYA: https://github.com/UHasselt-DSI-Data-Systems-Lab/code-reproducability-yannakakis-vldb2025
Yannakakis⁺ (rewrite): https://github.com/hkustDB/Quorion

Benchmark

Sub-Graph Pattern Benchmark (SGPB)
LSQB
TPC-H & Decision Support Benchmark (DSB)
Join Order Benchmark (JOB)

The original DuckDB README follows.

DuckDB

DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs, maps), and several extensions designed to make SQL easier to use.

DuckDB is available as a standalone CLI application and has clients for Python, R, Java, Wasm, etc., with deep integrations with packages such as pandas and dplyr.

For more information on using DuckDB, please refer to the DuckDB documentation.

Installation

If you want to install DuckDB, please see our installation page for instructions.

Data Import

For CSV files and Parquet files, data import is as simple as referencing the file in the FROM clause:

SELECT * FROM 'myfile.csv';
SELECT * FROM 'myfile.parquet';

Refer to our Data Import section for more information.

SQL Reference

The documentation contains a SQL introduction and reference.

Development

For development, DuckDB requires CMake, Python 3, and a C++11-compliant compiler. Run make in the root directory to compile the sources, or make debug to build a non-optimized debug version. After making changes, run make unit and make allunit to verify that your version works properly. To test performance, run BUILD_BENCHMARK=1 BUILD_TPCH=1 make and then execute ./build/release/benchmark/benchmark_runner from the root directory to run several standard benchmarks. The benchmarks are described in our Benchmark Guide.

Please also refer to our Build Guide and Contribution Guide.

Support

See the Support Options page.
