197 commits
dc27cd1
first commit
harrygav Mar 9, 2023
2256f85
first commit
harrygav Mar 9, 2023
65cd306
started implementing compression
harrygav Mar 9, 2023
7ed1704
started implementing decompression
harrygav Mar 9, 2023
871f268
Docker compiled with postgres and xdbc
Hyrikan Mar 27, 2023
76e8aa1
Dockerfile, libabsl-dev not found for now
Hyrikan Mar 27, 2023
03f5769
compilation not successful
Hyrikan Mar 27, 2023
58027cf
Adjusted Ubuntu to 22.04.
Hyrikan Mar 27, 2023
1cbf31a
compilation error
Hyrikan Mar 27, 2023
68c003d
removed postrges, moved docker stuff into separate directory
Hyrikan Apr 11, 2023
da5f96e
added first compression methods for lzo and lz4
Hyrikan Apr 25, 2023
20a3037
temporarily remove compresslibs to allow building
harrygav Apr 26, 2023
aa400c7
add compression libs to dockerfile
harrygav Apr 26, 2023
4003ffb
added 4 compression libs
harrygav Apr 27, 2023
95051f4
added 4 (de)compression libs
harrygav Apr 27, 2023
c035435
minor: made uncompressed data default
harrygav Apr 27, 2023
e4cb575
minor
harrygav Apr 27, 2023
9b29781
minor
harrygav Apr 27, 2023
9299beb
adapted docker to build xdbc-server
harrygav May 3, 2023
3cdbc57
changed to pg conn info hostname
harrygav May 4, 2023
f85f9b0
changed pg conn info & added docker stuff
harrygav May 4, 2023
0dea101
Added CLI parameters
Hyrikan May 10, 2023
e66c2d9
pgserver params in constructor. still not swapped constants with params!
Hyrikan May 10, 2023
2613f1f
add brotli in Dockerfile & compression plot file
harrygav May 11, 2023
3584764
adjusted input params & code formatting
harrygav May 11, 2023
a11a0e8
added script for docker development
harrygav May 11, 2023
3286a51
adjusted server to send compression type in header
harrygav May 12, 2023
c283305
adjusted client to receive compression type in header
harrygav May 12, 2023
de7a5e9
add brotli dependency in dockerfile
harrygav May 12, 2023
a55a568
adjusted experiments script
harrygav May 12, 2023
6287904
adjusted experiment scripts
harrygav May 12, 2023
bc8337f
address hasUnread bug
harrygav May 12, 2023
1ce1f0e
add readme for experiments
harrygav May 12, 2023
e543168
WIP: refactoring client/server
harrygav May 15, 2023
0af3a26
update gitignore
harrygav May 23, 2023
fba1667
refactor
harrygav May 23, 2023
26cf611
refactor
harrygav May 24, 2023
a231b18
Started support for interchangable col/row formats
harrygav May 25, 2023
62fe94c
Support row/col formats
harrygav May 25, 2023
55e5542
minor refactoring
harrygav May 25, 2023
e4f7194
PGReader: avoid temp copy during string deserialization
harrygav May 25, 2023
47851cd
Compression: minor refactoring
harrygav May 25, 2023
8e808fd
Compressor: add support for zlib
harrygav May 26, 2023
7a175c3
add support for zlib
harrygav May 26, 2023
4d5647e
Refactored decompression
harrygav May 26, 2023
a033412
Add dynamic environment variables
harrygav May 26, 2023
144796e
introduce parallelism
harrygav May 28, 2023
657ead1
separate network/read parallelism
harrygav May 28, 2023
5b031ba
Compression: refactored implementation
harrygav May 28, 2023
bdeb810
add clickhouse support
harrygav Jun 9, 2023
aa66363
add table to env
harrygav Jun 9, 2023
818d808
wip dynamic schema
harrygav Jun 9, 2023
fd954d6
refine receive termination
harrygav Jun 10, 2023
492bd5e
Bugfix for remaining tuples
harrygav Jun 12, 2023
54d4694
Added zfp and fastpfor dependencies to docker and cmake
harrygav Jul 11, 2023
a440b83
change local clickouse image in docker compose
harrygav Jul 11, 2023
7125a8f
added support for dynamic schema and more compression libs (zfp,fastp…
harrygav Jul 11, 2023
f872fde
Added zfp and fastpfor dependencies to docker and cmake
harrygav Jul 11, 2023
09360ef
added support for dynamic schema and more compression libs (zfp,fastp…
harrygav Jul 11, 2023
65ffd51
fix in hasunsent()
harrygav Jul 12, 2023
49e27e6
PGReader: decouple read partitions from threads
harrygav Jul 12, 2023
7eea48c
bugfix with transfer status
harrygav Jul 12, 2023
26f20f7
WIP work on partition/parallelism
harrygav Jul 14, 2023
1ff7380
refactored read parallelism & intermediate queue
harrygav Jul 22, 2023
ad34ff1
performance refactoring
harrygav Jul 23, 2023
2074780
Refactoring: replace flagBuffer with queues
harrygav Aug 4, 2023
8c2841d
added CSV Source
harrygav Aug 4, 2023
74855ad
add ports and shared memory size in docker-compose
harrygav Sep 21, 2023
958e802
refactor xclient: Decompressor, utils, added queue for better paralle…
harrygav Sep 21, 2023
3a0f734
added mode in experiment script, and helper scripts for the csv sourc…
harrygav Sep 21, 2023
f6c22a1
refactored xclient, introduced decomp,net,read parallelisms, tester c…
harrygav Sep 25, 2023
b02a206
add experiment scheduler
harrygav Oct 9, 2023
3c41cc7
update gitignore with pycache folder
harrygav Oct 10, 2023
2da159d
modified parallelism in xclient & Tester
harrygav Oct 10, 2023
02ab519
changes in build/run scripts & environment
harrygav Oct 10, 2023
d292d23
latest status
harrygav Oct 27, 2023
3eea44e
inserted uptodate clickhouse image
Hyrikan Nov 10, 2023
9087cdf
introduced compression parallelism #1
harrygav Nov 10, 2023
a1b053d
updated gitignore
harrygav Nov 10, 2023
ee62f04
tidied up scripts & configs
harrygav Nov 10, 2023
2a64d04
added server compression parallelism to experiment scheduler
harrygav Nov 10, 2023
4d5f126
Merge pull request #2 from polydbms/pg_parallelism
harrygav Nov 10, 2023
96f614d
added various plotting files
harrygav Nov 10, 2023
6c48c41
updated requirements file for experiment scheduler
harrygav Nov 10, 2023
a0c7216
added plotting script for xdbcclient log.
Hyrikan Nov 15, 2023
3ff2236
Added runtime monitoring variables to RuntimeEnv & added runtime logg…
harrygav Nov 15, 2023
a8fe830
added transfer id to RuntimeEnv
harrygav Nov 15, 2023
23bef36
added script to measure cpu utilization & adapted run experiments scr…
harrygav Nov 15, 2023
bb7a1c7
Implemented timing for different components in client like in server
Hyrikan Nov 21, 2023
9c97aac
added first plots of eda
Hyrikan Nov 23, 2023
9ed8d28
undid .idea push
Hyrikan Nov 23, 2023
b2fd94c
Moved Joel script into plots folder...
Hyrikan Nov 23, 2023
82432e8
made a new notebook for more visualization.
Hyrikan Nov 28, 2023
1444c70
fixed order of xdbcclient wait times in csv.
Hyrikan Dec 2, 2023
58ab6a3
added little plotting of wait timings of newest run.
Hyrikan Dec 2, 2023
358504c
added directory for experiment findings and experiment case ideas.
Hyrikan Dec 2, 2023
62ae172
added directory for experiment findings and experiment case ideas.
Hyrikan Dec 3, 2023
f6f4fae
changed client read to write, changed client write to different files…
harrygav Dec 4, 2023
a3eabcc
minor fixes
harrygav Jan 28, 2024
e027917
modified/added helper scripts
harrygav Jan 28, 2024
9cc4f08
updated gitignore
harrygav Jan 28, 2024
110ece7
updated docker config
harrygav Jan 28, 2024
f405726
optimizations for csv writing
harrygav Jan 28, 2024
7bd0452
minor/formatting
harrygav Jan 28, 2024
580265e
WIP
harrygav Feb 14, 2024
37f5ebc
WIP
harrygav Feb 14, 2024
152b249
fixed server
harrygav Feb 29, 2024
12373e1
added fix for processing all buffers by introducing feedback loop bet…
harrygav Mar 1, 2024
e478356
extended support for dynamic schema
harrygav Apr 13, 2024
cc46013
extended support for dynamic schema
harrygav Apr 13, 2024
4e00f0c
compute tuple size based on schema & refactored schema
harrygav Apr 18, 2024
086e2f1
compute tuple size based on schema & shutdown sockets correctly
harrygav Apr 18, 2024
5ad7d95
add release flag to docker build
harrygav Apr 26, 2024
5d1d5c8
Fixed socket sync issue & adjusted max attributes
harrygav Apr 26, 2024
7bdd029
added cmake release flag, adjusted container shm size, fixed ssh conn…
harrygav Apr 26, 2024
22d0686
fixed socket sync issues, adjusted max attributes, slightly refactore…
harrygav Apr 26, 2024
8b7a9c7
changed buffer size to kb instead of #tuples, introduced char type
harrygav Apr 26, 2024
68c6a99
changed buffer size to kb instead of #tuples, introduced char type
harrygav Apr 26, 2024
2e3976f
added string support, now receiving schema from client
harrygav Apr 29, 2024
f6f0d97
added string support, now reading schema from json
harrygav Apr 29, 2024
e8b3151
Adjusted bufferpool size to memory instead of number of buffers & adj…
harrygav May 10, 2024
dd05be7
adjusted bufferpool size to memory, send tablename to server, fixed &…
harrygav May 10, 2024
d69fbbd
update cmake
harrygav Jun 24, 2024
85a939f
update dockerfile
harrygav Jun 24, 2024
aaa4b99
update experiments
harrygav Jun 26, 2024
b3decb6
added profiling code & microoptimizations
harrygav Jul 21, 2024
f965429
Profiling & performance improvements
harrygav Jul 27, 2024
9fe1018
refactoring profiling info with timestamps & small perf improvements
harrygav Aug 3, 2024
c3dc036
minor
harrygav Aug 21, 2024
be3fc4f
Added profiling info & optimized CSV Writer
harrygav Jul 19, 2024
dc32479
Fixed issue with libxdbc not found by adding to ld_library_path
harrygav Aug 1, 2024
8893733
refactoring profiling info with timestamps & small perf improvements
harrygav Aug 3, 2024
5929da2
refactored for new docker/compose
harrygav Aug 3, 2024
f9d2f76
minor
harrygav Aug 21, 2024
c4091bc
minor config changes
harrygav Oct 5, 2024
ac70c1d
adjust Dockerfile
harrygav Oct 5, 2024
bcb2030
XCLIENT logger only gets created once and fetched if already exists. …
Hyrikan Oct 2, 2024
b44cf05
added first version of optimizer
harrygav Oct 6, 2024
0c4f183
tidying up
harrygav Oct 18, 2024
060114f
minor
harrygav Oct 18, 2024
d6e1dcf
minor
harrygav Oct 18, 2024
23a1bb9
minor fixes
harrygav Oct 21, 2024
e0c2428
updated readme with build instructions
harrygav Oct 21, 2024
b31f758
minor fix in README
harrygav Oct 31, 2024
d01e3ac
minor fix in README
harrygav Oct 31, 2024
d9a8425
add optimizer to image
harrygav Nov 4, 2024
7bd0b1d
Include only changes of script files for PR to main
midhun-TUB Dec 2, 2024
25b2bdf
unified buffer queue & reroute empty buffers through sender instead o…
harrygav Dec 13, 2024
2aa6913
initial memory management improvement: introduced freeBufferQueue as …
harrygav Dec 13, 2024
eb9fe1c
removed copying in readers
harrygav Dec 23, 2024
9a64d1a
Added Parquet Source
harrygav Dec 28, 2024
56388a7
adjusted profiling timestamps
harrygav Dec 29, 2024
d3ac5a2
decompress buffers without copying
harrygav Dec 21, 2024
1aa71d2
adjusted queue capacities
harrygav Dec 29, 2024
213edc1
adjusted profiling timestamps
harrygav Dec 29, 2024
0633991
added checks for required buffers
harrygav Dec 31, 2024
17b8f4a
adjusted profiling interval
harrygav Dec 31, 2024
f1d62f1
introduced skip deserializer option, added header in all buffer stage…
harrygav Jan 2, 2025
a701266
removed some comments
harrygav Jan 2, 2025
0ef8164
adjusted CSV & PQ sources
harrygav Jan 3, 2025
aea3cbc
restructure deserializers, add Arrow intermediate format
harrygav Jan 5, 2025
6682412
rename free queue load in csv metrics header
harrygav Jan 6, 2025
de8daf5
simplified compressor
harrygav Jan 6, 2025
11542a4
fixed available buffers
harrygav Jan 9, 2025
d6d589c
minor
harrygav Jan 9, 2025
079a8d2
fix PGReader
harrygav Jan 11, 2025
204b844
removed activeReadThreads as not needed anymore
harrygav Jan 11, 2025
70ddaf2
fixed buffersize for columnar format
harrygav Jan 15, 2025
946d061
added uncompressed size
harrygav Jan 28, 2025
c3e55b4
minor here and there
harrygav Jan 28, 2025
89bb0c3
WIP introduce csv sink, and adjusted queue capacity requirements
harrygav Dec 31, 2024
3e205a0
introduced skip serializer option, header in all buffer stages, minor…
harrygav Jan 2, 2025
98005ae
minor fix
harrygav Jan 2, 2025
bf28044
added PQ Sink, Arrow intermediate format, restructured Sink interface
harrygav Jan 5, 2025
9569634
adjust logging
harrygav Jan 5, 2025
27bc7d3
added tid variable to env
harrygav Jan 6, 2025
8ca8c7e
simplify receiver & decompressor
harrygav Jan 6, 2025
fe899fd
cleaned up Dockerfile
harrygav Jan 6, 2025
d5bd529
minor
harrygav Jan 9, 2025
7a36c8d
Changed queue capacity assignment & introduced profiling interval param
harrygav Jan 15, 2025
a7d8a0a
pin arrow/parquet versions
harrygav Jan 28, 2025
9322835
Allow changing host/port
harrygav Jan 28, 2025
91f195b
add parquet/arrow intermediate
harrygav Jan 28, 2025
3d6a8eb
added uncompressed size
harrygav Jan 28, 2025
e1b5cc7
added postgres import statements
harrygav Feb 12, 2025
97a32e0
update experiment runner
harrygav Feb 13, 2025
980ad50
update optimizer
harrygav Feb 25, 2025
d9bd7fc
Client version used in paper with relevant container spawned
midhun-TUB Aug 22, 2025
af8f9a6
Initial commit
midhun-TUB Feb 27, 2026
51c589a
client: merge xdbc-client test/reproduce history under client/
midhun-TUB Feb 27, 2026
bd892ce
server: merge xdbc-server test/reproduce history under server/
midhun-TUB Feb 27, 2026
6609bb3
Prepared docker for unified XDBC. Verified with ss13husllm
midhun-TUB Feb 27, 2026
1b27a38
Prepare framework for unit test
midhun-TUB Feb 27, 2026
ec290a5
Add codes for unit tests
midhun-TUB Feb 27, 2026
c10405f
Minor changes to fix docker after adding unit test framework. Verifie…
midhun-TUB Feb 27, 2026
eb6cf50
Prepare for Pull Request
midhun-TUB Mar 18, 2026
9a85d51
Restore monorepo README as final version
midhun-TUB Mar 18, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
build/
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
"makefile.configureOnOpen": false
}
41 changes: 41 additions & 0 deletions CMakeLists.txt
@@ -0,0 +1,41 @@
cmake_minimum_required(VERSION 3.21)

project(xdbc VERSION 0.1 DESCRIPTION "xdbc load")

set(CMAKE_VERBOSE_MAKEFILE ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread -g")
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_FLAGS_RELEASE "-O3")

set(THREADS_PREFER_PTHREAD_FLAG ON)
find_package(Threads REQUIRED)
find_package(spdlog REQUIRED)
find_package(ZLIB REQUIRED)
find_package(FastPFOR REQUIRED)
find_package(FPZIP REQUIRED)
find_library(FPZIP_LIBRARY NAMES fpzip)

#TODO: fix hardcoded paths
set(ZSTD_LIBRARY_PATH "/usr/lib/x86_64-linux-gnu/libzstd.so")
set(SNAPPY_LIBRARY_PATH "/usr/lib/x86_64-linux-gnu/libsnappy.so")

include_directories("/zfp/include")
link_directories(/zfp/lib)

include(GNUInstallDirs)

include(FetchContent)
FetchContent_Declare(
googletest
URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip
)
# For Windows: Prevent overriding the parent project's compiler/linker settings
set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(googletest)

enable_testing()

add_subdirectory(client)
add_subdirectory(server)
69 changes: 69 additions & 0 deletions Dockerfile
@@ -0,0 +1,69 @@
# postgres server 14 on ubuntu 22.04 image
FROM ubuntu:jammy

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update
RUN apt-get upgrade -qy

#-------------------------------------------- Install XDBC and prerequisites -------------------------------------------
# install arrow/parquet dependencies

RUN apt install -qy ca-certificates lsb-release wget pip

RUN wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb

RUN apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb

RUN apt update && apt install -qy cmake git gdb nlohmann-json3-dev clang libboost-all-dev build-essential libspdlog-dev iproute2 netcat libarrow-dev=18.1.0-1 libparquet-dev=18.1.0-1 libthrift-dev pkg-config

# install compression libs

RUN apt install -qy libzstd-dev liblzo2-dev liblz4-dev libsnappy-dev libbrotli-dev

RUN git clone https://github.com/lemire/FastPFor.git && cd FastPFor && git checkout 039134b && \
mkdir build && \
cd build && \
cmake .. && \
cmake --build . --config Release && \
make install

RUN ln -s /FastPFor /fastpfor

RUN git clone https://github.com/LLNL/fpzip.git && cd fpzip && \
mkdir build && \
cd build && \
cmake .. && \
cmake --build . --config Release && \
make install


# install postgres dependencies
RUN apt install -qy libpq-dev libpqxx-dev

# install clickhouse dependencies
RUN apt-get install -y libabsl-dev
RUN git clone https://github.com/google/cityhash.git
RUN cd /cityhash && ./configure && make all check CXXFLAGS="-g -O3" && make install

# install clickhouse-lib
RUN git clone https://github.com/ClickHouse/clickhouse-cpp.git
RUN cd /clickhouse-cpp && rm -rf build && mkdir build && cd build && cmake .. -DWITH_SYSTEM_ABSEIL=ON && make -j8 && make install

# install webserver for http experiments
RUN pip install rangehttpserver

#------------------------------------------------------------------------

# Copy the entire project context
RUN mkdir /xdbc
COPY . /xdbc/

# build xdbc
RUN rm -rf /xdbc/build && mkdir -p /xdbc/build && cd /xdbc/build && cmake .. -D CMAKE_BUILD_TYPE=Release && make -j8 && make install

RUN rm -rf /xdbc/build/client/Sinks/build && mkdir -p /xdbc/build/client/Sinks/build && cd /xdbc/build/client/Sinks/build && cmake /xdbc/client/Sinks -D CMAKE_BUILD_TYPE=Release && make -j8

RUN ldconfig

ENTRYPOINT ["tail", "-f", "/dev/null"]
2 changes: 2 additions & 0 deletions Makefile
@@ -0,0 +1,2 @@
build_XDBC:
	docker build -f Dockerfile -t xdbc-unified:latest .
56 changes: 44 additions & 12 deletions README.md
@@ -1,17 +1,49 @@
# XDBC
# XDBC Monorepo : `git filter-repo`

- [XDBC](https://dl.acm.org/doi/10.1145/3725294) is a holistic, high-performance framework for fast and scalable data transfers across heterogeneous data systems (e.g. DBMS to dataframes) aiming to combine the generality of generic solutions with performance of specialized connectors
- It decomposes data transfer into a configurable pipeline (read -> deserialize -> compress -> send/receive -> decompress -> serialize -> write) with pipeline-parallel execution and ring-buffer memory manager for low resource overhead.
- The core of the framework (xdbc-client and xdbc-server) are written in C++ with bindings available for Python and Spark. It includes built-in adapters to connect to PostgreSQL, CSV, Parquet and Pandas.
- The project includes a lightweight heuristic optimizer implemented in Python that automatically tunes the parallelism, buffer sizes, intermediate formats and compression algorithms to the current environment.
This monorepo merges `xdbc-client` and `xdbc-server` (branch: `test/reproduce`) using **`git filter-repo --to-subdirectory-filter`**.

## Structure

## Project Structure
```
XDBC-filter/
  client/   ← xdbc-client source
  server/   ← xdbc-server source
```
To view the history of the client (or server), pass a path to `git log`:
```bash
git log -- client/xdbc/xclient.cpp
```

XDBC consists of multiple repositories covering the cross-system functionality. For the reproducibility experiments the following repositories will be cloned and used :
## Running the Project

- [`xdbc-client`](https://github.com/polydbms/xdbc-client) Client-side module, for loading data into the target system.
- [`xdbc-server`](https://github.com/polydbms/xdbc-server) Server-side module, for extracting the data from the source system.
- [`xdbc-python`](https://github.com/polydbms/xdbc-python) Python bindings for loading data into Pandas (through pybind).
- [`xdbc-spark`](https://github.com/polydbms/xdbc-spark) Spark bindings, for loading data into a Spark RDD (through a custom DataSource with JNI).
- [`pg_xdbc_fdw`](https://github.com/polydbms/pg_xdbc_fdw) PostgreSQL Foreign Data Wrapper, for loading data into a table.
To build the combined image and start both the **client** and **server** containers:

```bash
# 1. Build the unified image
make

# 2. Start the infrastructure
docker compose up -d
```

This will create two containers (`xdbcserver` and `xdbcclient`) using the same `xdbc-unified:latest` image, mapping their shared `/dev/shm` volumes correctly.

Before running, download and extract the required dataset to `/dev/shm`:

```bash
# Download the ss13husallm dataset (~250 MB compressed, ~1.2 GB extracted)
wget -O ss13husallm.csv.tar.gz "https://tubcloud.tu-berlin.de/s/M3aeptL8R5ekWSD/download?path=%2F&files=ss13husallm.csv.tar.gz"

# Extract to /dev/shm (shared memory, accessible inside containers)
tar --overwrite -xzf ss13husallm.csv.tar.gz -C /dev/shm
```

You can then run commands inside them:

```bash
# Start the server
docker exec -it xdbcserver bash -c "./xdbc-server/build/xdbc-server"

# Run a client command
docker exec -it xdbcclient bash -c "/xdbc-client/Sinks/build/xdbcsinks --server-host=xdbcserver --table ss13husallm -f1 -b 1024 -p 32000 -n1 -w1 -d1 -s1 --skip-serializer=0 --target=csv"
```
125 changes: 125 additions & 0 deletions README_development.md
@@ -0,0 +1,125 @@
# XDBC Development Guide: Testing Framework

This document outlines the testing strategies used in the XDBC monorepo. It explains the differences between our end-to-end (E2E) Docker-based testing and our component-level Unit Testing framework, and how to use both effectively during development.

---

## 1. Unit Testing

### Framework Used
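We use [Google Test (gtest)](https://google.github.io/googletest/). CMake downloads and configures it automatically (via `FetchContent` in the root `CMakeLists.txt`) when you configure the project on the host machine, so you do not need to install GoogleTest yourself. You still need a C++ compiler, CMake, and the libraries that the root `CMakeLists.txt` marks as required, such as spdlog, zlib, FastPFOR and fpzip.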

### Test Executables

#### Server Tests (`server/tests/`)

| Executable | Source File | What It Tests |
|---|---|---|
| `xdbc-server-tests` | `test_queue.cpp` | customQueue: FIFO ordering, capacity blocking, concurrency (multi-producer/multi-consumer), sentinel termination pattern |
| `xdbc-compressor-tests` | `test_compressor.cpp` | All 5 compression algorithms (zstd, snappy, lzo, lz4, zlib): compress, round-trip integrity, edge cases, integer column data |
| `xdbc-deserializer-tests` | `test_deserializers.cpp` | CSV deserializer templates: int, double, char, fixed-size string parsing, full CSV row deserialization |
| `xdbc-datasource-tests` | `test_datasource.cpp` | DataSource utilities: JSON schema parsing (`createSchemaFromJsonString`), `getSchemaSize`, `createSchemaAttribute` |
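
As an illustration of what the deserializer and schema-size tests exercise, here is a minimal sketch; it is not the project's code. The names `AttrType`, `Attribute`, `attribute_size`, `schema_size` and `deserialize_row` are hypothetical, and the byte widths (4-byte INT, 8-byte DOUBLE, 1-byte CHAR, fixed-width STRING) are assumptions made for the example. It parses one unquoted CSV row into a packed, fixed-width tuple whose width is derived from the schema, which is the kind of behavior a unit test can assert byte by byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical schema model; the real XDBC schema types may look different.
enum class AttrType { Int, Double, Char, String };

struct Attribute {
    std::string name;
    AttrType type;
    std::size_t string_size;  // only used for AttrType::String
};

// Assumed byte widths: 4-byte INT, 8-byte DOUBLE, 1-byte CHAR, fixed-width STRING.
inline std::size_t attribute_size(const Attribute& attr) {
    switch (attr.type) {
        case AttrType::Int:    return sizeof(std::int32_t);
        case AttrType::Double: return sizeof(double);
        case AttrType::Char:   return 1;
        case AttrType::String: return attr.string_size;
    }
    return 0;  // unreachable for valid enum values
}

// Width of one packed tuple: the sum of all attribute widths.
inline std::size_t schema_size(const std::vector<Attribute>& schema) {
    std::size_t total = 0;
    for (const auto& attr : schema) total += attribute_size(attr);
    return total;
}

// Parses one comma-separated row (no quoting, no escaping, no empty last
// field) into a packed tuple laid out in schema order.
inline std::vector<unsigned char> deserialize_row(const std::string& row,
                                                  const std::vector<Attribute>& schema) {
    std::vector<unsigned char> tuple(schema_size(schema), 0);
    std::istringstream fields(row);
    std::string field;
    std::size_t offset = 0;
    for (const auto& attr : schema) {
        if (!std::getline(fields, field, ',')) {
            throw std::runtime_error("row has fewer fields than the schema");
        }
        switch (attr.type) {
            case AttrType::Int: {
                const auto value = static_cast<std::int32_t>(std::stol(field));
                std::memcpy(tuple.data() + offset, &value, sizeof value);
                break;
            }
            case AttrType::Double: {
                const double value = std::stod(field);
                std::memcpy(tuple.data() + offset, &value, sizeof value);
                break;
            }
            case AttrType::Char:
                tuple[offset] = field.empty() ? 0 : static_cast<unsigned char>(field[0]);
                break;
            case AttrType::String: {
                // Fixed-size slot: long values are truncated, short ones stay zero-padded.
                const std::size_t n =
                    field.size() < attr.string_size ? field.size() : attr.string_size;
                std::memcpy(tuple.data() + offset, field.data(), n);
                break;
            }
        }
        offset += attribute_size(attr);
    }
    assert(offset == tuple.size());  // every slot of the tuple was visited
    return tuple;
}
```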

#### Client Tests (`client/Sinks/tests/`)

| Executable | Source File | What It Tests |
|---|---|---|
| `xdbc-client-tests` | `test_sink.cpp` | CSVSink `SerializeAttribute` templates (int, double, char, fixed-string), full tuple serialization, `RuntimeEnv.calculateTupleSize`, `RuntimeEnv.toString` |
| `xdbc-decompressor-tests` | `test_decompressor.cpp` | All 5 decompression algorithms, dispatch method routing, server-to-client round-trip integrity for text/integer/double data |
| `xdbc-pqsink-tests` | `test_pqsink.cpp` | `CreateParquetSchema` for all column types (INT, DOUBLE, STRING, CHAR), mixed schemas, error handling for unsupported types |
| `xdbc-utils-tests` | `test_utils.cpp` | `Utils::compute_checksum`, `boolVectorToString`, `boolVecAsStr`, `slStr` |

### How to Run Unit Tests Locally
Run the following on your host machine during active development.

```bash
# 1. Create a build directory
mkdir build && cd build

# 2. Run CMake (this downloads googletest automatically)
cmake ..

# 3. Build all test binaries
make -j$(nproc)

# 4. Run the full test suite via CTest
ctest --output-on-failure
```

To build and run a specific test target:

```bash
# Build only the compressor tests
make xdbc-compressor-tests

# Run it directly
./server/tests/xdbc-compressor-tests

# Or run via ctest with a filter
ctest -R Compressor --output-on-failure
```

### How to Write a New Unit Test

1. Create a `.cpp` file in the appropriate `tests/` directory and use the `TEST()` macro:

```cpp
#include <gtest/gtest.h>
#include "../your_component_header.h"

TEST(YourComponentTest, ExpectedBehaviorName) {
    int result = MyComponent::Add(2, 2);
    EXPECT_EQ(result, 4); // Assertion
}
```

2. Add your new file to the `CMakeLists.txt` inside that `tests/` directory. Either add it to an existing `add_executable` or create a new test target:

```cmake
add_executable(
your-new-tests
test_your_component.cpp
)

target_link_libraries(
your-new-tests
GTest::gtest_main
pthread
# ... any additional libraries your component needs
)

gtest_discover_tests(your-new-tests)
```

### Test Design Notes

- **Compression/Decompression tests** avoid pulling in the full `xdbcserver.h` dependency chain by calling the compression libraries directly with the same logic as `Compressor.cpp`. This keeps tests fast and independent of the server's wider dependency chain.
- **Round-trip tests** simulate the server-compress then client-decompress pipeline to verify data integrity across the network boundary.
- **Concurrency tests** in `customQueue` use multiple threads and verify blocking behavior, FIFO ordering, and the sentinel-based termination pattern used throughout the XDBC pipeline.
- **Serializer tests** validate that the templated `SerializeAttribute` functions produce correct CSV output for each data type.
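
The blocking, FIFO and sentinel behavior described in the concurrency note can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the project's `customQueue`: the class `BoundedQueue`, the marker `kSentinel` and the helpers `drain_until_sentinel` and `run_producers_and_drain` are hypothetical names, and the real interface may differ. Producers block when the queue is full, the consumer blocks when it is empty, and a single sentinel pushed after all producers have finished tells the consumer to stop. A test can then compare the drained sum against the expected total.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded FIFO queue; a stand-in for the project's customQueue.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty; returns items in insertion order.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return item;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

constexpr int kSentinel = -1;  // assumed end-of-stream marker

// Pops until the sentinel arrives and returns the sum of the payload items.
inline long drain_until_sentinel(BoundedQueue<int>& queue) {
    long sum = 0;
    for (int item = queue.pop(); item != kSentinel; item = queue.pop()) {
        sum += item;
    }
    return sum;
}

// Starts `producers` threads that each push 1..items_each into a small queue.
// A closer thread joins them and then pushes one sentinel, while the calling
// thread drains the queue concurrently.
inline long run_producers_and_drain(int producers, int items_each) {
    BoundedQueue<int> queue(4);
    std::vector<std::thread> workers;
    for (int p = 0; p < producers; ++p) {
        workers.emplace_back([&queue, items_each] {
            for (int i = 1; i <= items_each; ++i) queue.push(i);
        });
    }
    std::thread closer([&queue, &workers] {
        for (auto& worker : workers) worker.join();
        queue.push(kSentinel);
    });
    const long sum = drain_until_sentinel(queue);
    closer.join();
    return sum;
}
```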

---

## 2. End-to-End Testing (Docker)

**What is it?**
E2E testing spins up the entire XDBC Server, Postgres DB, and XDBC Client simultaneously to perform massive, real-world data transfers (like transferring the `ss13husallm` dataset).

**Why use it?**
- **Accuracy:** Guarantees that all micro-components connect and work reliably under realistic network and database loads.

### How to Run E2E Tests via Docker

Instead of testing components in isolation, you test the full pipeline.

```bash
# 1. Build the unified image
make

# 2. Start the infrastructure
docker compose up -d

# 3. Start the server daemon inside its container
docker exec -it xdbcserver bash -c "./xdbc-server/build/xdbc-server"

# 4. Run a transfer from the client container
docker exec -it xdbcclient bash -c "/xdbc-client/Sinks/build/xdbcsinks --server-host=xdbcserver --table ss13husallm -f1 -b 1024 -p 32000 -n1 -w1 -d1 -s1 --skip-serializer=0 --target=csv"
```
5 changes: 5 additions & 0 deletions client/.dockerignore
@@ -0,0 +1,5 @@
**/cmake-build-debug
**/cmake-build-release
**/build
**/.git
**/.idea
14 changes: 14 additions & 0 deletions client/.gitignore
@@ -0,0 +1,14 @@
build/*
cmake-build-*
.idea/*
experiments/local_measurements/*
__pycache__/
**/plots/.ipynb_checkpoints/
**/plots/myvenv
**/experiments/plots/*.pdf
**/experiments/plots/paper_plots/*.pdf
**/experiments/plots/current_analysis/*.csv*
**/plots/*.csv
experiments/experiment_scheduler/measurements/*
experiments/feature_extraction_xdbc/xdbc_experiments_master.csv
experiments/feature_extraction_xdbc/.idea/*
22 changes: 22 additions & 0 deletions client/CMakeLists.txt
@@ -0,0 +1,22 @@
# Client-only local configurations, dependencies have been moved to root

message(STATUS "Configuring xdbc-client")

add_library(xdbc-client-lib SHARED
xdbc/xclient.cpp
xdbc/Decompression/Decompressor.cpp xdbc/Decompression/Decompressor.h)

set_target_properties(xdbc-client-lib PROPERTIES
#VERSION ${PROJECT_VERSION}
#SOVERSION 1
PUBLIC_HEADER "xdbc/xclient.h;xdbc/SinkInterface.h")

target_include_directories(xdbc-client-lib PUBLIC .)
target_link_libraries(xdbc-client-lib PRIVATE ${ZSTD_LIBRARY_PATH} ${SNAPPY_LIBRARY_PATH} Threads::Threads lzo2 lz4 spdlog::spdlog_header_only ZLIB::ZLIB FastPFOR::FastPFOR ${FPZIP_LIBRARY})
install(TARGETS xdbc-client-lib
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})

install(FILES xdbc/customQueue.h xdbc/utils.h xdbc/metrics_calculator.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})

add_subdirectory(Sinks)