Design: Repo Data Structures III


Eric Hanson edited this page Nov 16, 2024 · 2 revisions

This page gives a general overview of the central design decision in pg_delta: its data structures.

Data Structures

Delta needs to represent the following as data structures:

Commit

rows
- ordered in checkout order (reverse delete order)
- unique
fields
- column/value set for each row

Repository

tracked_rows_added - unique set of row_ids
stage_rows_to_add - unique set of row_ids
stage_rows_to_remove - unique set of row_ids
stage_fields_to_change - unique set of column_name: value pairs associated with a row_id
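
As a rough illustration of the shapes listed above, here is a minimal in-memory sketch in Python. It is not how pg_delta stores anything. The `RowId` and `Fields` aliases and the example row_id strings are assumptions made for the sketch, since this page does not say how a row_id is encoded.

```python
from dataclasses import dataclass, field

# Hypothetical aliases: the encoding of a row_id is not specified on this page.
RowId = str
Fields = dict[str, str]  # column_name -> value


@dataclass
class Commit:
    # A list keeps checkout order; uniqueness is checked in __post_init__.
    rows: list[RowId] = field(default_factory=list)
    # Column/value set for each row, keyed by row_id.
    fields: dict[RowId, Fields] = field(default_factory=dict)

    def __post_init__(self):
        if len(set(self.rows)) != len(self.rows):
            raise ValueError("commit rows must be unique")


@dataclass
class Repository:
    # Sets give the "unique set of row_ids" property directly.
    tracked_rows_added: set[RowId] = field(default_factory=set)
    stage_rows_to_add: set[RowId] = field(default_factory=set)
    stage_rows_to_remove: set[RowId] = field(default_factory=set)
    # row_id -> {column_name: value}; one pending value per column.
    stage_fields_to_change: dict[RowId, Fields] = field(default_factory=dict)


first = Commit(
    rows=["public.users/1", "public.orders/7"],
    fields={"public.users/1": {"name": "ada"}, "public.orders/7": {"total": "12"}},
)

repo = Repository()
repo.stage_rows_to_add.add("public.users/2")
repo.stage_rows_to_add.add("public.users/2")  # adding twice keeps one entry
repo.stage_fields_to_change.setdefault("public.users/1", {})["name"] = "grace"
```

Each of the approaches listed later on this page would need to express these same properties (ordering and uniqueness of rows, one value per column per row) inside the database.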

Logical Operations

A conceptual overview of the logical operations, all of which need to run quickly:

commit(repo)
- apply the stage to the previous commit and save the snapshot to the db
- row_ids = (parent_commit.rows + stage_rows_to_add) - stage_rows_to_remove
- fields (row_id, column_name, value) = (parent_commit.fields + stage_rows_to_add.fields) - stage_rows_to_remove.fields
checkout(commit)
- upsert commit rows join commit fields into live db
track_untracked_rows(repo, relation_id)
stage_tracked_rows(repo)
stage_removed_rows(repo)
stage_updated_fields(repo)
stage_changed_fields(repo)
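
The commit and checkout formulas above can be sketched with plain Python collections. This is a sketch under assumptions, not the pg_delta implementation. The function signatures, the `live_db` dict standing in for the live database, and the row_id strings are all hypothetical. The sketch follows the two formulas exactly as written, so it does not apply stage_fields_to_change.

```python
def commit(parent_rows, parent_fields, rows_to_add, added_fields, rows_to_remove):
    """Return (rows, fields) for a new commit built from the parent and the stage."""
    # row_ids = (parent_commit.rows + stage_rows_to_add) - stage_rows_to_remove
    seen = set()
    rows = []
    for row_id in list(parent_rows) + list(rows_to_add):
        if row_id in rows_to_remove or row_id in seen:
            continue  # drop removed rows and keep the result unique
        seen.add(row_id)
        rows.append(row_id)

    # fields = (parent_commit.fields + stage_rows_to_add.fields)
    #          - stage_rows_to_remove.fields
    merged = {**parent_fields, **added_fields}
    fields = {row_id: dict(merged[row_id]) for row_id in rows if row_id in merged}
    return rows, fields


def checkout(rows, fields, live_db):
    """Upsert every commit row, joined with its fields, into live_db."""
    for row_id in rows:
        # Insert the row if it is missing, otherwise overwrite the commit's columns.
        live_db.setdefault(row_id, {}).update(fields.get(row_id, {}))
    return live_db


parent_rows = ["users/1", "users/2"]
parent_fields = {"users/1": {"name": "ada"}, "users/2": {"name": "bob"}}

rows, fields = commit(
    parent_rows,
    parent_fields,
    rows_to_add=["users/3"],
    added_fields={"users/3": {"name": "cy"}},
    rows_to_remove={"users/2"},
)

live = checkout(rows, fields, {"users/1": {"name": "old", "email": "a@x"}})
```

In this sketch the staged additions are passed as an ordered sequence so that the resulting row order is deterministic. A plain set would not preserve checkout order.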

Approaches

hyper-normalized - tables/rows for every row, field in both stage and commit (the old bundle way)
hyper-jsonb - commit has only a jsonb manifest, which contains everything in the commit; the repository has stage and track jsonb objects.
array
hstore
others?
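
To make the hyper-jsonb option more concrete, here is one possible manifest layout, sketched in Python with the standard `json` module. The layout, the `build_manifest` helper, and the key names `rows`, `row_id`, and `fields` are assumptions for illustration only. This page does not define the manifest format.

```python
import json


def build_manifest(rows, fields):
    # Hypothetical layout: an array of row entries. An array is used because
    # PostgreSQL jsonb keeps array element order but not object key order,
    # and the commit's rows must stay in checkout order.
    return {
        "rows": [{"row_id": row_id, "fields": fields.get(row_id, {})} for row_id in rows]
    }


manifest = build_manifest(
    ["users/1", "users/3"],
    {"users/1": {"name": "ada"}, "users/3": {"name": "cy"}},
)

# Round-trip through text, as the value would be when written to and read from a column.
encoded = json.dumps(manifest)
decoded = json.loads(encoded)
checkout_order = [entry["row_id"] for entry in decoded["rows"]]
```

A single manifest like this trades per-row tables for one value per commit. Whether that is faster for the operations listed above is the open question this page raises.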