
Design: Repo Data Structures III

Eric Hanson edited this page Nov 16, 2024 · 2 revisions

This page gives a general overview of the central design decision in pg_delta: its data structures.

Data Structures

Delta needs to represent the following as data structures:

Commit

  • rows
    • ordered in checkout order (reverse delete order)
    • unique
  • fields
    • column/value set for each row
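As a rough illustration of the Commit requirements above, here is a minimal in-memory sketch in Python. It is not how pg_delta stores commits. The `Commit` class, its method names, and the `schema.table/pk` style of row_id are all assumptions made for the example. It only shows that an insertion-ordered mapping gives ordered, unique rows with a column/value set per row.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical in-memory model; pg_delta itself keeps this in PostgreSQL.
    # A dict preserves insertion order and holds each key at most once, so it
    # can carry both the "checkout order" and the "unique" requirement.
    rows: dict[str, dict[str, object]] = field(default_factory=dict)

    def add_row(self, row_id: str, fields: dict[str, object]) -> None:
        """Append a row in checkout order and refuse duplicate row_ids."""
        if row_id in self.rows:
            raise ValueError(f"duplicate row_id: {row_id}")
        self.rows[row_id] = dict(fields)

    def checkout_order(self) -> list[str]:
        return list(self.rows)

    def delete_order(self) -> list[str]:
        # Delete order is the reverse of checkout order.
        return list(self.rows)[::-1]


commit = Commit()
commit.add_row("public.author/1", {"name": "Ada"})
commit.add_row("public.book/7", {"title": "Notes", "author_id": 1})
print(commit.checkout_order())
print(commit.delete_order())
```

Rows that other rows depend on go first, so replaying the list forward is safe for inserts and replaying it backward is safe for deletes.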

Repository

  • tracked_rows_added - unique set of row_ids
  • stage_rows_to_add - unique set of row_ids
  • stage_rows_to_remove - unique set of row_ids
  • stage_fields_to_change - unique set of column_name: value pairs associated with a row_id
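The four Repository members can be sketched the same way. This is an illustrative Python model only: the field names come from the list above, while the `Repository` class, the `stage_field_change` helper, and the short row_ids are assumptions for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    # Hypothetical in-memory model of the four members listed above.
    tracked_rows_added: set[str] = field(default_factory=set)
    stage_rows_to_add: set[str] = field(default_factory=set)
    stage_rows_to_remove: set[str] = field(default_factory=set)
    # row_id -> {column_name: value}; at most one pending value per column.
    stage_fields_to_change: dict[str, dict[str, object]] = field(default_factory=dict)

    def stage_field_change(self, row_id: str, column_name: str, value: object) -> None:
        """Record a pending field change, replacing any earlier one."""
        self.stage_fields_to_change.setdefault(row_id, {})[column_name] = value


repo = Repository()
repo.tracked_rows_added.update({"r1", "r2"})
repo.stage_rows_to_add.add("r1")
repo.stage_rows_to_add.add("r1")  # adding twice is a no-op: the set stays unique
repo.stage_field_change("r3", "title", "Draft")
repo.stage_field_change("r3", "title", "Final")  # later value replaces the earlier one
```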

Logical Operations

A conceptual overview of the logical operations, all of which need to run quickly:

  • commit(repo)
    • apply the stage to the previous commit and save the snapshot to the db
    • row_ids = (parent_commit.rows + stage_rows_to_add) - stage_rows_to_remove
    • fields (row_id, column_name, value) = (parent_commit.fields + stage_rows_to_add.fields) - stage_rows_to_remove.fields
  • checkout(commit)
    • upsert commit rows join commit fields into live db
  • track_untracked_rows(repo, relation_id)
  • stage_tracked_rows(repo)
  • stage_removed_rows(repo)
  • stage_updated_fields(repo)
  • stage_changed_fields(repo)
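The set arithmetic for commit and the upsert for checkout can be sketched in Python as follows. This is a sketch under stated assumptions, not pg_delta's implementation: the function names `commit_snapshot` and `checkout`, the `(row_id, column_name)` key shape, and the use of a plain dict as a stand-in for the live db are all hypothetical. The formulas above do not say where stage_fields_to_change is applied, so the sketch leaves it out.

```python
def commit_snapshot(parent_rows, parent_fields, rows_to_add, added_fields, rows_to_remove):
    """Return (row_ids, fields) for the new snapshot.

    parent_rows: list of row_ids in checkout order
    parent_fields / added_fields: {(row_id, column_name): value}
    rows_to_add / rows_to_remove: sets of row_ids
    """
    # row_ids = (parent_commit.rows + stage_rows_to_add) - stage_rows_to_remove
    row_ids = [r for r in parent_rows if r not in rows_to_remove]
    seen = set(row_ids)
    # Assumption: new rows are appended in sorted order to keep the result
    # deterministic; a real checkout order would follow row dependencies.
    for r in sorted(rows_to_add):
        if r not in seen and r not in rows_to_remove:
            row_ids.append(r)
            seen.add(r)

    # fields = (parent_commit.fields + stage_rows_to_add.fields)
    #          - stage_rows_to_remove.fields
    merged = {**parent_fields, **added_fields}
    fields = {key: value for key, value in merged.items() if key[0] not in rows_to_remove}
    return row_ids, fields


def checkout(row_ids, fields, live_db):
    """Upsert the commit's rows, joined with their fields, into live_db."""
    by_row = {}
    for (row_id, column_name), value in fields.items():
        by_row.setdefault(row_id, {})[column_name] = value
    for row_id in row_ids:  # walk rows in checkout order
        live_db.setdefault(row_id, {}).update(by_row.get(row_id, {}))
    return live_db


row_ids, fields = commit_snapshot(
    parent_rows=["a", "b"],
    parent_fields={("a", "name"): "Ada", ("b", "name"): "Bob"},
    rows_to_add={"c"},
    added_fields={("c", "name"): "Cy"},
    rows_to_remove={"b"},
)
live_db = checkout(row_ids, fields, {"a": {"name": "Old"}})
print(row_ids, live_db)
```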

Approaches

  • hyper-normalized - a table row for every row and every field, in both stage and commit (the old bundle way)
  • hyper-jsonb - a commit has only a jsonb manifest that contains everything in the commit; the repository has stage and track jsonb objects
  • array
  • hstore
  • others?
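To make the first two approaches easier to compare, the Python sketch below renders one small commit both ways. The record shapes and the manifest layout (`rows`, `row_id`, `fields` keys) are hypothetical, chosen only for illustration. The page does not specify either format, and the JSON text stands in for what would be a jsonb value in PostgreSQL.

```python
import json

# One small commit: row_ids in checkout order plus (row_id, column) -> value.
rows = ["a", "c"]
fields = {("a", "name"): "Ada", ("c", "name"): "Cy"}


def to_normalized(rows, fields):
    """Hyper-normalized shape: one record per row and one per field."""
    row_records = [(position, row_id) for position, row_id in enumerate(rows)]
    field_records = [(row_id, column, value) for (row_id, column), value in fields.items()]
    return row_records, field_records


def to_manifest(rows, fields):
    """Hyper-jsonb shape: a single document holding the whole commit."""
    manifest = {
        "rows": [
            {
                "row_id": row_id,
                "fields": {c: v for (rid, c), v in fields.items() if rid == row_id},
            }
            for row_id in rows  # a JSON array keeps checkout order
        ]
    }
    return json.dumps(manifest)


row_records, field_records = to_normalized(rows, fields)
manifest = to_manifest(rows, fields)
print(row_records, field_records)
print(manifest)
```

The normalized shape yields many small records that can be joined and indexed individually, while the manifest shape reads and writes the whole commit as one value.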
