Architecture Overview

iamvirul edited this page Mar 21, 2026 · 1 revision

Layer Map

DeepDiff DB is organized into five layers with a strict dependency direction: outer layers (toward the CLI) depend on inner layers (toward the drivers), never the reverse.

┌─────────────────────────────────────────────────────────┐
│                   CLI Layer                             │
│              cmd/deepdiffdb/main.go                     │
│   Commands: check, schema-diff, diff, gen-pack, apply   │
└───────────────────────┬─────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────┐
│                 Config Layer                            │
│                 pkg/config/                             │
│   YAML loading, validation, defaults                    │
└──────────┬────────────────────────────┬─────────────────┘
           │                            │
┌──────────▼──────────┐   ┌─────────────▼───────────────┐
│   Schema Layer      │   │     Content Layer           │
│  internal/schema/   │   │    internal/content/        │
│  Introspect, diff,  │   │  Hash, pack, apply, ignore  │
│  migrate, order     │   │  resolve/                   │
└──────────┬──────────┘   └─────────────┬───────────────┘
           │                            │
┌──────────▼────────────────────────────▼───────────────┐
│                  Driver Layer                         │
│               internal/drivers/                       │
│   Open connections, build DSNs, retry logic           │
└───────────────────────────────────────────────────────┘

Supporting packages (used by all layers):
  pkg/logger/      — Structured logging (slog-based, JSON/text)
  pkg/errors/      — Typed errors with codes, context, suggestions, retry
  pkg/progress/    — Progress bars and spinners
  internal/checkpoint/ — State persistence for resume
  internal/report/html/ — HTML report generation

Module Structure

deepdiff-db/
├── cmd/deepdiffdb/
│   └── main.go                      # CLI entry point, command dispatch
│
├── internal/
│   ├── schema/
│   │   ├── model.go                 # Column, Index, ForeignKey, Table, Schema types
│   │   ├── introspect.go            # LoadSchema — driver-specific introspection
│   │   ├── diff.go                  # DiffSchemas, TableDiff, DiffResult
│   │   ├── migrate.go               # GenerateMigration, MigrationOptions
│   │   ├── ordering.go              # Topological sort for FK-safe operation ordering
│   │   ├── report.go                # WriteReports (JSON + text)
│   │   └── primary_keys.go          # CheckPrimaryKeys
│   │
│   ├── content/
│   │   ├── diff.go                  # TableDataDiff, DataDiff, Conflicts types
│   │   ├── hash.go                  # HashTable — keyset pagination + full load
│   │   ├── cursor.go                # BuildCursorQuery — driver-specific pagination SQL
│   │   ├── pack.go                  # GeneratePack — builds migration_pack.sql
│   │   ├── apply.go                 # ApplyPack — executes pack transactionally
│   │   ├── ignore.go                # IgnoreMatcher — glob/exact column ignoring
│   │   ├── report.go                # WriteReports (JSON + text)
│   │   └── resolve/
│   │       ├── resolve.go           # Strategy, Decision, ApplyStrategy, Conflicts
│   │       ├── fetch.go             # FetchConflictRows, CompareRows, FormatValue
│   │       └── persistence.go       # Save/load resolutions to disk
│   │
│   ├── checkpoint/
│   │   ├── checkpoint.go            # Manager — save/load/delete/update
│   │   ├── state.go                 # State, HashTableState, GeneratePackState, ApplyPackState
│   │   └── resume.go                # Resume helpers
│   │
│   ├── drivers/
│   │   ├── drivers.go               # Open — connection + retry + pool config
│   │   └── imports.go               # Driver side-effect imports (mysql, pgx, sqlite, etc.)
│   │
│   ├── cli/
│   │   └── prompt.go                # Interactive prompts for resolve-conflicts
│   │
│   └── report/html/
│       ├── types.go                 # ReportData, ReportSummary, display types
│       ├── generator.go             # GenerateReport
│       └── template.go              # Embedded HTML template
│
├── pkg/
│   ├── config/
│   │   └── config.go                # Config struct, Load, Validate, defaults
│   ├── logger/
│   │   ├── logger.go                # Logger, New, Debug/Info/Warn/Error
│   │   ├── context.go               # ToContext, FromContext
│   │   └── fields.go                # Field name constants
│   ├── progress/
│   │   ├── manager.go               # Manager, Bar, Spinner
│   │   ├── metrics.go               # Throughput + ETA tracking
│   │   └── context.go               # ToContext, FromContext
│   └── errors/
│       ├── errors.go                # Error type, New, Wrap, With, Suggestions
│       ├── codes.go                 # ErrorCode enum
│       ├── suggestions.go           # Actionable suggestion generation
│       └── retry.go                 # Retry with exponential backoff + jitter
│
└── tests/
    ├── config/                      # Config unit tests
    ├── content/                     # Content unit tests
    ├── checkpoint/                  # Checkpoint unit tests
    ├── drivers/                     # Driver unit tests
    ├── schema/                      # Schema unit tests (SQLite-based)
    ├── html/                        # HTML report unit tests
    ├── resolve/                     # Resolve unit tests
    ├── errors/                      # Error/retry unit tests
    └── integration_test.go          # Full workflow integration tests

Key Data Types

Schema Layer

type Column struct {
    Name         string
    DataType     string
    IsNullable   bool
    DefaultValue *string
}

type Index struct {
    Name     string
    Columns  []string
    IsUnique bool
}

type ForeignKey struct {
    Name              string
    ReferencedTable   string
    Columns           []string
    ReferencedColumns []string
    OnDelete, OnUpdate string
}

type Table struct {
    Name        string
    Columns     map[string]Column
    PrimaryKey  []string
    Indexes     map[string]Index
    ForeignKeys map[string]ForeignKey
}

type Schema struct {
    Tables map[string]Table
}
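
The FK-safe ordering attributed to ordering.go can be illustrated with a topological sort over these types. The following is a minimal sketch using Kahn's algorithm, not the project's implementation: `CreationOrder` is a hypothetical name, the types are trimmed copies holding only the fields the sort needs, and skipping self-references and unknown tables is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Trimmed copies of the schema types; only the fields the sort needs.
type ForeignKey struct {
	Name            string
	ReferencedTable string
}

type Table struct {
	Name        string
	ForeignKeys map[string]ForeignKey
}

type Schema struct {
	Tables map[string]Table
}

// CreationOrder is a hypothetical helper. It returns table names so that
// every referenced table comes before the tables that point at it.
func CreationOrder(s Schema) ([]string, error) {
	deps := make(map[string]map[string]bool) // table -> distinct parent tables
	children := make(map[string][]string)    // parent -> dependent tables
	for name, t := range s.Tables {
		deps[name] = map[string]bool{}
		for _, fk := range t.ForeignKeys {
			parent := fk.ReferencedTable
			_, known := s.Tables[parent]
			// Assumption: skip self-references and tables outside the schema.
			if parent == name || !known || deps[name][parent] {
				continue
			}
			deps[name][parent] = true
			children[parent] = append(children[parent], name)
		}
	}

	var ready []string
	for name, d := range deps {
		if len(d) == 0 {
			ready = append(ready, name)
		}
	}

	var order []string
	for len(ready) > 0 {
		sort.Strings(ready) // keep output deterministic despite map iteration
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, child := range children[next] {
			delete(deps[child], next)
			if len(deps[child]) == 0 {
				ready = append(ready, child)
			}
		}
	}
	if len(order) != len(s.Tables) {
		return nil, fmt.Errorf("foreign key cycle among %d tables", len(s.Tables)-len(order))
	}
	return order, nil
}

func main() {
	s := Schema{Tables: map[string]Table{
		"orders": {Name: "orders", ForeignKeys: map[string]ForeignKey{
			"fk_user": {Name: "fk_user", ReferencedTable: "users"},
		}},
		"users": {Name: "users"},
		"items": {Name: "items", ForeignKeys: map[string]ForeignKey{
			"fk_order": {Name: "fk_order", ReferencedTable: "orders"},
		}},
	}}
	order, err := CreationOrder(s)
	fmt.Println(order, err) // prints: [users orders items] <nil>
}
```

Reversing the returned slice gives a safe order for drops and deletes, since children must go before their parents.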

Content Layer

// Row hashes: map[compositePKString]sha256Hash
type TableHashes map[string]string

type TableDataDiff struct {
    Table   string
    Added   []string  // Keys present only in dev
    Removed []string  // Keys present only in prod
    Updated []string  // Keys present in both, with different hashes
}

type Conflict struct {
    Table, Key, ProdHash, DevHash string
}
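
To show how these types fit together, here is a minimal sketch of hashing rows and comparing two `TableHashes` maps. `HashRow` and `DiffHashes` are hypothetical helpers, and the length-prefixed encoding of column values is an assumption; the project's actual serialization (including how it treats NULLs and non-string types) is not described here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

type TableHashes map[string]string

type TableDataDiff struct {
	Table   string
	Added   []string
	Removed []string
	Updated []string
}

// HashRow is a hypothetical helper. It hashes column values in a fixed
// order and length-prefixes each one, so ("ab","c") and ("a","bc") differ.
func HashRow(values []string) string {
	h := sha256.New()
	for _, v := range values {
		fmt.Fprintf(h, "%d:%s", len(v), v)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// DiffHashes is a hypothetical helper comparing prod and dev for one table.
func DiffHashes(table string, prod, dev TableHashes) TableDataDiff {
	d := TableDataDiff{Table: table}
	for key, devHash := range dev {
		prodHash, ok := prod[key]
		switch {
		case !ok:
			d.Added = append(d.Added, key)
		case prodHash != devHash:
			d.Updated = append(d.Updated, key)
		}
	}
	for key := range prod {
		if _, ok := dev[key]; !ok {
			d.Removed = append(d.Removed, key)
		}
	}
	// Sort so the result does not depend on map iteration order.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Updated)
	return d
}

func main() {
	prod := TableHashes{
		"1": HashRow([]string{"1", "alice"}),
		"2": HashRow([]string{"2", "bob"}),
		"4": HashRow([]string{"4", "dave"}),
	}
	dev := TableHashes{
		"1": HashRow([]string{"1", "alice"}),
		"2": HashRow([]string{"2", "bobby"}),
		"3": HashRow([]string{"3", "carol"}),
	}
	d := DiffHashes("users", prod, dev)
	fmt.Println("added:", d.Added, "removed:", d.Removed, "updated:", d.Updated)
	// prints: added: [3] removed: [4] updated: [2]
}
```

Only keys and fixed-size hashes are compared, so the row contents themselves never need to be held side by side.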

Context Propagation

All packages propagate shared state via context.Context — nothing is global:

ctx = logger.ToContext(ctx, log)           // structured logger
ctx = progress.ToContext(ctx, progressMgr) // progress bars
ctx = checkpoint.ToContext(ctx, ckptMgr)   // checkpoint manager

Every function extracts what it needs:

log := logger.FromContext(ctx)
mgr := checkpoint.FromContext(ctx)

Error Handling

All errors are typed with a machine-readable ErrorCode:

type Error struct {
    Code        ErrorCode
    Message     string
    Cause       error
    Context     map[string]any
    Suggestions []string
}

Errors are wrapped at each layer boundary, adding context as they propagate up:

return errors.Wrap(err, errors.ErrHashingFailed, "failed to hash table").
    With("table", tableName).
    With("batch", batchNum).
    WithSuggestion("Check that the table has a primary key")

Design Decisions

| Decision | Rationale |
|---|---|
| SHA-256 row hashing | Deterministic, content-addressable comparison without full data transfer |
| Keyset pagination | O(batchSize) memory regardless of table size; cursor stability across pages |
| Single transaction for apply | All-or-nothing guarantees; never leave prod in partial state |
| Checkpoint atomic write (temp + rename) | Prevents corrupt state file if process is killed mid-write |
| Destructive ops commented out by default | Production safety; operator must explicitly opt in |
| Config hash in checkpoint | Prevents resuming with a different config than what started the operation |
| Context-carried logger/progress | No global state; testable; per-request context isolation |
