Patchwork Architecture

Patchwork is an agent-native version control system designed for concurrent, multi-agent development workflows. Unlike traditional VCS implementations built around human-centric branching models, Patchwork models all mutations as an append-only operation log, materializes file trees on demand through composable views, and gates integration through a speculative submit queue.

This document covers the storage model, core algorithms, data flow, and complexity characteristics in sufficient detail to reproduce the design or contribute to it.


1. System Component Diagram

                         +-------------+
                         |     CLI     |
                         +------+------+
                                |
                         +------v------+
                         |  HTTP API   |
                         +------+------+
                                |
                         +------v------+
                         |   Server    |
                         +------+------+
                                |
     +--------+--------+--------+--------+-------+----------+
     |        |        |        |        |       |          |
 +---v---+ +--v---+ +--v---+ +--v---+ +--v--+ +--v---+ +----v-----+
 | OpLog | | Blob | |Change| | View | |Event| |Queue | |Workspace |
 |       | |Store | |Store | |Engine| | Bus | |      | |          |
 +---+---+ +--+---+ +--+---+ +--+---+ +--+--+ +--+---+ +----+-----+
     |        |        |        |        |       |          |
 +---v--------v--------v--------v--------v-------v----------v-----+
 |                       SQLite (WAL mode)                        |
 +----------------------------------------------------------------+
 |                       Filesystem (blobs)                       |
 +----------------------------------------------------------------+

Dependency graph between packages:

types           (no dependencies -- pure data definitions)
  ^
  |--- blob      (types)
  |--- log       (types)
  |--- events    (types)
  |--- diff      (no patchwork dependencies)
  |--- change    (types)
  |--- view      (log, blob, change, diff, types)
  |--- workspace (blob, events, types)
  |--- server    (log, blob, change, view, events, workspace, config, types)

The Server struct is the composition root:

type Server struct {
    Root    string
    OpLog   *log.OpLog
    Blobs   *blob.Store
    Changes *change.Store
    Views   *view.Engine
    Events  *events.Bus
    Config  *config.Store

    db         *sql.DB
    mu         sync.Mutex
    workspaces map[string]*workspace.Workspace

    queue      []*types.QueueEntry
    queueMu    sync.Mutex
    Queue      *SubmitQueue

    SnapshotInterval uint64
    lastSnapshotSeq  uint64
}

Two SQLite databases are used: patchwork.db (changes, views, view cache) and oplog.db (operations, snapshots). Both are opened in WAL mode.


2. Storage Layer

2.1 SQLite Schema

Both databases are opened with WAL journaling and a 5-second busy timeout:

dbPath+"?_journal_mode=WAL&_busy_timeout=5000"

WAL mode rationale. Write-Ahead Logging permits concurrent readers while a single writer holds the write lock. This is critical for Patchwork because view materialization (read-heavy) must not block on operation appends (write path), and multiple agents may be reading views simultaneously while a workspace scan appends operations.

oplog.db -- Operations table

CREATE TABLE IF NOT EXISTS operations (
    seq          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp    TEXT NOT NULL,
    agent_id     TEXT NOT NULL,
    change_id    TEXT NOT NULL,
    version      INTEGER NOT NULL,
    op_type      TEXT NOT NULL,
    path         TEXT NOT NULL,
    content_hash TEXT,
    diff_hash    TEXT,
    old_path     TEXT,
    parent_seq   INTEGER NOT NULL,
    metadata     TEXT
);
CREATE INDEX IF NOT EXISTS idx_ops_change ON operations(change_id, version);
CREATE INDEX IF NOT EXISTS idx_ops_path ON operations(path);

oplog.db -- Snapshots table

CREATE TABLE IF NOT EXISTS snapshots (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    seq        INTEGER NOT NULL UNIQUE,
    tree       TEXT NOT NULL,       -- JSON: { path: content_hash, ... }
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_snapshots_seq ON snapshots(seq DESC);

patchwork.db -- Changes tables

CREATE TABLE IF NOT EXISTS changes (
    id          TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    author      TEXT NOT NULL,
    target      TEXT NOT NULL DEFAULT 'main',
    status      TEXT NOT NULL DEFAULT 'draft',
    current_ver INTEGER NOT NULL DEFAULT 1,
    deps        TEXT,               -- JSON array of change IDs
    metadata    TEXT,               -- JSON object
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS change_versions (
    change_id   TEXT NOT NULL,
    version     INTEGER NOT NULL,
    base_seq    INTEGER NOT NULL,
    message     TEXT,
    created_at  TEXT NOT NULL,
    test_status TEXT,
    test_output TEXT,
    PRIMARY KEY (change_id, version),
    FOREIGN KEY (change_id) REFERENCES changes(id)
);

CREATE TABLE IF NOT EXISTS version_operations (
    change_id TEXT NOT NULL,
    version   INTEGER NOT NULL,
    op_seq    INTEGER NOT NULL,
    PRIMARY KEY (change_id, version, op_seq),
    FOREIGN KEY (change_id, version) REFERENCES change_versions(change_id, version)
);

patchwork.db -- Views tables

CREATE TABLE IF NOT EXISTS views (
    id              TEXT PRIMARY KEY,
    owner           TEXT NOT NULL,
    base_type       TEXT NOT NULL DEFAULT 'main_at_latest',
    base_seq        INTEGER NOT NULL DEFAULT 0,
    conflict_policy TEXT NOT NULL DEFAULT 'warn',
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS view_overlays (
    view_id   TEXT NOT NULL,
    change_id TEXT NOT NULL,
    version   INTEGER NOT NULL,
    pin       INTEGER NOT NULL DEFAULT 0,
    sort_order INTEGER NOT NULL,
    PRIMARY KEY (view_id, change_id),
    FOREIGN KEY (view_id) REFERENCES views(id)
);

CREATE TABLE IF NOT EXISTS view_cache (
    view_id      TEXT NOT NULL,
    path         TEXT NOT NULL,
    seq          INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    PRIMARY KEY (view_id, path)
);

2.2 Blob Store Layout

The blob store is a content-addressed filesystem store using SHA-256. File contents and diffs are stored under a two-level directory hash:

.patchwork/blobs/{hash[0:2]}/{hash[2:4]}/{hash}

For example, a blob with hash a1b2c3d4e5... is stored at:

.patchwork/blobs/a1/b2/a1b2c3d4e5...

Implementation from blob/store.go:

func (s *Store) path(hash string) string {
    if len(hash) < 4 {
        return filepath.Join(s.root, hash)
    }
    return filepath.Join(s.root, hash[0:2], hash[2:4], hash)
}
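For illustration, here is a self-contained sketch of the sharding scheme. shardedPath is a stand-in for the method above, and the root and sample content are hypothetical:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// shardedPath mirrors Store.path above: two 2-hex-char directory
// levels, then the full hash as the filename.
func shardedPath(root, hash string) string {
	if len(hash) < 4 {
		return filepath.Join(root, hash)
	}
	return filepath.Join(root, hash[0:2], hash[2:4], hash)
}

func main() {
	data := []byte("hello patchwork\n") // sample content, not from the repo
	sum := sha256.Sum256(data)
	hash := hex.EncodeToString(sum[:])
	fmt.Println(shardedPath(".patchwork/blobs", hash))
}
```

The two-level fanout keeps any single directory from accumulating millions of entries as the blob count grows.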

Write atomicity. Blobs are written to a .tmp suffix first, then atomically renamed. Duplicate writes are a no-op (checked via os.Stat before writing). The store is protected by a sync.RWMutex: writes take an exclusive lock, reads share a read lock.

func (s *Store) Put(data []byte) (string, error) {
    hash := Hash(data)
    s.mu.Lock()
    defer s.mu.Unlock()

    p := s.path(hash)
    if _, err := os.Stat(p); err == nil {
        return hash, nil // already exists -- content-addressed, so a no-op
    }
    if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
        return "", err
    }
    tmp := p + ".tmp"
    if err := os.WriteFile(tmp, data, 0o644); err != nil {
        return "", err
    }
    return hash, os.Rename(tmp, p) // atomic publish
}

3. Operation Log

The OpLog is the central data structure: an append-only, totally-ordered log of file-level operations.

Core data type

type Operation struct {
    ID          uint64            `json:"id"`          // seq -- assigned by AUTOINCREMENT
    Timestamp   time.Time         `json:"timestamp"`
    AgentID     string            `json:"agent_id"`
    ChangeID    string            `json:"change_id"`
    Version     uint32            `json:"version"`
    OpType      OpType            `json:"op_type"`     // create | modify | delete | rename | copy
    Path        string            `json:"path"`
    ContentHash string            `json:"content_hash,omitempty"`
    DiffHash    string            `json:"diff_hash,omitempty"`
    OldPath     string            `json:"old_path,omitempty"`
    ParentSeq   uint64            `json:"parent_seq"`
    Metadata    map[string]any    `json:"metadata,omitempty"`
}

Properties

  • Total ordering. seq is the INTEGER PRIMARY KEY AUTOINCREMENT. Every operation receives a unique, monotonically increasing sequence number. This is the global clock of the system.
  • Append-only invariant. The OpLog exposes only Append(). There are no update or delete methods on operations. Once appended, an operation is immutable.
  • Mutex-serialized writes. OpLog.mu sync.Mutex serializes all calls to Append(). Reads (Get, GetRange, GetSince, GetByChange, GetByPath) do not acquire the mutex and rely on SQLite WAL for snapshot isolation.
  • Indexes. Two secondary indexes accelerate the common access patterns:
    • idx_ops_change(change_id, version) -- used by GetByChange() during overlay materialization.
    • idx_ops_path(path) -- used by GetByPath() for per-file history.

Query methods

Method                          SQL predicate                         Use case
Get(seq)                        WHERE seq = ?                         Point lookup by sequence number
GetRange(from, to)              WHERE seq >= ? AND seq <= ?           Materialization, rebase conflict detection
GetSince(afterSeq)              WHERE seq > ?                         Incremental updates
GetByChange(changeID, version)  WHERE change_id = ? AND version = ?   Overlay materialization
GetByPath(path)                 WHERE path = ?                        File history
Head()                          SELECT MAX(seq)                       Current log tip

4. View Materialization Algorithm

A View is a composable projection of the codebase. It consists of a base (the "main" timeline up to some sequence number) plus zero or more overlays (pending changes layered on top).

type View struct {
    ID             string         `json:"id"`
    Owner          string         `json:"owner"`
    Base           ViewBase       `json:"base"`
    Overlays       []Overlay      `json:"overlays"`
    ConflictPolicy ConflictPolicy `json:"conflict_policy"`
}

type ViewBase struct {
    Type ViewBaseType `json:"type"`  // "main_at_seq" | "main_at_latest"
    Seq  uint64       `json:"seq,omitempty"`
}

type Overlay struct {
    ChangeID string `json:"change_id"`
    Version  uint32 `json:"version"`
    Pin      bool   `json:"pin"`
}

Materialization pseudocode

materialize(view) -> FileTree:
    // Step 1: Resolve the base sequence number.
    if view.base.type == "main_at_latest":
        base_seq = oplog.Head()
    else:
        base_seq = view.base.seq

    // Step 2: Build the base tree from landed operations.
    tree = {}                               // map[path] -> content_hash
    start_seq = 1

    snapshot = oplog.GetLatestSnapshot(before=base_seq)
    if snapshot != nil:
        tree = snapshot.tree                // O(f) files
        start_seq = snapshot.seq + 1

    if start_seq <= base_seq:
        ops = oplog.GetRange(start_seq, base_seq)
        for op in ops:
            change = changes.Get(op.change_id)
            if change.status != "landed":
                continue                    // skip non-landed operations
            apply_op(tree, op)

    // Step 3: Layer overlays in order.
    conflicts = []
    for overlay in view.overlays:
        ops = oplog.GetByChange(overlay.change_id, overlay.version)
        for op in ops:
            if op.op_type == "modify" and op.path in tree:
                conflict = detect_conflict(op, tree[op.path], overlay, blobs)
                if conflict != nil:
                    if conflict.type == "clean_merge":
                        tree[op.path] = conflict.merged_hash
                        continue
                    switch view.conflict_policy:
                        case "warn":
                            conflicts.append(conflict)
                            tree[op.path] = conflict.merged_hash  // with markers
                            continue
                        case "last_write":
                            // fall through to apply_op
                        case "manual":
                            return error("conflict requires manual resolution")
            apply_op(tree, op)

    return FileTree{entries: tree, seq: base_seq, conflicts: conflicts}

apply_op

The function that mutates the in-memory tree:

func applyOp(tree map[string]string, op *types.Operation, blobs *blob.Store) {
    switch op.OpType {
    case OpFileCreate:
        tree[op.Path] = op.ContentHash
    case OpFileModify:
        if op.ContentHash != "" {
            tree[op.Path] = op.ContentHash          // full replacement
        } else if op.DiffHash != "" {
            // Diff-based modify: fetch the stored diff, patch the file's
            // current content, and store the result as a new blob.
            // (applyDiff, not shown here, is the package's patch step.)
            diffData, _ := blobs.Get(op.DiffHash)
            baseData, _ := blobs.Get(tree[op.Path])
            resultHash, _ := blobs.Put(applyDiff(baseData, diffData))
            tree[op.Path] = resultHash
        }
    case OpFileDelete:
        delete(tree, op.Path)
    case OpFileRename:
        if hash, ok := tree[op.OldPath]; ok {
            tree[op.Path] = hash
            delete(tree, op.OldPath)
        }
    case OpFileCopy:
        if hash, ok := tree[op.OldPath]; ok {
            tree[op.Path] = hash
        }
    }
}
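The replay semantics can be demonstrated with a self-contained sketch. Types are trimmed to the fields the switch needs, and diff-based modifies (which require the blob store) are omitted:

```go
package main

import "fmt"

type OpType string

const (
	OpFileCreate OpType = "create"
	OpFileModify OpType = "modify"
	OpFileDelete OpType = "delete"
	OpFileRename OpType = "rename"
	OpFileCopy   OpType = "copy"
)

type Operation struct {
	OpType      OpType
	Path        string
	OldPath     string
	ContentHash string
}

// applyOp mirrors the engine's tree mutation for the content-hash cases.
func applyOp(tree map[string]string, op Operation) {
	switch op.OpType {
	case OpFileCreate, OpFileModify:
		tree[op.Path] = op.ContentHash
	case OpFileDelete:
		delete(tree, op.Path)
	case OpFileRename:
		if h, ok := tree[op.OldPath]; ok {
			tree[op.Path] = h
			delete(tree, op.OldPath)
		}
	case OpFileCopy:
		if h, ok := tree[op.OldPath]; ok {
			tree[op.Path] = h
		}
	}
}

func main() {
	tree := map[string]string{}
	for _, op := range []Operation{
		{OpType: OpFileCreate, Path: "main.go", ContentHash: "h1"},
		{OpType: OpFileModify, Path: "main.go", ContentHash: "h2"},
		{OpType: OpFileRename, Path: "cmd/main.go", OldPath: "main.go"},
		{OpType: OpFileCopy, Path: "cmd/copy.go", OldPath: "cmd/main.go"},
		{OpType: OpFileDelete, Path: "cmd/copy.go"},
	} {
		applyOp(tree, op)
	}
	fmt.Println(tree) // map[cmd/main.go:h2]
}
```

Replaying the same operations against an empty map always yields the same tree, which is the idempotence invariant described in section 10.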

Cache-accelerated materialization

For persisted views without overlays, the engine uses a SQLite-backed cache (view_cache table) to avoid full replay:

  1. Look up cached (path, content_hash, seq) entries for this view.
  2. If cachedSeq >= baseSeq, return the cached tree directly.
  3. Otherwise, replay only operations in (cachedSeq, baseSeq] against the cached tree.
  4. Store the updated tree back to the cache.

Views with overlays always do a full materialization because overlay operations can change between calls (new versions, rebase).


5. 3-Way Merge Algorithm

The merge algorithm lives in diff/merge.go and is invoked during overlay materialization when an overlay modifies a file that was also modified in the base tree.

Inputs

The three versions for the merge:

  • base: the content the overlay was originally based on (stored in op.Metadata["base_hash"])
  • ours: the current tree content (what has landed on main)
  • theirs: the overlay's new content

Algorithm overview

Merge3(base, ours, theirs, oursLabel, theirsLabel) -> MergeResult:
    baseLines  = splitLines(base)
    oursLines  = splitLines(ours)
    theirsLines = splitLines(theirs)

    // Step 1: Compute line-level changes from base to each side.
    oursChanges  = computeLineChanges(baseLines, oursLines)
    theirsChanges = computeLineChanges(baseLines, theirsLines)

    // Step 2: Walk both change streams, detect overlaps.
    // Each lineChange has { baseStart, baseEnd, newLines }.
    result = []
    while changes remain:
        pick the change with the smallest baseStart
        if no overlap with the other side:
            apply cleanly (copy base lines up to change, insert newLines)
        else:  // overlapping regions
            if both sides produced identical newLines:
                apply once (convergent edit)
            else:
                emit conflict markers:
                    <<<<<<< oursLabel
                    <ours content>
                    =======
                    <theirs content>
                    >>>>>>> theirsLabel

    return MergeResult{ content, clean, conflicts }

LCS-based diff computation

Both Compute() (unified diff) and computeLineChanges() (for 3-way merge) use the same LCS algorithm:

func lcsEditScript(a, b []string) []edit {
    n, m := len(a), len(b)

    // Build LCS table: dp[i][j] = length of LCS of a[0..i-1] and b[0..j-1]
    dp := make([][]int, n+1)
    for i := range dp {
        dp[i] = make([]int, m+1)
    }
    for i := 1; i <= n; i++ {
        for j := 1; j <= m; j++ {
            if a[i-1] == b[j-1] {
                dp[i][j] = dp[i-1][j-1] + 1
            } else if dp[i-1][j] >= dp[i][j-1] {
                dp[i][j] = dp[i-1][j]
            } else {
                dp[i][j] = dp[i][j-1]
            }
        }
    }

    // Backtrack from dp[n][m] to recover the edit script (editEqual,
    // editInsert, editRemove), built in reverse then flipped.
    // (edit's fields are shown as kind/line for illustration.)
    var rev []edit
    for i, j := n, m; i > 0 || j > 0; {
        switch {
        case i > 0 && j > 0 && a[i-1] == b[j-1]:
            rev = append(rev, edit{kind: editEqual, line: a[i-1]})
            i--
            j--
        case i > 0 && (j == 0 || dp[i-1][j] >= dp[i][j-1]):
            // Mirror the tie-break used when filling the table.
            rev = append(rev, edit{kind: editRemove, line: a[i-1]})
            i--
        default:
            rev = append(rev, edit{kind: editInsert, line: b[j-1]})
            j--
        }
    }
    for l, r := 0, len(rev)-1; l < r; l, r = l+1, r-1 {
        rev[l], rev[r] = rev[r], rev[l]
    }
    return rev
}

The LCS table is O(n * m) in both time and space, where n and m are the line counts of the two inputs.

lineChange extraction

The edit script is converted to a list of lineChange structs, each representing a contiguous modified region:

type lineChange struct {
    baseStart int      // start index in base (inclusive)
    baseEnd   int      // end index in base (exclusive)
    newLines  []string // replacement lines
}

Contiguous runs of editRemove and editInsert are collapsed into a single lineChange. Runs of editEqual terminate the current change.

Overlap detection

Two changes overlap when their base regions intersect, i.e. oursChange.baseStart < theirsChange.baseEnd and theirsChange.baseStart < oursChange.baseEnd. When overlap is detected:

  1. Compute overlapStart = min(ours.baseStart, theirs.baseStart) and overlapEnd = max(ours.baseEnd, theirs.baseEnd).
  2. Copy base lines up to overlapStart.
  3. Compare oursContent and theirsContent. If identical, apply once (convergent edit). Otherwise, emit conflict markers.
  4. Advance baseIdx to overlapEnd.
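The intersection test reduces to a small predicate over the lineChange bounds (struct trimmed here to the two fields the check needs):

```go
package main

import "fmt"

// lineChange trimmed to the bounds the intersection test needs.
type lineChange struct {
	baseStart int // inclusive
	baseEnd   int // exclusive
}

// overlaps reports whether two changed base regions intersect.
func overlaps(a, b lineChange) bool {
	return a.baseEnd > b.baseStart && b.baseEnd > a.baseStart
}

func main() {
	ours := lineChange{baseStart: 3, baseEnd: 6}
	fmt.Println(overlaps(ours, lineChange{baseStart: 5, baseEnd: 8})) // true: line 5 is in both regions
	fmt.Println(overlaps(ours, lineChange{baseStart: 6, baseEnd: 9})) // false: adjacent, not overlapping
}
```

Because baseEnd is exclusive, regions that merely touch are not conflicts; both sides apply cleanly.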

Conflict markers format

<<<<<<< landed
<content from the landed/main side>
=======
<content from the overlay/change side>
>>>>>>> change-id

Fallback without base

If no base_hash metadata is available on the operation, the engine falls back to a simple hash comparison: if the existing tree hash differs from the overlay's content hash, a conflict event is emitted without attempting a merge.


6. Submit Queue

The submit queue gates the integration of changes into main. It is implemented in server/queue.go.

Configuration

type QueueConfig struct {
    TestCmd        string        // e.g., "go test ./..."
    TestDir        string        // working directory for tests
    MaxSpeculative int           // max entries to test in parallel (1 = linear)
    PollInterval   time.Duration // default: 2s
}
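A typical configuration for speculative mode might look like this. The values (and the test directory path) are illustrative, not taken from the repository:

```go
package main

import (
	"fmt"
	"time"
)

// QueueConfig mirrors the struct above.
type QueueConfig struct {
	TestCmd        string
	TestDir        string
	MaxSpeculative int
	PollInterval   time.Duration
}

func main() {
	cfg := QueueConfig{
		TestCmd:        "go test ./...",
		TestDir:        "/tmp/patchwork-tests", // hypothetical directory
		MaxSpeculative: 3,                      // > 1 enables Zuul-style speculation
		PollInterval:   2 * time.Second,
	}
	fmt.Println(cfg.MaxSpeculative > 1) // speculative mode enabled
}
```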

Modes

Linear mode (MaxSpeculative == 1): One entry at a time. Dequeue the first pending entry, run tests, land or reject.

Speculative mode (MaxSpeculative > 1, Zuul-style): Test up to MaxSpeculative entries in parallel. Each entry is tested against a stacked view that assumes all preceding entries will pass.

Queue processing loop

func (q *SubmitQueue) loop() {
    ticker := time.NewTicker(q.pollInterval)
    for {
        select {
        case <-q.stopCh:
            return
        case <-ticker.C:
            q.processQueue()
        }
    }
}

processQueue() executes each poll cycle:

processQueue():
    entries = first N pending entries (N = maxSpeculative)
    if len(entries) == 0:
        return

    mark all as "running"

    if len(entries) == 1:
        processLinear(entry)    // test and handle result
    else:
        processSpeculative(entries)

Speculative testing with stacked views

For speculative mode, each entry N is tested against a view containing:

base = main@latest
overlays = [entries[0], entries[1], ..., entries[N]]

This is implemented in runTest():

func (q *SubmitQueue) runTest(entry *QueueEntry, preceding []*QueueEntry) (bool, string) {
    var overlays []types.Overlay
    for _, prev := range preceding {
        overlays = append(overlays, types.Overlay{
            ChangeID: prev.ChangeID,
            Version:  prev.Version,
        })
    }
    // ch is the change record looked up for entry.ChangeID (lookup elided).
    overlays = append(overlays, types.Overlay{
        ChangeID: entry.ChangeID,
        Version:  ch.CurrentVersion,
    })

    v := &types.View{
        ID:       "queue-test-" + entry.ChangeID,
        Owner:    "submit-queue",
        Base:     types.ViewBase{Type: types.ViewBaseMainAtLatest},
        Overlays: overlays,
        ConflictPolicy: types.ConflictPolicyLastWrite,
    }

    ft, _ := q.server.Views.MaterializeView(v)
    // Write ft to testDir, execute testCmd via sh -c
    // 5-minute timeout via context.WithTimeout
}

Result processing and cascade invalidation

Results are processed in order:

processSpeculative(entries):
    results = parallel_test(entries)    // one goroutine per entry, joined via WaitGroup

    for i, result in results:
        if result.passed:
            land(entries[i])
            remove from queue
        else:
            reject(entries[i])
            remove from queue
            // Cascade: reset every subsequent entry to "pending"
            for j = i+1 to len(entries)-1:
                entries[j].test_status = "pending"
            break   // stop processing; subsequent entries will be retested

When an entry fails, all entries that were tested speculatively after it are reset to pending. They assumed the failed entry would pass, so their test results are invalid. On the next poll cycle, they will be retested with a corrected stacked view.
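A minimal sketch of the cascade rule, with simplified entry records (field and function names here are illustrative):

```go
package main

import "fmt"

type entry struct {
	changeID string
	status   string // "running" -> "landed" | "rejected" | "pending"
}

// settle applies results in queue order: land passing entries until the
// first failure, reject it, and reset everything behind it to pending.
func settle(entries []*entry, passed []bool) {
	for i, e := range entries {
		if passed[i] {
			e.status = "landed"
			continue
		}
		e.status = "rejected"
		for _, later := range entries[i+1:] {
			later.status = "pending" // its stacked view assumed the failure would pass
		}
		return
	}
}

func main() {
	es := []*entry{{"c1", "running"}, {"c2", "running"}, {"c3", "running"}}
	settle(es, []bool{true, false, true})
	for _, e := range es {
		fmt.Println(e.changeID, e.status)
	}
	// c1 landed, c2 rejected, c3 pending
}
```

Note that c3 passed its test, but the result is discarded anyway: the stack it was tested against included c2.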

On pass

  1. change.status is updated to landed.
  2. change_landed event is published.
  3. test_result(passed) event is published.
  4. Entry is removed from the queue; positions are recomputed.

On failure

  1. change.status is reverted to draft.
  2. test_result(failed) event is published.
  3. Entry is removed from the queue; positions are recomputed.

7. Event Bus Architecture

The event bus (events/bus.go) provides in-process pub/sub with typed subscriptions.

Core types

type Handler func(event types.Event)

type Subscription struct {
    id      uint64
    types   []types.EventType  // empty = all events
    handler Handler
}

type Bus struct {
    mu     sync.RWMutex
    subs   map[uint64]*Subscription
    nextID uint64
}

Publish semantics

Synchronous: Publish(event) copies the subscriber list under a read lock, then invokes each matching handler sequentially in the caller's goroutine. The lock is released before handler invocation to avoid deadlocks.

func (b *Bus) Publish(event types.Event) {
    b.mu.RLock()
    var handlers []Handler
    for _, sub := range b.subs {
        if matchesTypes(sub.types, event.Type) {
            handlers = append(handlers, sub.handler)
        }
    }
    b.mu.RUnlock()

    for _, h := range handlers {
        h(event)
    }
}

Asynchronous: PublishAsync(event) launches each handler in its own goroutine:

for _, h := range handlers {
    go h(event)
}

PublishAsync is used for all events published from the server's write path (AppendOp, UpdateChangeStatus, handleResult) to avoid blocking the critical section.
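The fire-and-forget semantics can be isolated into a short sketch; publishAsync and the handler names are stand-ins for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// publishAsync sketches the bus's async path: one goroutine per handler,
// so the caller's write path never blocks on a slow subscriber.
func publishAsync(handlers []func(string), event string) {
	for _, h := range handlers {
		go h(event)
	}
}

func main() {
	var wg sync.WaitGroup
	var mu sync.Mutex
	got := map[string]bool{}

	handler := func(name string) func(string) {
		return func(e string) {
			defer wg.Done()
			mu.Lock()
			got[name+":"+e] = true
			mu.Unlock()
		}
	}

	wg.Add(2)
	publishAsync([]func(string){handler("audit"), handler("sse")}, "op_appended")
	wg.Wait() // the real bus never waits; the barrier is only for this demo

	fmt.Println(got["audit:op_appended"] && got["sse:op_appended"]) // true
}
```

The trade-off is that handlers observe events concurrently and possibly out of publish order, so handlers must be safe to run in any interleaving.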

Channel-based subscription

Bus.Channel() returns a buffered channel and a cancel function. Events are delivered via a non-blocking send; if the channel buffer is full, events are dropped (the subscriber is too slow).

func (b *Bus) Channel(eventTypes []types.EventType, bufSize int) (<-chan types.Event, func()) {
    ch := make(chan types.Event, bufSize)
    id := b.Subscribe(eventTypes, func(e types.Event) {
        select {
        case ch <- e:
        default:  // drop if buffer full
        }
    })
    cancel := func() {
        b.Unsubscribe(id)
        close(ch)
    }
    return ch, cancel
}

This is the primitive used for SSE bridges in the HTTP API: a handler calls Channel(), then reads from the channel in a loop, writing each event as an SSE frame.
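The non-blocking send at the heart of Channel() can be demonstrated in isolation (trySend is a stand-in for the inline select above):

```go
package main

import "fmt"

// trySend mirrors the non-blocking delivery used by Channel(): deliver if
// the subscriber's buffer has room, otherwise drop so publishers never block.
func trySend(ch chan string, e string) bool {
	select {
	case ch <- e:
		return true
	default:
		return false // buffer full: event dropped
	}
}

func main() {
	ch := make(chan string, 2)     // bufSize = 2
	fmt.Println(trySend(ch, "a"))  // true
	fmt.Println(trySend(ch, "b"))  // true
	fmt.Println(trySend(ch, "c"))  // false -- dropped, subscriber too slow
}
```

Dropping rather than blocking trades delivery guarantees for publisher liveness; SSE clients that fall behind simply miss events rather than stalling the write path.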

Event types

Event              Published when                 Key data fields
op_appended        Operation appended to log      seq, change_id, path, op_type, agent_id
change_landed      Change status set to landed    change_id, version
conflict_detected  3-way merge detects conflict   path, your_change, your_version, conflicting_op
test_result        Queue test completes           change_id, version, status, output
queue_position     Queue position changes         (position data)

8. Workspace Sync

A workspace is a local directory backed by a Patchwork view. It bridges the gap between the operation log (abstract) and the filesystem (concrete).

Core state

type Workspace struct {
    ID, Dir, ChangeID, ViewID, AgentID string

    opAppender OpAppender
    viewReader ViewReader
    blobs      *blob.Store
    eventBus   *events.Bus

    mu         sync.Mutex
    watching   bool
    stopCh     chan struct{}
    knownFiles map[string]string  // path -> content_hash
    version    uint32
    autoSyncID uint64
}

knownFiles is the workspace's understanding of what is on disk. It is the baseline for detecting changes.

ScanAndCapture algorithm

This is the filesystem-to-oplog bridge. It runs on a polling interval or can be called explicitly.

ScanAndCapture():
    head = oplog.Head()
    diskFiles = {}
    ops = []

    walk(workspace.dir):
        skip hidden files/dirs (prefix ".")
        for each file:
            relPath = relative(workspace.dir, file)
            diskFiles[relPath] = true
            hash = SHA-256(file.content)

            if relPath not in knownFiles:
                // New file
                contentHash = blobs.Put(file.content)
                op = Operation{
                    type: "create", path: relPath,
                    content_hash: contentHash, parent_seq: head
                }
                ops.append(opAppender.AppendOp(op))
                knownFiles[relPath] = hash

            else if knownFiles[relPath] != hash:
                // Modified file
                contentHash = blobs.Put(file.content)
                op = Operation{
                    type: "modify", path: relPath,
                    content_hash: contentHash, parent_seq: head
                }
                ops.append(opAppender.AppendOp(op))
                knownFiles[relPath] = hash

    // Detect deletions
    for path in knownFiles:
        if path not in diskFiles:
            op = Operation{
                type: "delete", path: path, parent_seq: head
            }
            ops.append(opAppender.AppendOp(op))
            delete(knownFiles, path)

    return ops
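The classification step can be sketched in memory. The real code walks the directory tree and appends operations; here, for testability, disk contents are passed in as a map and only the create/modify/delete decision is shown:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// detect classifies one scan pass against the knownFiles baseline:
// new paths are creates, changed hashes are modifies, and baseline
// paths missing from disk are deletes. known is updated in place.
func detect(known map[string]string, disk map[string][]byte) (creates, modifies, deletes []string) {
	seen := map[string]bool{}
	for path, content := range disk {
		seen[path] = true
		sum := sha256.Sum256(content)
		h := hex.EncodeToString(sum[:])
		prev, ok := known[path]
		switch {
		case !ok:
			creates = append(creates, path)
		case prev != h:
			modifies = append(modifies, path)
		}
		known[path] = h
	}
	for path := range known {
		if !seen[path] {
			deletes = append(deletes, path)
			delete(known, path)
		}
	}
	return
}

func main() {
	known := map[string]string{}
	detect(known, map[string][]byte{"a.go": []byte("v1")}) // first scan: a.go created
	c, m, d := detect(known, map[string][]byte{
		"a.go": []byte("v2"),  // modified
		"b.go": []byte("new"), // created
	})
	fmt.Println(len(c), len(m), len(d)) // 1 1 0
}
```

Because comparison is by content hash, touching a file without changing its bytes produces no operation.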

Sync (view to disk)

Sync() materializes the workspace's view and writes files to disk:

Sync():
    ft = viewReader.Materialize(viewID)
    viewFiles = {}

    for entry in ft.entries:
        viewFiles[entry.path] = true
        if knownFiles[entry.path] == entry.content_hash:
            continue    // already up to date
        content = blobs.Get(entry.content_hash)
        writeFile(workspace.dir / entry.path, content)
        knownFiles[entry.path] = entry.content_hash

    // Remove files no longer in the view
    for path in knownFiles:
        if path not in viewFiles:
            removeFile(workspace.dir / path)
            delete(knownFiles, path)

Auto-sync

EnableAutoSync() subscribes to change_landed events on the event bus. When another change lands (not the workspace's own change), the workspace calls Sync() to re-materialize its view, pulling in the latest main state.

ws.autoSyncID = ws.eventBus.Subscribe(
    []types.EventType{types.EventChangeLanded},
    func(event types.Event) {
        if changeID, ok := event.Data["change_id"].(string); ok && changeID == ws.ChangeID {
            return  // don't sync on our own landing
        }
        ws.Sync()
    },
)

9. Scalability

Snapshots

Without snapshots, materializing a view requires replaying every landed operation from seq=1. Snapshots are periodic captures of the full file tree state at a given sequence number.

Creation trigger. After a change lands, Server.maybeSnapshot() checks if head - lastSnapshotSeq >= SnapshotInterval (default: 100 operations). If so, it materializes the current main tree and saves it:

func (s *Server) maybeSnapshot() {
    head, _ := s.OpLog.Head()
    if head - s.lastSnapshotSeq < s.SnapshotInterval {
        return
    }
    // Materialize main, save snapshot
    s.OpLog.CreateSnapshot(head, tree)
    s.lastSnapshotSeq = head
    s.OpLog.PruneSnapshots(5)   // keep only 5 most recent
}
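The trigger condition reduces to a one-line predicate; shouldSnapshot is a stand-in name for the inline check above:

```go
package main

import "fmt"

// shouldSnapshot mirrors maybeSnapshot's trigger: a snapshot is due once at
// least SnapshotInterval operations have been appended since the last one.
func shouldSnapshot(head, lastSnapshotSeq, interval uint64) bool {
	return head-lastSnapshotSeq >= interval
}

func main() {
	fmt.Println(shouldSnapshot(499, 400, 100)) // false: only 99 ops since seq 400
	fmt.Println(shouldSnapshot(500, 400, 100)) // true: interval reached
}
```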

Usage during materialization. The view engine calls GetLatestSnapshot(beforeSeq) to find the most recent snapshot at or before the target sequence. Replay begins from snapshot.seq + 1 instead of seq=1.

snap, err := e.oplog.GetLatestSnapshot(baseSeq)
if snap != nil {
    tree = snap.Tree
    startSeq = snap.Seq + 1
}

Pruning. PruneSnapshots(keep) deletes all but the keep most recent snapshots:

DELETE FROM snapshots WHERE id NOT IN (
    SELECT id FROM snapshots ORDER BY seq DESC LIMIT ?
)

Incremental view cache

The view_cache table stores (view_id, path, content_hash, seq) tuples. For base-only views (no overlays):

  1. If cachedSeq >= baseSeq, return the cached tree (zero replay).
  2. Otherwise, replay only (cachedSeq, baseSeq] against the cached tree (incremental replay).
  3. Invalidate and repopulate the cache with the new state.

Cache invalidation occurs on:

  • AddOverlay or RemoveOverlay on a view (full view invalidation)
  • DeleteView (full view invalidation)
  • Explicit InvalidateCache(viewID, path) call

Snapshot pruning policy

The system keeps the N=5 most recent snapshots. With the default interval of 100 operations, the worst-case replay for a base-only view at the log head is roughly 100 operations (from the latest snapshot to head). The older snapshots in the window also let GetLatestSnapshot serve views pinned at earlier base sequences without replaying from seq=1.


10. Key Invariants

Total order

Every operation has a unique, monotonically increasing sequence number assigned by SQLite's AUTOINCREMENT. This is the system's logical clock. All queries that reconstruct state (GetRange, GetSince) rely on this ordering.

Idempotence

Replaying the same sequence of operations against an empty tree (or a snapshot) always produces the same FileTree. Operations are deterministic: create sets, modify replaces, delete removes, rename moves, copy copies. There is no operation that depends on external state beyond the tree itself and the blob store.

Causality

ParentSeq on each operation tracks the causal dependency -- the log head at the time the operation was created. This is used by:

  • Rebase: when latestVersion.BaseSeq < head, the change's operations are replayed with ParentSeq updated to the current head.
  • Conflict detection: operations landed between BaseSeq+1 and head are checked for path overlap with the change's operations.
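A sketch of the path-overlap check, with simplified op records (conflictingPaths is a hypothetical name for illustration, not the actual implementation):

```go
package main

import "fmt"

type op struct {
	seq  uint64
	path string
}

// conflictingPaths returns paths touched both by operations landed in
// (baseSeq, head] and by the change's own operations.
func conflictingPaths(landed, changeOps []op, baseSeq uint64) []string {
	landedPaths := map[string]bool{}
	for _, o := range landed {
		if o.seq > baseSeq {
			landedPaths[o.path] = true
		}
	}
	var out []string
	seen := map[string]bool{}
	for _, o := range changeOps {
		if landedPaths[o.path] && !seen[o.path] {
			out = append(out, o.path)
			seen[o.path] = true
		}
	}
	return out
}

func main() {
	landed := []op{{seq: 10, path: "a.go"}, {seq: 12, path: "b.go"}}
	mine := []op{{seq: 11, path: "b.go"}, {seq: 13, path: "c.go"}}
	fmt.Println(conflictingPaths(landed, mine, 10)) // [b.go]
}
```

Files with no path overlap rebase cleanly; overlapping paths are handed to the 3-way merge described in section 5.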

Safety

The submit queue ensures only tested changes land. The flow is enforced by state machine transitions:

draft -> ready -> testing -> landed
                    \-> draft (on test failure)
         ready -> abandoned

No change can reach landed without passing through the queue (or an explicit LandChange bypass intended for testing).

Atomicity

Multi-row mutations use SQLite transactions:

  • CreateView: inserts into views and view_overlays in a single transaction.
  • AddVersion: inserts into change_versions and version_operations, then updates changes.current_ver, all in a single transaction.
  • DeleteView: deletes from view_cache, view_overlays, and views in a single transaction.

11. Complexity Analysis

Operation                                  Time complexity                               Space complexity
Append operation                           O(1) amortized (single INSERT)                O(1)
Get operation by seq                       O(1) (primary key lookup)                     O(1)
Materialize view (from snapshot)           O(n) ops since snapshot + O(m) overlay ops    O(f) files in tree
Materialize view (no snapshot)             O(N) all landed ops + O(m) overlay ops        O(f) files in tree
Materialize view (cache hit, no overlays)  O(1) if cache is current; O(delta) if stale   O(f) files in tree
3-way merge (LCS)                          O(n * m), n and m = line counts               O(n * m) for DP table
Blob store put                             O(content size) for hash + write              O(content size)
Blob store get                             O(content size) for read                      O(content size)
Blob store has                             O(1) (stat syscall)                           O(1)
Submit queue process (linear)              O(1) entries * O(T) test time                 O(f) files for test dir
Submit queue process (speculative)         O(k) entries * O(T) test time (parallel)      O(k * f) for k test dirs
ScanAndCapture                             O(d) files on disk * O(file size) hashing     O(d) for diskFiles map
GetDependencyChain                         O(V + E) graph traversal                      O(V) visited set
CreateSnapshot                             O(f) files to serialize                       O(f) JSON blob
PruneSnapshots                             O(S) total snapshots                          O(1)

Where:

  • n = operations since the last snapshot (or since the beginning if no snapshot)
  • N = total operations in the log
  • m = operations in the overlay(s)
  • f = number of files in the materialized tree
  • d = number of files in the workspace directory
  • k = number of speculative queue entries
  • T = wall-clock time to run the test command
  • V = number of changes in the dependency graph, E = number of dependency edges
  • S = total number of snapshots
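The snapshot-accelerated materialization cost (O(n) ops since the snapshot rather than O(N) over the whole log) comes from replaying only the operations with a sequence number above the snapshot's. A minimal sketch, with the operation dictionaries simplified to the fields used here:

```python
def materialize(snapshot_tree, snapshot_seq, ops):
    """Replay only operations after the snapshot. Cost is O(n) in the
    number of post-snapshot ops, not O(N) over the whole log."""
    tree = dict(snapshot_tree)  # path -> content_hash
    for op in ops:
        if op["seq"] <= snapshot_seq:
            continue  # already folded into the snapshot
        if op["type"] == "delete":
            tree.pop(op["path"], None)
        else:  # create / modify both set the path's content hash
            tree[op["path"]] = op["content_hash"]
    return tree
```

Overlay operations are applied the same way, after the base replay, which is where the additive O(m) term comes from.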

12. Data Flow: Agent Edit to Landing

The complete pipeline from an agent editing a file to that change being integrated into main:

 Agent edits file         Workspace detects        Blob store
 in workspace dir         change (polling)          (SHA-256)
       |                        |                       |
       v                        v                       v
  1. write file -----> 2. ScanAndCapture() ----> 3. blobs.Put(content)
                              |                       returns hash
                              v
                       4. OpLog.Append(op)
                          {type: "modify", path: "...", content_hash: hash}
                              |
                              v
                       5. EventBus.PublishAsync(op_appended)
                              |
                              v
                       6. Agent calls AddVersion(changeID, baseSeq, msg, opSeqs)
                          -- snapshots the operations into a version
                              |
                              v
                       7. Agent calls Submit(changeID)
                          -- change.status -> "testing"
                          -- QueueEntry{changeID, version, pending} added
                              |
                              v
                       8. Queue.processQueue() on next tick:
                          a. Build stacked view: main@latest + [preceding entries] + this entry
                          b. MaterializeView -> FileTree
                          c. Write files to temp dir
                          d. exec("sh", "-c", testCmd) with 5-min timeout
                              |
                              v
                       9. On pass:
                          a. change.status -> "landed"
                          b. Entry removed from queue
                          c. maybeSnapshot() -- create snapshot if interval elapsed
                              |
                              v
                      10. EventBus.PublishAsync(change_landed)
                          -> Other workspaces with auto-sync call Sync()
                          -> Their views re-materialize with the new main state

Step-by-step detail

  1. Agent edits file in workspace directory. The agent (or any process) writes to a file under the workspace's directory.

  2. Workspace scan detects change. If StartWatching() is active, a periodic ticker calls ScanAndCapture(). The function walks the directory, computes SHA-256 hashes, and compares them to knownFiles. New files produce create operations, modified files produce modify operations, missing files produce delete operations.
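The scan-and-diff logic in this step can be sketched as follows. This is an illustrative Python version; the function and field names mirror the description above, not the real code:

```python
import hashlib, os

def scan_and_capture(root, known_files):
    """Diff the on-disk tree against known_files ({path: sha256 hex})
    and return (ops, new_known_files). Paths are relative to root."""
    ops, disk_files = [], {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            disk_files[rel] = digest
            if rel not in known_files:
                ops.append({"type": "create", "path": rel, "content_hash": digest})
            elif known_files[rel] != digest:
                ops.append({"type": "modify", "path": rel, "content_hash": digest})
    for rel in known_files:           # present before, gone now
        if rel not in disk_files:
            ops.append({"type": "delete", "path": rel})
    return ops, disk_files
```

The returned map becomes the next scan's `known_files`, so each tick only emits operations for what actually changed.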

  3. Content stored in blob store. File contents are written to the content-addressed blob store. The SHA-256 hash serves as both the address and the deduplication key. If the blob already exists, the write is a no-op.
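A content-addressed put with this dedup-by-hash behavior can be sketched as below; the two-level fan-out directory layout is an assumption for illustration, not necessarily what Patchwork does on disk:

```python
import hashlib, os

class BlobStore:
    """Content-addressed store: the SHA-256 hex digest is the address,
    so writing identical content twice is a no-op."""
    def __init__(self, root):
        self.root = root

    def _path(self, digest):
        # Fan out by the first two hex chars to keep directories small
        # (an illustrative layout choice).
        return os.path.join(self.root, digest[:2], digest[2:])

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        path = self._path(digest)
        if not os.path.exists(path):              # already stored: no-op
            os.makedirs(os.path.dirname(path), exist_ok=True)
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(content)
            os.replace(tmp, path)                 # atomic rename into place
        return digest

    def get(self, digest: str) -> bytes:
        with open(self._path(digest), "rb") as f:
            return f.read()

    def has(self, digest: str) -> bool:
        return os.path.exists(self._path(digest)) # single stat, O(1)
```

Writing to a temp file and renaming means a crash mid-write never leaves a corrupt blob at its final address.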

  4. Operation appended to log. The operation (with content_hash, path, parent_seq, change_id, agent_id) is validated by ValidateOp() (checks for relative paths, no directory traversal, required fields per op type), then appended to the log with a mutex-serialized INSERT. The operation receives its globally unique sequence number.
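The validation described here can be sketched as follows. This is a simplified stand-in for `ValidateOp()`: the real per-type field requirements (e.g. `parent_seq`, `change_id`, `agent_id`) are richer than shown:

```python
import posixpath

# Minimal required fields per op type (illustrative subset).
REQUIRED = {
    "create": {"path", "content_hash"},
    "modify": {"path", "content_hash"},
    "delete": {"path"},
}

def validate_op(op: dict) -> None:
    """Reject unknown types, missing fields, absolute paths,
    and directory traversal. Raises ValueError on any violation."""
    kind = op.get("type")
    if kind not in REQUIRED:
        raise ValueError(f"unknown op type: {kind!r}")
    missing = REQUIRED[kind] - op.keys()
    if missing:
        raise ValueError(f"{kind} op missing fields: {sorted(missing)}")
    path = op["path"]
    if not path or posixpath.isabs(path):
        raise ValueError("path must be relative and non-empty")
    if ".." in path.split("/"):
        raise ValueError("path must not traverse upward")
```

Validating before the mutex-serialized INSERT keeps malformed operations out of the append-only log entirely, rather than trying to repair them later.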

  5. Event published. op_appended event is published asynchronously. Any subscriber (other workspaces, UI, SSE clients) receives notification.

  6. Agent creates a version. AddVersion() groups a set of operation sequence numbers into a named version of the change. This is recorded in change_versions and version_operations tables within a single transaction. The change's current_ver is incremented atomically.

  7. Agent submits change to queue. Submit() validates the change status (must be draft or ready), checks that all dependencies are landed, transitions the status to testing, and appends a QueueEntry to the queue.

  8. Queue creates stacked view and runs tests. On the next poll cycle, the queue builds a View with base=main@latest and overlays for all preceding entries plus this entry. It materializes this view to a FileTree, writes all files to a temporary directory, and executes the test command via sh -c with a 5-minute timeout.
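The materialize-then-test step can be sketched like this (an illustrative Python version; the file tree is simplified to a path-to-bytes map and the function name is hypothetical):

```python
import os, subprocess, tempfile

def run_queue_entry(file_tree: dict, test_cmd: str, timeout: int = 300) -> bool:
    """Write file_tree ({relative path: bytes}) into a fresh temp
    directory and run the test command there via sh -c, mirroring
    step 8. Returns True only on exit code 0 within the timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        for rel, content in file_tree.items():
            dest = os.path.join(workdir, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "wb") as f:
                f.write(content)
        try:
            result = subprocess.run(["sh", "-c", test_cmd],
                                    cwd=workdir, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False          # a hung test counts as a failure
        return result.returncode == 0
```

Because each entry gets its own temporary directory, speculative entries can be tested in parallel without interfering with one another (the O(k * f) space term in the complexity table).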

  9. On pass: change lands. The change status transitions to landed. Its operations are now part of the "main" timeline -- future view materializations with main_at_latest will include them. maybeSnapshot() checks if the snapshot interval has elapsed and, if so, takes a snapshot to accelerate future materializations.

  10. Event published; workspaces sync. The change_landed event triggers auto-sync in other workspaces. Each workspace re-materializes its view (which now includes the newly landed operations in its base) and writes the updated files to disk.
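The publish/subscribe relationship in this last step can be sketched with a tiny synchronous bus (the real EventBus publishes asynchronously; the names here are illustrative):

```python
from collections import defaultdict

class EventBus:
    """Minimal synchronous stand-in for the async event bus."""
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, fn):
        self.subs[topic].append(fn)

    def publish(self, topic, payload):
        for fn in list(self.subs[topic]):
            fn(payload)

bus = EventBus()
synced = []
# Each workspace with auto-sync enabled reacts to change_landed by
# re-materializing its view against the new main state.
bus.subscribe("change_landed", lambda ev: synced.append(ev["change_id"]))
bus.publish("change_landed", {"change_id": "c42"})
print(synced)  # -> ['c42']
```

Decoupling landing from syncing this way means the queue never blocks on slow workspaces: they catch up on their own schedule when the event arrives.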