Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 83 additions & 1 deletion TYPE_SAFE_EDITS.md
Original file line number Diff line number Diff line change
Expand Up @@ -276,6 +276,88 @@ const edit: EditHistory<["metadata", "cellLabel"]> = {
4. **Create type guards** for runtime type checking
5. **Document new editMap paths** in the EditMapValueTypes mapping

This type system provides a robust foundation for the CRDT edit system, ensuring type safety while maintaining flexibility for future extensions.</contents>
This type system provides a robust foundation for the CRDT edit system, ensuring type safety while maintaining flexibility for future extensions.

## Merge-Time Field Preservation Invariant

The CRDT edit system replays `editMap`-tracked operations during merge to reach the
canonical post-merge state. However, **not every field on a cell's metadata is
necessarily tracked by an edit-history entry**. Examples include (but are not
limited to):

- `metadata.attachments` (audio recordings, LFS pointers, image cards)
- `metadata.selectedAudioId` / `metadata.selectionTimestamp` (current selection)
- `metadata.data.milestoneIndex` (timing milestones)
- `metadata.data.startTime` / `metadata.data.endTime` (subtitle timing — sometimes
written without an edit-history entry depending on the call site)
- future fields added by features that don't write through `EditHistory`

### The Invariant

> When merging two cells, the resulting cell's metadata MUST be a union of both
> sides' metadata as a baseline, with edit-history replay applied on top.
> Fields that exist on only one side and are not tracked by edit history MUST
> still appear on the merged cell.

Concretely, in `resolveMetadataConflictsUsingEditHistory` (see
[src/projectManager/utils/merge/resolvers.ts](src/projectManager/utils/merge/resolvers.ts)),
the resolved cell is initialized as:

```typescript
const resolvedCell: CustomNotebookCellData = {
...theirCell,
...ourCell,
metadata: {
...(theirCell.metadata ?? {}),
...(ourCell.metadata ?? {}),
data: {
...((theirCell.metadata ?? {}).data ?? {}),
...((ourCell.metadata ?? {}).data ?? {}),
},
} as CustomNotebookCellData["metadata"],
};
```

Edit-history-driven overrides (`applyEditToCell`, intelligent attachment merging
via `mergeAttachments`, audio-selection resolution via `resolveAudioSelection`)
then run on top.

### Why This Invariant Exists

In late-2025, a regression replaced this union spread with `{ ...ourCell }` plus
edit-history replay. That implicitly assumed all meaningful metadata changes were
tracked by edit-history entries. In practice that wasn't true — `attachments` for
audio recordings were written without edit-history entries, so any cell whose
`ourCell` lacked attachments would silently drop the `theirCell` attachments on
merge. This is exactly the gan-ji-an `.webm` data-loss incident.

### Enforcement

Three regression tests in
[src/test/suite/resolveCodexCustomMerge.test.ts](src/test/suite/resolveCodexCustomMerge.test.ts)
pin this invariant:

- `preserves theirs-only attachments when ours has none (the .webm case)`
- `preserves theirs-only data.milestoneIndex when ours has none`
- `ours wins on overlapping keys; non-overlapping keys union`

Plus two tests on the migration path which uses the same merge logic to dedupe
cells: `mergeDuplicateCellsUsingResolverLogic - union preservation`. If any of
these break, the invariant has been broken — do not "fix" the tests, fix the
merge logic.

### When Adding New Metadata Fields

If you add a new metadata field to a cell:

1. Prefer routing changes through `EditHistory` so they participate in edit
replay during merge. Use `EditMapUtils.*` helpers.
2. If a field cannot reasonably be edit-tracked (e.g. content-addressed
attachment dictionaries), the union-spread baseline above will preserve it
on the unmodified side. **Add a regression test in `resolveCodexCustomMerge.test.ts`**
asserting your field survives an ours-empty / theirs-full merge.
3. If a field has its own intelligent-merge logic (like `mergeAttachments`),
ensure that logic is invoked AFTER the union spread in
`mergeTwoCellsUsingResolverLogic`, not before.</contents>
</xai:function_call"> wrote to file "/Users/benjaminscholtens/Documents/clear_bible/codex-editor/TYPE_SAFE_EDITS.md

221 changes: 152 additions & 69 deletions src/projectManager/utils/merge/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,12 @@ import { getAuthApi } from "../../../extension";
import { ConflictFile } from "./types";
import { getFrontierVersionStatus } from "../../utils/versionChecks";
import { migration_recoverTempFilesAndMergeDuplicates } from "../migrationUtils";
import {
TransientSyncError,
isRetriableSyncError,
isUserSurfacedError,
markUserSurfaced,
} from "./transientSyncError";

const DEBUG_MODE = false;
function debug(...args: any[]): void {
Expand Down Expand Up @@ -129,43 +135,42 @@ export async function stageAndCommitAllAndSync(
return syncResult;
}

// Optional diagnostics from Frontier to help detect “remote changes not applied” scenarios.
// This is intentionally non-destructive: we only warn/log so issues can be triaged.
try {
const conflictsArr = Array.isArray(conflictsResponse?.conflicts)
? (conflictsResponse.conflicts as Array<{ filepath?: string; }>)
: [];
const conflictPaths = new Set(
conflictsArr.map((c) => c?.filepath).filter((p): p is string => typeof p === "string")
// Diagnostic: every remote-changed file MUST appear in the conflict list.
// If it doesn't, something dropped silently between Frontier's existence
// classification and the conflict list it produced — proceeding would
// create a merge commit missing those files. Throw a TransientSyncError
// so the retry layer below can re-run sync (which refetches via
// fetchOrigin) before bothering the user.
//
// Unlike the old diagnostic, this checks ALL paths, not just `.codex`,
// because the gan-ji-an regression involved .webm pointer files too.
const conflictsArr = Array.isArray(conflictsResponse?.conflicts)
? (conflictsResponse.conflicts as Array<{ filepath?: string; }>)
: [];
const conflictPaths = new Set(
conflictsArr.map((c) => c?.filepath).filter((p): p is string => typeof p === "string")
);

const remoteChanged = conflictsResponse?.remoteChangedFilePaths;
const allChanged = conflictsResponse?.allChangedFilePaths;
const changedList: unknown =
Array.isArray(remoteChanged) ? remoteChanged : Array.isArray(allChanged) ? allChanged : [];

if (Array.isArray(changedList) && changedList.length > 0) {
const changedPaths = changedList.filter(
(p): p is string => typeof p === "string" && p.length > 0
);

const remoteChanged = conflictsResponse?.remoteChangedFilePaths;
const allChanged = conflictsResponse?.allChangedFilePaths;
const changedList: unknown =
Array.isArray(remoteChanged) ? remoteChanged : Array.isArray(allChanged) ? allChanged : [];

if (Array.isArray(changedList) && changedList.length > 0) {
const changedPaths = changedList.filter(
(p): p is string => typeof p === "string" && p.length > 0
);
const remoteCodex = changedPaths.filter(
(p) => p.startsWith("files/target/") && p.endsWith(".codex")
const missingFromConflicts = changedPaths.filter((p) => !conflictPaths.has(p));
if (missingFromConflicts.length > 0) {
const sample = missingFromConflicts.slice(0, 5).join(", ");
const extra = missingFromConflicts.length > 5
? ` (+${missingFromConflicts.length - 5} more)`
: "";
throw new TransientSyncError(
`${missingFromConflicts.length} remote-changed file(s) were not in the conflict list. Missing: ${sample}${extra}`,
missingFromConflicts
);

const missingFromConflicts = remoteCodex.filter((p) => !conflictPaths.has(p));
if (missingFromConflicts.length > 0) {
console.warn(
"[Sync] Frontier reported remote `.codex` changes not present in conflict list:",
missingFromConflicts
);
// Avoid modal spam; one-time lightweight warning.
vscode.window.showWarningMessage(
`Some changes from other team members may not have been included. Try syncing again if something looks missing.`
);
}
}
} catch (e) {
// Never fail sync due to diagnostics
}

// Capture uploaded LFS files from the sync operation
Expand Down Expand Up @@ -199,15 +204,32 @@ export async function stageAndCommitAllAndSync(
console.error(
`[Merge] ${failedConflicts.length} conflict(s) could not be resolved:\n${failedList}`
);
// If any failure originated from a BLOB_READ_FAILED: sentinel (e.g.
// empty-content isNew in Fix 4), let the outer retry handle it
// silently before surfacing UI. Otherwise it's a non-retriable
// merge issue and falls through to the existing user-facing path.
const anyTransient = failedConflicts.some((f) =>
typeof f.error === "string" && f.error.startsWith("BLOB_READ_FAILED:")
);
if (anyTransient) {
throw new TransientSyncError(
`Merge aborted: ${failedConflicts.length} conflict(s) could not be resolved due to incomplete fetch. ` +
`Resolved ${resolvedFiles.length} of ${conflicts.length} total.`,
failedConflicts.map((f) => `${f.filepath}: ${f.error}`)
);
}
// Non-transient: show the specific "data is safe" dialog and tag the
// error so the outer catch knows not to surface a generic dialog on
// top of this one.
vscode.window.showErrorMessage(
`${failedConflicts.length} file(s) had changes that couldn't be combined automatically. ` +
`Your data is safe — please try syncing again or contact support.`
);
throw new Error(
throw markUserSurfaced(new Error(
`Merge aborted: ${failedConflicts.length} conflict(s) could not be resolved. ` +
`Resolved ${resolvedFiles.length} of ${conflicts.length} total. ` +
`Failed:\n${failedList}`
);
));
}

if (resolvedFiles.length > 0) {
Expand All @@ -218,20 +240,9 @@ export async function stageAndCommitAllAndSync(
const errorMessage = completeMergeError instanceof Error ? completeMergeError.message : String(completeMergeError);
debug("completeMerge error:", errorMessage);

// Only retry on transient errors (push rejected because remote
// advanced, network hiccups, etc.). Permanent failures like auth,
// validation, or staging errors should surface immediately.
const isTransient =
errorMessage.includes("non-fast-forward") ||
errorMessage.includes("failed to push") ||
errorMessage.includes("Failed to push") ||
errorMessage.includes("timeout") ||
errorMessage.includes("ETIMEDOUT") ||
errorMessage.includes("ECONNRESET") ||
errorMessage.includes("ECONNREFUSED") ||
errorMessage.includes("network");

if (isTransient && retryCount < 3) {
// Use the shared classifier so completeMerge retries stay aligned
// with the outer transient-error policy (single source of truth).
if (isRetriableSyncError(completeMergeError) && retryCount < 3) {
debug(`⚠️ Transient completeMerge failure, retrying... (attempt ${retryCount + 1}/3)`);

const backoffMs = 5 * Math.pow(2, retryCount) * 1000;
Expand Down Expand Up @@ -272,37 +283,109 @@ export async function stageAndCommitAllAndSync(

return syncResult;
} catch (error) {
// Self-healing outer retry: re-run the entire sync (which forces a fresh
// authApi.syncChanges → fetchOrigin) if this looks like a transient
// condition. Caps at 2 retries (1.5s, 3s) so we never block the user
// forever; on final failure we still throw (so SyncManager updates its
// status) but we also surface a non-modal dialog with a Retry button
// for foreground syncs. Background syncs (showCompletionMessage=false)
// stay silent so SyncManager's scheduler can reschedule.
if (isRetriableSyncError(error) && retryCount < 2) {
const backoffMs = 1500 * Math.pow(2, retryCount); // 1.5s then 3s
debug(
`[Sync] Transient outer failure (attempt ${retryCount + 1}/2), retrying after ${backoffMs}ms: ${(error as Error).message}`
);
await new Promise((r) => setTimeout(r, backoffMs));
return stageAndCommitAllAndSync(commitMessage, showCompletionMessage, retryCount + 1);
}

console.error("Failed to commit and sync changes:", error);
// Fire-and-forget: showErrorMessage with an action only resolves when
// the user dismisses/clicks the toast, and we must not block the sync
// promise on that — background sync (syncManager) would stall behind
// an ignored notification.
void showSyncErrorWithCopy(error, { commitMessage });

// Suppress the outer dialog if an inner step already showed a more
// specific dialog to the user (e.g. failedConflicts non-transient path).
// Foreground syncs see the toast; background syncs (showCompletionMessage=false)
// stay silent so SyncManager's scheduler can reschedule.
//
// Fire-and-forget: showErrorMessage with actions only resolves when the
// user interacts, and we must not block the sync promise on that —
// background sync would stall behind an ignored notification.
if (showCompletionMessage && !isUserSurfacedError(error)) {
void showSyncErrorWithCopy(error, {
commitMessage,
wasTransient: isRetriableSyncError(error),
onRetry: async () => {
// Use SyncManager directly so the retry flows through the
// same scheduling/UI status path as a manual user-triggered
// sync.
const { SyncManager } = await import("../../syncManager");
await SyncManager.getInstance().executeSync(
"Retrying sync",
true,
undefined,
true
);
},
});
}

throw error;
}
}

/**
* Show the sync-failure toast with a "Copy Error Details" action. The action
* writes a plaintext diagnostic report to the clipboard so users can paste it
* directly to support — most users have no useful access to the dev console.
* Show the sync-failure toast.
*
* The report extracts structured fields from a `GitLabApiError` thrown by
* frontier-authentication (status, URL, response body, etc.) when present,
* and falls back to the generic Error message + stack otherwise.
* Actions on the toast:
* - "Retry sync now" (only when `onRetry` is provided): re-runs the sync via
* `SyncManager.executeSync`. Surfaced for transient failures so users are
* never permanently blocked after the auto-retry layer gives up.
* - "Copy Error Details": writes a plaintext diagnostic report to the
* clipboard so users can paste it directly to support — most users have no
* useful access to the dev console. The report extracts structured fields
* from a `GitLabApiError` thrown by frontier-authentication (status, URL,
* response body, etc.) when present, and falls back to the generic Error
* message + stack otherwise.
*
* The message text is adjusted based on `wasTransient` so users get a softer,
* more reassuring message ("Your changes are safe...") when the failure looks
* like a brief network hiccup, vs the generic "Sync failed" message for
* unknown failures.
*/
async function showSyncErrorWithCopy(
error: unknown,
context: { commitMessage?: string },
context: {
commitMessage?: string;
wasTransient?: boolean;
onRetry?: () => Promise<void> | void;
},
): Promise<void> {
const RETRY_ACTION = "Retry sync now";
const COPY_ACTION = "Copy Error Details";
const choice = await vscode.window.showErrorMessage(
"Sync failed. Please try again or contact support if the problem persists.",
COPY_ACTION,
);
if (choice !== COPY_ACTION) return;
await vscode.env.clipboard.writeText(formatSyncErrorReport(error, context));
vscode.window.showInformationMessage("Error details copied to clipboard.");

const message = context.wasTransient
? "Some files from other team members couldn't be downloaded just now (likely a brief network issue). " +
"Your changes are safe. Try again — if it keeps happening, please contact support."
: "Sync failed. Please try again or contact support if the problem persists.";

const actions = context.onRetry
? [RETRY_ACTION, COPY_ACTION]
: [COPY_ACTION];

const choice = await vscode.window.showErrorMessage(message, ...actions);

if (choice === RETRY_ACTION && context.onRetry) {
try {
await context.onRetry();
} catch (retryError) {
console.error("Manual retry from sync error dialog failed:", retryError);
}
return;
}

if (choice === COPY_ACTION) {
await vscode.env.clipboard.writeText(formatSyncErrorReport(error, context));
vscode.window.showInformationMessage("Error details copied to clipboard.");
}
}

function formatSyncErrorReport(
Expand Down
Loading
Loading