
fix(storage): prevent silent data loss in VikingFS.append_file() #1036

Open
RGB-loop wants to merge 1 commit into volcengine:main from RGB-loop:fix/append-file-silent-data-loss


@RGB-loop RGB-loop commented Mar 27, 2026

Summary

VikingFS.append_file() catches all exceptions when reading existing file content and silently falls back to an empty string. This means any transient AGFS failure (timeout, connection error, decryption failure) causes the subsequent write to overwrite the entire file with only the new content — silently losing all previous data.

The only caller is Session._append_to_jsonl(), which appends every message to messages.jsonl. A single read failure during append wipes the entire message history.
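The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the VikingFS code: `FlakyStore` and `buggy_append` are hypothetical stand-ins for the AGFS backend and for the read-then-write pattern, and the simulated `TimeoutError` stands in for any transient read failure.

```python
class FlakyStore:
    """In-memory stand-in for the storage backend (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}
        self.fail_next_read = False

    def read(self, path):
        if self.fail_next_read:
            self.fail_next_read = False
            raise TimeoutError("simulated transient read failure")
        return self.files[path]  # raises KeyError when the file is missing

    def write(self, path, content):
        self.files[path] = content


def buggy_append(store, path, new_content):
    """Read-modify-write where every read error is treated as 'empty file'."""
    existing = ""
    try:
        existing = store.read(path)
    except Exception:
        pass  # swallows "file missing" and transient failures alike
    store.write(path, existing + new_content)


store = FlakyStore()
buggy_append(store, "messages.jsonl", '{"id": 1}\n')
buggy_append(store, "messages.jsonl", '{"id": 2}\n')

store.fail_next_read = True  # one transient failure during the read step
buggy_append(store, "messages.jsonl", '{"id": 3}\n')

result = store.files["messages.jsonl"]
print(repr(result))
```

After the third call the file holds only the last message: the first two entries are gone and no error was raised.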

Changes

  • Only suppress AGFSHTTPError with status 404 (file not found), which is the expected case on first write
  • Let other AGFS errors (AGFSClientError: connection refused, timeout, etc.) propagate to the outer handler which logs and raises IOError
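The narrowed handling listed above might look roughly like the sketch below. It is an illustration under stated assumptions, not the diff in this PR: the two exception classes are local stand-ins for the AGFS client errors named above, the `status_code` attribute is assumed, and `append_file`, `read_fn`, and `write_fn` are hypothetical names.

```python
class AGFSClientError(Exception):
    """Local stand-in for the client error named in this PR (real class may differ)."""


class AGFSHTTPError(Exception):
    """Local stand-in; the `status_code` attribute is an assumption."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self.status_code = status_code


def append_file(read_fn, write_fn, path, new_content):
    """Append by read-then-write, treating only a 404 as 'no existing content'."""
    try:
        try:
            existing = read_fn(path)
        except AGFSHTTPError as exc:
            if exc.status_code != 404:
                raise  # any other HTTP error is a real failure
            existing = ""  # expected on first write: the file does not exist yet
        write_fn(path, existing + new_content)
    except Exception as exc:
        # Outer handler: surface the failure instead of overwriting the file.
        raise IOError(f"append to {path} failed: {exc}") from exc


# Tiny in-memory backend to exercise the three cases.
files = {}


def read_fn(path):
    if path not in files:
        raise AGFSHTTPError("not found", status_code=404)
    return files[path]


def write_fn(path, content):
    files[path] = content


def timing_out_read(path):
    raise AGFSClientError("simulated timeout")


append_file(read_fn, write_fn, "messages.jsonl", "a\n")  # first write, 404 path
append_file(read_fn, write_fn, "messages.jsonl", "b\n")  # normal append

try:
    append_file(timing_out_read, write_fn, "messages.jsonl", "c\n")
    raised = False
except IOError:
    raised = True
```

The design point is that the write is only reached when the read either succeeded or failed with a 404, so a failed read can no longer be mistaken for an empty file.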

Trigger conditions

  • AGFS backend temporary I/O error (network flap, S3 timeout)
  • Encryption key rotation causing _decrypt_content to fail
  • File encoding corruption causing _decode_bytes to raise

Test plan

  • Verify first-time append still works (file does not exist → 404 → creates new file)
  • Verify append to existing file works (read succeeds → content concatenated)
  • Verify AGFS timeout during read raises IOError instead of silently overwriting

When reading existing file content fails (e.g. AGFS timeout, connection
error, decryption failure), the broad `except Exception: pass` silently
discards the error and sets existing content to empty string. The
subsequent write then overwrites the entire file with only the new
content, losing all previous data.

Only suppress FileNotFoundError (HTTP 404) which is expected on first
write. Let other AGFS errors propagate so the caller sees the failure
instead of silently losing data.

CLAassistant commented Mar 27, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions

Failed to generate code suggestions for PR


Labels

None yet

Projects

Status: Backlog


2 participants