
[Bug]: SemanticProcessor fails to move L0/L1 files from temp to target directory due to lock timeout #1047

@App-311

Description


Bug Description

The SemanticProcessor in OpenViking v0.2.13 fails to move generated .abstract.md (L0) and .overview.md (L1) files from the temporary directory to the target resource directory. The operation fails with "Failed to acquire mv lock" errors, resulting in missing layer files even though the VLM successfully generated them.

Steps to Reproduce

  1. Start OpenViking server with default configuration
  2. Create a directory-based resource using the HTTP API:

     ```python
     client.mkdir("viking://resources/test/my-topic")
     client.add_resource(
         path="content.md",
         to="viking://resources/test/my-topic",
         wait=False,
     )
     ```
  3. Wait for SemanticProcessor to process the queue
  4. Check the target directory for .abstract.md and .overview.md

Expected Behavior

The SemanticProcessor should:

  1. Generate L0 (.abstract.md) and L1 (.overview.md) via VLM
  2. Successfully move these files from viking://temp/YYYYMMDD_HHMMSS_*/ to viking://resources/{category}/{title}/
  3. Result in a complete directory structure with all three layers present

Actual Behavior

The SemanticProcessor generates the layer files in the temp directory, but fails to move them to the target directory. Server logs show:
```
[SyncDiff] Failed to acquire mv lock for ['/local/default/temp/.../.overview.md']
[SyncDiff] Failed to move updated file: viking://temp/.../.overview.md -> viking://resources/.../.overview.md, error=Failed to acquire mv lock
[SyncDiff] Failed to move added file: viking://temp/.../content.md -> viking://resources/.../content.md, error=Failed to acquire mv lock
```
The target directory contains only the original L2 content file, while L0/L1 remain in temp (and may be cleaned up later).

Root Cause Analysis (Investigated)

Upon investigation, we found that the LockManager is initialized with `lock_timeout: float = 0.0` (line 25 in lock_manager.py). This causes immediate lock acquisition failures under any contention, rather than waiting for locks to become available.

The issue appears to be a race condition in the SemanticProcessor's `_sync_topdown_recursive` method, where multiple concurrent operations compete for subtree locks.
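To illustrate the failure mode (this is a minimal sketch using Python's standard `threading.Lock`, not OpenViking's actual LockManager): a zero timeout turns lock acquisition into a non-blocking try-lock that fails the instant another holder has the lock, while a positive timeout waits for the holder to release.

```python
import threading

lock = threading.Lock()
lock.acquire()  # simulate a concurrent holder of the subtree lock

# timeout=0.0 behaves like a try-lock: it fails immediately under contention
got_zero = lock.acquire(blocking=True, timeout=0.0)

# release the lock from a "concurrent" thread after 0.5 s
threading.Timer(0.5, lock.release).start()

# a positive timeout waits up to that long for the holder to release
got_waited = lock.acquire(blocking=True, timeout=10.0)

print(got_zero, got_waited)  # False True
```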

Workaround / Local Fix

We were able to resolve the issue locally by applying two changes:

  1. Increased the lock timeout in lock_manager.py:25:

     ```python
     lock_timeout: float = 10.0,  # Changed from 0.0
     ```

  2. Added retry logic around the viking_fs.mv() calls in semantic_processor.py (4 locations, lines ~614, ~625, ~662, ~678):

     ```python
     for attempt in range(3):
         try:
             await viking_fs.mv(root_file, target_file, ctx=ctx)
             break
         except Exception as e:
             if "lock" in str(e).lower() and attempt < 2:
                 await asyncio.sleep(0.3 * (attempt + 1))
                 continue
             raise
     ```

After these changes, L0/L1 files are successfully moved and the layer system works as intended.
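Since the same retry pattern is duplicated at four call sites, it could be factored into a small helper. A sketch only; `retry_on_lock` is our name and not part of OpenViking:

```python
import asyncio

async def retry_on_lock(op, attempts=3, base_delay=0.3):
    """Run an async operation, retrying with linear backoff on lock errors.

    `op` is a zero-argument coroutine function; any exception whose message
    mentions "lock" is retried up to `attempts` times, then re-raised.
    """
    for attempt in range(attempts):
        try:
            return await op()
        except Exception as e:
            if "lock" in str(e).lower() and attempt < attempts - 1:
                await asyncio.sleep(base_delay * (attempt + 1))
                continue
            raise

# Hypothetical call site, mirroring the patch above:
# await retry_on_lock(lambda: viking_fs.mv(root_file, target_file, ctx=ctx))
```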
Minimal Reproducible Example

```python
import openviking as ov

client = ov.SyncHTTPClient(url="http://127.0.0.1:1933")
client.initialize()

# Create directory structure
client.mkdir("viking://resources/test/repro-case")

# Add content (triggers SemanticProcessor)
with open("test.md", "w") as f:
    f.write("# Test\nThis is a test document for layer generation.")
client.add_resource(
    path="test.md",
    to="viking://resources/test/repro-case",
    wait=False,  # Async processing
)

# Wait for processing (30-60 seconds)
import time
time.sleep(30)

# Check result - L0/L1 will be missing if bug is present
import subprocess
result = subprocess.run(
    ["ov", "ls", "viking://resources/test/repro-case", "-a"],
    capture_output=True, text=True,
)
print(result.stdout)

# Expected: .abstract.md, .overview.md, test.md
# Actual (with bug): only test.md
```
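The fixed `time.sleep(30)` in the repro script can race against slow VLM processing; polling the listing until both layer files appear is more reliable. A sketch, assuming only the standard library (`wait_for_layers` is our name; the listing is supplied as a callable so it works with any backend):

```python
import time

def wait_for_layers(list_entries, timeout=120.0, interval=1.0):
    """Poll `list_entries()` (the directory listing as a string) until both
    layer files appear, or return False when the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        out = list_entries()
        if ".abstract.md" in out and ".overview.md" in out:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Hypothetical usage with the `ov` CLI from the repro script:
# import subprocess
# ok = wait_for_layers(
#     lambda: subprocess.run(
#         ["ov", "ls", "viking://resources/test/repro-case", "-a"],
#         capture_output=True, text=True,
#     ).stdout
# )
```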


OpenViking Version

v0.2.13

Python Version

3.12.3

Operating System

Linux

Model Backend

Other

Additional Context

OS: Ubuntu 24.04.4 LTS
Installation Method: pip install in venv

• This may be configuration-specific or related to our environment
• The issue appears consistently on our system with the default installation
• We have verified the VLM is working correctly
• The files are successfully generated in temp, only the move operation fails

Note: This might be an edge case specific to our configuration. We're reporting it in case others encounter similar issues. The workaround above has resolved it for our use case.

Labels: bug