file_lock attacker-DoS mitigation (sidecar-lock TTL + observability; PR #641 R2-S-3 follow-up)

## Background

During PR #641 Round-2 blind review (R2-S-3 finding), security-engineer-r2 flagged that `shared/claude_md_manager.py::file_lock` has a 5-second timeout with fail-open behavior. An attacker (same-user, since this is local-trust scope) holding the sidecar `.{filename}.lock` file for >5 seconds causes legitimate writers to silently degrade to unlocked writes.

```python
# pact-plugin/hooks/shared/claude_md_manager.py (file_lock)
# Acquire fcntl.flock on .{filename}.lock with 5s timeout
# On timeout or any disk error → fail-open (proceed without lock)
```

Fail-open is correct under current threat models (same-user-trust, where the attacker has equivalent capability via direct os tooling anyway), but the timeout is short and there's no observability for "how often does fail-open fire?"

## Symptoms attacker could induce

- Concurrent CLAUDE.md writes that race past lock ordering → torn writes (PACT_MANAGED block partial overwrite by routing-block update, or vice versa).
- Race window during which strip_orphan_routing_markers runs without holding the lock → potential to lose a writer's append.

The likelihood is low (same-user is the only adversary) but the surface is enumerable.

## Proposal — defense-in-depth

### Option A — TTL on sidecar lock

```python
# Before fcntl.flock acquisition, check the sidecar lock's mtime.
# If mtime > now - LOCK_TTL_SECONDS (e.g., 30s), the previous holder is presumed dead;
# unlink the sidecar and reacquire.
LOCK_TTL_SECONDS = 30

def file_lock(target: Path):
    sidecar = target.parent / f".{target.name}.lock"
    if sidecar.exists():
        age = time.time() - sidecar.stat().st_mtime
        if age > LOCK_TTL_SECONDS:
            sidecar.unlink(missing_ok=True)  # presumed dead holder
    # ... existing acquisition with 5s timeout ...
```

This bounds the worst-case wait at `LOCK_TTL_SECONDS` (30s) for legitimate writers, instead of leaving them indefinitely waiting on a hung process or fail-open silently after 5s.

### Option B — fail-open observability

```python
# On fail-open, emit a journal event so audits can enumerate fail-open frequency
def file_lock(target: Path):
    try:
        # ... acquisition ...
        yield
    except (fcntl.error, TimeoutError) as e:
        _journal_lock_fail_open(str(target), e.errno, e.args[0])
        yield  # proceed without lock
```

`session_journal.py` accepts a new `lock_fail_open` event type. Doesn't change behavior; gives security review observability.

## Recommendation

Implement **both**: Option B first (observability is cheap), Option A as the security hardening once we have data on actual fail-open frequency.

## Acceptance criteria

- [ ] `lock_fail_open` journal event emitted on every file_lock fail-open path.
- [ ] LOCK_TTL_SECONDS sidecar-lock TTL implemented; lock files older than 30s are presumed dead and unlinked.
- [ ] Tests cover: TTL-based unlink + reacquire path, fail-open observability path.
- [ ] Existing fail-open tests still pass (TTL logic does not break the contract).

## Cross-references

- PR #641 R2-S-3 (security-engineer-r2 finding, deferred to follow-up)
- `shared/claude_md_manager.py::file_lock` (5s timeout, fail-open)
- Pinned: `file_lock` is the canonical helper for shared-file read-mutate-write
- session_journal.py — canonical event-persistence layer


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

file_lock attacker-DoS mitigation (sidecar-lock TTL + observability; PR #641 R2-S-3 follow-up) #645

Background

Symptoms attacker could induce

Proposal — defense-in-depth

Option A — TTL on sidecar lock

Option B — fail-open observability

Recommendation

Acceptance criteria

Cross-references

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

file_lock attacker-DoS mitigation (sidecar-lock TTL + observability; PR #641 R2-S-3 follow-up) #645

Description

Background

Symptoms attacker could induce

Proposal — defense-in-depth

Option A — TTL on sidecar lock

Option B — fail-open observability

Recommendation

Acceptance criteria

Cross-references

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions