Skip to content

provider: fix "stale state file" status when daemon is stopped — or actually running#301

Open
devin-ai-integration[bot] wants to merge 4 commits into
masterfrom
devin/1781060480-stale-state-file
Open

provider: fix "stale state file" status when daemon is stopped — or actually running#301
devin-ai-integration[bot] wants to merge 4 commits into
masterfrom
devin/1781060480-stale-state-file

Conversation

@devin-ai-integration

@devin-ai-integration devin-ai-integration Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

A provider reported darkbloom status showing Daemon: not running (stale state file). Root cause: nothing ever deletes ~/.darkbloom/daemon-state.json. The daemon writes it every heartbeat, but darkbloom stop (launchctl bootout → SIGTERM, which skips the daemon's defers) leaves it behind, as does any crash or reboot. From then on status finds a state file with a dead PID and prints the alarming "stale state file" message forever, even though nothing is broken — the daemon just isn't running.

Fix:

  • DaemonStateFile.remove() — new best-effort cleanup helper.
  • darkbloom stop removes the state file after unloading the launchd service (covers the SIGTERM path where the daemon can't clean up itself).
  • The foreground daemon removes it on graceful exit (defer alongside the PID-lock release).
  • status now prints an actionable message when the state file's PID is dead: Daemon: not running (last session ended ~Xm ago — run \darkbloom start`)` instead of "stale state file".

Linked issue

Closes #

Test plan

  • Added daemonStateRemoveDeletesFileAndIsIdempotent unit test in DiagnosticsTests.swift
  • CI swift test (Swift provider builds only on macOS; not buildable on this Linux box)

Components touched

  • coordinator (Go)
  • provider (Rust, legacy)
  • provider-swift (Swift CLI)
  • console-ui (Next.js)
  • enclave (Swift)
  • infra / CI / release
  • docs

Protocol / interface changes

  • No protocol/interface changes

Notes for reviewers

The state file remains intentionally NOT removed on crash/SIGKILL (we can't), so the dead-PID branch in status stays — it now just gives the operator the fix (darkbloom start) plus how long ago the last session ended, instead of pointing at the artifact.

Link to Devin session: https://app.devin.ai/sessions/9becf5d25be04b6d88c2a30466141566


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
d-inference Ready Ready Preview Jun 10, 2026 3:23am
d-inference-console-ui-dev Ready Ready Preview Jun 10, 2026 3:23am
d-inference-landing Ready Ready Preview Jun 10, 2026 3:23am

Request Review

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

These changes fall entirely outside the current threat model's affected_files mappings and introduce no direct security regressions, but the LaunchAgent / daemon-state surface deserves a threat-model annotation.


Trust boundaries touched

None of the canonical TB-001 through TB-009 boundaries are directly exercised by these files. However, the changes sit adjacent to TB-003 (provider operator vs. process) because they govern how the darkbloom process is installed, started, stopped, and inspected by the local operator.


Threat coverage

No existing T-xxx threat directly covers LaunchAgent lifecycle or daemon state files. The changes are therefore ℹ️ neutral with respect to all current threats — they do not strengthen or weaken any documented mitigation.


New attack surface not covered by the threat model

Two areas warrant a future threat-model entry:

  1. LaunchAgent.swift — plist installation path and permissions
    If the LaunchAgent plist is written to a user-writable location (e.g. ~/Library/LaunchAgents/) with weak file permissions, a local attacker (or a compromised process running as the same user) could modify the ProgramArguments array to substitute an arbitrary binary that will be launched under the same credentials. This is a privilege-persistence / binary-substitution vector that is adjacent to T-013 but not covered by it. The threat model should add an entry covering LaunchAgent.swift and StartCommand.swift in the context of TB-003.

  2. DaemonStateFile.swift — state file integrity
    If the daemon state file (recording whether the provider is running, its PID, or trust state) is world-readable or world-writable, an operator or co-process could forge state, causing StatusCommand and StopCommand to act on stale or attacker-supplied data (e.g. a crafted PID causes a SIGTERM to be sent to an arbitrary process). Integrity of this file is not enforced by any current mitigation.

  3. DoctorCommand.swift / DoctorRunner.swift — diagnostic output disclosure
    Doctor/diagnostic runners commonly collect and print system state (SIP status, binary path, entitlements, keychain query results). If any of this output is written to a world-readable file or sent over an unauthenticated channel, it could aid an adversary in fingerprinting the security posture of a target provider. This is low severity but should be confirmed not to log sensitive fields (SE public key, keychain access-group identifiers, binary hash) to a path readable outside the process.


Recommended threat model update

Add a new threat entry (suggested: T-043, TB-003, STRIDE: Tampering/Elevation-of-Privilege) covering:

  • provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift
  • provider-swift/Sources/ProviderCore/Service/DaemonStateFile.swift
  • provider-swift/Sources/darkbloom/StartCommand.swift
  • provider-swift/Sources/darkbloom/StopCommand.swift
  • provider-swift/Sources/darkbloom/Diagnostics/DoctorRunner.swift

No SEC-* open findings are resolved by this PR.


🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

…ith a stale/unwritable state file isn't reported as down

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-90 — Synchronous process execution in runningPID() method without timeout or cancellation
    • Suggestion: Add timeout to process execution or make this method async to avoid blocking callers indefinitely if launchctl hangs

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (1 blocking)

  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-90 — Custom launchctl output parsing reinvents process management
    • Suggestion: Consider using Foundation's Process.processIdentifier or NSRunningApplication APIs instead of parsing launchctl output
  • 🔵 [INFO] provider-swift/Sources/darkbloom/StatusCommand.swift:56-87 — Complex daemon status logic with multiple cross-checks
    • Suggestion: Extract daemon status determination into a single function that returns an enum (Running/NotRunning/Stale) to simplify the branching logic
  • 🔵 [INFO] provider-swift/Sources/darkbloom/Diagnostics/DoctorRunner.swift:16-23 — Daemon status checking duplicated across DoctorRunner and StatusCommand
    • Suggestion: Create a shared DaemonStatus utility that both DoctorRunner and StatusCommand can use

4 finding(s) total, 2 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

Comment on lines +64 to +90
public static func runningPID() -> Int32? {
let target = "gui/\(getuid())/\(label)"
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
process.arguments = ["print", target]
let outPipe = Pipe()
process.standardOutput = outPipe
process.standardError = FileHandle.nullDevice

do {
try process.run()
} catch {
return nil
}
let data = outPipe.fileHandleForReading.readDataToEndOfFile()
process.waitUntilExit()
guard process.terminationStatus == 0,
let output = String(data: data, encoding: .utf8) else { return nil }

// launchctl print emits a line shaped `pid = 12345` for a running service.
for line in output.split(separator: "\n") {
let trimmed = line.trimmingCharacters(in: .whitespaces)
guard trimmed.hasPrefix("pid = ") else { continue }
return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] ⚡ Synchronous process execution in runningPID() method without timeout or cancellation

💡 Suggestion: Add timeout to process execution or make this method async to avoid blocking callers indefinitely if launchctl hangs

📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07880adrunningPID(timeout: 2.0) now waits on a termination-handler semaphore and terminates launchctl + returns nil on timeout. Kept it synchronous since both call sites (status, doctor) are one-shot CLI commands, not a hot path.

Comment on lines +64 to +90
public static func runningPID() -> Int32? {
let target = "gui/\(getuid())/\(label)"
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
process.arguments = ["print", target]
let outPipe = Pipe()
process.standardOutput = outPipe
process.standardError = FileHandle.nullDevice

do {
try process.run()
} catch {
return nil
}
let data = outPipe.fileHandleForReading.readDataToEndOfFile()
process.waitUntilExit()
guard process.terminationStatus == 0,
let output = String(data: data, encoding: .utf8) else { return nil }

// launchctl print emits a line shaped `pid = 12345` for a running service.
for line in output.split(separator: "\n") {
let trimmed = line.trimmingCharacters(in: .whitespaces)
guard trimmed.hasPrefix("pid = ") else { continue }
return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] 🧩 Custom launchctl output parsing reinvents process management

💡 Suggestion: Consider using Foundation's Process.processIdentifier or NSRunningApplication APIs instead of parsing launchctl output

📊 Score: 3×4 = 12 · Category: reinventing stdlib

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no Foundation API for this: Process.processIdentifier only works for processes we spawned, and NSRunningApplication covers GUI apps, not launchd daemons. launchctl print is the supported way to query a launchd service's PID (no public C API for launchctl print's info), and the parsing is a single stable pid = N line — same pattern as the existing isLoaded() which also shells out to launchctl.

Comment on lines 56 to 87
/// trust reason — the answer to "am I earning, and if not, why?".
private func printDaemonStatus() {
let now = Date().timeIntervalSince1970
guard let state = DaemonStateFile.read() else {
print("Daemon: not running (run `darkbloom start`)")
let state = DaemonStateFile.read()
let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false

// The state file alone can't answer "is the daemon running?": it can be
// left over from a previous session, or the daemon may be unable to
// write it at all (e.g. ~/.darkbloom permissions). launchd is the
// source of truth for the process it manages, so cross-check it before
// declaring the daemon down.
let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID()

guard let pid = livePID else {
if let state {
// Leftover state file from a previous daemon session (stop,
// crash, or reboot). Point at the fix, not the artifact.
print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)")
} else {
print("Daemon: not running (run `darkbloom start`)")
}
return
}
let alive = daemonProcessAlive(pid: state.pid)
if !alive {
print("Daemon: not running (stale state file)")

guard let state, state.pid == pid else {
// launchd says the daemon is up, but the state file is missing or
// belongs to an older process — live diagnostics are unavailable.
print("Daemon: running (pid \(pid), managed by launchd)")
print(" → no live diagnostics: the daemon's state file is missing or from an older session.")
print(" → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.")
return
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Complex daemon status logic with multiple cross-checks

💡 Suggestion: Extract daemon status determination into a single function that returns an enum (Running/NotRunning/Stale) to simplify the branching logic

📊 Score: 2×3 = 6 · Category: over-abstraction

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — leaving as-is for this fix to keep the diff small: the three branches map 1:1 to the three user-visible outcomes (not running / running without diagnostics / running with diagnostics). Happy to extract a DaemonStatus enum in a follow-up if more call sites appear.

Comment on lines +16 to +23
// Cross-check launchd: the state file can be stale (or unwritable by
// the daemon) while the launchd-managed daemon is actually up.
let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false
let daemonUp = stateAlive || LaunchAgent.runningPID() != nil
// "Fresh" = the state file's own writer process is alive AND the
// snapshot isn't stale, so its live fields (trust level, current model,
// capacity) are trustworthy. A daemon that's up per launchd but isn't
// the state file's writer doesn't make the file's data fresh.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Daemon status checking duplicated across DoctorRunner and StatusCommand

💡 Suggestion: Create a shared DaemonStatus utility that both DoctorRunner and StatusCommand can use

📊 Score: 2×3 = 6 · Category: duplicate logic

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Synchronous process execution with 2s timeout in runningPID() can block callers
    • Suggestion: Make this async or use a background queue, especially since it's called from status commands that should be responsive

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (1 blocking)

  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Custom launchctl process execution duplicates Foundation's Process pattern
    • Suggestion: Extract common process execution pattern to shared helper or use existing utility if available
  • 🔵 [INFO] provider-swift/Sources/darkbloom/StatusCommand.swift:59-87 — Daemon status checking logic duplicated across StatusCommand and DoctorCommand
    • Suggestion: Extract daemon status determination to shared helper function in DaemonStateFile or create DaemonStatus utility
  • 🔵 [INFO] provider-swift/Sources/darkbloom/DoctorCommand.swift:41-44 — Same daemon running check pattern as StatusCommand with slight variations
    • Suggestion: Use shared daemon status helper to avoid inconsistencies between commands

4 finding(s) total, 2 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

Comment on lines +64 to +95
public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
let target = "gui/\(getuid())/\(label)"
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
process.arguments = ["print", target]
let outPipe = Pipe()
process.standardOutput = outPipe
process.standardError = FileHandle.nullDevice

let exited = DispatchSemaphore(value: 0)
process.terminationHandler = { _ in exited.signal() }
do {
try process.run()
} catch {
return nil
}
if exited.wait(timeout: .now() + timeout) == .timedOut {
process.terminate()
return nil
}
let data = outPipe.fileHandleForReading.readDataToEndOfFile()
guard process.terminationStatus == 0,
let output = String(data: data, encoding: .utf8) else { return nil }

// launchctl print emits a line shaped `pid = 12345` for a running service.
for line in output.split(separator: "\n") {
let trimmed = line.trimmingCharacters(in: .whitespaces)
guard trimmed.hasPrefix("pid = ") else { continue }
return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] ⚡ Synchronous process execution with 2s timeout in runningPID() can block callers

💡 Suggestion: Make this async or use a background queue, especially since it's called from status commands that should be responsive

📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it synchronous intentionally: the only callers are one-shot CLI commands (status, doctor) where the user is waiting on the result anyway — there's no event loop or concurrent work to block. The 2s bound caps worst-case latency; making it async would add complexity without changing perceived responsiveness.

Comment on lines +64 to +95
public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
let target = "gui/\(getuid())/\(label)"
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
process.arguments = ["print", target]
let outPipe = Pipe()
process.standardOutput = outPipe
process.standardError = FileHandle.nullDevice

let exited = DispatchSemaphore(value: 0)
process.terminationHandler = { _ in exited.signal() }
do {
try process.run()
} catch {
return nil
}
if exited.wait(timeout: .now() + timeout) == .timedOut {
process.terminate()
return nil
}
let data = outPipe.fileHandleForReading.readDataToEndOfFile()
guard process.terminationStatus == 0,
let output = String(data: data, encoding: .utf8) else { return nil }

// launchctl print emits a line shaped `pid = 12345` for a running service.
for line in output.split(separator: "\n") {
let trimmed = line.trimmingCharacters(in: .whitespaces)
guard trimmed.hasPrefix("pid = ") else { continue }
return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] 🧩 Custom launchctl process execution duplicates Foundation's Process pattern

💡 Suggestion: Extract common process execution pattern to shared helper or use existing utility if available

📊 Score: 3×4 = 12 · Category: reinventing stdlib

Comment on lines +59 to 87
let state = DaemonStateFile.read()
let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false

// The state file alone can't answer "is the daemon running?": it can be
// left over from a previous session, or the daemon may be unable to
// write it at all (e.g. ~/.darkbloom permissions). launchd is the
// source of truth for the process it manages, so cross-check it before
// declaring the daemon down.
let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID()

guard let pid = livePID else {
if let state {
// Leftover state file from a previous daemon session (stop,
// crash, or reboot). Point at the fix, not the artifact.
print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)")
} else {
print("Daemon: not running (run `darkbloom start`)")
}
return
}
let alive = daemonProcessAlive(pid: state.pid)
if !alive {
print("Daemon: not running (stale state file)")

guard let state, state.pid == pid else {
// launchd says the daemon is up, but the state file is missing or
// belongs to an older process — live diagnostics are unavailable.
print("Daemon: running (pid \(pid), managed by launchd)")
print(" → no live diagnostics: the daemon's state file is missing or from an older session.")
print(" → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.")
return
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Daemon status checking logic duplicated across StatusCommand and DoctorCommand

💡 Suggestion: Extract daemon status determination to shared helper function in DaemonStateFile or create DaemonStatus utility

📊 Score: 2×3 = 6 · Category: duplicate logic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted a shared daemonIsRunning(state:) helper in DaemonStateFile.swift (83ae1a6) — doctor and DoctorRunner now use it. status keeps its own flow since it needs the resolved PID and per-branch messaging, not just a boolean.

Comment on lines 41 to 44
let daemonState = DaemonStateFile.read()
let daemonRunning = daemonState.map { daemonProcessAlive(pid: $0.pid) } ?? false
|| LaunchAgent.runningPID() != nil
print("Daemon: \(daemonRunning ? "running" : "NOT running — run `darkbloom start`")")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 Same daemon running check pattern as StatusCommand with slight variations

💡 Suggestion: Use shared daemon status helper to avoid inconsistencies between commands

📊 Score: 2×3 = 6 · Category: duplicate logic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 83ae1a6doctor and DoctorRunner now share daemonIsRunning(state:).

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Synchronous process execution with 2s timeout in runningPID() could block status command
    • Suggestion: Consider making this async or reducing timeout for better responsiveness in status checks

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (2 blocking)

  • 🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Custom process execution and parsing instead of using existing utilities
    • Suggestion: Consider using existing Swift process utilities or a simpler approach like checking if the service is loaded rather than parsing launchctl output
  • 🔵 [INFO] provider-swift/Sources/ProviderCore/Service/DaemonStateFile.swift:162-167 — New function daemonIsRunning duplicates logic already present in status checking
    • Suggestion: Consider consolidating daemon status logic into a single location rather than adding another layer of abstraction
  • 🟡 [MEDIUM] provider-swift/Sources/darkbloom/StatusCommand.swift:57-87 — Complex nested conditionals and multiple state checks make daemon status logic hard to follow
    • Suggestion: Extract daemon status determination into a dedicated type with clear states (Running, Stopped, Stale, etc.) to simplify the branching logic

4 finding(s) total, 3 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

Comment on lines +64 to +95
public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
let target = "gui/\(getuid())/\(label)"
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
process.arguments = ["print", target]
let outPipe = Pipe()
process.standardOutput = outPipe
process.standardError = FileHandle.nullDevice

let exited = DispatchSemaphore(value: 0)
process.terminationHandler = { _ in exited.signal() }
do {
try process.run()
} catch {
return nil
}
if exited.wait(timeout: .now() + timeout) == .timedOut {
process.terminate()
return nil
}
let data = outPipe.fileHandleForReading.readDataToEndOfFile()
guard process.terminationStatus == 0,
let output = String(data: data, encoding: .utf8) else { return nil }

// launchctl print emits a line shaped `pid = 12345` for a running service.
for line in output.split(separator: "\n") {
let trimmed = line.trimmingCharacters(in: .whitespaces)
guard trimmed.hasPrefix("pid = ") else { continue }
return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] ⚡ Synchronous process execution with 2s timeout in runningPID() could block status command

💡 Suggestion: Consider making this async or reducing timeout for better responsiveness in status checks

📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the prior thread: callers are one-shot CLI commands with nothing else to do while waiting, so sync with a 2s bound is the right tradeoff. In practice launchctl print returns in milliseconds; the timeout is only a safety net.

Comment on lines +64 to +95
public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
let target = "gui/\(getuid())/\(label)"
let process = Process()
process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
process.arguments = ["print", target]
let outPipe = Pipe()
process.standardOutput = outPipe
process.standardError = FileHandle.nullDevice

let exited = DispatchSemaphore(value: 0)
process.terminationHandler = { _ in exited.signal() }
do {
try process.run()
} catch {
return nil
}
if exited.wait(timeout: .now() + timeout) == .timedOut {
process.terminate()
return nil
}
let data = outPipe.fileHandleForReading.readDataToEndOfFile()
guard process.terminationStatus == 0,
let output = String(data: data, encoding: .utf8) else { return nil }

// launchctl print emits a line shaped `pid = 12345` for a running service.
for line in output.split(separator: "\n") {
let trimmed = line.trimmingCharacters(in: .whitespaces)
guard trimmed.hasPrefix("pid = ") else { continue }
return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] 🧩 Custom process execution and parsing instead of using existing utilities

💡 Suggestion: Consider using existing Swift process utilities or a simpler approach like checking if the service is loaded rather than parsing launchctl output

📊 Score: 3×4 = 12 · Category: reinventing stdlib

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isLoaded() isn't sufficient — the bug being fixed is exactly the case where the service is loaded but we need the live PID to distinguish "running" from "stale state file". There's no public API for a launchd service's PID, so parsing launchctl print's stable pid = N line is the standard approach, and this mirrors the existing launchctl shelling in this same file.

Comment on lines +162 to 167
/// alive, or launchd reports a running provider service. The launchd
/// cross-check covers a daemon whose state file is stale or unwritable.
public func daemonIsRunning(state: DaemonState?) -> Bool {
if state.map({ daemonProcessAlive(pid: $0.pid) }) ?? false { return true }
return LaunchAgent.runningPID() != nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [INFO] 🧩 New function daemonIsRunning duplicates logic already present in status checking

💡 Suggestion: Consider consolidating daemon status logic into a single location rather than adding another layer of abstraction

📊 Score: 2×3 = 6 · Category: over-abstraction

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper is the consolidation — it was extracted at review request (prior thread) so doctor/DoctorRunner share one definition instead of repeating daemonProcessAlive || runningPID() inline. status is the only other caller and needs the PID itself, not a boolean.

Comment on lines 57 to 87
private func printDaemonStatus() {
let now = Date().timeIntervalSince1970
guard let state = DaemonStateFile.read() else {
print("Daemon: not running (run `darkbloom start`)")
let state = DaemonStateFile.read()
let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false

// The state file alone can't answer "is the daemon running?": it can be
// left over from a previous session, or the daemon may be unable to
// write it at all (e.g. ~/.darkbloom permissions). launchd is the
// source of truth for the process it manages, so cross-check it before
// declaring the daemon down.
let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID()

guard let pid = livePID else {
if let state {
// Leftover state file from a previous daemon session (stop,
// crash, or reboot). Point at the fix, not the artifact.
print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)")
} else {
print("Daemon: not running (run `darkbloom start`)")
}
return
}
let alive = daemonProcessAlive(pid: state.pid)
if !alive {
print("Daemon: not running (stale state file)")

guard let state, state.pid == pid else {
// launchd says the daemon is up, but the state file is missing or
// belongs to an older process — live diagnostics are unavailable.
print("Daemon: running (pid \(pid), managed by launchd)")
print(" → no live diagnostics: the daemon's state file is missing or from an older session.")
print(" → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.")
return
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [MEDIUM] 🧩 Complex nested conditionals and multiple state checks make daemon status logic hard to follow

💡 Suggestion: Extract daemon status determination into a dedicated type with clear states (Running, Stopped, Stale, etc.) to simplify the branching logic

📊 Score: 3×4 = 12 · Category: additive complexity

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow is two early-return guards mapping 1:1 to the three user-visible outcomes (not running / running without live diagnostics / running with diagnostics) — a DaemonStatus enum would just move the same branches behind a type while this remains the only consumer. Will extract one if a second consumer of the full status appears.

@devin-ai-integration devin-ai-integration Bot changed the title provider: fix confusing "stale state file" status after daemon stop/crash/reboot provider: fix "stale state file" status when daemon is stopped — or actually running Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant