provider: fix "stale state file" status when daemon is stopped — or actually running#301
provider: fix "stale state file" status when daemon is stopped — or actually running#301devin-ai-integration[bot] wants to merge 4 commits into
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
ethenotethan
left a comment
There was a problem hiding this comment.
Automated Code Review — Layr-Labs/d-inference#
Verdict: COMMENT
Security — ✅ No issues found
Performance — ✅ No issues found
Type_diligence — ✅ No issues found
Additive_complexity — ✅ No issues found
✅ All four passes clean. No issues found.
🤖 Automated review by Centaur · DAR-186
|
These changes fall entirely outside the current threat model's Trust boundaries touchedNone of the canonical TB-001 through TB-009 boundaries are directly exercised by these files. However, the changes sit adjacent to TB-003 (provider operator vs. process) because they govern how the darkbloom process is installed, started, stopped, and inspected by the local operator. Threat coverageNo existing T-xxx threat directly covers LaunchAgent lifecycle or daemon state files. The changes are therefore ℹ️ neutral with respect to all current threats — they do not strengthen or weaken any documented mitigation. New attack surface not covered by the threat modelTwo areas warrant a future threat-model entry:
Recommended threat model updateAdd a new threat entry (suggested: T-043, TB-003, STRIDE: Tampering/Elevation-of-Privilege) covering:
No SEC-* open findings are resolved by this PR. 🔐 Threat model: |
…ith a stale/unwritable state file isn't reported as down
ethenotethan
left a comment
There was a problem hiding this comment.
Automated Code Review — Layr-Labs/d-inference#
Verdict: REQUEST_CHANGES
Security — ✅ No issues found
Performance — 1 finding(s) (1 blocking)
- 🟡 [MEDIUM]
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-90— Synchronous process execution in runningPID() method without timeout or cancellation- Suggestion: Add timeout to process execution or make this method async to avoid blocking callers indefinitely if launchctl hangs
Type_diligence — ✅ No issues found
Additive_complexity — 3 finding(s) (1 blocking)
- 🟡 [MEDIUM]
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-90— Custom launchctl output parsing reinvents process management- Suggestion: Consider using Foundation's Process.processIdentifier or NSRunningApplication APIs instead of parsing launchctl output
- 🔵 [INFO]
provider-swift/Sources/darkbloom/StatusCommand.swift:56-87— Complex daemon status logic with multiple cross-checks- Suggestion: Extract daemon status determination into a single function that returns an enum (Running/NotRunning/Stale) to simplify the branching logic
- 🔵 [INFO]
provider-swift/Sources/darkbloom/Diagnostics/DoctorRunner.swift:16-23— Daemon status checking duplicated across DoctorRunner and StatusCommand- Suggestion: Create a shared DaemonStatus utility that both DoctorRunner and StatusCommand can use
4 finding(s) total, 2 blocking. Verdict: REQUEST_CHANGES.
🤖 Automated review by Centaur · DAR-186
| public static func runningPID() -> Int32? { | ||
| let target = "gui/\(getuid())/\(label)" | ||
| let process = Process() | ||
| process.executableURL = URL(fileURLWithPath: "/bin/launchctl") | ||
| process.arguments = ["print", target] | ||
| let outPipe = Pipe() | ||
| process.standardOutput = outPipe | ||
| process.standardError = FileHandle.nullDevice | ||
|
|
||
| do { | ||
| try process.run() | ||
| } catch { | ||
| return nil | ||
| } | ||
| let data = outPipe.fileHandleForReading.readDataToEndOfFile() | ||
| process.waitUntilExit() | ||
| guard process.terminationStatus == 0, | ||
| let output = String(data: data, encoding: .utf8) else { return nil } | ||
|
|
||
| // launchctl print emits a line shaped `pid = 12345` for a running service. | ||
| for line in output.split(separator: "\n") { | ||
| let trimmed = line.trimmingCharacters(in: .whitespaces) | ||
| guard trimmed.hasPrefix("pid = ") else { continue } | ||
| return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces)) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] ⚡ Synchronous process execution in runningPID() method without timeout or cancellation
💡 Suggestion: Add timeout to process execution or make this method async to avoid blocking callers indefinitely if launchctl hangs
📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path
There was a problem hiding this comment.
Done in 07880ad — runningPID(timeout: 2.0) now waits on a termination-handler semaphore and terminates launchctl + returns nil on timeout. Kept it synchronous since both call sites (status, doctor) are one-shot CLI commands, not a hot path.
| public static func runningPID() -> Int32? { | ||
| let target = "gui/\(getuid())/\(label)" | ||
| let process = Process() | ||
| process.executableURL = URL(fileURLWithPath: "/bin/launchctl") | ||
| process.arguments = ["print", target] | ||
| let outPipe = Pipe() | ||
| process.standardOutput = outPipe | ||
| process.standardError = FileHandle.nullDevice | ||
|
|
||
| do { | ||
| try process.run() | ||
| } catch { | ||
| return nil | ||
| } | ||
| let data = outPipe.fileHandleForReading.readDataToEndOfFile() | ||
| process.waitUntilExit() | ||
| guard process.terminationStatus == 0, | ||
| let output = String(data: data, encoding: .utf8) else { return nil } | ||
|
|
||
| // launchctl print emits a line shaped `pid = 12345` for a running service. | ||
| for line in output.split(separator: "\n") { | ||
| let trimmed = line.trimmingCharacters(in: .whitespaces) | ||
| guard trimmed.hasPrefix("pid = ") else { continue } | ||
| return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces)) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] 🧩 Custom launchctl output parsing reinvents process management
💡 Suggestion: Consider using Foundation's Process.processIdentifier or NSRunningApplication APIs instead of parsing launchctl output
📊 Score: 3×4 = 12 · Category: reinventing stdlib
There was a problem hiding this comment.
There's no Foundation API for this: Process.processIdentifier only works for processes we spawned, and NSRunningApplication covers GUI apps, not launchd daemons. launchctl print is the supported way to query a launchd service's PID (no public C API for launchctl print's info), and the parsing is a single stable pid = N line — same pattern as the existing isLoaded() which also shells out to launchctl.
| /// trust reason — the answer to "am I earning, and if not, why?". | ||
| private func printDaemonStatus() { | ||
| let now = Date().timeIntervalSince1970 | ||
| guard let state = DaemonStateFile.read() else { | ||
| print("Daemon: not running (run `darkbloom start`)") | ||
| let state = DaemonStateFile.read() | ||
| let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false | ||
|
|
||
| // The state file alone can't answer "is the daemon running?": it can be | ||
| // left over from a previous session, or the daemon may be unable to | ||
| // write it at all (e.g. ~/.darkbloom permissions). launchd is the | ||
| // source of truth for the process it manages, so cross-check it before | ||
| // declaring the daemon down. | ||
| let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID() | ||
|
|
||
| guard let pid = livePID else { | ||
| if let state { | ||
| // Leftover state file from a previous daemon session (stop, | ||
| // crash, or reboot). Point at the fix, not the artifact. | ||
| print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)") | ||
| } else { | ||
| print("Daemon: not running (run `darkbloom start`)") | ||
| } | ||
| return | ||
| } | ||
| let alive = daemonProcessAlive(pid: state.pid) | ||
| if !alive { | ||
| print("Daemon: not running (stale state file)") | ||
|
|
||
| guard let state, state.pid == pid else { | ||
| // launchd says the daemon is up, but the state file is missing or | ||
| // belongs to an older process — live diagnostics are unavailable. | ||
| print("Daemon: running (pid \(pid), managed by launchd)") | ||
| print(" → no live diagnostics: the daemon's state file is missing or from an older session.") | ||
| print(" → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.") | ||
| return | ||
| } |
There was a problem hiding this comment.
🔵 [INFO] 🧩 Complex daemon status logic with multiple cross-checks
💡 Suggestion: Extract daemon status determination into a single function that returns an enum (Running/NotRunning/Stale) to simplify the branching logic
📊 Score: 2×3 = 6 · Category: over-abstraction
There was a problem hiding this comment.
Acknowledged — leaving as-is for this fix to keep the diff small: the three branches map 1:1 to the three user-visible outcomes (not running / running without diagnostics / running with diagnostics). Happy to extract a DaemonStatus enum in a follow-up if more call sites appear.
| // Cross-check launchd: the state file can be stale (or unwritable by | ||
| // the daemon) while the launchd-managed daemon is actually up. | ||
| let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false | ||
| let daemonUp = stateAlive || LaunchAgent.runningPID() != nil | ||
| // "Fresh" = the state file's own writer process is alive AND the | ||
| // snapshot isn't stale, so its live fields (trust level, current model, | ||
| // capacity) are trustworthy. A daemon that's up per launchd but isn't | ||
| // the state file's writer doesn't make the file's data fresh. |
There was a problem hiding this comment.
🔵 [INFO] 🧩 Daemon status checking duplicated across DoctorRunner and StatusCommand
💡 Suggestion: Create a shared DaemonStatus utility that both DoctorRunner and StatusCommand can use
📊 Score: 2×3 = 6 · Category: duplicate logic
ethenotethan
left a comment
There was a problem hiding this comment.
Automated Code Review — Layr-Labs/d-inference#
Verdict: REQUEST_CHANGES
Security — ✅ No issues found
Performance — 1 finding(s) (1 blocking)
- 🟡 [MEDIUM]
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95— Synchronous process execution with 2s timeout in runningPID() can block callers- Suggestion: Make this async or use a background queue, especially since it's called from status commands that should be responsive
Type_diligence — ✅ No issues found
Additive_complexity — 3 finding(s) (1 blocking)
- 🟡 [MEDIUM]
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95— Custom launchctl process execution duplicates Foundation's Process pattern- Suggestion: Extract common process execution pattern to shared helper or use existing utility if available
- 🔵 [INFO]
provider-swift/Sources/darkbloom/StatusCommand.swift:59-87— Daemon status checking logic duplicated across StatusCommand and DoctorCommand- Suggestion: Extract daemon status determination to shared helper function in DaemonStateFile or create DaemonStatus utility
- 🔵 [INFO]
provider-swift/Sources/darkbloom/DoctorCommand.swift:41-44— Same daemon running check pattern as StatusCommand with slight variations- Suggestion: Use shared daemon status helper to avoid inconsistencies between commands
4 finding(s) total, 2 blocking. Verdict: REQUEST_CHANGES.
🤖 Automated review by Centaur · DAR-186
| public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? { | ||
| let target = "gui/\(getuid())/\(label)" | ||
| let process = Process() | ||
| process.executableURL = URL(fileURLWithPath: "/bin/launchctl") | ||
| process.arguments = ["print", target] | ||
| let outPipe = Pipe() | ||
| process.standardOutput = outPipe | ||
| process.standardError = FileHandle.nullDevice | ||
|
|
||
| let exited = DispatchSemaphore(value: 0) | ||
| process.terminationHandler = { _ in exited.signal() } | ||
| do { | ||
| try process.run() | ||
| } catch { | ||
| return nil | ||
| } | ||
| if exited.wait(timeout: .now() + timeout) == .timedOut { | ||
| process.terminate() | ||
| return nil | ||
| } | ||
| let data = outPipe.fileHandleForReading.readDataToEndOfFile() | ||
| guard process.terminationStatus == 0, | ||
| let output = String(data: data, encoding: .utf8) else { return nil } | ||
|
|
||
| // launchctl print emits a line shaped `pid = 12345` for a running service. | ||
| for line in output.split(separator: "\n") { | ||
| let trimmed = line.trimmingCharacters(in: .whitespaces) | ||
| guard trimmed.hasPrefix("pid = ") else { continue } | ||
| return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces)) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] ⚡ Synchronous process execution with 2s timeout in runningPID() can block callers
💡 Suggestion: Make this async or use a background queue, especially since it's called from status commands that should be responsive
📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path
There was a problem hiding this comment.
Keeping it synchronous intentionally: the only callers are one-shot CLI commands (status, doctor) where the user is waiting on the result anyway — there's no event loop or concurrent work to block. The 2s bound caps worst-case latency; making it async would add complexity without changing perceived responsiveness.
| public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? { | ||
| let target = "gui/\(getuid())/\(label)" | ||
| let process = Process() | ||
| process.executableURL = URL(fileURLWithPath: "/bin/launchctl") | ||
| process.arguments = ["print", target] | ||
| let outPipe = Pipe() | ||
| process.standardOutput = outPipe | ||
| process.standardError = FileHandle.nullDevice | ||
|
|
||
| let exited = DispatchSemaphore(value: 0) | ||
| process.terminationHandler = { _ in exited.signal() } | ||
| do { | ||
| try process.run() | ||
| } catch { | ||
| return nil | ||
| } | ||
| if exited.wait(timeout: .now() + timeout) == .timedOut { | ||
| process.terminate() | ||
| return nil | ||
| } | ||
| let data = outPipe.fileHandleForReading.readDataToEndOfFile() | ||
| guard process.terminationStatus == 0, | ||
| let output = String(data: data, encoding: .utf8) else { return nil } | ||
|
|
||
| // launchctl print emits a line shaped `pid = 12345` for a running service. | ||
| for line in output.split(separator: "\n") { | ||
| let trimmed = line.trimmingCharacters(in: .whitespaces) | ||
| guard trimmed.hasPrefix("pid = ") else { continue } | ||
| return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces)) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] 🧩 Custom launchctl process execution duplicates Foundation's Process pattern
💡 Suggestion: Extract common process execution pattern to shared helper or use existing utility if available
📊 Score: 3×4 = 12 · Category: reinventing stdlib
| let state = DaemonStateFile.read() | ||
| let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false | ||
|
|
||
| // The state file alone can't answer "is the daemon running?": it can be | ||
| // left over from a previous session, or the daemon may be unable to | ||
| // write it at all (e.g. ~/.darkbloom permissions). launchd is the | ||
| // source of truth for the process it manages, so cross-check it before | ||
| // declaring the daemon down. | ||
| let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID() | ||
|
|
||
| guard let pid = livePID else { | ||
| if let state { | ||
| // Leftover state file from a previous daemon session (stop, | ||
| // crash, or reboot). Point at the fix, not the artifact. | ||
| print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)") | ||
| } else { | ||
| print("Daemon: not running (run `darkbloom start`)") | ||
| } | ||
| return | ||
| } | ||
| let alive = daemonProcessAlive(pid: state.pid) | ||
| if !alive { | ||
| print("Daemon: not running (stale state file)") | ||
|
|
||
| guard let state, state.pid == pid else { | ||
| // launchd says the daemon is up, but the state file is missing or | ||
| // belongs to an older process — live diagnostics are unavailable. | ||
| print("Daemon: running (pid \(pid), managed by launchd)") | ||
| print(" → no live diagnostics: the daemon's state file is missing or from an older session.") | ||
| print(" → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.") | ||
| return | ||
| } |
There was a problem hiding this comment.
🔵 [INFO] 🧩 Daemon status checking logic duplicated across StatusCommand and DoctorCommand
💡 Suggestion: Extract daemon status determination to shared helper function in DaemonStateFile or create DaemonStatus utility
📊 Score: 2×3 = 6 · Category: duplicate logic
There was a problem hiding this comment.
Extracted a shared daemonIsRunning(state:) helper in DaemonStateFile.swift (83ae1a6) — doctor and DoctorRunner now use it. status keeps its own flow since it needs the resolved PID and per-branch messaging, not just a boolean.
| let daemonState = DaemonStateFile.read() | ||
| let daemonRunning = daemonState.map { daemonProcessAlive(pid: $0.pid) } ?? false | ||
| || LaunchAgent.runningPID() != nil | ||
| print("Daemon: \(daemonRunning ? "running" : "NOT running — run `darkbloom start`")") |
There was a problem hiding this comment.
🔵 [INFO] 🧩 Same daemon running check pattern as StatusCommand with slight variations
💡 Suggestion: Use shared daemon status helper to avoid inconsistencies between commands
📊 Score: 2×3 = 6 · Category: duplicate logic
There was a problem hiding this comment.
Done in 83ae1a6 — doctor and DoctorRunner now share daemonIsRunning(state:).
ethenotethan
left a comment
There was a problem hiding this comment.
Automated Code Review — Layr-Labs/d-inference#
Verdict: REQUEST_CHANGES
Security — ✅ No issues found
Performance — 1 finding(s) (1 blocking)
- 🟡 [MEDIUM]
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95— Synchronous process execution with 2s timeout in runningPID() could block status command- Suggestion: Consider making this async or reducing timeout for better responsiveness in status checks
Type_diligence — ✅ No issues found
Additive_complexity — 3 finding(s) (2 blocking)
- 🟡 [MEDIUM]
provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95— Custom process execution and parsing instead of using existing utilities- Suggestion: Consider using existing Swift process utilities or a simpler approach like checking if the service is loaded rather than parsing launchctl output
- 🔵 [INFO]
provider-swift/Sources/ProviderCore/Service/DaemonStateFile.swift:162-167— New function daemonIsRunning duplicates logic already present in status checking- Suggestion: Consider consolidating daemon status logic into a single location rather than adding another layer of abstraction
- 🟡 [MEDIUM]
provider-swift/Sources/darkbloom/StatusCommand.swift:57-87— Complex nested conditionals and multiple state checks make daemon status logic hard to follow- Suggestion: Extract daemon status determination into a dedicated type with clear states (Running, Stopped, Stale, etc.) to simplify the branching logic
4 finding(s) total, 3 blocking. Verdict: REQUEST_CHANGES.
🤖 Automated review by Centaur · DAR-186
| public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? { | ||
| let target = "gui/\(getuid())/\(label)" | ||
| let process = Process() | ||
| process.executableURL = URL(fileURLWithPath: "/bin/launchctl") | ||
| process.arguments = ["print", target] | ||
| let outPipe = Pipe() | ||
| process.standardOutput = outPipe | ||
| process.standardError = FileHandle.nullDevice | ||
|
|
||
| let exited = DispatchSemaphore(value: 0) | ||
| process.terminationHandler = { _ in exited.signal() } | ||
| do { | ||
| try process.run() | ||
| } catch { | ||
| return nil | ||
| } | ||
| if exited.wait(timeout: .now() + timeout) == .timedOut { | ||
| process.terminate() | ||
| return nil | ||
| } | ||
| let data = outPipe.fileHandleForReading.readDataToEndOfFile() | ||
| guard process.terminationStatus == 0, | ||
| let output = String(data: data, encoding: .utf8) else { return nil } | ||
|
|
||
| // launchctl print emits a line shaped `pid = 12345` for a running service. | ||
| for line in output.split(separator: "\n") { | ||
| let trimmed = line.trimmingCharacters(in: .whitespaces) | ||
| guard trimmed.hasPrefix("pid = ") else { continue } | ||
| return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces)) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] ⚡ Synchronous process execution with 2s timeout in runningPID() could block status command
💡 Suggestion: Consider making this async or reducing timeout for better responsiveness in status checks
📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path
There was a problem hiding this comment.
Same as the prior thread: callers are one-shot CLI commands with nothing else to do while waiting, so sync with a 2s bound is the right tradeoff. In practice launchctl print returns in milliseconds; the timeout is only a safety net.
| public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? { | ||
| let target = "gui/\(getuid())/\(label)" | ||
| let process = Process() | ||
| process.executableURL = URL(fileURLWithPath: "/bin/launchctl") | ||
| process.arguments = ["print", target] | ||
| let outPipe = Pipe() | ||
| process.standardOutput = outPipe | ||
| process.standardError = FileHandle.nullDevice | ||
|
|
||
| let exited = DispatchSemaphore(value: 0) | ||
| process.terminationHandler = { _ in exited.signal() } | ||
| do { | ||
| try process.run() | ||
| } catch { | ||
| return nil | ||
| } | ||
| if exited.wait(timeout: .now() + timeout) == .timedOut { | ||
| process.terminate() | ||
| return nil | ||
| } | ||
| let data = outPipe.fileHandleForReading.readDataToEndOfFile() | ||
| guard process.terminationStatus == 0, | ||
| let output = String(data: data, encoding: .utf8) else { return nil } | ||
|
|
||
| // launchctl print emits a line shaped `pid = 12345` for a running service. | ||
| for line in output.split(separator: "\n") { | ||
| let trimmed = line.trimmingCharacters(in: .whitespaces) | ||
| guard trimmed.hasPrefix("pid = ") else { continue } | ||
| return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces)) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] 🧩 Custom process execution and parsing instead of using existing utilities
💡 Suggestion: Consider using existing Swift process utilities or a simpler approach like checking if the service is loaded rather than parsing launchctl output
📊 Score: 3×4 = 12 · Category: reinventing stdlib
There was a problem hiding this comment.
isLoaded() isn't sufficient — the bug being fixed is exactly the case where the service is loaded but we need the live PID to distinguish "running" from "stale state file". There's no public API for a launchd service's PID, so parsing launchctl print's stable pid = N line is the standard approach, and this mirrors the existing launchctl shelling in this same file.
| /// alive, or launchd reports a running provider service. The launchd | ||
| /// cross-check covers a daemon whose state file is stale or unwritable. | ||
| public func daemonIsRunning(state: DaemonState?) -> Bool { | ||
| if state.map({ daemonProcessAlive(pid: $0.pid) }) ?? false { return true } | ||
| return LaunchAgent.runningPID() != nil | ||
| } |
There was a problem hiding this comment.
🔵 [INFO] 🧩 New function daemonIsRunning duplicates logic already present in status checking
💡 Suggestion: Consider consolidating daemon status logic into a single location rather than adding another layer of abstraction
📊 Score: 2×3 = 6 · Category: over-abstraction
There was a problem hiding this comment.
This helper is the consolidation — it was extracted at review request (prior thread) so doctor/DoctorRunner share one definition instead of repeating daemonProcessAlive || runningPID() inline. status is the only other caller and needs the PID itself, not a boolean.
| private func printDaemonStatus() { | ||
| let now = Date().timeIntervalSince1970 | ||
| guard let state = DaemonStateFile.read() else { | ||
| print("Daemon: not running (run `darkbloom start`)") | ||
| let state = DaemonStateFile.read() | ||
| let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false | ||
|
|
||
| // The state file alone can't answer "is the daemon running?": it can be | ||
| // left over from a previous session, or the daemon may be unable to | ||
| // write it at all (e.g. ~/.darkbloom permissions). launchd is the | ||
| // source of truth for the process it manages, so cross-check it before | ||
| // declaring the daemon down. | ||
| let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID() | ||
|
|
||
| guard let pid = livePID else { | ||
| if let state { | ||
| // Leftover state file from a previous daemon session (stop, | ||
| // crash, or reboot). Point at the fix, not the artifact. | ||
| print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)") | ||
| } else { | ||
| print("Daemon: not running (run `darkbloom start`)") | ||
| } | ||
| return | ||
| } | ||
| let alive = daemonProcessAlive(pid: state.pid) | ||
| if !alive { | ||
| print("Daemon: not running (stale state file)") | ||
|
|
||
| guard let state, state.pid == pid else { | ||
| // launchd says the daemon is up, but the state file is missing or | ||
| // belongs to an older process — live diagnostics are unavailable. | ||
| print("Daemon: running (pid \(pid), managed by launchd)") | ||
| print(" → no live diagnostics: the daemon's state file is missing or from an older session.") | ||
| print(" → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.") | ||
| return | ||
| } |
There was a problem hiding this comment.
🟡 [MEDIUM] 🧩 Complex nested conditionals and multiple state checks make daemon status logic hard to follow
💡 Suggestion: Extract daemon status determination into a dedicated type with clear states (Running, Stopped, Stale, etc.) to simplify the branching logic
📊 Score: 3×4 = 12 · Category: additive complexity
There was a problem hiding this comment.
The flow is two early-return guards mapping 1:1 to the three user-visible outcomes (not running / running without live diagnostics / running with diagnostics) — a DaemonStatus enum would just move the same branches behind a type while this remains the only consumer. Will extract one if a second consumer of the full status appears.
Summary
A provider reported
darkbloom statusshowingDaemon: not running (stale state file). Root cause: nothing ever deletes~/.darkbloom/daemon-state.json. The daemon writes it every heartbeat, butdarkbloom stop(launchctl bootout → SIGTERM, which skips the daemon's defers) leaves it behind, as does any crash or reboot. From then onstatusfinds a state file with a dead PID and prints the alarming "stale state file" message forever, even though nothing is broken — the daemon just isn't running.Fix:
DaemonStateFile.remove()— new best-effort cleanup helper.darkbloom stopremoves the state file after unloading the launchd service (covers the SIGTERM path where the daemon can't clean up itself).statusnow prints an actionable message when the state file's PID is dead:Daemon: not running (last session ended ~Xm ago — run \darkbloom start`)` instead of "stale state file".Linked issue
Closes #
Test plan
daemonStateRemoveDeletesFileAndIsIdempotentunit test inDiagnosticsTests.swiftswift test(Swift provider builds only on macOS; not buildable on this Linux box)Components touched
Protocol / interface changes
Notes for reviewers
The state file remains intentionally NOT removed on crash/SIGKILL (we can't), so the dead-PID branch in
statusstays — it now just gives the operator the fix (darkbloom start) plus how long ago the last session ended, instead of pointing at the artifact.Link to Devin session: https://app.devin.ai/sessions/9becf5d25be04b6d88c2a30466141566
Need help on this PR? Tag
/codesmithwith what you need. Autofix is disabled.