provider: fix "stale state file" status when daemon is stopped — or actually running by devin-ai-integration[bot] · Pull Request #301 · Layr-Labs/d-inference

devin-ai-integration · 2026-06-10T03:02:50Z

Summary

A provider reported darkbloom status showing Daemon: not running (stale state file). Root cause: nothing ever deletes ~/.darkbloom/daemon-state.json. The daemon writes it every heartbeat, but darkbloom stop (launchctl bootout → SIGTERM, which skips the daemon's defers) leaves it behind, as does any crash or reboot. From then on status finds a state file with a dead PID and prints the alarming "stale state file" message forever, even though nothing is broken — the daemon just isn't running.

Fix:

DaemonStateFile.remove() — new best-effort cleanup helper.
darkbloom stop removes the state file after unloading the launchd service (covers the SIGTERM path where the daemon can't clean up itself).
The foreground daemon removes it on graceful exit (defer alongside the PID-lock release).
status now prints an actionable message when the state file's PID is dead: Daemon: not running (last session ended ~Xm ago — run \darkbloom start`)` instead of "stale state file".

Linked issue

Closes #

Test plan

Added daemonStateRemoveDeletesFileAndIsIdempotent unit test in DiagnosticsTests.swift
CI swift test (Swift provider builds only on macOS; not buildable on this Linux box)

Components touched

Protocol / interface changes

No protocol/interface changes

Notes for reviewers

The state file remains intentionally NOT removed on crash/SIGKILL (we can't), so the dead-PID branch in status stays — it now just gives the operator the fix (darkbloom start) plus how long ago the last session ended, instead of pointing at the artifact.

Link to Devin session: https://app.devin.ai/sessions/9becf5d25be04b6d88c2a30466141566

^{Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.}

…message

devin-ai-integration · 2026-06-10T03:02:52Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment, CI, and merge conflict monitoring

vercel · 2026-06-10T03:02:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
d-inference	Ready	Preview	Jun 10, 2026 3:23am
d-inference-console-ui-dev	Ready	Preview	Jun 10, 2026 3:23am
d-inference-landing	Ready	Preview	Jun 10, 2026 3:23am

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

github-actions · 2026-06-10T03:03:31Z

These changes fall entirely outside the current threat model's affected_files mappings and introduce no direct security regressions, but the LaunchAgent / daemon-state surface deserves a threat-model annotation.

Trust boundaries touched

None of the canonical TB-001 through TB-009 boundaries are directly exercised by these files. However, the changes sit adjacent to TB-003 (provider operator vs. process) because they govern how the darkbloom process is installed, started, stopped, and inspected by the local operator.

Threat coverage

No existing T-xxx threat directly covers LaunchAgent lifecycle or daemon state files. The changes are therefore ℹ️ neutral with respect to all current threats — they do not strengthen or weaken any documented mitigation.

New attack surface not covered by the threat model

Two areas warrant a future threat-model entry:

LaunchAgent.swift — plist installation path and permissions
If the LaunchAgent plist is written to a user-writable location (e.g. ~/Library/LaunchAgents/) with weak file permissions, a local attacker (or a compromised process running as the same user) could modify the ProgramArguments array to substitute an arbitrary binary that will be launched under the same credentials. This is a privilege-persistence / binary-substitution vector that is adjacent to T-013 but not covered by it. The threat model should add an entry covering LaunchAgent.swift and StartCommand.swift in the context of TB-003.
DaemonStateFile.swift — state file integrity
If the daemon state file (recording whether the provider is running, its PID, or trust state) is world-readable or world-writable, an operator or co-process could forge state, causing StatusCommand and StopCommand to act on stale or attacker-supplied data (e.g. a crafted PID causes a SIGTERM to be sent to an arbitrary process). Integrity of this file is not enforced by any current mitigation.
DoctorCommand.swift / DoctorRunner.swift — diagnostic output disclosure
Doctor/diagnostic runners commonly collect and print system state (SIP status, binary path, entitlements, keychain query results). If any of this output is written to a world-readable file or sent over an unauthenticated channel, it could aid an adversary in fingerprinting the security posture of a target provider. This is low severity but should be confirmed not to log sensitive fields (SE public key, keychain access-group identifiers, binary hash) to a path readable outside the process.

Recommended threat model update

Add a new threat entry (suggested: T-043, TB-003, STRIDE: Tampering/Elevation-of-Privilege) covering:

provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift
provider-swift/Sources/ProviderCore/Service/DaemonStateFile.swift
provider-swift/Sources/darkbloom/StartCommand.swift
provider-swift/Sources/darkbloom/StopCommand.swift
provider-swift/Sources/darkbloom/Diagnostics/DoctorRunner.swift

No SEC-* open findings are resolved by this PR.

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

…ith a stale/unwritable state file isn't reported as down

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-90 — Synchronous process execution in runningPID() method without timeout or cancellation
- Suggestion: Add timeout to process execution or make this method async to avoid blocking callers indefinitely if launchctl hangs

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (1 blocking)

🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-90 — Custom launchctl output parsing reinvents process management
- Suggestion: Consider using Foundation's Process.processIdentifier or NSRunningApplication APIs instead of parsing launchctl output
🔵 [INFO] provider-swift/Sources/darkbloom/StatusCommand.swift:56-87 — Complex daemon status logic with multiple cross-checks
- Suggestion: Extract daemon status determination into a single function that returns an enum (Running/NotRunning/Stale) to simplify the branching logic
🔵 [INFO] provider-swift/Sources/darkbloom/Diagnostics/DoctorRunner.swift:16-23 — Daemon status checking duplicated across DoctorRunner and StatusCommand
- Suggestion: Create a shared DaemonStatus utility that both DoctorRunner and StatusCommand can use

4 finding(s) total, 2 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

ethenotethan · 2026-06-10T03:20:34Z

+    public static func runningPID() -> Int32? {
+        let target = "gui/\(getuid())/\(label)"
+        let process = Process()
+        process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
+        process.arguments = ["print", target]
+        let outPipe = Pipe()
+        process.standardOutput = outPipe
+        process.standardError = FileHandle.nullDevice
+
+        do {
+            try process.run()
+        } catch {
+            return nil
+        }
+        let data = outPipe.fileHandleForReading.readDataToEndOfFile()
+        process.waitUntilExit()
+        guard process.terminationStatus == 0,
+              let output = String(data: data, encoding: .utf8) else { return nil }
+
+        // launchctl print emits a line shaped `pid = 12345` for a running service.
+        for line in output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            guard trimmed.hasPrefix("pid = ") else { continue }
+            return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
+        }
+        return nil
+    }


🟡 [MEDIUM] ⚡ Synchronous process execution in runningPID() method without timeout or cancellation

💡 Suggestion: Add timeout to process execution or make this method async to avoid blocking callers indefinitely if launchctl hangs

📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path

Done in 07880ad — runningPID(timeout: 2.0) now waits on a termination-handler semaphore and terminates launchctl + returns nil on timeout. Kept it synchronous since both call sites (status, doctor) are one-shot CLI commands, not a hot path.

ethenotethan · 2026-06-10T03:20:34Z

+    public static func runningPID() -> Int32? {
+        let target = "gui/\(getuid())/\(label)"
+        let process = Process()
+        process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
+        process.arguments = ["print", target]
+        let outPipe = Pipe()
+        process.standardOutput = outPipe
+        process.standardError = FileHandle.nullDevice
+
+        do {
+            try process.run()
+        } catch {
+            return nil
+        }
+        let data = outPipe.fileHandleForReading.readDataToEndOfFile()
+        process.waitUntilExit()
+        guard process.terminationStatus == 0,
+              let output = String(data: data, encoding: .utf8) else { return nil }
+
+        // launchctl print emits a line shaped `pid = 12345` for a running service.
+        for line in output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            guard trimmed.hasPrefix("pid = ") else { continue }
+            return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
+        }
+        return nil
+    }


🟡 [MEDIUM] 🧩 Custom launchctl output parsing reinvents process management

💡 Suggestion: Consider using Foundation's Process.processIdentifier or NSRunningApplication APIs instead of parsing launchctl output

📊 Score: 3×4 = 12 · Category: reinventing stdlib

There's no Foundation API for this: Process.processIdentifier only works for processes we spawned, and NSRunningApplication covers GUI apps, not launchd daemons. launchctl print is the supported way to query a launchd service's PID (no public C API for launchctl print's info), and the parsing is a single stable pid = N line — same pattern as the existing isLoaded() which also shells out to launchctl.

ethenotethan · 2026-06-10T03:20:34Z

    /// trust reason — the answer to "am I earning, and if not, why?".
    private func printDaemonStatus() {
        let now = Date().timeIntervalSince1970
-        guard let state = DaemonStateFile.read() else {
-            print("Daemon: not running (run `darkbloom start`)")
+        let state = DaemonStateFile.read()
+        let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false
+
+        // The state file alone can't answer "is the daemon running?": it can be
+        // left over from a previous session, or the daemon may be unable to
+        // write it at all (e.g. ~/.darkbloom permissions). launchd is the
+        // source of truth for the process it manages, so cross-check it before
+        // declaring the daemon down.
+        let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID()
+
+        guard let pid = livePID else {
+            if let state {
+                // Leftover state file from a previous daemon session (stop,
+                // crash, or reboot). Point at the fix, not the artifact.
+                print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)")
+            } else {
+                print("Daemon: not running (run `darkbloom start`)")
+            }
            return
        }
-        let alive = daemonProcessAlive(pid: state.pid)
-        if !alive {
-            print("Daemon: not running (stale state file)")
+
+        guard let state, state.pid == pid else {
+            // launchd says the daemon is up, but the state file is missing or
+            // belongs to an older process — live diagnostics are unavailable.
+            print("Daemon: running (pid \(pid), managed by launchd)")
+            print("  → no live diagnostics: the daemon's state file is missing or from an older session.")
+            print("  → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.")
            return
        }


🔵 [INFO] 🧩 Complex daemon status logic with multiple cross-checks

💡 Suggestion: Extract daemon status determination into a single function that returns an enum (Running/NotRunning/Stale) to simplify the branching logic

📊 Score: 2×3 = 6 · Category: over-abstraction

Acknowledged — leaving as-is for this fix to keep the diff small: the three branches map 1:1 to the three user-visible outcomes (not running / running without diagnostics / running with diagnostics). Happy to extract a DaemonStatus enum in a follow-up if more call sites appear.

ethenotethan · 2026-06-10T03:20:34Z

+        // Cross-check launchd: the state file can be stale (or unwritable by
+        // the daemon) while the launchd-managed daemon is actually up.
+        let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false
+        let daemonUp = stateAlive || LaunchAgent.runningPID() != nil
+        // "Fresh" = the state file's own writer process is alive AND the
+        // snapshot isn't stale, so its live fields (trust level, current model,
+        // capacity) are trustworthy. A daemon that's up per launchd but isn't
+        // the state file's writer doesn't make the file's data fresh.


🔵 [INFO] 🧩 Daemon status checking duplicated across DoctorRunner and StatusCommand

💡 Suggestion: Create a shared DaemonStatus utility that both DoctorRunner and StatusCommand can use

📊 Score: 2×3 = 6 · Category: duplicate logic

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Synchronous process execution with 2s timeout in runningPID() can block callers
- Suggestion: Make this async or use a background queue, especially since it's called from status commands that should be responsive

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (1 blocking)

🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Custom launchctl process execution duplicates Foundation's Process pattern
- Suggestion: Extract common process execution pattern to shared helper or use existing utility if available
🔵 [INFO] provider-swift/Sources/darkbloom/StatusCommand.swift:59-87 — Daemon status checking logic duplicated across StatusCommand and DoctorCommand
- Suggestion: Extract daemon status determination to shared helper function in DaemonStateFile or create DaemonStatus utility
🔵 [INFO] provider-swift/Sources/darkbloom/DoctorCommand.swift:41-44 — Same daemon running check pattern as StatusCommand with slight variations
- Suggestion: Use shared daemon status helper to avoid inconsistencies between commands

4 finding(s) total, 2 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

ethenotethan · 2026-06-10T03:22:15Z

+    public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
+        let target = "gui/\(getuid())/\(label)"
+        let process = Process()
+        process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
+        process.arguments = ["print", target]
+        let outPipe = Pipe()
+        process.standardOutput = outPipe
+        process.standardError = FileHandle.nullDevice
+
+        let exited = DispatchSemaphore(value: 0)
+        process.terminationHandler = { _ in exited.signal() }
+        do {
+            try process.run()
+        } catch {
+            return nil
+        }
+        if exited.wait(timeout: .now() + timeout) == .timedOut {
+            process.terminate()
+            return nil
+        }
+        let data = outPipe.fileHandleForReading.readDataToEndOfFile()
+        guard process.terminationStatus == 0,
+              let output = String(data: data, encoding: .utf8) else { return nil }
+
+        // launchctl print emits a line shaped `pid = 12345` for a running service.
+        for line in output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            guard trimmed.hasPrefix("pid = ") else { continue }
+            return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
+        }
+        return nil
+    }


🟡 [MEDIUM] ⚡ Synchronous process execution with 2s timeout in runningPID() can block callers

💡 Suggestion: Make this async or use a background queue, especially since it's called from status commands that should be responsive

📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path

Keeping it synchronous intentionally: the only callers are one-shot CLI commands (status, doctor) where the user is waiting on the result anyway — there's no event loop or concurrent work to block. The 2s bound caps worst-case latency; making it async would add complexity without changing perceived responsiveness.

ethenotethan · 2026-06-10T03:22:15Z

+    public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
+        let target = "gui/\(getuid())/\(label)"
+        let process = Process()
+        process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
+        process.arguments = ["print", target]
+        let outPipe = Pipe()
+        process.standardOutput = outPipe
+        process.standardError = FileHandle.nullDevice
+
+        let exited = DispatchSemaphore(value: 0)
+        process.terminationHandler = { _ in exited.signal() }
+        do {
+            try process.run()
+        } catch {
+            return nil
+        }
+        if exited.wait(timeout: .now() + timeout) == .timedOut {
+            process.terminate()
+            return nil
+        }
+        let data = outPipe.fileHandleForReading.readDataToEndOfFile()
+        guard process.terminationStatus == 0,
+              let output = String(data: data, encoding: .utf8) else { return nil }
+
+        // launchctl print emits a line shaped `pid = 12345` for a running service.
+        for line in output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            guard trimmed.hasPrefix("pid = ") else { continue }
+            return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
+        }
+        return nil
+    }


🟡 [MEDIUM] 🧩 Custom launchctl process execution duplicates Foundation's Process pattern

💡 Suggestion: Extract common process execution pattern to shared helper or use existing utility if available

📊 Score: 3×4 = 12 · Category: reinventing stdlib

ethenotethan · 2026-06-10T03:22:15Z

+        let state = DaemonStateFile.read()
+        let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false
+
+        // The state file alone can't answer "is the daemon running?": it can be
+        // left over from a previous session, or the daemon may be unable to
+        // write it at all (e.g. ~/.darkbloom permissions). launchd is the
+        // source of truth for the process it manages, so cross-check it before
+        // declaring the daemon down.
+        let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID()
+
+        guard let pid = livePID else {
+            if let state {
+                // Leftover state file from a previous daemon session (stop,
+                // crash, or reboot). Point at the fix, not the artifact.
+                print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)")
+            } else {
+                print("Daemon: not running (run `darkbloom start`)")
+            }
            return
        }
-        let alive = daemonProcessAlive(pid: state.pid)
-        if !alive {
-            print("Daemon: not running (stale state file)")
+
+        guard let state, state.pid == pid else {
+            // launchd says the daemon is up, but the state file is missing or
+            // belongs to an older process — live diagnostics are unavailable.
+            print("Daemon: running (pid \(pid), managed by launchd)")
+            print("  → no live diagnostics: the daemon's state file is missing or from an older session.")
+            print("  → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.")
            return
        }


🔵 [INFO] 🧩 Daemon status checking logic duplicated across StatusCommand and DoctorCommand

💡 Suggestion: Extract daemon status determination to shared helper function in DaemonStateFile or create DaemonStatus utility

📊 Score: 2×3 = 6 · Category: duplicate logic

Extracted a shared daemonIsRunning(state:) helper in DaemonStateFile.swift (83ae1a6) — doctor and DoctorRunner now use it. status keeps its own flow since it needs the resolved PID and per-branch messaging, not just a boolean.

ethenotethan · 2026-06-10T03:22:15Z

        let daemonState = DaemonStateFile.read()
        let daemonRunning = daemonState.map { daemonProcessAlive(pid: $0.pid) } ?? false
+            || LaunchAgent.runningPID() != nil
        print("Daemon: \(daemonRunning ? "running" : "NOT running — run `darkbloom start`")")


🔵 [INFO] 🧩 Same daemon running check pattern as StatusCommand with slight variations

💡 Suggestion: Use shared daemon status helper to avoid inconsistencies between commands

📊 Score: 2×3 = 6 · Category: duplicate logic

Done in 83ae1a6 — doctor and DoctorRunner now share daemonIsRunning(state:).

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Synchronous process execution with 2s timeout in runningPID() could block status command
- Suggestion: Consider making this async or reducing timeout for better responsiveness in status checks

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (2 blocking)

🟡 [MEDIUM] provider-swift/Sources/ProviderCore/Service/LaunchAgent.swift:64-95 — Custom process execution and parsing instead of using existing utilities
- Suggestion: Consider using existing Swift process utilities or a simpler approach like checking if the service is loaded rather than parsing launchctl output
🔵 [INFO] provider-swift/Sources/ProviderCore/Service/DaemonStateFile.swift:162-167 — New function daemonIsRunning duplicates logic already present in status checking
- Suggestion: Consider consolidating daemon status logic into a single location rather than adding another layer of abstraction
🟡 [MEDIUM] provider-swift/Sources/darkbloom/StatusCommand.swift:57-87 — Complex nested conditionals and multiple state checks make daemon status logic hard to follow
- Suggestion: Extract daemon status determination into a dedicated type with clear states (Running, Stopped, Stale, etc.) to simplify the branching logic

4 finding(s) total, 3 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

ethenotethan · 2026-06-10T03:23:17Z

+    public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
+        let target = "gui/\(getuid())/\(label)"
+        let process = Process()
+        process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
+        process.arguments = ["print", target]
+        let outPipe = Pipe()
+        process.standardOutput = outPipe
+        process.standardError = FileHandle.nullDevice
+
+        let exited = DispatchSemaphore(value: 0)
+        process.terminationHandler = { _ in exited.signal() }
+        do {
+            try process.run()
+        } catch {
+            return nil
+        }
+        if exited.wait(timeout: .now() + timeout) == .timedOut {
+            process.terminate()
+            return nil
+        }
+        let data = outPipe.fileHandleForReading.readDataToEndOfFile()
+        guard process.terminationStatus == 0,
+              let output = String(data: data, encoding: .utf8) else { return nil }
+
+        // launchctl print emits a line shaped `pid = 12345` for a running service.
+        for line in output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            guard trimmed.hasPrefix("pid = ") else { continue }
+            return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
+        }
+        return nil
+    }


🟡 [MEDIUM] ⚡ Synchronous process execution with 2s timeout in runningPID() could block status command

💡 Suggestion: Consider making this async or reducing timeout for better responsiveness in status checks

📊 Score: 3×4 = 12 · Category: Blocking I/O on hot path

Same as the prior thread: callers are one-shot CLI commands with nothing else to do while waiting, so sync with a 2s bound is the right tradeoff. In practice launchctl print returns in milliseconds; the timeout is only a safety net.

ethenotethan · 2026-06-10T03:23:17Z

+    public static func runningPID(timeout: TimeInterval = 2.0) -> Int32? {
+        let target = "gui/\(getuid())/\(label)"
+        let process = Process()
+        process.executableURL = URL(fileURLWithPath: "/bin/launchctl")
+        process.arguments = ["print", target]
+        let outPipe = Pipe()
+        process.standardOutput = outPipe
+        process.standardError = FileHandle.nullDevice
+
+        let exited = DispatchSemaphore(value: 0)
+        process.terminationHandler = { _ in exited.signal() }
+        do {
+            try process.run()
+        } catch {
+            return nil
+        }
+        if exited.wait(timeout: .now() + timeout) == .timedOut {
+            process.terminate()
+            return nil
+        }
+        let data = outPipe.fileHandleForReading.readDataToEndOfFile()
+        guard process.terminationStatus == 0,
+              let output = String(data: data, encoding: .utf8) else { return nil }
+
+        // launchctl print emits a line shaped `pid = 12345` for a running service.
+        for line in output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            guard trimmed.hasPrefix("pid = ") else { continue }
+            return Int32(trimmed.dropFirst("pid = ".count).trimmingCharacters(in: .whitespaces))
+        }
+        return nil
+    }


🟡 [MEDIUM] 🧩 Custom process execution and parsing instead of using existing utilities

💡 Suggestion: Consider using existing Swift process utilities or a simpler approach like checking if the service is loaded rather than parsing launchctl output

📊 Score: 3×4 = 12 · Category: reinventing stdlib

isLoaded() isn't sufficient — the bug being fixed is exactly the case where the service is loaded but we need the live PID to distinguish "running" from "stale state file". There's no public API for a launchd service's PID, so parsing launchctl print's stable pid = N line is the standard approach, and this mirrors the existing launchctl shelling in this same file.

ethenotethan · 2026-06-10T03:23:17Z

+/// alive, or launchd reports a running provider service. The launchd
+/// cross-check covers a daemon whose state file is stale or unwritable.
+public func daemonIsRunning(state: DaemonState?) -> Bool {
+    if state.map({ daemonProcessAlive(pid: $0.pid) }) ?? false { return true }
+    return LaunchAgent.runningPID() != nil
 }


🔵 [INFO] 🧩 New function daemonIsRunning duplicates logic already present in status checking

💡 Suggestion: Consider consolidating daemon status logic into a single location rather than adding another layer of abstraction

📊 Score: 2×3 = 6 · Category: over-abstraction

This helper is the consolidation — it was extracted at review request (prior thread) so doctor/DoctorRunner share one definition instead of repeating daemonProcessAlive || runningPID() inline. status is the only other caller and needs the PID itself, not a boolean.

ethenotethan · 2026-06-10T03:23:17Z

    private func printDaemonStatus() {
        let now = Date().timeIntervalSince1970
-        guard let state = DaemonStateFile.read() else {
-            print("Daemon: not running (run `darkbloom start`)")
+        let state = DaemonStateFile.read()
+        let stateAlive = state.map { daemonProcessAlive(pid: $0.pid) } ?? false
+
+        // The state file alone can't answer "is the daemon running?": it can be
+        // left over from a previous session, or the daemon may be unable to
+        // write it at all (e.g. ~/.darkbloom permissions). launchd is the
+        // source of truth for the process it manages, so cross-check it before
+        // declaring the daemon down.
+        let livePID: Int32? = stateAlive ? state?.pid : LaunchAgent.runningPID()
+
+        guard let pid = livePID else {
+            if let state {
+                // Leftover state file from a previous daemon session (stop,
+                // crash, or reboot). Point at the fix, not the artifact.
+                print("Daemon: not running (last session ended ~\(formatUptime(state.ageSeconds(now: now))) ago — run `darkbloom start`)")
+            } else {
+                print("Daemon: not running (run `darkbloom start`)")
+            }
            return
        }
-        let alive = daemonProcessAlive(pid: state.pid)
-        if !alive {
-            print("Daemon: not running (stale state file)")
+
+        guard let state, state.pid == pid else {
+            // launchd says the daemon is up, but the state file is missing or
+            // belongs to an older process — live diagnostics are unavailable.
+            print("Daemon: running (pid \(pid), managed by launchd)")
+            print("  → no live diagnostics: the daemon's state file is missing or from an older session.")
+            print("  → if this persists for >1 min, check that ~/.darkbloom is writable, or run `darkbloom stop && darkbloom start`.")
            return
        }


🟡 [MEDIUM] 🧩 Complex nested conditionals and multiple state checks make daemon status logic hard to follow

💡 Suggestion: Extract daemon status determination into a dedicated type with clear states (Running, Stopped, Stale, etc.) to simplify the branching logic

📊 Score: 3×4 = 12 · Category: additive complexity

The flow is two early-return guards mapping 1:1 to the three user-visible outcomes (not running / running without live diagnostics / running with diagnostics) — a DaemonStatus enum would just move the same branches behind a type while this remains the only consumer. Will extract one if a second consumer of the full status appears.

provider: clean up daemon state file on stop/exit; actionable status …

1883142

…message

devin-ai-integration Bot requested a deployment to benchmarks June 10, 2026 03:02 Waiting

vercel Bot deployed to Preview – d-inference-console-ui-dev June 10, 2026 03:02 View deployment

ethenotethan reviewed Jun 10, 2026

View reviewed changes

vercel Bot deployed to Preview – d-inference June 10, 2026 03:03 View deployment

provider: status/doctor cross-check launchd PID so a running daemon w…

d189acc

…ith a stale/unwritable state file isn't reported as down

devin-ai-integration Bot requested a deployment to benchmarks June 10, 2026 03:20 Waiting

ethenotethan requested changes Jun 10, 2026

View reviewed changes

vercel Bot deployed to Preview – d-inference-landing June 10, 2026 03:20 View deployment

vercel Bot deployed to Preview – d-inference June 10, 2026 03:21 View deployment

vercel Bot deployed to Preview – d-inference-console-ui-dev June 10, 2026 03:21 View deployment

provider: bound launchctl query in runningPID with a timeout

07880ad

devin-ai-integration Bot requested a deployment to benchmarks June 10, 2026 03:22 Waiting

ethenotethan requested changes Jun 10, 2026

View reviewed changes

vercel Bot deployed to Preview – d-inference-landing June 10, 2026 03:22 View deployment

vercel Bot deployed to Preview – d-inference June 10, 2026 03:22 View deployment

provider: share daemonIsRunning helper between status/doctor

83ae1a6

vercel Bot deployed to Preview – d-inference-console-ui-dev June 10, 2026 03:23 View deployment

devin-ai-integration Bot requested a deployment to benchmarks June 10, 2026 03:23 Waiting

ethenotethan requested changes Jun 10, 2026

View reviewed changes

vercel Bot deployed to Preview – d-inference-landing June 10, 2026 03:23 View deployment

vercel Bot deployed to Preview – d-inference-console-ui-dev June 10, 2026 03:23 View deployment

vercel Bot deployed to Preview – d-inference June 10, 2026 03:23 View deployment

devin-ai-integration Bot changed the title ~~provider: fix confusing "stale state file" status after daemon stop/crash/reboot~~ provider: fix "stale state file" status when daemon is stopped — or actually running Jun 10, 2026

Conversation

devin-ai-integration Bot commented Jun 10, 2026 • edited by blacksmith-sh Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Linked issue

Test plan

Components touched

Protocol / interface changes

Notes for reviewers

Uh oh!

devin-ai-integration Bot commented Jun 10, 2026

🤖 Devin AI Engineer

Uh oh!

vercel Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ethenotethan left a comment

Choose a reason for hiding this comment

Automated Code Review — Layr-Labs/d-inference#

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Trust boundaries touched

Threat coverage

New attack surface not covered by the threat model

Recommended threat model update

Uh oh!

ethenotethan left a comment

Choose a reason for hiding this comment

Automated Code Review — Layr-Labs/d-inference#

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (1 blocking)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ethenotethan left a comment

Choose a reason for hiding this comment

Automated Code Review — Layr-Labs/d-inference#

Security — ✅ No issues found

Performance — 1 finding(s) (1 blocking)

Type_diligence — ✅ No issues found

Additive_complexity — 3 finding(s) (1 blocking)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 10, 2026

Choose a reason for hiding this comment

devin-ai-integration Bot commented Jun 10, 2026 •

edited by blacksmith-sh Bot

Loading

vercel Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading