Summary
Diagnosing a stuck install today required manual spelunking through lsof, ps, launchctl, and logs across several subsystems. A single hyp doctor command should aggregate these checks and print a concrete remedy for each problem it finds.
Suggested checks
- Port ownership for the gateway listen address: is it free, held by this daemon, or held by a foreign/second process? Report the owning PID and command.
- Daemon liveness vs crash-looping: detect a high respawn rate (many short-lived PIDs) and call it out.
- Config slot / seed consistency: active etag vs bad_etag, a seed token shadowed by an active slot (see related join issue), missing or extra slots.
- Central identity state: bootstrapped, token present but unused, or server unreachable.
- Sink materialization: which sinks failed and why, with an actionable hint for each.
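A minimal sketch of the port-ownership check, in Python purely for illustration. It assumes the daemon's own PID is already known (for example from run/status.json, whose schema is not described here) and shells out to lsof to find the listener. The helper names (port_is_free, find_listener, classify_port) and the state labels are hypothetical, not part of hyp.

```python
import socket
import subprocess
from dataclasses import dataclass
from typing import Optional, Tuple

Listener = Tuple[int, str]  # (pid, command name) of the process holding the port


@dataclass
class PortStatus:
    state: str  # hypothetical labels: "free", "ours", "foreign", "held-unknown"
    pid: Optional[int] = None
    command: Optional[str] = None


def port_is_free(host: str, port: int) -> bool:
    """A bind attempt succeeds only if nothing else holds host:port.

    SO_REUSEADDR is deliberately not set, so a lingering TIME_WAIT entry can
    also make this return False; the listener lookup disambiguates that case.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def parse_lsof_fields(output: str) -> Optional[Listener]:
    """Parse `lsof -F` output: 'p<pid>' and 'c<command>' lines; other fields are ignored."""
    pid, cmd = None, None
    for line in output.splitlines():
        if line.startswith("p") and pid is None:
            pid = int(line[1:])
        elif line.startswith("c") and cmd is None:
            cmd = line[1:]
    return (pid, cmd or "?") if pid is not None else None


def find_listener(port: int) -> Optional[Listener]:
    """Ask lsof which process is listening on the TCP port (first match only)."""
    try:
        out = subprocess.run(
            ["lsof", "-nP", f"-iTCP:{port}", "-sTCP:LISTEN", "-Fpc"],
            capture_output=True, text=True, check=False,  # lsof exits 1 when nothing matches
        ).stdout
    except FileNotFoundError:  # lsof is not installed
        return None
    return parse_lsof_fields(out)


def classify_port(host: str, port: int, daemon_pid: Optional[int],
                  listener: Optional[Listener]) -> PortStatus:
    """Combine the bind probe with the listener lookup into one verdict."""
    if port_is_free(host, port):
        return PortStatus("free")
    if listener is None:
        return PortStatus("held-unknown")
    pid, cmd = listener
    return PortStatus("ours" if pid == daemon_pid else "foreign", pid, cmd)
```

In a real run the call would look like classify_port(host, port, daemon_pid, find_listener(port)); a "foreign" result carries the PID and command the bullet asks to report.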
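One way to turn "many short-lived PIDs" into a concrete signal, assuming the doctor can obtain a history of daemon PIDs with start and exit times (how that history is collected, from launchd or a log, is left open). PidSample, detect_crash_loop, and the window, lifetime, and restart thresholds are hypothetical and illustrative, not tuned values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PidSample:
    pid: int
    started_at: float           # seconds since epoch
    exited_at: Optional[float]  # None while the process is still alive


def detect_crash_loop(samples: List[PidSample], now: float,
                      window_s: float = 300.0, short_lived_s: float = 10.0,
                      max_restarts: int = 3) -> Optional[str]:
    """Return a warning if too many short-lived daemon PIDs started in the window."""
    # Only consider processes that started within the recent window.
    recent = [s for s in samples if now - s.started_at <= window_s]
    # A process counts as short-lived if it has exited and ran briefly.
    short = [s for s in recent
             if s.exited_at is not None and s.exited_at - s.started_at < short_lived_s]
    if len(short) > max_restarts:
        return (f"daemon started {len(recent)} times in the last {int(window_s)}s; "
                f"{len(short)} exited within {int(short_lived_s)}s, "
                "which looks like a crash loop")
    return None
```

Counting exits rather than just PIDs keeps a single long-running daemon, or a deliberate restart, from being flagged.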
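The slot checks could be expressed as plain comparisons over a snapshot of config-control state. The SlotState fields below are a hypothetical model of that state, and treating "active etag equals bad_etag" as the failure condition is an interpretation of the bullet above, not a confirmed rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SlotState:
    # Hypothetical snapshot of config-control state; field names are assumptions.
    slots: Set[str]
    active_slot: Optional[str]
    active_etag: Optional[str]
    bad_etag: Optional[str]
    seed_token: Optional[str]


def check_slots(state: SlotState, expected: Set[str]) -> List[str]:
    """Return one human-readable problem per inconsistency (empty list means pass)."""
    problems: List[str] = []
    # The config currently in use was previously recorded as bad.
    if state.active_etag is not None and state.active_etag == state.bad_etag:
        problems.append(f"active etag {state.active_etag} is recorded as bad_etag")
    # A fresh seed/join token is ignored because an active slot takes precedence.
    if state.seed_token and state.active_slot is not None:
        problems.append(f"seed token is shadowed by active slot '{state.active_slot}'")
    for name in sorted(expected - state.slots):
        problems.append(f"missing slot '{name}'")
    for name in sorted(state.slots - expected):
        problems.append(f"extra slot '{name}'")
    return problems
```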
Proposed
- A new hyp doctor command that reuses run/status.json, the config-control state, and a live port probe, then prints a checklist with pass/fail per check and a remedy for each failure.
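A minimal sketch of the checklist output, assuming each check is a function returning a pass/fail flag, a detail string, and an optional remedy. CheckResult, run_checks, render, and main are hypothetical names; a check that raises is reported as a failure instead of aborting the whole run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""
    remedy: Optional[str] = None  # printed only for failures


def run_checks(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check; an exception becomes a FAIL entry rather than a crash."""
    results: List[CheckResult] = []
    for check in checks:
        try:
            results.append(check())
        except Exception as exc:
            results.append(CheckResult(getattr(check, "__name__", "check"),
                                       False, f"check raised {exc!r}"))
    return results


def render(results: List[CheckResult]) -> str:
    """Format results as a checklist with an indented remedy under each failure."""
    lines: List[str] = []
    for r in results:
        line = f"[{'PASS' if r.passed else 'FAIL'}] {r.name}"
        if r.detail:
            line += f": {r.detail}"
        lines.append(line)
        if not r.passed and r.remedy:
            lines.append(f"       remedy: {r.remedy}")
    return "\n".join(lines)


def main(checks: List[Callable[[], CheckResult]]) -> int:
    """Print the checklist; exit status is 0 only if every check passed."""
    results = run_checks(checks)
    print(render(results))
    return 0 if all(r.passed for r in results) else 1
```

Returning a non-zero status when any check fails would let scripts gate on hyp doctor; whether the real command should behave that way is a design choice the issue leaves open.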
Context
This came out of a debugging session where a foreign daemon held the gateway port, the daemon crash-looped invisibly, and a stale config slot shadowed a fresh join token. Most of the manual investigation should have been one command.