status: distinguish API down from unreachable by jarugupj · Pull Request #170 · kernel/cli

jarugupj · 2026-05-28T17:18:16Z

Summary

Splits the failure handling for kernel status so the message accurately reflects what happened, instead of collapsing both transport errors and non-2xx responses into one ambiguous "Could not reach Kernel API" line.

Three failure cases now:

Transport error on /status (network down, DNS, timeout) → Could not reach Kernel API. Check https://status.kernel.sh for updates.
Non-2xx from /status, /health also unhealthy → Kernel API is down. Check https://status.kernel.sh for updates.
Non-2xx from /status, /health returns 200 → Kernel API is responding but /status is unavailable. Check https://status.kernel.sh for updates.

The /health probe matters because /status has an upstream dependency on incident.io, so a 5xx from /status doesn't necessarily mean the API itself is down. /health is a dependency-free liveness check on the same server, so a 200 there confirms the API is up and only the status endpoint is broken.

No new helpers, no synthetic UI rendering — reserves the dotted status UI for real data from the API (same convention the dashboard's status indicator follows).

Test plan

Healthy run against prod still renders the status UI as before
Transport error (unreachable host) prints "Could not reach Kernel API…"
/status 5xx + /health 5xx prints "Kernel API is down…"
/status 5xx + /health 200 prints "Kernel API is responding but /status is unavailable…"

🤖 Generated with Claude Code

Note

Low Risk
CLI-only error messaging and an extra HTTP probe on failure paths; no auth, data, or API contract changes.

Overview
kernel status now prints different errors depending on whether the CLI cannot talk to the API at all, the API is down, or only /status is failing.

When /status returns a non-2xx, the command probes /health on the same base URL. A healthy /health yields Kernel API is responding but /status is unavailable; otherwise Kernel API is down. Transport failures on the initial /status request still use Could not reach Kernel API.

This separates incident.io-related /status outages from true API unavailability without changing the successful status UI or JSON output path.

^{Reviewed by Cursor Bugbot for commit ad44904. Bugbot is set up for automated code reviews on this repo. Configure here.}

When /status returns a non-2xx response the API is reachable but unhealthy, so report it as down rather than reusing the generic 'could not reach' message used for transport errors.

…broken Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

firetiger-agent · 2026-05-29T16:03:30Z

Monitoring Plan: Improve `kernel status` error message differentiation

What this PR does: Gives the kernel status command a more informative error message when the API's status page is unreachable — distinguishing a /status endpoint outage from a full API outage.

Intended effect:

CLI output correctness: No telemetry baseline exists (CLI has no OTel instrumentation). Confirmed by running kernel status post-deploy under normal conditions — output should be identical to pre-deploy (status table displayed). The new messages only appear during failure scenarios.

Risks:

Extended timeout during full outage — when the API is completely unreachable, the CLI now makes two sequential HTTP calls (each up to 10s timeout), potentially doubling the wait time before showing an error. No signal to alert on; first user invocation during an outage surfaces this.
/health endpoint stability — if https://api.onkernel.com/health is not a stable always-200 endpoint, the fallback message could mislead users. Verify /health reliability before release; alert if any non-2xx is observed from this endpoint during normal operation.
API 5xx regression (unrelated path) — overall API error rate baseline is 0.08–0.79% 5xx; alert if sustained above 2% for 2+ hours post-deploy (pre-existing infra noise pattern, not expected to be PR-related).

Status updates will be posted automatically on this PR as monitoring progresses.

View monitor

distinguish API down from unreachable in status command

42bdba0

When /status returns a non-2xx response the API is reachable but unhealthy, so report it as down rather than reusing the generic 'could not reach' message used for transport errors.

jarugupj force-pushed the hypeship/status-distinguish-api-down branch from 5c1365d to cfc2591 Compare May 29, 2026 14:10

fall back to /health on non-2xx to distinguish API down from /status …

ad44904

…broken Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jarugupj force-pushed the hypeship/status-distinguish-api-down branch from cfc2591 to ad44904 Compare May 29, 2026 15:54

jarugupj marked this pull request as ready for review May 29, 2026 15:57

jarugupj requested a review from masnwilliams May 29, 2026 16:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

status: distinguish API down from unreachable#170

status: distinguish API down from unreachable#170
jarugupj wants to merge 2 commits into
mainfrom
hypeship/status-distinguish-api-down

jarugupj commented May 28, 2026 •

edited by cursor Bot

Loading

Uh oh!

firetiger-agent Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jarugupj commented May 28, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

firetiger-agent Bot commented May 29, 2026

Monitoring Plan: Improve kernel status error message differentiation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jarugupj commented May 28, 2026 •

edited by cursor Bot

Loading

Monitoring Plan: Improve `kernel status` error message differentiation