Add automated health monitoring via Claude Code cron job #21

@dennisonbertram

Overview

Set up a periodic health check that uses Claude Code to SSH into the server, evaluate system state, and fire alerts if anything is wrong.

What to check

  • GET /v1/system/health → must return {"status":"ok"}
  • docker ps → Traefik, registry, and all deployed services must be running
  • Disk usage → alert if > 80%
  • ah.service systemd status → must be active
  • HTTPS endpoint → curl -sI https://agentic.hosting/ must return HTTP 200
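The checks above could also be written as small shell functions that each print an `ALERT:` line on failure and nothing on success. This is only a sketch under stated assumptions: the function names are made up for illustration, the container names are taken from the prompt in the implementation sketch, and the host serving `/v1/system/health` is not given in this issue, so the functions take already-collected values as arguments.

```shell
#!/bin/sh
# Hypothetical helpers: each prints "ALERT: ..." on failure and nothing on success.

# $1 = response body from GET /v1/system/health
check_health_body() {
    body=$(printf '%s' "$1" | tr -d ' \t\n\r')   # ignore JSON whitespace
    case "$body" in
        *'"status":"ok"'*) ;;
        *) echo "ALERT: /v1/system/health returned: $1" ;;
    esac
}

# $1 = output of: docker ps --format '{{.Names}}'   remaining args = required names
check_containers() {
    running=$1; shift
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qxF -- "$name" \
            || echo "ALERT: container $name is not running"
    done
}

# $1 = disk usage as an integer percentage, $2 = threshold (defaults to 80)
check_disk_pct() {
    limit=${2:-80}
    if [ "$1" -gt "$limit" ]; then
        echo "ALERT: disk usage ${1}% exceeds ${limit}%"
    fi
}

# $1 = output of: systemctl is-active ah.service
check_service_state() {
    [ "$1" = "active" ] || echo "ALERT: ah.service is $1"
}

# $1 = HTTP status code for https://agentic.hosting/
check_http_code() {
    [ "$1" = "200" ] || echo "ALERT: https://agentic.hosting/ returned HTTP $1"
}

# Possible wiring on the server (not executed here):
#   check_containers "$(docker ps --format '{{.Names}}')" traefik registry website
#   check_disk_pct "$(df -P / | awk 'NR==2 {sub("%","",$5); print $5}')"
#   check_service_state "$(systemctl is-active ah.service)"
#   check_http_code "$(curl -s -o /dev/null -w '%{http_code}' https://agentic.hosting/)"
```

Keeping the pass/fail logic in plain functions like these would make the thresholds testable without a server, whether the values are collected by Claude Code over SSH or by a script.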

Implementation sketch

# /usr/local/bin/ah-health-check.sh (runs via cron on the local machine or on the server)
alerts=$(claude -p "You are a health monitor for the agentic-hosting server. SSH in and check: (1) /v1/system/health returns ok, (2) docker ps shows traefik+registry+website running, (3) disk usage < 80%, (4) ah.service is active, (5) https://agentic.hosting/ returns 200. Output one line 'ALERT: <details>' per failure, or OK if all pass." --output-format text | grep "^ALERT")
# Send mail only when at least one ALERT line was produced
[ -n "$alerts" ] && printf '%s\n' "$alerts" | mail -s "agentic-hosting alert" ops@agentic.hosting

# crontab entry (every 15 minutes)
*/15 * * * * /usr/local/bin/ah-health-check.sh
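
One gap in a pipeline like the one above: if `claude` itself fails (network error, expired credentials), its output has no `ALERT` line, which looks the same as a healthy run. A possible guard, sketched here with a hypothetical `classify_report` helper, treats any output that has neither an `ALERT` line nor a bare `OK` line as an alert about the monitor itself. It assumes the model prints exactly `OK` on its own line when everything passes.

```shell
#!/bin/sh
# classify_report: hypothetical helper. $1 = full text printed by the monitor run.
# Prints the lines that should be sent as alerts; prints nothing for a clean run.
classify_report() {
    alerts=$(printf '%s\n' "$1" | grep '^ALERT' || true)
    if [ -n "$alerts" ]; then
        printf '%s\n' "$alerts"
    elif printf '%s\n' "$1" | grep -qx 'OK'; then
        :   # explicit all-clear, nothing to send
    else
        echo "ALERT: health monitor produced no verdict"
    fi
}

# Possible use inside ah-health-check.sh (not executed here):
#   report=$(claude -p "..." --output-format text 2>&1)
#   to_send=$(classify_report "$report")
#   [ -n "$to_send" ] && printf '%s\n' "$to_send" | mail -s "agentic-hosting alert" ops@agentic.hosting
```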

Alert channels to support

  • Email
  • Slack webhook
  • SMS (Twilio)
  • Custom webhook
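
A dispatcher for these channels might look roughly like the sketch below. Everything named here is a placeholder for illustration: `send_alert`, `SLACK_WEBHOOK_URL`, `ALERT_WEBHOOK_URL`, and the `TWILIO_*` variables do not come from this issue. The Slack case posts a `{"text": ...}` body to an incoming webhook, the SMS case uses Twilio's Messages endpoint with form fields `To`, `From`, and `Body`, and the custom webhook is assumed to accept the same JSON shape as Slack.

```shell
#!/bin/sh
# Escape backslashes and double quotes so a single-line message is safe inside JSON.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the JSON body for a Slack incoming webhook.
slack_payload() {
    printf '{"text":"%s"}' "$(json_escape "$1")"
}

# send_alert CHANNEL MESSAGE (hypothetical dispatcher; returns 1 for unknown channels)
send_alert() {
    channel=$1
    message=$2
    case "$channel" in
        email)
            printf '%s\n' "$message" | mail -s "agentic-hosting alert" ops@agentic.hosting ;;
        slack)
            curl -fsS -X POST -H 'Content-Type: application/json' \
                 -d "$(slack_payload "$message")" "$SLACK_WEBHOOK_URL" ;;
        sms)
            curl -fsS -u "$TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN" \
                 --data-urlencode "To=$TWILIO_TO" \
                 --data-urlencode "From=$TWILIO_FROM" \
                 --data-urlencode "Body=$message" \
                 "https://api.twilio.com/2010-04-01/Accounts/$TWILIO_ACCOUNT_SID/Messages.json" ;;
        webhook)
            # Assumption: the custom endpoint accepts the same JSON shape as Slack.
            curl -fsS -X POST -H 'Content-Type: application/json' \
                 -d "$(slack_payload "$message")" "$ALERT_WEBHOOK_URL" ;;
        *)
            echo "unknown alert channel: $channel" >&2
            return 1 ;;
    esac
}
```

For example, `send_alert slack "ALERT: disk usage 91% exceeds 80%"` would post one message to the configured webhook. Multi-line messages would need newline escaping, which `json_escape` does not handle.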

Labels

enhancement, operations, monitoring
