Kubernetes-native error tracking for Go services.
A single binary that catches panics, pod crashes, OOMKills, and CrashLoopBackOff — with zero external dependencies.
Every error tracker catches panics inside your code. None of them watch what happens outside it.
When a pod gets OOMKilled, hits CrashLoopBackOff, or fails init — your Sentry, your GlitchTip, your Bugsink see nothing. These crashes happen at the Kubernetes level, below any SDK's reach. You find out from kubectl get pods ten minutes later, or from a PagerDuty alert that just says "pod restarted."
crashctl fixes this.
Two tracks, one binary:
| Track | What It Catches | How |
|---|---|---|
| Application Errors | Panics, errors, stack traces from your Go services | Lightweight Go SDK (sdk.CaptureError) |
| Kubernetes Crashes | OOMKill, CrashLoopBackOff, failed init containers, evictions | Watches the K8s API directly via SharedInformers |
When a pod crashes due to a panic that the SDK already captured, crashctl automatically links them — so you see the Go stack trace next to the OOMKill memory stats in one view.
| Metric | Value |
|---|---|
| Binary size | ~20 MB |
| Memory at idle | ~50 MB |
| External dependencies | Zero (embedded BadgerDB) |
| Time to deploy on K3s | 30 seconds |
| Events per second | 1,000+ per project |
| Docker image size | < 30 MB |
```shell
# Install
go install github.com/syst3mctl/crashctl/cmd/crashctl@latest

# Create a project and get a DSN
crashctl project create --name my-app
# → DSN: http://localhost:9090/api/v1/events?key=a1b2c3d4...

# Start the server
crashctl serve
```

Or run it with Docker:

```shell
docker run -d \
  --name crashctl \
  -p 9090:9090 \
  -v crashctl-data:/data/crashctl \
  ghcr.io/syst3mctl/crashctl:latest
```

Or install it on Kubernetes with Helm:

```shell
helm repo add syst3mctl https://charts.syst3mctl.dev
helm install crashctl syst3mctl/crashctl \
  --namespace monitoring \
  --create-namespace
```

Then open http://localhost:9090 to see the dashboard.
Install the SDK in your Go service:
```shell
go get github.com/syst3mctl/crashctl/sdk
```

```go
package main

import (
	"time"

	"github.com/syst3mctl/crashctl/sdk"
)

func main() {
	sdk.Init(sdk.Config{
		DSN:     "http://crashctl:9090/api/v1/events?key=YOUR_DSN_KEY",
		Service: "api-gateway",
		Version: "1.2.3",
	})
	// Send any buffered events before main returns.
	defer sdk.Flush(5 * time.Second)

	// Your application code...
}
```

Capture handled errors explicitly, with optional tags, user, and level:

```go
if err := processOrder(ctx, orderID); err != nil {
	sdk.CaptureError(err,
		sdk.WithTag("order_id", orderID),
		sdk.WithUser(userID),
		sdk.WithLevel(sdk.LevelError),
	)
}
```

To capture panics in HTTP handlers, wrap your router with the middleware for your framework:

```go
import "github.com/syst3mctl/crashctl/sdk/middleware"

// net/http
http.ListenAndServe(":8080", middleware.HTTPMiddleware(mux))

// chi (alternative)
r := chi.NewRouter()
r.Use(middleware.ChiMiddleware)

// gin (alternative)
r := gin.Default()
r.Use(middleware.GinMiddleware())
```

Any panic in your HTTP handlers is automatically captured with the full stack trace, error chain, and HTTP request context (method, path, status code, headers).
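The `sdk.WithTag`, `sdk.WithUser`, and `sdk.WithLevel` arguments to `sdk.CaptureError` follow Go's functional-options style. As a minimal sketch of how that pattern works in general (the `Level`, `Event`, `Option`, and `newEvent` names here are hypothetical and not taken from the crashctl SDK), each option is a closure that mutates the event being built:

```go
package main

// Level, Event, Option, and newEvent are hypothetical names for this sketch.
type Level string

const (
	LevelWarning Level = "warning"
	LevelError   Level = "error"
)

// Event holds what a capture call collects before sending.
type Event struct {
	Err   error
	Level Level
	User  string
	Tags  map[string]string
}

// Option mutates an Event; each With* helper returns one.
type Option func(*Event)

// WithTag attaches a key-value pair used for filtering.
func WithTag(key, value string) Option {
	return func(e *Event) { e.Tags[key] = value }
}

// WithUser records which user triggered the error.
func WithUser(id string) Option {
	return func(e *Event) { e.User = id }
}

// WithLevel overrides the default severity.
func WithLevel(l Level) Option {
	return func(e *Event) { e.Level = l }
}

// newEvent sets defaults, then applies options in order (later ones win).
func newEvent(err error, opts ...Option) *Event {
	e := &Event{Err: err, Level: LevelError, Tags: map[string]string{}}
	for _, opt := range opts {
		opt(e)
	}
	return e
}
```

The appeal of this design is that new options can be added later without changing the capture function's signature or breaking existing call sites.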
Every error event includes:
- Full stack trace with file, function, and line number
- Go error chain: unwraps errors wrapped with `%w` (the chains that `errors.Is`/`errors.As` traverse) to show the root cause
- Goroutine-aware traces: captures the goroutine that panicked
- Custom tags: key-value pairs you attach for filtering
- Service metadata: name, version, environment, hostname
When crashctl runs on a Kubernetes cluster (K3s, K8s, EKS, GKE, etc.), it automatically watches the cluster for crashes that happen outside your application code:
| Crash Type | How It's Detected | What You See |
|---|---|---|
| OOMKill | `LastTerminationState.Terminated.Reason == "OOMKilled"` | Memory limit, actual usage, container logs |
| CrashLoopBackOff | `State.Waiting.Reason == "CrashLoopBackOff"` | Exit code, restart count, last 50 log lines |
| Failed Init Container | Init container `Terminated.ExitCode != 0` | Exit code, container logs |
| Pod Eviction | `Phase == Failed`, `Reason == "Evicted"` | Eviction reason, node pressure |
| Restart Threshold | `RestartCount` > configured threshold | Restart history, container status |
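The detection rules in the crash-type table map onto fields of the Kubernetes pod and container status objects. The sketch below applies them to simplified local structs that mirror only the fields it reads; a real watcher would receive `k8s.io/api/core/v1` types from a SharedInformer. The struct names, the `classify` function, its string labels, and the order of checks are all assumptions for illustration, not crashctl's code:

```go
package main

// Simplified stand-ins for the k8s.io/api/core/v1 fields the rules read.
// They are illustrative only; a real watcher would use the corev1 types.
type Terminated struct {
	Reason   string
	ExitCode int32
}

type Waiting struct{ Reason string }

type ContainerState struct {
	Waiting    *Waiting
	Terminated *Terminated
}

type ContainerStatus struct {
	Name                 string
	State                ContainerState
	LastTerminationState ContainerState
	RestartCount         int32
}

type PodStatus struct {
	Phase                 string // e.g. "Running", "Failed"
	Reason                string // e.g. "Evicted"
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// classify returns the crash types found in a pod status. restartThreshold
// plays the role of kubernetes.restart_threshold in crashctl.yaml.
func classify(s PodStatus, restartThreshold int32) []string {
	var found []string
	if s.Phase == "Failed" && s.Reason == "Evicted" {
		found = append(found, "Evicted")
	}
	for _, c := range s.InitContainerStatuses {
		// Checks the current state only; an init container that is being
		// retried may expose its failure in LastTerminationState instead.
		if t := c.State.Terminated; t != nil && t.ExitCode != 0 {
			found = append(found, "FailedInit:"+c.Name)
		}
	}
	for _, c := range s.ContainerStatuses {
		if t := c.LastTerminationState.Terminated; t != nil && t.Reason == "OOMKilled" {
			found = append(found, "OOMKilled:"+c.Name)
		}
		if w := c.State.Waiting; w != nil && w.Reason == "CrashLoopBackOff" {
			found = append(found, "CrashLoopBackOff:"+c.Name)
		}
		if c.RestartCount > restartThreshold {
			found = append(found, "RestartThreshold:"+c.Name)
		}
	}
	return found
}
```

A container whose last termination was `OOMKilled`, which is now in `CrashLoopBackOff` with 6 restarts, trips all three container rules under `classify(status, 5)`, so a single pod can surface as more than one crash type.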
When a pod crashes due to a panic that the SDK already captured, crashctl links them automatically:
```text
┌─────────────────────────────────────────────────┐
│ ErrorGroup: "runtime error: index out of range" │
│ Stack: api-gateway/handlers/order.go:142        │
│ Count: 23 occurrences                           │
│                                                 │
│ 🔗 Linked Kubernetes Crash:                     │
│   OOMKill — api-gateway-7f8b4d (256Mi limit)    │
│   Node: worker-02 — 4 minutes ago               │
└─────────────────────────────────────────────────┘
```
The Helm chart creates the required RBAC permissions automatically. If deploying manually, bind the crashctl ServiceAccount to a ClusterRole with these rules:

```yaml
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events", "namespaces"]
    verbs: ["get", "list", "watch"]
```

crashctl includes a built-in web dashboard at http://localhost:9090. No separate frontend deployment is needed: the UI is embedded in the binary via `go:embed`.
| Page | URL | What It Shows |
|---|---|---|
| Dashboard | `/` | Total errors, active groups, recent pod crashes |
| Error List | `/errors` | All error groups, sortable by count/recency, filterable by level/status |
| Error Detail | `/errors/:id` | Full stack trace, error chain, occurrence timeline, linked K8s crashes |
| Crash List | `/crashes` | All pod crashes, filterable by namespace and crash type |
| Crash Detail | `/crashes/:id` | Container logs, memory stats, exit code, linked error group |
The UI is built with Go `html/template` and htmx: no React, no npm, no JavaScript build step. Sorting, filtering, and pagination are handled by htmx partial page updates. The only JavaScript is htmx.min.js, about 14 KB minified and gzipped.
crashctl exposes a /metrics endpoint in Prometheus exposition format:
| Metric | Type | Description |
|---|---|---|
| `crashctl_events_total` | Counter | Total error events (labels: `project`, `level`, `service`) |
| `crashctl_groups_active` | Gauge | Active error groups (labels: `project`, `status`) |
| `crashctl_pod_crashes_total` | Counter | Kubernetes crashes (labels: `namespace`, `crash_type`) |
| `crashctl_ingestion_duration_seconds` | Histogram | Event processing latency |
| `crashctl_storage_size_bytes` | Gauge | BadgerDB storage usage |
Import the included dashboard JSON from deploy/grafana/ to visualize crashctl metrics alongside your existing K3s monitoring.
Send notifications when new errors appear or pods crash:
```yaml
alerting:
  webhooks:
    - name: team-slack
      url: https://hooks.slack.com/services/XXX
      type: slack
      events: [new_group, pod_crash, regression]
    - name: ops-discord
      url: https://discord.com/api/webhooks/XXX
      type: discord
      events: [pod_crash]
    - name: pagerduty
      url: https://events.pagerduty.com/v2/enqueue
      type: generic
      events: [pod_crash]
```

Supported: Slack (Block Kit), Discord (embeds), Generic HTTP (JSON POST).
crashctl reads configuration from a YAML file, environment variables (CRASHCTL_ prefix), and CLI flags. Priority: flags > env > file > defaults.
```yaml
# crashctl.yaml
server:
  listen: ":9090"                            # HTTP listen address
  base_url: "https://crashctl.example.com"   # Used in webhook links

storage:
  driver: badger            # badger or postgres
  badger:
    path: /data/crashctl    # BadgerDB data directory
  # postgres:
  #   dsn: postgres://user:pass@localhost:5432/crashctl?sslmode=disable

retention:
  max_age: 720h             # 30 days
  cleanup_interval: 1h      # How often to run cleanup

kubernetes:
  enabled: true             # Enable K8s crash detection
  namespaces: []            # Empty = all namespaces
  restart_threshold: 5      # Alert after N restarts

alerting:
  cooldown: 5m              # Min time between alerts for same group
  webhooks:
    - name: team-slack
      url: https://hooks.slack.com/services/XXX
      type: slack
      events: [new_group, pod_crash, regression]
```

Every config key maps to an env var with the `CRASHCTL_` prefix and `_` separators:
```shell
CRASHCTL_SERVER_LISTEN=":9090"
CRASHCTL_STORAGE_DRIVER="badger"
CRASHCTL_STORAGE_BADGER_PATH="/data/crashctl"
CRASHCTL_KUBERNETES_ENABLED="true"
CRASHCTL_RETENTION_MAX_AGE="720h"
```

Inside the single binary, the components fit together like this:

```text
                    ┌───────────────────────────────────────────┐
                    │              crashctl binary              │
                    │                                           │
Go Service ──SDK──► │  Ingestion API ──► Grouping               │
                    │        │              │                   │
                    │        ▼              ▼                   │
K8s API ──Watch───► │  K8s Watcher       BadgerDB               │
                    │        │              │                   │
                    │        ▼              ▼                   │
                    │  Correlation       Web UI (:9090)         │
                    │        │              │                   │
                    │        ▼              ▼                   │
                    │  Alerting          /metrics (Prometheus)  │
                    └───────────────────────────────────────────┘
```
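The diagram shows ingested events passing through a Grouping stage before storage. A common way to group errors, used here as an assumption rather than crashctl's documented algorithm, is to hash the error type, a normalized message, and the top in-app stack frames, so occurrences that differ only in numbers (indexes, lengths, IDs) or line numbers collapse into one group. The `Frame` type, digit normalization, runtime-frame skipping, and three-frame cutoff are illustrative choices:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"regexp"
	"strings"
)

// Frame is a hypothetical stack frame as the SDK might report it.
type Frame struct {
	Function string
	File     string
	Line     int
}

var digits = regexp.MustCompile(`[0-9]+`)

// fingerprint returns a stable group key for an error occurrence.
// Line numbers are left out so small edits above a function don't split groups.
func fingerprint(errType, message string, frames []Frame) string {
	const maxFrames = 3
	h := sha256.New()
	h.Write([]byte(errType))
	h.Write([]byte{0})
	// "index out of range [5] with length 3" and "[7] with length 2" normalize alike.
	h.Write([]byte(digits.ReplaceAllString(message, "N")))
	kept := 0
	for _, f := range frames {
		// Skip Go runtime frames such as runtime.gopanic; they repeat across panics.
		if strings.HasPrefix(f.Function, "runtime.") {
			continue
		}
		if kept == maxFrames {
			break
		}
		h.Write([]byte{0})
		h.Write([]byte(f.Function + "@" + f.File))
		kept++
	}
	return hex.EncodeToString(h.Sum(nil))[:16]
}
```

The trade-off is between splitting one bug into many groups (too much detail in the key) and merging unrelated bugs (too little); message normalization and frame depth are the two main knobs.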
Key Helm chart values you'll want to customize:

```yaml
# values.yaml
replicaCount: 1

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"

persistence:
  enabled: true
  size: 10Gi
  storageClass: ""   # Uses default storage class

serviceMonitor:
  enabled: false     # Set true if using Prometheus Operator

ingress:
  enabled: false
  className: traefik
  hosts:
    - host: crashctl.example.com
      paths:
        - path: /
```

CLI commands:

```shell
crashctl serve            # Start the server
crashctl project create   # Create a project, prints DSN
crashctl project list     # List all projects
crashctl cleanup          # Manual retention cleanup
crashctl version          # Print version info
```

| Feature | crashctl | Sentry | Bugsink | GlitchTip |
|---|---|---|---|---|
| Written in | Go | Python | Python | Python |
| Deployment | Single binary | 58+ services | Docker container | 4 services |
| Min. RAM | ~50 MB | 16 GB+ | 1 GB | 2 GB |
| External DB required | No | PG + ClickHouse + Kafka + Redis | SQLite / PG | PG + Redis |
| K8s crash detection | ✓ Built-in | ✗ | ✗ | ✗ |
| OOMKill alerts | ✓ Automatic | ✗ | ✗ | ✗ |
| CrashLoopBackOff | ✓ Automatic | ✗ | ✗ | ✗ |
| Crash-to-error linking | ✓ | ✗ | ✗ | ✗ |
| Go error chain support | ✓ Native | Basic | Basic | Basic |
| Prometheus metrics | ✓ Built-in | Via plugin | ✗ | ✗ |
| Sentry SDK compatible | Planned (v0.3) | ✓ | ✓ | ✓ |
| ARM64 support | ✓ | ✗ | ✓ | ✓ |
| License | MIT | FSL | BSL | MIT |
crashctl is a good fit if:

- You run Go services on K3s or Kubernetes
- You want one tool for both application errors and pod crashes
- You want to self-host on minimal infrastructure (even a Raspberry Pi)
- You value zero external dependencies over a rich plugin ecosystem
- You want Prometheus metrics and Grafana dashboards out of the box

Consider something else if:

- You need error tracking for non-Go languages (use Sentry or GlitchTip)
- You need full-stack observability such as APM, tracing, or session replay (use Sentry or Datadog)
- You need Sentry SDK compatibility today (planned for v0.3; use Bugsink or GlitchTip now)
- You run on bare metal or VMs without Kubernetes (crashctl still works, but you lose Kubernetes crash detection, its main advantage)
| Version | Status | Highlights |
|---|---|---|
| v0.1.0-alpha | 🟡 In progress | Core: SDK, grouping, web UI, BadgerDB |
| v0.2.0-alpha | ⬜ Planned | K8s crash detection, correlation, Prometheus, alerting |
| v0.1.0 | ⬜ Planned | Helm chart, CLI polish, first public release |
| v0.2.0 | ⬜ Future | PostgreSQL backend, Grafana dashboards, OTel ingestion |
| v0.3.0 | ⬜ Future | Sentry SDK compatibility, multi-project auth, K8s operator |
Contributions are welcome. Please read the implementation guide and follow the project conventions:
- Fork the repository
- Create a feature branch (`feat/your-feature`)
- Write tests for your changes
- Ensure `make lint && make test` passes
- Submit a pull request
```shell
git clone https://github.com/syst3mctl/crashctl.git
cd crashctl
make dev  # Starts local server with hot reload
```

Code conventions:

- Error handling: always wrap with `fmt.Errorf("context: %w", err)`
- Logging: `slog` only; no `fmt.Println` or `log.Printf`
- Testing: table-driven tests with `t.Run()` subtests
- Dependencies: minimal; think twice before adding a new dependency
See CLAUDE.md for the full code standards and architecture guide.
MIT License — see LICENSE for details.
Built by syst3mctl in Tbilisi, Georgia.
Catches what your SDK can't see.