…orld ReadMemStats
- refreshVeriflierClients now diffs addr|token fingerprints and skips rebuilding when the verifier list is unchanged, preserving TCP connection pools between rounds
- Remove runtime.ReadMemStats stop-the-world call: it was logging but taking no action; memory metrics are already covered by EmitMemStats
- Remove unused statusDown constant; the DB transition path goes directly from statusRunning to statusConfirmedDown
- Add comment to per-round ClaimBuckets call explaining the rebalancing intent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
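The fingerprint comparison described in this commit can be sketched roughly as follows. This is an illustration only: the Verifier type, the clientSet type, and the choice to sort entries before joining are assumptions, not the code in this branch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Verifier is a hypothetical config entry; the field names are assumptions.
type Verifier struct {
	Addr  string
	Token string
}

// fingerprint joins sorted "addr|token" pairs so that a reordered but
// otherwise identical list produces the same string.
func fingerprint(vs []Verifier) string {
	parts := make([]string, len(vs))
	for i, v := range vs {
		parts[i] = v.Addr + "|" + v.Token
	}
	sort.Strings(parts)
	return strings.Join(parts, "\n")
}

// clientSet remembers the fingerprint of the list its clients were built from.
type clientSet struct {
	fp       string
	rebuilds int
}

// refresh rebuilds only when the fingerprint changed and reports whether it did.
func (c *clientSet) refresh(vs []Verifier) bool {
	fp := fingerprint(vs)
	if c.rebuilds > 0 && fp == c.fp {
		return false // unchanged: keep existing clients and their connections
	}
	c.fp = fp
	c.rebuilds++ // real code would construct new clients here
	return true
}

func main() {
	cs := &clientSet{}
	list := []Verifier{{"v1:7803", "a"}, {"v2:7803", "b"}}
	fmt.Println(cs.refresh(list)) // first call builds
	fmt.Println(cs.refresh(list)) // same list is skipped
	list[1].Token = "c"
	fmt.Println(cs.refresh(list)) // token change triggers a rebuild
}
```

Comparing one joined string keeps the per-round cost small and avoids tearing down clients whose connection pools are still valid.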
…d orchestrator logic
…ck, and orchestrator paths
Fixes cleanup ordering deadlock in pool tests (LIFO cleanup, close channel before Drain). Adds tests for wpcom circuit breaker, veriflier transport, checker.Check paths, config hot-reload, dashboard SSE, audit helpers, orchestrator memory pressure, retry queue, and pure utility functions.
EVENTS.md: event-sourced architecture (lifecycle, idempotency, resolution reasons, causal links, and site-row projection).
TAXONOMY.md: five-layer test taxonomy (Reachability → Transport → Infrastructure → Application → Content + Reverse checks), site/endpoint/check data model, multi-state vocabulary, event schema, scope matrix, signal processing, and versioned implementation roadmap.
ROADMAP.md: deferred public REST API (query and manage endpoints, auth, pagination, and uptime-bench integration context).
AGENTS.md: architectural decision log covering event sourcing, severity vs. state separation, Seems Down lifecycle, in-place event updates, idempotent event identity, resolution reasons, causal vs. rollup links, and Unknown-is-not-downtime invariant.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jetmon 2 is no longer scoped as a drop-in replacement. It is a comprehensive site health monitoring platform (event-sourced, multi-layer, multi-endpoint) with a competitor-parity public REST API as a first-class product surface.
ROADMAP.md: reframe public API from internal tooling to user-facing capability on parity with Pingdom, UptimeRobot, and Better Uptime. Expand from two capabilities (query, manage) to five: status and state, events and history, SLA statistics (uptime %, response time p95/p99, MTTR), monitor management (CRUD, pause, resume, trigger-now), and alert contacts with outbound webhooks. Add Unknown/Downtime distinction to uptime calculations, trigger-now async semantics, per-scope rate limiting with a dedicated trigger bucket, and key lifecycle CLI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Output binary as veriflier2-bin in the builder stage to avoid colliding with the veriflier2/ source directory, then copy it into the final image as veriflier2 so the entrypoint script works unchanged.
- Remove ADD COLUMN IF NOT EXISTS from migrations 2 and 7: MySQL 8.0 does not support that MariaDB-only syntax; the migration tracker already prevents re-applying
- Set JETMON_PID_FILE=/jetmon/jetmon2.pid in the entrypoint so the non-root jetmon user can write the PID file
- Drop logs/stats host-volume mounts from docker-compose to avoid ownership conflicts with the container's jetmon user
Fresh dev databases don't have the production table, so migration 3 (ALTER TABLE) was failing. Migration 2 now creates the base table with IF NOT EXISTS so production deployments skip it safely. Renumbered subsequent migrations 3–7 to 4–8.
Docker looks for .dockerignore in the build context root (repo root), not in the Dockerfile's subdirectory. The misplaced docker/.dockerignore was never being applied, so the pre-built jetmon2 binary was leaking into the build context and could mask a fresh compile.
The database column last_status_change is DATETIME NULL, but the Site struct had it as time.Time (non-pointer). Go's database/sql returns "converting NULL to time.Time is unsupported" when scanning a NULL value into a non-pointer struct field, causing GetSitesForBucket to return an error on every round and silently skip all checks.
The entry 'veriflier2' matched the source directory after the binary was renamed to veriflier2-bin. Update to veriflier2-bin so the source tree is included in the build context while the local binary is not.
When the mounted config/ directory is owned by a different UID than the container's jetmon user, writing config.json there fails. Check if config/ is writable first; if not, generate config.json into /jetmon/ (container-owned) and set JETMON_CONFIG accordingly.
Matches the typical first user on Linux hosts, ensuring the container can read and write to host-mounted volumes without permission errors.
- Add JETMON_UID/JETMON_GID to .env-sample (default 1000) and wire them into docker-compose user: so the container runs as the host user
- Revert Dockerfile to system user; chmod 777 internal dirs (logs, stats, certs) so they are writable under any UID
- Fall back to /tmp/config.json when config/ is not writable, since /tmp is always world-writable
Work in progress. This branch (v2) is the ambitious successor to the Go rewrite started in refactor/jetmon2 (PR #60). It includes everything from that branch and extends it with a new architecture and direction.
What changed from PR #60
PR #60 scoped Jetmon 2 as a drop-in replacement: same interfaces, same schema, same behaviour — just Go instead of Node.js + C++. That work is complete and forms the base of this branch.
This branch pivots to a larger goal: Jetmon 2 as a full site health monitoring platform, not just an uptime tracker. The key additions:
- … EVENTS.md.
- … TAXONOMY.md.
- … ROADMAP.md.
- … site_status column keeps receiving derived writes so current consumers are not broken. New capabilities are additive; consumers adopt progressively.
Architectural decisions are locked in AGENTS.md so they are enforced consistently across all changes.
Why Go
The current architecture uses forked Node.js processes (8–16MB RSS each at startup, 53MB limit before recycling) as workers, plus a compiled C++ addon to escape Node's event loop for blocking network I/O. Go eliminates both constraints:
- net/http and crypto/tls are first-class stdlib packages: no native addon, no node-gyp, no compilation step during deployment
- net/http/httptrace provides DNS, TCP, TLS, and TTFB timing hooks as separate measurements within each check, for free
- … node_modules, and no addon rebuild on Node.js version upgrades
- … pprof, race detector via go test -race, and a mature testing ecosystem
The Veriflier is rewritten in Go as well, replacing the Qt C++ dependency with a lightweight Go HTTP service. The protocol between Monitor and Verifliers moves from custom HTTPS to gRPC, providing type-safe contracts, built-in retries, and bidirectional streaming for future use.
Benefits of the Rewrite
Memory
The current architecture forks Node.js worker processes that start at 8–16MB RSS and are recycled once they reach 53MB. With a typical deployment of 8–16 workers, the process tree consumes 240–850MB of resident memory just for worker overhead, before any check data is counted.
Jetmon 2 runs as a single process. Go goroutines start with a small stack (2KB minimum in current Go releases) and grow on demand. A pool of 1,000 concurrent goroutines therefore costs roughly 2MB of initial stack. Total process RSS for an equivalent workload is estimated at 50–150MB, a 75–90% reduction in memory consumption per host.
Concurrent Checks
Current concurrency is bounded by the number of worker processes. Each worker is a single-threaded Node.js process; practical concurrency per host is in the low hundreds.
Go's goroutine scheduler makes 10,000+ concurrent in-flight checks on a single host practical with no additional configuration. At a conservative network timeout of 10 seconds and average site response time of 200ms, a pool of 1,000 goroutines sustains approximately 5,000 check completions per second — an estimated 10–50× increase in concurrent checks per host.
Throughput
The current architecture crosses a process boundary on every unit of work: master dispatches via IPC, worker receives and processes, replies via IPC, master aggregates. Each crossing involves serialisation, a context switch, and V8 event loop scheduling on both ends.
Jetmon 2 replaces all IPC with Go channel sends, which are in-process and an order of magnitude cheaper. Estimated throughput improvement: 3–10× more sites checked per second per host under equivalent conditions.
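As a rough illustration of dispatch over channels instead of IPC, the sketch below fans site IDs out to a fixed number of goroutines and collects the results. The result type, runRound, and the check callback are hypothetical and do not reflect the actual orchestrator code.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical check outcome.
type result struct {
	siteID int
	up     bool
}

// runRound fans site IDs out to n workers over a channel and collects results.
// n must be at least 1, otherwise nothing would drain the jobs channel.
func runRound(siteIDs []int, n int, check func(int) bool) []result {
	jobs := make(chan int)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				results <- result{siteID: id, up: check(id)}
			}
		}()
	}

	// Dispatch in the background, then close results once all workers finish.
	go func() {
		for _, id := range siteIDs {
			jobs <- id
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	out := make([]result, 0, len(siteIDs))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	ids := []int{1, 2, 3, 4, 5}
	rs := runRound(ids, 3, func(id int) bool { return id%2 == 1 })
	fmt.Println(len(rs), "results")
}
```

Each job and result is passed by value inside one address space, so there is no serialisation step and no process boundary to cross.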
Check Scheduling Accuracy
The current system uses setTimeout and setInterval for round scheduling, subject to V8 event loop delay: a busy loop can delay a callback by tens to hundreds of milliseconds, introducing jitter into RTT measurements.
Go's time.Ticker fires with OS-level timer precision. RTT measurements from net/http/httptrace are taken inside the HTTP stack with no event loop between the measurement point and the timer.
Deployment Speed
Current deployment requires npm install, a node-gyp rebuild of the native C++ addon, and a coordinated process restart. A failed addon compilation blocks deployment entirely.
Jetmon 2 deploys as a single static binary. Deployment is: copy binary, systemctl restart jetmon2. Total deployment time drops from several minutes to under 30 seconds.
Mean Time to Recovery
A worker process crash requires the master to detect the exit, spawn a replacement, and wait for initialisation — several seconds, with in-flight checks unresolved.
In Jetmon 2, a panicking goroutine is recovered by a deferred handler, the result counted as an error, and a replacement goroutine immediately spawned — recovery in the low milliseconds. For a full process crash, systemd restarts the binary; with Go's fast startup, the process is accepting work again in under 2 seconds.
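A minimal sketch of the deferred-recover pattern described above, assuming a hypothetical safeCheck wrapper. As a simplification, it keeps the same goroutine alive by recovering around each check instead of spawning a replacement goroutine.

```go
package main

import (
	"errors"
	"fmt"
)

// safeCheck runs check and converts a panic into an ordinary error result.
func safeCheck(check func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("check panicked: %v", r)
		}
	}()
	return check()
}

func main() {
	fmt.Println(safeCheck(func() error { return nil }))
	fmt.Println(safeCheck(func() error { return errors.New("timeout") }))
	fmt.Println(safeCheck(func() error { panic("nil map write") }))
}
```

Because the named return value is set inside the deferred function, the caller sees a panic as one more failed check and can count it like any other error.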
Operational Complexity
The current system requires managing Node.js version compatibility, native addon compilation, npm dependency trees, and the fragile worker spawn/recycle lifecycle.
Jetmon 2 eliminates all of this. One artifact to manage: the Go binary. No node-gyp, no npm, no Node.js version management.
Build order
- … jetmon_endpoints, jetmon_events, updated site row projection, back-compat site_status derived write
🤖 Generated with Claude Code