Add always-on internal atomic counters and optional `metrics` crate integration for production monitoring (Prometheus, StatsD, Datadog).

Phase 0: SchedulerCounters + MetricsSnapshot
- New `src/scheduler/counters.rs` with `SchedulerCounters` (AtomicU64) and public `MetricsSnapshot` struct
- `Scheduler::metrics_snapshot()` returns cumulative counters + gauges

Phase 1: `metrics` crate integration
- New `metrics` Cargo feature with `metrics = "0.24"` optional dep
- New `src/scheduler/metrics_bridge.rs` with `MetricsEmitter` that emits counters, gauges, and histograms via the `metrics` facade
- Metric descriptions registered once at build time

Phase 2: Builder API
- `metrics_prefix()`, `metrics_label()`, `disable_metric()` on SchedulerBuilder for customizing metric names and labels

Phase 3: Instrumentation points
- submit.rs: submitted, superseded, batch counters
- spawn.rs: dispatched counter, queue wait histogram, inline retry counters
- completion.rs: completed counter, duration histogram
- failure.rs: failed, retried, dead_lettered, dependency_failures counters + duration histogram
- gate.rs: gate_denials, rate_limit_throttles counters at each denial point
- control.rs: group_pauses, group_resumes counters
- run_loop.rs: expired counter, gauge updates (pending, running, blocked, paused, waiting, pressure, module running, rate limit tokens)

Phase 4: Duration plumbing
- Added `duration: Duration` to CompletionMsg and FailureMsg
- Captured via `started_at.elapsed()` in spawned task

Docs
- New `docs/metrics.md` with full metric reference, dashboard layout, alert rules, and builder API examples
- Cross-linked from quick-start, configuration, progress-and-events, query-apis, and io-and-backpressure docs
- Feature flag and builder methods documented in configuration.md
- Metrics section added to lib.rs crate docs

Tests
- 8 new integration tests covering submit/dispatch/complete, failure/retry, dead-letter, batch, group pause/resume, gauges, and supersede
- All 377 existing tests continue to pass
- Zero clippy warnings on both feature flag states
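
As a rough illustration of the Phase 0 idea (always-on atomics plus a plain snapshot struct), here is a minimal sketch. The type names `SchedulerCounters` and `MetricsSnapshot` come from the commit message, but the fields and the `record_*`/`snapshot` methods shown here are assumptions made for the example, not the actual contents of `src/scheduler/counters.rs`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch of always-on counters. The three fields are illustrative only;
/// the real struct tracks more events than this.
#[derive(Debug, Default)]
pub struct SchedulerCounters {
    submitted: AtomicU64,
    completed: AtomicU64,
    failed: AtomicU64,
}

/// Plain-data copy of the counters taken at one point in time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetricsSnapshot {
    pub submitted: u64,
    pub completed: u64,
    pub failed: u64,
}

impl SchedulerCounters {
    // Relaxed ordering is enough here: each counter is an independent
    // statistic and is not used to synchronize other memory.
    pub fn record_submitted(&self) {
        self.submitted.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_completed(&self) {
        self.completed.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_failed(&self) {
        self.failed.fetch_add(1, Ordering::Relaxed);
    }

    /// Reads every counter once. The values are cumulative, and the
    /// snapshot is not atomic across fields.
    pub fn snapshot(&self) -> MetricsSnapshot {
        MetricsSnapshot {
            submitted: self.submitted.load(Ordering::Relaxed),
            completed: self.completed.load(Ordering::Relaxed),
            failed: self.failed.load(Ordering::Relaxed),
        }
    }
}
```

A consumer without the `metrics` feature (a CLI or TUI dashboard, say) could poll such a snapshot periodically and diff successive values to derive rates.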
Summary
- Always-on `AtomicU64` counters and a public `MetricsSnapshot` struct for consumers who don't use the `metrics` crate (CLIs, TUI dashboards)
- `metrics` crate integration (behind the `metrics` Cargo feature) that emits ~30 counters, gauges, and histograms via the standard facade; consumers choose their exporter (Prometheus, StatsD, Datadog)
- `SchedulerBuilder` methods for customizing metric names (`metrics_prefix`), global labels (`metrics_label`), and suppressing specific metrics (`disable_metric`)
- Duration plumbing through `CompletionMsg`/`FailureMsg` for execution time histograms, plus queue wait histograms at dispatch time

Closes #20
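
To make the three builder knobs concrete, the following is a hypothetical, self-contained sketch of how a prefix, global labels, and a disabled set might combine when a metric name is resolved. Only the method names `metrics_prefix`, `metrics_label`, and `disable_metric` come from this PR. The `MetricsConfigSketch` type, the method signatures, the `_` separator, and the `resolve` helper are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the metrics-related builder state.
#[derive(Debug, Default, Clone)]
pub struct MetricsConfigSketch {
    prefix: Option<String>,
    labels: Vec<(String, String)>,
    disabled: HashSet<String>,
}

impl MetricsConfigSketch {
    /// Prepend `prefix` to every emitted metric name.
    pub fn metrics_prefix(mut self, prefix: &str) -> Self {
        self.prefix = Some(prefix.to_owned());
        self
    }

    /// Attach a global label to every emitted metric.
    pub fn metrics_label(mut self, key: &str, value: &str) -> Self {
        self.labels.push((key.to_owned(), value.to_owned()));
        self
    }

    /// Suppress one metric by its unprefixed name.
    pub fn disable_metric(mut self, name: &str) -> Self {
        self.disabled.insert(name.to_owned());
        self
    }

    /// Returns the final metric name, or `None` if the metric is disabled.
    /// Joining with `_` is an assumption of this sketch.
    pub fn resolve(&self, name: &str) -> Option<String> {
        if self.disabled.contains(name) {
            return None;
        }
        Some(match &self.prefix {
            Some(p) => format!("{}_{}", p, name),
            None => name.to_owned(),
        })
    }

    /// Global labels in insertion order.
    pub fn labels(&self) -> &[(String, String)] {
        &self.labels
    }
}
```

Resolving names once and skipping disabled metrics early keeps the per-event cost of a suppressed metric to a single set lookup in this sketch; the real emitter may handle this differently.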
New files
- `src/scheduler/counters.rs`: `SchedulerCounters` (always-on atomics) + `MetricsSnapshot` public struct
- `src/scheduler/metrics_bridge.rs`: `MetricsEmitter`, a feature-gated `metrics` crate facade wrapper
- `tests/integration/metrics.rs`
- `docs/metrics.md`

Modified files (13)
- `Cargo.toml`: `metrics = { version = "0.24", optional = true }`, `metrics` feature flag
- `src/lib.rs`: `MetricsSnapshot`, feature flag + `metrics` crate docs
- `src/scheduler/mod.rs`: `counters`/`metrics_bridge` modules, `MetricsConfig`, `duration` on coalescing messages, new fields on `SchedulerInner`
- `src/scheduler/builder.rs`: `metrics_prefix()`, `metrics_label()`, `disable_metric()` builder methods; `describe_metrics()` at build time
- `src/scheduler/queries.rs`: `Scheduler::metrics_snapshot()` method
- `src/scheduler/gate.rs`: `counters` in `GateContext`; `gate_denials` + `rate_limit_throttles` at each denial path
- `src/scheduler/run_loop.rs`: `poll_and_dispatch()`; expired counter; rate limit token gauges
- `src/scheduler/spawn.rs`
- `src/scheduler/spawn/context.rs`: `counters` + `emitter` in `SpawnContext`
- `src/scheduler/spawn/completion.rs`
- `src/scheduler/spawn/failure.rs`
- `src/scheduler/submit.rs`
- `src/scheduler/control.rs`

Design decisions
- Each instrumentation point increments an `AtomicU64` (always) and calls the `MetricsEmitter` (only with `#[cfg(feature = "metrics")]`). The atomics serve non-`metrics` consumers; the `metrics` crate adds labels, histograms, and gauge semantics.
- All `metrics::*` calls are behind `#[cfg(feature = "metrics")]`. Internal counters cost a few cache lines of atomics with `Relaxed` ordering.
- Only `type`, `module`, `group`, and `reason` appear as labels. Never `task_id`, `key`, or user-provided `tags`.
- The inline retry path in `spawn.rs` now increments the `failed`, `failed_retryable`, and `retried` counters (previously this fast path bypassed failure accounting).
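
The dual-layer design and the duration plumbing can be sketched together as follows. This is a simplified illustration under stated assumptions, not the PR's code: `Counters`, `CompletionMsgSketch`, `run_task`, `record_completion`, and the metric names `tasks_completed` and `task_duration_seconds` are all hypothetical. The `counter!(...).increment(..)` and `histogram!(...).record(..)` calls follow the macro API of recent `metrics` releases (including 0.24), and that block only compiles in when the crate is built with a `metrics` feature and the `metrics` dependency.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Always-on layer (illustrative single counter).
#[derive(Default)]
pub struct Counters {
    completed: AtomicU64,
}

impl Counters {
    pub fn completed(&self) -> u64 {
        self.completed.load(Ordering::Relaxed)
    }
}

/// Hypothetical completion message carrying the measured run time.
pub struct CompletionMsgSketch {
    pub task_type: &'static str,
    pub duration: Duration,
}

/// Runs `work` and captures its wall-clock duration via `Instant::elapsed`,
/// mirroring the `started_at.elapsed()` capture described in the PR.
pub fn run_task<F: FnOnce()>(task_type: &'static str, work: F) -> CompletionMsgSketch {
    let started_at = Instant::now();
    work();
    CompletionMsgSketch {
        task_type,
        duration: started_at.elapsed(),
    }
}

/// One instrumentation point: the atomic is always updated, the facade
/// calls exist only when the `metrics` feature is enabled.
pub fn record_completion(counters: &Counters, msg: &CompletionMsgSketch) {
    counters.completed.fetch_add(1, Ordering::Relaxed);

    // Keeps `msg` "used" when the feature-gated block is compiled out.
    let _ = msg;

    #[cfg(feature = "metrics")]
    {
        // Low-cardinality label only: the task type, never a task id or key.
        metrics::counter!("tasks_completed", "type" => msg.task_type).increment(1);
        metrics::histogram!("task_duration_seconds", "type" => msg.task_type)
            .record(msg.duration.as_secs_f64());
    }
}
```

With the feature off, the function body reduces to a single relaxed `fetch_add`, which is the "few cache lines of atomics" cost noted above.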