Skip to content

Add NullMonitor.timer + SystemMonitor.count_once/timer#3200

Open
talsperre wants to merge 1 commit into
masterfrom
ssrikanth/monitor-one-shot-timer
Open

Add NullMonitor.timer + SystemMonitor.count_once/timer#3200
talsperre wants to merge 1 commit into
masterfrom
ssrikanth/monitor-one-shot-timer

Conversation

@talsperre
Copy link
Copy Markdown
Collaborator

Summary

Adds two small additions to the monitor API so callers can:

  1. Emit a one-shot timer with a precomputed duration (instead of measure()'s contextmanager that wraps wall-clock around a yielded block).
  2. Increment a counter exactly once for discrete events, without the awkward with system_monitor.count(name): pass idiom.

Motivation

The contextmanager-based measure(name) is great when the start and end of the measured interval happen in the same Python process and the same call site. It doesn't fit when the duration is computed externally — for example, wall-clock from a workflow's run-creation timestamp to the terminal step's success, where the two boundaries are in different containers.

The concrete use case: emitting a run-level metaflow.run.duration_timer metric from a terminal-step task_finished hook, computing duration as now() - Run.created_at. See companion internal PR for the consumer: corp/mli-metaflow-custom#1861 (the AtlasMonitor work where #3093's per-step tags are already being used in production).

Changes

  • metaflow/monitor.pyNullMonitor.timer(name, value). Routes through the same sidecar protocol as measure(), so DebugMonitor and any other NullMonitor subclass inherits the behavior automatically.
  • metaflow/system/system_monitor.pySystemMonitor.timer(name, value) passthrough + SystemMonitor.count_once(name) helper.

No existing call sites change. Existing count() / measure() / gauge() APIs are untouched.

Test plan

  • NullMonitor.timer() is a no-op when the sidecar isn't active.
  • NullMonitor.timer() sends a single BEST_EFFORT message with the precomputed duration when the sidecar is active.
  • DebugMonitor (via inheritance) prints the timer through DebugMonitorSidecar.process_message.
  • SystemMonitor.count_once(name) increments a counter exactly once (verified against an instrumented sub-monitor).

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

Adds two small additions to the monitoring API: a one-shot timer(name, value) method (on both NullMonitor and SystemMonitor) for emitting precomputed durations, and a count_once(name) helper on SystemMonitor that avoids the with … pass idiom for discrete events. Both previous review thread issues (synthetic epoch timestamps, undocumented _timer suffix) have been resolved.

  • NullMonitor.timer() correctly anchors _start/_end to real epoch timestamps so the serialized payload is semantically valid for any downstream consumer.
  • SystemMonitor.timer() delegates to the underlying monitor, and its docstring explicitly calls out that "_timer" is appended to the metric name.
  • SystemMonitor.count_once() is a thin wrapper around with self.monitor.count(name): pass; it does not add a corresponding count_once on NullMonitor, so subclasses that want to override this behavior must do so via the count() contextmanager.

Confidence Score: 5/5

The changes are additive and isolated — no existing call sites are touched, and the new methods follow the same sidecar protocol as the surrounding code.

Both new methods are small and well-contained. The epoch-timestamp fix in NullMonitor.timer is correct, and the _timer suffix convention is now clearly documented. The only gap is that a negative value (possible under cross-container clock skew, the exact stated use case) silently emits a negative-duration timer rather than failing fast.

The negative-value edge case in metaflow/monitor.py NullMonitor.timer is worth a second look before merging.

Important Files Changed

Filename Overview
metaflow/monitor.py Adds NullMonitor.timer(name, value) using real epoch timestamps for _start/_end; correctly implements the sidecar protocol. No validation for negative value.
metaflow/system/system_monitor.py Adds count_once(name) and timer(name, value) passthroughs; docstrings correctly document the _timer suffix convention. Both implementations are clean.

Reviews (2): Last reviewed commit: "Add NullMonitor.timer + SystemMonitor.ti..." | Re-trigger Greptile

Comment thread metaflow/monitor.py
Comment thread metaflow/system/system_monitor.py
`measure()` is a contextmanager that wraps wall-clock around a yielded
block; it doesn't fit cases where the duration is computed externally
(e.g. wall-clock across separate processes/containers, where the start
boundary is in one process and the end is in another). Add a one-shot
`timer(name, value)` to NullMonitor that takes a precomputed duration
and routes it through the existing sidecar protocol — DebugMonitor and
other NullMonitor subclasses pick it up automatically via inheritance.

Also add SystemMonitor.count_once(name) so callers no longer need the
awkward `with system_monitor.count(name): pass` idiom for discrete
events.
@talsperre talsperre force-pushed the ssrikanth/monitor-one-shot-timer branch from f4ad318 to 8d490d0 Compare May 15, 2026 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant