Add NullMonitor.timer + SystemMonitor.count_once/timer by talsperre · Pull Request #3200 · Netflix/metaflow

talsperre · 2026-05-14T21:25:42Z

Summary

Adds two small additions to the monitor API so callers can:

Emit a one-shot timer with a precomputed duration (instead of measure()'s contextmanager that wraps wall-clock around a yielded block).
Increment a counter exactly once for discrete events, without the awkward with system_monitor.count(name): pass idiom.

Motivation

The contextmanager-based measure(name) is great when the start and end of the measured interval happen in the same Python process and the same call site. It doesn't fit when the duration is computed externally — for example, wall-clock from a workflow's run-creation timestamp to the terminal step's success, where the two boundaries are in different containers.

The concrete use case: emitting a run-level metaflow.run.duration_timer metric from a terminal-step task_finished hook, computing duration as now() - Run.created_at. See companion internal PR for the consumer: corp/mli-metaflow-custom#1861 (the AtlasMonitor work where #3093's per-step tags are already being used in production).

Changes

metaflow/monitor.py — NullMonitor.timer(name, value). Routes through the same sidecar protocol as measure(), so DebugMonitor and any other NullMonitor subclass inherits the behavior automatically.
metaflow/system/system_monitor.py — SystemMonitor.timer(name, value) passthrough + SystemMonitor.count_once(name) helper.

No existing call sites change. Existing count() / measure() / gauge() APIs are untouched.

Test plan

NullMonitor.timer() is a no-op when the sidecar isn't active.
NullMonitor.timer() sends a single BEST_EFFORT message with the precomputed duration when the sidecar is active.
DebugMonitor (via inheritance) prints the timer through DebugMonitorSidecar.process_message.
SystemMonitor.count_once(name) increments a counter exactly once (verified against an instrumented sub-monitor).

greptile-apps · 2026-05-14T21:28:13Z

Greptile Summary

Adds two small additions to the monitoring API: a one-shot timer(name, value) method (on both NullMonitor and SystemMonitor) for emitting precomputed durations, and a count_once(name) helper on SystemMonitor that avoids the with … pass idiom for discrete events. Both previous review thread issues (synthetic epoch timestamps, undocumented _timer suffix) have been resolved.

NullMonitor.timer() correctly anchors _start/_end to real epoch timestamps so the serialized payload is semantically valid for any downstream consumer.
SystemMonitor.timer() delegates to the underlying monitor, and its docstring explicitly calls out that "_timer" is appended to the metric name.
SystemMonitor.count_once() is a thin wrapper around with self.monitor.count(name): pass; it does not add a corresponding count_once on NullMonitor, so subclasses that want to override this behavior must do so via the count() contextmanager.

Confidence Score: 5/5

The changes are additive and isolated — no existing call sites are touched, and the new methods follow the same sidecar protocol as the surrounding code.

Both new methods are small and well-contained. The epoch-timestamp fix in NullMonitor.timer is correct, and the _timer suffix convention is now clearly documented. The only gap is that a negative value (possible under cross-container clock skew, the exact stated use case) silently emits a negative-duration timer rather than failing fast.

The negative-value edge case in metaflow/monitor.py NullMonitor.timer is worth a second look before merging.

Important Files Changed

Filename	Overview
metaflow/monitor.py	Adds `NullMonitor.timer(name, value)` using real epoch timestamps for `_start`/`_end`; correctly implements the sidecar protocol. No validation for negative `value`.
metaflow/system/system_monitor.py	Adds `count_once(name)` and `timer(name, value)` passthroughs; docstrings correctly document the `_timer` suffix convention. Both implementations are clean.

_{Reviews (2): Last reviewed commit: "Add NullMonitor.timer + SystemMonitor.ti..." | Re-trigger Greptile}

`measure()` is a contextmanager that wraps wall-clock around a yielded block; it doesn't fit cases where the duration is computed externally (e.g. wall-clock across separate processes/containers, where the start boundary is in one process and the end is in another). Add a one-shot `timer(name, value)` to NullMonitor that takes a precomputed duration and routes it through the existing sidecar protocol — DebugMonitor and other NullMonitor subclasses pick it up automatically via inheritance. Also add SystemMonitor.count_once(name) so callers no longer need the awkward `with system_monitor.count(name): pass` idiom for discrete events.

greptile-apps Bot reviewed May 14, 2026

View reviewed changes

Comment thread metaflow/monitor.py

Comment thread metaflow/system/system_monitor.py

talsperre force-pushed the ssrikanth/monitor-one-shot-timer branch from f4ad318 to 8d490d0 Compare May 15, 2026 18:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NullMonitor.timer + SystemMonitor.count_once/timer#3200

Add NullMonitor.timer + SystemMonitor.count_once/timer#3200
talsperre wants to merge 1 commit into
masterfrom
ssrikanth/monitor-one-shot-timer

talsperre commented May 14, 2026

Uh oh!

greptile-apps Bot commented May 14, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

talsperre commented May 14, 2026

Summary

Motivation

Changes

Test plan

Uh oh!

greptile-apps Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented May 14, 2026 •

edited

Loading