Add NullMonitor.timer + SystemMonitor.count_once/timer#3200
Conversation
Greptile SummaryAdds two small additions to the monitoring API: a one-shot
Confidence Score: 5/5The changes are additive and isolated — no existing call sites are touched, and the new methods follow the same sidecar protocol as the surrounding code. Both new methods are small and well-contained. The epoch-timestamp fix in NullMonitor.timer is correct, and the _timer suffix convention is now clearly documented. The only gap is that a negative value (possible under cross-container clock skew, the exact stated use case) silently emits a negative-duration timer rather than failing fast. The negative-value edge case in metaflow/monitor.py NullMonitor.timer is worth a second look before merging. Important Files Changed
Reviews (2): Last reviewed commit: "Add NullMonitor.timer + SystemMonitor.ti..." | Re-trigger Greptile |
`measure()` is a contextmanager that wraps wall-clock around a yielded block; it doesn't fit cases where the duration is computed externally (e.g. wall-clock across separate processes/containers, where the start boundary is in one process and the end is in another). Add a one-shot `timer(name, value)` to NullMonitor that takes a precomputed duration and routes it through the existing sidecar protocol — DebugMonitor and other NullMonitor subclasses pick it up automatically via inheritance. Also add SystemMonitor.count_once(name) so callers no longer need the awkward `with system_monitor.count(name): pass` idiom for discrete events.
f4ad318 to
8d490d0
Compare
Summary
Adds two small additions to the monitor API so callers can:
measure()'s contextmanager that wraps wall-clock around a yielded block).with system_monitor.count(name): passidiom.Motivation
The contextmanager-based
measure(name)is great when the start and end of the measured interval happen in the same Python process and the same call site. It doesn't fit when the duration is computed externally — for example, wall-clock from a workflow's run-creation timestamp to the terminal step's success, where the two boundaries are in different containers.The concrete use case: emitting a run-level
metaflow.run.duration_timermetric from a terminal-steptask_finishedhook, computing duration asnow() - Run.created_at. See companion internal PR for the consumer: corp/mli-metaflow-custom#1861 (the AtlasMonitor work where #3093's per-step tags are already being used in production).Changes
metaflow/monitor.py—NullMonitor.timer(name, value). Routes through the same sidecar protocol asmeasure(), soDebugMonitorand any otherNullMonitorsubclass inherits the behavior automatically.metaflow/system/system_monitor.py—SystemMonitor.timer(name, value)passthrough +SystemMonitor.count_once(name)helper.No existing call sites change. Existing
count()/measure()/gauge()APIs are untouched.Test plan
NullMonitor.timer()is a no-op when the sidecar isn't active.NullMonitor.timer()sends a singleBEST_EFFORTmessage with the precomputed duration when the sidecar is active.DebugMonitor(via inheritance) prints the timer throughDebugMonitorSidecar.process_message.SystemMonitor.count_once(name)increments a counter exactly once (verified against an instrumented sub-monitor).