perf(cv): decouple frame capture from inference in YOLOPoseSource #64
Merged
…ve sources)

For live sources (RTSP, HTTP, anything not is_file), _stream_one_capture now runs cap.read() and model.predict() as independent asyncio tasks communicating through a size-1 slot. Previously they were serialised: capture waited for predict and predict waited for capture, so a 15 fps camera with a 100 ms/frame model ran end-to-end at ~6 fps.

Drop-newest slot semantics: if the reader outpaces the predictor, the older queued frame is discarded and replaced with the newest one. The rep detector only ever needs the most recent frame, so backing up a FIFO would just add latency between what's happening in the gym and what the pipeline sees.

The reader keeps publishing _latest_frame and _frame_id (and by extension the JPEG cache from PR #61) on every decode, so the wall preview stays fresh even between predictions.

File sources still run the original sequential path (renamed to _stream_sequential). File replay is bounded by the predictor rather than by a real-time camera, so double-buffering would race through the file and drop most frames.

Careful shutdown: the outer finally block signals the reader to stop, drains the slot to unblock any in-flight put(), cancels the reader task, awaits it, and only THEN releases the VideoCapture, so we never release cap while a cap.read() is still on the thread pool.

Benchmark (mocked cap + model, wall-clock accurate):
- 15 fps camera + 100 ms predict: 5.98 fps → 12.80 fps (2.14x)
- Fast reader + 200 ms predict: 4.60 fps → 23.93 fps (5.20x)

Also reverts a speculative vectorisation of pose_sequence_to_features that was in my working tree: benchmarking showed the NumPy version ran at 0.6-0.7x the speed of the loop version at realistic clip lengths (T=150-500), where per-call overhead dominates, and the function is called once per completed set anyway, not per frame.
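The reader/predictor split described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: every name here (`LatestSlot`, `stream_decoupled`, `stream_sequential`, `stream`, `_EOF`) is hypothetical, and it assumes `cap` has OpenCV-style `read() -> (ok, frame)` and `release()` methods, `model` has a `predict(frame)` method, and Python 3.9+ for `asyncio.to_thread`. One deliberate difference from the shutdown described above: instead of cancelling the reader, this sketch sets a stop flag and awaits the reader until it returns on its own, because cancelling an asyncio task does not interrupt a `cap.read()` that is already running in a worker thread.

```python
import asyncio

# Hypothetical end-of-stream sentinel; the PR's actual EOF marker is not shown.
_EOF = object()


class LatestSlot:
    """Size-1 hand-off: a new item replaces any item not yet consumed."""

    def __init__(self) -> None:
        self._q: asyncio.Queue = asyncio.Queue(maxsize=1)

    def put_latest(self, item) -> None:
        # No await between full() and put_nowait(), so on a single event loop
        # this cannot interleave with the consumer, and it never blocks.
        if self._q.full():
            self._q.get_nowait()  # discard the stale, unconsumed frame
        self._q.put_nowait(item)

    async def get(self):
        return await self._q.get()

    def drain(self) -> None:
        while not self._q.empty():
            self._q.get_nowait()


async def _reader(cap, slot: LatestSlot, stop: asyncio.Event) -> None:
    frame_id = 0
    try:
        while not stop.is_set():
            # cap.read() blocks, so run it in a worker thread.
            ok, frame = await asyncio.to_thread(cap.read)
            if not ok:
                break
            frame_id += 1
            slot.put_latest((frame_id, frame))
    finally:
        # Always wake the consumer (EOF, stop or error). On a live feed it is
        # acceptable that this may replace one unconsumed frame.
        slot.put_latest(_EOF)


async def stream_decoupled(cap, model):
    """Live sources: yield (frame_id, prediction), always on the newest frame."""
    slot = LatestSlot()
    stop = asyncio.Event()
    reader = asyncio.create_task(_reader(cap, slot, stop))
    try:
        while True:
            item = await slot.get()
            if item is _EOF:
                break
            frame_id, frame = item
            result = await asyncio.to_thread(model.predict, frame)
            yield frame_id, result
    finally:
        stop.set()  # 1. ask the reader to exit after its current read
        try:
            await reader  # 2. returns only once no cap.read() is in flight
        finally:
            slot.drain()  # 3. drop any leftover frame or sentinel
            cap.release()  # 4. only now is releasing the capture safe


async def stream_sequential(cap, model):
    """File replay: read, then predict, one frame at a time; nothing is skipped."""
    frame_id = 0
    try:
        while True:
            ok, frame = await asyncio.to_thread(cap.read)
            if not ok:
                break
            frame_id += 1
            result = await asyncio.to_thread(model.predict, frame)
            yield frame_id, result
    finally:
        cap.release()


def stream(cap, model, is_file: bool):
    # Hypothetical dispatcher mirroring the live-vs-file split described above.
    return stream_sequential(cap, model) if is_file else stream_decoupled(cap, model)
```

A trade-off of waiting instead of cancelling: if `cap.read()` can block indefinitely (for example on a stalled RTSP stream), shutdown waits with it, so a production version would still need some bound on read time.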
5 new async tests cover the happy-path yield, drop-newest under a slow predictor, clean shutdown when the async iterator is closed (VideoCapture released, no zombie reader), EOF sentinel handling, and that fps counting still works. 42 pass / 1 skip (was 37/1). Ruff clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
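- For live sources, `_stream_one_capture` now runs `cap.read()` and `model.predict()` as independent asyncio tasks communicating through a size-1 slot with drop-newest semantics. Previously they were serialised.
- The reader keeps publishing `_latest_frame` / `_frame_id` (which drive PR #61's JPEG cache, "perf(cv): cache last-encoded JPEG per frame in healthd's camera snapshot") so the wall preview stays fresh even between predictions.
- File sources keep the original sequential path (renamed `_stream_sequential`): file replay is predictor-bounded, so double-buffering would race through the file and drop most frames.

Numbers (mocked cap + model, wall-clock accurate)

| Scenario | Before | After | Speedup |
|---|---|---|---|
| 15 fps camera + 100 ms predict | 5.98 fps | 12.80 fps | 2.14× |
| Fast reader + 200 ms predict | 4.60 fps | 23.93 fps | 5.20× |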
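As a back-of-envelope check on the serialised baseline, assume each iteration of the old loop pays one full camera frame interval in `cap.read()` ($T_{\text{read}} = 1/15$ s at 15 fps) and then the full inference time in `model.predict()` ($T_{\text{predict}} = 100$ ms):

$$
f_{\text{seq}} = \frac{1}{T_{\text{read}} + T_{\text{predict}}} = \frac{1}{\frac{1}{15}\,\mathrm{s} + \frac{1}{10}\,\mathrm{s}} = \frac{1}{\frac{1}{6}\,\mathrm{s}} = 6\ \text{fps}
$$

This agrees with the measured 5.98 fps "before" figure for that scenario: when the two stages are serialised, their latencies add rather than overlap.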
Also
Reverted a speculative vectorisation of `pose_sequence_to_features`: benchmarking showed it ran at 0.6-0.7× the speed of the loop version at realistic clip lengths (T=150-500), and the function is called once per completed set anyway. Not a hot path; the loop wins.

Test plan
Tag: `pump-cv-v0.6.0`.

🤖 Generated with Claude Code