Feat/coroutine pool #37
Conversation
…sity bench

Phase 1 Step 9 of .agents/TODO.md asks the density benchmark to measure event-loop responsiveness (median + p99 task scheduling latency) and to record a hardware fingerprint (CPU, RAM, kernel, Python). The previous version captured RSS, success/failure counts, and elapsed time; these two additions close the gap so the Step 10 results doc has every input it needs.

- Scheduler latency: a background task requests a 10 ms sleep on the loop and records the difference between the requested wakeup time and the actual return time. Under task pressure this delta rises and signals event-loop starvation. The result includes sample count, median, p99, and max in milliseconds.
- Hardware fingerprint: `platform.processor` / `os.cpu_count` / `psutil.virtual_memory().total` / `platform.uname` / `platform.python_version`, captured per run so benchmark numbers are reproducible across machines.

A smoke run with `--sessions 5` confirms the new fields render in the human-readable output and the JSON shape stays valid. Full test suite: 374 passed, 2 skipped; ruff and mypy strict clean.
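The latency-sampling technique described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the harness's actual code: `sample_latency`, `SAMPLE_INTERVAL`, and the workload stand-in are hypothetical names, and the real benchmark's interval constant and stop handling may differ.

```python
import asyncio
import statistics
import time

SAMPLE_INTERVAL = 0.010  # request a 10 ms sleep per sample (assumed interval)


async def sample_latency(stop: asyncio.Event, samples: list[float]) -> None:
    """Record scheduler wakeup latency: actual wakeup minus requested wakeup, in ms."""
    while not stop.is_set():
        target = time.monotonic() + SAMPLE_INTERVAL
        await asyncio.sleep(SAMPLE_INTERVAL)
        # Under a loaded loop the sleep returns late; the delta is the latency.
        samples.append(max(0.0, (time.monotonic() - target) * 1000.0))


async def main() -> dict[str, float]:
    samples: list[float] = []
    stop = asyncio.Event()
    sampler = asyncio.create_task(sample_latency(stop, samples))
    await asyncio.sleep(0.2)  # stand-in for the real benchmark workload
    stop.set()
    await sampler
    # "inclusive" keeps cut points inside the observed data range;
    # index 98 of the 100-quantile cut points is the 99th percentile.
    if len(samples) >= 2:
        p99 = statistics.quantiles(samples, n=100, method="inclusive")[98]
    else:
        p99 = samples[0] if samples else 0.0
    return {
        "samples": len(samples),
        "median_ms": statistics.median(samples),
        "p99_ms": p99,
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The key design point is that the sampler never measures the workload directly; it only measures how late the loop wakes it up, which is exactly the starvation signal the benchmark wants.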
Three back-to-back N=50 runs against the §7 success gate, captured through the extended benchmark harness from 32bde3a (scheduler latency + hardware fingerprint). All three pass: 50/50 sessions, 0 failures, peak RSS 354-367 MB (well under the 4 GB budget), elapsed 1.02-1.08 s. Median scheduler latency is ~1.07 ms, with p99 in the 3-6 ms band. Tail-max spikes on two of the three runs (50 and 64 ms) were single-sample environmental transients; the clean Run C lands at p99 3.19 ms, max 3.36 ms. No regression from the harness instrumentation additions. Gate verdict unchanged: the Phase 1 §7 gate is met.
Important: Review skipped. Auto reviews are disabled on this repository; check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Path: .coderabbit.yaml, Review profile: CHILL, Plan: Pro.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
🧹 Nitpick comments (2)
tests/benchmarks/density.py (2)
69-71: ⚡ Quick win: Replace loose metrics dictionaries with typed dataclasses.

`scheduler_latency_ms: dict[str, float]` and `hardware: dict[str, Any]` lose schema guarantees and force awkward typing (for example, `"samples"` is a count but stored as a float). Use typed dataclasses (or typed models) for these payloads.

Suggested direction:

```diff
+@dataclass(frozen=True)
+class SchedulerLatencyStats:
+    samples: int
+    median_ms: float
+    p99_ms: float
+    max_ms: float
+
+@dataclass(frozen=True)
+class HardwareFingerprint:
+    cpu_model: str
+    cpu_count: int | None
+    total_ram_gb: float | None
+    kernel: str
+    python_version: str
@@
-    scheduler_latency_ms: dict[str, float] = field(default_factory=dict)
-    hardware: dict[str, Any] = field(default_factory=dict)
+    scheduler_latency_ms: SchedulerLatencyStats | None = None
+    hardware: HardwareFingerprint | None = None
```

As per coding guidelines, "Prefer `dataclass` or clear typed classes over loose dictionaries for structured data" and "Prefer concrete types over `Any` in Python type hints."

Also applies to: 236-245
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/density.py` around lines 69 - 71, Replace the loose dicts by defining explicit dataclasses for the scheduler latency payload and the hardware payload, then change the fields scheduler_latency_ms and hardware on the enclosing dataclass to use those concrete types (with default_factory for each dataclass) instead of dict[str, float] and dict[str, Any]; ensure the scheduler dataclass uses proper numeric types for counts (e.g., int for "samples") and floats for times, and the hardware dataclass types each field explicitly (no Any), update any code constructing or accessing scheduler_latency_ms and hardware to use the new attribute names/types on the dataclass (e.g., MySchedulerLatency and MyHardware classes referenced where scheduler_latency_ms and hardware were used), and apply the same replacement for the other occurrence noted around the second block.
162-163: ⚡ Quick win: Replace `suppress(CancelledError)` with explicit stop-event timeout waits.

The latency sampler suppresses `CancelledError`, which weakens shutdown semantics and can mask unexpected cancellations. Use `asyncio.wait_for(stop.wait(), timeout=...)` with `suppress(TimeoutError)` instead, matching the pattern already used in `_sample_loop_rss` in the same file. This respects the stop signal immediately without swallowing exceptions.

Suggested patch:

```diff
 async def _sample_loop_latency(stop: asyncio.Event, samples: list[float]) -> None:
     """Background task: measure scheduler wakeup latency in milliseconds.

     A small ``asyncio.sleep`` is requested every interval; the delta between
     the requested wakeup time and the actual return time is the loop's
     scheduling latency. Under heavy task pressure this rises and signals
     starvation of the event loop.
     """
     while not stop.is_set():
         target = time.monotonic() + _LATENCY_SAMPLE_INTERVAL_SECONDS
-        with contextlib.suppress(asyncio.CancelledError):
-            await asyncio.sleep(_LATENCY_SAMPLE_INTERVAL_SECONDS)
+        try:
+            await asyncio.wait_for(stop.wait(), timeout=_LATENCY_SAMPLE_INTERVAL_SECONDS)
+        except TimeoutError:
+            pass
+        if stop.is_set():
+            break
         actual = time.monotonic()
         samples.append(max(0.0, (actual - target) * 1000.0))
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/density.py` around lines 162 - 163, Replace the current "with contextlib.suppress(asyncio.CancelledError): await asyncio.sleep(_LATENCY_SAMPLE_INTERVAL_SECONDS)" in the latency sampling loop with an explicit wait on the stop event: use "await asyncio.wait_for(stop.wait(), timeout=_LATENCY_SAMPLE_INTERVAL_SECONDS)" and wrap that in "contextlib.suppress(asyncio.TimeoutError)" (mirroring _sample_loop_rss) so the loop respects immediate stop signals instead of swallowing CancelledError; keep the existing _LATENCY_SAMPLE_INTERVAL_SECONDS and stop symbols to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 110e5ee4-fc2d-44f8-9b36-19aa5a23a102
📒 Files selected for processing (2)
- docs/benchmarks/density-v0.1.md
- tests/benchmarks/density.py
Codex review nitpicks on tests/benchmarks/density.py:

1. Loose dicts replaced with explicit dataclasses. `SchedulerLatency` carries `samples` (int) and `median`/`p99`/`max` (float, ms). `Hardware` carries `cpu_model` (str), `cpu_count` (int | None), `total_ram_gb` (float), `kernel` (str), `python_version` (str). `DensityResult.scheduler_latency_ms` and `.hardware` are now typed `Optional[SchedulerLatency]` and `Optional[Hardware]` instead of `dict[str, float]` / `dict[str, Any]`. `_hardware_fingerprint()` returns `Hardware`. `_format_human` accesses attributes by name (no key lookup). The JSON output shape stays stable (`asdict` serialization), with a samples-as-int fix; downstream consumers see the same keys.

2. `_sample_loop_latency` now waits on `stop` with `asyncio.wait_for(...)` and `contextlib.suppress(TimeoutError)`, mirroring `_sample_rss`. This replaces the old `contextlib.suppress(asyncio.CancelledError)` + `asyncio.sleep` pattern, which swallowed cancellation and ignored the stop event between samples. The loop now exits within one sample interval of `stop.set()` instead of always waiting out the full interval.

Verification:
- `uv run python tests/benchmarks/density.py --sessions 5` renders the same human-readable output with the new dataclass fields
- `uv run python tests/benchmarks/density.py --sessions 3 --json` produces the same shape (`samples` is int now, not float)
- `uv run pytest`: 374 passed, 2 skipped
- `uv run ruff check` / `ruff format`: clean
- `uv run mypy src/`: clean (the benchmark lives in tests/, outside the strict typing surface, but the field changes don't ripple into src/ either way)
The density benchmark imports psutil at module load time (tests/benchmarks/density.py:41, added in 32bde3a) to capture total RAM in the hardware fingerprint. psutil currently resolves transitively through livekit-agents, but that is a fragile guarantee, and the benchmark is documented as directly runnable (`uv run python tests/benchmarks/density.py`). Declare it explicitly under `[dependency-groups].dev` so contributors get a clean source install with the benchmark working out of the box. End-user installs are unaffected: psutil stays out of the runtime extras and the wheel.

Codex review (P2 finding from the feat/coroutine-pool branch review): undeclared import.

Verification:
- `uv sync --group dev` resolves cleanly; psutil 7.2.2 installed
- `uv run pytest`: 374 passed, 2 skipped
- `uv run ruff check`: clean
- `uv run mypy src/`: clean
- `uv run python tests/benchmarks/density.py --sessions 3` exits 0 with the hardware fingerprint rendered
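The declaration might look like the following pyproject.toml fragment (PEP 735 dependency groups, as consumed by `uv sync --group dev`); the version floor shown is illustrative, not taken from the repo, and the group would normally also list the project's other dev tools.

```toml
[dependency-groups]
dev = [
    # Needed by tests/benchmarks/density.py for the hardware fingerprint
    # (total RAM via psutil.virtual_memory().total).
    "psutil>=7.0",
]
```

Because dependency groups are not part of the published package metadata, this keeps psutil out of end-user installs while guaranteeing it for contributors.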