
Feat/coroutine pool#37

Open
mahimairaja wants to merge 4 commits into main from feat/coroutine-pool

Conversation


@mahimairaja mahimairaja commented May 5, 2026

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added new benchmark results dated 2026-05-05 confirming Phase 1 gate continues to pass.
  • Tests

    • Enhanced benchmark observability with scheduler latency metrics (median, p99, max).
    • Added hardware fingerprinting (CPU, RAM, kernel, Python version) to results.
    • Implemented diagnostic notes tracking in benchmark output.

…sity bench

Phase 1 Step 9 of .agents/TODO.md asks the density benchmark to
measure event-loop responsiveness (median + p99 task scheduling
latency) and to record a hardware fingerprint (cpu, ram, kernel,
python). The previous version captured RSS, success/failure
counts, and elapsed time — these two additions close the gap so
the Step 10 results doc has every input it needs.

Scheduler latency: a background task requests a 10 ms sleep on the
loop and records the difference between the requested wakeup time
and the actual return time. Under task pressure this delta rises,
signalling event-loop starvation. The result includes sample count,
median, p99, and max in milliseconds.
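The sampling technique described above can be sketched as a standalone script. This is a minimal illustration, not the benchmark's actual code; `_sample_latency`, `SAMPLE_INTERVAL_S`, and the 0.1 s stand-in workload are hypothetical names and values.

```python
import asyncio
import statistics
import time

SAMPLE_INTERVAL_S = 0.010  # requested sleep: 10 ms, per the description above

async def _sample_latency(stop: asyncio.Event, samples: list[float]) -> None:
    """Record how late each requested 10 ms wakeup actually arrives, in ms."""
    while not stop.is_set():
        target = time.monotonic() + SAMPLE_INTERVAL_S
        await asyncio.sleep(SAMPLE_INTERVAL_S)
        # A positive delta means the loop was busy and scheduled us late.
        samples.append(max(0.0, (time.monotonic() - target) * 1000.0))

async def _run() -> list[float]:
    stop = asyncio.Event()
    samples: list[float] = []
    sampler = asyncio.create_task(_sample_latency(stop, samples))
    await asyncio.sleep(0.1)  # stand-in for the real benchmark workload
    stop.set()
    await sampler
    return samples

latencies = asyncio.run(_run())
median_ms = statistics.median(latencies)
p99_ms = sorted(latencies)[min(len(latencies) - 1, int(len(latencies) * 0.99))]
print(f"samples={len(latencies)} median={median_ms:.3f} p99={p99_ms:.3f}")
```

On an idle loop the deltas stay near zero; saturating the loop with CPU-bound callbacks drives the median and p99 up, which is exactly the starvation signal the benchmark reports.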

Hardware fingerprint: platform.processor / os.cpu_count /
psutil.virtual_memory.total / platform.uname / platform.python_version
captured per run so benchmark numbers are reproducible across
boxes.
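A fingerprint built from those calls might look like the following sketch. The field names mirror the ones listed in the review below but are illustrative; unlike the benchmark, this sketch degrades gracefully when psutil is absent.

```python
import os
import platform

def hardware_fingerprint() -> dict[str, object]:
    """Capture machine identity so benchmark numbers compare across boxes."""
    uname = platform.uname()
    fingerprint: dict[str, object] = {
        "cpu_model": platform.processor() or uname.machine,
        "cpu_count": os.cpu_count(),
        "kernel": f"{uname.system} {uname.release}",
        "python_version": platform.python_version(),
    }
    try:
        import psutil  # third-party; supplies total RAM when installed
        fingerprint["total_ram_gb"] = round(psutil.virtual_memory().total / 2**30, 1)
    except ImportError:
        fingerprint["total_ram_gb"] = None
    return fingerprint

fp = hardware_fingerprint()
print(fp)
```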

Smoke run with --sessions 5 confirms the new fields render in the
human-readable output and the JSON shape stays valid. Full test
suite 374 passed, 2 skipped; ruff and mypy strict clean.
Three back-to-back N=50 runs against the §7 success gate, captured
through the extended benchmark harness from 32bde3a (scheduler
latency + hardware fingerprint). All pass: 50/50 sessions, 0
failures, peak RSS 354-367 MB (well under the 4 GB budget),
elapsed 1.02-1.08 s.

Median scheduler latency ~1.07 ms, p99 in the 3-6 ms band. Tail
max spikes on two of three runs (50, 64 ms) were environmental
single-sample transients; the clean Run C lands at p99 3.19 ms,
max 3.36 ms.

No regression from the harness instrumentation additions. Gate
verdict unchanged: Phase 1 §7 gate met.
@mahimairaja mahimairaja self-assigned this May 5, 2026
@coderabbitai

coderabbitai Bot commented May 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 220939c8-2324-4ca0-bd06-96c584e571cb


@codecov

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
tests/benchmarks/density.py (2)

69-71: ⚡ Quick win

Replace loose metrics dictionaries with typed dataclasses.

scheduler_latency_ms: dict[str, float] and hardware: dict[str, Any] lose schema guarantees and force awkward typing (for example, "samples" is a count but stored as float). Use typed dataclasses (or typed models) for these payloads.

Suggested direction

```diff
+@dataclass(frozen=True)
+class SchedulerLatencyStats:
+    samples: int
+    median_ms: float
+    p99_ms: float
+    max_ms: float
+
+@dataclass(frozen=True)
+class HardwareFingerprint:
+    cpu_model: str
+    cpu_count: int | None
+    total_ram_gb: float | None
+    kernel: str
+    python_version: str
@@
-    scheduler_latency_ms: dict[str, float] = field(default_factory=dict)
-    hardware: dict[str, Any] = field(default_factory=dict)
+    scheduler_latency_ms: SchedulerLatencyStats | None = None
+    hardware: HardwareFingerprint | None = None
```

As per coding guidelines, "Prefer dataclass or clear typed classes over loose dictionaries for structured data" and "Prefer concrete types over Any in Python type hints."

Also applies to: 236-245

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/density.py` around lines 69 - 71, Replace the loose dicts by
defining explicit dataclasses for the scheduler latency payload and the hardware
payload, then change the fields scheduler_latency_ms and hardware on the
enclosing dataclass to use those concrete types (with default_factory for each
dataclass) instead of dict[str, float] and dict[str, Any]; ensure the scheduler
dataclass uses proper numeric types for counts (e.g., int for "samples") and
floats for times, and the hardware dataclass types each field explicitly (no
Any), update any code constructing or accessing scheduler_latency_ms and
hardware to use the new attribute names/types on the dataclass (e.g.,
MySchedulerLatency and MyHardware classes referenced where scheduler_latency_ms
and hardware were used), and apply the same replacement for the other occurrence
noted around the second block.

162-163: ⚡ Quick win

Replace suppress(CancelledError) with explicit stop-event timeout waits.

The latency sampler suppresses CancelledError, which weakens shutdown semantics and can mask unexpected cancellations. Use asyncio.wait_for(stop.wait(), timeout=...) with suppress(TimeoutError) instead, matching the pattern already used in _sample_loop_rss in the same file. This respects the stop signal immediately without swallowing exceptions.

Suggested patch

```diff
 async def _sample_loop_latency(stop: asyncio.Event, samples: list[float]) -> None:
     """Background task: measure scheduler wakeup latency in milliseconds.

     A small ``asyncio.sleep`` is requested every interval; the delta between
     the requested wakeup time and the actual return time is the loop's
     scheduling latency. Under heavy task pressure this rises and signals
     starvation of the event loop.
     """
     while not stop.is_set():
         target = time.monotonic() + _LATENCY_SAMPLE_INTERVAL_SECONDS
-        with contextlib.suppress(asyncio.CancelledError):
-            await asyncio.sleep(_LATENCY_SAMPLE_INTERVAL_SECONDS)
+        try:
+            await asyncio.wait_for(stop.wait(), timeout=_LATENCY_SAMPLE_INTERVAL_SECONDS)
+        except TimeoutError:
+            pass
+        if stop.is_set():
+            break
         actual = time.monotonic()
         samples.append(max(0.0, (actual - target) * 1000.0))
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/density.py` around lines 162 - 163, Replace the current
"with contextlib.suppress(asyncio.CancelledError): await
asyncio.sleep(_LATENCY_SAMPLE_INTERVAL_SECONDS)" in the latency sampling loop
with an explicit wait on the stop event: use "await
asyncio.wait_for(stop.wait(), timeout=_LATENCY_SAMPLE_INTERVAL_SECONDS)" and
wrap that in "contextlib.suppress(asyncio.TimeoutError)" (mirroring
_sample_loop_rss) so the loop respects immediate stop signals instead of
swallowing CancelledError; keep the existing _LATENCY_SAMPLE_INTERVAL_SECONDS
and stop symbols to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 110e5ee4-fc2d-44f8-9b36-19aa5a23a102

📥 Commits

Reviewing files that changed from the base of the PR and between b576dbb and 1d87413.

📒 Files selected for processing (2)
  • docs/benchmarks/density-v0.1.md
  • tests/benchmarks/density.py

Codex review nitpicks on tests/benchmarks/density.py:

1. Loose dicts replaced with explicit dataclasses. SchedulerLatency
   carries samples (int), median/p99/max (float ms). Hardware carries
   cpu_model (str), cpu_count (int | None), total_ram_gb (float),
   kernel (str), python_version (str). DensityResult.scheduler_latency_ms
   and .hardware are now typed Optional[SchedulerLatency] and
   Optional[Hardware] instead of dict[str, float] / dict[str, Any].
   _hardware_fingerprint() returns Hardware. _format_human accesses
   attrs by name (no key lookup). The JSON output shape stays stable
   (asdict serialization), with a samples-as-int fix; downstream
   consumers see the same keys.

2. _sample_loop_latency now waits on stop with asyncio.wait_for(...)
   and contextlib.suppress(TimeoutError), mirroring _sample_rss.
   Replaces the old contextlib.suppress(asyncio.CancelledError) +
   asyncio.sleep pattern that swallowed cancellation and ignored
   the stop event between samples. The loop now exits within one
   sample interval of stop.set() instead of always waiting the
   full interval.
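The "JSON output shape stays stable" claim in item 1 rests on `dataclasses.asdict`: field names become the same keys the dict version emitted. A minimal sketch using the field names described above (the class name and the sample values are illustrative):

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class SchedulerLatency:
    samples: int      # a count, kept as int rather than coerced to float
    median_ms: float
    p99_ms: float
    max_ms: float

stats = SchedulerLatency(samples=120, median_ms=1.07, p99_ms=3.19, max_ms=3.36)
# asdict() turns the dataclass into a plain dict, so the serialized keys
# match what the old dict[str, float] version produced.
payload = json.loads(json.dumps(asdict(stats)))
print(payload)
```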

Verification:
- uv run python tests/benchmarks/density.py --sessions 5 renders
  the same human-readable output with the new dataclass fields
- uv run python tests/benchmarks/density.py --sessions 3 --json
  produces the same shape (samples is int now, not float)
- uv run pytest: 374 passed, 2 skipped
- uv run ruff check / ruff format: clean
- uv run mypy src/: clean (the benchmark is in tests/, not part
  of strict typing surface, but field changes don't ripple to
  src/ either way)
The density benchmark imports psutil at module load time
(tests/benchmarks/density.py:41 — added in 32bde3a) to capture
total RAM in the hardware fingerprint. psutil currently resolves
transitively through livekit-agents, but that's a fragile guarantee
and the benchmark is documented as directly runnable
('uv run python tests/benchmarks/density.py'). Declare it
explicitly under [dependency-groups].dev so contributors get a
clean source install with the benchmark working out of the box.
End-user installs are unaffected (psutil stays out of the runtime
extras and the wheel).
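In pyproject.toml terms, the change amounts to an entry along these lines (the version pin and the surrounding group contents are illustrative; only the explicit psutil declaration is the point):

```toml
[dependency-groups]
dev = [
    "psutil>=7",
    # ...existing dev dependencies unchanged...
]
```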

Codex review (P2 finding from the feat/coroutine-pool branch
review) — undeclared import.

Verification:
- uv sync --group dev resolves cleanly; psutil 7.2.2 installed
- uv run pytest: 374 passed, 2 skipped
- uv run ruff check: clean
- uv run mypy src/: clean
- uv run python tests/benchmarks/density.py --sessions 3
  exits 0 with the hardware fingerprint rendered