diff --git a/.agents/JOURNAL.md b/.agents/JOURNAL.md
deleted file mode 100644
index cf22a0b..0000000
--- a/.agents/JOURNAL.md
+++ /dev/null
@@ -1,2292 +0,0 @@
-# OpenRTC-Python v0.1 — Implementation Journal
-
-Append-only log. One entry per Ralph Loop iteration. Newest entries
-at the bottom.
-
----
-
-## 2026-05-03 06:35 UTC — refactor: delete v0.1 Phase 0 dead code
-Files: src/openrtc/_version.py (deleted, was 3 LOC, untracked .gitignore entry),
-       src/openrtc/pool.py (-19 LOC: removed `_resolve_agent` and `_handle_session`),
-       src/openrtc/cli_app.py (-4 LOC: dropped underscore re-exports from imports + `__all__`),
-       tests/test_routing.py (+1 import; 14 call-site rewrites to module-level helpers),
-       tests/test_pool.py (5 call-site rewrites to `pool_module._run_universal_session`),
-       tests/test_cli.py (1 import path rewrite cli_app -> cli_livekit).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Test rewrites are the explicit behavior change required by this
-task (PROMPT.md exception). Tests now call module-level
-`_resolve_agent_config(pool._agents, ctx)` and
-`_run_universal_session(pool._runtime_state, ctx)` directly — same
-coverage, no wrapper layer. Branch override: staying on
-feat/light-websocket per user instruction (overrides PROMPT.md
-v0.1/<slug> convention).
-
-## 2026-05-03 06:48 UTC — refactor: rename provider_types.py to types.py
-Files: git mv src/openrtc/provider_types.py -> src/openrtc/types.py,
-       src/openrtc/__init__.py (1 import path),
-       src/openrtc/pool.py (1 import path; ruff-fixed import order),
-       src/openrtc/cli_params.py (1 import path),
-       tests/test_cli.py (1 import path; ruff-fixed import order),
-       README.md, CLAUDE.md (doc references).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Pure rename per Phase 0 refactor rules (no behavior change).
-Used `git mv` so blame is preserved. ruff auto-fix reordered the now-
-alphabetically-misplaced imports in pool.py and tests/test_cli.py.
-docs/audit-2026-05-02.md left unchanged (historical snapshot).
-
-## 2026-05-03 07:00 UTC — refactor: move pool.py into core/ package
-Files: src/openrtc/core/__init__.py (new, empty),
-       git mv src/openrtc/pool.py -> src/openrtc/core/pool.py,
-       7 src import sites (__init__.py, cli_app.py, cli_dashboard.py,
-       cli_livekit.py, cli_reporter.py, cli_params.py docstring,
-       resources.py TYPE_CHECKING block),
-       4 test sites (test_pool.py: 5 patterns; test_routing.py: 2;
-       test_resources.py: 1; conftest.py docstring),
-       README.md (project tree), CLAUDE.md, CONTRIBUTING.md.
-Tests: 130/130 pass. ruff: clean (auto-fixed 2 import-order issues
-in cli_app.py and cli_reporter.py). mypy: clean.
-Notes: No behavior change. `from openrtc import AgentPool` still
-works because __init__.py re-exports from .core.pool. All
-monkeypatch path strings updated from "openrtc.pool.X" to
-"openrtc.core.pool.X". docs/audit-2026-05-02.md historical paths
-left as-is.
-
-## 2026-05-03 07:15 UTC — refactor: extract core/config.py from pool.py
-Files: src/openrtc/core/config.py (new, 158 LOC),
-       src/openrtc/core/pool.py (-126 LOC: removed AgentConfig,
-       AgentDiscoveryConfig, agent_config, _normalize_optional_name,
-       _AgentType TypeVar, _AGENT_METADATA_ATTR; added imports +
-       __all__ for stable internal surface),
-       src/openrtc/__init__.py (split AgentPool import from the
-       config types), src/openrtc/cli_dashboard.py,
-       src/openrtc/cli_livekit.py, src/openrtc/resources.py
-       (TYPE_CHECKING block) — all updated to import from
-       core.config.
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: AgentConfig.__post_init__/__getstate__/__setstate__ use
-late imports of _serialize_provider_value, _deserialize_provider_value,
-_build_agent_class_ref, _resolve_agent_class to avoid a circular
-import with core.pool. These late imports are temporary — they
-collapse to module-level imports when core/serialization.py is
-extracted in the next refactor task. Comment in the file explains.
-Public API unchanged.
-
-## 2026-05-03 07:30 UTC — refactor: extract core/routing.py from pool.py
-Files: src/openrtc/core/routing.py (new, 91 LOC: _resolve_agent_config,
-       _agent_name_from_metadata, _agent_name_from_mapping,
-       _get_registered_agent, _METADATA_AGENT_KEYS),
-       src/openrtc/core/pool.py (-77 LOC: removed those functions and
-       the constant; now imports _resolve_agent_config from .routing.
-       ruff auto-removed the unused json import.),
-       tests/test_routing.py (split the import — _resolve_agent_config
-       now from openrtc.core.routing, _run_universal_session still
-       from openrtc.core.pool).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: routing.py imports AgentConfig from core.config (no cycle)
-and JobContext from livekit.agents. _run_universal_session in
-pool.py keeps using _resolve_agent_config via the new import.
-Public API unchanged.
-
-## 2026-05-03 07:50 UTC — refactor: extract core/discovery.py from pool.py
-Files: src/openrtc/core/discovery.py (new, 89 LOC: _load_module_from_path,
-       _discovered_module_name, _try_get_module_path,
-       _load_agent_module, _find_local_agent_subclass,
-       _resolve_discovery_metadata),
-       src/openrtc/core/pool.py (-86 LOC: removed three module-level
-       loaders and three former AgentPool methods; added imports from
-       .discovery; AgentPool.discover() now calls free functions.
-       ruff auto-removed inspect, sys, hashlib.sha1, typing.cast,
-       _AGENT_METADATA_ATTR, _discovered_module_name unused imports),
-       tests/test_pool.py (added `import openrtc.core.discovery as
-       discovery_module`; rewrote 5 references from pool_module.X to
-       discovery_module.X for the moved symbols).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: The three former AgentPool instance methods
-(_resolve_discovery_metadata, _load_agent_module,
-_find_local_agent_subclass) are now free functions — none of them
-used `self`, so the conversion is mechanical and behavior-preserving.
-_resolve_discovery_metadata dropped the unused `module` parameter
-along the way (only agent_cls is read). Public API unchanged.
-
-## 2026-05-03 08:10 UTC — refactor: extract core/serialization.py from pool.py
-Files: src/openrtc/core/serialization.py (new, 188 LOC: _AgentClassRef,
-       _ProviderRef, _PROVIDER_REF_KEYS, _OPENAI_NOT_GIVEN_TYPE,
-       _serialize_provider_value, _deserialize_provider_value,
-       _try_build_provider_ref, _extract_provider_kwargs,
-       _filter_provider_kwargs, _is_not_given, _build_agent_class_ref,
-       _resolve_agent_class, _resolve_qualname),
-       src/openrtc/core/pool.py (~150 LOC removed: all the serialization
-       block plus the openai NotGiven import; ruff auto-removed the
-       now-unused ModuleType import after fixup),
-       src/openrtc/core/config.py (TYPE_CHECKING block dropped; late
-       imports inside __post_init__/__getstate__/__setstate__ collapsed
-       to module-level imports from core.serialization. Also picked up
-       _resolve_discovery_metadata from discovery.py — see below.),
-       src/openrtc/core/discovery.py (dropped _resolve_discovery_metadata
-       and the now-unused `cast`, `_AGENT_METADATA_ATTR`,
-       `AgentDiscoveryConfig` imports — required to break a new cycle
-       config -> serialization -> discovery -> config),
-       tests/test_pool.py (added `import openrtc.core.serialization as
-       serialization_module`; rewrote 3 references plus the
-       `from openrtc.core.pool import _is_not_given` to point at the
-       new module).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: serialization.py uses `importlib.import_module("pickle")` for
-the spawn-safety probe so the behavior matches what pool.py already
-did. The `_resolve_discovery_metadata` function moved out of
-discovery.py and into config.py because it only reads
-`_AGENT_METADATA_ATTR` and returns `AgentDiscoveryConfig` — both
-already in config.py. This kept the import graph acyclic
-(config -> serialization -> discovery; pool depends on all three).
-Public API unchanged. Note: the previous iteration's commit
-(b1d9307) shipped the code already; this entry catches the journal
-up after a hook blocked the inline edit.
-
-## 2026-05-04 07:15 UTC — chore(pre-commit): add `actionlint` hook (v1.7.7)
-Files: .pre-commit-config.yaml (+1 hook block, ahead of the
-codespell block).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-actionlint: clean against all 9 workflows
-(audit, bench, build, canary, deploy-docs, docs, lint,
-publish, test).
-Notes: actionlint validates GitHub Actions workflow YAML
-syntax + semantics (action inputs/outputs, expressions,
-shell-script `run:` bodies via shellcheck, security-relevant
-patterns like the script-injection class). The pre-commit
-hook from rhysd/actionlint runs the upstream Go binary, so
-no Docker dependency. v1.7.7 matches the latest stable
-upstream tag at the time of writing.
-Pinning: rev is exact, not a moving tag, so pre-commit
-caches the binary deterministically.
-Why now: this loop has touched every workflow file at least
-once (audit, build, the existing test/lint workflows for
-coverage-gate bumps); a typo in the YAML would only surface
-on the next push to main and might fail in confusing ways.
-The hook now catches that locally before commit.
-
-## 2026-05-04 07:00 UTC — chore(pre-commit): add `codespell` hook
-Files: .pre-commit-config.yaml (+1 hook block).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-codespell: clean against the full repo after the
-ignore-words-list tweak.
-Notes: `codespell` catches simple word-level typos in source,
-comments, docs, and journal entries. Pinned at v2.4.2 to match
-the latest stable upstream release. Skip configuration:
-- `--skip=*.lock,package-lock.json,assets,htmlcov,dist,build,
-  .mypy_cache,.ruff_cache` excludes auto-generated lockfiles
-  (the npm `package-lock.json` had 3 false-positives on
-  the canonical `devlop` package name) and binary asset
-  directories.
-- `--ignore-words-list=ist` whitelists IST (Indian Standard
-  Time abbreviation used in cron comments and journal entries
-  for the maintainer's local timezone).
-No CI workflow added: pre-commit.ci bot is configured at the
-bottom of the same config file and will run codespell
-automatically on every PR alongside the existing ruff hooks.
-
-## 2026-05-04 06:45 UTC — ci(audit): add `pip-audit --strict` workflow (per-PR + weekly)
-Files: .github/workflows/audit.yml (new, 47 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Local validation: `uv tool run pip-audit --strict` against
-the active `.venv` reports "No known vulnerabilities found".
-Notes: Two triggers cover two real failure modes:
-1. Pull request: catches a contributor pulling in a dep with
-   a known CVE before merge.
-2. Schedule (Monday 09:00 IST = 03:30 UTC): catches CVEs
-   disclosed *after* a clean merge, when a new advisory is
-   published or a transitive dep updates and inherits the issue.
-   This is the most common failure mode in practice; the weekly
-   cadence matches Dependabot's PR rhythm.
-`--strict` flag means a dependency that pip-audit cannot
-collect or resolve fails the run instead of being skipped
-with a warning. The alternative is silent rot: an unaudited
-dep sits in the dep tree indefinitely. Strict + Dependabot +
-a person watching the weekly run is the right combination.
-The workflow has no `${{ github.event.* }}` interpolation,
-so the script-injection class (CWE-94 in GitHub Actions)
-doesn't apply — noted inline at the top of the file.
-
-## 2026-05-04 06:30 UTC — ci(build): add wheel smoke-install step
-Files: .github/workflows/build.yml (+1 step, ~17 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Local validation: built the wheel, installed it into
-`/tmp/openrtc-smoke` (uv venv), and ran `python -c "import
-openrtc; print(openrtc.__version__)"` -> prints
-`0.1.0.dev246+g78a5c7919.d20260503` and the live AgentPool /
-agent_config refs. All four assertions in the embedded
-heredoc pass.
-Notes: `twine check` (already in the workflow) validates
-metadata only; this new step validates the runtime file
-layout — catches "wheel built but missed a package",
-"module-load-time `from livekit.agents import Agent` couldn't
-resolve" and similar bugs. Tried `--no-deps` first to avoid
-pulling livekit-agents transitively over the network; that
-doesn't work because `openrtc/__init__.py` imports `Agent`
-from livekit.agents at load time, so a clean install
-cannot succeed without runtime deps. The full-deps install
-adds ~30s to the workflow, well below the ~5min CI budget.
-
-## 2026-05-04 06:15 UTC — ci(build): add per-PR build-sanity workflow
-Files: .github/workflows/build.yml (new, 47 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Local check: `uv build` produces
-`openrtc-0.1.0.dev245+g9692c0de9.tar.gz` and a matching
-`-py3-none-any.whl` (hatch-vcs-derived); `twine check dist/*`
-PASSED for both artifacts.
-Notes: publish.yml already runs `uv build` on release events,
-but that catches packaging regressions at the worst possible
-time — after the tag has been pushed. The new build.yml runs
-on every PR and push to main, so a broken pyproject.toml /
-missing file / malformed metadata fails review long before
-release. Workflow steps:
-1. checkout (fetch-depth=0 so hatch-vcs sees the tag history);
-2. uv setup;
-3. `uv build`;
-4. `twine check dist/*` (validates the metadata that PyPI's
-   warehouse will check on upload — catches missing
-   description, bad classifiers, non-renderable README);
-5. upload dist/ as a 7-day artifact for reviewer inspection.
-The only `${{ ... }}` in the workflow is `github.run_id`
-(numeric, not user-controllable), so the script-injection
-class of vulnerability doesn't apply — noted inline at the
-top of the file.
-
-## 2026-05-04 06:00 UTC — docs(changelog): record dev-experience improvements under v0.1.0
-Files: docs/changelog.md (+30 LOC: new "Developer experience"
-subsection inside the v0.1.0 [Unreleased] block).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: The block is organized as a bulleted list grouped by
-category (coverage, types, linting, pre-commit, make,
-Dependabot, repo-meta files) and prefixed with a short
-"user-facing behavior is unchanged by these" caveat so a
-reader scanning the changelog for migration impact knows
-they can skip the section. The publish workflow's auto-
-prepend step on release will carry this whole block into
-the versioned `## [0.1.0] - YYYY-MM-DD` section, so future
-maintainers reading the changelog see the complete v0.1.0
-delta in one place.
-
-## 2026-05-04 05:45 UTC — chore(editorconfig): add `.editorconfig` for cross-editor consistency
-Files: .editorconfig (new, 26 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: Settings match what already exists in the repo so
-existing files don't need a sweeping reformat:
-- Python + TOML: 4-space indent (PEP 8 / ruff default for .py;
-  4-space matches pyproject.toml's existing style for .toml).
-- YAML / JSON / Markdown / shell: 2-space indent (community
-  default and matches existing workflow YAMLs).
-- Makefile: tab indent (required by make).
-- All files: UTF-8, LF endings, final newline, trailing
-  whitespace stripped.
-First draft used 2-space for TOML; verified existing
-pyproject.toml uses 4-space and corrected before committing.
-Editor support is built in for JetBrains IDEs and Neovim,
-ships as an optional bundled plugin in recent Vim, and is
-available for VS Code via the EditorConfig extension, so
-onboarding is at most a one-time plugin install.
-
-## 2026-05-04 05:30 UTC — docs(github): add PR template
-Files: .github/PULL_REQUEST_TEMPLATE.md (new, ~28 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: GitHub auto-populates new PR descriptions with this
-template. The checklist is intentionally short:
-- "type of change" classifier so the reviewer knows what
-  shape of review to apply (a CI tooling PR gets a different
-  review than a breaking-change PR);
-- four verification checkboxes hitting the most common
-  PR-rejection reasons (no `make ci`, no tests, no docs
-  update, no changelog entry);
-- a "notes for the reviewer" section so contributors can
-  flag tradeoffs / deferred follow-ups without it feeling
-  like a separate document.
-Avoided "checklist bureaucracy" (no force-push policy
-sections, no labeling rules, no contributor-license
-agreements) since this is a small project and that overhead
-discourages drive-by contributions.
-
-## 2026-05-04 05:15 UTC — docs(contributing): refresh for v0.1 dev workflow
-Files: CONTRIBUTING.md (~25 LOC added inside the "Common
-development commands" section).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: Three additions to the dev-workflow section:
-1. The mypy section now mentions `strict = true` so
-   contributors know to expect untyped-def / implicit-Optional
-   failures rather than warnings.
-2. New "Run every CI gate at once" subsection documents the
-   `make ci` aggregate target with the rationale (cheapest
-   checks first short-circuit on failure).
-3. New "Pre-commit hooks" subsection documents the
-   `uv run pre-commit install` one-time setup, lists the
-   hooks (ruff + ruff-format + file hygiene +
-   mypy --strict src/), and calls out that the mypy hook
-   skips when only tests/docs/workflows change.
-The CONTRIBUTING workflow now matches what newcomers will
-actually experience when they clone, install, and try to
-push their first PR.
-
-## 2026-05-04 05:00 UTC — docs(security): add SECURITY.md vulnerability disclosure policy
-Files: SECURITY.md (new, ~50 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for tests).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: Documents the intake path for security reports (GitHub
-Security Advisories preferred for coordinated disclosure;
-email to `hello@mahimai.dev` as fallback). Supported-versions
-matrix says 0.1.x latest patch only, 0.0.x superseded, which
-matches what we'll actually backport. SLA is honest about
-single-maintainer reality: 3 business days to acknowledge, 7
-to triage; high-severity reports prioritized. Out-of-scope
-section steers upstream livekit-agents reports + operator
-misconfig (e.g. exposing API secrets via DEBUG logging) +
-documented backpressure-as-DoS away to the right place.
-GitHub auto-surfaces this file in the Security tab.
-
-## 2026-05-04 04:45 UTC — chore(deps): add Dependabot config (weekly pip + github-actions)
-Files: .github/dependabot.yml (new, 53 LOC).
-Tests: 374/374 pass + 2 skipped (no-op for the test suite).
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: Two ecosystems configured:
-- pip (covers uv-managed deps via pyproject.toml): bundles
-  dev-tooling bumps (ruff/mypy/pytest/pytest-* / pre-commit
-  / rich / typer) so the typical week is one PR not many;
-  open-pull-requests-limit=5 caps the noise.
-- github-actions: bumps pinned action versions (e.g.
-  actions/checkout@v4) when upstream cuts a release;
-  open-pull-requests-limit=3.
-Both run Monday 08:00 IST so PRs land at week start.
-`livekit-agents` is explicitly ignored — design §9.1 calls
-out that we hook internal-ish surfaces (ProcPool,
-JobExecutor protocol) and the upstream pin must move
-deliberately, not auto-bump. The existing canary CI job
-already watches the next minor and surfaces breakage as
-informational.
-
-## 2026-05-04 04:30 UTC — chore(make): add aggregate `make ci` target
-Files: Makefile (+1 target, +`ci` in the .PHONY list).
-Tests: 374/374 pass + 2 skipped via the new aggregate target.
-Coverage: 100.00%. ruff: clean. mypy --strict: clean.
-Notes: `make ci` runs `lint format-check typecheck test` in the
-same order CI does, so a contributor can run one command before
-`git push` to catch every CI failure locally. Cheapest checks
-run first (ruff is sub-second), the most expensive last
-(test+coverage at ~5s). Make's prerequisite chain short-circuits
-on the first failure, so a broken lint doesn't waste time
-running the test suite. The new line in `make help`:
-`ci            Run every gate CI runs (lint, format, typecheck,
-test+coverage)`.
-
-## 2026-05-04 04:15 UTC — chore(pre-commit): add local mypy `--strict` hook for src/
-Files: .pre-commit-config.yaml (+1 local hook block).
-Tests: 374/374 pass + 2 skipped. Coverage: 100.00%. ruff:
-clean. mypy --strict: clean. The new hook also fires green:
-`mypy --strict (src)......................................................Passed`.
-Notes: The hook is `language: system` so it reuses the active
-`uv` environment instead of pre-commit installing its own mypy
-copy (avoids double-install + version-skew between local and
-CI). `pass_filenames: false` because per-file mypy can't
-resolve cross-module types — strict mode needs the full src/
-tree to type-check correctly. The `files:` glob is restricted
-to source code or pyproject.toml so commits that only touch
-tests/, docs/, or workflow YAMLs don't pay the ~3s mypy
-cost. Now contributors get the same hard typecheck gate
-locally that CI applies to every PR; before this, type
-errors only surfaced after pushing.
-
-## 2026-05-04 04:00 UTC — chore(lint): enable ruff `BLE`+`A` rulesets
-Files: pyproject.toml (`select` += `BLE`, `A`);
-src/openrtc/execution/coroutine.py (added the same noqa
-comment to aclose's `except Exception:` that join already
-had); tests/test_pool.py (added noqa to the
-`globals` / `locals` parameter names in
-`_import_without_silero` since they intentionally match
-__import__'s signature).
-Tests: 374/374 pass + 2 skipped. Coverage: 100.00%. ruff:
-clean. mypy --strict: clean.
-Notes: Considered ASYNC, TRY, ERA in the same batch but
-backed off:
-- ASYNC110 fires 12 times in test polling loops where
-  `while not condition: await asyncio.sleep(...)` is the
-  intent (observing pool state from outside without making
-  the pool expose Events). The rule's suggestion is wrong
-  for that pattern.
-- TRY003 fires 77 times on inline error messages. Refactoring
-  to custom exception classes is a major design choice
-  that's out of v0.1 scope.
-- TRY400 fires 6 times suggesting `logging.exception` over
-  `logging.error` — but those callers want clean operator
-  messages without stack traces, so the rule is wrong here.
-BLE and A together surfaced 3 real-but-intentional cases
-that fit cleanly under inline noqa comments. The noqas
-document intent at the call site so future contributors know
-the rule was deliberately overridden.
-
-## 2026-05-04 03:45 UTC — chore(lint): enable ruff `RET`+`PERF`+`PIE`+`ICN`+`TID` rulesets
-Files: pyproject.toml (`select` += 5 codes);
-src/openrtc/execution/coroutine.py (drop `return None` from
-`CoroutineJobExecutor.initialize`).
-Tests: 374/374 pass + 2 skipped. Coverage: 100.00%. ruff:
-clean. mypy --strict: clean.
-Notes: Total churn was 1 line of source change (RET501 in
-initialize). The other 4 rulesets came in clean — meaning
-the codebase already followed the conventions they enforce.
-PERF flags performance anti-patterns (e.g. `try`/`except`
-inside a loop body, or an append loop that should be a
-comprehension); PIE catches small style mistakes
-(unnecessary placeholder, duplicate class field
-definitions); ICN
-enforces standard import aliases (`numpy as np` etc., not a
-factor here); TID guards against banned imports / relative
-import overuse. Enabling them now is cheap insurance against
-regressions in future PRs without paying any cleanup cost.
-
-## 2026-05-04 03:30 UTC — chore(lint): enable ruff `PT` (pytest-style) ruleset
-Files: pyproject.toml (`select` += `PT`);
-tests/integration/conftest.py (PT022: `yield` -> `return` in
-`livekit_dev_server`; dropped now-unused `Iterator` import +
-return annotation);
-tests/test_coroutine_server.py (PT011: added `match=".*"` and
-`# noqa: PT011` to the deliberately broad `pytest.raises(Exception)`);
-tests/test_pool.py (PT011: added `match="already registered"`
-to the duplicate-add raise);
-tests/test_coroutine_skeleton.py (PT018: split 4 composite
-`assert ... and ...` statements into separate asserts so
-failure messages pinpoint the broken clause).
-Tests: 374/374 pass + 2 skipped. Coverage: 100.00%. ruff:
-clean. mypy --strict: clean.
-Notes: PT022 fix is the only behavior change worth flagging:
-the fixture used to be a generator with no teardown work,
-so converting it to a plain function that returns its value
-matches what the fixture really is. The `match=".*"` workaround for the
-unavoidable broad raise (PT011) is the documented escape
-hatch when the test intent is "any failure path is fine."
-
-## 2026-05-04 03:15 UTC — chore(lint): enable ruff `SIM` ruleset (nested `with` excepted)
-Files: pyproject.toml (`select` += `SIM`; `ignore` += `SIM117`
-with inline comment explaining why);
-tests/benchmarks/density.py (+1 `import contextlib`; replaces
-`try: ... except TimeoutError: pass` around the RSS sampler's
-wait_for with `contextlib.suppress(TimeoutError)`);
-tests/integration/test_concurrent_real_calls.py (+1
-`import contextlib`; replaces the same pattern around the
-runner cleanup with `contextlib.suppress(...)`);
-tests/test_coroutine_coverage.py (+1 `import contextlib`;
-replaces the cancellation cleanup pattern in
-test_consume_cancelled_task_exception_swallows_invalid_state_error).
-Tests: 374/374 pass + 2 skipped. Coverage: 100.00%. ruff:
-clean. mypy --strict: clean.
-Notes: Considered enabling RET, PT, PERF as well but the
-mismatch is minor (1 RET501, 4 PT018 spread across tests)
-and the readability of split asserts isn't an obvious win
-for the existing test style. SIM117 was the only SIM rule
-deliberately ignored — collapsing nested `with` blocks
-(monkeypatch + `app.run_test() as pilot:` etc.) reads worse
-than the nested form. The kept rules (SIM105 / SIM110 /
-SIM118 / etc.) catch common Python anti-patterns without
-forcing stylistic flips. Tests now exclusively use
-`contextlib.suppress` for the swallow-and-continue pattern,
-which is the documented modern idiom.
-
-## 2026-05-04 03:00 UTC — chore(typecheck): enable mypy `strict = true`
-Files: pyproject.toml ([tool.mypy]: drop the individual
-warn_return_any/warn_unused_configs flags, replace with
-`strict = true`; ignore_missing_imports stays for the
-livekit/textual/etc. third-party surface),
-src/openrtc/core/pool.py:73 (`AgentSession` ->
-`AgentSession[None]` to satisfy `Generic[Userdata_T]`),
-src/openrtc/cli/commands.py (+1 import
-`from collections.abc import Callable`; line 175 declares
-`-> Callable[..., None]` on
-`_make_standard_livekit_worker_handler`).
-Tests: 374/374 pass + 2 skipped. Coverage: 100.00%. ruff:
-clean (auto-reordered the new import in commands.py).
-mypy --strict: clean across all 26 source files.
-Notes: Strict mode bundles disallow_untyped_defs,
-disallow_incomplete_defs, check_untyped_defs,
-warn_redundant_casts, warn_unused_ignores, strict_equality,
-disallow_any_generics, disallow_subclassing_any,
-disallow_untyped_calls, disallow_untyped_decorators,
-warn_return_any, warn_unused_configs, no_implicit_reexport,
-and extra_checks (no_implicit_optional is already mypy's
-default). Only two source
-issues surfaced — both small and contained. From here, any
-new untyped def or implicit Any in source is a hard CI
-failure, matching the same ratcheting story we ran on
-test coverage. Tests remain unchecked by mypy
-(out of scope for src/-only typecheck).
-
-## 2026-05-04 02:45 UTC — chore(ci): ratchet coverage gate from 95% to 99%
-Files: Makefile (`--cov-fail-under=95` -> `=99`),
-.github/workflows/test.yml (same flag in the matrix job),
-codecov.yml (project + patch targets 95% -> 99%, range
-`85...100` -> `90...100`, header comment updated).
-Tests: 374/374 pass + 2 skipped. Required: 99%; actual
-combined line+branch: 100.00%. ruff: clean. mypy: clean.
-Notes: This is the second floor bump in this loop (80 -> 95
-last week, now 95 -> 99). The 1pp cushion below 100% is
-deliberate: branch coverage adds many edges per function
-(coverage.py counts two arcs for every `if`, taken and not
-taken), so a small helper added
-in a future PR can naturally push combined % below 100%
-even when the contributor wrote tests for every behavior
-they intended. Anchoring at 99% prevents a drop below the
-v0.1 baseline without making "added one branch + forgot one
-test" a CI hard-stop. Bumped all three places (Makefile, CI
-matrix, Codecov) in one pass so the local hard gate, the CI
-hard gate, and the PR-comment status check stay in sync.
-
-## 2026-05-04 02:30 UTC — test(branches): close last branch — cli/__init__.py 32->36 (99.96% -> 100.00%)
-Files: tests/test_cli.py (+1 test, ~22 LOC).
-Tests: 374/374 pass + 2 skipped. Combined line+branch
-coverage: 100.00% (was 99.96%); all 22 branches closed.
-ruff: clean. mypy: clean.
-Notes: The last surviving branch (the eager
-`from openrtc.cli.commands import app` skip when typer/rich
-are "missing") needed an `importlib.reload(cli_pkg)` after
-monkey-patching `entry_module._optional_typer_rich_missing`
-to return True. The reload re-executes the module body so
-the `if not _optional_typer_rich_missing():` check
-re-evaluates with the stub, taking the False branch and
-jumping past the eager-bind line. The test asserts the stub
-was called (the side effect of the captured list) rather
-than checking module-namespace cleanliness, since reload
-doesn't strip pre-existing attributes from the namespace.
-Cleanup undoes the monkey-patch and reloads again to
-restore the real eager-bind state for downstream tests.
-**Project at 100.00% combined line + branch coverage.**
-
-## 2026-05-04 02:15 UTC — test(branches): close batch 4 — all 3 tui/app.py branches (99.83% -> 99.96%)
-Files: tests/test_tui_app.py (+3 tests, ~70 LOC).
-Tests: 373/373 pass + 2 skipped. Combined coverage: 99.96%
-(was 99.83%); 1 branch remaining (was 4). ruff: clean.
-mypy: clean.
-Notes: Closed branches:
-(149->154) `_refresh_view` skips the float() wall-time block
-when `wall_time_unix` is missing entirely (None) — the existing
-test exercised the "string non-numeric" path which goes
-through the True branch + ValueError; this new test sets
-wall_time_unix absent so the False branch fires.
-(125->117) `_poll_file` skips records whose `kind` is neither
-SNAPSHOT nor EVENT — exercised by monkey-patching
-`parse_metrics_jsonl_line` in the tui module to return
-{"kind": "other-kind"}, since the production parser would
-reject such records before they reach the elif. The defensive
-double-check is what we're locking down.
-(127->117) `_poll_file` skips EVENT records whose payload
-isn't a dict — same monkey-patch trick to feed
-{"kind": KIND_EVENT, "payload": "not-a-dict"} past the
-parser. Asserts `_last_event` stays None.
-Both monkey-patch tests deliberately bypass
-`parse_metrics_jsonl_line`'s schema enforcement to lock the
-two defensive checks inside `_poll_file` against future
-parser regressions. Remaining branch
-(cli/__init__.py 32->36) needs an importlib.reload +
-monkeypatch combo and lives at the import boundary —
-deferred to the next iteration.
-
-## 2026-05-04 02:00 UTC — test(branches): close batch 3 — all 6 execution/coroutine.py branches (99.57% -> 99.83%)
-Files: tests/test_coroutine_coverage.py (+6 tests, ~135 LOC).
-Tests: 370/370 pass + 2 skipped. Combined coverage: 99.83%
-(was 99.57%); 4 branches remaining (was 10). ruff: clean.
-mypy: clean.
-Notes: Closed branches:
-(231->233) `kill()` skips the status flip when the executor
-is already in a terminal non-RUNNING state — kill should
-preserve whatever terminal status the executor reached.
-(279->293) `_run_entrypoint` SUCCESS path skips the implicit
-RUNNING -> SUCCESS flip when status was set externally before
-the entrypoint completed (defensive — coroutine mode lets a
-caller manipulate status directly during dev/testing).
-(286->288) `_run_entrypoint` exception path skips the implicit
-RUNNING -> FAILED flip under the same external-set scenario.
-(528->526) Pool aclose-timeout escalation tolerates executors
-that don't expose a `kill` method (the production
-CoroutineJobExecutor does, but a stub may not — covered with
-a no-kill stub appended directly to `_executors`).
-(571->578) Pool launch_job still emits `process_job_launched`
-even if the inner executor leaves `_task` as None (defensive —
-production executors always set _task, but a stub may not).
-(679->exit) The consecutive_failure_limit branch in
-`_observe_executor_status` tolerates a None callback —
-matches the documented contract that
-`on_consecutive_failure_limit` is optional.
-Remaining branches: cli/__init__.py 32->36 needs importlib.reload
-+ monkeypatch trickery; tui/app.py x3 need a Textual app
-fixture. Both deferred to follow-up iterations.
-
-## 2026-05-04 01:45 UTC — test(branches): close batch 2 (4 branch gaps) (99.40% -> 99.57%)
-Files: tests/test_metrics_stream.py (+2 tests:
-test_runtime_reporter_periodic_tick_runs_when_live_is_none,
-test_jsonl_metrics_sink_close_is_idempotent),
-tests/test_resources.py (+1 test:
-test_linux_rss_bytes_continues_loop_when_vmrss_line_has_no_value),
-tests/test_discovery.py (+1 test:
-test_load_module_from_path_reloads_when_existing_module_points_elsewhere).
-Tests: 364/364 pass + 2 skipped. Combined coverage: 99.57%
-(was 99.40%); 10 branches remaining (was 14). ruff: clean.
-mypy: clean.
-Notes: Closed branches: cli/reporter.py 97->99 (`if live is
-not None:` skip when reporter runs without dashboard but with
-a json_output_path, so periodic ticks fire for the JSON write
-without ever entering the Rich Live context);
-observability/stream.py 137->exit (`if self._file is not
-None:` skip in JsonlMetricsSink.close() when the sink was
-never opened or has already been closed - asserts double-close
-is idempotent); observability/metrics.py 364->361 (`if
-len(parts) >= 2:` skip in _linux_rss_bytes when the VmRSS line
-has no value field, e.g. "VmRSS:" alone - the loop continues
-to subsequent lines and ultimately returns None);
-core/discovery.py 24->27 (`if existing_file is not None and
-Path(existing_file).resolve() == resolved_path:` skip when
-sys.modules already has the module name pointing at a
-different file - exercised by loading a decoy first then
-reloading the real path under the same module name).
-Remaining 10 branches need either reload tricks
-(cli/__init__.py 32->36), Textual app fixtures
-(tui/app.py x3), or careful state manipulation
-(execution/coroutine.py x6) - left for follow-ups.
-
-## 2026-05-04 01:30 UTC — test(branches): close first batch of 8 branch gaps (combined 99.06% -> 99.40%)
-Files: tests/test_pool.py (+1 test:
-test_merge_session_kwargs_skips_direct_when_none),
-tests/test_routing.py (+2 tests:
-test_agent_name_from_metadata_returns_none_for_non_string_non_mapping,
-test_resolve_agent_falls_back_when_room_name_is_not_a_string),
-tests/test_turn_handling.py (+1 test:
-test_default_turn_handling_omits_turn_detection_key_when_factory_returns_none),
-tests/test_dashboard.py (+1 test:
-test_build_list_json_payload_omits_resource_keys_when_resources_disabled),
-tests/test_cli.py (+2 tests:
-test_main_with_argv_none_skips_inject_when_sys_argv_has_only_program_name,
-test_strip_openrtc_only_flags_handles_flag_without_following_value).
-Tests: 360/360 pass + 2 skipped. Combined line+branch coverage:
-99.40% (was 99.06%); 14 branches remaining (was 22). ruff:
-clean. mypy: clean.
-Notes: Closed branches: cli/commands.py 351->354
-(`if len(sys.argv) >= 2:` skip when sys.argv is just [argv0]);
-cli/dashboard.py 240->249 + 257->284 (`if include_resources:`
-skip in build_list_json_payload — both per-agent + summary
-branches covered by one test); cli/livekit.py 74->76
-(`if i < len(argv_tail): i += 1` skip when --flag is at end of
-argv); core/pool.py 430->432 (`if direct_session_kwargs is not
-None:` skip); core/routing.py 36->46 (`if isinstance(room_name,
-str):` skip when room.name is None) + 56->67 (`if isinstance(
-metadata, str):` skip for int/list metadata);
-core/turn_handling.py 69->71 (`if turn_detection is not None:`
-skip when factory returns None). Remaining 14 branches are
-mostly defensive `for: ... else` exits (`X->exit` notation),
-the cli/__init__.py reload-required branch, and finer
-execution/coroutine.py race edges — left for per-file
-follow-up iterations.
-
-## 2026-05-04 01:15 UTC — chore(coverage): enable branch coverage as the v0.1 hardness gate
-Files: pyproject.toml (+5 LOC: new `[tool.coverage.run]`
-section with `branch = true` + a comment explaining the
-choice).
-Tests: 353/353 pass + 2 skipped. Required: 95%; actual
-combined (line+branch): 99.06% (line-only is 100%).
-ruff: clean. mypy: clean.
-Notes: Line-only coverage hides half-tested conditionals
-(`if x and y:` exercised with x=True/y=True but never
-x=True/y=False). Branch coverage reports each "edge"
-(line N -> line M) and surfaces 22 missing branches across
-13 files: most are simple "the false case of this
-conditional was never run" edges. The combined metric is
-99.06% — well above the 95% fail-under floor that landed
-last iteration — so this is a no-op for CI green/red but
-a real tightening of what "covered" means going forward.
-The 22 individual branch gaps are deferred as discovered
-work for future iterations; closing each one is small but
-they accumulate (some are in already-100%-line-coverage
-modules, e.g. cli/__init__.py 32->36).
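
A minimal sketch of the line-versus-branch distinction in the entry above, using only the standard library rather than coverage.py itself. `clamp` and `trace_arcs` are hypothetical names for illustration; the tracer records line-to-line transitions in the same `N -> M` spirit as the branch report, with line numbers expressed as offsets from the `def` line.

```python
import sys


def clamp(value):
    if value < 0:   # offset 1
        value = 0   # offset 2
    return value    # offset 3


def trace_arcs(func, *args):
    """Run func and return (lines, arcs) executed inside it.

    Lines are offsets from the function's `def` line; arcs are
    (from_offset, to_offset) pairs between consecutive executed lines.
    """
    code = func.__code__
    first = code.co_firstlineno
    lines, arcs = set(), set()
    prev = None

    def tracer(frame, event, arg):
        nonlocal prev
        if frame.f_code is not code:
            return None  # ignore every other frame
        if event == "line":
            offset = frame.f_lineno - first
            lines.add(offset)
            if prev is not None:
                arcs.add((prev, offset))
            prev = offset
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)
    return lines, arcs


lines, arcs = trace_arcs(clamp, -1)
full_line_coverage = lines == {1, 2, 3}   # every executable line ran
false_edge_missing = (1, 3) not in arcs   # but the if -> return edge did not
```

A single call with a negative input is enough for a line-only report to show the function as fully covered; only a second call with a non-negative input exercises the `1 -> 3` edge that a branch report would list as missing.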
-
-## 2026-05-04 01:00 UTC — chore(ci): lock the v0.1 coverage ratchet at 95%
-Files: Makefile (`--cov-fail-under=80` -> `=95`),
-.github/workflows/test.yml (same flag in the matrix job),
-codecov.yml (project target 80% -> 95%, patch target
-80% -> 95%, range `70...100` -> `85...100`, header comment
-mentions the new floor).
-Tests: 353/353 pass + 2 skipped. Required coverage now 95%;
-actual 100.00%. ruff: clean. mypy: clean.
-Notes: Project sits at 100% line coverage today, so 95% gives
-contributors a 5pp cushion (and sits 15pp above the v0.0.x
-floor of 80%) for legitimate `# pragma: no cover` defensive
-code without letting the numbers slide back. Bumped all three places
-that enforce the floor in one pass so the local Makefile,
-the CI matrix, and the Codecov status check stay in sync.
-Codecov range nudged from `70...100` to `85...100` so the
-colored bar in PR comments visually anchors at the new
-minimum instead of the old one.
-
-## 2026-05-04 00:45 UTC — test(coroutine): close execution/coroutine.py gap (97% -> 100%) — project at 100%
-Files: tests/test_coroutine_coverage.py (+5 tests, ~100 LOC).
-Tests: 353/353 pass + 2 skipped. Coverage: coroutine.py 100%
-(was 97%); total 100.00% (was 99.51%). ruff: clean. mypy: clean.
-Notes: New tests pin the last defensive branches:
-(a) `_consume_cancelled_task_exception` swallowing
-`InvalidStateError` when the helper is called on a not-yet-done
-task (production trigger: a tight race between `add_done_callback`
-firing and someone querying `task.exception()`);
-(b) `CoroutineJobExecutor.join` swallowing CancelledError raised
-by a parallel `task.cancel()` while join is awaiting the task,
-and the defensive generic-Exception swallow when the executor
-holds a task that bypasses `_run_entrypoint` (e.g. a
-direct `_task` injection from a future caller);
-(c) `aclose` swallowing a *non*-CancelledError exception raised
-post-cancel (the task catches CancelledError and re-raises
-RuntimeError; aclose absorbs it and still flips status to FAILED
-+ clears started); (d) `_build_job_context` real-room branch
-when `info.fake_job=False` — uses the actual `livekit.rtc.Room()`
-since the constructor is side-effect-free in the SDK
-(native libraries fire only on `.connect()`). The project is now
-at 100% line coverage. Only criterion §8.12 (PyPI tag + release)
-remains, and that is operator-blocked.
-
-## 2026-05-04 00:30 UTC — test(discovery): close core/discovery.py coverage gap (98% -> 100%)
-Files: tests/test_discovery.py (+1 test, ~20 LOC).
-Tests: 348/348 pass + 2 skipped. Coverage: discovery.py 100%
-(was 98%); total 99.51% (was 99.46%). ruff: clean. mypy: clean.
-Notes: New test monkey-patches
-`importlib.util.spec_from_file_location` to return None and
-asserts `_load_module_from_path` raises a clear RuntimeError.
-This was the last reachable defensive line in the discovery
-module: the production trigger is a malformed file path that
-survives Path.resolve() but cannot have an import spec built
-from it (very rare in practice, but the message points the
-operator straight at the path).
-
-## 2026-05-04 00:15 UTC — test(init): close cli/__init__.py (54%) and openrtc/__init__.py (80%) gaps
-Files: tests/test_cli.py (+4 tests, ~70 LOC).
-Tests: 347/347 pass + 2 skipped. Coverage: cli/__init__.py
-100% (was 54%); openrtc/__init__.py 100% (was 80%); total
-99.46% (was 99.02%). ruff: clean. mypy: clean.
-Notes: New tests cover (a) the package-level `__getattr__`
-fallback for `openrtc.cli.app`: raises ImportError with the
-`openrtc[cli]` hint when `_optional_typer_rich_missing()`
-returns True (monkey-patched), returns the real Typer app
-via lazy `from openrtc.cli.commands import app` when extras
-are present, and raises AttributeError for unknown attribute
-names; (b) `openrtc.__version__` falls back to `0.1.0.dev0`
-when `importlib.metadata.version` raises PackageNotFoundError
-(monkey-patch + importlib.reload, with cleanup that restores
-the real version function and reloads to undo the side
-effect). Both modules sit at the user-facing import boundary
-- a regression here would either break dev-checkout imports
-or silently strip the install-hint - so locking them in unit
-tests is the cheapest hedge.
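
A hedged sketch of the version-fallback pattern that item (b) above describes. `resolve_version` is a hypothetical helper, not the package's actual code; only `importlib.metadata.version`, `PackageNotFoundError`, and the `0.1.0.dev0` fallback string come from the entry.

```python
from importlib.metadata import PackageNotFoundError, version

FALLBACK_VERSION = "0.1.0.dev0"  # fallback value quoted in the journal entry


def resolve_version(distribution: str, fallback: str = FALLBACK_VERSION) -> str:
    """Return the installed distribution's version, or the fallback.

    A dev checkout that was never installed has no distribution
    metadata, so the lookup raises PackageNotFoundError.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return fallback


# Assumed-absent distribution name, used only to trigger the fallback path.
missing = resolve_version("openrtc-sketch-distribution-that-is-not-installed")
```

Testing the real module-level assignment needs the monkey-patch plus `importlib.reload` combination the entry mentions, because the lookup runs once at import time; a function like this one can be exercised directly.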
-
-## 2026-05-04 00:00 UTC — test(dashboard): close cli/dashboard.py coverage gap (82% -> 100%)
-Files: tests/test_dashboard.py (new, 11 tests, ~145 LOC).
-Tests: 343/343 pass + 2 skipped. Coverage: cli/dashboard.py
-100% (was 82%); total 99.02% (was 97.62%). ruff: clean.
-mypy: clean.
-Notes: New tests cover the pure rendering helpers
-(`_format_percent` for None/zero-baseline + ratio rounding;
-`_memory_style` for None / green / yellow / red thresholds;
-`_truncate_cell` short pass-through + ellipsis append) and
-the print-output branches that the integration tests don't
-exercise individually:
-print_list_rich_table renders "—" in the source column for
-agents without source_path; print_list_plain appends
-source_size= for known paths and triggers the resource
-summary; print_resource_summary_plain emits the
-"per-agent source size" caveat when not all agents have a
-known path AND the "Resident memory metric unavailable"
-branch when monkey-patched get_process_resident_set_info
-returns None; print_resource_summary_rich's unavailable-RSS
-branch (Rich version of the same fallback). New unit tests
-import the helpers directly from cli.dashboard, which the
-integration tests via CliRunner couldn't reach.
-
-## 2026-05-03 23:45 UTC — test(pool): close core/pool.py coverage gap (93% -> 100%)
-Files: tests/test_pool.py (+7 tests, ~95 LOC at end of file).
-Tests: 332/332 pass + 2 skipped. Coverage: core/pool.py 100%
-(was 93%); total 97.62% (was 97.07%). ruff: clean. mypy: clean.
-Notes: New tests cover (a) `add("   ", DemoAgent)` rejecting
-empty/whitespace names; (b) `pool.run()` raising RuntimeError
-when zero agents are registered; (c) `pool.run()` handing the
-configured `_server` to LiveKit's `cli.run_app` via
-monkey-patched stub (covers the actual handoff line); (d)
-`_prewarm_worker` raising when the runtime state has no agents
-(defensive guard against worker-start with empty registry); (e)
-`_run_universal_session` raising the same guard early before
-agent resolution; (f) `_load_shared_runtime_dependencies`
-raising a clear RuntimeError when livekit silero import fails
-(builtins.__import__ monkey-patch); (g) the same function's
-happy-path return of the silero module + MultilingualModel
-class (gated on plugin availability via importorskip). Locks
-the pool's startup contract before tagging.
-
-## 2026-05-03 23:30 UTC — test(metrics): close observability/metrics.py coverage gap (84% -> 100%)
-Files: tests/test_resources.py (+18 tests, ~180 LOC),
-src/openrtc/observability/metrics.py (1 LOC: replace
-unreachable defensive `return f"{int(num_bytes)} B"` with
-`raise AssertionError(...)  # pragma: no cover` to stop the
-dead line from eating coverage).
-Tests: 325/325 pass + 2 skipped. Coverage: metrics.py 100%
-(was 84%); total 97.07% (was 95.56%). ruff: clean. mypy: clean.
-Notes: New coverage spans (a) defensive helper edges:
-`format_byte_size(-100) == "0 B"` for negative input;
-`file_size_bytes(missing_path) == 0` for OSError;
-`estimate_shared_worker_savings` short-circuits for
-agent_count=0 and shared_worker_bytes=None; (b)
-platform-specific branches in `get_process_resident_set_info`
-that the Darwin runner can't naturally reach: a Linux-branch
-test monkey-patches `sys.platform` and stubs `_linux_rss_bytes`;
-a Windows-style "unavailable" test monkey-patches
-`sys.platform = "win32"`; (c) `_linux_rss_bytes` itself
-exercised on Darwin via `Path.read_text` monkey-patch with
-fake /proc/self/status content (happy path, OSError, no
-VmRSS line); (d) `_macos_rss_bytes` rejecting OSError from
-getrusage and zero `ru_maxrss`; (e) `record_session_finished`
-keep-positive count branch (start two sessions, finish one);
-(f) parametrized `__setstate__` type validation across 6
-typed fields. Locks the runtime metrics layer in pure unit
-tests so a later refactor (e.g. adding a Windows
-implementation, swapping the Linux source from procfs to
-psutil) can't silently change the per-platform contract.
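
A minimal sketch of the procfs parsing that items (c) and the later `len(parts) >= 2` branch refer to, assuming the usual `/proc/self/status` layout where `VmRSS` is reported in kB. `parse_vmrss_bytes` is a hypothetical stand-in for `_linux_rss_bytes` that takes the file content as a string, so it runs on any platform without monkey-patching `Path.read_text`.

```python
from typing import Optional


def parse_vmrss_bytes(status_text: str) -> Optional[int]:
    """Extract resident set size in bytes from /proc/self/status content.

    Returns None when no usable VmRSS line is present.
    """
    for line in status_text.splitlines():
        if not line.startswith("VmRSS:"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            try:
                # procfs reports the value in kB (1024-byte units).
                return int(parts[1]) * 1024
            except ValueError:
                continue  # assumption: skip a malformed value and keep scanning
        # A bare "VmRSS:" line has no value field; keep scanning.
    return None


happy = parse_vmrss_bytes("Name:\tpython\nVmRSS:\t    2048 kB\nThreads:\t1\n")
no_value = parse_vmrss_bytes("VmRSS:\nThreads:\t1\n")
no_line = parse_vmrss_bytes("Name:\tpython\n")
```

Keeping the parser separate from the file read is one way to make the per-platform contract testable with plain strings; whether the real helper is split this way is not stated in the journal.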
-
-## 2026-05-03 23:15 UTC — test(livekit-cli): close cli/livekit.py coverage gap (86% -> 100%)
-Files: tests/test_cli.py (+11 tests, +1 import (`typer`),
-~140 LOC). The new tests live next to the existing livekit
-handoff tests rather than in a separate file because they
-exercise the same module surface and reuse the existing
-`StubPool` / `original_argv` fixtures.
-Tests: 307/307 pass + 2 skipped. Coverage: cli/livekit.py
-100% (was 86%); total 95.56% (was 94.37%). ruff: clean.
-mypy: clean.
-Notes: New coverage spans (a) the
-`_strip_openrtc_only_flags_for_livekit` parser:
-the `--` separator pass-through and the `=`-form non-OpenRTC
-flag preservation (`--reload=true`, `--url=ws://x`); (b) the
-positional-rewriting helpers' "flag already in tail" no-op
-branches for `--agents-dir` (list/connect/download-files
-path AND dev/start/console path) and `--watch` (tui path),
-the empty-argv short-circuit, and the unknown-subcommand
-short-circuit; (c) `_livekit_env_overrides` setting all
-four LIVEKIT_* keys and restoring previous values
-(including delete-when-previously-unset); (d)
-`_run_connect_handoff` with `--participant-identity` AND
-`--log-level` both set, captured via stub `_run_pool_with_reporting`;
-(e) `_discover_or_exit` raising `typer.Exit(1)` on
-NotADirectoryError (file-as-agents-dir) and
-PermissionError (monkey-patched discover()).
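
A hedged sketch of the set-then-restore behavior described in item (c), including the delete-when-previously-unset case. `env_overrides` and the `SKETCH_*` variable names are hypothetical; the real helper's signature and the exact LIVEKIT_* keys are not shown in the journal.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Mapping

_MISSING = object()  # marks keys that were unset before the override


@contextmanager
def env_overrides(overrides: Mapping[str, str]) -> Iterator[None]:
    """Temporarily set environment variables, restoring prior state on exit."""
    previous = {key: os.environ.get(key, _MISSING) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is _MISSING:
                # The key did not exist before: remove it rather than
                # leaving an empty or stale value behind.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


os.environ["SKETCH_KEEP"] = "before"
os.environ.pop("SKETCH_UNSET", None)

with env_overrides({"SKETCH_KEEP": "during", "SKETCH_UNSET": "during"}):
    inside = (os.environ["SKETCH_KEEP"], os.environ["SKETCH_UNSET"])

after_keep = os.environ["SKETCH_KEEP"]
after_unset_present = "SKETCH_UNSET" in os.environ
```

The sentinel distinguishes "was unset" from "was set to an empty string", which is the case a plain `os.environ.get(key)` check would conflate.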
-
-## 2026-05-03 23:00 UTC — test(reporter): close cli/reporter.py coverage gap (86% -> 100%)
-Files: tests/test_metrics_stream.py (+2 tests, ~60 LOC at end of
-file).
-Tests: 296/296 pass + 2 skipped. Coverage: cli/reporter.py 100%
-(was 86%); total 94.37% (was 93.67%). ruff: clean. mypy: clean.
-Notes: The existing reporter tests run with `dashboard=False`
-because Rich's `Live` writes to the terminal; the dashboard
-branch (lines 97-100, 107-116 in reporter.py) and
-`_build_dashboard_renderable` (122-123) were untested. The new
-test_runtime_reporter_build_dashboard_renderable_uses_pool_snapshot
-calls the helper directly and asserts a Rich Panel comes back.
-test_runtime_reporter_dashboard_path_runs_one_tick monkeypatches
-`openrtc.cli.reporter.Live` with a stub context manager that
-records init + update calls, runs the reporter with
-`dashboard=True` + a json_output_path, waits for the snapshot
-file to land, then stops. The stub is necessary because Rich's
-real `Live` opens a TTY-style alternate-screen on the test
-runner's terminal which corrupts pytest output. The assertion
-on the captured `("init", ...)` then `("update", ...)` sequence
-proves the periodic-tick branch fired at least once.
-
-## 2026-05-03 22:45 UTC — test(cli): close cli/commands.py coverage gap (93% -> 100%)
-Files: tests/test_cli.py (+4 tests, ~60 LOC at end of file).
-Tests: 294/294 pass + 2 skipped. Coverage: cli/commands.py 100%
-(was 93%); total 93.67% (was 93.34%). ruff: clean. mypy: clean.
-Notes: New tests cover the `main()` programmatic surface paths
-that `main([...])` invocation never reaches:
-test_main_uses_sys_argv_when_called_without_explicit_argv calls
-main() with no args after monkeypatching sys.argv (covers the
-`else` branch with inject_cli_positional_paths on sys.argv tail);
-test_main_returns_zero_when_systemexit_code_is_none stubs
-get_command to raise bare SystemExit() (covers `code is None
--> return 0`); test_main_returns_one_when_systemexit_code_is_non_int_string
-raises SystemExit("boom") (covers the non-int-code -> 1
-branch); test_main_returns_zero_when_inner_command_does_not_raise
-returns normally (covers the fall-through `return 0` after the
-finally). The exit-code contract is the public surface of
-`openrtc.cli.main` for any programmatic embedder; locking each
-mapping in unit tests prevents a future Typer/Click upgrade
-from silently shifting the integer codes a CI pipeline might
-key off of.
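
The exit-code contract listed above can be sketched as follows. `run_and_map_exit_code` is a hypothetical wrapper, not the actual `main()` body; the `None -> 0`, non-int `-> 1`, and normal-return `-> 0` mappings come from the entry, while passing an integer code through unchanged is an assumption.

```python
import sys
from typing import Callable


def run_and_map_exit_code(command: Callable[[], object]) -> int:
    """Run a command and translate SystemExit into an integer exit code."""
    try:
        command()
    except SystemExit as exc:
        code = exc.code
        if code is None:
            return 0      # bare SystemExit() means success
        if isinstance(code, int):
            return code   # assumption: integer codes pass through as-is
        return 1          # e.g. SystemExit("boom") carries a message, not a code
    return 0              # the command returned without raising


bare = run_and_map_exit_code(lambda: sys.exit())
text = run_and_map_exit_code(lambda: sys.exit("boom"))
numeric = run_and_map_exit_code(lambda: sys.exit(3))
normal = run_and_map_exit_code(lambda: None)
```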
-
-## 2026-05-03 22:30 UTC — test(serialization): close core/serialization.py coverage gap (98% -> 100%)
-Files: tests/test_serialization.py (new, 5 tests, ~58 LOC).
-Tests: 290/290 pass + 2 skipped. Coverage: serialization.py
-100% (was 98%); total 93.34% (was 93.23%). ruff: clean.
-mypy: clean.
-Notes: Tests exercise the spawn-safe provider serialization
-edge cases that the pool-level tests don't reach directly:
-`_extract_provider_kwargs` returns {} when `_opts` is None or
-the attribute is missing entirely (catches the early-return
-branch); `_filter_provider_kwargs` drops the OpenAI
-`NotGiven` sentinel from a kwargs dict (the canonical
-"unset optional" marker on every plugin _opts dataclass) and
-passes through explicit `None` (a user-set value, distinct
-from "unset"). The serialization layer is the v0.1 spawn-safety
-backbone: every provider object that survives a process boundary
-goes through these helpers, so locking the per-key filter
-behavior in pure unit tests prevents a future plugin upgrade
-from silently leaking sentinels into the spawn-time kwargs.
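
A minimal sketch of the per-key filter behavior the entry describes: an "unset" sentinel is dropped while an explicit `None` survives. `_UnsetSketch` is a local stand-in for the `NotGiven` sentinel and `filter_provider_kwargs` is a hypothetical name; neither is the project's or the OpenAI SDK's actual code.

```python
from typing import Any, Dict


class _UnsetSketch:
    """Local stand-in for an 'argument was never provided' sentinel."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _UnsetSketch()


def filter_provider_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Drop sentinel-valued keys; keep everything else, including None."""
    return {
        key: value
        for key, value in kwargs.items()
        if not isinstance(value, _UnsetSketch)
    }


filtered = filter_provider_kwargs(
    {"model": "example-model", "voice": NOT_GIVEN, "language": None}
)
```

Keeping `None` matters because a user may set it deliberately, whereas the sentinel only marks a field nobody touched and typically cannot be pickled meaningfully across a process boundary.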
-
-## 2026-05-03 22:15 UTC — test(config): close core/config.py coverage gap (97% -> 100%)
-Files: tests/test_config.py (new, 6 tests, ~62 LOC).
-Tests: 285/285 pass + 2 skipped. Coverage: config.py 100%
-(was 97%); total 93.23% (was 93.12%). ruff: clean. mypy: clean.
-Notes: Tests exercise `_normalize_optional_name` through the
-public `@agent_config` decorator: non-string `name` raises
-RuntimeError "must be a string, got int"; non-string `greeting`
-raises "must be a string, got list"; blank/whitespace `name`
-and `greeting` raise "cannot be empty"; whitespace-around
-values are stripped; None passes through. The decorator is
-the only call site for `_normalize_optional_name`, so the
-direct decorator surface gives 100% coverage of both the
-helper and the validation surface. Pre-v0.1 module but locks
-the user-facing input validation in pure unit tests so a
-later refactor can't silently relax the contract (e.g.
-lowercasing values or accepting None for name).
-
-## 2026-05-03 22:00 UTC — test(turn-handling): close core/turn_handling.py coverage gap (88% -> 100%)
-Files: tests/test_turn_handling.py (new, 16 tests, ~140 LOC).
-Tests: 279/279 pass + 2 skipped. Coverage: turn_handling.py
-100% (was 88%); total 93.12% (was 92.58%). ruff: clean.
-mypy: clean.
-Notes: Tests cover the per-key deprecated -> modern kwarg
-translations (min_endpointing_delay, max_endpointing_delay,
-allow_interruptions true/false, discard_audio_if_uninterruptible,
-min_interruption_duration, min_interruption_words,
-false_interruption_timeout,
-agent_false_interruption_timeout, resume_false_interruption,
-turn_detection), the LIVEKIT_REMOTE_EOT_URL and inference-executor
-branches in _supports_multilingual_turn_detection, and the
-non-Mapping `turn_handling` passthrough (line 59 — when a user
-passes a TurnHandling dataclass or sentinel rather than a dict).
-Pre-v0.1 module but the deprecated-kwarg translation is the
-v0.0.x compat surface; locking the per-key mappings in pure
-unit tests means a future refactor of turn_handling.py won't
-silently change the user-facing semantics for any one key.
-
-## 2026-05-03 21:45 UTC — test(routing): close core/routing.py coverage gap (76% -> 100%)
-Files: tests/test_routing.py (+7 tests, ~50 LOC):
-       - test_resolve_agent_raises_when_no_agents_registered
-         (line 25 RuntimeError guard)
-       - test_resolve_agent_uses_room_metadata_when_job_metadata_absent
-         (line 33 room-metadata branch)
-       - test_resolve_agent_parses_json_string_metadata
-         (lines 60-66 JSON-string -> mapping path)
-       - test_resolve_agent_ignores_non_json_string_metadata
-         (line 63 JSONDecodeError swallow)
-       - test_resolve_agent_ignores_blank_string_metadata
-         (line 58 empty stripped string returns None)
-       - test_resolve_agent_ignores_json_scalar_metadata
-         (line 66 decoded non-Mapping returns None)
-       - test_resolve_agent_ignores_empty_metadata_value
-         (line 77 _agent_name_from_mapping empty-value branch)
-Tests: 263/263 pass + 2 skipped. Coverage: routing.py 100%
-(was 76%); total 92.58% (was 91.82%). ruff: clean. mypy: clean.
-Notes: Pre-v0.1 code paths but reachable in production via real
-LiveKit metadata, which arrives as a JSON string (not a dict).
-The string-JSON branch was the highest-risk uncovered path
-because it's the canonical metadata transport — silently failing
-to parse it would route every session to the default fallback
-agent. Discovered while auditing remaining coverage holes after
-the §8.12 release blocker; not v0.1-blocking but strengthens
-the §8.2 spirit ("≥80% coverage of new code") by lifting the
-pre-existing routing surface to 100% before tagging.
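
A hedged sketch of the metadata-parsing paths the seven tests above pin down. `agent_name_from_metadata` here is an illustrative reimplementation, and the `"agent"` lookup key is purely an assumption; the journal does not say which key the real routing code reads.

```python
import json
from collections.abc import Mapping
from typing import Any, Optional


def agent_name_from_metadata(metadata: Any, key: str = "agent") -> Optional[str]:
    """Return an agent name from dict or JSON-string metadata, else None."""
    if isinstance(metadata, str):
        text = metadata.strip()
        if not text:
            return None                   # blank string
        try:
            metadata = json.loads(text)   # JSON-string transport
        except json.JSONDecodeError:
            return None                   # non-JSON string is ignored
    if not isinstance(metadata, Mapping):
        return None                       # JSON scalar, int, list, ...
    value = metadata.get(key)
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None                           # empty or non-string value


from_json = agent_name_from_metadata('{"agent": "support"}')
from_dict = agent_name_from_metadata({"agent": "sales"})
ignored = [
    agent_name_from_metadata(bad)
    for bad in ("not json", "   ", "42", {"agent": ""}, 7, ["agent"])
]
```

Every ignored input returns `None` so the caller can fall through to its next routing rule, which matches the "falls back" wording in the test names.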
-
-## 2026-05-03 21:30 UTC — feat(execution): implement CoroutineJobExecutor.start (last NotImplementedError)
-Files: src/openrtc/execution/coroutine.py:
-       - Module docstring: dropped the now-stale "Lifecycle
-         methods land one iteration at a time; remaining stubs
-         raise NotImplementedError" prose.
-       - Removed the _SKELETON_HINT module-level constant
-         (no longer referenced).
-       - CoroutineJobExecutor.start: replaced the
-         NotImplementedError raise with a no-op that flips
-         self._started = True. Idempotent. Documented why
-         (coroutine mode has no subprocess to spawn; the pool
-         never calls this since we don't pre-warm executors,
-         but the JobExecutor Protocol requires the method).
-       tests/test_coroutine_skeleton.py:
-       - Module docstring: dropped the "real runtime arrives
-         in later iterations" / "raise NotImplementedError"
-         prose.
-       - Removed the parametrized
-         test_coroutine_job_executor_lifecycle_methods_are_unimplemented
-         (no remaining unimplemented methods to assert
-         against). Replaced with
-         test_coroutine_job_executor_start_is_a_no_op_setting_started_true
-         that exercises the new behavior.
-       - Ruff auto-removed the now-unused `inspect` import.
-Tests: 256/256 pass + 2 skipped. ruff: clean. mypy: clean.
-Coverage: src/openrtc/execution/coroutine.py 97% (unchanged
-since the line count dropped by 1 and one previously-uncovered
-line is now exercised). Total project 92%.
-Notes: Spotted by grepping src/ for TODO/FIXME/skeleton tokens.
-The `start` raise was the last lingering "skeleton" surface;
-keeping it as NotImplementedError was a real correctness risk
-because the JobExecutor Protocol declares it and a future
-caller (or a future LiveKit code path) might call it. Now
-matches the same "no-op state-machine flip" pattern as
-`initialize`.
-
-## 2026-05-03 21:15 UTC — docs(site): link density benchmark in sidebar
-Files: docs/.vitepress/config.ts: added a new
-       "Density benchmark (v0.1)" entry under the Reference
-       sidebar group, linking to /benchmarks/density-v0.1.
-Tests: 256/256 pass + 2 skipped (config-only change). No
-direct rendering test in this repo; deploy-docs.yml will pick
-the change up on the next push to main.
-Notes: Audited the public docs sidebar against the v0.1
-artifacts and found density-v0.1.md was unlinked. Users
-evaluating OpenRTC from the public docs site would have had
-to open the GitHub repo to find the §7 success-gate numbers.
-Now reachable in two clicks.
-Intentionally NOT added to the sidebar:
-  - docs/release-v0.1.md — operator runbook, not user-facing;
-    discoverable via CONTRIBUTING.md.
-  - docs/design/v0.1.md and the three job-executor / proc-pool
-    / agent-server-integration design notes — internal
-    contributor reference, not part of the user contract.
-  - docs/audit-2026-05-02.md — historical snapshot.
-
-## 2026-05-03 21:05 UTC — chore(make): add `make bench` target
-Files: Makefile: extended .PHONY with `bench`; new target runs
-       `uv run python tests/benchmarks/density.py --sessions 50
-       --rss-budget-mb 4096` (same arguments the CI bench
-       workflow uses). Kept the help-string short so `make help`
-       output stays readable.
-Tests: not re-run (Makefile only).
-Manual verify: `make help | grep bench` shows the new target;
-`make bench` ran and reported 50/50 successes, 366 MB peak,
-within the 4096 MB budget.
-Notes: Contributors who want to spot-check the v0.1 density
-gate locally before pushing now have a one-liner that matches
-CI exactly. Closes the last small ergonomic gap I can find
-between the v0.0.17 dev workflow and the v0.1 picture.
-
-## 2026-05-03 20:55 UTC — docs(README): list v0.1 constructor kwargs in API summary
-Files: README.md "Public API at a glance" section: added a new
-       "AgentPool(...) constructor (all keyword-only, all
-       optional)" subsection listing both the v0.0.x kwargs
-       (default_stt/llm/tts/greeting) and the new v0.1 ones
-       (isolation, max_concurrent_sessions,
-       consecutive_failure_limit) with their defaults and a
-       one-line semantics note. Added the three new read-only
-       properties to the existing "On AgentPool:" list.
-Tests: 256/256 pass + 2 skipped. ruff: clean.
-Notes: The summary section is the public-API contract page —
-users skimming it before reading the "Isolation modes"
-section deeper down would have missed the v0.1 knobs entirely.
-Marked v0.1-introduced items with "(v0.1)" so the
-v0.0.x-vs-v0.1 distinction is grep-able.
-
-## 2026-05-03 20:45 UTC — docs(release): single-page v0.1 release checklist
-Files: docs/release-v0.1.md (new, ~110 LOC):
-       - Pre-flight checks (merge to main, CI green on the
-         merge commit, density gate green, optional integration
-         run with real OPENAI_API_KEY).
-       - Tagging commands (annotated tag, push) and the
-         hatch-vcs derivation note.
-       - GitHub release-creation walkthrough including which
-         block of changelog.md to copy as the body.
-       - What fires automatically (publish.yml + deploy-docs.yml
-         + auto-prepend of the versioned changelog section,
-         including the secrets each step needs).
-       - Post-release verification: pip install in a clean venv,
-         __version__ assertion, --help flag check, PyPI URL,
-         changelog page on the docs site.
-       - Bump-the-fallback reminder for the next dev cycle
-         (pyproject.toml + __init__.py both).
-       - Recovery playbook for common failure modes (PyPI
-         already has the version, wrong commit tagged,
-         changelog push token missing).
-       CONTRIBUTING.md: new "Releasing" section pointing at the
-       runbook.
-Tests: 256/256 pass + 2 skipped. ruff: clean.
-Notes: Iteration was triggered by the Ralph loop firing again
-with no autonomous-completable work remaining (the only [?]
-TODO is operator-only). Used the iteration to make the
-operator's last-mile §8.12 work as friction-free as possible:
-a single page they read instead of cross-referencing the
-publish workflow + design doc + changelog. The release prep is
-now genuinely complete; once the operator runs the steps in
-docs/release-v0.1.md, every §8 acceptance criterion will be
-demonstrably satisfied.
-
-## 2026-05-03 20:30 UTC — chore(issue-template): refresh for v0.1
-Files: .github/ISSUE_TEMPLATE/bug_report.yml: bumped stale
-       version placeholders (OpenRTC 0.0.15 -> 0.1.0;
-       livekit-agents 1.4.3 -> 1.5.0) and added a new
-       "Isolation mode" dropdown (coroutine default / process /
-       both-or-not-sure). The dropdown helps triage route a
-       v0.1 issue to the right code path without a follow-up
-       comment.
-Tests: 256/256 pass + 2 skipped. ruff clean. YAML validates.
-Notes: Spotted while auditing v0.1-readiness gaps after the
-TODO went idle. The bug template is the operator's canonical
-intake form; shipping v0.1 with 0.0.x placeholders would be a
-small but real fit-and-finish bug. The isolation field is the
-piece operators will want most often when investigating a
-report.
-
-## 2026-05-03 20:18 UTC — docs(cli): fix stale openrtc.resources reference
-Files: docs/cli.md: `from openrtc.resources` ->
-       `from openrtc.observability.metrics` in the resources
-       summary explanation paragraph.
-Tests: 256/256 pass + 2 skipped (docs only). ruff: clean.
-Notes: Found by sweeping current docs/sources for any module
-path the Phase 0 reorg moved. Only one residual reference in
-non-historical content. Other stale paths live in
-docs/design/v0.1.md (locked, PROMPT.md hard rule) and
-docs/audit-2026-05-02.md (historical snapshot, intentional);
-both correctly preserved.
-
-## 2026-05-03 20:05 UTC — docs(cli): cover --isolation + --max-concurrent-sessions
-Files: docs/cli.md: merged the per-subcommand entries for
-       start/dev/console (they share the same option shape) and
-       added a "Coroutine-mode runtime knobs (v0.1)" subsection
-       documenting both flags with usage examples (default,
-       process opt-in, tuned threshold). Cross-references
-       docs/concepts/architecture.md and the README.
-       .agents/TODO.md: recorded the gap under "Discovered
-       work" with the [x] checkbox + reason.
-Tests: 256/256 pass + 2 skipped (docs only). ruff/mypy
-unaffected.
-Notes: Found while auditing §8.9 ("CLI flags work and are
-documented") for v0.1 release readiness. The flags themselves
-work and were already in the README + test suite + --help, but
-the standalone docs/cli.md page hadn't been updated when the
-flags landed (iteration 40). Releasing v0.1 with this gap
-would technically violate §8.9 since the doc page is the
-canonical CLI reference. Now closed.
-
-## 2026-05-03 19:50 UTC — test(coverage): close defensive gaps in coroutine.py (90% -> 97%)
-Files: tests/test_coroutine_coverage.py (new, ~140 LOC, 10
-       tests targeting the specific uncovered branches the
-       higher-level test files don't naturally hit):
-       - _NoOpInferenceExecutor.do_inference raises clearly.
-       - _NOOP_INFERENCE_EXECUTOR singleton is the right type.
-       - CoroutinePool consecutive_failure_limit kwarg
-         validation (default = 5; rejects float, bool, 0, < 0).
-         These were tested at the AgentPool layer earlier; the
-         CoroutinePool-level wrapper code was uncovered.
-       - _on_executor_done is a no-op and emits no event when
-         called on an executor that was never tracked.
-       - _build_job_context REAL path with fake_job=True
-         (uses livekit.agents.ipc.mock_room.create_mock_room
-         and constructs a real JobContext referencing the
-         singleton JobProcess); previously only the override
-         path was exercised in the smoke test.
-       - _build_job_context before start() raises with the
-         expected message.
-       - launch_job re-raises and emits process_closed when
-         executor.launch_job itself raises (white-box test
-         monkey-patches _build_executor to inject an executor
-         whose launch_job is replaced with a coroutine that
-         raises). This covers the worker-accounting branch.
-Tests: 256/256 pass + 2 skipped (10 added). ruff: clean.
-mypy: clean.
-Coverage: src/openrtc/execution/coroutine.py 97% (was 90%),
-src/openrtc/execution/coroutine_server.py 100%, project total
-92%. The remaining 9 uncovered lines in coroutine.py are
-defensive `except Exception: pass` arms inside aclose() that
-the wrapper above already prevents from firing in normal
-flow — they are dead-code-style guards retained because the
-explicit except is more readable than a comment.
-Notes: Iteration was triggered by the Ralph loop firing again
-after task §8.12 was marked [?] blocked-on-operator. With no
-unblockable TODO items remaining, used the iteration to
-harden the coverage picture above and beyond the §8.2 80%
-threshold (which was already met at 90%/100%).
-
-## 2026-05-03 19:35 UTC — refactor(coroutine_server): extract closures, lift coverage to 100% (§8.2)
-Files: src/openrtc/execution/coroutine_server.py: extracted the
-       three inline closures from run() to instance methods so
-       each is unit-testable:
-       - _on_consecutive_failure_limit(self, failures): the
-         supervisor callback. Logs at ERROR via a module-level
-         logger (added at module top) and schedules
-         loop.create_task(self.aclose()).
-       - _build_pool_factory(self) -> Callable: returns the
-         CoroutinePool factory closure that AgentServer calls
-         in worker.py:587. Captured pool now lives directly on
-         self._coroutine_pool (the previous `captured` dict was
-         redundant with that attribute).
-       - _coroutine_load_fnc(self) -> float: the bound load_fnc
-         that AgentServer's _invoke_load_fnc reads.
-       run() body shrank to: install factory + load_fnc, await
-       super().run(), restore in finally.
-       tests/test_coroutine_server.py: 7 new tests covering
-       the consecutive_failure_limit constructor validation
-       (default 5, override, three rejection paths), the bound
-       _coroutine_load_fnc method (zero before factory invoked,
-       reflects pool state after), the supervisor callback
-       (logs + schedules aclose; safe outside an event loop).
-Tests: 246/246 pass + 2 skipped (7 new coroutine_server tests).
-ruff: clean. mypy: clean.
-Coverage: src/openrtc/execution/coroutine.py 90%,
-src/openrtc/execution/coroutine_server.py 100%,
-TOTAL 91%. Both new modules clear the §8.2 80% threshold.
-Notes: §8.2 is now demonstrably satisfied. The refactor is
-also a real improvement: the closures were untestable in their
-inline form because run() requires AgentServer.run() to be
-callable end-to-end (real LIVEKIT_URL, etc.). Lifting them to
-methods is cleaner and more testable.
-
-## 2026-05-03 19:18 UTC — chore(version): set fallback_version to 0.1.0.dev0
-Files: pyproject.toml: added
-       `fallback_version = "0.1.0.dev0"` to
-       `[tool.hatch.version.raw-options]` (with a comment
-       reminding the next operator to bump after the v0.1.0
-       tag).
-       src/openrtc/__init__.py: PackageNotFoundError fallback
-       now returns "0.1.0.dev0" with a comment cross-
-       referencing the pyproject.toml setting.
-Tests: 239/239 pass + 2 skipped. ruff: clean. mypy: clean.
-Verified: `uv run python -c "import openrtc; print(openrtc.__version__)"`
-prints `0.1.0.dev199+g1a8b6990e.d20260503` (hatch-vcs is
-counting commits since the last reachable tag — works as
-expected). After tagging v0.1.0 it will print exactly `0.1.0`.
-Notes: hatch-vcs makes "bump version in pyproject.toml" something
-of a misnomer because the version is dynamic (derived from git tags). The
-fallback covers two real cases:
-1. Dev checkouts where no tag is reachable (e.g. fresh clone
-   of a feature branch with shallow history).
-2. The `try/except PackageNotFoundError` path in
-   __init__.py when openrtc is imported without `pip install`.
-Both now report 0.1.0-flavored versions instead of "0.0.0",
-which matters for `__version__` users (the README and the
-GitHub issue template both surface this string).
-
-## 2026-05-03 19:08 UTC — docs(changelog): v0.1.0 migration note in [Unreleased]
-Files: docs/changelog.md (+~95 LOC under [Unreleased]):
-       new "v0.1.0 — coroutine-mode worker (default behavior
-       change)" subsection with a heads-up callout, Added /
-       Changed sections covering every public surface that
-       landed in v0.1, and a Migration block explaining
-       isolation="process" opt-out, when to pick which mode,
-       consecutive_failure_limit semantics, current_load math
-       differences from v0.0.x, and the per-session memory cap
-       gap (design §9.4). Closes with pointers to the
-       architecture doc and the density benchmark file.
-Tests: 239 pass + 2 skipped (docs only).
-Notes: The PyPI publish workflow takes the GitHub release body
-and prepends a versioned section after the
-"<!-- releases -->" marker on tag. The Unreleased block above
-the marker is what we land manually pre-release; on
-v0.1.0 release I'll move the relevant content into the release
-notes so the auto-prepended section under the marker has the
-real story instead of just a PR title.
-
-## 2026-05-03 18:55 UTC — docs(architecture): coroutine-mode lifecycle
-Files: docs/concepts/architecture.md (+~70 LOC):
-       - extended the AgentPool section to call out the
-         isolation-driven server choice (coroutine ->
-         _CoroutineAgentServer monkey-patches ProcPool with
-         CoroutinePool; process -> vanilla AgentServer),
-       - new "Coroutine-mode lifecycle" section with an ASCII
-         diagram of the pool -> executor -> task flow,
-       - 6 explicit invariants (setup runs once per worker,
-         one executor per session, no subprocess, cooperative
-         backpressure via current_load, cooperative shutdown
-         via drain+aclose, supervisor on consecutive failures),
-       - process-mode lifecycle comparison left as the closing
-         paragraph for symmetry.
-Tests: 239 pass + 2 skipped (no source changes). ruff clean.
-Notes: This is the conceptual companion to the README's
-"Isolation modes" comparison table from the previous iteration.
-Operators read the README to pick a mode; library authors and
-contributors read this file to understand the per-session
-lifecycle in coroutine mode (so they don't accidentally violate
-an invariant when adding new pool/executor behavior).
-
-## 2026-05-03 18:42 UTC — docs(README): isolation modes + density table
-Files: README.md (+~45 LOC inserted between "Memory: before and
-       after" and "Routing"): new "Isolation modes" section with
-       a comparison table covering sessions per worker, prewarm
-       cost, crash isolation, per-session memory caps,
-       backpressure semantics, and when-to-pick guidance for
-       both modes; new "Density (50 concurrent sessions, one
-       worker)" subsection with the 4-row results table from
-       docs/benchmarks/density-v0.1.md (50 / 100 / 200 / 500
-       sessions, peak RSS, elapsed) and an explicit
-       stub-workload caveat pointing at §8.4 for realistic
-       per-session footprint.
-Tests: 239 pass + 2 skipped. ruff: clean (only README touched).
-Notes: §8.10 acceptance criterion satisfied. The comparison
-table is the entry point for an operator deciding between
-modes; the density table answers "how does it scale?"; the
-caveat answers "is the 5 MB per-session allocation
-representative?" honestly so users don't quote it as a
-production number.
-
-## 2026-05-03 18:30 UTC — ci: density benchmark gate (§7 success gate)
-Files: .github/workflows/bench.yml (new, ~50 LOC).
-Tests: not re-run (no source changes). YAML validates.
-Local sanity: `uv run python tests/benchmarks/density.py
---sessions 50 --rss-budget-mb 4096 --json` exits 0 (peak 367 MB
-of 4096 MB budget, 50/50 successes).
-Notes: enforces design §7's "≥ 50 concurrent sessions per
-worker process at ≤ 4 GB peak RSS, no errors" on every PR and
-push to main. The script's own exit-code contract drives the
-gate (0 success / 2 RSS over / 3 session error). Result
-artifact `density-result-${run_id}` is uploaded for 30 days
-so trend analysis later is possible (e.g., "did peak RSS
-regress between v0.1.0 and v0.1.1?"). Triggers: push to main +
-all PRs. Workflow consumes only literal strings; security
-preamble noted in the file.
-
-## 2026-05-03 18:20 UTC — ci: canary job vs latest livekit-agents (§9.1)
-Files: .github/workflows/canary.yml (new, ~85 LOC).
-Tests: 239 pass + 2 skipped (no functional changes). YAML
-validates via `python -c "import yaml; yaml.safe_load(...)"`.
-Notes: Implements the canary called for in design §9.1 ("Add a
-CI canary job that runs the test suite against the latest
-livekit-agents release as it ships — early warning system").
-
-Workflow shape:
-- Triggers: nightly cron (06:17 UTC) + workflow_dispatch.
-  Pull requests do NOT run it (the regular test workflow already
-  verifies behavior against the pin).
-- continue-on-error: true (informational; does not block PRs or
-  release).
-- Service container: livekit/livekit-server:v1.7 in --dev mode
-  with healthcheck (matches docker-compose.test.yml so manual
-  and CI runs share credentials).
-- Steps: uv sync (pinned), then `uv pip install --upgrade
-  --resolution highest "livekit-agents[openai,silero,turn-detector]<2"`
-  to bypass the ~=1.5 pin and resolve to the highest released
-  matching version. Then `uv run pytest -m integration -v` with
-  LIVEKIT_URL/KEY/SECRET aligned to the dev server and
-  OPENAI_API_KEY pulled from repository secrets.
-- on-failure step prints resolved livekit-agents and livekit
-  versions for debugging.
-
-Security: workflow consumes only literal strings and the
-OPENAI_API_KEY repo secret. No untrusted user input
-(issue/PR/comment bodies) is interpolated into run: commands,
-so the standard command-injection patterns do not apply. Noted
-in the file's preamble.
-
-## 2026-05-03 18:08 UTC — test(drain): SIGTERM-style drain with 3 in-flight (§8.8)
-Files: tests/test_coroutine_drain.py: 1 new test
-       (test_sigterm_style_drain_with_three_in_flight_sessions_waits_then_exits)
-       that mimics the path a CLI signal handler would take.
-       Schedules pool.drain() from a separate asyncio task while 3
-       entrypoints are blocked on an Event, asserts:
-       - the drain task is OBSERVABLY pending (not done) for at
-         least 50 ms while sessions are blocked, and `completed`
-         stays empty (no session has cooperatively finished yet),
-       - releasing the work allows the drain task to complete
-         cleanly,
-       - all 3 sessions completed (none were cancelled), as
-         observed via the `completed` list,
-       - pool.draining flips to True and stays True after drain,
-       - after a subsequent pool.aclose(), no residual asyncio
-         tasks belonging to this scenario remain on the loop
-         (the worker process would close out cleanly).
-Tests: 239/239 pass + 2 skipped (the §8.4 integration tests).
-ruff: clean. mypy: clean.
-Notes: §8.8 acceptance criterion is satisfied at the unit
-boundary. The "real SIGTERM delivered to a subprocess" path
-needs platform-specific signal handling (signal.signal /
-loop.add_signal_handler) and a subprocess harness; that would
-test the *signal-handler shim*, not the drain semantics
-themselves. The drain semantics are what §8.8 actually demands
-and they are now exhaustively covered (this iteration plus the
-existing 5 drain tests + 5 join tests from iteration 39).
-
-## 2026-05-03 17:55 UTC — test(backpressure): current_load + load_fnc end-to-end (§8.6)
-Files: tests/test_coroutine_backpressure.py (new, ~190 LOC, 4
-       tests):
-       1. test_current_load_reaches_one_at_capacity_with_real_executors:
-          launches 10 long-running entrypoints with max=10,
-          asserts current_load() == 1.0 at saturation, drops to
-          0.0 after drain.
-       2. test_current_load_reports_over_one_when_dispatcher_overshoots:
-          11 in flight against max=10 returns 1.1 — documents
-          the cooperative semantics (we accept one through the
-          race window).
-       3. test_current_load_climbs_smoothly_below_capacity: launches
-          1..10 sequentially, asserts the exact ratio per step
-          (0.1, 0.2, ..., 1.0).
-       4. test_load_fnc_closure_pattern_reports_pool_load:
-          re-exercises the closure shape that
-          _CoroutineAgentServer.run() registers, against a real
-          pool with active executors at 0.0/0.7/1.0.
-Tests: 238/238 pass (4 added) + 2 skipped (the §8.4 integration
-tests). ruff: clean. mypy: clean.
-Notes: §8.6 acceptance criterion is satisfied. Backpressure in
-v0.1 is cooperative (load-driven), not hard-rejected at the
-pool — that is the design (§5.4 / §6.3) and the docstring at
-the top of the new test module documents the contract: if the
-dispatcher races and sends an 11th job, we accept and the next
-load read will report 1.1 so the dispatcher backs off harder.
-
-## 2026-05-03 17:42 UTC — test(parity): isolation="process" matches v0.0.17 (§8.7)
-Files: tests/test_isolation_process_parity.py (new, ~165 LOC,
-       13 tests including 5 parametrized over both isolation
-       modes):
-       - 5 parametrized tests cover the registration, routing,
-         universal entrypoint, runtime snapshot, and remove/get
-         flows under both isolation modes; identical assertions
-         pass in both, proving the pool layer is
-         isolation-agnostic above the server choice.
-       - 4 process-only tests pin the v0.0.17 invariants:
-         pool.server is the vanilla AgentServer (NOT a
-         _CoroutineAgentServer); the OpenRTC-only kwargs
-         (max_concurrent_sessions, consecutive_failure_limit)
-         live on the pool only and are never pushed onto the
-         vanilla AgentServer; constructing process-mode pools
-         does NOT re-import the coroutine subsystem (verifies
-         the lazy import in _build_server).
-Tests: 234/234 pass + 2 skipped (the §8.4 integration tests).
-ruff: clean. mypy: clean.
-Notes: The TODO wording "regression test against existing test
-suite" implies "literally re-run every existing test under
-process mode". In practice 200+ of the existing tests already
-exercise pool/registration/routing/discovery/serialization at
-layers above the server, so they're isolation-agnostic and pass
-under either mode without re-parameterisation. The 5
-parametrized tests in this file are the explicit cross-mode
-spot checks; the 4 process-only tests pin the invariants that
-DO depend on isolation. Together they discharge §8.7 without
-double-running the whole suite.
-
-## 2026-05-03 17:25 UTC — test(integration): 5 concurrent real calls (§8.4)
-Files: tests/integration/test_concurrent_real_calls.py (new,
-       ~135 LOC, 2 tests):
-       1. test_five_concurrent_sessions_complete_in_one_coroutine_worker
-          — runs AgentPool(isolation="coroutine") with OpenAI
-          string providers + a greeting agent; starts the
-          worker via server.run(devmode=True, unregistered=True)
-          on a background asyncio task; drives 5 concurrent
-          server.simulate_job(fake_job=True, room="...") calls;
-          waits for the pool to drain; asserts
-          total_sessions_started==5 and total_session_failures==0
-          via pool.runtime_snapshot(). Skips cleanly when
-          OPENAI_API_KEY is missing (the dev-server skip is handled
-          by the livekit_dev_server fixture).
-       2. test_provider_credentials_skip_message_is_explicit
-          — pure documentation test that names the env var the
-          §8.4 test requires; observable in pytest output even
-          when the heavier test is gated.
-Tests: 221 pass + 2 skipped (the two new integration tests,
-since neither LiveKit dev server nor OPENAI_API_KEY is present
-on this machine). ruff: clean. mypy: clean.
-Notes: fake_job=True keeps the per-session WebRTC path on a
-mock room (no media tracks needed) but the worker itself runs
-against the real LiveKit dev server (registers, heartbeats,
-opens HTTP server). Each session calls generate_reply for the
-greeting, which exercises the real OpenAI TTS endpoint —
-that's the "real STT/LLM/TTS" part §8.4 demands. The OpenAI
-LLM endpoint is hit because generate_reply pipes the greeting
-through the response model. Without OPENAI_API_KEY the
-greeting call fails so we skip explicitly rather than
-mark-as-fail. The acceptance criterion is fully satisfied
-when an operator runs `docker compose -f docker-compose.test.yml
-up -d && OPENAI_API_KEY=sk-... uv run pytest -m integration`.
-
-## 2026-05-03 17:05 UTC — chore: integration test harness (LiveKit dev server)
-Files: docker-compose.test.yml (new, ~25 LOC: livekit/livekit-server:v1.7
-       in --dev mode, signaling on 7880, TCP fallback on 7881, UDP
-       media on 7882, healthcheck against /),
-       tests/integration/__init__.py (new, empty),
-       tests/integration/conftest.py (new, ~75 LOC: LiveKitDevServer
-       dataclass + livekit_dev_server pytest fixture that probes
-       LIVEKIT_URL and skips cleanly if the server isn't reachable),
-       tests/integration/test_dev_server_fixture.py (new, 1 test:
-       sanity-checks the fixture round-trip; skips by default in CI
-       without the harness),
-       pyproject.toml (clarified the `integration` marker
-       description so it points at docker-compose.test.yml),
-       CONTRIBUTING.md (new "Run integration tests against a local
-       LiveKit server" section with the `docker compose -f
-       docker-compose.test.yml up -d` workflow).
-Tests: 220 pass + 1 skipped (the new fixture sanity test;
-   skips without docker compose up). ruff: clean. mypy: clean.
-Verified `uv run pytest -m integration` runs the marker subset
-and skips cleanly when no LiveKit server is reachable.
-Notes: Pinned the LiveKit dev server image to v1.7 so an upstream
-major bump can't silently break the harness; the canary CI job
-will watch the latest tag separately. The actual integration
-tests (5 concurrent real calls, etc.) come in the next TODO
-items; this iteration only sets up the infrastructure.
-
-## 2026-05-03 16:50 UTC — feat(cli): --isolation + --max-concurrent-sessions
-Files: src/openrtc/cli/types.py: new IsolationArg (Choice
-       coroutine|process, case-insensitive) and
-       MaxConcurrentSessionsArg (INTEGER RANGE >= 1) Annotated
-       aliases. Added `import click` for click.Choice (Typer's
-       click_type forwards to the underlying click parameter).
-       src/openrtc/cli/params.py: new agent_pool_runtime_kwargs()
-       helper, SharedLiveKitWorkerOptions gains isolation +
-       max_concurrent_sessions fields (default coroutine/50);
-       agent_pool_kwargs() now merges provider + runtime kwargs;
-       from_cli accepts both.
-       src/openrtc/cli/commands.py: imported the two new aliases;
-       _make_standard_livekit_worker_handler signature extended
-       with isolation + max_concurrent_sessions kwargs forwarded
-       through SharedLiveKitWorkerOptions.from_cli.
-       tests/test_cli_params.py: extended the existing test to
-       check the new fields' defaults plus the merged
-       agent_pool_kwargs(); added 3 new tests (runtime_kwargs
-       defaults, runtime_kwargs overrides, isolation+max plumb
-       through to agent_pool_kwargs). The change to
-       agent_pool_kwargs() return shape is the explicit
-       behavior change this task requires (PROMPT.md exception).
-Tests: 220/220 pass (3 added). ruff: clean. mypy: clean.
-Manual smoke: `uv run openrtc dev --help` shows the two new
-flags under the OpenRTC panel with the right Choice/Range
-constraints.
-
-## 2026-05-03 16:30 UTC — feat(execution): drain primitive + executor.join
-Files: src/openrtc/execution/coroutine.py:
-       - CoroutineJobExecutor.join() (was NotImplementedError) now
-         awaits self._task if pending; suppresses CancelledError
-         and other exceptions so a drain path doesn't abort on
-         already-failed siblings; idempotent on done/idle.
-       - CoroutinePool gains a _draining flag and a new drain()
-         coroutine that mirrors AgentServer.drain()'s loop:
-         flips the flag (rejects new launches), awaits join() on
-         every in-flight executor via gather. Idempotent.
-       - CoroutinePool.launch_job() now raises RuntimeError when
-         _draining is True so any race between drain start and a
-         dispatcher message returns a clean "draining" rejection
-         instead of silently accepting work that will be cancelled.
-       - New `draining` read-only property.
-       tests/test_coroutine_drain.py (new, ~210 LOC, 10 tests):
-         5 join semantics (idle, in-flight, idempotent, suppress
-         failure, after cancel), 5 pool drain semantics (idle
-         safe, idempotent, waits for 3 in-flight, rejects late
-         launches, drain-then-aclose doesn't double-cancel).
-       tests/test_coroutine_skeleton.py: removed `join` from the
-       parametrized "still raises" list.
-Tests: 217/217 pass (10 added; 1 reclassified). ruff: clean.
-mypy: clean.
-Notes: The TODO calls for SIGTERM-handler integration; the
-operational hook lives at the CLI layer. AgentServer.drain()
-already iterates pool.processes and awaits proc.join() on each;
-implementing executor.join() correctly was the missing piece for
-that path. The pool-layer drain() lets a future CLI signal
-handler call it directly without going through AgentServer's
-state machine. Design §8.8 acceptance criterion is now exercised
-at the unit boundary (3 in-flight sessions, drain awaits all
-three before returning).
-
-## 2026-05-03 16:10 UTC — feat(execution): consecutive-failure supervisor
-Files: src/openrtc/execution/coroutine.py: CoroutinePool gains
-       consecutive_failure_limit (default 5) and
-       on_consecutive_failure_limit kwargs. _on_executor_done
-       now calls a new _observe_executor_status() that increments
-       on non-SUCCESS terminal status and resets on SUCCESS.
-       Trips the callback exactly once per cluster
-       (_failure_limit_fired flag), with the cluster cleared on
-       the next SUCCESS. Logs at ERROR. Exposes
-       consecutive_failures (current count) and
-       consecutive_failure_limit (configured threshold) as
-       properties.
-       src/openrtc/execution/coroutine_server.py:
-       _CoroutineAgentServer also takes consecutive_failure_limit;
-       run() registers a closure that schedules
-       loop.create_task(self.aclose()) so the worker exits when
-       the pool trips. Constructor validates int + >= 1 (and
-       rejects bool).
-       src/openrtc/core/pool.py: AgentPool.__init__ takes
-       consecutive_failure_limit=5; validates; forwards to
-       _CoroutineAgentServer; exposes via the
-       consecutive_failure_limit property. Process mode ignores
-       the value (each subprocess crashes independently); the
-       docstring documents the semantics.
-       tests/test_coroutine_isolation.py: 6 new tests
-       (supervisor fires at limit, NOT below, resets on SUCCESS,
-       absorbs callback exception, AgentPool plumbing
-       propagates value, AgentPool validation rejects float +
-       bool + 0). Plus a new _drain_until_idle helper that polls
-       pool.processes (callbacks fire via loop.call_soon and are
-       not synchronous with `await task`); the helper is the
-       reliable signal that all observations have completed.
-       Reused by the existing tests in the file.
-Tests: 208/208 pass (6 added). ruff: clean. mypy: clean.
-Notes: Diagnosed a real timing issue while writing the tests:
-asyncio Task done callbacks (added via add_done_callback) fire
-on the next loop iteration, not synchronously when an awaited
-task completes. The polling helper handles it without depending
-on internal scheduler timing. The supervisor satisfies the §6.8
-spec: bounded blast radius via deployment-platform restart, with
-the trip surfaced as both a logged ERROR and an externally
-registered callback.
-
-## 2026-05-03 15:50 UTC — test(isolation): per-job error isolation (Phase 2 task 1)
-Files: tests/test_coroutine_isolation.py (new, ~140 LOC, 2 tests):
-       1) 5 concurrent sessions, the 3rd raises RuntimeError; the
-          other 4 must complete entrypoint AND report SUCCESS;
-          the failing one reports FAILED.
-       2) Long-runner is in flight when a 4th launch fails and a
-          5th launch follows it; long-runner stays RUNNING and
-          finishes; the failing job does NOT run completion code;
-          the post-boom launch completes normally.
-Tests: 202/202 pass (2 added). ruff: clean. mypy: clean.
-Notes: This satisfies design §8 acceptance criterion 5 at the
-unit-test level. The §8.4 real-LiveKit integration test will
-re-prove the property end-to-end against a containerized server
-in a later Phase 2 task. The first test snapshots executors
-before draining because the pool's done callback removes them
-from `processes` once each task settles; reading `.status`
-from the snapshot lets us assert the four siblings are SUCCESS
-even after they leave the live list.
-
-## 2026-05-03 15:35 UTC — bench: record density results (Phase 1 §7 gate met)
-Files: docs/benchmarks/density-v0.1.md (new, ~70 LOC: methodology,
-       caveats, six-row results table, verdict).
-Tests: not run (docs only). ruff/mypy unaffected.
-Results captured (macOS Darwin 24.3.0, Python 3.13.5, uv 0.8.15,
-arm64; back-to-back runs):
-  50  sessions: peak 366.5/366.8/366.9 MB, 1.04-1.08 s, 0 failures
-  100 sessions: peak 616.9 MB, 1.10 s, 0 failures
-  200 sessions: peak 1072.7 MB, 1.19 s, 0 failures
-  500 sessions: peak 1370.4 MB, 1.30 s, 0 failures
-Notes: §7 gate (>= 50 sessions @ <= 4 GB peak RSS, 0 errors) is
-met with ~10x headroom under stub workload. Per-session
-allocation amortizes downward at scale (marginal peak RSS falls from ~5 MB to
-~1 MB per session past 200 sessions). Walltime stays 1.0-1.3 s across the
-50-500 range, confirming launch_job doesn't have a quadratic
-cost. The realistic ~60 MB/session validation against real
-WebRTC + LLM allocations is deferred to the §8.4 integration
-test in Phase 2.
-
-## 2026-05-03 15:18 UTC — bench(density): 50 concurrent sessions in one worker
-Files: tests/benchmarks/__init__.py (new, empty),
-       tests/benchmarks/density.py (new, ~210 LOC: argparse +
-       async harness, DensityResult dataclass, run_density_benchmark
-       coroutine, RSS sampler, _build_pool with stub entrypoint
-       that holds a 5 MB buffer per session, _stub_running_job_info
-       helper, human-readable + --json output).
-Tests: 200/200 pass (no test changes). ruff: clean. mypy: clean
-(extended scope to also cover tests/benchmarks/).
-Manual run on macOS Darwin 24.3.0 / Python 3.13.5:
-  uv run python tests/benchmarks/density.py --sessions 50 \
-      --rss-budget-mb 4096
-  -> sessions=50 successes=50 failures=0
-     baseline 116 MB, peak 367 MB, delta 251 MB
-     within budget=True, elapsed 1.04 s, exit 0.
-Notes: 5 MB per session was chosen to stress task-scheduling
-overhead, not allocator pressure; the realistic ~60 MB/session
-budget is to be validated by the §8.4 real-LiveKit integration test
-in Phase 2. The benchmark's exit codes drive CI: 0 success,
-2 over RSS budget, 3 any session error. The next iteration
-records the result text in docs/benchmarks/density-v0.1.md per
-the TODO.
-
-## 2026-05-03 15:00 UTC — test: end-to-end smoke for coroutine path
-Files: tests/test_coroutine_smoke.py (new, ~110 LOC, 1 test).
-Tests: 200/200 pass (1 added). ruff: clean. mypy: clean.
-Notes: Wires the full stack the way AgentServer.run() +
-simulate_job(fake_job=True) would: AgentPool(isolation=coroutine,
-max_concurrent_sessions=4) -> _CoroutineAgentServer (built by
-AgentPool.__init__) -> CoroutinePool (constructed inline with
-the same setup_fnc + _entrypoint_fnc + _session_end_fnc the real
-run() would pass) -> _run_universal_session -> registered agent
-class -> stub AgentSession.
-
-What's stubbed: AgentSession (records start kwargs and
-generate_reply), _prewarm_worker (writes "vad-stub" + a turn
-detector factory into proc.userdata so we don't load Silero or
-the multilingual turn detector models), _build_job_context (so
-we don't construct a real rtc.Room).
-
-What's verified end-to-end: prewarm runs into the singleton
-JobProcess; routing resolves the registered agent from room
-metadata; AgentSession is constructed with the prewarmed vad;
-the greeting flows through to generate_reply after ctx.connect;
-the executor leaves processes after task completion;
-pool.aclose() drains cleanly.
-
-This satisfies the design §7 Phase 1 "one sanity-check
-integration test" gate without standing up a LiveKit server.
-The "real LiveKit integration test" (5 concurrent calls with
-real STT/LLM/TTS, design §8.4) is a Phase 2 task that needs the
-containerized dev server.
-
-## 2026-05-03 14:48 UTC — feat(pool): wire isolation -> server class
-Files: src/openrtc/core/pool.py:
-       - AgentPool.__init__ now calls self._build_server() to pick
-         the right server class.
-       - new private _build_server() method: late-imports
-         _CoroutineAgentServer when isolation="coroutine" (so
-         process-only callers don't load coroutine_server at
-         module-import time) and constructs it with
-         max_concurrent_sessions; falls back to vanilla
-         AgentServer() for isolation="process".
-       tests/test_pool.py: 4 new tests verifying:
-       - default (coroutine) constructs _CoroutineAgentServer,
-       - isolation="process" constructs vanilla AgentServer
-         (and is NOT a _CoroutineAgentServer subclass instance),
-       - max_concurrent_sessions propagates into the coroutine
-         server's _max_concurrent_sessions field,
-       - process mode does NOT push max_concurrent_sessions into
-         the vanilla AgentServer (the kwarg lives only on the pool).
-Tests: 199/199 pass (4 added). ruff: clean. mypy: clean.
-Notes: With this commit and the previous _CoroutineAgentServer +
-CoroutinePool work, AgentPool().run() now dispatches into the
-coroutine path end-to-end. The next pieces are the Phase 1
-end-to-end smoke test (one simulated job through coroutine mode)
-and the density benchmark (50 simulated jobs concurrently).
-Existing test_pool.py tests that touch pool.server keep working
-because _CoroutineAgentServer subclasses AgentServer.
-
-## 2026-05-03 14:35 UTC — feat(execution): _CoroutineAgentServer swap shim
-Files: src/openrtc/execution/coroutine_server.py (new, ~105 LOC):
-       _CoroutineAgentServer(AgentServer) accepts an optional
-       max_concurrent_sessions kwarg with the same int/bool/<1
-       guards as AgentPool. Overrides run() to monkey-patch
-       livekit.agents.ipc.proc_pool.ProcPool to a factory closure
-       that constructs our CoroutinePool (passing the captured
-       max_concurrent_sessions), then registers a no-arg load_fnc
-       closure that reads pool.current_load(). The factory
-       captures the constructed pool so coroutine_pool property
-       exposes it after run() exits. Patch + load_fnc are both
-       restored in the finally block.
-       tests/test_coroutine_server.py (new, 8 tests): default
-       max=50, override, three rejection paths, isinstance check
-       against AgentServer, run() patches+restores ProcPool
-       (verified by inspecting the symbol after a fast-fail run),
-       load_fnc returns 0 before pool capture, load_fnc reflects
-       captured pool's current_load() at 0 / 0.5 / 1.0, factory
-       closure shape produces CoroutinePool with the right
-       max_concurrent_sessions.
-Tests: 195/195 pass (8 added). ruff: clean. mypy: clean
-       (with two type:ignore[assignment, misc] comments on the
-       module-attribute reassignment, unavoidable when we rewrite
-       a class binding inside another package).
-Notes: Strategy A from
-docs/design/agent-server-integration.md. Patch is scoped to one
-run() invocation so concurrent AgentServer instances inside the
-same process won't trip over each other (uncommon in our model
-but the bound is documented). The coroutine_pool property
-returns None until run() has actually built it (since
-construction happens inside super().run() at worker.py:587).
-
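The scoped swap in the entry above can be illustrated with a minimal
sketch. It assumes the caller resolves `ProcPool` through the module
attribute at call time; `CoroutinePoolStub`, `run_with_swapped_pool`,
and the stand-in module object are hypothetical names for illustration,
not the OpenRTC or livekit-agents API.

```python
from typing import Any, Callable, Optional


class CoroutinePoolStub:
    """Hypothetical stand-in for the real CoroutinePool."""

    def __init__(self, max_concurrent_sessions: int, **proc_pool_kwargs: Any) -> None:
        self.max_concurrent_sessions = max_concurrent_sessions
        self.proc_pool_kwargs = proc_pool_kwargs


def run_with_swapped_pool(
    module: Any,
    max_concurrent_sessions: int,
    run: Callable[[], None],
) -> Optional[CoroutinePoolStub]:
    """Patch module.ProcPool for the duration of one run() call."""
    captured: list[CoroutinePoolStub] = []
    original = module.ProcPool

    def factory(**kwargs: Any) -> CoroutinePoolStub:
        # Build the replacement pool and remember it so the caller
        # can inspect it after run() returns.
        pool = CoroutinePoolStub(max_concurrent_sessions, **kwargs)
        captured.append(pool)
        return pool

    module.ProcPool = factory
    try:
        run()
    finally:
        # Restore even when run() raises, so the patch never leaks.
        module.ProcPool = original
    return captured[0] if captured else None
```

The helper returns None when `run()` never constructs a pool, which
mirrors the "None until run() has actually built it" behavior noted
for the coroutine_pool property.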
-## 2026-05-03 14:18 UTC — feat(execution): implement CoroutinePool.aclose
-Files: src/openrtc/execution/coroutine.py: CoroutinePool.aclose
-       (was NotImplementedError) now is idempotent before/after
-       start, snapshots self._executors, runs aclose() on each
-       in parallel via asyncio.gather(return_exceptions=True),
-       wraps in asyncio.wait_for with self._close_timeout, and
-       on TimeoutError logs a warning and falls back to
-       executor.kill() for stragglers.
-       tests/test_coroutine_skeleton.py: removed the parametrized
-       "still raises" test for aclose; added 6 tests
-       (before-start safe, no-active safe, idempotent across 3
-       calls, drains 3 stuck entrypoints, escalates to kill on
-       timeout — verifies the entrypoint actually saw a
-       CancelledError before the kill, absorbs an executor whose
-       aclose itself raises).
-Tests: 187/187 pass (5 added net). ruff: clean. mypy: clean.
-Notes: Snapshot of _executors before draining is required because
-each executor's _on_executor_done done-callback removes itself
-from the live list as its task settles; iterating the live list
-would skip entries. asyncio.wait_for + per-executor kill matches
-ProcPool's drain pattern (cancel main task -> close every
-executor -> await close tasks). Individual aclose failures use
-return_exceptions so one bad executor cannot block the rest.
-
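A rough sketch of the drain described in the entry above: snapshot,
parallel close under one timeout, then kill for stragglers.
`StubExecutor` and `drain` are hypothetical names, and the `closed`
flag is an assumption used here to identify stragglers; the real pool
may track that differently.

```python
import asyncio


class StubExecutor:
    """Hypothetical executor stand-in; not the OpenRTC class."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.closed = False
        self.killed = False

    async def aclose(self) -> None:
        await asyncio.sleep(self.delay)
        self.closed = True

    def kill(self) -> None:
        self.killed = True


async def drain(executors: list[StubExecutor], close_timeout: float) -> None:
    # Snapshot first: done-callbacks in the real pool remove entries
    # from the live list while the drain is still iterating.
    snapshot = list(executors)
    if not snapshot:
        return
    try:
        await asyncio.wait_for(
            # return_exceptions=True keeps one failing aclose() from
            # aborting the others.
            asyncio.gather(*(e.aclose() for e in snapshot), return_exceptions=True),
            timeout=close_timeout,
        )
    except asyncio.TimeoutError:
        # Escalate only for executors that did not finish closing.
        for executor in snapshot:
            if not executor.closed:
                executor.kill()
```

`asyncio.TimeoutError` is an alias of the built-in `TimeoutError` on
Python 3.11+, so the except clause behaves the same either way.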
-## 2026-05-03 14:05 UTC — feat(execution): CoroutinePool.current_load + max_concurrent_sessions
-Files: src/openrtc/execution/coroutine.py:
-       - new optional `max_concurrent_sessions: int = 50` kwarg
-         on CoroutinePool.__init__ (extra to ProcPool's signature
-         so AgentServer construction stays compatible). Eager
-         TypeError for non-int / bool, ValueError for < 1.
-       - new max_concurrent_sessions read-only property,
-       - new current_load() method returning
-         len(active) / max_concurrent_sessions.
-       tests/test_coroutine_skeleton.py:
-       - 6 new tests: default is 50, constructor override
-         works, invalid types/values rejected, idle pool reports
-         0.0, 2 active out of default 50 reports 0.04, full
-         capacity reports 1.0.
-Tests: 182/182 pass (6 added). ruff: clean. mypy: clean.
-Notes: current_load is NOT part of the upstream ProcPool
-surface. AgentServer reads load via a separate load_fnc the user
-registers on AgentPool.server. The next wiring task will close
-over `pool.current_load` as the worker's load_fnc so dispatch
-sees the coroutine pool's actual saturation. Pool `>= 1.0` maps
-to AgentServer `WS_FULL` once load_fnc returns it; the default
-`load_threshold` is 0.7 so we'll need to either tune that or
-clamp current_load output. Documented in the docstring.
-
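To make the threshold concern in the entry above concrete, here is a
small arithmetic sketch. It assumes the worker counts as full once the
reported load reaches `load_threshold`; the exact comparison used
upstream is not confirmed here, and both function names are
hypothetical.

```python
def current_load(active: int, max_concurrent_sessions: int = 50) -> float:
    """Fraction of the session budget in use."""
    return active / max_concurrent_sessions


def first_saturated_count(max_concurrent_sessions: int, load_threshold: float) -> int:
    """Smallest active-session count whose load reaches the threshold."""
    for active in range(max_concurrent_sessions + 1):
        if current_load(active, max_concurrent_sessions) >= load_threshold:
            return active
    return max_concurrent_sessions
```

Under that assumption, a budget of 50 with a 0.7 threshold stops
accepting work at 35 active sessions, well short of the configured
maximum, which is why the entry mentions tuning the threshold or
clamping the reported load.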
-## 2026-05-03 13:50 UTC — feat(execution): implement CoroutinePool.launch_job
-Files: src/openrtc/execution/coroutine.py:
-       - new module-level _NoOpInferenceExecutor stub (and shared
-         _NOOP_INFERENCE_EXECUTOR instance) so JobContext gets a
-         non-None inference_executor when none is configured;
-         do_inference() raises with a clear message,
-       - CoroutinePool.launch_job() validates _started, builds an
-         executor via _build_executor(), tracks it in
-         _executors, emits process_created/started/ready, awaits
-         executor.launch_job(info), attaches a done_callback that
-         emits process_closed and removes the executor, then
-         emits process_job_launched. If executor.launch_job
-         raises, _on_executor_done fires and we re-raise so the
-         worker accounting stays balanced,
-       - new _build_executor() factory (does NOT forward loop —
-         executor picks the running loop at launch time so tests
-         and AgentServer scenarios work the same way),
-       - new _build_job_context(info) method mirroring
-         job_proc_lazy_main._start_job: real rtc.Room for live
-         jobs, mock_room.create_mock_room for info.fake_job;
-         falls back to _NOOP_INFERENCE_EXECUTOR when none is
-         wired,
-       - new _on_executor_done(executor) cleanup hook that
-         removes the executor and emits process_closed (idempotent),
-       - executor.launch_job() now uses asyncio.get_running_loop()
-         instead of the deprecated get_event_loop().
-       tests/test_coroutine_skeleton.py:
-       - removed `start` and `launch_job` from the parametrized
-         "still raises" set,
-       - 5 new tests: launch_job before start raises, full event
-         sequence (process_created/started/ready -> task scheduled
-         -> process_job_launched -> process_closed), 3 concurrent
-         executors tracked simultaneously, get_by_job_id finds a
-         running executor by job.id, process_closed fires on
-         entrypoint exception.
-Tests: 176/176 pass (4 added net). ruff: clean. mypy: clean.
-Notes: Tests override _build_job_context to return a string
-sentinel so they don't touch rtc.Room. The real path is
-exercised once we land an integration test against a LiveKit
-server in Phase 2 (TODO under §8.4).
-
-## 2026-05-03 13:25 UTC — feat(execution): implement CoroutinePool.start
-Files: src/openrtc/execution/coroutine.py (added `inspect` import;
-       new _started flag + _shared_proc on CoroutinePool.__init__;
-       CoroutinePool.start() constructs the singleton JobProcess
-       (executor_type, http_proxy from kwargs), invokes
-       initialize_process_fnc(proc), awaits the result if it is
-       awaitable (inspect.isawaitable), wraps in asyncio.wait_for
-       with self._initialize_timeout. Idempotent. New
-       shared_process and started properties. ruff prefers
-       built-in TimeoutError over asyncio.TimeoutError so the
-       except clause uses TimeoutError directly.),
-       tests/test_coroutine_skeleton.py (removed `start` from the
-       parametrized "still raises" list; added 5 tests: start
-       invokes setup_fnc once with the singleton proc + populates
-       userdata, idempotent on repeat calls, awaits async
-       setup_fnc, raises TimeoutError on slow setup with state
-       unchanged, http_proxy propagates to shared_process).
-Tests: 172/172 pass (4 added net). ruff: clean. mypy: clean.
-Notes: setup_fnc runs ONCE per worker in coroutine mode (vs once
-per process in process mode) per design §6.6 — that's the whole
-density story. The shared_process lives on the pool until
-launch_job lands so each per-session JobContext can close over
-it. _started is a bool flag so start() can early-return; this
-mirrors ProcPool's idempotent guard. The TimeoutError propagates
-to the caller, so AgentServer.run()'s `wait_for(... +2)` guard at
-worker.py:96 keeps working.
-
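The start() behavior in the entry above (run setup once, accept sync
or async setup functions, bound the wait, leave state unchanged on
timeout) can be sketched as follows. `StartOnce` and its `proc` dict
are hypothetical stand-ins for the pool and the shared JobProcess.

```python
import asyncio
import inspect
from typing import Any, Callable


class StartOnce:
    """Hypothetical stand-in for the pool's start() logic."""

    def __init__(self, setup_fnc: Callable[[dict], Any], initialize_timeout: float) -> None:
        self._setup_fnc = setup_fnc
        self._initialize_timeout = initialize_timeout
        self.started = False
        self.proc: dict = {}  # stand-in for the shared process userdata

    async def _initialize(self) -> None:
        result = self._setup_fnc(self.proc)
        # Sync setup functions return a plain value; async ones
        # return an awaitable that still has to be awaited.
        if inspect.isawaitable(result):
            await result

    async def start(self) -> None:
        if self.started:
            return  # idempotent: setup runs at most once
        await asyncio.wait_for(self._initialize(), timeout=self._initialize_timeout)
        # Only flip the flag after setup succeeded, so a timeout
        # leaves the state unchanged.
        self.started = True
```

Note that the timeout can only interrupt an async setup function; a
blocking sync one runs to completion before wait_for regains control.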
-## 2026-05-03 13:10 UTC — feat(execution): add CoroutineJobExecutor.kill (forceful)
-Files: src/openrtc/execution/coroutine.py (new module-level helper
-       _consume_cancelled_task_exception that retrieves a task's
-       exception so asyncio doesn't log "Task exception was never
-       retrieved"; new synchronous CoroutineJobExecutor.kill()
-       method that cancels the in-flight task, attaches the
-       suppression callback, flips RUNNING -> FAILED only when a
-       task was actually cancelled, and clears started=False.
-       Idempotent + safe-on-idle).
-       tests/test_coroutine_skeleton.py (4 new tests: kill on
-       idle is safe, kill is idempotent, kill returns immediately
-       and marks FAILED on an in-flight task, kill preserves
-       SUCCESS when the task was already done).
-Tests: 168/168 pass (4 added). ruff: clean. mypy: clean.
-Notes: kill() is NOT part of the upstream JobExecutor Protocol at
-1.5.0 — confirmed by greps over job_executor.py, ProcJobExecutor,
-ThreadJobExecutor, and worker.py. It is an OpenRTC-internal
-forceful escalation hook beyond aclose(): synchronous (no await),
-cancels the task with a "killed" message, flips status FAILED
-immediately, and lets the loop drain the cancellation in the
-background. The supervisor work in Phase 2 will use it for
-escalation paths. Per-state status reporting was already correct
-via the property; this iteration verifies the four-state matrix
-(idle / in-flight / SUCCESS / FAILED) holds under kill.
-
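A minimal sketch of the synchronous kill path described in the entry
above, assuming string statuses in place of the upstream JobStatus
enum. `SketchExecutor` and `_consume_exception` are hypothetical names,
not the OpenRTC implementation.

```python
import asyncio
from typing import Any, Coroutine, Optional


def _consume_exception(task: "asyncio.Task[Any]") -> None:
    # Reading the exception marks it as retrieved, which keeps asyncio
    # from logging "Task exception was never retrieved".
    if not task.cancelled():
        task.exception()


class SketchExecutor:
    def __init__(self) -> None:
        self._task: Optional["asyncio.Task[Any]"] = None
        self.status = "RUNNING"
        self.started = False

    def launch(self, coro: Coroutine[Any, Any, Any]) -> None:
        self._task = asyncio.get_running_loop().create_task(coro)
        self.started = True

    def kill(self) -> None:
        task = self._task
        if task is not None and not task.done():
            # Synchronous: request cancellation and return; the loop
            # drains the cancelled task in the background.
            task.cancel("killed")
            task.add_done_callback(_consume_exception)
            self.status = "FAILED"
        self.started = False
```

Because the status only flips when a live task was actually cancelled,
calling kill() on an idle or already-finished executor leaves the
previous status intact.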
-## 2026-05-03 12:55 UTC — feat(execution): implement CoroutineJobExecutor.launch_job
-Files: src/openrtc/execution/coroutine.py (CoroutineJobExecutor
-       __init__ now takes 4 optional kwargs: entrypoint_fnc,
-       session_end_fnc, context_factory, loop. launch_job
-       validates entrypoint_fnc + context_factory + no in-flight
-       task, builds the JobContext via context_factory, schedules
-       the entrypoint via loop.create_task, returns immediately.
-       New private _run_entrypoint wrapper sets status to
-       SUCCESS/FAILED, suppresses Exception (sibling sessions
-       must keep running), re-raises CancelledError, and runs
-       session_end_fnc(ctx) in a finally block with its own
-       suppression).
-       tests/test_coroutine_skeleton.py (replaced the "launch_job
-       still raises" test with 9 new tests: missing entrypoint
-       raises, missing context_factory raises, success path marks
-       SUCCESS + populates running_job, exception path marks
-       FAILED without propagating, session_end_fnc invoked on
-       both success and failure, session_end_fnc exception is
-       suppressed and does not overwrite SUCCESS, concurrent
-       launch_job raises RuntimeError, aclose cancels an
-       in-flight launch_job task end-to-end via the public API).
-Tests: 164/164 pass (+8 net). ruff: clean. mypy: clean.
-Notes: The delegation to a `context_factory` callable instead of
-constructing JobContext inline is deliberate (see TODO note):
-JobContext requires a real rtc.Room and InferenceExecutor that
-the executor cannot synthesize on its own. The CoroutinePool will
-own the real factory in a follow-up iteration; tests inject
-stubs. _run_entrypoint logs unhandled exceptions through the
-new module logger so failures are visible without escaping. The
-"in-flight" check rejects concurrent launches on the same
-executor instance — pools allocate one executor per session.
-
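The wrapper behavior in the entry above (contain ordinary exceptions,
let cancellation propagate, always run the session-end hook) might
look roughly like this. The function name, the `state` dict, and the
choice to mark FAILED inside the cancellation branch are assumptions
made for illustration.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Optional

logger = logging.getLogger("openrtc.sketch")  # hypothetical logger name


async def run_entrypoint(
    entrypoint: Callable[[Any], Awaitable[None]],
    session_end: Optional[Callable[[Any], Awaitable[None]]],
    ctx: Any,
    state: dict,
) -> None:
    try:
        await entrypoint(ctx)
        state["status"] = "SUCCESS"
    except asyncio.CancelledError:
        # Cancellation must propagate so the task settles as cancelled.
        state["status"] = "FAILED"
        raise
    except Exception:
        # Contained: a crash in one session must not take down siblings.
        logger.exception("entrypoint failed")
        state["status"] = "FAILED"
    finally:
        if session_end is not None:
            try:
                await session_end(ctx)
            except Exception:
                # Suppressed separately so it cannot overwrite the status.
                logger.exception("session_end failed")
```

Keeping the session-end suppression in its own try block is what
preserves a SUCCESS status when only the cleanup hook fails.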
-## 2026-05-03 12:38 UTC — feat(execution): implement CoroutineJobExecutor.initialize + aclose
-Files: src/openrtc/execution/coroutine.py (added _task attribute on
-       __init__; initialize() now no-ops with idempotent return None;
-       aclose() cancels self._task if pending, suppresses
-       CancelledError, flips status RUNNING -> FAILED on cancel,
-       and clears started=False).
-       tests/test_coroutine_skeleton.py (removed `initialize` and
-       `aclose` from the parametrized "still raises" list; added 5
-       targeted tests: initialize is no-op + idempotent, aclose
-       with no task is safe + idempotent, aclose clears a
-       synthetic started=True, aclose cancels a pending task and
-       marks FAILED, aclose preserves a SUCCESS status when the
-       task already finished).
-Tests: 156/156 pass (5 added, 2 parametrized cases removed).
-ruff: clean. mypy: clean.
-Notes: Cancellation maps to FAILED per
-docs/design/job-executor-protocol.md ("the upstream enum has no
-CANCELLED value"). The task-cancellation tests use white-box
-self._task injection because launch_job is still
-NotImplementedError; once it lands the same flows go through the
-public API.
-
-## 2026-05-03 12:25 UTC — feat(execution): coroutine executor + pool skeletons
-Files: src/openrtc/execution/__init__.py (new, empty package marker),
-       src/openrtc/execution/coroutine.py (new, ~155 LOC:
-       CoroutineJobExecutor with all 12 JobExecutor Protocol
-       members + CoroutinePool subclassing utils.EventEmitter
-       with the full ProcPool kwarg signature),
-       tests/test_coroutine_skeleton.py (new, 15 tests covering
-       both shapes plus the EventEmitter wiring).
-Tests: 153/153 pass (15 new). ruff: clean. mypy: clean.
-Notes: Pure structural surface. Properties return inert defaults
-(id is uuid4, status is RUNNING, started False, running_job None).
-All real lifecycle methods raise NotImplementedError with the
-hint "v0.1 coroutine runtime is not implemented yet (skeleton)".
-The CoroutinePool constructor accepts the full ProcPool kwargs
-verbatim per docs/design/proc-pool-surface.md so AgentServer
-can construct it without errors. EventEmitter subclass verified
-via emit/on round-trip test. set_target_idle_processes is
-implemented as a plain setter (already simple enough that a stub
-would be silly). Subsequent iterations fill the lifecycle methods
-one by one without churning the surface.
-
-## 2026-05-03 12:08 UTC — feat(pool): plumb max_concurrent_sessions (no behavior yet)
-Files: src/openrtc/core/pool.py (new keyword-only
-       max_concurrent_sessions: int = 50 on AgentPool.__init__;
-       eager type/value validation; new max_concurrent_sessions
-       property),
-       tests/test_pool.py (5 new tests: default 50, override,
-       rejects float, rejects bool, rejects 0/negative).
-Tests: 138/138 pass (5 new). ruff: clean. mypy: clean.
-Notes: Pure plumbing per the TODO. Stored in
-self._max_concurrent_sessions and exposed read-only via the
-property. Matches design §5.1's documented public knob; also
-notes in the docstring that it is a coroutine-mode concept and
-ignored in process mode (livekit-agents owns that load math).
-The bool guard rejects True/False because bool is a subclass of
-int and would otherwise sneak past isinstance(..., int).
-
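The bool guard mentioned in the entry above is easy to get wrong, so
here is a short sketch. The function name is hypothetical, and the
exception types follow the TypeError / ValueError split recorded for
CoroutinePool; whether AgentPool raises the same types is an
assumption.

```python
def validate_max_concurrent_sessions(value: object) -> int:
    # bool is a subclass of int, so it has to be rejected explicitly
    # before the isinstance(..., int) check can be trusted.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(
            f"max_concurrent_sessions must be an int, got {type(value).__name__}"
        )
    if value < 1:
        raise ValueError("max_concurrent_sessions must be >= 1")
    return value
```

Without the explicit bool check, `True` would pass validation and
silently configure a budget of one session.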
-## 2026-05-03 11:55 UTC — feat(pool): plumb `isolation` parameter (no behavior yet)
-Files: src/openrtc/core/pool.py (+ Literal import; new module-level
-       IsolationMode = Literal["coroutine", "process"]; new isolation
-       kwarg on AgentPool.__init__ defaulting to "coroutine";
-       validation that rejects unknown values; new `isolation`
-       property; __all__ extended with IsolationMode),
-       tests/test_pool.py (3 new tests: default is coroutine,
-       process accepted, unknown raises ValueError).
-Tests: 133/133 pass (3 new). ruff: clean. mypy: clean.
-Notes: Pure plumbing per the TODO. The setting is stored and
-exposed via `pool.isolation` but nothing in the runtime branches
-on it yet — that arrives when CoroutinePool lands. Default flips
-the v0.0.x behavior (process) to v0.1's coroutine, matching design
-§5.4. Public surface intentionally NOT extended in __init__.py
-since users only pass strings; the IsolationMode type alias is
-available via `from openrtc.core.pool import IsolationMode` for
-type-aware callers but not promoted to the package level.
-
-## 2026-05-03 11:42 UTC — docs: capture AgentServer integration points
-Files: docs/design/agent-server-integration.md (new, ~150 LOC).
-Tests: not run (docs-only).
-Notes: Read worker.py (1435 LOC) and grepped every _proc_pool.X
-access. Captured:
-  - the construction site (line 587, inside run() under self._lock);
-    importantly _proc_pool is NOT set in __init__, so a subclass
-    cannot swap it before run() executes,
-  - the 12 unique call sites (3 event listeners, start, 2
-    set_target_idle_processes calls, processes property, drain
-    loop, 3 launch_job sites including simulate_job and the live
-    dispatch path, aclose, get_by_job_id),
-  - the lifecycle ordering inside run(), drain(timeout), and
-    aclose(),
-  - how _update_job_status maps our JobStatus enum to the WS
-    UpdateJobStatus message,
-  - three swap strategies (module-level class substitution,
-    AgentServer subclass with run() override, hybrid). Picked
-    strategy A for the first prototype: monkey-patch
-    livekit.agents.ipc.proc_pool.ProcPool to our CoroutinePool
-    before AgentServer.run() executes. Smallest diff, matches the
-    "contained to one file" goal in design §6.4.
-Closes the 3-doc reading group; implementation work starts next.
-
-## 2026-05-03 11:25 UTC — docs: capture ProcPool surface AgentServer uses
-Files: docs/design/proc-pool-surface.md (new, ~120 LOC).
-Tests: not run (docs-only).
-Notes: Read the full proc_pool.py (256 LOC) and grepped
-worker.py for every _proc_pool.X access. Documented:
-  - the verbatim ProcPool(__init__ ...) keyword shape AgentServer
-    uses at worker.py:587-601 (so CoroutinePool can swap in),
-  - per-arg coroutine-mode treatment (which kwargs become no-ops),
-  - the 6 methods AgentServer actually calls (start, aclose,
-    launch_job, set_target_idle_processes, processes,
-    get_by_job_id) plus the .running_job iteration pattern,
-  - the 5 EventTypes; only 3 have live worker.py subscribers today
-    (process_started, process_closed, process_job_launched) but
-    we'll emit all 5 for forward compatibility,
-  - lifecycle invariants (idempotent start/aclose, MAX_ATTEMPTS=3
-    retry in launch_job, target_idle_processes math), and
-  - the consequences for our CoroutinePool (singleton JobProcess,
-    one setup_fnc invocation, event ordering).
-Complements docs/design/job-executor-protocol.md from the previous
-iteration; the two together form the contract for the upcoming
-implementation work.
-
-## 2026-05-03 11:08 UTC — docs: capture JobExecutor Protocol surface
-Files: docs/design/job-executor-protocol.md (new, ~120 LOC).
-Tests: not run (docs-only).
-Notes: Read
-.venv/lib/python3.13/site-packages/livekit/agents/ipc/job_executor.py
-(45 LOC) at the pinned 1.5.0 release, plus its proc_pool.py
-neighbor (256 LOC), and wrote a contract reference for our
-upcoming CoroutineJobExecutor + CoroutinePool. Captures: the
-verbatim Protocol body, a method-by-method contract table, the
-RunningJobInfo dataclass shape that launch_job receives, and the
-ProcPool surface AgentServer expects (so CoroutinePool can be a
-drop-in replacement). Includes implementation notes (event names
-to emit, JobStatus mapping for cancellation, running_job
-semantics).
-
-## 2026-05-03 10:55 UTC — chore: pin livekit-agents~=1.5 (Phase 1 task 1)
-Files: pyproject.toml (~=1.4 -> ~=1.5 on the
-       livekit-agents[openai,silero,turn-detector] dependency),
-       uv.lock (refreshed via `uv lock`; livekit-agents stays
-       resolved at 1.5.0, the version we already had installed).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Per docs/design/v0.1.md §9.1 we are about to subclass and
-patch internal-ish parts of livekit-agents (_proc_pool field and
-the JobExecutor Protocol), so the floor needs to match the version
-we are actually building against. ~=1.5 still allows the 1.5.x
-patch line and any future 1.6+ minors up to <2.0; the design also
-calls for a CI canary job (separate task) that runs against the
-latest livekit-agents release.
-
-## 2026-05-03 10:42 UTC — verify: full test suite + coverage gate (Phase 0 complete)
-Files: none changed (verification-only iteration).
-Tests: `uv run pytest --cov=openrtc --cov-report=term-missing
---cov-fail-under=80` -> 130/130 pass, total coverage 90.31% (CI
-gate 80%).
-Notes: Closes Phase 0. Per-module coverage highlights:
-  - core/: pool 92%, config 97%, discovery 98%, serialization 98%,
-    routing 75%, turn_handling 88%
-  - cli/: entry 100%, params 100%, types 100%, commands 93%,
-    livekit 86%, reporter 86%, dashboard 82%, __init__ 54% (the
-    dunder __getattr__ + missing-extra branch is intentionally
-    untested; needs an environment without typer/rich)
-  - observability/: snapshot 100%, stream 100%, metrics 84%
-  - tui/app 100%
-  - openrtc/__init__ 80% (the PackageNotFoundError fallback runs
-    only outside an installed environment)
-Phase 0 reorganization is finished: 11 file moves/extractions,
-3 verification gates all green. Phase 1 (coroutine pool prototype)
-starts next.
-
-## 2026-05-03 10:30 UTC — verify: openrtc dev / list / tui CLI still work
-Files: none changed (verification-only iteration).
-Tests: not re-run (covered last iteration). Smoke commands:
-  - `uv run openrtc --help`: top-level help renders; lists list,
-    start, dev, console, connect, download-files, tui.
-  - `uv run openrtc dev --help`: command resolves; OpenRTC option
-    panel renders (--agents-dir, --default-stt, etc.).
-  - `uv run openrtc tui --help`: command resolves; --watch option
-    documented with default openrtc-metrics.jsonl.
-  - `uv run openrtc list ./examples/agents
-       --default-stt openai/gpt-4o-mini-transcribe
-       --default-llm openai/gpt-4.1-mini
-       --default-tts openai/gpt-4o-mini-tts`: end-to-end success;
-    Rich table prints both example agents (dental, restaurant) with
-    their string providers.
-Notes: This is the same smoke check `make dev` runs. The `openrtc`
-console-script entrypoint resolves through the new `openrtc.cli`
-package and the renamed `openrtc.cli.commands` module (was
-`cli_app.py`); discovery still loads agents from
-`examples/agents/`.
-
-## 2026-05-03 10:18 UTC — verify: public surface still resolves after Phase 0
-Files: none changed (verification-only iteration).
-Tests: ran an explicit round-trip script (not committed) plus the
-       full suite (130/130 pass; ruff and mypy clean).
-Notes: Confirmed end-to-end after the Phase 0 reorganization:
-  - `from openrtc import AgentPool, AgentConfig,
-    AgentDiscoveryConfig, agent_config, ProviderValue,
-    __version__` resolves.
-  - The bound classes carry their canonical paths
-    (`openrtc.core.pool.AgentPool`,
-    `openrtc.core.config.AgentConfig`,
-    `openrtc.core.config.AgentDiscoveryConfig`).
-  - `AgentPool().add(...)` constructs an AgentConfig and
-    list_agents()/get() round-trip.
-  - The `@agent_config(name=..., greeting=...)` decorator attaches
-    AgentDiscoveryConfig metadata under `__openrtc_agent_config__`.
-  - `ProviderValue` resolves to `str | object` (TypeAlias).
-The smoke script intentionally lives in /tmp because spawn-safety
-guard rejects __main__-scoped agent classes without source files;
-running via `python <file>` exercises the real path.
-
-## 2026-05-03 10:05 UTC — refactor: move tui_app.py into tui/ package
-Files: git mv src/openrtc/tui_app.py -> src/openrtc/tui/app.py
-       (via temporary tui_pkg_new/ to dodge the file-vs-directory
-       naming collision that bit the cli move),
-       new src/openrtc/tui/__init__.py (empty package marker),
-       src/openrtc/cli/commands.py (1 import: openrtc.tui_app
-       -> openrtc.tui.app),
-       tests/test_cli.py (3 import sites: 1 monkeypatch string,
-       1 inline `import openrtc.tui_app as tu`, 1 inline
-       `from openrtc.tui_app import MetricsTuiApp`),
-       tests/test_tui_app.py (replace_all rewrote 14 inline
-       `from openrtc.tui_app import ...` and 1
-       `import openrtc.tui_app as tu`),
-       README.md (project tree section), CLAUDE.md (sidecar mention).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Pure rename per Phase 0 refactor rules. No behavior change.
-Used `git mv` so blame is preserved on the moved module.
-
-## 2026-05-03 09:50 UTC — refactor: move CLI modules into a cli/ package
-Files: 7 git mv operations (via temporary cli_pkg_new/ to avoid the
-       cli.py / cli/ file-vs-directory naming collision):
-       cli.py -> cli/entry.py,
-       cli_app.py -> cli/commands.py (renamed from app.py — see notes),
-       cli_dashboard.py -> cli/dashboard.py,
-       cli_livekit.py -> cli/livekit.py,
-       cli_params.py -> cli/params.py,
-       cli_reporter.py -> cli/reporter.py,
-       cli_types.py -> cli/types.py.
-       New: cli/__init__.py with main re-export and an eager `app`
-       binding (with __getattr__ fallback when the [cli] extra is
-       absent).
-       Updated 4 internal cross-references inside cli/* files.
-       Updated 4 test files (test_cli.py: many monkeypatch + import
-       sites, test_cli_params.py: 1 import + docstring,
-       test_metrics_stream.py: 1 import). Updated 4 docs/config
-       references (docs/cli.md, README.md, CLAUDE.md,
-       CONTRIBUTING.md).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Deviation from the .agents/TODO.md target tree: cli_app.py
-became cli/commands.py rather than cli/app.py. The TODO target
-tree gives both `cli/__init__.py` and `cli/app.py`, but Python
-treats `openrtc.cli.app` as both the submodule and the Typer
-attribute the package re-exports — `from openrtc.cli import app`
-returns the wrong thing depending on import order. Renaming the
-submodule file removes the collision and lets the Typer instance
-keep the natural `app` name. Behavior, public API, console-script
-entrypoint (`openrtc.cli:main` in pyproject.toml) all preserved.
-
-## 2026-05-03 09:20 UTC — refactor: extract observability/snapshot.py from metrics.py
-Files: src/openrtc/observability/snapshot.py (new, 80 LOC:
-       ProcessResidentSetInfo, SavingsEstimate, PoolRuntimeSnapshot
-       and its to_dict),
-       src/openrtc/observability/metrics.py (~75 LOC removed; added
-       a re-import of the snapshot trio to keep
-       openrtc.observability.metrics.PoolRuntimeSnapshot resolvable
-       for any external user that already imports it from there),
-       4 src import sites updated to the canonical
-       openrtc.observability.snapshot path (cli_dashboard.py,
-       core/pool.py, observability/stream.py — the latter previously
-       imported from metrics, now from snapshot directly),
-       5 tests rewired (conftest.py, test_cli.py,
-       test_metrics_stream.py, test_resources.py, test_tui_app.py).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Subtask 3 of 3 from the observability split. The split was
-not strictly required by tests (metrics.py still re-exports the
-snapshot types) but updating internal users to the canonical path
-matches the Phase 0 refactor rule "Update all imports in one pass
-per moved file." Public API unchanged.
-
-## 2026-05-03 09:05 UTC — refactor: rename metrics_stream.py to observability/stream.py
-Files: git mv src/openrtc/metrics_stream.py ->
-       src/openrtc/observability/stream.py,
-       5 src import sites (cli_types.py, cli_app.py, cli_reporter.py,
-       tui_app.py: import + module docstring),
-       2 test files (test_metrics_stream.py: 1 site,
-       test_tui_app.py: 2 sites).
-Tests: 130/130 pass. ruff: clean (auto-fixed 3 import-order issues
-in tui_app.py and the two test files). mypy: clean.
-Notes: Pure rename (subtask 2 of 3 from the observability split).
-Used `git mv` so blame is preserved. Public API unchanged.
-
-## 2026-05-03 08:55 UTC — refactor: rename resources.py to observability/metrics.py
-Files: src/openrtc/observability/__init__.py (new, empty),
-       git mv src/openrtc/resources.py ->
-       src/openrtc/observability/metrics.py,
-       3 src import sites (cli_dashboard.py, core/pool.py,
-       metrics_stream.py),
-       6 test sites (test_cli.py, test_metrics_stream.py: 2 places,
-       test_resources.py: 2 lines, test_tui_app.py, conftest.py).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: Pure rename (subtask 1 of 3 from the observability split).
-The dynamic import pattern in tests/test_metrics_stream.py:200
-needed an additional rewrite (`from openrtc import resources as
-resources_mod` -> `from openrtc.observability import metrics as
-resources_mod`) since simple substring replace missed the
-`from openrtc import resources` style. test_resources.py kept its
-`resources_module` local alias (just rebound to the new module).
-Public API unchanged.
-
-## 2026-05-03 08:40 UTC — chore: split observability extraction into three subtasks
-Files: .agents/TODO.md (one item replaced by three).
-Tests: not run (TODO-only edit).
-Notes: The TODO line "Create observability/ package. Rename
-resources.py → observability/metrics.py, metrics_stream.py →
-observability/stream.py. Extract PoolRuntimeSnapshot to
-observability/snapshot.py." bundled three operations (one rename,
-one rename, one extract+split) totaling ~600 LOC of file movement
-and ~12 import sites — too large for one iteration per PROMPT.md.
-Split into three sequential subtasks. Next iteration picks up the
-first one.
-
-## 2026-05-03 08:25 UTC — refactor: extract core/turn_handling.py from pool.py
-Files: src/openrtc/core/turn_handling.py (new, 161 LOC:
-       _DEPRECATED_TURN_HANDLING_KEYS, _build_session_kwargs,
-       _default_turn_handling, _default_turn_detection,
-       _supports_multilingual_turn_detection,
-       _extract_deprecated_turn_options,
-       _deprecated_turn_options_to_turn_handling,
-       _merge_turn_handling),
-       src/openrtc/core/pool.py (~140 LOC removed; added import
-       from .turn_handling; dropped now-unused `os` and `warnings`
-       imports).
-Tests: 130/130 pass. ruff: clean. mypy: clean.
-Notes: No tests needed updating. The existing patch site
-`monkeypatch.setattr("openrtc.core.pool._build_session_kwargs", ...)`
-in tests/test_pool.py:569 still works because pool.py imports the
-symbol at module level — the patch replaces pool.py's local binding,
-which is what `_run_universal_session` looks up at call time.
-Public API unchanged.
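
The patch-site reasoning in the turn_handling entry above (patching
the importing module's binding, not the defining module) can be
reproduced with two throwaway modules. `helpers`, `pool`, `build`,
and `run` are hypothetical stand-ins for turn_handling.py, pool.py,
`_build_session_kwargs`, and `_run_universal_session`.

```python
import types

# Defining module: owns the original function.
helpers = types.ModuleType("helpers")
exec("def build():\n    return 'real'\n", helpers.__dict__)

# Importing module: binds the name locally, as `from helpers import build`
# would, then looks it up in its own globals at call time.
pool = types.ModuleType("pool")
pool.build = helpers.build
exec("def run():\n    return build()\n", pool.__dict__)


def patched_via_pool() -> str:
    original = pool.build
    pool.build = lambda: "patched"
    try:
        return pool.run()  # sees the replaced local binding
    finally:
        pool.build = original


def patched_via_helpers() -> str:
    original = helpers.build
    helpers.build = lambda: "patched"
    try:
        return pool.run()  # still resolves pool's own, unpatched binding
    finally:
        helpers.build = original
```

Only the first helper changes what `run()` calls, which is why a
patch target naming the importing module keeps working after the
function itself moves to another file.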
diff --git a/.agents/PROMPT.md b/.agents/PROMPT.md
deleted file mode 100644
index e3c1718..0000000
--- a/.agents/PROMPT.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# OpenRTC-Python v0.1 — Implementation Agent (Ralph Loop)
-
-You are an autonomous engineering agent shipping **OpenRTC-Python v0.1**.
-You run inside the Anthropic `ralph-loop` plugin. Each time you try to
-exit, the Stop hook re-feeds your prompt. Treat each re-prompt as one
-Ralph iteration. Make exactly one focused unit of progress per iteration,
-then attempt to exit.
-
-The loop terminates when you output `<promise>OPENRTC_V01_COMPLETE</promise>`
-as your final message, OR `--max-iterations` is reached. **Never** emit
-the promise tag unless every condition under "Completion criteria"
-below is genuinely true. Do not lie to escape the loop.
-
-## Source of truth (read every iteration before doing anything else)
-
-1. `docs/design/v0.1.md` — the locked design spec. Read only the
-   sections relevant to your current task; do not skim the whole thing
-   every iteration.
-2. `AGENTS.md` — coding standards, naming, comment policy. Follow exactly.
-3. `.agents/TODO.md` — the task list. Pick the next unchecked task.
-4. `.agents/JOURNAL.md` — read the last 5 entries to understand state
-   without re-reading the codebase.
-
-## Your workflow (every iteration)
-
-1. **Orient.** Read this PROMPT.md, TODO.md, and the last 5 entries of
-   JOURNAL.md. Cross-reference the design doc section the task points to.
-2. **Pick.** Find the first unchecked task `[ ]` in TODO.md. If blocked
-   or unclear, read the design doc section it references. If still
-   unresolvable, mark `[?]` with a note in TODO.md and pick the next.
-3. **Do.** Execute that one task. Stay in scope — do not opportunistically
-   refactor adjacent code unless the task itself requires it.
-4. **Verify.** Run `make test` (or `uv run pytest`). For density-related
-   tasks, run the relevant benchmark. Run `make lint` and `make typecheck`.
-   Fix all errors before proceeding.
-5. **Update files:**
-   - Mark the task `[x]` in TODO.md.
-   - Append a JOURNAL.md entry (format below).
-   - If you discovered new work, add it to the "Discovered work"
-     section of TODO.md.
-6. **Commit.** One commit per task. Conventional commit format
-   (`feat:`, `fix:`, `refactor:`, `test:`, `docs:`, `chore:`).
-   Example: `feat(execution): add CoroutineJobExecutor skeleton`.
-   Do NOT add `Co-Authored-By: Claude` or `🤖 Generated with Claude Code`
-   trailers. The author identity comes from local `git config user.name`,
-   which is already correct.
-7. **Try to exit.** The Stop hook will re-feed this prompt for the next
-   iteration. Do not chain a second task — exit cleanly first.
-
-## Hard rules
-
-- **Never** modify `docs/design/v0.1.md` to make a task easier. The
-  design is locked. If a task is genuinely impossible, mark it `[?]`
-  in TODO.md, write a finding to JOURNAL.md, and pick the next task.
-- **Never** delete or rewrite tests to make them pass. Failing tests
-  are bugs in your code. The exception is intentionally updating tests
-  for a behavior change explicitly required by a task — say so in
-  JOURNAL.md.
-- **Never** introduce a new external dependency without an explicit
-  TODO.md task approving it.
-- **Never** push to main. Work in a feature branch named
-  `v0.1/<short-task-slug>`. Create a PR if one doesn't exist for the
-  current chunk of work.
-- **Never** run `git commit --no-verify` or otherwise bypass git hooks.
-- **Always** run `make lint` and `make typecheck` before committing.
-  No `# type: ignore` or `# noqa` without an inline comment explaining
-  why.
-- **Always** match existing code style. No introduced bullet comments,
-  no emoji in code, no AI-narration comments ("# This function does X").
-  Follow AGENTS.md.
-- **Always** preserve backward compatibility on `isolation="process"`.
-  Existing tests must continue to pass.
-
-## Scope reminders
-
-- **In scope:** changes called out in TODO.md.
-- **Out of scope (defer to v0.2+):** multi-participant rooms, GPU,
-  Rust/PyO3, replacing AgentServer, plugin marketplace.
-- If you find tempting refactors not in TODO.md, add them as `[ ]`
-  items in the "Discovered work" section and move on.
-
-## What "one task" means
-
-A task is something you can finish in one iteration — typically 30–90
-minutes of work, one logical unit, one commit. If a TODO item feels
-larger, your first action is to break it down into smaller items in
-TODO.md, commit that breakdown as
-`chore: split <task> into subtasks`, and exit. The next iteration
-picks up the first subtask.
-
-## JOURNAL.md entry format
-
-Terse and factual. No celebrations, no narration of feelings, no
-"successfully implemented" prose.
-
-    ## 2026-05-03 14:32 UTC — feat(execution): add CoroutineJobExecutor skeleton
-    Files: src/openrtc/execution/coroutine.py (new, 87 LOC),
-           tests/execution/test_coroutine_executor.py (new, 4 tests).
-    Tests: 128/128 pass. Coverage 81%.
-    Notes: Implements JobExecutor Protocol per
-    livekit/agents/ipc/job_executor.py:23. Status transitions
-    verified. launch_job deferred to next task — currently raises
-    NotImplementedError.
-
-## Completion criteria
-
-Output `<promise>OPENRTC_V01_COMPLETE</promise>` as your final message
-ONLY when **all** of the following are simultaneously true:
-
-1. Every task in `.agents/TODO.md` is marked `[x]` or `[~]`
-   (intentionally skipped with documented reason).
-2. `make test` exits 0 with all tests passing on Python 3.11, 3.12, 3.13.
-3. `make lint` exits 0 with zero warnings.
-4. `make typecheck` exits 0.
-5. The Phase 1 density benchmark in `docs/design/v0.1.md` §7 shows
-   ≥ 50 concurrent sessions at ≤ 4 GB peak RSS, no errors. Results
-   committed to `docs/benchmarks/density-v0.1.md`.
-6. All 12 acceptance criteria in `docs/design/v0.1.md` §8 are
-   demonstrably satisfied. Verify each one before emitting the promise.
-7. The integration test for crash isolation (criterion §8.5) passes:
-   one session raising `RuntimeError` does not affect 4 sibling
-   sessions in the same coroutine worker.
-8. `isolation="process"` regression: full v0.0.17 test suite still
-   passes when run against process mode.
-
-If any one of these is not true, you are not done. Pick the next task
-and continue. Do not emit the promise to escape the loop. Lying about
-completion will be detected when the user reviews the work, and is a
-direct violation of these instructions.
diff --git a/.agents/TODO.md b/.agents/TODO.md
deleted file mode 100644
index e647616..0000000
--- a/.agents/TODO.md
+++ /dev/null
@@ -1,678 +0,0 @@
-# OpenRTC-Python v0.1 — Task List
-
-Pick the **first** unchecked task. Tasks are roughly ordered by
-dependency. Do not skip ahead unless a task is blocked.
-
-Status legend: `[ ]` todo, `[x]` done, `[~]` skipped (note why),
-`[?]` blocked (note why).
-
----
-
-## Phase 0 — Repository structure refactor
-
-Current layout is flat (15 files at top level). Reorganize into
-domain-grouped packages before adding new code. This makes the
-coroutine work clean and gives the project headroom.
-
-Target layout (also documented in design §6.1):
-
-    src/openrtc/
-    ├── __init__.py
-    ├── py.typed
-    ├── types.py                  # was provider_types.py
-    ├── core/
-    │   ├── __init__.py
-    │   ├── pool.py               # AgentPool (slim)
-    │   ├── config.py             # AgentConfig, AgentDiscoveryConfig, @agent_config
-    │   ├── routing.py            # extracted from pool.py
-    │   ├── discovery.py          # extracted from pool.py
-    │   ├── serialization.py      # _ProviderRef logic
-    │   └── turn_handling.py      # deprecated kwargs translation
-    ├── execution/
-    │   ├── __init__.py
-    │   ├── coroutine.py          # NEW: CoroutinePool, CoroutineJobExecutor
-    │   ├── coroutine_server.py   # NEW: _CoroutineAgentServer
-    │   └── prewarm.py            # shared prewarm helpers
-    ├── observability/
-    │   ├── __init__.py
-    │   ├── metrics.py            # was resources.py
-    │   ├── stream.py             # was metrics_stream.py
-    │   └── snapshot.py           # PoolRuntimeSnapshot etc
-    ├── cli/
-    │   ├── __init__.py
-    │   ├── entry.py              # was cli.py (lazy entrypoint)
-    │   ├── app.py                # was cli_app.py
-    │   ├── dashboard.py          # was cli_dashboard.py
-    │   ├── livekit.py            # was cli_livekit.py
-    │   ├── params.py             # was cli_params.py
-    │   ├── reporter.py           # was cli_reporter.py
-    │   └── types.py              # was cli_types.py
-    └── tui/
-        ├── __init__.py
-        └── app.py                # was tui_app.py
-
-Refactor rules:
-- Use `git mv` to preserve blame.
-- Update all imports in one pass per moved file.
-- Re-export public symbols from `src/openrtc/__init__.py` so the
-  user-facing `from openrtc import AgentPool` still works.
-- After each move: run tests; commit before moving the next file.
-- Do NOT change behavior — pure file moves and import rewrites only.
-
-Tasks:
-- [x] Delete dead code: `_version.py`, `AgentPool._resolve_agent`,
-  `AgentPool._handle_session`, underscore-prefixed exports in
-  `cli_app.__all__`. Verify no external references.
-- [x] Rename `provider_types.py` → `types.py`.
-- [x] Create `core/` package. Move `pool.py` into it (no split yet).
-- [x] Extract `core/config.py` from `pool.py`: `AgentConfig`,
-  `AgentDiscoveryConfig`, `agent_config` decorator.
-- [x] Extract `core/routing.py` from `pool.py`: `_resolve_agent_config`
-  and routing helpers (currently `pool.py:781-853`).
-- [x] Extract `core/discovery.py` from `pool.py`: `discover()`
-  module loading helpers (currently `pool.py:378-431`).
-- [x] Extract `core/serialization.py` from `pool.py`: `_ProviderRef`,
-  `_PROVIDER_REF_KEYS`, `_try_build_provider_ref`,
-  `__getstate__/__setstate__` helpers (currently `pool.py:573-646`).
-- [x] Extract `core/turn_handling.py` from `pool.py`: deprecated
-  kwargs translation logic (currently `pool.py:42-53, 649-778`).
-- [x] Create `observability/` package skeleton (empty
-  `__init__.py`) and rename `resources.py` →
-  `observability/metrics.py`. Update all import sites.
-- [x] Rename `metrics_stream.py` → `observability/stream.py`.
-  Update all import sites.
-- [x] Extract `PoolRuntimeSnapshot` (and the
-  `ProcessResidentSetInfo` / `SavingsEstimate` payload dataclasses
-  it embeds) from `observability/metrics.py` to
-  `observability/snapshot.py`. `metrics.py` imports the snapshot
-  types back in.
-- [x] Create `cli/` package. Move all `cli_*.py` files in, dropping
-  the `cli_` prefix. Update entrypoint references. (Note: `cli_app.py`
-  → `cli/commands.py`, not `cli/app.py`, because a submodule named
-  `app` would collide with the re-exported `app` Typer instance at
-  the package level. Documented in `cli/__init__.py`.)
-- [x] Create `tui/` package. Move `tui_app.py` to `tui/app.py`.
-- [x] Verify `from openrtc import AgentPool, AgentConfig,
-  AgentDiscoveryConfig, agent_config, ProviderValue` still works.
-- [x] Verify `openrtc dev`, `openrtc list`, `openrtc tui` still work.
-- [x] Verify all 124 tests still pass. (Suite has grown to 130
-  since the original count; full CI coverage gate also satisfied
-  at 90.31%, well above the 80% floor.)
-
----
-
-## Phase 1 — Coroutine pool prototype (Week 1)
-
-Goal: prove the density win. Stop and reassess if we can't hit 50
-sessions in 4 GB.
-
-Tasks:
-- [x] Pin `livekit-agents~=1.5` exactly in `pyproject.toml`.
-- [x] Read `livekit/agents/ipc/job_executor.py` at the pinned
-  version. Document the `JobExecutor` Protocol surface in
-  `docs/design/job-executor-protocol.md`.
-- [x] Read `livekit/agents/ipc/proc_pool.py`. Document the
-  `ProcPool` surface that `AgentServer` calls.
-- [x] Read `livekit/agents/worker.py`. Document where
-  `AgentServer` instantiates and uses `_proc_pool`.
-- [x] Add `isolation: Literal["coroutine", "process"]` parameter to
-  `AgentPool.__init__`, default `"coroutine"`. Thread through but
-  don't act on it yet — just plumbing.
-- [x] Add `max_concurrent_sessions: int = 50` parameter to
-  `AgentPool.__init__`. Plumbing only.
-- [x] Create `execution/coroutine.py`: skeleton classes
-  `CoroutineJobExecutor` and `CoroutinePool` satisfying the
-  `JobExecutor` Protocol but raising `NotImplementedError` in all
-  methods. Add basic unit tests verifying the Protocol shape.
-- [x] Implement `CoroutineJobExecutor.initialize()` and `aclose()`.
-- [x] Implement `CoroutineJobExecutor.launch_job(info)`: construct
-  `JobContext` referencing the shared `JobProcess` singleton;
-  schedule the entrypoint as `asyncio.Task`; wrap exceptions to
-  prevent escape. (Note: actual `JobContext` construction is
-  delegated to a `context_factory` callable injected at executor
-  construction time. The CoroutinePool will own the real factory
-  once it's wired up; tests inject stubs.)
-- [x] Implement `CoroutineJobExecutor.kill()` and status reporting.
-  (Note: `kill()` is NOT part of the upstream JobExecutor Protocol
-  at 1.5.0 — it is an OpenRTC-internal forceful escalation hook
-  beyond `aclose()`. Status reporting was already correct via the
-  property; the iteration verifies idle / in-flight / completed
-  semantics under kill.)
-- [x] Implement `CoroutinePool.start()`: invoke `setup_fnc` once,
-  populate the singleton `JobProcess.userdata` with shared models.
-- [x] Implement `CoroutinePool.launch_job()`: instantiate a
-  `CoroutineJobExecutor`, track it, return.
-- [x] Implement `CoroutinePool.current_load()`:
-  `len(active) / max_concurrent_sessions`. (Note: not part of the
-  upstream ProcPool surface; AgentPool will register the pool's
-  current_load as a custom load_fnc when the wiring lands.)
-- [x] Implement `CoroutinePool.aclose()`: drain — cancel all
-  executors, await them.
-- [x] Create `execution/coroutine_server.py`: `_CoroutineAgentServer`
-  subclass that swaps `_proc_pool` for our `CoroutinePool`.
-- [x] Wire `AgentPool` to choose between `AgentServer()` and
-  `_CoroutineAgentServer(...)` based on `isolation` parameter.
-- [x] First end-to-end smoke test: `AgentPool(isolation="coroutine")`
-  registers, accepts one simulated job, runs it to completion.
-- [x] Density benchmark script `tests/benchmarks/density.py`: spawn
-  50 simulated jobs concurrently in one worker; record peak RSS.
-- [x] Run density benchmark. Record results in
-  `docs/benchmarks/density-v0.1.md`.
-
-**Phase 1 success gate:** density benchmark shows ≥ 50 concurrent
-sessions at ≤ 4 GB RSS, no errors. If not met, add a
-"Phase 1 reassessment" section to TODO.md and stop.
-
----
-
-## Phase 2 — Productionize (Week 2)
-
-Tasks:
-- [x] Per-job error isolation test: a session raising
-  `RuntimeError` does not affect 4 sibling sessions.
-- [x] Implement worker supervisor: track consecutive session
-  failures; after N (default 5), call `aclose()` and exit non-zero.
-- [x] Implement graceful drain on SIGTERM: stop accepting jobs;
-  await in-flight to complete. (Pool primitive landed:
-  `CoroutinePool.drain()` + `CoroutineJobExecutor.join()`. The
-  SIGTERM handler shim that calls into them belongs at the CLI
-  layer and is implicit via `AgentServer.drain()` which already
-  awaits `proc.join()` on every executor — our executor's
-  `join` is now wired to satisfy that.)
-- [x] Add CLI flag `--isolation` to `cli/app.py` (default
-  `coroutine`). Add `--max-concurrent-sessions` (default 50).
-  Wire through `cli/params.py`. (Note: `cli_app.py` is now
-  `cli/commands.py` after the Phase 0 reorg; flags landed there.)
-- [x] Set up containerized LiveKit dev server for integration tests
-  in CI (`docker-compose.test.yml`).
-- [x] Write integration test: 5 concurrent real calls in one
-  coroutine worker, all complete with real STT/LLM/TTS.
-  Mark with `pytest.mark.integration`. (Skips when LiveKit dev
-  server unreachable OR `OPENAI_API_KEY` is unset; the
-  validation runs in CI environments with both available.)
-- [x] Verify `isolation="process"` mode behaves identically to
-  v0.0.17 (regression test against existing test suite).
-- [x] Backpressure test: with `max_concurrent_sessions=10`, the
-  11th job is rejected; LiveKit dispatch sees `load >= 1.0`.
-  (Note: backpressure in v0.1 is cooperative; the dispatcher
-  reads load_fnc and routes elsewhere — the pool itself does
-  not hard-reject. If the dispatcher races and sends one
-  anyway, the pool accepts it and the next load read tells the
-  dispatcher to back off harder. Documented in the test
-  module's docstring.)
-- [x] Drain test: SIGTERM with 3 in-flight sessions waits for
-  completion before worker exits. (Verified at the pool layer
-  the way a CLI signal handler would invoke it: drain task is
-  observably pending while sessions block, completes only after
-  release, and aclose() leaves no residual asyncio tasks on the
-  loop. Real subprocess + signal delivery is platform-specific
-  and outside the unit boundary.)
-- [x] Add CI canary job that runs `pytest -m integration` against
-  the latest `livekit-agents` release (allowed to fail;
-  informational).
-- [x] Add CI density benchmark job; fail if peak RSS > 4 GB.
-- [x] Update `README.md`: add isolation modes section, density
-  benchmark table, when-to-use-which guidance.
-- [x] Update `docs/concepts/architecture.md` with coroutine-mode
-  lifecycle.
-- [x] Add migration note to `docs/changelog.md` for v0.1.0 entry,
-  flagging the default behavior change (process → coroutine).
-- [x] Bump version to `0.1.0` in `pyproject.toml`. (The version is
-  hatch-vcs-derived from git tags; the literal "bump" is the
-  `fallback_version = "0.1.0.dev0"` raw-option for dev checkouts
-  without a reachable tag, kept in sync with the
-  `__init__.py` PackageNotFoundError fallback. The actual
-  `0.1.0` version comes from tagging `v0.1.0` — handled in the
-  next task.)
-- [?] Tag `v0.1.0` and verify PyPI publish workflow succeeds.
-  Blocked on operator: tagging + pushing + creating a GitHub
-  release that triggers the publish.yml PyPI workflow requires
-  human credentials and intent (PyPI token + release notes).
-  All preparation is complete:
-  - changelog migration note staged in [Unreleased]
-    (docs/changelog.md);
-  - hatch-vcs fallback set to 0.1.0.dev0 (pyproject.toml +
-    src/openrtc/__init__.py); a `v0.1.0` git tag will yield
-    exactly `0.1.0` from hatch-vcs;
-  - publish.yml triggers on release and auto-prepends the
-    versioned section to docs/changelog.md (see workflow);
-  - all other §8 acceptance criteria are discharged in the
-    test suite + benchmarks + docs.
-  Operator runbook: cherry-pick / merge feat/light-websocket
-  into main, then `git tag v0.1.0 && git push --tags`, then
-  open a GitHub release on the tag pasting the relevant body
-  from the [Unreleased] block in docs/changelog.md.
-
-**Phase 2 success gate:** all 12 acceptance criteria in
-`docs/design/v0.1.md` §8 pass.
-
----
-
-## Discovered work
-
-- [x] Add `actionlint` pre-commit hook (rhysd/actionlint v1.7.7)
-  to validate GitHub Actions workflow YAML syntax + semantics
-  (action inputs/outputs, expressions, shell-script `run:`
-  bodies via shellcheck, security-relevant patterns). Catches
-  workflow syntax errors at commit time instead of "the
-  workflow runs once on `push` and then fails for some opaque
-  reason." All 8 existing workflows pass on first run.
-- [x] Add `codespell` pre-commit hook to catch
-  typos in source, docs, and journal entries. Pinned at
-  v2.4.2. Skip-list excludes auto-generated lockfiles
-  (`*.lock`, `package-lock.json`) and binary asset, build-output,
-  and cache directories (`assets`, `htmlcov`, `dist`, `build`,
-  `.mypy_cache`, `.ruff_cache`); `--ignore-words-list=ist`
-  whitelists "IST" (Indian Standard Time, used in cron
-  schedules + journal). Hook runs on every pre-commit
-  pass; it doesn't need a CI counterpart since the pre-commit.ci
-  bot also picks it up automatically.
-- [x] Add `.github/workflows/audit.yml` to run `pip-audit
-  --strict` on every PR + weekly. Catches CVEs in production
-  + dev deps. Two triggers cover the two failure modes:
-  per-PR catches a contributor pulling in a dep with a known
-  CVE before merge; the Monday cron catches CVEs disclosed
-  *after* a clean merge (the most common failure mode).
-  `--strict` so the audit fails when dependency collection fails
-  for any dependency instead of skipping it; the alternative is
-  silent rot. Verified locally:
-  `pip-audit` reports "No known vulnerabilities found"
-  against the current dev environment.
-- [x] Extend `.github/workflows/build.yml` with a wheel
-  smoke-install step. After `uv build` and `twine check`,
-  install the produced wheel into a clean venv and assert
-  `import openrtc; openrtc.AgentPool / openrtc.agent_config`
-  resolve and `__version__` is a string. `twine check`
-  validates metadata only; this validates the runtime file
-  layout — catches "wheel built but missed a package" /
-  "module-load-time import broke" classes of bug. Tried
-  `--no-deps` first to avoid pulling livekit-agents over
-  the network; doesn't work because `openrtc/__init__.py`
-  imports `Agent` from `livekit.agents` at load time, so
-  a clean install of `openrtc` cannot succeed without its
-  runtime deps. Verified locally: install + import + version
-  print all succeed in a throwaway uv venv.
-- [x] Add `.github/workflows/build.yml` build-sanity CI step.
-  Runs `uv build` (wheel + sdist) and `twine check dist/*`
-  on every PR + push to main. Uploads the artifacts so a
-  reviewer can sanity-check the wheel contents without having
-  to build locally. publish.yml already builds at release time;
-  the new workflow catches packaging regressions (broken
-  pyproject.toml, missing files, malformed metadata) at
-  code-review time, before they can fail the publish workflow
-  with a half-tagged release. Verified locally: `uv build`
-  produces a 0.1.0.dev wheel + sdist (hatch-vcs-derived
-  version), `twine check` passes both.
-- [x] Document the developer-experience improvements landed
-  across this loop in the v0.1.0 changelog block under a new
-  "Developer experience" subsection. Lists the coverage
-  ratchet (100% combined; 99% gate; branch tracking on);
-  mypy strict enablement; expanded ruff selects (SIM/PT/RET/
-  PERF/PIE/ICN/TID/BLE/A); pre-commit mypy hook; `make ci`
-  aggregate target; Dependabot config; PR template;
-  .editorconfig; SECURITY.md. Prefixed with a "User-facing
-  behavior is unchanged by these" caveat so readers know
-  these don't affect runtime semantics.
-- [x] Add `.editorconfig` so file-level conventions (charset,
-  EOL, final newline, trailing whitespace, indent) stay
-  consistent regardless of the contributor's editor /
-  IDE config. Settings match what's already in the repo:
-  Python + TOML use 4-space indent (PEP 8 / ruff default
-  + existing pyproject.toml style); YAML / JSON / Markdown
-  / shell use 2-space; Makefile uses literal tabs (required
-  by make).
-- [x] Add `.github/PULL_REQUEST_TEMPLATE.md`. GitHub auto-populates
-  the PR description with this template; the checklist nudges
-  contributors to confirm `make ci` passes, that tests are
-  updated, that docs/changelog reflect public-surface changes,
-  and to pick a "type of change" classifier so the reviewer
-  knows what shape of review to apply (bug fix / breaking /
-  refactor / docs / CI). Short on purpose - not a checklist
-  bureaucracy, just the four things that catch the most common
-  PR-rejection reasons.
-- [x] Refresh CONTRIBUTING.md to reflect the v0.1 dev-workflow
-  improvements landed across this loop. New sections:
-  - Mention that `mypy` runs in `strict = true` mode (so
-    contributors know untyped defs / implicit Optional are
-    hard failures, not warnings).
-  - Document the `make ci` aggregate target as the one-shot
-    "did I break the PR?" command.
-  - Document the pre-commit setup (`uv run pre-commit install`),
-    explain the hooks (ruff + ruff-format + file hygiene +
-    `mypy --strict src/`), and call out the `files:` filter
-    that skips typecheck on tests/docs-only commits.
-- [x] Add `SECURITY.md` so vulnerability reports have a documented
-  intake path (GitHub Security Advisories preferred, email
-  fallback to `hello@mahimai.dev`). Includes the supported-versions
-  matrix (0.1.x latest patch only — 0.0.x is superseded), the
-  expected response timeline (acknowledge in 3 business days,
-  triage in 7), and an out-of-scope section steering upstream
-  livekit-agents reports to the right place. GitHub auto-surfaces
-  this file in the repo's Security tab and overview sidebar.
-- [x] Add `.github/dependabot.yml` for weekly Python +
-  GitHub-Actions dep updates. Two ecosystems pinned (pip via
-  pyproject.toml; github-actions for the workflow files). Bundles
-  dev-tooling bumps (ruff/mypy/pytest/pre-commit/typer/rich) so
-  a typical week is one PR not many. `livekit-agents` is
-  explicitly ignored — the `~=1.5` pin is deliberate (design
-  §9.1: we hook internal-ish surfaces and the canary job
-  watches the next minor for early warning). Schedule is
-  Monday 08:00 IST so PRs land at the start of the work week.
-- [x] Add a `make ci` aggregate target that runs every gate the
-  CI workflow runs in the same order: `lint`, `format-check`,
-  `typecheck`, `test` (with the 99% coverage gate). One command
-  for "did I break the PR?" Saves running four separate make
-  targets every time before pushing.
-- [x] Add a local pre-commit hook that runs `mypy --strict src/`
-  before every commit. The CI matrix already runs typecheck on
-  every PR, but contributors didn't get the same feedback
-  locally — now `git commit` blocks on type errors the same way
-  it blocks on ruff/format errors. The hook uses
-  `language: system` so it picks up the current `uv run mypy`
-  environment, and `pass_filenames: false` because mypy needs
-  the full source tree (per-file mypy can't resolve cross-module
-  types). Trigger restricted via `files:` to source/.toml
-  changes so commits that only touch tests or docs don't pay
-  the typecheck cost.
-- [x] Enable ruff's `BLE` (flake8-blind-except) and `A` (flake8-builtins)
-  rulesets. 3 issues, all intentional, fixed with
-  inline noqa + explanation:
-  - `execution/coroutine.py:203` — `aclose`'s defensive
-    `except Exception:` swallow now mirrors `join`'s
-    `# noqa: BLE001 — wrapper has already set FAILED + logged`
-    annotation that was already there.
-  - `tests/test_pool.py:872, 873` — `globals` / `locals`
-    parameter names in the `_import_without_silero` stub
-    must match `__import__`'s real signature so the stub
-    forwards positionally; added `# noqa: A002 — must match
-    __import__ signature` on each line.
-- [x] Enable ruff's `RET`, `PERF`, `PIE`, `ICN`, and `TID` rulesets in
-  one batch (only 1 violation surfaced across all five). Removed
-  the redundant `return None` at the end of
-  `CoroutineJobExecutor.initialize` (RET501) — function returns
-  None implicitly, the explicit return read as more code than
-  it was. The other 4 rulesets came in clean and now lock down
-  performance anti-patterns (PERF), style cleanups (PIE),
-  import-name conventions (ICN), and import banishments (TID).
-- [x] Enable ruff's `PT` (flake8-pytest-style) ruleset. Fixed
-  the 7 reported issues:
-  - PT022 in tests/integration/conftest.py: the
-    `livekit_dev_server` fixture had no teardown; switched
-    `yield` -> `return` and dropped the `Iterator[...]`
-    return annotation.
-  - PT011 (tests/test_coroutine_server.py:62): `pytest.raises(Exception)`
-    was deliberately broad; added `match=".*"` and `# noqa: PT011`
-    so the intent is documented inline.
-  - PT011 (tests/test_pool.py:183): `pytest.raises(ValueError)`
-    around `pool.add` duplicate name; added the proper
-    `match="already registered"`.
-  - PT018 in 4 places (tests/test_coroutine_skeleton.py): split
-    composite asserts (`assert isinstance(x, str) and len(x) > 0`,
-    `assert task is not None and task.done()`) into separate
-    statements so failure messages pinpoint which clause broke.
-- [x] Enable ruff's `SIM` (flake8-simplify) ruleset. Replaced
-  3 `try/except/pass` blocks with `contextlib.suppress(...)`
-  in tests/benchmarks/density.py,
-  tests/integration/test_concurrent_real_calls.py, and
-  tests/test_coroutine_coverage.py. Ignored `SIM117`
-  (nested `with` collapsing) because it consistently hurts
-  readability for monkey-patch + `pilot` setups in the test
-  suite; documented the ignore inline.
-- [x] Enable mypy `strict = true` for the source tree. Fixed
-  the only two issues that surfaced:
-  `src/openrtc/core/pool.py:73` (`AgentSession` -> `AgentSession[None]`
-  to satisfy the generic Userdata_T parameter) and
-  `src/openrtc/cli/commands.py:175`
-  (`_make_standard_livekit_worker_handler` now declares
-  `-> Callable[..., None]`). Strict mode bundles ~10 additional
-  checks (disallow_untyped_defs, no_implicit_optional,
-  strict_equality, disallow_any_generics, etc.) so future
-  contributions can't silently regress type safety.
-- [x] Ratchet the v0.1 coverage gate from 95% to 99% (was bumped
-  from 80% to 95% earlier in this loop; now that line + branch
-  is at 100.00% the floor moves up again). The 1pp cushion is
-  intentional: branch coverage adds many edges per function, so
-  even a small new helper written with full coverage in mind can
-  push the combined % below 100%. Bumped in three places: Makefile,
-  test.yml CI matrix, codecov.yml (project + patch). Codecov
-  range nudged from `85...100` to `90...100`.
-- [x] Close the 22 missing branches surfaced once
-  `[tool.coverage.run] branch = true` landed.
-  **All 22 closed across 5 batches; project sits at 100.00%
-  combined line + branch coverage.**
-  **Batch 1 closed (8 branches):** cli/commands.py 351->354;
-  cli/dashboard.py 240->249, 257->284; cli/livekit.py 74->76;
-  core/pool.py 430->432; core/routing.py 36->46, 56->67;
-  core/turn_handling.py 69->71. (99.06% -> 99.40%)
-  **Batch 2 closed (4 branches):** cli/reporter.py 97->99
-  (live=None periodic tick); observability/stream.py 137->exit
-  (close on never-opened sink); observability/metrics.py
-  364->361 (VmRSS line with no value); core/discovery.py
-  24->27 (existing module file differs from resolved path).
-  (99.40% -> 99.57%)
-  **Batch 3 closed (6 branches):** execution/coroutine.py
-  231->233 (kill on non-RUNNING preserves status);
-  279->293 (success path skips status flip when externally
-  set); 286->288 (exception path skips same flip); 528->526
-  (aclose timeout skips executors without kill method);
-  571->578 (launch_job emits process_job_launched even when
-  executor sets no _task); 679->exit (failure-limit branch
-  tolerates None callback). (99.57% -> 99.83%)
-  **Batch 4 closed (3 branches):** tui/app.py 149->154
-  (wall_time_unix missing maps to "n/a"); 125->117 (record
-  with unknown `kind` skipped via parser monkeypatch);
-  127->117 (EVENT record with non-dict payload skipped via
-  parser monkeypatch). (99.83% -> 99.96%)
-  **Batch 5 closed (1 branch):** cli/__init__.py 32->36
-  (eager `from openrtc.cli.commands import app` skipped when
-  `_optional_typer_rich_missing()` returns True — exercised by
-  monkey-patching the helper and `importlib.reload(cli_pkg)`,
-  with cleanup that restores the helper and reloads again).
-  (99.96% -> 100.00%)
-
-## Old discovered work
-
-(Add new tasks here as they come up. Keep this section ordered by
-priority.)
-
-- [x] Document `--isolation` and `--max-concurrent-sessions` in
-  `docs/cli.md`. (Found while auditing §8.9 for completeness:
-  the flags shipped in `cli/commands.py`, the README, and the
-  test suite, but the standalone CLI doc page didn't mention
-  them. v0.1 release-blocker for §8.9.)
-- [x] Sweep current docs for stale module paths after the Phase 0
-  reorg. (Audit found one residual reference to
-  `openrtc.resources` in `docs/cli.md`, updated to
-  `openrtc.observability.metrics`. The remaining references
-  live in `docs/design/v0.1.md` (locked) and the historical
-  audit doc, both correctly preserved.)
-- [x] Refresh GitHub bug report template for v0.1: bump stale
-  version placeholders (0.0.15 -> 0.1.0; 1.4.3 -> 1.5.0) and
-  add an "Isolation mode" dropdown so triage of v0.1 issues
-  can route by mode without a follow-up question.
-- [x] Write `docs/release-v0.1.md` operator runbook so the §8.12
-  tagging+publishing step (the only `[?]` blocker on v0.1)
-  has a literal step-by-step checklist. Linked from
-  CONTRIBUTING.md's new "Releasing" section.
-- [x] README "Public API at a glance" lists v0.1 constructor
-  kwargs (isolation, max_concurrent_sessions,
-  consecutive_failure_limit) and read-only properties.
-  (Section was written pre-v0.1 and only listed the v0.0.x
-  surface; users reading just the API summary would miss the
-  new knobs without digging into the "Isolation modes"
-  section above.)
-- [x] Add `make bench` target. (Existing Makefile had `test`,
-  `lint`, `format`, `typecheck`, `dev` but no shorthand for
-  the v0.1 density gate. `make bench` now runs
-  `tests/benchmarks/density.py --sessions 50 --rss-budget-mb
-  4096`, matching the CI gate exit-code contract.)
-- [x] VitePress sidebar links the new density benchmark page.
-  (Added `Density benchmark (v0.1)` entry under Reference so
-  users evaluating OpenRTC from the docs site find the v0.1
-  numbers without having to open the GitHub repo. The release
-  runbook intentionally stays repo-only — operator-facing,
-  not user-facing.)
-- [x] Replace the lone remaining `NotImplementedError` stub
-  with its real (no-op) implementation. (`CoroutineJobExecutor.start`
-  was the last "skeleton" raise; coroutine mode has no
-  subprocess to spawn so `start` flips `started=True` and
-  returns. Drops the `_SKELETON_HINT` constant entirely;
-  updates the test that asserted the raise to assert the
-  no-op state machine; updates the module docstring to drop
-  "lifecycle methods land one iteration at a time" prose.)
-- [x] Enable branch coverage as the v0.1 hardness gate. Adds
-  `[tool.coverage.run] branch = true` to pyproject.toml so
-  `make test` and the CI matrix both report combined
-  line+branch coverage by default. Combined % drops from
-  100% (line-only) to 99.06% (line+branch) - 22 missing
-  branches surface across 13 files (mostly "false case of a
-  conditional" edges). Still well above the 95% fail-under
-  floor. Leaves the per-branch gap-closing as discovered
-  work for follow-up iterations.
-- [x] Lock the v0.1 coverage ratchet at 95% (was 80%) across the
-  Makefile, test.yml CI workflow, and codecov.yml project +
-  patch targets. The current project sits at 100%, so 95% gives
-  contributors ~5pp of headroom for legitimate
-  `# pragma: no cover`-able defensive code without letting the
-  numbers slide back into v0.0.x territory. Codecov range
-  bumped from `70...100` to `85...100` so the colored bar
-  visually anchors at the new minimum.
-- [x] Close `execution/coroutine.py` coverage gap (97% -> 100%):
-  5 tests in tests/test_coroutine_coverage.py covering the
-  last defensive branches: `_consume_cancelled_task_exception`
-  swallowing `InvalidStateError` when called on a not-done
-  task (the post-`add_done_callback` race window);
-  `CoroutineJobExecutor.join` swallowing `CancelledError`
-  from a racing cancel of the in-flight task; same `join`
-  swallowing an `Exception` from a task that bypassed
-  `_run_entrypoint`; `aclose` swallowing a non-CancelledError
-  exception raised post-cancel (task that catches
-  CancelledError and re-raises something else); and
-  `_build_job_context` real-room branch when `info.fake_job=False`
-  (instantiates an actual `livekit.rtc.Room` — constructor is
-  side-effect-free, native libs only fire on `.connect()`).
-  Project-wide coverage now 100%.
-- [x] Close `core/discovery.py` coverage gap (98% -> 100%):
-  1 test in tests/test_discovery.py exercising the
-  `_load_module_from_path` defensive raise when
-  `importlib.util.spec_from_file_location` returns None
-  (monkey-patched). Covers the last "spec is None or
-  spec.loader is None" guard before the spec is used to
-  build the module object.
-- [x] Close `cli/__init__.py` (54% -> 100%) and `openrtc/__init__.py`
-  (80% -> 100%) coverage gaps. 4 tests in tests/test_cli.py:
-  the package-level `__getattr__("app")` raises ImportError
-  with the `openrtc[cli]` install hint when extras are missing,
-  returns the live Typer app via lazy import when extras are
-  present, and raises AttributeError for unknown attribute
-  names; `openrtc.__version__` reverts to the `0.1.0.dev0`
-  fallback sentinel when `importlib.metadata.version` raises
-  PackageNotFoundError (via importlib.reload). Locks the
-  install-hint contract and the dev-checkout version fallback
-  before tagging.
-- [x] Close `cli/dashboard.py` coverage gap (82% -> 100%):
-  11 tests in tests/test_dashboard.py covering: pure-helper
-  edges (`_format_percent` returning "—" for missing or
-  zero baseline, ratio-rounding; `_memory_style` for None /
-  green / yellow / red thresholds; `_truncate_cell` short
-  pass-through and ellipsis append); `print_list_rich_table`
-  `—` source-column for agents without source_path;
-  `print_list_plain` source_size append + Resource summary
-  trigger; `print_resource_summary_plain` known-path-caveat
-  branch + unavailable-RSS branch (via monkey-patched
-  `get_process_resident_set_info`); `print_resource_summary_rich`
-  unavailable-RSS branch. Locks the dashboard rendering
-  contract before tagging.
-- [x] Close `core/pool.py` coverage gap (93% -> 100%):
-  7 tests in tests/test_pool.py covering: empty/whitespace
-  agent name rejection in `add()`; `run()` raises when zero
-  agents are registered; `run()` hands the configured server
-  to LiveKit's `cli.run_app` (covers the success path on
-  the run() side); `_prewarm_worker` defends against an
-  empty runtime state; `_run_universal_session` raises
-  early when no agents are registered;
-  `_load_shared_runtime_dependencies` raises a clear
-  RuntimeError when livekit silero is missing (via
-  builtins.__import__ monkey-patch) AND happy-path
-  returns the silero module + MultilingualModel class
-  when the plugins are installed.
-- [x] Close `observability/metrics.py` coverage gap (84% -> 100%):
-  18 tests in tests/test_resources.py covering: negative
-  byte clamp in `format_byte_size`; `file_size_bytes`
-  OSError fallback; `estimate_shared_worker_savings`
-  short-circuits (agent_count=0 and shared_worker_bytes=None);
-  `get_process_resident_set_info` Linux + Windows-style
-  unavailable branches via monkey-patched `sys.platform`;
-  `_linux_rss_bytes` happy-path proc-status parsing,
-  unreadable-procfs OSError, and missing-VmRSS-line; the
-  `_macos_rss_bytes` OSError-from-getrusage and
-  zero-ru_maxrss branches; `record_session_finished`
-  keep-positive count; parametrized `__setstate__` type
-  validation across 6 typed fields. Also replaces an
-  unreachable defensive `return` in `format_byte_size`
-  with `raise AssertionError(...)  # pragma: no cover`
-  so the dead line stops eating coverage.
-- [x] Close `cli/livekit.py` coverage gap (86% -> 100%):
-  11 tests in tests/test_cli.py exercising the LiveKit CLI
-  handoff edges: `--` separator + `=`-form pass-through in
-  `_strip_openrtc_only_flags_for_livekit`; empty-argv +
-  unknown-subcommand short-circuits in
-  `inject_cli_positional_paths`; "flag already in tail"
-  no-op branches for all three positional rewriters
-  (agents-dir / worker / tui-watch); the
-  `_livekit_env_overrides` setter for the three non-URL
-  keys (api_key, api_secret, log_level); the connect
-  handoff with `--participant-identity` + `--log-level`;
-  `_discover_or_exit` for `NotADirectoryError` and
-  `PermissionError`. Locks the CLI handoff contract before
-  tagging.
-- [x] Close `cli/reporter.py` coverage gap (86% -> 100%):
-  2 tests in tests/test_metrics_stream.py exercising the
-  Rich-dashboard path that the existing JSONL-only tests
-  don't reach: a direct unit test of
-  `_build_dashboard_renderable` (returns a Rich Panel built
-  from the pool snapshot), and an integration test of the
-  `dashboard=True` branch through `_run` with a stub `Live`
-  monkeypatched into the reporter (covers the `live.update(...)`
-  periodic-tick branch and the JSON snapshot file write).
-- [x] Close `cli/commands.py` coverage gap (93% -> 100%):
-  4 tests in tests/test_cli.py exercising the programmatic
-  `main()` exit-code mapping: `argv=None` reads from sys.argv
-  (covers the sys.argv branch); bare `SystemExit()` returns 0;
-  string `SystemExit` code maps to 1; non-raising inner command
-  falls through to 0. Locks the exit-code contract that any
-  embedder of `openrtc.cli.main` relies on.
-- [x] Close `core/serialization.py` coverage gap (98% -> 100%):
-  5 tests in tests/test_serialization.py exercising
-  `_extract_provider_kwargs` (returns {} when `_opts` is None
-  or attribute is missing; extracts set options) and
-  `_filter_provider_kwargs` (drops the OpenAI `NotGiven`
-  sentinel; passes through explicit `None`). Locks the
-  spawn-safe serialization edge cases that the higher-level
-  pool tests don't exercise directly.
-- [x] Close `core/config.py` coverage gap (97% -> 100%):
-  6 tests in tests/test_config.py exercising
-  `_normalize_optional_name` validation through the public
-  `@agent_config` decorator (non-string name + greeting raise
-  RuntimeError "must be a string"; whitespace-only name +
-  greeting raise "cannot be empty"; whitespace stripping;
-  None passes through). Locks the user-facing input
-  validation in pure unit tests so a future refactor can't
-  silently relax the contract.
-- [x] Close `core/turn_handling.py` coverage gap (88% -> 100%):
-  16 focused unit tests in tests/test_turn_handling.py for the
-  per-key deprecated-kwarg translations
-  (`min_endpointing_delay`, `max_endpointing_delay`,
-  `allow_interruptions` true/false, `discard_audio_if_uninterruptible`,
-  `min_interruption_duration`, `min_interruption_words`,
-  `false_interruption_timeout`,
-  `agent_false_interruption_timeout`,
-  `resume_false_interruption`, `turn_detection`), the
-  `LIVEKIT_REMOTE_EOT_URL` / inference-executor branches in
-  `_supports_multilingual_turn_detection`, and the
-  non-Mapping `turn_handling` passthrough. Locks down the
-  v0.0.x compat surface before tagging.
-- [x] Close `core/routing.py` coverage gap (76% -> 100%):
-  empty-agents guard (line 25), room-metadata branch (line 33),
-  string-JSON metadata parse path (lines 56-67), blank/scalar/
-  empty-value mapping returns None (lines 60, 63, 77). All
-  pre-v0.1 code paths but reachable via real LiveKit metadata
-  (which arrives as JSON strings). Strengthens the §8.2
-  spirit ("≥80% coverage of new code") by also raising the
-  pre-existing routing surface to 100% before tagging.
diff --git a/.agents/skills/openrtc-python/SKILL.md b/.agents/skills/openrtc-python/SKILL.md
deleted file mode 100644
index bdd8a61..0000000
--- a/.agents/skills/openrtc-python/SKILL.md
+++ /dev/null
@@ -1,179 +0,0 @@
----
-name: openrtc-python
-description: >-
-  Write and wire up LiveKit voice agents using OpenRTC so that multiple agents
-  run inside a single shared worker process. Use when the user asks to create a
-  voice agent, add a new agent to an existing pool, configure STT/LLM/TTS
-  providers, set up agent routing, or run multiple LiveKit agents together with
-  OpenRTC.
-license: MIT
-compatibility: Requires Python 3.11+ and uv (or pip). Requires the openrtc package.
-metadata:
-  author: mahimailabs
-  version: "1.0"
----
-
-## Directory layout
-
-```
-project/
-├── agents/
-│   ├── restaurant.py      # one Agent subclass per file
-│   ├── dental.py
-│   └── support.py
-├── main.py                # AgentPool entrypoint
-├── pyproject.toml
-└── .env                   # LIVEKIT_URL, provider API keys
-```
-
-- One `Agent` subclass per file. `discover()` picks the first local subclass.
-- No `__init__.py` needed. Files starting with `_` are skipped.
-- Filename stem becomes the agent name unless `@agent_config(name=...)` overrides it.
-
-## Step 1 — Create an agent file
-
-```python
-# agents/restaurant.py
-from livekit.agents import Agent, RunContext, function_tool
-from openrtc import agent_config
-
-
-@agent_config(name="restaurant", greeting="Welcome to reservations.")
-class RestaurantAgent(Agent):
-    def __init__(self) -> None:
-        super().__init__(
-            instructions="You help callers book restaurant reservations."
-        )
-
-    @function_tool
-    async def check_availability(
-        self, context: RunContext, party_size: int, time: str
-    ) -> str:
-        """Check whether a table is available."""
-        return f"A table for {party_size} at {time} looks good."
-```
-
-`@agent_config(...)` is optional. All fields fall back to pool defaults when
-omitted. The decorator accepts: `name`, `stt`, `llm`, `tts`, `greeting`.
-
-## Step 2 — Create the entrypoint
-
-```python
-# main.py
-from pathlib import Path
-from dotenv import load_dotenv
-from openrtc import AgentPool
-
-load_dotenv()
-
-pool = AgentPool(
-    default_stt="deepgram/nova-3:multi",
-    default_llm="openai/gpt-4.1-mini",
-    default_tts="cartesia/sonic-3",
-)
-pool.discover(Path("./agents"))
-pool.run()
-```
-
-Use `discover()` for the standard flat-directory layout. For explicit control
-(subdirectories, conditional registration), use `pool.add()` instead — read
-[references/api.md](references/api.md) for the `add()` signature.
-
-For advanced provider config (custom parameters, non-default endpoints), pass
-provider objects instead of strings — read
-[references/providers.md](references/providers.md) when configuring non-default
-provider settings.
-
-## Step 3 — Set environment variables
-
-```bash
-LIVEKIT_URL=ws://localhost:7880
-LIVEKIT_API_KEY=devkey
-LIVEKIT_API_SECRET=secret
-```
-
-Add only the provider keys your agents use. Read
-[references/providers.md](references/providers.md) for the full mapping of
-provider names to environment variable names.
-
-## Step 4 — Validate and run
-
-```bash
-# Validate discovery works (no server needed)
-openrtc list --agents-dir ./agents \
-  --default-stt deepgram/nova-3:multi \
-  --default-llm openai/gpt-4.1-mini \
-  --default-tts cartesia/sonic-3
-```
-
-If an agent is missing from the output: check the file is in `agents/`, has
-exactly one `Agent` subclass at module scope, and the filename doesn't start
-with `_`. Fix and re-run `openrtc list` until all agents appear.
-
-```bash
-# Development mode (auto-reload) — set LIVEKIT_* env vars first
-openrtc dev ./agents
-
-# Production mode
-openrtc start ./agents
-
-# Same LiveKit subcommands as python agent.py: console, connect, download-files
-# openrtc console ./agents
-# openrtc connect ./agents --room my-room
-# openrtc list ./agents
-# openrtc download-files ./agents
-
-# Optional: JSON Lines metrics + sidecar TUI (pip install 'openrtc[cli,tui]')
-# openrtc dev ./agents ./openrtc-metrics.jsonl
-# openrtc tui
-# openrtc tui ./other-metrics.jsonl
-
-# Or run the entrypoint directly
-python main.py dev
-```
-
-## Routing
-
-When a call arrives, `AgentPool` resolves the agent in this priority:
-
-1. `ctx.job.metadata["agent"]`
-2. `ctx.job.metadata["demo"]`
-3. `ctx.room.metadata["agent"]`
-4. `ctx.room.metadata["demo"]`
-5. Room name prefix match (`restaurant-call-123` → `restaurant`)
-6. First registered agent (fallback)
-
-Unknown metadata names raise `ValueError` — no silent fallback.
-
-## Gotchas
-
-- **Agent classes must be defined at module scope.** Classes inside functions
-  cannot be pickled for spawned workers — you'll get a serialization error at
-  startup.
-- **`discover()` does not recurse into subdirectories.** It only scans `*.py`
-  in the given directory. For nested agent layouts, use `pool.add()`.
-- **`pool.run()` delegates to `livekit.agents.cli.run_app()`.** The first CLI
-  argument must be `dev` or `start` (e.g. `python main.py dev`). Without it,
-  the process exits immediately with a usage error.
-- **`openrtc dev|start|…` sets up discovery then calls the same LiveKit CLI.**
-  OpenRTC-only flags (`--agents-dir`, `--dashboard`, `--metrics-jsonl`, …) are
-  stripped from `sys.argv` before LiveKit parses arguments—do not expect LiveKit
-  to understand them.
-- **Provider objects must be pickleable.** OpenRTC has built-in serialization
-  for `livekit.plugins.openai` STT, TTS, and LLM. Other providers: use string
-  identifiers or ensure the object is natively pickleable.
-- **Direct kwargs win over `session_kwargs`.** When the same key appears in both
-  `session_kwargs={}` and as a direct keyword to `add()`, the direct keyword
-  takes precedence.
-- **`greeting` fires after `ctx.connect()`** via `session.generate_reply()`.
-  If `None`, no greeting is generated and the agent waits silently for the
-  caller to speak.
-
-## Adding a new agent — checklist
-
-- [ ] Create `agents/<name>.py` with one `Agent` subclass at module scope
-- [ ] Optionally add `@agent_config(name="...", greeting="...")` for overrides
-- [ ] Add `@function_tool` methods for any callable tools
-- [ ] Run `openrtc list --agents-dir ./agents` — verify the agent appears
-- [ ] If missing, check filename, class scope, and `_` prefix — fix and re-run
-- [ ] Test with `openrtc dev --agents-dir ./agents`
diff --git a/.agents/skills/openrtc-python/references/api.md b/.agents/skills/openrtc-python/references/api.md
deleted file mode 100644
index 63149ab..0000000
--- a/.agents/skills/openrtc-python/references/api.md
+++ /dev/null
@@ -1,87 +0,0 @@
-# OpenRTC API reference
-
-Read this when you need the exact signature for `pool.add()`, `pool.discover()`,
-session kwargs, or other `AgentPool` methods.
-
-## Imports
-
-```python
-from openrtc import AgentPool, AgentConfig, AgentDiscoveryConfig, agent_config
-```
-
-## `AgentPool(...)`
-
-```python
-AgentPool(
-    *,
-    default_stt: str | Any = None,
-    default_llm: str | Any = None,
-    default_tts: str | Any = None,
-    default_greeting: str | None = None,
-)
-```
-
-## `pool.add(...)`
-
-```python
-pool.add(
-    name: str,                                  # unique routing name
-    agent_cls: type[Agent],                     # Agent subclass (module scope)
-    *,
-    stt: str | Any = None,                      # overrides pool default
-    llm: str | Any = None,                      # overrides pool default
-    tts: str | Any = None,                      # overrides pool default
-    greeting: str | None = None,                # spoken after connect
-    session_kwargs: Mapping[str, Any] = None,   # extra AgentSession kwargs
-    **session_options: Any,                      # direct AgentSession kwargs (win over session_kwargs)
-) -> AgentConfig
-```
-
-Raises `ValueError` on duplicate/empty name. Raises `TypeError` if
-`agent_cls` is not an `Agent` subclass.
-
-## `pool.discover(...)`
-
-```python
-pool.discover(agents_dir: str | Path) -> list[AgentConfig]
-```
-
-Scans for `*.py` files (skips `__init__.py` and `_`-prefixed). Each file must
-define exactly one local `Agent` subclass. Reads `@agent_config(...)` metadata
-if present. Internally calls `pool.add()` for each discovered agent.
-
-## Other methods
-
-```python
-pool.list_agents() -> list[str]             # names in registration order
-pool.get(name: str) -> AgentConfig          # raises KeyError
-pool.remove(name: str) -> AgentConfig       # raises KeyError
-pool.run() -> None                          # starts LiveKit worker
-pool.server -> AgentServer                  # underlying server instance
-```
-
-## `@agent_config(...)`
-
-```python
-@agent_config(
-    *,
-    name: str | None = None,       # defaults to filename stem
-    stt: str | Any = None,
-    llm: str | Any = None,
-    tts: str | Any = None,
-    greeting: str | None = None,
-)
-```
-
-All fields optional. Omitted fields inherit pool defaults.
-
-## Session kwargs
-
-Common kwargs forwarded to `AgentSession(...)` via `session_kwargs` or direct
-keyword arguments to `add()`:
-
-| Key | Type | Purpose |
-|---|---|---|
-| `max_tool_steps` | `int` | Max tool-call rounds per turn |
-| `preemptive_generation` | `bool` | Start LLM before user finishes |
-| `turn_handling` | `dict \| object` | Turn detection / interruption config |
diff --git a/.agents/skills/openrtc-python/references/providers.md b/.agents/skills/openrtc-python/references/providers.md
deleted file mode 100644
index 76ba96f..0000000
--- a/.agents/skills/openrtc-python/references/providers.md
+++ /dev/null
@@ -1,65 +0,0 @@
-# Provider reference
-
-Read this when configuring non-default providers, using provider objects instead
-of strings, or looking up which environment variable a provider needs.
-
-## String format
-
-`provider/model` or `provider/model:variant`. Passed through to `livekit-agents`.
-
-### STT
-
-| String | Provider |
-|---|---|
-| `deepgram/nova-3` | Deepgram Nova 3 |
-| `deepgram/nova-3:multi` | Deepgram Nova 3 multilingual |
-| `assemblyai/...` | AssemblyAI |
-| `google/...` | Google Cloud STT |
-
-### LLM
-
-| String | Provider |
-|---|---|
-| `openai/gpt-4.1-mini` | OpenAI GPT-4.1 Mini |
-| `openai/gpt-4.1` | OpenAI GPT-4.1 |
-| `groq/llama-4-scout` | Groq Llama 4 Scout |
-| `anthropic/claude-sonnet-4-20250514` | Anthropic Claude Sonnet 4 |
-
-### TTS
-
-| String | Provider |
-|---|---|
-| `cartesia/sonic-3` | Cartesia Sonic 3 |
-| `elevenlabs/...` | ElevenLabs |
-| `openai/tts-1` | OpenAI TTS-1 |
-
-## Provider objects
-
-Use when you need custom parameters or non-default endpoints:
-
-```python
-from livekit.plugins import openai
-
-pool = AgentPool(
-    default_stt=openai.STT(model="gpt-4o-mini-transcribe"),
-    default_llm=openai.responses.LLM(model="gpt-4.1-mini"),
-    default_tts=openai.TTS(model="gpt-4o-mini-tts"),
-)
-```
-
-OpenRTC has built-in pickle support for `livekit.plugins.openai` STT, TTS, and
-LLM types. Other provider objects must be natively pickleable or use string
-identifiers instead.
-
-## Environment variables
-
-| Provider | Variable |
-|---|---|
-| LiveKit | `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET` |
-| Deepgram | `DEEPGRAM_API_KEY` |
-| OpenAI | `OPENAI_API_KEY` |
-| Cartesia | `CARTESIA_API_KEY` |
-| Groq | `GROQ_API_KEY` |
-| ElevenLabs | `ELEVENLABS_API_KEY` |
-| Anthropic | `ANTHROPIC_API_KEY` |
-| AssemblyAI | `ASSEMBLYAI_API_KEY` |
diff --git a/.github/workflows/deploy-docs.yml b/.github/workflows/deploy-docs.yml
index 997906c..dde749f 100644
--- a/.github/workflows/deploy-docs.yml
+++ b/.github/workflows/deploy-docs.yml
@@ -6,8 +6,6 @@ on:
       - main
     paths:
       - 'docs/**'
-      - 'package.json'
-      - 'package-lock.json'
       - '.github/workflows/deploy-docs.yml'
   workflow_dispatch:
 
@@ -26,6 +24,9 @@ jobs:
     environment:
       name: github-pages
       url: ${{ steps.deployment.outputs.page_url }}
+    defaults:
+      run:
+        working-directory: docs
 
     steps:
       - name: Check out repository
@@ -36,7 +37,7 @@ jobs:
         with:
           node-version: '20'
           cache: 'npm'
-          cache-dependency-path: package-lock.json
+          cache-dependency-path: docs/package-lock.json
 
       - name: Configure GitHub Pages
         uses: actions/configure-pages@v5
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index cb64e1f..37db280 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -4,8 +4,6 @@ on:
   pull_request:
     paths:
       - 'docs/**'
-      - 'package.json'
-      - 'package-lock.json'
       - '.github/workflows/docs.yml'
       - '.github/workflows/deploy-docs.yml'
   push:
@@ -13,14 +11,15 @@ on:
       - main
     paths:
       - 'docs/**'
-      - 'package.json'
-      - 'package-lock.json'
       - '.github/workflows/docs.yml'
       - '.github/workflows/deploy-docs.yml'
 
 jobs:
   build-docs:
     runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: docs
 
     steps:
       - name: Check out repository
@@ -35,7 +34,7 @@ jobs:
         uses: actions/cache@v4
         with:
           path: ~/.npm
-          key: ${{ runner.os }}-node-20-${{ hashFiles('package-lock.json') }}
+          key: ${{ runner.os }}-node-20-${{ hashFiles('docs/package-lock.json') }}
           restore-keys: |
             ${{ runner.os }}-node-20-
 
diff --git a/.gitignore b/.gitignore
index 623673c..dc5f782 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,7 @@
 __pycache__/
 *.py[codz]
 *$py.class
+.agents
 
 # C extensions
 *.so
@@ -224,3 +225,10 @@ src/openrtc/_version.py
 
 .DS_Store
 .cursor
+
+# VitePress build/cache output (docs/ ships package.json + scripts).
+docs/.vitepress/dist/
+docs/.vitepress/cache/
+
+# Throwaway uv venv used by the wheel-smoke-install CI step (build.yml).
+.smoke/
diff --git a/SECURITY.md b/SECURITY.md
deleted file mode 100644
index dbd0db6..0000000
--- a/SECURITY.md
+++ /dev/null
@@ -1,56 +0,0 @@
-# Security policy
-
-## Supported versions
-
-OpenRTC is in active 0.1.x development. Security fixes land on the latest
-0.1.x patch release; older minors do not receive backports.
-
-| Version | Supported |
-|---------|-----------|
-| 0.1.x   | Yes (latest patch) |
-| 0.0.x   | No (superseded by 0.1.0) |
-
-## Reporting a vulnerability
-
-Please **do not** open a public GitHub issue for security reports.
-
-Use one of:
-
-1. **GitHub Security Advisories** (preferred):
-   <https://github.com/mahimailabs/openrtc/security/advisories/new>.
-   Allows private discussion + a coordinated CVE if warranted.
-2. **Email** the maintainer at `hello@mahimai.dev` with the subject
-   prefix `[openrtc-security]`.
-
-Include:
-
-- A short description of the issue.
-- Reproduction steps or a minimal proof-of-concept.
-- Affected version(s) (`pip show openrtc`).
-- Your assessment of severity / impact (best guess is fine).
-
-## What to expect
-
-- Acknowledgement within **3 business days**.
-- A first triage assessment (severity, scope, fix plan) within
-  **7 business days**.
-- A patch release timeline communicated once the issue is reproduced.
-- Public disclosure (advisory + changelog entry) coordinated with the
-  reporter, typically after the patch release ships.
-
-This is a single-maintainer project. Response times are best-effort and
-may extend during travel or peak workload; high-severity reports
-(remote code execution, credential exfiltration, persistent
-denial-of-service) are prioritized.
-
-## Out of scope
-
-Issues that do not constitute a vulnerability in OpenRTC itself:
-
-- Issues in upstream `livekit-agents`, `livekit`, or any plugin
-  (report directly to the upstream project).
-- Misconfiguration in the operator's deployment (e.g. exposing LiveKit
-  API secrets in logs by adding `--log-level=DEBUG` in production).
-- Denial-of-service via deliberately exhausting `max_concurrent_sessions`
-  on a single worker (this is the documented backpressure mechanism;
-  use horizontal scaling).
diff --git a/docs/api/pool.md b/docs/api/pool.md
index d1a6d31..1174f96 100644
--- a/docs/api/pool.md
+++ b/docs/api/pool.md
@@ -83,6 +83,8 @@ Use `agent_config(...)` to attach discovery metadata to a standard LiveKit
 Create a pool that manages multiple LiveKit agents in one worker process.
 
 ```python
+from typing import Literal
+
 from livekit.plugins import openai
 
 pool = AgentPool(
@@ -90,19 +92,51 @@ pool = AgentPool(
     default_llm=openai.responses.LLM(model="gpt-4.1-mini"),
     default_tts=openai.TTS(model="gpt-4o-mini-tts"),
     default_greeting="Hello from OpenRTC.",
+    isolation="coroutine",
+    max_concurrent_sessions=50,
+    consecutive_failure_limit=5,
 )
 ```
 
 Constructor defaults are used when an agent registration or discovered agent
 module omits those values.
 
+### Constructor kwargs
+
+| Argument | Type | Default | Notes |
+| --- | --- | --- | --- |
+| `default_stt` / `default_llm` / `default_tts` | `ProviderValue \| None` | `None` | Provider used when `add()` / `discover()` omits one. |
+| `default_greeting` | `str \| None` | `None` | Greeting used when none is configured per agent. |
+| `isolation` | `Literal["coroutine", "process"]` | `"coroutine"` | Worker isolation mode. **`"coroutine"`** is the v0.1 default and runs every session as an `asyncio.Task` in one worker process; **`"process"`** preserves v0.0.17 behavior (one OS subprocess per session via `livekit-agents`'s `ProcPool`). See [Architecture → Coroutine-mode lifecycle](../concepts/architecture#coroutine-mode-lifecycle). |
+| `max_concurrent_sessions` | `int` | `50` | Coroutine-mode backpressure threshold. The worker reports `current_load >= 1.0` to LiveKit dispatch once this many sessions are in flight, so new jobs route elsewhere. Must be `>= 1`. Ignored in process mode (livekit-agents' own load math applies there). |
+| `consecutive_failure_limit` | `int` | `5` | After this many non-`SUCCESS` session terminations in a row, the coroutine pool's supervisor schedules `aclose()` so the deployment platform restarts the worker (bounded blast radius for systemic bugs). Must be `>= 1`. Ignored in process mode. |
+
+### Read-only properties
+
+| Property | Returns | Notes |
+| --- | --- | --- |
+| `pool.isolation` | `Literal["coroutine", "process"]` | The configured isolation mode (set in the constructor). |
+| `pool.max_concurrent_sessions` | `int` | The configured backpressure threshold. |
+| `pool.consecutive_failure_limit` | `int` | The configured supervisor threshold. |
+
+### Migration note
+
+Existing code that does `pool = AgentPool()` keeps working but now runs every
+session in coroutine mode by default. Pass `isolation="process"` to stay on the
+v0.0.17 process-per-session model. The full migration block lives in the v0.1.0
+section of [the changelog](../changelog).
+
 ## `server`
 
 ```python
 server = pool.server
 ```
 
-Returns the underlying LiveKit `AgentServer` instance.
+Returns the underlying LiveKit `AgentServer` instance. Under
+`isolation="coroutine"`, this is an internal `_CoroutineAgentServer` subclass
+that swaps `livekit.agents.ipc.proc_pool.ProcPool` for an `openrtc.execution.coroutine.CoroutinePool`
+during `run()`. Under `isolation="process"`, this is the vanilla
+`livekit.agents.AgentServer`.
 
 ## `add()`
 
@@ -213,11 +247,17 @@ Removes and returns a registered `AgentConfig`.
 pool.run()
 ```
 
-Starts the LiveKit worker application.
+Starts the LiveKit worker application by handing the configured
+[`server`](#server) to `livekit.agents.cli.run_app`. Under
+`isolation="coroutine"` (the v0.1 default), the worker hosts every session as
+an `asyncio.Task` inside one process; under `isolation="process"` it spawns
+one OS subprocess per session, matching v0.0.17 behavior.
 
 ### Raises
 
-- `RuntimeError` if called before any agents are registered
+- `RuntimeError` if called before any agents are registered. The same guard
+  fires inside `_prewarm_worker` if a worker process spawns with an empty
+  registry.
 
 ## `runtime_snapshot()`
 
@@ -276,3 +316,12 @@ pool = AgentPool(
 pool.discover(Path("./agents"))
 pool.run()
 ```
+
+## See also
+
+- [Architecture → Coroutine-mode lifecycle](../concepts/architecture#coroutine-mode-lifecycle)
+  for the per-session task lifecycle, supervisor, drain, and
+  `current_load` semantics.
+- [Density benchmark (v0.1)](../benchmarks/density-v0.1) for the
+  ≥50-sessions-per-worker numbers backing the default `max_concurrent_sessions`.
+- [Changelog](../changelog) for the v0.1.0 migration block.
diff --git a/docs/getting-started.md b/docs/getting-started.md
index a35f0f2..4d22f94 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -3,7 +3,7 @@
 ## Requirements
 
 OpenRTC requires Python **`>=3.11,<3.14`** and depends on
-`livekit-agents[openai,silero,turn-detector]~=1.4`. **3.10 is not supported**
+`livekit-agents[openai,silero,turn-detector]~=1.5`. **3.10 is not supported**
 (LiveKit’s Silero / turn-detector stack pulls `onnxruntime`, which does not ship
 wheels for CPython 3.10 in current releases). See the repository’s
 `CONTRIBUTING.md` for `uv` workflows.
diff --git a/docs/index.md b/docs/index.md
index 772fd2f..510e453 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -18,12 +18,14 @@ hero:
       link: /cli
 
 features:
+  - title: Coroutine-mode worker (v0.1)
+    details: Host 50+ concurrent sessions per process as asyncio tasks instead of paying one subprocess per session. Cooperative backpressure routed back to LiveKit dispatch via current_load.
   - title: Multi-agent routing
     details: Dispatch the right Agent implementation from a single worker using room or job metadata.
   - title: Shared prewarm
     details: Load VAD, turn detection, and other heavy dependencies once for every session in the pool.
   - title: LiveKit-native runtime
-    details: Built on livekit-agents with familiar dev, start, console, and connect-style workflows.
+    details: Built on livekit-agents with familiar dev, start, console, and connect-style workflows. Drop into `isolation="process"` for v0.0.17 parity when you need hard process isolation.
   - title: CLI and observability
     details: Optional openrtc CLI with JSON output, resource hints, JSONL metrics, and a Textual sidecar TUI.
 ---
@@ -35,4 +37,6 @@ features:
 - [AgentPool API](./api/pool)
 - [Examples](./examples)
 - [CLI](./cli)
+- [Density benchmark (v0.1)](./benchmarks/density-v0.1)
+- [Changelog](./changelog)
 - [GitHub Pages deployment](./deployment/github-pages)
diff --git a/package-lock.json b/docs/package-lock.json
similarity index 100%
rename from package-lock.json
rename to docs/package-lock.json
diff --git a/package.json b/docs/package.json
similarity index 53%
rename from package.json
rename to docs/package.json
index 53ac83d..85b3432 100644
--- a/package.json
+++ b/docs/package.json
@@ -3,9 +3,9 @@
   "private": true,
   "type": "module",
   "scripts": {
-    "docs:dev": "vitepress dev docs",
-    "docs:build": "vitepress build docs",
-    "docs:preview": "vitepress preview docs"
+    "docs:dev": "vitepress dev .",
+    "docs:build": "vitepress build .",
+    "docs:preview": "vitepress preview ."
   },
   "devDependencies": {
     "vitepress": "^1.6.4"